Digging More Missing Proteins Using an Enrichment Approach with


Article

Digging more missing proteins using an enrichment approach with ProteoMiner™ Siqi Li, Yanbin He, Zhilong Lin, Shaohang Xu, Ruo Zhou, Feng Liang, Jian Wang, Huanming Yang, Siqi Liu, and Yan Ren J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00353 • Publication Date (Web): 29 Sep 2017 Downloaded from http://pubs.acs.org on September 30, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Digging more missing proteins using an enrichment approach with ProteoMiner™

Siqi Li1#, Yanbin He1#, Zhilong Lin1, Shaohang Xu1, Ruo Zhou1, Feng Liang1, Jian Wang1, Huanming Yang1,2, Siqi Liu1*, Yan Ren1*

1 BGI-Shenzhen, Beishan Industrial Zone 11th building, Yantian District, Shenzhen, Guangdong, 518083, China

2 James D. Watson Institute of Genome Sciences, Hangzhou 310008, China

# These authors contributed equally to this work.

* To whom correspondence should be addressed:

Siqi Liu, BGI-Shenzhen, Beishan Industrial Zone 11th building, Yantian District, Shenzhen, Guangdong, 518083, China. Tel: 86-755-36307403; E-mail: [email protected]

Yan Ren, BGI-Shenzhen, Beishan Industrial Zone 11th building, Yantian District, Shenzhen, Guangdong, 518083, China. Tel: 86-755-36307403; E-mail: [email protected]


Abstract

The Human Proteome Project (HPP) aims to map the entire human proteome through a systematic effort that draws on all emerging techniques, which would enhance the understanding of human biology and lay a foundation for the development of medical applications. To date, 2,563 missing proteins (MPs, PE2-4) remain undetected even with the most sensitive protein detection approaches. Herein, we propose that enrichment of low-abundance proteins benefits the discovery of MPs. ProteoMiner™ is an equalizing technique that reduces high-abundance proteins and enriches low-abundance proteins in biological liquids. With Triton X-100/TBS buffer extraction, ProteoMiner™ enrichment and peptide fractionation, 20 MPs (each with at least 2 non-nested unique peptides of more than 8 a.a. in length), represented by 60 unique peptides, were identified from four human tissues, including 8 membrane/secreted proteins and 5 nuclear proteins. Fifteen of them were then confirmed by PRM assay, in which 2 non-nested unique peptides (≥9 a.a.) per protein matched well with their chemically synthesized counterparts. These results demonstrate that ProteoMiner™ is a powerful means of discovering MPs.

Keywords

Missing proteins, ProteoMiner™ protein enrichment kit, Human tissue, LC-MS/MS, Triton X-100


Introduction

Launched by the Human Proteome Organization (HUPO), the Human Proteome Project (HPP, http://www.thehpp.org) aims to complete, step by step, a thorough human proteome 'map' by identifying and characterizing all the proteins of the 20,179 protein-coding genes in its initial phase (1, 2). The HPP considers that protein identification from MS signals requires high-stringency evidence, and has proposed its own guidelines for interpreting MS data (2, 3). Although Kim MS et al. claimed in 2014 that the protein products encoded by 17,294 genes were identified by MS-based approaches in 30 histologically normal human tissue samples (4), some proteins in that collection are not recognized as identified under the strict criteria. The proteins in UniProtKB/Swiss-Prot are broadly classified into five categories (PE1-5) according to the evidence for protein existence (PE). The proteins in PE2 to PE4 are defined as missing proteins (MPs) because evidence of their existence at the protein level is lacking (3, 5, 6). In the newest release (April 2017) of neXtProt (https://www.nextprot.org/about/protein-existence), a total of 17,045 proteins are defined as identified proteins confirmed with at least 2 non-nested unique peptides of more than 8 residues in length (PE1). There are still 2,563 missing proteins to be detected and confirmed.

It is generally accepted that high-resolution MS is currently a good solution for digging out MPs. This technique, however, has met several obstacles: 1) low-abundance proteins whose MS signals are suppressed by proteins of higher abundance, or which fall below the MS detection limit; 2) highly hydrophobic proteins that are difficult to extract and are poorly digested; 3) proteins with high sequence homology, which yield few unique peptides to establish their identity; 4) proteins with short amino acid sequences that contain few or no tryptic peptides easily detected by MS; and 5) proteins whose existence is time- and space-dependent (7-12). Since the HPP was launched in 2010, great progress has been made in the strategies and approaches for discovering MPs, such as enrichment of low-abundance proteins at the subcellular level, deep fractionation of peptides to reduce complexity, extension of the LC gradient, and tuning of resolution, sensitivity and accuracy in MS (7). As a result of these efforts by the HPP teams, the number of MPs has shrunk from 3,868 (2013) to 2,563 (April 2017) (5, 6). On the other hand, we have to recognize that exploring MPs is becoming more and more difficult and badly needs new strategies or techniques (13).

As peptide ionization and fragmentation are random processes, peptide abundance is a key factor that affects the feasibility of peptide detection by MS. If peptide abundances span more than three orders of magnitude, the peptides at lower abundance are difficult to identify by MS (14). Reducing the dynamic range of protein abundance in a protein mixture is therefore an efficient way to detect low-abundance proteins. Based on the interaction of highly diverse hexapeptides coated on beads with the proteins of a complex sample, the ProteoMiner™ kit was designed for protein equalization by reducing high-abundance proteins and enriching low-abundance proteins in body liquids (14-16), although Keidel EM showed that the reduction of complexity, dynamic range and high-abundance proteins was achieved by a hydrophobic interaction mechanism rather than by the diverse interactions of the hexapeptide surface ligands (17). ProteoMiner™ has been successfully applied to low-abundance protein enrichment in human serum (18, 19), urine (20), synovial fluid (21), bile fluid (22), bovine colostrum (23), bacterial outer membrane proteins (24) and cell culture supernatants (25). The treatment with ProteoMiner™ did offer a good opportunity to identify low-abundance proteins. In fact, the ProteoMiner™ method has seldom been adopted for profiling proteins in tissues or cells, because these samples are less affected by interference from an extremely wide range of protein abundances. Thus, a question naturally arises as to whether abundance normalization of proteins could facilitate MP discovery in tissues or cells.

It is well known that membrane proteins with low copy numbers and high hydrophobicity are not easily identified by MS because of poor performance in protein extraction and tryptic digestion (26-28). TX-100, a nonionic detergent, is widely used for solubilizing membrane proteins under non-denaturing conditions (29-31). It was reported that membrane proteins account for a higher percentage of TX-100 extracts than of urea or SDS extracts (32). Notably, TX-100 matches well with the experimental conditions required for ProteoMiner™ treatment, in which proteins must remain non-denatured. Hence, we postulate that ProteoMiner™ treatment in a TX-100 solution, followed by elution with urea solution under denaturing conditions, could benefit not only the enrichment of low-abundance proteins but also the digestion and detection of membrane proteins.


In this communication, 20 MPs were identified with at least 2 non-nested unique peptides (≥9 a.a.) from four human tissues with TX-100 extraction and ProteoMiner™ enrichment. These MPs, which mapped to 15 chromosomes, included 8 membrane/secreted proteins, 5 nuclear proteins and 4 MPs without a defined cellular location. Fifteen of these MPs were further verified by PRM assay through matching with chemically synthesized peptides (2 non-nested unique peptides per protein, ≥9 a.a.). The results indicate that TX-100 extraction and ProteoMiner™ capture benefit the identification of MPs, especially those with higher hydrophobicity or at low abundance in the nucleus.

Materials and Methods

Materials, chemicals and reagents

Human hepatoma and adjacent tissues and colorectal cancer and adjacent tissues were provided by Xijing Hospital (China); human bladder cancer and adjacent tissues were obtained from Shenzhen 2nd People’s Hospital (China); and human kidney tissues were provided by Nanjing General Hospital (China). Mice were housed in the animal facility of BGI, Shenzhen. All clinical samples in this study were taken during surgery, and all patients signed informed consent forms. The protocol for sample collection and data usage was submitted to and approved by the Ethics Committees of the three hospitals. Permission was obtained from the BGI Animal Ethics Committee for the use of animal tissues in this study. A scheme showing the pathological information of the tissue samples and the experimental flowchart is provided in Supplementary Figure 1.

All reagents were purchased from Sigma-Aldrich (St. Louis, MO) or Thermo Fisher Scientific (Waltham, MA) unless specified otherwise. The ProteoMiner™ Protein Enrichment Kits were purchased from Bio-Rad (Hercules, CA). Protease inhibitor cocktails were ordered from Roche (Basel, Switzerland). Trypsin for protein digestion was ordered from Promega (Madison, WI). Bovine serum albumin was obtained from Sangon Biotech (Shanghai, China). The peptides used for the PRM assay were chemically synthesized by GL Biochem (Shanghai, China). Solutions for proteome extraction, purification, proteolysis and fractionation were prepared with Milli-Q water (Merck Millipore, Darmstadt, Germany), and solutions for subsequent steps were prepared with HPLC grade water (Fisher Scientific, Waltham, MA).

Tissue protein extraction and lower abundance protein enrichment by ProteoMiner™ kit

Tissues were ground in a tissue lyser and further lysed by probe sonication at 0-4 ℃ in one of the following extraction buffers, each containing protease inhibitor cocktail: PBS (pH 7.5), HEPES (20 mM, pH 8.0), NH4HCO3 (50 mM, pH 8.0) or Triton buffer (1% Triton X-100, 20 mM Tris, 150 mM NaCl, pH 7.4). The Bradford assay was used to measure the concentrations of extracted proteins with bovine serum albumin as the standard. Lower abundance proteins were enriched with ProteoMiner™ Protein Enrichment Kits according to the manufacturer’s protocol. In brief, the columns were conditioned with the extraction buffer, and the extracted tissue proteins (5 mg) were then loaded onto the columns (50 µL) under gravity flow. Nonspecifically bound proteins were washed out with the same extraction buffer, and the specifically bound proteins were then eluted with the elution buffer provided in the kits. Depletion of higher abundance proteins was evaluated by the protein recovery ratio and by SDS-PAGE gel images showing the changes in protein pattern after ProteoMiner™ treatment.

Protein digestion and peptide desalting

After the enriched lower abundance proteins were eluted from the column, they were reduced in 10 mM dithiothreitol (30 ℃, 2 h) and then alkylated in 55 mM iodoacetamide (room temperature in the dark, 45 min). The protein solution was diluted with 50 mM NH4HCO3 buffer to lower the concentration of urea carried over from the elution buffer and thereby improve the efficiency of trypsin. The proteins were digested with trypsin (1:50) at 37 ℃ overnight. The resulting tryptic peptide solution was adjusted to pH 2-3 with TFA and desalted on Waters Sep-Pak C18 cartridges (Milford, MA) by conditioning (0.1% TFA), sample loading, washing (0.1% TFA) and elution (75% ACN with 0.1% TFA). The peptides were then dried in a speed-vac (LaboGene, Lynge, Denmark) and dissolved in 5% ACN in ammonia (pH 9.8) for the first-dimensional separation.


The first dimensional fractionation of peptides

Desalted peptides were separated on a Gemini® high-pH RP column (5 µm, 110 Å, 250 × 4.6 mm, Phenomenex, Torrance, CA) installed on a Shimadzu UFLC system (Kyoto, Japan). Fractionation was performed over a 60 min period at a constant flow rate of 1 mL/min using a gradient of 5% B for 10 min, 5%-35% B in 40 min, 35%-95% B in 1 min and 95% B for 3 min, followed by a return to 5% B within 1 min and equilibration at 5% B for 10 min (buffer A: 5% ACN in ammonia, pH 9.8; buffer B: 95% ACN in ammonia, pH 9.8). All 60 fractions were collected and pooled into 20 fractions in a concatenated mode (33). Fractions were dried in a speed-vac (LaboGene, Lynge, Denmark).
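The pooling of 60 collected fractions into 20 can be illustrated with a short sketch. The manuscript does not state which fractions were combined, so the code below assumes the commonly used interleaved scheme in which every 20th fraction is merged (for example fractions 1, 21 and 41); the function name and default arguments are illustrative only.

```python
def concatenate_fractions(n_fractions=60, n_pools=20):
    """Assign consecutively collected fractions to pools in an interleaved pattern.

    Assumption (not stated in the text): fraction i goes to pool
    ((i - 1) % n_pools) + 1, so early-, mid- and late-eluting fractions
    end up in the same pool.
    """
    pools = {pool: [] for pool in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools + 1].append(fraction)
    return pools


pools = concatenate_fractions()
for pool_id in (1, 2, 20):
    print(f"pool {pool_id}: fractions {pools[pool_id]}")
```

Under this assumption each pooled sample spans the whole high-pH elution range, which is the usual motivation for concatenation before the low-pH second dimension.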

The identification of missing proteins by mass spectrometry

The peptides (2 µg) were analyzed on a Q Exactive™ HF hybrid quadrupole-Orbitrap mass spectrometer coupled to an UltiMate 3000 UHPLC (Thermo Scientific, Waltham, MA), using a 300 µm × 5 mm C18 trap column (µ-Precolumn, Thermo Scientific, Waltham, MA) and a 75 µm × 25 cm in-house packed analytical column containing 3 µm Ultimate® LP-C18 particles (120 Å) from Welch (Shanghai, China). Each fraction was loaded onto the trap column at a flow rate of 5 µL/min for 5 min with buffer A (2% ACN and 0.1% formic acid), followed by a 115 min gradient at 300 nL/min: from 5% B (98% ACN and 0.1% formic acid) to 26% B in 85 min, to 35% B in 10 min, to 80% B in 10 min, held at 80% B for 5 min, dropped to 5% B within 0.5 min and then kept at 5% B for 4.5 min. The MS parameters were as follows: spray voltage 2 kV; capillary temperature 320 ℃; positive mode; scan range 350-1500 m/z; loop count 30; NCE 26; MS resolution 120,000; MS/MS HCD scans with resolution 30,000; dynamic exclusion duration 30 s; isolation window 2.0 m/z; intensity threshold 1.0e4; charge exclusion 1, 7, 8, >8. Each fraction was injected twice for more confident identification.
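As a quick consistency check, the DDA gradient segments listed above can be written as a table and summed. The sketch below assumes linear ramps between the stated %B values, which the text does not say explicitly; the variable and function names are illustrative.

```python
# (duration in min, starting %B, ending %B) for each segment quoted in the text
dda_gradient = [
    (85.0, 5, 26),
    (10.0, 26, 35),
    (10.0, 35, 80),
    (5.0, 80, 80),
    (0.5, 80, 5),
    (4.5, 5, 5),
]


def total_minutes(segments):
    """Sum the segment durations."""
    return sum(duration for duration, _, _ in segments)


def percent_b_at(segments, minute):
    """Return %B at a given time, assuming linear interpolation within a segment."""
    elapsed = 0.0
    for duration, start, end in segments:
        if minute <= elapsed + duration:
            return start + (end - start) * (minute - elapsed) / duration
        elapsed += duration
    raise ValueError("time lies beyond the end of the gradient")


print(total_minutes(dda_gradient))       # total gradient length in minutes
print(percent_b_at(dda_gradient, 85.0))  # %B at the end of the first ramp
```

The segment durations add up to the 115 min stated for the gradient.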

Database searching

Acquired MS data were converted to MGF files by Proteome Discoverer 1.4 (Thermo Scientific, Waltham, MA), and the exported MGF files were searched using Mascot version 2.3.02 against the Swiss-Prot human database (release 2017_04, 20,183 protein sequences) together with a decoy database. The false discovery rate (FDR) was set to less than 1% at both the PSM and protein levels during searching and was calculated automatically by the software. Trypsin was selected as the specific enzyme, with a maximum of two missed cleavages permitted per peptide. The modification settings were carbamidomethylation (C) as a fixed modification and oxidation (M) and deamidation (N, Q) as variable modifications. Data were searched with a peptide mass tolerance of 20 ppm and a fragment mass tolerance of 0.05 Da. The Mascot results were further processed by IQuant software, which uses Mascot Percolator to re-score the peptide spectrum matches (PSMs) and the Occam’s razor approach to assemble the identified peptide sequences into a set of confident proteins (34, 35). The protein-level FDR was calculated using the picked protein FDR strategy with a threshold of less than 1% (36).
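To make the two search tolerances concrete, the sketch below converts the 20 ppm peptide tolerance into an absolute window and contrasts it with the fixed 0.05 Da fragment tolerance. It is plain arithmetic, not a reproduction of how Mascot applies the settings internally; the example mass of 500.0 and the function names are hypothetical.

```python
def ppm_window(mass, tol_ppm=20.0):
    """Return the (low, high) bounds of a relative tolerance window."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, tol_ppm=20.0):
    """True if the relative mass error is within the ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def within_da(observed, theoretical, tol_da=0.05):
    """True if the absolute mass error is within the Da tolerance."""
    return abs(observed - theoretical) <= tol_da


low, high = ppm_window(500.0)          # hypothetical precursor value
print(f"20 ppm window at 500.0: {low:.4f} to {high:.4f}")
print(within_ppm(500.005, 500.0))      # 10 ppm error
print(within_ppm(500.020, 500.0))      # 40 ppm error
print(within_da(300.04, 300.0))        # 0.04 Da fragment error
```

A ppm tolerance scales with mass, so the same 20 ppm setting gives a wider absolute window for heavier peptides, whereas the fragment tolerance stays at 0.05 Da throughout.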


The confirmation of MP identification by PRM

The .dat files of the DDA search results were imported into Skyline and used to direct the extraction of peptides for the PRM assay (37). Only the unique peptide sequences of MPs were entered as baits to retrieve the MS identification information used for PRM confirmation. The list of target peptides with m/z and sequence obtained from the Skyline analysis was imported for targeted identification (Supplementary Table 1). The PRM assay was also performed on a Q Exactive™ HF hybrid quadrupole-Orbitrap mass spectrometer coupled with the same column used for DDA identification. The fractions in which MP unique peptides had been identified were selected for PRM confirmation and separated with a 60 min gradient: 5 min with buffer A (2% ACN and 0.1% formic acid), from 5% B (98% ACN and 0.1% formic acid) to 30% B in 44 min, to 80% B in 3 min, held at 80% B for 2 min, dropped to 5% B within 1 min and then kept at 5% B for 5 min. For the PRM scans, the instrument was set to PRM mode and most MS parameters were kept the same as for DDA identification, such as spray voltage, capillary temperature, resolution, polarity, AGC target and scan range, except that the maximum IT was set to 250 ms to improve detection sensitivity. The PRM data were processed in Skyline with the following steps: selecting the peptide and fragment settings according to the software guidelines; entering the target peptide list; importing the PRM raw data files; locking the identification of native peptides based on the database search and on the chromatographic and MS behavior of the synthetic peptides; and finally exporting the comparison of RT and product ion peak area percentages between the native and synthetic peptides.
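The final comparison step, RT and product ion peak area percentages for native versus synthetic peptides, can be sketched as follows. This is not the Skyline export itself: the data layout, the function names and all numeric values are hypothetical, and the manuscript does not state what agreement threshold was applied.

```python
def area_percentages(peak_areas):
    """Convert raw product-ion peak areas into percentages of the summed area."""
    total = sum(peak_areas.values())
    return {ion: 100.0 * area / total for ion, area in peak_areas.items()}


def compare_peptide(native, synthetic):
    """Return the RT difference (min) and the largest per-ion gap in area percentage."""
    native_pct = area_percentages(native["areas"])
    synthetic_pct = area_percentages(synthetic["areas"])
    shared_ions = sorted(set(native_pct) & set(synthetic_pct))
    largest_gap = max(abs(native_pct[ion] - synthetic_pct[ion]) for ion in shared_ions)
    return abs(native["rt_min"] - synthetic["rt_min"]), largest_gap


# Hypothetical measurements for one target peptide
native = {"rt_min": 31.2, "areas": {"y4": 5.0e5, "y5": 3.0e5, "y6": 2.0e5}}
synthetic = {"rt_min": 31.4, "areas": {"y4": 5.2e6, "y5": 2.9e6, "y6": 1.9e6}}

rt_gap, pct_gap = compare_peptide(native, synthetic)
print(f"RT difference: {rt_gap:.1f} min; largest area-percentage gap: {pct_gap:.1f} points")
```

Normalizing to percentages makes the fragment patterns comparable even though the synthetic peptide is typically far more intense than the native one.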


Results and Discussion

Optimization of the ProteoMiner™ protocol for equalizing the abundance of tissue proteins

ProteoMiner™ technology was adopted to improve the detection rate of low-abundance proteins in samples by depleting the high-abundance proteins (15, 17, 38). It is often applied to body fluids, but is not extensively used for abundance equalization of tissue proteins (20, 38, 39). Rivers et al. adopted the equalization approach to eliminate the bias in protein abundance in the soluble fraction of skeletal muscle, which has a dramatic asymmetry in the range of protein abundances, and found that loading onto the beads under different buffer conditions led to the capture of different subpopulations of proteins (40). To find conditions that allow efficient protein equalization, we first evaluated several buffers for extracting non-denatured proteins from mouse liver; the same buffers were also used for binding the extracted proteins to the hexapeptides. HEPES, phosphate-buffered saline (PBS) and Tris-buffered saline (TBS) are generally used to extract proteins under non-denaturing conditions, while NH4HCO3 effectively extracts proteins with higher solubility. Three buffers, 50 mM NH4HCO3 (pH 8.0), 20 mM HEPES (pH 8.0) and PBS (pH 7.5), were employed for protein extraction and ProteoMiner™ binding. SDS-PAGE gels were used as the primary tool to assess the equalization effects of ProteoMiner™ and to check whether a new asymmetry of protein expression was reached, as reported for the soluble fraction of skeletal muscle (40).


The Coomassie blue-stained SDS-PAGE images of the equalized proteins and the protein recovery ratio (eluted bound proteins/total input proteins) were taken as the evaluation parameters for depletion efficiency. As depicted in Figure 1, the stained bands in the three lanes (lysate, flow-through and bound fractions) exhibited similar patterns in all three buffers (HEPES, PBS and NH4HCO3) used with the ProteoMiner™ column. The protein recoveries from 5 mg of original protein were comparable in HEPES (4.3%) and PBS (4.1%), whereas NH4HCO3 gave the lowest recovery of the three extraction buffers, at 2.9%. NH4HCO3 therefore appears to be the better non-denaturing buffer for equalizing tissue protein abundance in the ProteoMiner™ treatment.

Triton X-100 (TX-100) is widely accepted as a mild detergent for extracting proteins with higher hydrophobicity (29, 41). We further asked whether TX-100 could improve the enrichment of low-abundance and hydrophobic proteins by ProteoMiner™ (14, 42). The electrophoretic bands from the TX-100/TBS extract of mouse liver showed an obviously different pattern, with more proteins at lower molecular masses, compared with the SDS-PAGE images from the other three extraction buffers (Figure 1), and the TX-100/TBS treatment resulted in 3.84% protein recovery. TX-100/TBS therefore seems to be an alternative buffer choice for capturing more hydrophobic proteins with ProteoMiner™.

Furthermore, we applied ProteoMiner™ with the two buffers, TX-100/TBS and NH4HCO3, to extract and equalize the proteins of human liver. As presented in Figure 2A, similar to the result for mouse liver proteins, the proteins extracted with TX-100/TBS showed obvious enrichment of low-molecular-mass proteins on SDS-PAGE. The protein recoveries for human liver with TX-100/TBS and NH4HCO3 were 4.6% and 2.6%, respectively. The two sets of extracted proteins were digested with trypsin, and the peptides were fractionated into 20 parts by high-pH reverse-phase chromatography, followed by peptide identification on a Q Exactive™ HF hybrid quadrupole-Orbitrap mass spectrometer. In total, 6,200 and 6,778 proteins with 67,091 and 76,251 peptides were identified in the NH4HCO3 and TX-100/TBS extracts, respectively, of which 5,576 and 6,007 proteins were identified with at least 2 unique peptides (≥9 a.a., Supplementary Table 1). Notably, 2 and 5 MPs with at least 2 non-nested unique peptides (≥9 a.a.) were found in the NH4HCO3 and TX-100 extracts, respectively; one protein (Ras-related protein Rap-1b-like protein) was identified in both extracts, and five proteins were assigned as membrane or secreted proteins. These results demonstrate that the ProteoMiner™ treatment assists the discovery of MPs. In addition, as ProteoMiner™ in TX-100/TBS found more MPs, we chose this buffer system for protein extraction from the other human tissues in the later experiments.
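The recovery ratio used in this section (eluted bound protein divided by total input protein) translates directly into absolute amounts. The sketch below back-calculates the eluted mass implied by each mouse liver recovery value quoted above; these masses are derived here for illustration and are not reported in the manuscript, and the 5 mg input is assumed to apply to the TX-100/TBS run as well.

```python
INPUT_PROTEIN_UG = 5000.0  # 5 mg loaded per column, as stated in the Methods

# Recovery ratios (%) quoted in the text for mouse liver
recovery_percent = {
    "HEPES": 4.3,
    "PBS": 4.1,
    "NH4HCO3": 2.9,
    "TX-100/TBS": 3.84,
}


def eluted_mass_ug(percent, input_ug=INPUT_PROTEIN_UG):
    """Mass of bound-and-eluted protein implied by a recovery percentage."""
    return input_ug * percent / 100.0


def not_recovered_percent(percent):
    """Share of the input protein that does not appear in the eluate."""
    return 100.0 - percent


for buffer_name, pct in recovery_percent.items():
    print(
        f"{buffer_name}: about {eluted_mass_ug(pct):.0f} µg eluted, "
        f"{not_recovered_percent(pct):.2f}% of input not recovered"
    )

lowest = min(recovery_percent, key=recovery_percent.get)
print(f"lowest recovery: {lowest}")
```

A lower recovery means a larger share of the input, dominated by high-abundance proteins, passed through the column, which is why the lowest value is read as the strongest equalization.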

The MPs identification from the treated proteins by ProteoMiner™

Of the approximately 20,000 human genes, about 50% have transcripts detectable in all analyzed tissues, and approximately 40% show tissue priority, with elevated expression in one of the analyzed tissues (43, 44). The genes with significantly elevated expression in a particular tissue or a group of related tissues comprise three major subtypes: tissue-enriched genes, group-enriched genes and tissue-enhanced genes. The number of tissue-enriched genes (at least five-fold higher mRNA levels in a particular tissue than in all other tissues) is highly variable between the analyzed tissue types (45, 46). Testis, brain and liver rank at the top of the list of all 34 human tissues by number of enriched genes (44). C-HPP teams have found many MPs in testis since 2012 (47-50). From the human tissues collected in our laboratory, four were chosen for the MP search in this study: liver and kidney, which have more tissue-enriched genes, and bladder and colon, which have fewer (44-46). The tumor and adjacent tissues from human kidney, bladder or colon were first mixed, and the corresponding proteins were extracted with TX-100/TBS and then treated with ProteoMiner™. Protein quantification showed that the recoveries of kidney, bladder and colon proteins through the ProteoMiner™ columns were 3.08%, 3.12% and 5.02%, respectively. The SDS-PAGE images clearly showed that, after the ProteoMiner™ treatment, the band intensities of the high-abundance proteins were obviously attenuated, while those of the low-abundance proteins were enhanced (Figure 2B, C and D). In total, 8,432, 7,985 and 7,598 proteins with 102,964, 104,390 and 62,454 peptides were identified from kidney, bladder and colon, respectively (Supplementary Table 2 and 3), of which 8,302, 7,845 and 6,740 proteins were identified with at least two unique peptides (≥9 a.a.).

In these results, some of the unique peptides assigned to MPs might be shared by MPs and PE1 proteins, and these should be deleted from the unique peptide list. Even among the identified unique peptides assigned only to MPs, those shared by different MP groups should also be discarded from the final results. The identification of MPs in the final list should rest only on unique peptides belonging to a single MP group. Recently, Schaeffer et al. developed a peptide uniqueness checker to help scientists determine which peptides can be used as unique peptides to validate the existence of human MPs (51) (https://www.nextprot.org/tools/peptide-uniqueness-checker). Finally, the quality of all the unique peptide spectra was manually checked to exclude those with low intensity or with too little fragment information to cover the full peptide sequence. After this filtering, 20 MPs with 60 peptides were finally confirmed with high-stringency evidence of at least 2 non-nested unique peptides (≥9 a.a.), including 17 PE2, 1 PE3 and 2 PE4 proteins; 11 of these MPs were identified in both LC-MS/MS injections with at least 2 non-nested unique peptides (≥9 a.a.). Detailed information on these MPs is listed in Table 1, including identified peptide sequence, chromosome location, cellular location and unique isoform identification. The spectra of all the unique peptides, labeled with pLabel software, are summarized in Supplementary Figure 2 (52), and the precursor m/z, mass error and expect value for each spectrum are provided in Supplementary Table 4.

The 20 MPs (1 from liver; 0 from colon; 8 from kidney; 11 from bladder) were mapped to 15 chromosomes: 3 were mapped to chromosome 1 (3 PE2), and 2 were mapped to each of chromosomes 2 (PE4/PE2), 6 (2 PE2) and 11 (2 PE2). The MPs fell into two major subclasses of cellular localization, membrane/secreted and nuclear. In total, 8 MPs were identified as membrane or secreted proteins (1, 2 and 5 from liver, kidney and bladder, respectively) and 5 as nuclear proteins (2 and 3 from kidney and bladder, respectively). It is reasonable that 40% of the MPs are membrane proteins, given the efficient extraction of membrane proteins by TX-100. Why is the ProteoMiner™ enrichment also efficient for nuclear proteins? We speculate that the structure or hydrophobicity of these nuclear proteins favors binding to the hexapeptides. The kidney, which has more tissue-enriched genes and expresses a large number of membrane-bound transport proteins, was expected to yield more membrane MP identifications (53, 54). It is surprising that 11 MPs were identified from bladder, 5 of them membrane/secreted proteins. This might be due to the efficient enrichment by ProteoMiner™ beads of low-abundance membrane proteins that have relatively higher expression in bladder.

One MP-coding gene may express more than one protein product, and these products share the same accession number and have high homology. In the MP identification results shown in Table 1, the sequences of the identified peptides for APOL4 and BTBD8 did not show any isoform characteristics, whereas those for NPT1, BICL1, RNF180, CAD26 and PPMIM displayed some isoform-specific features, which led to the identification of only some members of the MP groups. These results meet the HPP goal of identifying and characterizing at least one protein product from each protein-coding gene.
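The peptide-level criterion applied throughout this section (at least 2 non-nested unique peptides of ≥9 a.a. per protein) can be sketched in a few lines. The code assumes the input peptides have already passed a uniqueness check such as the neXtProt tool mentioned above, and it treats a peptide as nested when its sequence is fully contained in a longer one; the function names and the sequences are hypothetical, not taken from Table 1.

```python
def non_nested(peptides):
    """Drop any peptide whose sequence is fully contained in a longer kept peptide."""
    # Longest first, with an alphabetical tie-break so the result is deterministic
    ordered = sorted(set(peptides), key=lambda pep: (-len(pep), pep))
    kept = []
    for pep in ordered:
        if not any(pep in longer for longer in kept):
            kept.append(pep)
    return kept


def passes_criteria(unique_peptides, min_len=9, min_count=2):
    """Check for at least `min_count` non-nested peptides of `min_len` residues or more."""
    long_enough = [pep for pep in unique_peptides if len(pep) >= min_len]
    return len(non_nested(long_enough)) >= min_count


# Hypothetical peptide sets for three candidate proteins
candidate_a = ["ACDEFGHIKLM", "DEFGHIKLM", "PQRSTVWYACD"]  # one nested pair plus one more
candidate_b = ["ACDEFGHIKLM", "DEFGHIKLM"]                 # only a nested pair
candidate_c = ["ACDEFGHIK", "PQRSTVWY"]                    # second peptide has 8 residues

for name, peptides in [("A", candidate_a), ("B", candidate_b), ("C", candidate_c)]:
    print(name, passes_criteria(peptides))
```

Candidate B fails because its two peptides overlap completely and so count as one piece of evidence, which is the situation the non-nested rule is meant to exclude.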

The characterization of the new MPs found in this study

Baker MS et al. summarized the top five MP families as uncharacterized proteins,
olfactory receptors, zinc finger proteins, non-GPCR transmembrane proteins and coiled-coil domain proteins (7). Among the 20 MPs identified here were 1 uncharacterized protein, 1 zinc finger protein, 2 GPCR membrane proteins and 1 coiled-coil domain protein. If a protein has a small molecular weight with few or no unique tryptic peptides of 9-30 amino acids, or if it is highly hydrophobic, its peptides are difficult to detect (5). In our primary identification list of 35 MPs before any filtering, Ras-related protein Rap-1b-like protein was defined as a potential MP with 4 unique peptides (non-nested, ≥9 a.a.) detected in liver and colon tissue. After examination with the peptide uniqueness checker, 3 of these peptides were filtered out because they share highly homologous sequences with a PE1 protein, Ras-related protein Rap-1b. In fact, the two proteins differ by only 2 a.a. over a length of 184 a.a. The peptide carrying one of these differences in Ras-related protein Rap-1b-like protein was identified in our experiment; however, the other difference was difficult to identify because its tryptic peptide is too short to be detected. As shown in Figure 3A, the 2,563 MPs have amino acid sequences that are clearly shorter than those of the MPs and PE1 proteins identified in this study. The length distribution of the MPs identified here is similar to that of the PE1 proteins, suggesting that ProteoMiner™ enrichment is not biased by protein length and that protein binding to the resin depends mainly on interactions between the hexapeptide structures and the protein complexes. The hydrophobicity analysis by ProPAS (http://bioinfo.hupo.org.cn/tools/ProPAS/propas.htm) clearly revealed that the MPs
found in this study had a distribution shifted toward higher hydrophobicity, consistent with the 40% of membrane proteins among them, whereas the 2,563 MPs showed two peaks in hydrophobicity (Figure 3B). Because some of the MPs yet to be identified are more hydrophobic than any of the proteins already identified, there is still a long way to go before all MPs are identified unless they can be well extracted or enriched.
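
Two properties discussed in this section, the length of tryptic peptides and protein hydrophobicity, can be sketched in Python. The digestion rule (cleavage after K or R unless the next residue is P, with no missed cleavages) and the use of the Kyte-Doolittle scale to compute a GRAVY score are assumptions made for illustration: the text does not state which hydrophobicity scale ProPAS applies. The test sequences are arbitrary and are not among the MPs reported here.

```python
# Kyte-Doolittle hydropathy values (an assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def detectable(peptides, min_len=9, max_len=30):
    """Keep peptides inside the 9-30 residue window mentioned in the text."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of a sequence (GRAVY score)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


toy = "AAAKGLSDGEWQLVLNVWGKRPDEK"  # arbitrary sequence for illustration only
peptides = tryptic_digest(toy)
print(peptides)              # all tryptic peptides
print(detectable(peptides))  # only those of 9-30 residues
print(round(gravy("LLIVAVFL"), 2), round(gravy("DEKRDEKR"), 2))
```

In this toy case only one of the three tryptic peptides falls in the 9-30 residue window, which mirrors the situation described for Rap-1b-like protein, where a distinguishing residue sits in a peptide too short to be detected.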

Confirmation of MP identification by parallel reaction monitoring (PRM)

The PRM assay, which offers high selectivity and sensitivity for confident confirmation of targeted peptides, was adopted as an orthogonal method to validate the MPs identified by DDA (55-57). The unique peptides identified for the MPs in Table 1 were used as the targets for PRM verification. Two typical sets of PRM traces derived from native tissue and synthetic peptides are shown in Figure 4: the 2 unique peptides of sodium-dependent phosphate transport protein 1 (Q14916/NPT1, PE2, kidney membrane protein) were confirmed with more than 8 fragment ions, and the 2 unique peptides of a PE2 bladder secreted protein, apolipoprotein L4 (Q9BPW4/APOL4), were also successfully detected in the PRM assay. In total, 16 MPs with at least 2 non-nested unique peptides and 4 MPs with only one unique peptide were detected in this assay. Then 32 peptides from the 16 MPs were synthesized (2 peptides/protein) and spiked into the matrix of equalized peptides for PRM testing. The chromatographic and MS behaviors of these synthetic peptides were compared with those of the native peptides from human tissue. Figure 4 shows the good match between the synthetic peptides and the native tissue peptides from NPT1 and
APOL4 in retention time (RT) and fragment abundance patterns. In total, 15 MPs were verified with two non-nested unique peptides whose RT and fragment abundance patterns matched well with those of their corresponding synthetic peptides (Table 1, Figure 4 and Supplementary Figure 3). To evaluate the reproducibility of the ProteoMiner™ depletion performance, PRM assays were performed on equalized peptides generated from human bladder and kidney proteins that had been treated with ProteoMiner™ in duplicate. As shown in Supplementary Figure 4, 10 peptides were identified with good reproducibility in the biological duplicates and with chromatographic and MS behaviors consistent with those of their corresponding synthetic peptides.
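
The comparison of native and synthetic peptides by RT and fragment abundance pattern can also be expressed numerically. The match is reported here qualitatively and no scoring formula is given, so the Python sketch below is only one possible way to do it: it assumes a cosine similarity (normalized dot product) between fragment intensity vectors together with an absolute RT tolerance. The thresholds, fragment labels and intensity values are all hypothetical.

```python
import math


def cosine_similarity(native, synthetic):
    """Cosine similarity between two fragment-intensity dicts keyed by fragment label."""
    fragments = sorted(set(native) | set(synthetic))
    a = [native.get(f, 0.0) for f in fragments]
    b = [synthetic.get(f, 0.0) for f in fragments]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def peptide_matches(native, synthetic, rt_native, rt_synthetic,
                    min_similarity=0.9, max_rt_shift=1.0):
    """Hypothetical acceptance rule: similar fragment pattern and RT within max_rt_shift minutes."""
    similar = cosine_similarity(native, synthetic) >= min_similarity
    coeluting = abs(rt_native - rt_synthetic) <= max_rt_shift
    return similar and coeluting


# Made-up fragment intensities, for illustration only.
native = {"y4": 1200.0, "y5": 3400.0, "y6": 2100.0, "y7": 900.0}
synthetic = {"y4": 11000.0, "y5": 35000.0, "y6": 20000.0, "y7": 8000.0}
print(round(cosine_similarity(native, synthetic), 3))
print(peptide_matches(native, synthetic, rt_native=42.3, rt_synthetic=42.6))
```

Cosine similarity is insensitive to absolute intensity, which suits this comparison because a spiked synthetic peptide is usually far more abundant than its native counterpart.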

Conclusions

We propose, for the first time, that ProteoMiner™ can equalize protein abundance in human tissues for MP discovery. By combining TX-100/TBS extraction with ProteoMiner™ enrichment, we identified 20 MPs with high confidence across four human tissues, including 8 membrane/secreted proteins and 5 nuclear proteins; 15 of the 20 MPs were confirmed by matching with their chemically synthesized peptides. The results support our hypothesis that equalization of protein abundance with ProteoMiner™ favors the discovery of some low-abundance membrane or nuclear MPs.

Acknowledgements

This work was supported by the National Basic Research Program of China (2014CBA02002, 2014CBA02005) and the National Natural Science Foundation of China (31500670). All the shotgun proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD006833, and the PRM data have been uploaded to PeptideAtlas (http://www.peptideatlas.org) with the dataset identifiers PASS01068 and PASS01095.

References

1. Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012, 30, (3), 221-3.
2. Paik, Y. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; Na, K.; Jeong, S. K.; He, F.; Binz, P. A.; Nishimura, T.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S., Standard guidelines for the chromosome-centric human proteome project. J Proteome Res 2012, 11, (4), 2005-13.
3. Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016, 15, (11), 3961-3970.
4. Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.; Thomas, J. K.; Muthusamy, B.; Leal-Rojas, P.; Kumar, P.; Sahasrabuddhe, N. A.; Balakrishnan, L.; Advani, J.; George, B.; Renuse, S.; Selvan, L. D.; Patil, A. H.; Nanjappa, V.; Radhakrishnan, A.; Prasad, S.; Subbannayya, T.; Raju, R.; Kumar, M.; Sreenivasamurthy, S. K.; Marimuthu, A.; Sathe, G. J.; Chavan, S.; Datta, K. K.; Subbannayya, Y.; Sahu, A.; Yelamanchi, S. D.; Jayaram, S.; Rajagopalan, P.; Sharma, J.; Murthy, K. R.; Syed, N.; Goel, R.; Khan, A. A.; Ahmad, S.; Dey, G.; Mudgal, K.; Chatterjee, A.; Huang, T. C.; Zhong, J.; Wu, X.; Shaw, P. G.; Freed, D.; Zahari, M. S.; Mukherjee, K. K.; Shankar, S.; Mahadevan, A.; Lam, H.; Mitchell, C. J.; Shankar, S. K.; Satishchandra, P.; Schroeder, J. T.; Sirdeshmukh, R.; Maitra, A.; Leach, S. D.; Drake, C. G.; Halushka, M. K.; Prasad, T. S.; Hruban, R. H.; Kerr, C. L.; Bader, G. D.; Iacobuzio-Donahue, C. A.; Gowda, H.; Pandey, A., A draft map of the human proteome. Nature 2014, 509, (7502), 575-81.
5. Omenn, G. S.; Lane, L.; Lundberg, E. K.; Beavis, R. C.; Overall, C. M.; Deutsch, E. W., Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications. J Proteome Res 2016, 15, (11), 3951-3960.
6. Omenn, G. S.; Lane, L.; Lundberg, E. K.; Beavis, R. C.; Nesvizhskii, A. I.; Deutsch, E. W., Metrics for the Human Proteome Project 2015: Progress on the Human Proteome and Guidelines for High-Confidence Protein Identification. J Proteome Res 2015, 14, (9), 3452-60.
7. Baker, M. S.; Ahn, S. B.; Mohamedali, A.; Islam, M. T.; Cantor, D.; Verhaert, P. D.; Fanayan, S.; Sharma, S.; Nice, E. C.; Connor, M.; Ranganathan, S., Accelerating the search for the missing proteins in the human proteome. Nat Commun 2017, 8, 14271.
8. Omenn, G. S., The strategy, organization, and progress of the HUPO Human Proteome Project. J Proteomics 2014, 100, 3-7.
9. Duek, P.; Bairoch, A.; Gateau, A.; Vandenbrouck, Y.; Lane, L., Missing Protein Landscape of Human Chromosomes 2 and 14: Progress and Current Status. J Proteome Res 2016, 15, (11), 3971-3978.
10. Paik, Y. K.; Overall, C. M.; Deutsch, E. W.; Hancock, W. S.; Omenn, G. S., Progress in the Chromosome-Centric Human Proteome Project as Highlighted in the Annual Special Issue IV. J Proteome Res 2016, 15, (11), 3945-3950.
11. Gaudet, P.; Michel, P. A.; Zahn-Zabal, M.; Britan, A.; Cusin, I.; Domagalski, M.; Duek, P. D.; Gateau, A.; Gleizes, A.; Hinard, V.; Rech de Laval, V.; Lin, J.; Nikitin, F.; Schaeffer, M.; Teixeira, D.; Lane, L.; Bairoch, A., The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res 2017, 45, (D1), D177-D182.
12. Elguoshy, A.; Magdeldin, S.; Xu, B.; Hirao, Y.; Zhang, Y.; Kinoshita, N.; Takisawa, Y.; Nameta, M.; Yamamoto, K.; El-Refy, A.; El-Fiky, F.; Yamamoto, T., Why are they missing? : Bioinformatics characterization of missing human proteins. J Proteomics 2016, 149, 7-14.
13. Horvatovich, P.; Lundberg, E. K.; Chen, Y. J.; Sung, T. Y.; He, F.; Nice, E. C.; Goode, R. J.; Yu, S.; Ranganathan, S.; Baker, M. S.; Domont, G. B.; Velasquez, E.; Li, D.; Liu, S.; Wang, Q.; He, Q. Y.; Menon, R.; Guan, Y.; Corrales, F. J.; Segura, V.; Casal, J. I.; Pascual-Montano, A.; Albar, J. P.; Fuentes, M.; Gonzalez-Gonzalez, M.; Diez, P.; Ibarrola, N.; Degano, R. M.; Mohammed, Y.; Borchers, C. H.; Urbani, A.; Soggiu, A.; Yamamoto, T.; Salekdeh, G. H.; Archakov, A.; Ponomarenko, E.; Lisitsa, A.; Lichti, C. F.; Mostovenko, E.; Kroes, R. A.; Rezeli, M.; Vegvari, A.; Fehniger, T. E.; Bischoff, R.; Vizcaino, J. A.; Deutsch, E. W.; Lane, L.; Nilsson, C. L.; Marko-Varga, G.; Omenn, G. S.; Jeong, S. K.; Lim, J. S.; Paik, Y. K.; Hancock, W. S., Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. J Proteome Res 2015, 14, (9), 3415-31.
14. Millioni, R.; Tolin, S.; Puricelli, L.; Sbrignadello, S.; Fadini, G. P.; Tessari, P.; Arrigoni, G., High abundance proteins depletion vs low abundance proteins enrichment: comparison of methods to reduce the plasma proteome complexity. PLoS One 2011, 6, (5), e19603.
15. Boschetti, E.; Righetti, P. G., The ProteoMiner in the proteomic arena: a non-depleting tool for discovering low-abundance species. J Proteomics 2008, 71, (3), 255-64.
16. Guerrier, L.; Righetti, P. G.; Boschetti, E., Reduction of dynamic protein concentration range of biological extracts for the discovery of low-abundance proteins by means of hexapeptide ligand library. Nat Protoc 2008, 3, (5), 883-90.
17. Keidel, E. M.; Ribitsch, D.; Lottspeich, F., Equalizer technology--Equal rights for disparate beads. Proteomics 2010, 10, (11), 2089-98.
18. Sennels, L.; Salek, M.; Lomas, L.; Boschetti, E.; Righetti, P. G.; Rappsilber, J., Proteomic analysis of human blood serum using peptide library beads. J Proteome Res 2007, 6, (10), 4055-62.
19. Weng, Y.; Sui, Z.; Shan, Y.; Jiang, H.; Zhou, Y.; Zhu, X.; Liang, Z.; Zhang, L.; Zhang, Y., In-Depth Proteomic Quantification of Cell Secretome in Serum-Containing Conditioned Medium. Anal Chem 2016, 88, (9), 4971-8.
20. Castagna, A.; Cecconi, D.; Sennels, L.; Rappsilber, J.; Guerrier, L.; Fortis, F.; Boschetti, E.; Lomas, L.; Righetti, P. G., Exploring the hidden human urinary proteome via ligand library beads. J Proteome Res 2005, 4, (6), 1917-30.
21. Peffers, M. J.; McDermott, B.; Clegg, P. D.; Riggs, C. M., Comprehensive protein profiling of synovial fluid in osteoarthritis following protein equalization. Osteoarthritis Cartilage 2015, 23, (7), 1204-13.
22. Guerrier, L.; Claverol, S.; Finzi, L.; Paye, F.; Fortis, F.; Boschetti, E.; Housset, C., Contribution of solid-phase hexapeptide ligand libraries to the repertoire of human bile proteins. J Chromatogr A 2007, 1176, (1-2), 192-205.
23. Altomare, A.; Fasoli, E.; Colzani, M.; Parra, X. M.; Ferrari, M.; Cilurzo, F.; Rumio, C.; Cannizzaro, L.; Carini, M.; Righetti, P. G.; Aldini, G., An in depth proteomic analysis based on ProteoMiner, affinity chromatography and nano-HPLC-MS/MS to explain the potential health benefits of bovine colostrum. J Pharm Biomed Anal 2016, 121, 297-306.
24. Ben Mlouka, M. A.; Khemiri, A.; Seyer, D.; Hardouin, J.; Chan Tchi Song, P.; De, E.; Jouenne, T.; Cosette, P., Characterization of new outer membrane proteins of Pseudomonas aeruginosa using a combinatorial peptide ligand library. Anal Bioanal Chem 2015, 407, (5), 1513-8.
25. Thulasiraman, V.; Lin, S.; Gheorghiu, L.; Lathrop, J.; Lomas, L.; Hammond, D.; Boschetti, E., Reduction of the concentration difference of proteins in biological liquids using a library of combinatorial ligands. Electrophoresis 2005, 26, (18), 3561-71.
26. Thomas, T. C.; McNamee, M. G., Purification of membrane proteins. Methods Enzymol 1990, 182, 499-520.
27. Ozols, J., Preparation of membrane fractions. Methods Enzymol 1990, 182, 225-35.
28. Kraehenbuhl, J. P.; Bonnard, C., Purification and characterization of membrane proteins. Methods Enzymol 1990, 184, 629-41.
29. Garavito, R. M.; Ferguson-Miller, S., Detergents as tools in membrane biochemistry. J Biol Chem 2001, 276, (35), 32403-6.
30. Slinde, E.; Flatmark, T., Effect of the hydrophile-lipophile balance of non-ionic detergents (Triton X-series) on the solubilization of biological membranes and their integral b-type cytochromes. Biochim Biophys Acta 1976, 455, (3), 796-805.
31. Helenius, A.; Simons, K., Solubilization of membranes by detergents. Biochim Biophys Acta 1975, 415, (1), 29-79.
32. Malik, P.; Korfali, N.; Srsen, V.; Lazou, V.; Batrakou, D. G.; Zuleger, N.; Kavanagh, D. M.; Wilkie, G. S.; Goldberg, M. W.; Schirmer, E. C., Cell-specific and lamin-dependent targeting of novel transmembrane proteins in the nuclear envelope. Cell Mol Life Sci 2010, 67, (8), 1353-69.
33. Hao, P.; Ren, Y.; Dutta, B.; Sze, S. K., Comparative evaluation of electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) and high-pH reversed phase (Hp-RP) chromatography in profiling of rat kidney proteome. J Proteomics 2013, 82, 254-62.
34. Wen, B.; Zhou, R.; Feng, Q.; Wang, Q.; Wang, J.; Liu, S., IQuant: an automated pipeline for quantitative proteomics based upon isobaric tags. Proteomics 2014, 14, (20), 2280-5.
35. Brosch, M.; Yu, L.; Hubbard, T.; Choudhary, J., Accurate and sensitive peptide identification with Mascot Percolator. J Proteome Res 2009, 8, (6), 3176-81.
36. Savitski, M. M.; Wilhelm, M.; Hahne, H.; Kuster, B.; Bantscheff, M., A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets. Mol Cell Proteomics 2015, 14, (9), 2394-404.
37. MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, (7), 966-8.
38. Beseme, O.; Fertin, M.; Drobecq, H.; Amouyel, P.; Pinet, F., Combinatorial peptide ligand library plasma treatment: Advantages for accessing low-abundance proteins. Electrophoresis 2010, 31, (16), 2697-704.
39. Bandow, J. E., Comparison of protein enrichment strategies for proteome analysis of plasma. Proteomics 2010, 10, (7), 1416-25.
40. Rivers, J.; Hughes, C.; McKenna, T.; Woolerton, Y.; Vissers, J. P.; Langridge, J. I.; Beynon, R. J., Asymmetric proteome equalization of the skeletal muscle proteome using a combinatorial hexapeptide library. PLoS One 2011, 6, (12), e28902.
41. Kalipatnapu, S.; Chattopadhyay, A., Membrane protein solubilization: recent advances and challenges in solubilization of serotonin1A receptors. IUBMB Life 2005, 57, (7), 505-12.
42. Qian, W. J.; Kaleta, D. T.; Petritis, B. O.; Jiang, H.; Liu, T.; Zhang, X.; Mottaz, H. M.; Varnum, S. M.; Camp, D. G., 2nd; Huang, L.; Fang, X.; Zhang, W. W.; Smith, R. D., Enhanced detection of low abundance human plasma proteins using a tandem IgY12-SuperMix immunoaffinity separation strategy. Mol Cell Proteomics 2008, 7, (10), 1963-73.
43. Jongeneel, C. V.; Delorenzi, M.; Iseli, C.; Zhou, D.; Haudenschild, C. D.; Khrebtukova, I.; Kuznetsov, D.; Stevenson, B. J.; Strausberg, R. L.; Simpson, A. J.; Vasicek, T. J., An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res 2005, 15, (7), 1007-14.
44. Uhlen, M.; Hallstrom, B. M.; Lindskog, C.; Mardinoglu, A.; Ponten, F.; Nielsen, J., Transcriptomics resources of human tissues and organs. Mol Syst Biol 2016, 12, (4), 862.
45. Fagerberg, L.; Hallstrom, B. M.; Oksvold, P.; Kampf, C.; Djureinovic, D.; Odeberg, J.; Habuka, M.; Tahmasebpoor, S.; Danielsson, A.; Edlund, K.; Asplund, A.; Sjostedt, E.; Lundberg, E.; Szigyarto, C. A.; Skogs, M.; Takanen, J. O.; Berling, H.; Tegel, H.; Mulder, J.; Nilsson, P.; Schwenk, J. M.; Lindskog, C.; Danielsson, F.; Mardinoglu, A.; Sivertsson, A.; von Feilitzen, K.; Forsberg, M.; Zwahlen, M.; Olsson, I.; Navani, S.; Huss, M.; Nielsen, J.; Ponten, F.; Uhlen, M., Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics 2014, 13, (2), 397-406.
46. Uhlen, M.; Fagerberg, L.; Hallstrom, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, A.; Kampf, C.; Sjostedt, E.; Asplund, A.; Olsson, I.; Edlund, K.; Lundberg, E.; Navani, S.; Szigyarto, C. A.; Odeberg, J.; Djureinovic, D.; Takanen, J. O.; Hober, S.; Alm, T.; Edqvist, P. H.; Berling, H.; Tegel, H.; Mulder, J.; Rockberg, J.; Nilsson, P.; Schwenk, J. M.; Hamsten, M.; von Feilitzen, K.; Forsberg, M.; Persson, L.; Johansson, F.; Zwahlen, M.; von Heijne, G.; Nielsen, J.; Ponten, F., Proteomics. Tissue-based map of the human proteome. Science 2015, 347, (6220), 1260419.
47. Zhang, Y.; Li, Q.; Wu, F.; Zhou, R.; Qi, Y.; Su, N.; Chen, L.; Xu, S.; Jiang, T.; Zhang, C.; Cheng, G.; Chen, X.; Kong, D.; Wang, Y.; Zhang, T.; Zi, J.; Wei, W.; Gao, Y.; Zhen, B.; Xiong, Z.; Wu, S.; Yang, P.; Wang, Q.; Wen, B.; He, F.; Xu, P.; Liu, S., Tissue-Based Proteogenomics Reveals that Human Testis Endows Plentiful Missing Proteins. J Proteome Res 2015, 14, (9), 3583-94.
48. Jumeau, F.; Com, E.; Lane, L.; Duek, P.; Lagarrigue, M.; Lavigne, R.; Guillot, L.; Rondel, K.; Gateau, A.; Melaine, N.; Guevel, B.; Sergeant, N.; Mitchell, V.; Pineau, C., Human Spermatozoa as a Model for Detecting Missing Proteins in the Context of the Chromosome-Centric Human Proteome Project. J Proteome Res 2015, 14, (9), 3606-20.
49. Vandenbrouck, Y.; Lane, L.; Carapito, C.; Duek, P.; Rondel, K.; Bruley, C.; Macron, C.; Gonzalez de Peredo, A.; Coute, Y.; Chaoui, K.; Com, E.; Gateau, A.; Hesse, A. M.; Marcellin, M.; Mear, L.; Mouton-Barbosa, E.; Robin, T.; Burlet-Schiltz, O.; Cianferani, S.; Ferro, M.; Freour, T.; Lindskog, C.; Garin, J.; Pineau, C., Looking for Missing Proteins in the Proteome of Human Spermatozoa: An Update. J Proteome Res 2016, 15, (11), 3998-4019.
50. Wei, W.; Luo, W.; Wu, F.; Peng, X.; Zhang, Y.; Zhang, M.; Zhao, Y.; Su, N.; Qi, Y.; Chen, L.; Zhang, Y.; Wen, B.; He, F.; Xu, P., Deep Coverage Proteomics Identifies More Low-Abundance Missing Proteins in Human Testis Tissue with Q-Exactive HF Mass Spectrometer. J Proteome Res 2016, 15, (11), 3988-3997.
51. Schaeffer, M.; Gateau, A.; Teixeira, D.; Michel, P. A.; Zahn-Zabal, M.; Lane, L., The neXtProt peptide uniqueness checker: a tool for the proteomics community. Bioinformatics 2017. PMID: 28520855, DOI: 10.1093/bioinformatics/btx318.
52. Li, D.; Fu, Y.; Sun, R.; Ling, C. X.; Wei, Y.; Zhou, H.; Zeng, R.; Yang, Q.; He, S.; Gao, W., pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 2005, 21, (13), 3049-50.
53. Murer, H., [Membrane transport and renal physiology]. Schweiz Rundsch Med Prax 1982, 71, (43), 1685-91.
54. Hediger, M. A.; Romero, M. F.; Peng, J. B.; Rolfs, A.; Takanaga, H.; Bruford, E. A., The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins. Introduction. Pflugers Arch 2004, 447, (5), 465-8.
55. Peterson, A. C.; Russell, J. D.; Bailey, D. J.; Westphall, M. S.; Coon, J. J., Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol Cell Proteomics 2012, 11, (11), 1475-88.
56. Ronsein, G. E.; Pamir, N.; von Haller, P. D.; Kim, D. S.; Oda, M. N.; Jarvik, G. P.; Vaisar, T.; Heinecke, J. W., Parallel reaction monitoring (PRM) and selected reaction monitoring (SRM) exhibit comparable linearity, dynamic range and precision for targeted quantitative HDL proteomics. J Proteomics 2015, 113, 388-99.
57. Schiffmann, C.; Hansen, R.; Baumann, S.; Kublik, A.; Nielsen, P. H.; Adrian, L.; von Bergen, M.; Jehmlich, N.; Seifert, J., Comparison of targeted peptide quantification assays for reductive dehalogenases by selective reaction monitoring (SRM) and precursor reaction monitoring (PRM). Anal Bioanal Chem 2014, 406, (1), 283-91.

Figure Legends

Figure 1. SDS-PAGE gel images showing the effects of ProteoMiner™ enrichment on mouse liver proteins under four extraction conditions.
Figure 2. SDS-PAGE gel images showing the effects of ProteoMiner™ enrichment on proteins from the four human tissues with TX-100 or NH4HCO3 extraction. A) human liver proteins; B) human colon proteins; C) human kidney proteins; D) human bladder proteins.
Figure 3. Comparison of hydrophobicity and amino acid length for different protein groups. A) analysis of protein size; B) analysis of protein hydrophobicity.
Figure 4. Illustration of the confirmation of MP identification by synthetic peptides in the PRM assay.

Supporting information

Supplementary Figure 1: Scheme illustrating the pathological information of the tissue samples and the experimental flowchart.
Supplementary Figure 2: Labeled spectra with MS identification information for all the identified unique peptides from the 20 MPs.
Supplementary Figure 3: Comparison of RT and fragment abundance patterns between the native peptides in human tissues and the synthetic peptides from the 16 MPs identified in the PRM assay.
Supplementary Figure 4: Evaluation of the reproducibility of the ProteoMiner™ depletion treatment by PRM assay.
Supplementary Table 1: List of target peptides with m/z and sequence obtained from Skyline analysis and used for the PRM assay.
Supplementary Table 2: Identification summary of the proteins equalized by ProteoMiner™ from the four human tissues.
Supplementary Table 3: List of proteins identified from the four human tissues in this study.
Supplementary Table 4: Precursor m/z, mass error and E-value of all identified unique peptides from the 20 MPs.

Table 1. The list of missing proteins identified from four human tissues

Protein ID | MP Level | Cellular Localization | Iden_Duplicates* | Non-nested Uni_Pep_Sequence | Confirmed by PRM# | Tissue Source | Chro_Location | Isoform | Length | Coverage (%)
sp|F8WCM5|INSR2_HUMAN | PE2 | Secreted | R1 | ALGTSDSPVLFIHCPGAAGTAQGLEYR;APPALVVTANIGQAGGSSSR | N | Liver | 11p15.5 | NA | 200 | 23.5
sp|A6NGH7|CC160_HUMAN | PE3 | NA | R1 | EIFTKPLNFQETETDASKSDYELQALR;YIFQLNEIEQEQNLR | Y | Kidney | Xq26.2 | NA | 325 | 12.92
sp|A6NGS2|ERIC4_HUMAN | PE4 | NA | R1&2 | EELVTILEEEEESSKEEEEDQEPQR;EVSPVEIPGQTLR;QLNQAGLVPPGLGPPPQALR | N | Kidney | 19q13.2 | NA | 130 | 44.62
sp|P57058|HUNK_HUMAN | PE2 | cytoplasm nucleus | R1 | FPMMGIGQMLRK;ISLEDLSPSVVLHMTEK;MVDKEMNPLPTQLSTGAISFLR | N | Kidney | 21q22.11 | NA | 714 | 8.26
sp|Q14916|NPT1_HUMAN | PE2 | Membrane; | R1&2 | EYITSSLVQQVSSSR;NILSVIAVR | Y | Kidney | 6p22.2 | Q14916-1 | 467/413 | 6.64
sp|Q6ZP65|BICL1_HUMAN | PE2 | centrosome cytoplasm | R1&2 | LNLSQQLEAWQDDMHR;LSATLEENDLLQGTVEELQDR;MDMMSLNSQLLDAIQQK;SLQSSAATSTSLLSEIEQSMEAEELEQEREQLR | Y | Kidney | 12q24.23 | Q6ZP65-1 | 573/270 | 15.18
sp|Q9NVL8|CN105_HUMAN | PE2 | NA | R1 | ALEGQLPPLQENWYGR;DMYFDIPLEHR;METWLHEQEAQGQLLWDSSSSDSDEQGK | Y | Kidney | 14q22.3 | NA | 296 | 21.28
sp|Q96JF6|ZN594_HUMAN | PE2 | Nucleus | R2 | DSNQSSNLIIHQR;TFNQSSDLLR | Y | Kidney | 17p13.2 | NA | 807 | 2.85
sp|Q86T96|RN180_HUMAN | PE2 | Membrane; | R1&2 | AFHLFGGFR;LTLLPTLYEIHSK | N | Kidney | 5q12.3 | Q86T96-1 | 592/416/96 | 3.72
sp|Q8IXH8|CAD26_HUMAN | PE2 | membrane | R1&2 | EEGARPGTLLGTFNAMDPDSQIR;GNYLVPLFIGDK;IVDTSLIFNIR;LLVQDRDSPFTSAWR;NWGQSVELLTLR;RWVITTLELEEEDPGPFPK;YELVHDPANWVSVDK | Y | Bladder | 20q13.33 | Q8IXH8-1/4 | 852/165/124/832 | 12.56
sp|Q8IZF3|AGRF4_HUMAN | PE2 | Membrane | R1&2 | ACQMMLDIR;APETIESVAQGIR;SYSEVANHILDTAAISNWAFIPNK | Y | Bladder | 6p12.3 | NA | 695 | 6.62
sp|Q92819|HYAS2_HUMAN | PE2 | Membrane | R1&2 | CLTETPIEYLR;ESSQHVTQLVLSNK | Y | Bladder | 8q24.13 | NA | 552 | 7.79
sp|Q96MI6|PPM1M_HUMAN | PE2 | Nucleus | R1&2 | IQQLAFVYPELLAGEFTR;LLGTLAVSR;LYMANAGDSR;RDEIRPLSFEFTPETER | Y | Bladder | 3p21.2 | Q96MI6-1/4 | 270/123/169/247 | 20
sp|Q9BPW4|APOL4_HUMAN | PE2 | Secreted | R1&2 | FTEEVIEYFQK;ILLTSDEAWKR;LTATSTDQLEALR;NLTPYVAIEDKDMQQK;QWAQELEENLNELTHIHQSLK | Y | Bladder | 22q12.3 | Q9BPW4-1/2/3 | 351/347/348 | 22.22
sp|A6NES4|MRO2A_HUMAN | PE4 | NA | R1&2 | FLLETMAYVK;MTVFQTTMCSILTR;NSLQELQLDPDPGVRR | Y | Bladder | 2q37.1 | NA | 1674 | 2.75
sp|Q9NUC0|SRTD4_HUMAN | PE2 | nucleus | R2 | AHILYMSLEK;FIDDPEVYLR | N | Bladder | 1q32.2 | NA | 356 | 7.87
sp|A6NK89|RASFA_HUMAN | PE2 | Cytoplasm | R1&2 | EEPLEPDGGPDGELLLEQER;GAPARPSLAMTQEK;LNTDLEAVKSDLDYSQQQWDSK;LWAAWGEEQENVR;METLVHLVLSQDHTIR | Y | Bladder | 11p15.3 | NA | 507 | 16.77
sp|Q0D2K2|KLH30_HUMAN | PE2 | Cytosol | R2 | AALSQGHDGAPLALQQK;DVEPAVVGQLVDFVYTGR | Y | Bladder | 2q37.3 | NA | 578 | 7.09
sp|Q5XKL5|BTBD8_HUMAN | PE2 | nucleus | R2 | ATVSEQLSQDLLR;NIKNYEEEILR | Y | Bladder | 1p22.1 | Q5XKL5-1/2 | 378/305 | 6.35
sp|Q5UAW9|GP157_HUMAN | PE2 | membrane | R2 | AHTALSEYRPILSQEHR;APAPSKPGESQESQGTPGELPST | Y | Bladder | 1p36.22 | NA | 335 | 11.94

Notes: 1, *: R1 or R2: the MPs were identified with 2 non-nested unique peptides (≥9 a.a.) in only one of the two LC-MS injections; R1&2: the MPs were identified with 2 non-nested unique peptides (≥9 a.a.) in both LC-MS injections; 2, #: whether the MPs were confirmed by synthetic peptides in the PRM assay.
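
The Coverage (%) column of Table 1 can be related to the listed peptides with a short calculation. The table does not state how coverage was computed, so the Python sketch below rests on an assumption: coverage is the number of residues covered by the listed, non-overlapping peptides divided by the protein length. The function name is hypothetical. For ERIC4 and ZN594 this reproduces the tabulated values; for some other rows the listed peptides alone give a lower value than the one tabulated, which this simple sum does not explain.

```python
def coverage_percent(peptides, protein_length):
    """Percent of the protein covered, assuming the peptides do not overlap."""
    covered = sum(len(p) for p in peptides)
    return 100.0 * covered / protein_length


# Peptides and lengths as listed in Table 1.
eric4 = ["EELVTILEEEEESSKEEEEDQEPQR", "EVSPVEIPGQTLR", "QLNQAGLVPPGLGPPPQALR"]
zn594 = ["DSNQSSNLIIHQR", "TFNQSSDLLR"]

print(round(coverage_percent(eric4, 130), 2))  # 44.62
print(round(coverage_percent(zn594, 807), 2))  # 2.85
```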

Figure 1

Figure 2

Figure 3

Figure 4

For TOC only
