Identification of Missing Proteins in the Phosphoproteome of Kidney Cancer


Article

Identification of Missing Proteins in the Phosphoproteome of Kidney Cancer
Xuehui Peng, Feng Xu, Shu Liu, Suzhen Li, Qingbo Huang, Lei Chang, Lei Wang, Xin Ma, Fuchu He, and Ping Xu
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00332 • Publication Date (Web): 31 Aug 2017



Journal of Proteome Research

Identification of Missing Proteins in the Phosphoproteome of Kidney Cancer

Xuehui Peng 1,2,#, Feng Xu 2,#,*, Shu Liu 2,#, Suzhen Li 3,#, Qingbo Huang 4, Lei Chang 2, Lei Wang 4, Xin Ma 4, Fuchu He 2,*, Ping Xu 1,2,3,*

1 Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of the Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan 430072, China

2 State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing 102206, China

3 Graduate School, Anhui Medical University, Hefei 230032, China

4 Chinese PLA General Hospital, Urology Department, Beijing 100853, China

# These authors contributed equally to this work.

* To whom correspondence should be addressed:

Ping Xu
National Center for Protein Sciences Beijing, Beijing Proteome Research Center, 38 Science Park Road, Changping District, Beijing 102206, China. Tel: 86-10-61777113; Fax: 86-10-61777050; E-mail: [email protected]

Feng Xu
National Center for Protein Sciences Beijing, Beijing Proteome Research Center, 38 Science Park Road, Changping District, Beijing 102206, China. Tel: 86-10-61777119; Fax: 86-10-61777050; E-mail: [email protected]

Fuchu He

ACS Paragon Plus Environment


National Center for Protein Sciences Beijing, Beijing Proteome Research Center, 38 Science Park Road, Changping District, Beijing 102206, China. Tel and Fax: 86-10-68171208; E-mail: [email protected]



ABSTRACT

Identifying missing proteins (MPs) has been one of the critical missions of the Chromosome-Centric Human Proteome Project (C-HPP). Since 2012, over thirty research teams from seventeen countries have been searching for adequate and accurate evidence of MPs through various biochemical strategies. MPs mainly fall into the following classes: (1) low molecular weight (LMW) proteins, (2) membrane proteins, (3) proteins that contain various post-translational modifications (PTMs), (4) nucleic acid-associated proteins, (5) low-abundance proteins, and (6) proteins from unexpressed genes. In this study, kidney cancer and adjacent tissues were used for phosphoproteomics research; 8,962 proteins were identified, including 6,415 phosphoproteins, together with 44,728 phosphosites, of which 10,266 were previously unreported. In total, 75 candidate detections were found, including 45 phosphoproteins. GO analysis of these 75 candidate detections revealed that they mainly clustered as membrane proteins and took part in nephron and kidney development. After rigorous screening and manual check, 9 of them were verified with synthesized peptides. Finally, only one missing protein was confirmed. All mass spectrometry data from this study have been deposited in PRIDE with the identifier PXD006482 (username: [email protected] and password: Tv5n7wuj).

KEYWORDS: Chromosome-Centric Human Proteome Project, Missing proteins, phosphoproteomics, kidney cancer, LTQ Orbitrap Velos



INTRODUCTION

The Chromosome-Centric Human Proteome Project (C-HPP) was launched by the Human Proteome Organization (HUPO) with the initial goal of matching the C-HPP database to all human protein-coding genes by identifying and characterizing at least one representative protein product and as many post-translational modification (PTM), single amino acid polymorphism (SAP), and splice variant isoforms as possible in whole-chromosome sets [1]. According to the neXtProt [2] database (version 201702), there are 2579 “missing proteins” (MPs) with evidence only at the transcript level (protein evidence level 2, PE2), or based on homology (PE3) or prediction (PE4), but without high-stringency evidence at the protein level (PE1) validated by Edman sequencing, antibody capture, mass spectrometry, or 3D structure [3]. MPs fail to be detected for several reasons: they may be improperly predicted, they may be expressed only at a certain time and/or in a certain location, or their biophysical properties (low abundance, low molecular weight, hydrophobicity [4], etc.) and chemical properties (pI value, etc.) may not be compatible with usual proteomics experiments [5-6]. Thus, multiple analytical and experimental methods have been developed to solubilize, capture, and characterize special proteins, especially low molecular weight (LMW) proteins, membrane proteins, and PTM proteins [7].

Given that many kidney-enriched genes encode membrane-bound proteins [8], kidney could be an ideal material for identifying missing membrane proteins. As many important signal transduction pathways are initiated and regulated by the phosphorylation of low-abundance membrane proteins in lipid rafts [9-10], it is feasible to enrich those phosphorylated proteins or their corresponding phosphopeptides through phosphoproteomic enrichment methods, identify them by mass spectrometry (MS), and thus contribute to the identification of MPs.

In recent years, phosphoproteomic methods have been developed to reach a depth of over 10,000 phosphosites. The commonly used methods for phosphopeptide enrichment are Fe³⁺/Ti⁴⁺-immobilized metal affinity chromatography (Fe³⁺/Ti⁴⁺-IMAC) [11-12], metal oxide (TiO₂) affinity chromatography (MOAC) [13], and Src homology 2 (SH2) domain affinity purification [14]. To achieve deep coverage of the phosphoproteome, pre-fractionation steps are usually taken to reduce sample complexity, allowing the MS to identify more low-abundance proteins. Fractionation methods include cutting the 1D SDS-PAGE gel into several bands according to the molecular weight and abundance of proteins [15], or separating the peptides by one or more offline/online chromatography steps (reverse-phase liquid chromatography [16], strong anion-exchange chromatography [17], strong cation-exchange chromatography [18], etc.). For quantitative phosphoproteomics, the widely acknowledged methods are stable isotope labeling by amino acids in cell culture (SILAC) [19-20] and isobaric tag for relative and absolute quantitation (iTRAQ) [21-22]. However, label-free phosphoproteomics has also achieved the same quantitative completeness as the labeled approaches mentioned above and is gaining more attention.

Besides membrane proteins, proteins that contain various post-translational modifications (PTMs), such as phosphorylation, are also difficult to identify. Tomonaga et al. identified 11,278 proteins, including 8,305 phosphoproteins and 28,205 phosphorylation sites, from five independent colorectal cancer (CRC) samples, and up to 3,033 of these identified proteins were MPs, which currently lack evidence by mass spectrometry in the neXtProt database. Therefore, enrichment of phosphorylated proteins may allow the detection of more MPs [23].


Given that many kidney-enriched genes encode membrane-bound proteins and that enrichment of phosphorylated proteins may allow the detection of more MPs, phosphoproteome analysis of kidney cancer tissues was performed to identify more MPs. In this study, we identified 8,962 proteins, including 6,415 phosphoproteins, from kidney cancer and adjacent samples derived from 8 patients. In addition, 44,728 phosphosites were identified, and 10,266 of them (23%) are new phosphosites not registered in the PhosphoSitePlus database [24]. In total, 75 candidate detections were found, of which 12 passed our rigorous manual check yet did not meet the HPP guidelines (version 2.1), revealing the difficulty of finding MPs. Finally, only P81133 was verified as an MP, by comparing the MS2 spectra of its peptides with those of synthesized peptides. GO analysis illustrated that most of the candidate detections were membrane proteins, and 60% (45/75) of them were phosphoproteins.



MATERIALS AND METHODS

Samples Used in This Study

All kidney tissue samples were obtained from eight kidney clear cell carcinoma patients at the Chinese PLA General Hospital (Beijing, China) after surgical resection with informed consent. The tumor and adjacent tissues were washed with cold PBS three times and stored at -80 °C until use. All tissues were histologically confirmed before analysis.

Kidney Cancer Phosphoproteomics Protein Extraction and Digestion

Kidney tissue samples were ground in liquid nitrogen and sonicated in lysis buffer (9 M urea, 10 mM Tris–HCl (pH 8.0), 30 mM NaCl, 50 mM IAA, 5 mM Na₄P₂O₇, 100 mM Na₂HPO₄ (pH 8.0), 1 mM NaF, 1 mM Na₃VO₄, 1 mM sodium glycerophosphate, 1% phosphatase inhibitor cocktail 2 (Sigma, St. Louis, USA), 1% phosphatase inhibitor cocktail 3 (Sigma, St. Louis, USA), and 1 tablet of EDTA-free protease inhibitor cocktail (Roche, Basel, Switzerland) for every 10 mL of lysis buffer). The total lysate was then centrifuged at 17,000 g for 15 min at 4 °C to remove debris. The evaluation of protein concentration and the subsequent in-gel digestion were performed as previously described [11]. In brief, 2 mg of kidney proteins were used as starting material for each sample; the proteins were reduced with 5 mM DTT for 30 min at 37 °C and alkylated with 20 mM iodoacetamide for 30 min at room temperature in the dark. Next, proteins were resolved on a 10% SDS-PAGE gel, run for a length of 0.8 cm, and stained with Coomassie Blue G-250. The entire gel lane was sliced into 1 mm³ pieces, followed by destaining and in-gel digestion with 10 ng/µL trypsin (Promega, Madison, WI, USA) at 37 °C overnight.

Off-line High-pH HPLC Separation and Phosphopeptides Enrichment

Peptide mixtures were separated by off-line high-pH HPLC and then enriched by a multistep immobilized metal ion affinity chromatography (IMAC) method as previously reported [11, 25]. Briefly, peptide mixtures were resuspended in 400 µL of buffer A (5% ACN, 10 mM ammonium formate, pH = 10) and loaded onto a Bonna-Agela C18 column (5 µm, 150 Å, 4.6 × 250 mm) on a RIGOL-L3000 HPLC (RIGOL, Beijing, China). Before experiments, the HPLC column was processed with methanol and equilibrated with buffer B (95% ACN, 10 mM ammonium formate, pH = 10) and buffer A sequentially. The fractionation gradient was as follows: 0% B at 0.1 mL/min for 5 min, 2-10% B for 5 min, 10-27% B for 32 min, 27-31% B for 3 min, 31-39% B for 4 min, 39-50% B for 7 min, and 50-80% B for 5 min. The chromatogram was recorded at 214 nm. All the collected fractions were combined into 12 fractions, which were lyophilized immediately.

After the off-line high-pH HPLC separation, the phosphopeptides were enriched by the IMAC method as previously reported. Nickel-nitrilotriacetic acid (Ni-NTA) magnetic agarose bead slurry (IMAC beads, Qiagen, Dusseldorf, Germany) was used with minor modifications [26]. First, 120 µL of Ni-NTA magnetic agarose bead slurry was centrifuged at 1,000 g, the supernatant was discarded, and the beads were washed 3 times with ddH₂O. Then, the beads were treated with 100 mM EDTA (pH = 8.0) and 10 mM FeCl₃, each for 1 h with end-over-end rotation. After removing excess FeCl₃, the beads were washed 3 times with water, equilibration buffer (CH₃OH:ACN:0.01% HAc = 1:1:1), and loading buffer (80% ACN with 0.1% TFA) sequentially. The beads were then resuspended in 240 µL of loading buffer, divided equally among 12 tubes containing the peptide mixtures, and incubated with loading buffer for 30 min. Non-specifically adsorbed peptides were removed by washing with loading buffer 3 times, and the enriched phosphopeptides were eluted twice with basic elution buffer (ACN:25% NH₃·H₂O = 1:1). The eluted phosphopeptides were immediately acidified with 5% FA/50% ACN and lyophilized for LC–MS/MS analysis.

LC-MS/MS Analysis and Database Searching

The fractionated, enriched peptides were separated on a Waters Nano Acquity LC and analyzed on an LTQ Orbitrap Velos as previously reported [7]. Briefly, the peptide mixtures were eluted at a flow rate of 800 nL/min on a Waters Nano Acquity LC (3 µm, C18 reverse-phase fused-silica) through a nonlinear gradient. The elution gradient was as follows: 0-10% B for 8 min, 10-22% B for 30 min, 22-32% B for 17 min, 32-80% B for 3 min, and 80% B for 2 min (Phase A: 0.1% FA and 2% ACN in ddH₂O; Phase B: 0.1% FA in 99.9% ACN). The initial MS spectrum (MS1) was acquired over a range of m/z 300-1600 with a resolution of 30,000 at m/z 400. The automatic gain control (AGC) was set to 1 × 10⁶, and the maximum injection time (MIT) was 150 ms. The subsequent MS spectrum (MS2) was acquired in data-dependent mode, with the 20 most intense ions fragmented in the linear ion trap via collision-induced dissociation (CID). The intensity threshold was 2 × 10³. Ions with charge states from 2+ to 4+ were selected for fragmentation. For each scan, the AGC was set to 1 × 10⁴, the normalized collision energy (NCE) to 35, and the MIT to 25 ms. Dynamic exclusion was set to 35 s to suppress repeated fragmentation of the same peaks.

MS/MS raw files were processed in Proteome Discoverer 2.0 (version 2.1.21, Matrix Science Mascot) against the neXtProt database (version 201702). The database search parameters were as follows: carbamidomethylation of cysteine was specified as a fixed modification; oxidation of methionine, N-acetylation, and phosphorylation (STY) were set as variable modifications. The precursor and fragment ion tolerances were set to 20 ppm and 0.5 Da, respectively. Trypsin was set as the protease, with up to two missed cleavages permitted. Only proteins satisfying the following criteria were considered: (1) peptide length ≥ 7; (2) FDR ≤ 1% at the peptide level; (3) FDR ≤ 1% at the protein level; (4) at least two different (non-nested) peptides for protein identification. Peptides were quantified by peak area in Proteome Discoverer. For protein quantification, only the top 3 unique peptides were used for area calculation. The FDRs at the PSM (peptide spectrum match), peptide, and protein levels were calculated as the number of decoy identifications divided by the number of target identifications.

Bioinformatics Analysis of Identified Proteins

Chromosomal locations of MPs and identified proteins were determined according to the neXtProt database (http://www.nextprot.org/db/), and identified phosphorylation sites were compared against the PhosphoSitePlus database (http://www.phosphosite.org/). The BiNGO plug-in for Cytoscape, based on the Gene Ontology (GO) database, was employed to analyze the 75 candidate detections.
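The FDR definition and the four acceptance criteria given for database searching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Proteome Discoverer implementation: the function names, the input layout, and the reading of "non-nested" as "not a substring of another accepted peptide" are hypothetical.

```python
def target_decoy_fdr(n_decoy: int, n_target: int) -> float:
    """FDR estimated as decoy identifications divided by target identifications."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    return n_decoy / n_target


def passes_protein_criteria(peptides, peptide_fdr, protein_fdr,
                            min_len=7, max_fdr=0.01, min_peptides=2):
    """Check one protein against the four criteria listed in the text.

    peptides    -- peptide sequences assigned to the protein (hypothetical input)
    peptide_fdr -- dataset-level FDR at the peptide level (fraction, not percent)
    protein_fdr -- dataset-level FDR at the protein level (fraction, not percent)
    """
    # Criterion 1: keep only peptides of at least min_len residues.
    long_enough = {p for p in peptides if len(p) >= min_len}
    # Criterion 4 (assumed reading): drop peptides fully contained in another one.
    non_nested = {p for p in long_enough
                  if not any(p != q and p in q for q in long_enough)}
    # Criteria 2 and 3: FDR thresholds at peptide and protein level.
    return (peptide_fdr <= max_fdr
            and protein_fdr <= max_fdr
            and len(non_nested) >= min_peptides)
```

With made-up counts, `target_decoy_fdr(5, 1000)` gives an FDR of 0.5%; a protein supported only by a peptide and a truncated copy of that same peptide would fail the non-nested requirement.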


Verification of MPs with Synthesized Peptides

To exclude false positives, we applied a stringent manual check to the identified candidate detections. The score, base peak intensity, and continuity of b/y ion matches were taken into consideration to evaluate spectral quality. Peptides of relatively high quality (unique peptides ≥ 2, peptide length ≥ 9) were selected and sent to Anhui Guoping Pharmaceutical company (Anhui, China) for synthesis and further analysis. The pFind and pBuild software [27-29] were used to match the spectra of the identified peptides to those of the synthesized peptides. Cosine similarity was calculated over the b¹⁺, b²⁺, y¹⁺, and y²⁺ ions matched to the peptide, using the formula previously described [30]. Only when the cosine score was higher than 0.9 were the corresponding peptides considered highly confident peptides of MPs. Considering that isobaric substitutions could change the mapping of a peptide from an MP to a commonly observed protein, isobaric filtering was performed by evaluating whether I = L, Q [deamidated] = E, or GG = N substitutions occur in the protein database.
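The spectral comparison described above can be illustrated with the standard cosine similarity between two intensity vectors. The exact formula of reference 30 is not reproduced in this text, so the sketch below assumes the textbook definition; the ion labels used as dictionary keys and the function names are hypothetical.

```python
import math


def cosine_similarity(identified: dict, synthetic: dict) -> float:
    """Cosine similarity between two spectra given as {ion_label: intensity}.

    Keys are labels of matched b/y fragment ions (hypothetical naming, e.g.
    "b3+" or "y5++"); an ion absent from one spectrum contributes intensity 0.
    """
    ions = set(identified) | set(synthetic)
    dot = sum(identified.get(i, 0.0) * synthetic.get(i, 0.0) for i in ions)
    norm_a = math.sqrt(sum(v * v for v in identified.values()))
    norm_b = math.sqrt(sum(v * v for v in synthetic.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_confirmed(identified: dict, synthetic: dict, threshold: float = 0.9) -> bool:
    """Apply the cutoff stated in the text: cosine score higher than 0.9."""
    return cosine_similarity(identified, synthetic) > threshold
```

Because the score depends only on the relative intensity pattern, a synthetic-peptide spectrum that is a scaled copy of the tissue spectrum scores close to 1, while spectra sharing no matched ions score 0.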



RESULTS AND DISCUSSION

Deep Phosphoproteome Coverage of Human Kidney Tissue Samples

In this study, deep phosphoproteome coverage of human kidney cancer and adjacent tissues was obtained through off-line high-pH reversed-phase fractionation followed by Fe³⁺-IMAC phosphopeptide enrichment and analysis on a high-scan-speed LTQ Orbitrap Velos mass spectrometer (Figure 1A). In total, we obtained 712,205 MS2 spectra and 702,711 unambiguous PSMs from 8 pairs of kidney tissue samples. To obtain credible proteomics data, strict criteria were applied, with FDR values of 0.42%, 0.85%, and 0.56% at the PSM, peptide, and protein levels, respectively. The numbers of expected true positives and false positives at each level, as well as the database search parameters, are detailed in Table S-1. As a result, 74,395 peptides and 8,962 proteins were identified with high confidence, including 60,402 phosphopeptides and 6,415 phosphoproteins (Tables S-2, S-3, and S-4). The enrichment specificity, defined as the proportion of phosphorylated peptides among all identified peptides, was 81.2%, indicating specific and effective enrichment of phosphopeptides from kidney tissues. In addition, 44,728 phosphosites were identified, of which 10,266 were not previously reported. To our knowledge, this is the largest human kidney phosphoproteome dataset so far.

Accumulation curves of the identified phosphoproteins and phosphopeptides from each individual showed that the numbers of non-redundant phosphopeptides and phosphoproteins increased consistently with the addition of individuals, and saturation was reached starting with the sixth patient (Figure 1B, C). Finally, the total numbers of identified phosphoproteins and phosphopeptides reached 6,415 and 60,402, respectively. These data further indicated that the phosphoproteome coverage of human kidney cancer and adjacent tissues was deep enough to detect candidate detections. The vast majority of the phosphoproteins (6238/6415) were singly phosphorylated, and only 177 were multiply phosphorylated, with 2 to 7 phosphosites (Figure 1D). Next, we compared our proteome and phosphoproteome datasets with the newly released neXtProt MP dataset (version 201702) and found 75 candidate detections in our datasets, among which 45 were phosphorylated (Figure 1E).
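The enrichment specificity and the phosphorylation-multiplicity split quoted above follow directly from the reported counts. A short check, using only numbers stated in the text (the function and variable names are illustrative):

```python
def enrichment_specificity(n_phosphopeptides: int, n_peptides: int) -> float:
    """Percentage of identified peptides that are phosphorylated."""
    return 100.0 * n_phosphopeptides / n_peptides


# Counts reported in the text.
specificity = enrichment_specificity(60_402, 74_395)
singly_phosphorylated = 6_238
multiply_phosphorylated = 177

print(f"Enrichment specificity: {specificity:.1f}%")
print(f"Phosphoproteins: {singly_phosphorylated + multiply_phosphorylated}")
```

The two multiplicity classes sum to the 6,415 phosphoproteins reported, and the ratio of phosphopeptides to all peptides rounds to the stated 81.2%.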

Chromosomal Localization of Newly Identified Phosphorylated Proteins

To determine whether a phosphosite is novel, a comprehensive cross-check against publicly available phosphorylation databases is needed. We downloaded the latest and most extensive phosphorylation dataset from the PhosphoSitePlus database (version 201704, www.phosphosite.org) [23] and cross-checked the phosphopeptides identified in our study against it (Table S-4). The overlap comprised 34,462 registered phosphorylation sites, which accounted for 14.9% of all registered sites in PhosphoSitePlus (34,462/221,153). The remaining 10,266 sites were not registered in PhosphoSitePlus (Figure 2A). This 4.4% expansion of the PhosphoSitePlus dataset strongly suggests the deep coverage of our phosphoproteomics study. The ptmRS node was employed to calculate individual probability values for each putatively modified site. The distribution of true phosphorylation sites at the PSM level across different phosphoRS site probability ranges is shown in Figure 2B. The chromosomal locations of the proteins carrying the identified phosphorylation sites are shown in Figure 2C; they were mainly distributed on chromosomes 1, 2, 11, and 17.
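The overlap figures can be re-derived from the counts given in the text. As an observation rather than a statement from the authors, the quoted 14.9% and 4.4% are reproduced when the denominator is the union of registered and newly identified sites (221,153 + 10,266); dividing by the 221,153 registered sites alone gives slightly larger values. Variable names below are illustrative.

```python
# Counts reported in the text.
identified_sites = 44_728      # phosphosites identified in this study
novel_sites = 10_266           # sites not registered in PhosphoSitePlus
registered_sites = 221_153     # sites registered in PhosphoSitePlus

overlap = identified_sites - novel_sites

# Denominator = registered sites only.
share_of_registered = 100 * overlap / registered_sites
expansion_vs_registered = 100 * novel_sites / registered_sites

# Denominator = registered sites plus the newly identified ones (assumed reading).
union = registered_sites + novel_sites
share_of_union = 100 * overlap / union
expansion_vs_union = 100 * novel_sites / union

print(f"overlap: {overlap}")
print(f"vs registered only: {share_of_registered:.1f}% overlap, "
      f"{expansion_vs_registered:.1f}% new")
print(f"vs union:           {share_of_union:.1f}% overlap, "
      f"{expansion_vs_union:.1f}% new")
```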

Enrichment of Specific Motifs in Phosphoproteomics

Kinases phosphorylate particular substrates through specific recognition of motifs during biological processes and signal transduction. In this study, the Motif-X algorithm was employed to extract specific phosphorylation motifs from the phosphoproteomic data. We performed motif analysis of Ser, Thr, and Tyr phosphorylation sites with Motif-X using 13-mer sequences, and selected the top 4 motifs ranked by score for Ser (Figure 3A) and Thr (Figure 3B) and the top 2 for Tyr (Figure 3C). Notably, we found two interesting motifs, SLTACK and GGSYSQAA, in the Thr (motif-T) and Tyr (motif-Y) sets, respectively, whose targeting kinases or functions have not been reported previously. We also found two proline-directed motifs (SP and TP) (Figure 3D), which were reported as substrates of mitogen-activated protein kinase (MAPK) [31-32].
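Motif analysis of this kind takes fixed-length sequence windows centred on each phosphosite as input. A minimal sketch of building the 13-mer windows (six residues on each side of the site) is given below; the padding character for sites near a protein terminus, the function names, and the example sequences are assumptions for illustration, not details taken from the study.

```python
def phospho_window(sequence: str, site: int, flank: int = 6, pad: str = "_") -> str:
    """Return the (2 * flank + 1)-mer centred on a 1-based phosphosite position.

    Positions beyond the protein termini are filled with `pad`.
    """
    if not 1 <= site <= len(sequence):
        raise ValueError("site lies outside the sequence")
    if sequence[site - 1] not in "STY":
        raise ValueError("residue at site is not S, T or Y")
    padded = pad * flank + sequence + pad * flank
    center = site - 1 + flank          # index of the site in the padded string
    return padded[center - flank: center + flank + 1]


def is_proline_directed(window: str, flank: int = 6) -> bool:
    """True for SP/TP sites, i.e. a proline directly after the central S or T."""
    return window[flank] in "ST" and window[flank + 1] == "P"


# Hypothetical example sequences.
print(phospho_window("AAAAAASPAAAAAA", 7))   # site in the middle of a protein
print(phospho_window("MSPK", 2))             # site close to the N-terminus
```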

Enrichment of the Newly Identified Candidate Detections from Phosphoproteome on Membrane Proteins with High pI

To understand why these candidate detections were newly identified in this study, we examined their properties. Previous studies suggested that low molecular weight (LMW), extreme physical and chemical properties, and high pI are the main properties of MPs [33]. LMW proteins are difficult to detect using the standard MS/MS-based proteomics strategy because they produce fewer proteotypic peptides. Consistently, in this study, relatively LMW entries (< 60 kDa) made up the largest portion (73.3%) of the 75 candidate detections. Conversely, high molecular weight (HMW) proteins (> 100 kDa) made up the largest portion (26.5%) of all phosphorylated proteins, probably due to specific enrichment of phosphorylated peptides derived from some large proteins (Figure 4A). The isoelectric point distributions of the 75 candidate detections, phosphoproteins, and non-phosphoproteins are shown in Figure 4B. The candidate detections were concentrated in the high-pI range (pI > 8) compared with both phosphoproteins and non-phosphoproteins: 57.3% of them had pI > 8, versus only 28.7% of phosphoproteins and 30% of non-phosphoproteins.

Gene Ontology (GO) analysis was performed to examine the biological processes and cellular components of these putative detections. Membrane proteins are notoriously difficult to detect in regular proteomics studies, resulting in more MP candidates among them [30, 33]. As most of the detections were membrane proteins (Figure 4C) in our study of kidney cancer phosphoproteomics, we surmise that these membrane proteins were detected mainly for two reasons: (1) they are phosphorylated and play important roles in signal transduction, and their phosphopeptides were specifically enriched by our Fe³⁺-IMAC strategy; (2) the non-phosphorylated membrane proteins may expose their hydrophilic regions after sample preparation and happened to be non-specifically captured during Fe³⁺-IMAC enrichment. Among these membrane proteins, H2AFB2 and SIM1 are predicted to be involved in DNA binding, while FRAT2 is predicted to be involved in the Wnt signaling pathway [34].

Given the special material used in our study, the tissue specificity of these candidate detections was also investigated. We found that SIM1 is kidney-specific. In addition, RBMY1B, which may participate in mRNA processing, is testis-specific, and 6 other proteins are testis-enriched. Three testis-enriched proteins, TBC1D3D, RGPD2, and FRMD1, are reported in the Human Protein Atlas as expressed in kidney only at the mRNA level. However, 5 other testis-enriched proteins, RBMY1B, H2AFB2, SLC25A52, WBSCR28, and TOMM20L, have not yet been reported in the Human Protein Atlas as expressed in kidney even at the mRNA level. Our data identify these proteins in kidney for the first time, indicating that they may exert specific biological functions that need to be investigated in the future. In addition, biological process clustering revealed that the putative detections mainly participate in the development of the nephron tubule, nephron epithelium, kidney epithelium, nephron, and kidney mesenchyme (Figure 4D). These results are consistent with the kidney tissue used in this study.

Kidney Cancer Phosphoproteomics Contributes to the Identification of MPs

Among the 75 candidate detections, 45 were phosphorylated proteins (Figure 5A). The chromosomal distribution of identified phosphoproteins and candidate detections is shown in Figure 5B. Notably, only 4 candidate detections were found on chromosome 1, on which the largest number of proteins was identified. Conversely, 13 candidate detections were found on chromosome 2; chromosome 2 therefore accounted for the highest share of the 75 candidate detections, even though chromosome 1 had more identified proteins. This may also indicate a potential enrichment of kidney-specific genes on certain chromosomes.

The reliability of the 75 candidate detections identified in this study was further verified by manual check. Fifty detections whose peptides were at least 9 amino acids long were retained. According to the guidelines, MPs are required to have at least two uniquely mapping peptides. Isobaric or near-isobaric substitutions, which may change the MP identification results, should also be taken into account. For example, the unique peptide “AGLSGEIGPR” can be mapped to the candidate detection Q8TDG2. It can also be an isobaric substitution of “AGLSGEFGPR”, which is mapped to a commonly identified protein, Q8TDY3, in our data. With the help of the online tool available at https://search.nextprot.org/view/unicity-checker, 12 entries were obtained and referred to as confident detections, whose spectra had higher base peak intensity and continuous b/y ion matches and which carried no single amino-acid variation (SAAV) (Figure 5C). Furthermore, a synthetic peptide was generated and run on the same instrument to confirm the identification of each PSM corresponding to an MP. Eight entries had one unique peptide with high similarity (cosine score ≥ 0.94) between the synthesized peptide spectra and those from the large-scale identification (Figure 5D, Table S-5), based on comparing the matches for b¹⁺, y¹⁺, b²⁺, and y²⁺ ions and the peak intensity pattern of these matches. Finally, the phosphoprotein P81133 passed the HPP guidelines and was determined to be an MP, with two unique peptides whose cosine scores were not less than 0.96.
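The isobaric filtering step can be sketched as a string normalisation: two different peptides are flagged when they collapse to the same canonical form under the substitutions named in the Methods (I = L, GG = N, and deamidated Q = E). This is a simplified illustration and not the neXtProt uniqueness checker; the function names and example peptides are hypothetical.

```python
def isobaric_key(peptide: str, allow_deamidation: bool = False) -> str:
    """Collapse isobaric residues and residue pairs into one canonical form.

    Simplification: "GG" is replaced left to right, so runs of three or more
    glycines are not expanded into every possible alternative.
    """
    key = peptide.upper().replace("L", "I")   # I = L
    key = key.replace("GG", "N")              # GG = N
    if allow_deamidation:
        key = key.replace("Q", "E")           # deamidated Q = E
    return key


def may_be_isobaric(a: str, b: str, allow_deamidation: bool = False) -> bool:
    """True when two distinct peptides cannot be told apart by mass alone."""
    return (a != b
            and isobaric_key(a, allow_deamidation) == isobaric_key(b, allow_deamidation))


# Hypothetical peptide pairs.
print(may_be_isobaric("PEPTLDEK", "PEPTIDEK"))                  # I/L swap
print(may_be_isobaric("AGGK", "ANK"))                           # GG vs N
print(may_be_isobaric("AQK", "AEK", allow_deamidation=True))    # deamidated Q vs E
```

A candidate peptide that is flagged against any peptide of a commonly observed protein would then need additional evidence before being counted as unique to an MP.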




CONCLUSION

In this study, deep phosphoproteome coverage of human kidney cancer and adjacent tissues was obtained through a high-coverage phosphoproteomics study. A total of 8,962 proteins were identified with high confidence, including 6,415 phosphoproteins. To our knowledge, this is the largest human kidney proteomics dataset so far. After comparing our proteome and phosphoproteome datasets with the newly released neXtProt MP dataset (version 201702), 75 candidates overlapped and were considered putative detections, of which 50 and 13 proteins had peptide length ≥ 9 AA and uniquely mapping peptides ≥ 2, respectively. After strict filtering by manual check of spectral quality according to the HPP guidelines [35], 12 proteins were selected as confident detections. After further verification with synthesized peptides, the phosphoprotein P81133 remained as an MP. Our data demonstrate that membrane proteins are enriched among the newly identified candidate detections from the kidney samples, suggesting that kidney is an additional ideal material for identifying MPs besides testis.



ASSOCIATED CONTENT

Supporting Information

Supplementary Figure S-1. The verification of candidate detections by matching their corresponding synthesized peptide and previously identified MS spectra.
Supplementary Table S-1. The searching criteria and results summary of kidney cancer phosphoproteomics.
Supplementary Table S-2. The total proteins identified in human kidney cancer.
Supplementary Table S-3. The peptides identified in human kidney cancer.
Supplementary Table S-4. The phosphopeptides identified in human kidney cancer.
Supplementary Table S-5. The identification and verification of the total candidate detections in human kidney cancer.



AUTHOR INFORMATION

Corresponding Authors

*P.X.: Tel/Fax: 86-10-61777113. E-mail: [email protected].

*F.X.: Tel/Fax: 86-10-61777119. E-mail: [email protected].

*F.H.: Tel/Fax: 86-10-68171208. E-mail: [email protected].

Author Contributions

#These authors contributed equally to this work.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We thank Wei Wei for his technical support on bioinformatics analysis. We also appreciate the Xu lab members for their helpful discussion and efforts. This work was funded by the Chinese National Basic Research Programs (2016YFA0501300, 2013CB911201 & 2015CB910700), the International Collaboration Program (2014DFB30020), the National Natural Science Foundation of China (Grant No. 31470809, 31670834, 31400698, 31400697 & 81670590), the National Natural Science Foundation of Beijing (Grant No. 5152008), Beijing Training Project for The Leading Talents in S&T, National Megaprojects for Key Infectious Diseases (2013zx10003002), the Innovation Foundation of Medicine (2017CXJJ19 & BWS14J052), Tianjin Baodi hospital Translational Medicine Research Center (TMRC2014M02B), the Unilever 21st Century Toxicity Program (MA-2014-02409) and the Foundation of State Key Lab of Proteomics (SKLPYB201501).

ABBREVIATIONS

HPP, Human Proteome Project; C-HPP, Chromosome Centric Human Proteome Project; MPs, missing proteins; MW, molecular weight; LMW, low molecular-weight; PSM, peptide spectrum match; MIT, maximum injection time; FDR, false discovery rate.

REFERENCES

(1). Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nature biotechnology 2012, 30 (3), 221-3.
(2). Gaudet, P.; Michel, P. A.; Zahn-Zabal, M.; Britan, A.; Cusin, I.; Domagalski, M.; Duek, P. D.; Gateau, A.; Gleizes, A.; Hinard, V.; Rech de Laval, V.; Lin, J.; Nikitin, F.; Schaeffer, M.; Teixeira, D.; Lane, L.; Bairoch, A., The neXtProt knowledgebase on human proteins: 2017 update. Nucleic acids research 2017, 45 (D1), D177-D182.
(3). Lane, L.; Bairoch, A.; Beavis, R. C.; Deutsch, E. W.; Gaudet, P.; Lundberg, E.; Omenn, G. S., Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. Journal of proteome research 2014, 13 (1), 15-20.
(4). Zhang, C.; Li, N.; Zhai, L.; Xu, S.; Liu, X.; Cui, Y.; Ma, J.; Han, M.; Jiang, J.; Yang, C.; Fan, F.; Li, L.; Qin, P.; Yu, Q.; Chang, C.; Su, N.; Zheng, J.; Zhang, T.; Wen, B.; Zhou, R.; Lin, L.; Lin, Z.; Zhou, B.; Zhang, Y.; Yan, G.; Liu, Y.; Yang, P.; Guo, K.; Gu, W.; Chen, Y.; Zhang, G.; He, Q. Y.; Wu, S.; Wang, T.; Shen, H.; Wang, Q.; Zhu, Y.; He, F.; Xu, P., Systematic analysis of missing proteins provides clues to help define all of the protein-coding genes on human chromosome 1. Journal of proteome research 2014, 13 (1), 114-25.
(5). Horvatovich, P.; Lundberg, E. K.; Chen, Y. J.; Sung, T. Y.; He, F.; Nice, E. C.; Goode, R. J.; Yu, S.; Ranganathan, S.; Baker, M. S.; Domont, G. B.; Velasquez, E.; Li, D.; Liu, S.; Wang, Q.; He, Q. Y.; Menon, R.; Guan, Y.; Corrales, F. J.; Segura, V.; Casal, J. I.; Pascual-Montano, A.; Albar, J. P.; Fuentes, M.; Gonzalez-Gonzalez, M.; Diez, P.; Ibarrola, N.; Degano, R. M.; Mohammed, Y.; Borchers, C. H.; Urbani, A.; Soggiu, A.; Yamamoto, T.; Salekdeh, G. H.; Archakov, A.; Ponomarenko, E.; Lisitsa, A.; Lichti, C. F.; Mostovenko, E.; Kroes, R. A.; Rezeli, M.; Vegvari, A.; Fehniger, T. E.; Bischoff, R.; Vizcaino, J. A.; Deutsch, E. W.; Lane, L.; Nilsson, C. L.; Marko-Varga, G.; Omenn, G. S.; Jeong, S. K.; Lim, J. S.; Paik, Y. K.; Hancock, W. S., Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. Journal of proteome research 2015, 14 (9), 3415-31.
(6). Elguoshy, A.; Magdeldin, S.; Xu, B.; Hirao, Y.; Zhang, Y.; Kinoshita, N.; Takisawa, Y.; Nameta, M.; Yamamoto, K.; El-Refy, A.; El-Fiky, F.; Yamamoto, T., Why are they missing? : Bioinformatics characterization of missing human proteins. Journal of proteomics 2016, 149, 7-14.
(7). Zhao, M.; Wei, W.; Cheng, L.; Zhang, Y.; Wu, F.; He, F.; Xu, P., Searching Missing Proteins Based on the Optimization of Membrane Protein Enrichment and Digestion Process. Journal of proteome research 2016, 15 (11), 4020-4029.
(8). Uhlen, M.; Fagerberg, L.; Hallstrom, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, A.; Kampf, C.; Sjostedt, E.; Asplund, A.; Olsson, I.; Edlund, K.; Lundberg, E.; Navani, S.; Szigyarto, C. A.; Odeberg, J.; Djureinovic, D.; Takanen, J. O.; Hober, S.; Alm, T.; Edqvist, P. H.; Berling, H.; Tegel, H.; Mulder, J.; Rockberg, J.; Nilsson, P.; Schwenk, J. M.; Hamsten, M.; von Feilitzen, K.; Forsberg, M.; Persson, L.; Johansson, F.; Zwahlen, M.; von Heijne, G.; Nielsen, J.; Ponten, F., Proteomics. Tissue-based map of the human proteome. Science 2015, 347 (6220), 1260419.
(9). Adebiyi, A.; Soni, H.; John, T. A.; Yang, F., Lipid rafts are required for signal transduction by angiotensin II receptor type 1 in neonatal glomerular mesangial cells. Experimental cell research 2014, 324 (1), 92-104.
(10). Simons, K.; Toomre, D., Lipid rafts and signal transduction. Nature reviews. Molecular cell biology 2000, 1 (1), 31-9.
(11). Tian, M.; Cheng, H.; Wang, Z.; Su, N.; Liu, Z.; Sun, C.; Zhen, B.; Hong, X.; Xue, Y.; Xu, P., Phosphoproteomic analysis of the highly-metastatic hepatocellular carcinoma cell line, MHCC97-H. International journal of molecular sciences 2015, 16 (2), 4209-25.
(12). Zhou, H.; Ye, M.; Dong, J.; Corradini, E.; Cristobal, A.; Heck, A. J.; Zou, H.; Mohammed, S., Robust phosphoproteome enrichment using monodisperse microsphere-based immobilized titanium (IV) ion affinity chromatography. Nature protocols 2013, 8 (3), 461-80.
(13). Humphrey, S. J.; Azimifar, S. B.; Mann, M., High-throughput phosphoproteomics reveals in vivo insulin signaling dynamics. Nature biotechnology 2015, 33 (9), 990-5.
(14). Bian, Y.; Li, L.; Dong, M.; Liu, X.; Kaneko, T.; Cheng, K.; Liu, H.; Voss, C.; Cao, X.; Wang, Y.; Litchfield, D.; Ye, M.; Li, S. S.; Zou, H., Ultra-deep tyrosine phosphoproteomics enabled by a phosphotyrosine superbinder. Nature chemical biology 2016, 12 (11), 959-966.
(15). Xu, P.; Duong, D. M.; Seyfried, N. T.; Cheng, D.; Xie, Y.; Robert, J.; Rush, J.; Hochstrasser, M.; Finley, D.; Peng, J., Quantitative proteomics reveals the function of unconventional ubiquitin chains in proteasomal degradation. Cell 2009, 137 (1), 133-45.
(16). Batth, T. S.; Francavilla, C.; Olsen, J. V., Off-line high-pH reversed-phase fractionation for in-depth phosphoproteomics. Journal of proteome research 2014, 13 (12), 6176-86.
(17). Alpert, A. J.; Hudecz, O.; Mechtler, K., Anion-exchange chromatography of phosphopeptides: weak anion exchange versus strong anion exchange and anion-exchange chromatography versus electrostatic repulsion-hydrophilic interaction chromatography. Analytical chemistry 2015, 87 (9), 4704-11.
(18). Xu, J.; Gao, J.; Yu, C.; He, H.; Yang, Y.; Figeys, D.; Zhou, H., Development of Online pH Gradient-Eluted Strong Cation Exchange Nanoelectrospray-Tandem Mass Spectrometry for Proteomic Analysis Facilitating Basic and Histidine-Containing Peptides Identification. Analytical chemistry 2016, 88 (1), 583-91.
(19). Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M., Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Molecular & cellular proteomics : MCP 2002, 1 (5), 376-86.
(20). Monetti, M.; Nagaraj, N.; Sharma, K.; Mann, M., Large-scale phosphosite quantification in tissues by a spike-in SILAC method. Nature methods 2011, 8 (8), 655-8.
(21). Wiese, S.; Reidegeld, K. A.; Meyer, H. E.; Warscheid, B., Protein labeling by iTRAQ: a new tool for quantitative mass spectrometry in proteome research. Proteomics 2007, 7 (3), 340-50.
(22). Mertins, P.; Udeshi, N. D.; Clauser, K. R.; Mani, D. R.; Patel, J.; Ong, S. E.; Jaffe, J. D.; Carr, S. A., iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics. Molecular & cellular proteomics : MCP 2012, 11 (6), M111 014423.
(23). Shiromizu, T.; Adachi, J.; Watanabe, S.; Murakami, T.; Kuga, T.; Muraoka, S.; Tomonaga, T., Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project. Journal of proteome research 2013, 12 (6), 2414-21.
(24). Hornbeck, P. V.; Kornhauser, J. M.; Tkachev, S.; Zhang, B.; Skrzypek, E.; Murray, B.; Latham, V.; Sullivan, M., PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic acids research 2012, 40 (Database issue), D261-70.
(25). Villen, J.; Gygi, S. P., The SCX/IMAC enrichment approach for global phosphorylation analysis by mass spectrometry. Nature protocols 2008, 3 (10), 1630-8.
(26). Ficarro, S. B.; Adelmant, G.; Tomar, M. N.; Zhang, Y.; Cheng, V. J.; Marto, J. A., Magnetic bead processor for rapid evaluation and optimization of parameters for phosphopeptide enrichment. Analytical chemistry 2009, 81 (11), 4566-75.
(27). Wang, L. H.; Li, D. Q.; Fu, Y.; Wang, H. P.; Zhang, J. F.; Yuan, Z. F.; Sun, R. X.; Zeng, R.; He, S. M.; Gao, W., pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry. Rapid communications in mass spectrometry : RCM 2007, 21 (18), 2985-91.
(28). Li, D.; Fu, Y.; Sun, R.; Ling, C. X.; Wei, Y.; Zhou, H.; Zeng, R.; Yang, Q.; He, S.; Gao, W., pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 2005, 21 (13), 3049-50.
(29). Fu, Y.; Yang, Q.; Sun, R.; Li, D.; Zeng, R.; Ling, C. X.; Gao, W., Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics 2004, 20 (12), 1948-54.
(30). Su, N.; Zhang, C.; Zhang, Y.; Wang, Z.; Fan, F.; Zhao, M.; Wu, F.; Gao, Y.; Li, Y.; Chen, L.; Tian, M.; Zhang, T.; Wen, B.; Sensang, N.; Xiong, Z.; Wu, S.; Liu, S.; Yang, P.; Zhen, B.; Zhu, Y.; He, F.; Xu, P., Special Enrichment Strategies Greatly Increase the Efficiency of Missing Proteins Identification from Regular Proteome Samples. Journal of proteome research 2015, 14 (9), 3680-92.
(31). Olsen, J. V.; Vermeulen, M.; Santamaria, A.; Kumar, C.; Miller, M. L.; Jensen, L. J.; Gnad, F.; Cox, J.; Jensen, T. S.; Nigg, E. A.; Brunak, S.; Mann, M., Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Science signaling 2010, 3 (104), ra3.
(32). Redpath, N. T.; Proud, C. G., Cyclic AMP-dependent protein kinase phosphorylates rabbit reticulocyte elongation factor-2 kinase and induces calcium-independent activity. The Biochemical journal 1993, 293 ( Pt 1), 31-4.
(33). Xu, A.; Li, G.; Yang, D.; Wu, S.; Ouyang, H.; Xu, P.; He, F., Evolutionary Characteristics of Missing Proteins: Insights into the Evolution of Human Chromosomes Related to Missing-Protein-Encoding Genes. Journal of proteome research 2015, 14 (12), 4985-94.
(34). van Amerongen, R.; Nawijn, M. C.; Lambooij, J. P.; Proost, N.; Jonkers, J.; Berns, A., Frat oncoproteins act at the crossroad of canonical and noncanonical Wnt-signaling pathways. Oncogene 2010, 29 (1), 93-104.
(35). Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. Journal of proteome research 2016, 15 (11), 3961-3970.


Graphical Abstract

In this study, human kidney cancer and adjacent tissues were used as materials to find specific MPs, with high-pH off-line fractionation and Fe3+-IMAC enrichment applied to obtain deep coverage of the phosphoproteome. An LTQ Orbitrap Velos was used to detect kidney proteins, followed by identification with the widely used Proteome Discoverer search engine under strict peptide and protein quality filtering. Finally, the largest kidney cancer phosphoproteome dataset to date was produced, with 6,415 phosphoproteins, 60,402 phosphopeptides, and 44,728 phosphosites identified. In addition, 75 candidate detections were identified. This study suggests that the phosphoproteomics strategy applied here is a promising and powerful tool for uncovering more MPs, which may contribute further to the C-HPP goal in the near future.

Figure Legends

Figure 1. Overview of kidney phosphoproteome data. (A) Workflow of kidney phosphoproteomics. (B-C) The accumulation curves of identified phosphopeptides and phosphoproteins from 8 kidney clear cell carcinoma patients. (D) Mono- and multi-phosphosites distribution of phosphoproteins. (E) Candidate detections found in total proteome and phosphoproteome.

Figure 2. New phosphosites were found through kidney phosphoproteomics. (A) Comparison of identified phosphosites in this study with PhosphoSitePlus database. (B) The ptmRS site probability of identified phosphosites. (C) Chromosomal distribution of totally identified and previously unreported phosphosites in this study.

Figure 3. Motif analysis. (A-C) Motif analysis of serine, threonine, and tyrosine. (D) Proline-directed motifs.

Figure 4. Properties of candidate detections. (A) Molecular weight distribution of candidates, phosphoproteins, and non-phosphoproteins. (B) The isoelectric-point distribution of candidates, phosphoproteins, and non-phosphoproteins. (C) Cellular component analysis for candidates. (D) Biological process analysis of candidates.

Figure 5. Distribution and verification of missing proteins. (A) Venn diagram of candidates in neXtProt_2017 dataset and this study. (B) Chromosomal distribution of the identified proteins and candidates. (C) Stepwise effort to credibly identify expression of MPs. (D) Example of verification of the identified peptide for MPs by using the synthesized peptide.

Supplementary Figure and Table Legends

Supplementary Figure S-1. The verification of candidate detections by their synthesized peptide and previously identified MS spectra matching.
Supplementary Table S-1. The searching criteria and results summary of kidney cancer phosphoproteomics.
Supplementary Table S-2. The total proteins identified in human kidney cancer.
Supplementary Table S-3. The peptides identified in human kidney cancer.
Supplementary Table S-4. The phosphopeptides identified in human kidney cancer.
Supplementary Table S-5. The identification and verification of the total candidate detections in human kidney cancer.
