Identification of Missing Proteins Defined by Chromosome-Centric


Article

Identification of missing proteins defined by chromosome-centric proteome project in the cytoplasmic detergent-insoluble proteins Yang Chen, Yaxing Li, Jiayong Zhong, Jing Zhang, Zhipeng Chen, Lijuan Yang, Xin Cao, Qing-Yu He, Gong Zhang, and Tong Wang J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/pr501103r • Publication Date (Web): 25 Jun 2015 Downloaded from http://pubs.acs.org on July 1, 2015


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Identification of missing proteins defined by chromosome-centric proteome project in the cytoplasmic detergent-insoluble proteins Yang Chen†, Yaxing Li†, Jiayong Zhong†, Jing Zhang, Zhipeng Chen, Lijuan Yang, Xin Cao, Qing-Yu He*, Gong Zhang* and Tong Wang*

Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China.


ABSTRACT

Finding protein evidence (PE) for protein coding genes is a primary task of the Phase I Chromosome-centric Human Proteome Project (C-HPP). Currently, there are 2948 PE level 2-4 coding genes per neXtProt, which are deemed missing proteins in the human proteome. As most samples prepared and analyzed in the C-HPP framework focused on detergent-soluble proteins, we posit that, as a natural fraction, the cytoplasmic detergent-insoluble proteins (DIPs) represent a source for finding missing proteins. We optimized a workflow and separated cytoplasmic DIPs from 3 human lung and 3 human hepatoma cell lines via differential speed centrifugation. We verified that the detergent-soluble proteins (DSPs) could be sufficiently depleted and that the cytoplasmic DIP isolation was partially reproducible, with Spearman r>0.70 according to 2 independent SILAC MS experiments. Through label-free MS, we identified 4524 and 4156 DIPs from lung and liver cells, respectively. Among them, a total of 23 missing proteins (22 PE2 and 1 PE4) were identified by MS, and 18 of them had translation evidence; in addition, 6 PE5 proteins were identified by MS, 3 with translation evidence. We showed that cytoplasmic DIPs were not an enrichment of transmembrane proteins, and were chromosome-, cell type-, and tissue-specific. Furthermore, we demonstrated that DIPs were distinct from DSPs in terms of structural and physical-chemical features. In conclusion, we have found 23 missing proteins and 6 PE5 proteins in the cytoplasmic insoluble proteome, which is biologically and physical-chemically different from the soluble proteome, suggesting that cytoplasmic DIPs carry comprehensive and valuable information for finding PE of missing proteins. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001694.

KEYWORDS: Chromosome-Centric Human Proteome Project, detergent-insoluble proteins, missing proteins, neXtProt


INTRODUCTION

As accomplished by the Human Genome Project (HGP) in 2004, ~20,000 protein coding genes had been identified in the human genome, providing the first systems knowledge of the starting point of the flow of human genetic information. 1, 2 In agreement with current opinions, 3-5 HGP annotated numerous genes whose coding properties are uncertain, such as hypothetical genes and pseudogenes. At the endpoint of this flow, a functional proteome carries the information of both post-transcriptional control and direct evidence for protein coding genes. 6-8 Accordingly, the Chromosome-centric Human Proteome Project (C-HPP)/Human Proteome Project (HPP) has been implemented for nearly 3 years, with one of the primary goals being to identify at least one protein product for each coding gene, incorporating evidence from transcriptome and translating mRNA analysis. 3-5, 8-11 As a widely accepted concept in C-HPP, missing proteins are defined as the translational products of known protein coding genes that lack adequate protein evidence (PE) per mass spectrometry (MS), antibody and transcript/translating mRNA analyses, as well as bioinformatics analyses via resources such as neXtProt, Global Proteome Machine Database (GPMDB), PeptideAtlas and Human Protein Atlas (HPA). 4, 5, 12-22 Indeed, there are currently 5 PE categories (PE1-5) as proposed by HPP/C-HPP; PE1 has been defined as sufficient evidence of protein expression, while PE2-4 are deemed missing proteins as more experimental validations are needed. 5 In addition, PE5 genes are considered doubtful, but potentially coding, genes. 5 Until early 2014, the number of missing proteins had been approximately 3,800 out of the new denominator of 19,490 genes. 5, 23 The human proteome map from the Human Proteome Project has been expanding rapidly every year since 2011; PeptideAtlas, GPMDB, and neXtProt promptly incorporated the datasets from refs 24, 25, made publicly available through ProteomeXchange and PRIDE, into the 09-19-2014 update of neXtProt. As a result, a total of 16491 PE1 proteins had been confirmed in that neXtProt release, which brought the number of missing proteins down to 2948.

Assuming that the protein products exist, Paik et al. and Lane et al. proposed possible reasons why these missing proteins were not identified by MS. 4, 5 In general, these proteins are either of extremely low abundance or not suitable for MS determination because of their poor digestibility, special physical and chemical properties, or high sequence homology within protein families, which makes individual members indistinguishable. To overcome these hindrances, subcellular fractionation has been widely applied. However, we previously showed that once a plateau phase had been reached, the number of protein identifications increased only marginally as the number of mass spectra increased. 26 This suggested that searching for missing proteins in a non-traditional compartment of cell lysates could be useful for C-HPP.

It is commonly known that the post-translational fate of proteins diverges largely according to correct or incorrect folding, and that misfolded and unfolded proteins generally form aggregates. 27 Protein aggregates always exist in the cytoplasm of steady-state cells in an insoluble state. These aggregated proteins are readily and regularly precipitated and thus usually discarded during the protein extraction process for MS analyses. In addition, Zhang et al. found that highly hydrophobic proteins were frequently missed by MS. 20 We hypothesize that, as a natural fraction, cytoplasmic detergent-insoluble proteins (DIPs) represent a source of “missing proteins” in the human proteome.

In this study, we optimized a procedure to effectively isolate the cytoplasmic DIPs from 3 human lung and 3 human hepatoma cell lines. A total of 23 missing proteins (PE2-4) and 6 PE5 proteins were identified by MS from these DIPs, and their physical-chemical features and cell-specific chromosomal distribution were further analyzed. These data suggest that the cytoplasmic detergent-insoluble cellular fraction should be considered in C-HPP investigations.

MATERIALS AND METHODS

Cell culture

Human lung cancer A549 and H1299 cells and normal human bronchial epithelial (HBE) cells were acquired from the American Type Culture Collection (ATCC, Rockville, MD), and human hepatoma Hep3B, MHCC97H and HCCLM3 cells were acquired from Professor Yikun Liu, Fudan University. Cells used for this study were from the same passage number. As we previously described, cells were cultured in complete DMEM medium (Life Technologies, Carlsbad, CA, USA) supplemented with 10% FBS (Life Technologies), 1% penicillin/streptomycin (pen/strep) and 10 µg/mL ciprofloxacin. 9 Mycoplasma infection was routinely examined by either PCR or ribosome nascent chain complex (RNC) rRNA analysis.

SILAC labeling

We employed the SILAC labeling strategy as previously described, 9 with minor modifications. In brief, cells were cultured for at least 6 passages in SILAC DMEM media (Thermo Fisher Scientific, Shanghai, China), supplemented with 10% dialyzed FBS (Life Technologies), 1% pen/strep and various forms of essential amino acids (Cambridge Isotope Laboratories, Andover, MA, USA). Based on the composition of these amino acids, media were termed as follows: light media [100 mg/L L-lysine (Lys0) and L-arginine (Arg0)], middle media [100 mg/L 4,4,5,5-D4-L-lysine (Lys4) and ¹³C₆¹⁴N₄-L-arginine (Arg6)] and heavy media [73 mg/L ¹³C₆¹⁵N₂-L-lysine (Lys8) and 42 mg/L ¹³C₆¹⁵N₄-L-arginine (Arg10)].

Cytoplasmic DIP Extraction

The general workflow for the DIP extraction is shown in Fig. 1A. Cells were seeded at ~2×10⁶ cells per T75 flask and cultured for 48 h to reach ~90% confluence, with visual examination of epithelial morphology via light microscopy. Next, ~8.0×10⁷ cells were harvested from 10 flasks and washed twice with pre-chilled PBS at 4℃. Cells were pelleted by centrifugation (300×g, 4℃, 10 min) and resuspended in a mild Triton X-100-based RIPA lysis buffer [1% Triton X-100, 20 mM Tris and 150 mM NaCl (pH 7.5)] (Beyotime, Shanghai, China), supplemented with 2% (v:v) protease inhibitor (PI) cocktail (Roche, Shanghai, China), at 3.5×10⁴ cells/µL. After incubation for 30 min on ice, cell lysates were subjected to debris removal by centrifugation (2000×g, 4℃, 15 min), and the supernatant, which contained the total protein (TP) fraction, was collected. TP was mixed with cold Buffer A at 4℃ [10 mM tripotassium phosphate, 1 mM EDTA, 6 µM MG-132 (Selleckchem, Shanghai, China), 2% (v:v) PI cocktail (Roche) and 1 mM PMSF, pH 6.5] at a volume ratio of 3:1, followed by centrifugation at 15000×g at 4℃ for 30 min to obtain the detergent-soluble protein (DSP) fraction from the supernatant. The pellet, which contained the cytoplasmic DIPs, was resuspended in 500 µL of Buffer A under sonication (40 W, 20 s on ice), followed by another 30 min centrifugation (15000×g, 4℃). The pellet was resuspended again in 500 µL of Buffer A under sonication, followed by the addition of 2% NP-40 and centrifugation (15000×g, 4℃, 30 min). After a repeated NP-40 and Buffer A wash, the cytoplasmic DIPs, specifically the Triton X-100- and NP-40-insoluble proteins in the cytoplasm, were obtained in pellets. As needed for further analysis, cytoplasmic DIPs were reconstituted in a rehydration buffer [7 M urea, 2 M thiourea, 20 mM DTT, 2% CHAPS and 1% SDS]. Protein concentrations were determined with a BCA protein assay kit (Thermo). The ImageJ software version 1.43 was used to quantify the grey scale optical density (OD) of Coomassie blue staining. 28
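The resuspension step above implies a simple volume calculation. The following is a minimal sketch of that arithmetic, using only the cell number and target density stated in the text; the function name is hypothetical, and treating the 2% (v:v) PI cocktail as a fraction of the buffer volume is an assumption made for illustration.

```python
def lysis_buffer_volume_ul(n_cells: float, cells_per_ul: float) -> float:
    """Volume of lysis buffer (in µL) needed to resuspend n_cells at a target density."""
    if cells_per_ul <= 0:
        raise ValueError("cells_per_ul must be positive")
    return n_cells / cells_per_ul


harvested_cells = 8.0e7  # ~8.0×10^7 cells pooled from 10 T75 flasks (from the text)
target_density = 3.5e4   # 3.5×10^4 cells/µL (from the text)

volume_ul = lysis_buffer_volume_ul(harvested_cells, target_density)
# Assumption: 2% (v:v) PI cocktail is taken as 2% of the buffer volume.
pi_cocktail_ul = 0.02 * volume_ul

print(f"Lysis buffer: {volume_ul:.0f} µL (~{volume_ul / 1000:.1f} mL)")
print(f"PI cocktail at 2% (v:v): {pi_cocktail_ul:.0f} µL")
```

Under these assumptions, one harvest of 10 flasks needs roughly 2.3 mL of lysis buffer.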

Western blotting

Western blotting (WB) was performed as we previously described with minor modifications. 29 The primary antibodies were as follows: mouse anti-human vimentin mAb (Cell Signaling Technology, Shanghai, China) (1:1000), mouse anti-human Hsp70 mAb (Biolegend, San Diego, CA, USA) (1:2000), rabbit anti-ribosomal protein L26 pAb (Abcam, Shanghai, China) (1:2000), as well as rabbit anti-human GAPDH pAb (1:4000), rabbit anti-human β-tubulin pAb (1:2000) and mouse anti-human β-actin mAb (1:4000), which were purchased from Epitomics (Hangzhou, Zhejiang, China). The HRP-conjugated goat anti-mouse/rabbit secondary antibodies (1:2000) were purchased from ProteinTech (Wuhan, Hubei, China).

Protein digestion and mass spectrometry analysis

For in-gel digestion, cytoplasmic DIPs were separated by 1-D SDS-PAGE, followed by trypsin digestion as we described previously. 9, 30 For in-solution digestion, we employed the Lys-C-based FASP method and StageTip fractionation exactly as described in ref. 31. In both digestion procedures, PIs could be sufficiently removed prior to trypsin or Lys-C addition. In SILAC MS, peptides were analyzed by an LTQ-Orbitrap mass spectrometer (LTQ MS; Thermo). 9, 30 MS parameters: capillary temperature, 200℃; spray voltage, 1.80 kV; scan range, 400−1800 m/z; resolution, 60K FWHM (full width at half maximum) at 400 m/z; MS/MS CID scans, 10 most intense precursor ions; dynamic exclusion, applied; repeat, 1; and duration, 90 s.

We used a Triple TOF 5600 MS (5600 MS; AB SCIEX, Framingham, MA, USA) to analyze the label-free peptides as we described previously with minor modifications. 13, 20, 32, 33 MS parameters: spray voltage, 2.3 kV; interface heater temperature, 120℃; scan range, 350-1500 m/z; mass tolerance, 50 mDa; resolution, >30k FWHM; information dependent acquisition (IDA) MS/MS scans, applied; maximum number of candidate ions per cycle, 40; charge state, 2-4 and >200 cps; dynamic exclusion, applied; co-occurrence, 1; and duration, 20 s.

Peak lists were extracted from LTQ MS RAW files by the Quant.exe tool included in the MaxQuant package, followed by database searches with MaxQuant software version 1.1.1.2. 34 The 5600 MS spectra were extracted by AB SCIEX MS Data Converter (version 1.3) and searched with a local Mascot server (version 2.5.1; Matrix Science, London, UK). All database searches were against Uniprot-Swiss HUMAN.fasta (2014.11 release, 20193 entries). The search parameters were as follows: fixed modification, carbamidomethyl (C); variable modifications, oxidation (M), Gln->pyro-Glu (N-terminus), and acetyl (N-terminus); fragment ion mass tolerance, 0.05 Da (LTQ MS data) and 0.60 Da (5600 MS data); parent ion tolerance, 20 ppm (LTQ MS data) and 10.0 ppm (5600 MS data). Both peptide and protein false discovery rates (FDR) were set to 1% for MaxQuant. In Mascot searches, we adjusted the decoy peptide match FDR to 1% considering homology and obtained the ion score cut-off for each dataset; the resulting DAT search file was imported into Scaffold (version 4.2.1) to control the protein level FDR to […]

[…] 6 residues, unique to an individual protein, but not to a protein group) detected. As a result, a total of 4524 and 4156 DIPs were identified from the lung and liver cells, respectively (Supplementary Table S2, label-free worksheets). If not considering protein groups, we identified 36 PE2-4 proteins/protein groups in lung (Fig. 3A) and 20 in liver cytoplasmic DIPs (Fig. 3B). By applying the unique-to-single-protein rule, Smith-Waterman filtration and manual spectrum inspection, we identified 23 missing proteins and 6 PE5 proteins in the cytoplasmic DIP fraction across the 6 tested cell lines. The gene name, identified cell line, and protein description of these 29 DIPs are shown in Table 1. Considering the major update between the neXtProt 2013 and 2014 releases, we marked 28 PE2-4 DIPs that had been confirmed as PE1 proteins during this update (Table 1). We further noted that 72 cytoplasmic DIPs were missed by the gastrointestinal system specific database of CCPD2.0 (Fig. 3C).

The above analyses were based on peptide level FDR (6). In our data, only 2 such PE2-4 DIPs (2 unique peptides) could pass these critically stringent criteria (Table 1).
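The idea of deriving an ion score cut-off from a 1% decoy-based peptide match FDR can be illustrated with a generic target-decoy calculation. This is a minimal sketch under stated assumptions, not the Mascot or Scaffold procedure used in the study: the function name and input format are hypothetical, the FDR is estimated simply as decoy hits divided by target hits at or above a score, and Mascot's homology threshold is not modeled.

```python
from typing import List, Optional, Tuple


def score_cutoff_for_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cut-off whose estimated FDR is within max_fdr.

    psms holds (score, is_decoy) pairs. At a given cut-off the FDR is estimated
    as (#decoy hits >= cut-off) / (#target hits >= cut-off).
    Returns None if no cut-off satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every match tied at this score so a cut-off never splits a tie.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical toy data: 100 target matches (scores 1..100) and 3 decoy matches.
toy_psms = [(float(s), False) for s in range(1, 101)]
toy_psms += [(5.0, True), (3.0, True), (2.0, True)]
cutoff = score_cutoff_for_fdr(toy_psms, max_fdr=0.01)
print(f"Score cut-off at 1% FDR: {cutoff}")
```

With the toy data, accepting matches scoring 5 would admit 1 decoy among 96 targets (about 1.04%), so the cut-off settles one score higher.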

In fact, we have published the translating evidence of the lung cells 9 and liver cells, 20, 26, 33 and we have proposed that translating evidence is important for C-HPP to avoid the noise of non-translatable mRNAs in the transcriptome and to quantitatively predict protein abundance. 10 Accordingly, we performed translating mRNA evidence matching in this study, and found that more than 96% of the DIPs had strong evidence of translation (RNC-mRNA reads ≥ 10) 9 in each of the analyzed cell lines (Fig. 3D).
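The matching step above reduces to counting how many identified proteins have RNC-mRNA read counts at or above the stated threshold. The sketch below illustrates this; the function name, gene labels and read counts are hypothetical placeholders, and only the threshold of 10 reads comes from the text.

```python
from typing import Dict

READS_THRESHOLD = 10  # strong evidence of translation: RNC-mRNA reads >= 10 (from the text)


def fraction_with_translation_evidence(
    rnc_reads: Dict[str, int], threshold: int = READS_THRESHOLD
) -> float:
    """Fraction of genes whose RNC-mRNA read count meets the threshold."""
    if not rnc_reads:
        raise ValueError("no genes supplied")
    supported = sum(1 for reads in rnc_reads.values() if reads >= threshold)
    return supported / len(rnc_reads)


# Hypothetical read counts for four made-up gene labels (not data from the study).
example_reads = {"GENE_A": 152, "GENE_B": 10, "GENE_C": 9, "GENE_D": 0}
frac = fraction_with_translation_evidence(example_reads)
print(f"{frac:.0%} of the example genes meet the reads >= {READS_THRESHOLD} criterion")
```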

Furthermore, we plotted a heat-map of the DIP missing proteins according to their RNC-mRNA abundances, as indicated by RPKM, and the cell lines from which they were identified (Fig. 3E). According to our previous report and estimation, we used RPKM = 0.01 as a threshold for positive translating evidence. 26 Eighteen of the 23 DIP missing proteins and 3 of the 6 PE5 proteins had translating evidence, albeit with low RNC-mRNA abundances; in contrast, among these proteins, ZNF584 (PE2), SLC25A43 (PE2) and LINC00116 (PE5) showed active translation with very abundant RNC-mRNA (Fig. 3E). According to neXtProt, the evidence for 22 of the 23 DIP missing proteins (PE2-4) was at the transcript level, with A6NHS7 being at the predicted level. In addition, we identified 6 PE5 proteins. A total of 25 unique peptides (unique to individual proteins, but not to protein groups) were identified for the DIP missing proteins, and none of these unique peptides was observed in PeptideAtlas searches (Table 1). Similarly, for the PE5 proteins, we identified 7 unique peptides, and 4 of them were not included in PeptideAtlas (Table 1). Although these peptides had passed the FDR control (Supplementary Table S1), to verify the proteomics evidence, we manually inspected the MS spectrum quality of each unique peptide with the pLabel software (Supplementary Fig. S3). The criteria of this inspection were as follows: 1) the ion score was higher than the cut-off determined by the peptide match FDR of 1% considering homology (Supplementary Table S1); 2) most major peaks above 200 Da were labeled; 3) the expectation value was less than 0.01; 4) consecutive b and/or y ion series were observed; 5) ion mass errors largely followed a linear distribution with low deviation. For example, one such highly confident MS spectrum, that of the ONECUT2 unique peptide, is shown in Fig. 3E and Supplementary Fig. S3.
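Some of the inspection criteria listed above (1, 3 and 4) are numerical and could in principle be pre-screened before manual review. The following is a minimal sketch, not the procedure used in the study: the class and function names are hypothetical, and requiring a run of at least 3 consecutive fragment ions is an assumption, since the text only states that consecutive b and/or y ion series were observed. Criteria 2 and 5 are left to manual inspection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeptideMatch:
    ion_score: float
    expectation: float
    matched_ion_positions: List[int]  # positions of matched b (or y) fragment ions


def longest_consecutive_run(positions: List[int]) -> int:
    """Length of the longest run of consecutive integers in positions."""
    best = run = 0
    prev = None
    for pos in sorted(set(positions)):
        run = run + 1 if prev is not None and pos == prev + 1 else 1
        best = max(best, run)
        prev = pos
    return best


def passes_numeric_checks(
    match: PeptideMatch, score_cutoff: float, min_run: int = 3
) -> bool:
    """Check criteria 1, 3 and 4; min_run is an assumed value, not from the text."""
    return (
        match.ion_score > score_cutoff                # criterion 1
        and match.expectation < 0.01                  # criterion 3
        and longest_consecutive_run(match.matched_ion_positions) >= min_run  # criterion 4
    )


# Hypothetical example values, not taken from Table 1.
good = PeptideMatch(ion_score=34.0, expectation=0.005, matched_ion_positions=[1, 2, 3, 5])
sparse = PeptideMatch(ion_score=34.0, expectation=0.005, matched_ion_positions=[1, 3, 5])
print(passes_numeric_checks(good, score_cutoff=30.0))
print(passes_numeric_checks(sparse, score_cutoff=30.0))
```

A pre-screen of this kind only narrows the list; the spectrum-level judgments in criteria 2 and 5 still require visual inspection.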

Although the DIP PE2-4 proteins lacked MS evidence, 2 out of 23 of them had supportive antibody evidence according to the HPA database (Table 1). According to GPMDB, 4 out of the 23 PE2-4 proteins had highly confident MS evidence, while 17 of them had low or no MS evidence. There was neither antibody nor GPMDB MS evidence for any of the 6 PE5 proteins (Table 1). Furthermore, we noted that the DIP missing protein ONECUT2 had both HPA antibody and strong (green) GPMDB MS evidence (Table 1).

Cell-specific chromosomal distribution of cytoplasmic DIPs

To understand the chromosomal origin of the cytoplasmic DIPs, we next analyzed their chromosomal distribution. In general, the coding genes of these cytoplasmic DIPs were distributed unevenly along the chromosomes. Specifically, they significantly accumulated in local regions of chromosomes 2 and 12 (P […]

[…] pyro-Glu (N-term Q)) | 1 | trypsin | 32.93 | 7 | kinase 1-binding protein 1

40 | Q5TBK1* | N4BP2L1 | Hep3B | PE2 | green | 2 | […] | NEDD4-binding protein 2-like 1
41 | P83105* | HTRA4 | Hep3B | PE2 | red | 1 | LDLAVIK | 1 | trypsin | 34 | 0 | Serine protease HTRA4
42 | Q9Y5H2* | PCDHGA11 | MHCC97H | PE2 | green | 2 | FALPNAR | 1 | trypsin | 33.97 | 0 | Protocadherin gamma-A11
43 | Q8IZU8* | DSEL | MHCC97H | PE2 | red | 0 | LFKIEGGK | 1 | trypsin | 34.94 | 0 | Dermatan-sulfate epimerase-like protein
44 | Q8WTR4* | GDPD5 | MHCC97H | PE2 | green | 0 | EAVASLR | 1 | trypsin | 37.38 | 0 | Glycerophosphodiester phosphodiesterase domain-containing protein 5

45

P0C5Z0*

H2AFB2

H1299

PE 2

yellow

0

46

Q16670*

ZSCAN26

H1299

PE 2

green

0

47

48

49

P52741*

A2RUB1*

Q5CZC0*

ZNF134

C17orf104

FSIP2

H1299

H1299

H1299

PE 2

PE 2

PE 2

green

yellow

red

1

2

VLELAGNEAQNSGER

2

trypin

103.61

10

Histone H2A-Bbd type 2/3

AFSLTSDLIR

2

trypin

39.17

1

Zinc finger and SCAN

VFSQNAGLLEHLR

1

trypin

83.42

5

domain-containing protein 26

AYSLSSHLNR

1

trypin

38.1

0

Zinc finger protein 134

LIVDELRELAR

1

trypin

40.99

0

SSLLHASISTALDR

1

trypin

56.36

0

HNFQAKPQSGHYDPEEGPK

1

trypin

30.4

0

LANSLIR

1

trypin

39.55

0

KNEMAELDIMGLALK

8

trypin

MLQLLVLK

1

trypin

37.97

0

NLDYLEKPR

1

trypin

33.18

0

LLQFLNPDPLR

1

trypin

45.38

2

LLACQSLLFLR

1

trypin

58.6

0

VVVESFMQLPYR

1

trypin

62.08

1

AFLCGSGLR

1

trypin

33.38

1

LNSSLIQHLR

1

trypin

90.54

0

ESLLENPVR(Acetyl (Protein N-term))

1

trypin

32.61

0

IQ motif and SEC7

NFEKIRNSLLESR

1

trypin

29.17

0

domain-containing protein 3

2

Uncharacterized protein C17orf104

Fibrous sheath-interacting protein 2 70.56

0

Putative ATP-dependent RNA 50

Q587J7*

TDRD12

H1299

PE 2

green

2

helicase TDRD12

51

52

Q8N141*

Q9UPP2*

ZFP82

IQSEC3

H1299

HBE

PE 2

PE 2

green

green

2

Zinc finger protein 82 homolog

0

53 | Q64LD2* | WDR25 | A549 | PE2 | green | 2 | ATIQQTLDILFLR | 1 | trypsin | 71.8 | 0 | WD repeat-containing protein 25
54 | Q9Y6Q3* | ZFP37 | A549 | PE2 | green | 2 | AFGHSSSLTYHMR | 1 | trypsin | 38.12 | 2 | Zinc finger protein 37 homolog
55 | Q9HCL3* | ZFP14 | A549 | PE2 | green | 2 | AFTVLQELTQHQR | 1 | trypsin | 52.68 | 1 | Zinc finger protein 14 homolog
56 | Q9HBI5* | C3orf14 | A549 | PE2 | green | 2 | TSLFAQEIR | 1 | trypsin | 41.63 | 0 | Uncharacterized protein C3orf14
57 | Q8WUB2* | FAM216A | H1299 | PE2 | green | 2 | SSSAEPPAVAGTEGGGGGSAGYSCYQNSK | 4 | trypsin | 72.53 | 0 | Protein FAM216A

a: The cell lines from which the DIP missing proteins were identified.
b: Proteomics evidence in GPMDB. The evidence level follows the order of green > yellow > red > black.
c: Antibody evidence in HPA; "1", "2" and "0" represent supportive, uncertain, and unavailable antibody evidence, respectively.
d: Peptide identification frequencies as searched by PeptideAtlas.
e: DR