
Article

Next Generation Proteomic Pipeline for Chromosome-based Proteomic Research Using NeXtProt and GENCODE databases Heeyoun Hwang, Gun Wook Park, Ji Yeong Park, Hyun Kyoung Lee, Ju Yeon Lee, Ji Eun Jeong, Sung-Kyu Robin Park, John R. Yates, Kyung-Hoon Kwon, Young Mok Park, Hyoung-Joo Lee, Young-Ki Paik, Jin Young Kim, and Jong Shin Yoo J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00223 • Publication Date (Web): 01 Oct 2017 Downloaded from http://pubs.acs.org on October 5, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Next Generation Proteomic Pipeline for Chromosome-based Proteomic Research Using NeXtProt and GENCODE databases

Heeyoun Hwang¹, Gun Wook Park¹, Ji Yeong Park¹,², Hyun Kyoung Lee¹,², Ju Yeon Lee¹, Ji Eun Jeong¹,², Sung-Kyu Robin Park³, John R. Yates III³, Kyung-Hoon Kwon¹, Young Mok Park⁴, Hyoung-Joo Lee⁵, Young-Ki Paik⁵, Jin Young Kim¹*, and Jong Shin Yoo¹,²*

¹ Biomedical Omics Group, Korea Basic Science Institute, Ochang, Republic of Korea;
² Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea;
³ Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA;
⁴ Center for Cognition and Sociality, Institute for Basic Science, Daejeon, Republic of Korea;
⁵ Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea

* Corresponding authors: Jong Shin Yoo, Ph.D., and Jin Young Kim, Ph.D.

ACS Paragon Plus Environment


Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang-eup, Cheongju Chungbuk, 363-883, Republic of Korea. Phone: +82-43-240-5145, Fax: +82-240-5159, E-mail: [email protected] and [email protected]

Abstract

The Human Proteome Project aims to map all human proteins, including missing proteins, as well as proteoforms with post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are usually used to provide curated information on human coding genes. However, to find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline using customized and concatenated neXtProt and GENCODE (derived from Ensembl) databases with a controlled false discovery rate (FDR). Because of the large databases used in this pipeline, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from the human hippocampus dataset (MSV000081385 and ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. Additionally, by applying the pipeline to human brain-related datasets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs, ACTN4-012 (Chr.19), DPYSL2-005 (Chr.8), MPRIP-003 (Chr.17), NCAM1-013 (Chr.11), EPB41L1-017 (Chr.20), AGAP1-004 (Chr.2),


and CPNE5-005 (Chr.6), were identified from two or more datasets. The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translation starts in the 5’-untranslated region, or novel protein-coding sequences. Applying the pipeline to datasets related to the male reproductive organs, we identified 52 GENCODE ASVs from two testis datasets (PXD000561 and PXD002179) and one spermatozoa dataset (PXD003947). Four of these 52 GENCODE ASVs, RAB11FIP5-008 (Chr. 2), RP13-347D8.7-001 (Chr. X), PRDX4-002 (Chr. X), and RP11-666A8.13-001 (Chr. 17), were identified in all three samples.

Key words: alternative splicing variant, single amino acid variant, C-HPP, chromosome-centric human proteome project, proteogenomics

Introduction

As part of the Human Proteome Project (HPP), which aims to map all human proteins, including missing proteins, as well as post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs) of human proteins, the Chromosome-centric Human Proteome Project (C-HPP) aims to expand our understanding of the human proteome with a focus on each and every gene on each chromosome.1 Draft maps of the human proteome reported that ~18,000 proteins, including


missing proteins and novel proteoforms, were identified from many types of human tissues and cells using liquid chromatography tandem mass spectrometry (LC-MS/MS), but without a false discovery rate (FDR) applied at the protein level, and with 4,000 false positives identified.2-4 These and other MS datasets stored in data-sharing repositories (e.g., ProteomeXchange, PRIDE, or MassIVE) have been re-analyzed by PeptideAtlas using the Trans-Proteomic Pipeline.5,6 In a recent report, only 15,290 and 1,088 human proteins were validated under the HPP guidelines using MS data and non-MS data, respectively, while 2,579 proteins remained missing, with no or insufficient MS evidence.4,6 To find missing proteins from MS/MS data, Deutsch et al. proposed sets of tiered databases concatenated from multiple sources, such as Ensembl, RefSeq, UniProtKB, and neXtProt.7 The GENCODE database, the human genome annotation produced within the Encyclopedia of DNA Elements (ENCODE) project, has also been proposed for the identification of protein-coding genes and novel non-coding protein variants.8,9 The GENCODE database (54,775, v25; http://www.gencodegenes.org/) contains about 12,000 more ASVs than the neXtProt database (42,135, v2.4.0; http://www.nextprot.org/), whereas the two contain similar numbers of predicted protein-coding genes (19,950 and 20,159, respectively). In a study that used the GENCODE database, three high-throughput MS datasets (PXD000561, PXD000865, and PRDB000012)


were re-analyzed using multiple search engines; only one GENCODE ASV and 15 non-coding protein variants were identified using multiple databases derived from UniProtKB, GENCODE, pseudogenes, predicted coding genes, and RNA-seq data.10 Recently, Weisser et al. re-analyzed C-HPP testis data (PXD002179) and identified seven GENCODE ASVs, each with two or more unique peptides of seven or more amino acids, using a very large database (1.1 GB) customized from the GENCODE database of pseudogenes, long non-coding RNAs, and protein-coding transcripts.11 From human hippocampal tissue, we previously identified four GENCODE ASVs mapped to the 5’-untranslated regions (5’-UTRs) of their corresponding genes.12 That study did not use concatenated databases; instead, it used individually customized databases to identify non-coding variants, GENCODE ASVs, and SAAVs, with a three-frame translated transcript database from GENCODE used to find GENCODE ASVs. According to the HPP Data Interpretation Guidelines version 2.1, when missing proteins or GENCODE ASVs are identified from MS data, alternative peptide-to-protein mappings, such as isobaric sequences or SAAVs, must be considered.6 To find GENCODE ASVs and SAAVs, multiple concatenated databases from RefSeq, UniProtKB, and GENCODE with a controlled FDR have


been proposed.13

Here we propose a proteomic pipeline for the confident identification of GENCODE ASVs and SAAVs with a controlled FDR, using a database customized and concatenated from GENCODE and neXtProt. We also re-analyzed 655 raw files from human hippocampus, cerebral cortex, fetal brain, spinal cord, testis, and spermatozoa to find missing proteins, GENCODE ASVs, and SAAVs.
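The HPP guideline requirement described above, that a putatively novel peptide be checked for alternative peptide-to-protein mappings such as isobaric sequences or SAAVs, can be illustrated with a minimal Python sketch. It is not the authors' in-house code: the function names and toy reference sequences are hypothetical, only the I/L equivalence is treated as isobaric (other isobaric cases, such as N versus GG, are ignored), and a single-residue mismatch is used as a crude stand-in for an unannotated SAAV.

```python
def normalize_il(seq: str) -> str:
    """Map isoleucine to leucine, since I and L have identical residue masses."""
    return seq.upper().replace("I", "L")


def has_near_match(peptide: str, protein: str, max_mismatches: int = 1) -> bool:
    """True if peptide aligns somewhere in protein with at most max_mismatches substitutions."""
    n = len(peptide)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        if sum(a != b for a, b in zip(peptide, window)) <= max_mismatches:
            return True
    return False


def alternative_mappings(peptide: str, reference: dict[str, str]) -> list[tuple[str, str]]:
    """List reference entries that could explain a putatively novel peptide.

    Returns (accession, kind) pairs, where kind is
      'exact_or_isobaric'   - found verbatim after I/L normalization
      'single_substitution' - found with one residue change (possible SAAV)
    """
    pep = normalize_il(peptide)
    hits = []
    for accession, sequence in reference.items():
        prot = normalize_il(sequence)
        if pep in prot:
            hits.append((accession, "exact_or_isobaric"))
        elif has_near_match(pep, prot, max_mismatches=1):
            hits.append((accession, "single_substitution"))
    return hits


if __name__ == "__main__":
    # Hypothetical toy reference entries, not real neXtProt sequences.
    toy_reference = {"REF1": "MKTAYIAKQRQISFVKSHFSRQ", "REF2": "MGSTPEDLVKNAGSR"}
    print(alternative_mappings("AYLAK", toy_reference))
    print(alternative_mappings("STPEDLVR", toy_reference))
```

An empty result means the supplied reference offers no such alternative explanation; a non-empty result suggests that the "novel" assignment should be reconsidered before it is claimed.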

Experimental Section

Proteomic datasets for re-analysis

The 144 raw data files (MSV000081385) stored in ProteomeXchange (http://www.proteomexchange.org/) and analyzed in our lab were used for re-analysis and confirmation of missing proteins and GENCODE ASVs from human hippocampal tissues (control, epilepsy, and Alzheimer’s disease (AD)); these files are referred to as the Hippocampus_Hwang dataset.12,14,15 A total of 80 LC-MS/MS raw files from the cerebral cortex of AD and non-demented cases (PXD000067) were downloaded and referred to as the Cortex_Bai dataset.16 A total of 24 LC-MS/MS raw files from adult cortex (two sets), spinal cord (one set), fetal brain (two sets), and testis (two sets) (PXD000561) were downloaded and named the Cortex_Kim, S.Cord_Kim, F.Brain_Kim, and Testis_Kim datasets,


respectively.2 A total of 150 Orbitrap Velos LC-MS/MS raw files from testis tissues of three post-mortem individuals were downloaded and referred to as the Testis_Zhang dataset (PXD002179).17,18 However, one file, CHPP_SDS_3003.raw, was corrupt and could not be converted into an MS2 file; we therefore analyzed the remaining 149 raw files. Finally, a total of 109 LC-MS/MS raw files from spermatozoa were downloaded and referred to as the Sperm_Vandenbrouk dataset (PXD003947).19

LC-MS/MS conditions for analyses of synthetic peptides

The detailed experimental conditions were the same as those of Hwang et al.,12 with minor modifications. To confirm the peptide fragmentation patterns of missing proteins or GENCODE ASVs, we purchased commercial synthetic peptides from Anygen (Gwangju, Korea) (Table S1) and analyzed them on an LTQ-Orbitrap mass spectrometer (Elite or Fusion Lumos version, Thermo Fisher Scientific) equipped with an EASY-nLC system (Thermo Fisher Scientific) using collision-induced dissociation (CID) MS/MS fragmentation. Each synthetic peptide (1 pmol) was injected at a flow rate of 0.3 µL/min into the C18 trap column (180 µm I.D. × 20 mm, 5 µm, 100 Å) and analyzed at a flow rate of 0.3 µL/min on an analytical column (100 µm I.D. × 200 mm, 3 µm, 100 Å). The LC gradient started at 2% solution B (0.1% formic


acid in acetonitrile) for 1 min, increased to 8% over 16 min, 35% over 74 min, and 95% over 9 min, and then decreased to 2% over another 20 min. The full scan resolution was 60,000 at m/z 400. The 15 most intense ions were sequentially isolated for MS/MS; CID scans were acquired in the LTQ with a 10 ms activation time, charge states of 2+ and higher, 35% normalized collision energy, and a 2.0 Da isolation window. Previously fragmented ions were excluded for 60 s in all MS/MS scans. The MS1 scan range was m/z 400-2000. The electrospray voltage was maintained at 1.8 kV, and the capillary temperature was set to 250 °C. Synthetic peptides of missing proteins (1 pmol) were injected at a flow rate of 4.0 µL/min into the C18 trap column (75 µm I.D. × 20 mm, 4 µm, 100 Å) and analyzed at a flow rate of 0.3 µL/min on an analytical column (100 µm I.D. × 500 mm, 2 µm, 100 Å). The LC gradient started at 2% solution B (0.1% formic acid in 80% acetonitrile) for 1 min, increased to 8% over 16 min, 35% over 74 min, and 95% over 9 min, and then decreased to 2% over another 20 min. The full scan resolution was 120,000 at m/z 400. A top-speed method (3 s cycle) was used to select precursors for MS/MS. CID scans: isolation window, 2 Da; detector, Orbitrap; normalized collision energy, 35%; resolution, 30,000; maximum injection time, 54 ms. ETD scans: isolation window, 2 Da; detector, Orbitrap; resolution, 30,000; reaction time, 100 ms;


maximum injection time, 250 ms. They were acquired in the LTQ with a 10 ms activation time, charge states of 2+ and higher, 35% normalized collision energy, and a 2.0 Da isolation window. Previously fragmented ions were excluded for 30 s in all MS/MS scans. The MS1 scan range was m/z 400-2500. The electrospray voltage was maintained at 1.85 kV, and the capillary temperature was set to 275 °C.
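Both gradient programs above use the same timings, so they can be written as a single breakpoint table. The sketch below assumes linear ramps and that each "over N min" segment starts where the previous one ends, which gives a 120 min program; `GRADIENT` and `percent_b` are hypothetical names, not part of any instrument software.

```python
# Breakpoints (time in min, %B) for the gradient described above, assuming
# linear ramps between consecutive points and strictly increasing times.
GRADIENT = [
    (0.0, 2.0),     # start at 2% B
    (1.0, 2.0),     # hold for 1 min
    (17.0, 8.0),    # to 8% over 16 min
    (91.0, 35.0),   # to 35% over 74 min
    (100.0, 95.0),  # to 95% over 9 min
    (120.0, 2.0),   # back to 2% over 20 min
]


def percent_b(t_min: float, gradient=GRADIENT) -> float:
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError(f"time {t_min} min is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]  # not reached for a well-formed program


if __name__ == "__main__":
    for t in (0, 9, 17, 95.5, 120):
        print(f"{t:>6} min: {percent_b(t):5.1f}% B")
```

Encoding the program as data rather than prose makes it easy to check that the segment durations add up and to compare the two setups side by side.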

Construction of customized and concatenated databases

For the identification of GENCODE ASVs, sequence database files of protein-coding transcripts from the GENCODE v.25 and v.26 databases (http://www.gencodegenes.org/) were translated in silico in three frames using an in-house program (written in Python 2.7); for each transcript, the longest protein sequence from a methionine to a stop codon was selected as the protein entry. The PEFF files released by neXtProt (http://www.nextprot.org) on 11 January and 8 August 2016 were used to construct the database. Using the SAAV information in the header of each entry, in silico tryptic peptides were constructed from each protein entry and integrated into individual SAAV entries using an in-house program (written in Python 2.7) for finding SAAVs. We then prepared tiered databases starting with the neXtProt general protein sequences. As Tier 1, only the neXtProt (2016 Jan version) general protein sequences were


used. The three-frame translated GENCODE database was concatenated to Tier 1 to form Tier 2, and the SAAV database was concatenated to Tier 2 to form Tier 3. Between the 11 January and 8 August 2016 neXtProt releases, the number of variants that could be translated into SAAVs approximately doubled, from 2.5 million to 5 million (http://www.nextprot.org/). Therefore, for Tier 3.5, the neXtProt (2016 Aug version), three-frame translated GENCODE, and SAAV databases were concatenated. To follow the HPP Guidelines, we also generated and used the latest concatenated database from neXtProt (2017 Jan version) and GENCODE v.26 (2017 Mar). The SAAV database from neXtProt (2017 Jan version) was generated using the in-house program described above and concatenated to this latest database.
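The three-frame translation step and the tier concatenation described above can be sketched in Python 3 as follows. This is an illustrative reconstruction under stated assumptions, not the authors' Python 2.7 program: it uses the standard genetic code, keeps only reading frames that start at a methionine and end at a stop codon, and, as a hypothetical design choice, skips any sequence in a later tier that is identical to one already present in an earlier tier. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna: str, frame: int) -> str:
    """Translate one forward reading frame (0, 1 or 2); unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(cdna[i:i + 3], "X") for i in range(frame, len(cdna) - 2, 3)
    )


def longest_m_to_stop(protein: str) -> str:
    """Longest stretch that starts at an M and is terminated by a stop codon ('*')."""
    best = ""
    # The last segment after splitting on '*' has no terminating stop codon.
    for segment in protein.split("*")[:-1]:
        m = segment.find("M")
        if m != -1 and len(segment) - m > len(best):
            best = segment[m:]
    return best


def three_frame_longest_orf(transcript: str) -> str:
    """Translate a transcript in three forward frames and keep the longest M-to-stop ORF."""
    cdna = transcript.upper().replace("U", "T")
    candidates = (longest_m_to_stop(translate(cdna, f)) for f in range(3))
    return max(candidates, key=len, default="")


def concatenate_tiers(*tiers: tuple[str, dict[str, str]]) -> dict[str, str]:
    """Concatenate (tier_name, {accession: sequence}) tiers in order.

    Hypothetical design choice: a sequence identical to one already present in an
    earlier tier is skipped, so later tiers only add new sequences.
    """
    merged: dict[str, str] = {}
    seen: set[str] = set()
    for tier_name, entries in tiers:
        for accession, sequence in entries.items():
            if sequence and sequence not in seen:
                merged[f"{tier_name}|{accession}"] = sequence
                seen.add(sequence)
    return merged


if __name__ == "__main__":
    gencode = {"TOY-001": "CCATGGCTTGGTAA"}  # hypothetical transcript
    translated = {acc: three_frame_longest_orf(seq) for acc, seq in gencode.items()}
    print(concatenate_tiers(("nxp", {"NX_TOY": "MKK"}), ("gencode", translated)))
```

Prefixing each accession with its tier name keeps track of where every entry came from, which is what later allows peptides matching only GENCODE entries to be flagged as ASV candidates.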

Software used for the proteomic search pipeline

A total of 655 raw LC-MS/MS files were converted into MS2 files using RawConverter v.1.1.0.18 (The Scripps Research Institute, La Jolla, CA, USA), which can be downloaded free of charge at http://fields.scripps.edu/rawconv. This software has been reported to deliver state-of-the-art performance, determining monoisotopic m/z values and charge states more accurately than other converters such as RawExtract, pXtract, and ProteoWizard.20


Converted MS2 files were searched against the sequence database using ProLuCID in the Integrated Proteomic Pipeline (IP2) system v.4.1.2.4 (The Scripps Research Institute) on a cloud computing system (Microsoft Azure).21 ProLuCID uses a modified cross-correlation score (XCorr) with higher sensitivity than that of SEQUEST. Search parameters were as follows: precursor ion tolerance = 50 ppm; fragment ion tolerance = 0.8 Da; missed cleavages = 1; modifications = carbamidomethyl cysteine (fixed) and methionine oxidation (variable); enzyme = trypsin. For peptide validation, peptide-level FDRs of 0.1% and 1% were applied using DTASelect in IP2. We used ProteinInferencer (The Scripps Research Institute; http://fields.scripps.edu/download.php) to map peptides to protein entries with a protein-level FDR of 1.0%.22 The calculated FDR values, along with the total numbers of expected true and false positives at the PSM, peptide, and protein levels for each dataset, are shown in Table S2.
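DTASelect and ProteinInferencer implement their own filtering, which is not reproduced here. As a generic illustration of target-decoy FDR filtering at the peptide level, the sketch below estimates the FDR at a score cut-off s as D(s)/T(s), the number of decoy matches divided by the number of target matches scoring at or above s, and keeps the targets above the loosest cut-off whose estimate does not exceed the requested level (0.001 for 0.1%, 0.01 for 1%). The `Match` type and its scores are hypothetical; cut-offs are evaluated only at distinct score values so that tied decoys are always counted.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Match:
    peptide: str
    score: float      # hypothetical search score; higher means a better match
    is_decoy: bool


def filter_at_fdr(matches: list[Match], max_fdr: float) -> list[Match]:
    """Return target matches passing the loosest score cut-off with estimated FDR <= max_fdr.

    FDR at cut-off s is estimated as (#decoys with score >= s) / (#targets with score >= s).
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    best_cut = None
    # Evaluate each distinct score once, after counting every match tied at that score.
    for score, group in groupby(ranked, key=lambda m: m.score):
        for m in group:
            if m.is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cut = score
    if best_cut is None:
        return []
    return [m for m in ranked if not m.is_decoy and m.score >= best_cut]
```

Calling `filter_at_fdr` with `max_fdr=0.001` and then `0.01` on the same matches mirrors the 0.1% versus 1% peptide-level comparison in the text; the protein-level step performed by ProteinInferencer is a separate calculation not shown here.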

Results and Discussion

Proteomic search pipeline and re-analysis of the hippocampus dataset

For the identification of missing proteins and GENCODE ASVs, we constructed a pipeline using customized and concatenated databases from neXtProt and


GENCODE, with stringent FDR conditions and a filter retaining identified peptides of nine or more amino acids (Figure 1). Raw files from C-HPP studies were downloaded from the ProteomeXchange consortium and re-analyzed using the IP2 system with RawConverter and ProteinInferencer. Identified peptides were mapped onto the custom database, and decoys, contaminants, and SAAVs were filtered out. Proteins identified with two or more unique peptides that did not map onto the neXtProt database were counted as GENCODE ASVs. Missing proteins were counted when identified with two or more unique peptides, after their Protein Evidence (PE) levels were verified against the latest version of the neXtProt database (release 2017 Jan, http://www.nextprot.org/). Using this pipeline with customized and concatenated databases, we re-analyzed the Hippocampus_Hwang dataset (144 raw files), from which we had previously reported missing proteins, GENCODE ASVs, and SAAVs using individual databases (Table 1).12,15 First, we tested the tiered databases for the identification of GENCODE ASVs and SAAVs; the sizes of the databases, including decoy sequences, were 57.5 MB (Tier 1), 157.5 MB (Tier 2), 380.6 MB (Tier 3), and 602.0 MB (Tier 3.5). As a result, 3,445 (27,627 peptides), 3,636 (30,180 peptides), 4,007 (36,069 peptides), and 4,153 (38,682 peptides) proteins were identified with a 0.1% peptide-level FDR and a 1% protein-level FDR against the Tier 1, 2, 3, and 3.5 databases, respectively. More proteins were


identified with Tiers 2, 3, and 3.5 than with Tier 1: 189 (2,553 peptides), 559 (8,442 peptides), and 710 (11,055 peptides) more proteins, respectively. In addition, using the Tier 3 and Tier 3.5 databases, four missing proteins, Q16478 (GRIK5), P22794 (EVI2A), Q96GM1 (PLPPR2), and B4DS77 (SHISA9), were identified, of which Q16478 and P22794 had already been reported from this dataset.15 Q16478 nevertheless remains a missing protein in the latest version of the neXtProt database, and its identified peptides and the corresponding synthetic peptides were already shown in Park et al.15 Using the GENCODE three-frame translated database (Tiers 2, 3, and 3.5), we detected several GENCODE ASVs. We identified and manually validated five times more GENCODE ASVs and SAAVs with Tier 3.5 than in our previous report (Table S3).12 In particular, the GENCODE ASV SYT7-007 was identified with Tiers 2 and 3 but not with Tier 3.5, because four additional SYT7 isoforms containing the identified peptides of SYT7-007 were newly added to the 2016 Sep neXtProt database (Tier 3.5) (Table S3). Second, we tested and manually validated filtering at estimated peptide-level FDRs of 0.1% or 1% for more confident identification, before applying the cut-off at a protein-level FDR of 1% (Table 1). As a result, more missing proteins and GENCODE ASVs were detected with the 1% peptide-level FDR than with the 0.1% FDR, because more peptides passed the filter.


However, after manual validation and comparison of the peptide fragmentation patterns with those of the corresponding synthetic peptides (Figure S1), the false-positive missing proteins and GENCODE ASVs were removed, and the number of remaining true positives was the same as the number of proteins identified at the 0.1% peptide-level FDR. When larger databases are used, the XCorr scores of both true and false matches tend to increase; that is, true matches against the expanded target database and random matches against the expanded decoy database increase simultaneously. Because true matches typically score higher than random matches, the number of identified peptides increased with the larger databases under the more stringent condition (0.1% FDR), which retains only high-scoring matches. Conversely, under the less stringent condition (1% FDR), which admits lower-scoring matches, the number of identified peptides decreased because of the increased number of random matches against the expanded decoy database. Therefore, it was not surprising that the false discoveries disappeared under the more stringent filtering condition of a 0.1% peptide-level FDR and a 1% protein-level FDR. The HPP guidelines (ver. 2.1.0) do not specify a 0.1% peptide-level FDR, but they do require a 1% protein-level FDR. To analyze large-scale datasets, we applied the stringent filtering condition of a 0.1% (rather than 1%) peptide-level FDR together with a 1% protein-level FDR. As Deutsch et al. noted, close attention should be paid to the estimation of FDR values, because FDR estimation is based on


imperfect assumptions.6
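The peptide filtering and protein counting rules described in this section (peptides of nine or more residues; removal of decoys, contaminants, and SAAV matches; two or more unique peptides per protein) can be expressed as a small classification function. This is a simplified sketch rather than the pipeline's actual code: a peptide is treated as unique only if it maps to a single database entry, the input dictionaries and accession labels are hypothetical, and missing proteins are assumed to be neXtProt entries at PE levels 2 to 4, following the usual HPP convention.

```python
from collections import defaultdict

MIN_LENGTH = 9                 # peptides of nine or more residues, as in the text
MIN_UNIQUE = 2                 # two or more unique peptides per protein
EXCLUDED_SOURCES = {"decoy", "contaminant", "saav"}
MISSING_PE_LEVELS = {2, 3, 4}  # assumed HPP convention for missing proteins


def classify(peptide_hits, source_of, pe_level):
    """Count unique peptides per entry and report GENCODE ASVs and missing proteins.

    peptide_hits: {peptide: set of accessions it maps to}
    source_of:    {accession: 'nextprot' | 'gencode' | 'saav' | 'decoy' | 'contaminant'}
    pe_level:     {neXtProt accession: PE level}
    Returns (sorted GENCODE ASV accessions, sorted missing-protein accessions).
    """
    unique_counts = defaultdict(int)
    for peptide, accessions in peptide_hits.items():
        if len(peptide) < MIN_LENGTH:
            continue  # length filter
        if {source_of[a] for a in accessions} & EXCLUDED_SOURCES:
            continue  # decoy, contaminant, or explainable by a known SAAV
        if len(accessions) == 1:  # unique peptide: maps to exactly one entry
            (accession,) = accessions
            unique_counts[accession] += 1

    passing = [a for a, n in unique_counts.items() if n >= MIN_UNIQUE]
    # A unique peptide of a GENCODE entry, by construction, does not map onto neXtProt.
    gencode_asvs = sorted(a for a in passing if source_of[a] == "gencode")
    missing = sorted(
        a for a in passing
        if source_of[a] == "nextprot" and pe_level.get(a) in MISSING_PE_LEVELS
    )
    return gencode_asvs, missing
```

Because SAAV entries sit in the same concatenated database, a peptide that could be explained by a known variant of a neXtProt protein is discarded before it can support a GENCODE ASV call.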

Re-analysis of cortex, spinal cord, fetal brain, testis, and spermatozoa datasets

From other C-HPP datasets related to the human brain or male reproductive tissues, that is, Cortex_Bai, Cortex_Kim, S.Cord_Kim, F.Brain_Kim, Testis_Kim, Testis_Zhang, and Sperm_Vandenbrouk, missing proteins, GENCODE ASVs, and SAAVs were identified with a 0.1% peptide-level FDR and a 1% protein-level FDR, using the latest versions of the neXtProt (2017 Jan) and GENCODE v.26 (2017 Mar) databases. In total, two missing proteins, activity-regulated cytoskeleton-associated protein (Q7LC44, ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (Q16478, GRIK5, Chr 19), were identified with two or more unique peptides. The identified peptides from Q16478 and Q7LC44 are stored in PeptideAtlas (2017-01-human) and were also validated against their corresponding synthetic peptides (Figure S2). We also examined how many GENCODE ASVs, including DPYSL2-005 reported in our previous study,12 were commonly detected in neuronal tissues. As shown in Figures 2 and 3, seven GENCODE ASVs, ACTN4-012 (Chr.19), DPYSL2-005 (Chr.8), MPRIP-003 (Chr.17), NCAM1-013 (Chr.11), EPB41L1-017 (Chr.20), AGAP1-004 (Chr.2), and CPNE5-005 (Chr.6), were identified from two or more datasets. The identified peptides of MPRIP-003, NCAM1-013, and EPB41L1-017 mapped onto inserted novel exons. Because NCAM1-013 contains a very large novel exon located at positions 113,260,181 to 113,275,740 of chromosome 11, most of its novel peptides were identified within this inserted exon (Figure 4 A, B). To validate the GENCODE ASVs, we analyzed synthetic versions of the peptides identified from GENCODE ASVs and compared

their MS/MS patterns (Figure S1). The peptide SEAASVSTTNPSQGEDFK from NCAM1-013, which maps onto a novel exon on chromosome 11, was validated against its corresponding synthetic peptide (Figure 4 C, D). The identified peptides of DPYSL2-005 mapped onto the 5’-UTR region, suggesting that an alternative translation start site may exist in this gene.12 The genes of the identified GENCODE ASVs are known to be related to cytoskeletal remodeling, functioning in axon guidance (DPYSL2),23 regulation of dendritic spines (AGAP1 and ACTN4),24,25 neuronal cell-cell adhesion (NCAM1),26 stress fiber extension in neuronal cells (MPRIP),27 and regulation of neuronal membrane stability (EPB41L1).28 From two sets of testis tissues and one set of spermatozoa, four GENCODE ASVs, RAB11FIP5-008, RP13-347D8.7-001, RP11-666A8.13-001, and PRDX4-002, were commonly identified with two or more unique peptides (Figure 2). Of the 195 GENCODE ASVs identified with two or more peptides, including at least one unique peptide, 109 were sample-specific (Figure 3). Functional characterization of the GENCODE ASVs identified using our pipeline remains a challenge. Either somatic or germline mutations can change the genomic sequence, and a non-synonymous single-nucleotide change can be expressed as an SAAV. The neXtProt (2016 Sep) version of the SAAV database includes approximately 5 million SAAVs in human coding genes. Using the latest version of the database, SAAV peptides accounted for about 2-3% of the total peptides identified in each dataset, and ~50% of them were also identified together with their corresponding peptides mapped onto the reference database (Table 2 and Table S5). These results indicate that our pipeline is useful for finding missing proteins as well as GENCODE ASVs while accounting for all known SAAVs.
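The comparison between SAAV peptides and their reference counterparts can be illustrated by generating, for a given substitution, the tryptic peptide that covers the variant position in both the reference and the variant protein. The sketch assumes a simplified trypsin rule (cleavage after K or R, not before P), ignores missed cleavages even though the search allowed one, and takes the 1-based position and substituted residue directly as input rather than parsing neXtProt PEFF headers; the function names are hypothetical.

```python
def tryptic_spans(seq: str) -> list[tuple[int, int]]:
    """Half-open (start, end) spans of fully tryptic peptides.

    Simplified rule: cleave after K or R unless the next residue is P; no missed cleavages.
    """
    spans, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            spans.append((start, i + 1))
            start = i + 1
    if start < len(seq):
        spans.append((start, len(seq)))
    return spans


def peptide_covering(seq: str, idx: int) -> str:
    """Tryptic peptide of seq that contains the 0-based residue index idx."""
    for start, end in tryptic_spans(seq):
        if start <= idx < end:
            return seq[start:end]
    raise ValueError("index outside sequence")


def saav_pair(protein: str, position: int, new_aa: str) -> tuple[str, str]:
    """(reference peptide, variant peptide) for a substitution at a 1-based position.

    The variant protein is digested separately because a substitution to or from
    K, R or P can create or remove a cleavage site.
    """
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position outside protein")
    variant = protein[:idx] + new_aa + protein[idx + 1:]
    return peptide_covering(protein, idx), peptide_covering(variant, idx)


if __name__ == "__main__":
    # Hypothetical toy protein and substitution.
    print(saav_pair("MAGKLLSEKTTR", 6, "F"))
```

Keeping both members of each pair makes it straightforward to ask, per dataset, how many identified SAAV peptides were observed alongside their reference peptide.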

Conclusion

Here we propose a pipeline for the confident identification of GENCODE ASVs and SAAVs with a stringent FDR (0.1% at the peptide level and 1% at the protein level), using a database customized and concatenated from GENCODE and neXtProt. A total of 67 GENCODE ASVs were identified from neuronal or testis tissues, and their information may be considered for uploading to the


neXtProt database as new protein-coding genes or ASVs. In particular, NCAM1-013, a variant of neural cell adhesion molecule 1 encoded on chromosome 11, was identified only in neural tissues, including hippocampus, cortex, spinal cord, and fetal brain. Seven of the 72 GENCODE ASVs identified from testis and spermatozoa mapped to chromosome X. In addition, we found two proteins that are missing according to the latest version (release 2017 Jan) of the neXtProt database in ProteomeXchange datasets of hippocampus and cortex. We will further study the biological functions of these GENCODE ASVs and missing proteins using animal models, such as Caenorhabditis elegans or mouse models.

Supporting Information Available

This material is available free of charge via the Internet at http://pubs.acs.org.
Figure S1. MS/MS spectra of identified peptides and their corresponding synthetic peptides from GENCODE alternative splicing variants.
Figure S2. MS/MS spectra of identified peptides and their corresponding synthetic peptides from missing proteins.
Table S1. List of synthetic peptides for validation of GENCODE alternative splicing variants.
Table S2. List of calculated FDRs at the PSM, peptide, and protein levels for the proteomic datasets.
Table S3. List of identified peptides of GENCODE alternative splicing variants from human hippocampal tissue.
Table S4. List of identified peptides of GENCODE alternative splicing variants from cortex, spinal cord, fetal brain, testis, and spermatozoa tissues.
Table S5. List of identified peptides of missing proteins from hippocampus,


cortex, testis, and spermatozoa tissues.
Table S6. SAAVs and their corresponding reference peptides from hippocampus, cortex, spinal cord, fetal brain, testis, and spermatozoa tissues.

ACKNOWLEDGMENTS
This research was supported by the National Research Council of Science and Technology (CAP-15-03-KRIBB); by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI13C2098); and by the Bio-Synergy Research Project (grant number: NRF-2014M3A9C4066461) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.

Abbreviations
ASV, alternative splicing variant
SAAV, single amino acid variant
FDR, false discovery rate
PSM, peptide spectrum match
nextPP, next-generation proteomic pipeline
C-HPP, Chromosome-Centric Human Proteome Project
NCAM1, neural cell adhesion molecule 1
DPYSL2, dihydropyrimidinase-related protein 2
MPRIP, myosin phosphatase Rho-interacting protein
SYN2, synapsin-2

References
(1) Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 2012, 30 (3), 221-223.
(2) Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.; Thomas, J. K.; Muthusamy, B.; Leal-Rojas, P.; Kumar, P.; Sahasrabuddhe, N. A.; Balakrishnan, L.; Advani, J.; George, B.; Renuse, S.; Selvan, L. D.; Patil, A. H.; Nanjappa, V.; Radhakrishnan, A.; Prasad, S.; Subbannayya, T.; Raju, R.; Kumar, M.; Sreenivasamurthy, S. K.; Marimuthu, A.; Sathe, G. J.; Chavan, S.; Datta, K. K.; Subbannayya, Y.; Sahu, A.; Yelamanchi, S. D.; Jayaram, S.; Rajagopalan, P.; Sharma, J.; Murthy, K. R.; Syed, N.; Goel, R.; Khan, A. A.; Ahmad, S.; Dey, G.; Mudgal, K.; Chatterjee, A.; Huang, T. C.; Zhong, J.; Wu, X.; Shaw, P. G.; Freed, D.; Zahari, M. S.; Mukherjee, K. K.; Shankar, S.; Mahadevan, A.; Lam, H.; Mitchell, C. J.; Shankar, S. K.; Satishchandra, P.; Schroeder, J. T.; Sirdeshmukh, R.; Maitra, A.; Leach, S. D.; Drake, C. G.; Halushka, M. K.; Prasad, T. S.; Hruban, R. H.; Kerr, C. L.; Bader, G. D.; Iacobuzio-Donahue, C. A.; Gowda, H.; Pandey, A. A draft map of the human proteome. Nature 2014, 509 (7502), 575-581.

(3) Wilhelm, M.; Lieberenz, M.; Marx, H.; Schlegl, J.; Hahne, H.; Savitski, M. M.; Mathieson, T.; Moghaddas Gholami, A.; Ziegler, E.; Butzmann, L.; Lemeer, S.; Wenschuh, H.; Mollenhauer, M.; Bantscheff, M.; Gerstmair, A.; Schnatbaum, K.; Slotta-Huspenina, J.; Gessulat, S.; Reimer, U.; Boese, J. H.; Faerber, F.; Kuster, B. Mass-spectrometry-based draft of the human proteome. Nature 2014, 509 (7502), 582-587.
(4) Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S. Human Proteome Project mass spectrometry data interpretation guidelines 2.1. J. Proteome Res. 2016, 15, 3961-3970.
(5) Deutsch, E. W.; Sun, Z.; Campbell, D.; Kusebauch, U.; Chu, C. S.; Mendoza, L.; Shteynberg, D.; Omenn, G. S.; Moritz, R. L. State of the human proteome in 2014/2015 as viewed through PeptideAtlas: enhancing accuracy and coverage through the AtlasProphet. J. Proteome Res. 2015, 14, 3461-3473.
(6) Omenn, G. S.; Lane, L.; Lundberg, E. K.; Overall, C. M.; Deutsch, E. W. Progress on the HUPO Draft Human Proteome: 2017 metrics of the Human Proteome Project. J. Proteome Res. 2017, DOI: 10.1021/acs.jproteome.7b00375 (in this issue).
(7) Deutsch, E. W.; Sun, Z.; Campbell, D.; Binz, P. A.; Farrah, T.; Shteynberg, D.; Mendoza, L.; Omenn, G. S.; Moritz, R. L. Tiered human integrated sequence search databases for shotgun proteomics. J. Proteome Res. 2016, 15, 4091-4100.
(8) Paik, Y. K.; Hancock, W. S. Uniting ENCODE with genome-wide proteomics. Nat. Biotechnol. 2012, 30 (11), 1065-1067.

(9) Harrow, J.; Frankish, A.; Gonzalez, J. M.; Tapanari, E.; Kokocinski, F.; Barrell, D.; Zadissa, A.; Aken, B. L.; Bignell, A.; Rajan, J.; Boychenko, V.; Despacio-Reyes, G.; Lin, M.; Howald, C.; Balasubramanian, S.; van Baren, J.; Reymond, A.; Hunt, T.; Tanzer, A.; Pei, B.; Brent, M.; Gerstein, M.; Mukherjee, G.; Steward, C.; Derrien, T.; Tress, M.; Searle, S.; Barnes, I.; Kay, M.; Saunders, G.; Diekhans, M.; Chrast, J.; Walters, N.; Rodriguez, J. M.; Haussler, D.; Harte, R.; Kellis, M.; Ezkurdia, I.; Valencia, A.; Guigó, R.; Hubbard, T. J. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012, 22 (9), 1760-1774.
(10) Wright, J.; Mudge, J.; Weisser, H.; Barzine, M. P.; Gonzalez, J. M.; Brazma, A.; Choudhary, J. S.; Harrow, J. Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow. Nat. Commun. 2016, 7, 11778.
(11) Weisser, H.; Wright, J. C.; Mudge, J. M.; Gutenbrunner, P.; Choudhary, J. S. Flexible data analysis pipeline for high-confidence proteogenomics. J. Proteome Res. 2016, 15, 4686-4695.
(12) Hwang, H.; Park, G. W.; Kim, K. H.; Lee, J. Y.; Lee, H. K.; Ji, E. S.; Park, S. K.; Xu, T.; Yates, J. R., 3rd; Kwon, K. H.; Park, Y. M.; Lee, H. J.; Paik, Y. K.; Kim, J. Y.; Yoo, J. S. Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases. J. Proteome Res. 2015, 14 (12), 5028-5037.
(13) Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 2014, 11 (11), 1114-1125.

(14) Kwon, K. H.; Kim, J. Y.; Kim, S. Y.; Min, H. K.; Lee, H. J.; Ji, I. J.; Kang, T.; Park, G. W.; An, H. J.; Lee, B.; Ravid, R.; Ferrer, I.; Chung, C. K.; Paik, Y. K.; Hancock, W. S.; Park, Y. M.; Yoo, J. S. Chromosome 11-centric human proteome analysis of human brain hippocampus tissue. J. Proteome Res. 2013, 12 (1), 97-105.
(15) Park, G. W.; Hwang, H.; Kim, K. H.; Lee, J. Y.; Lee, H. K.; Park, J. Y.; Ji, E. S.; Park, S. K.; Yates, J. R., 3rd; Kwon, K. H.; Park, Y. M.; Lee, H. J.; Paik, Y. K.; Kim, J. Y.; Yoo, J. S. Integrated proteomic pipeline using multiple search engines for a proteogenomic study with a controlled protein false discovery rate. J. Proteome Res. 2016, 15, 4082-4090.
(16) Bai, B.; Hales, C. M.; Chen, P. C.; Gozal, Y.; Dammer, E. B.; Fritz, J. J.; Wang, X.; Xia, Q.; Duong, D. M.; Street, C.; Cantero, G.; Cheng, D.; Jones, D. R.; Wu, Z.; Li, Y.; Diner, I.; Heilman, C. J.; Rees, H. D.; Wu, H.; Lin, L.; Szulwach, K. E.; Gearing, M.; Mufson, E. J.; Bennett, D. A.; Montine, T. J.; Seyfried, N. T.; Wingo, T. S.; Sun, Y. E.; Jin, P.; Hanfelt, J.; Wilcock, D. M.; Levey, A.; Lah, J. J.; Peng, J. U1 small nuclear ribonucleoprotein complex and RNA splicing alterations in Alzheimer's disease. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (41), 16562-16567.
(17) Zhang, Y.; Li, Q.; Wu, F.; Zhou, R.; Qi, Y.; Su, N.; Chen, L.; Xu, S.; Jiang, T.; Zhang, C.; Cheng, G.; Chen, X.; Kong, D.; Wang, Y.; Zhang, T.; Zi, J.; Wei, W.; Gao, Y.; Zhen, B.; Xiong, Z.; Wu, S.; Yang, P.; Wang, Q.; Wen, B.; He, F.; Xu, P.; Liu, S. Tissue-based proteogenomics reveals that human testis endows plentiful missing proteins. J. Proteome Res. 2015, 14, 3583-3594.
(18) Wei, W.; Luo, W.; Wu, F.; Peng, X.; Zhang, Y.; Zhang, M.; Zhao, Y.; Su, N.; Qi, Y. Z.; Chen, L.; Zhang, Y.; Wen, B.; He, F.; Xu, P. Deep coverage proteomics identifies more low-abundance missing proteins in human testis tissue with Q-Exactive HF mass spectrometer. J. Proteome Res. 2016, 15, 3988-3997.
(19) Vandenbrouck, Y.; Lane, L.; Carapito, C.; Duek, P.; Rondel, K.; Bruley, C.; Macron, C.; de Peredo, A. G.; Coute, Y.; Chaoui, K.; Com, E.; Gateau, A.; Hesse, A.; Marcellin, M.; Mear, L.; Mouton-Barbosa, E.; Robin, T.; Burlet-Schiltz, O.; Cianferani, S.; Ferro, M.; Freour, T.; Lindskog, C.; Garin, J.; Pineau, C. Looking for missing proteins in the proteome of human spermatozoa: an update. J. Proteome Res. 2016, 15, 3998-4019.
(20) He, L.; Diedrich, J.; Chu, Y. Y.; Yates, J. R., 3rd. Extracting accurate precursor information for tandem mass spectra by RawConverter. Anal. Chem. 2015, 87, 11361-11367.
(21) Xu, T.; Park, S. K.; Venable, J. D.; Wohlschlegel, J. A.; Diedrich, J. K.; Cociorva, D.; Lu, B.; Liao, L.; Hewel, J.; Han, X.; Wong, C. C.; Fonslow, B.; Delahunty, C.; Gao, Y.; Shah, H.; Yates, J. R., 3rd. ProLuCID: an improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J. Proteomics 2015, 129 (3), 16-24.
(22) Zhang, Y.; Xu, T.; Shan, B.; Hart, J.; Aslanian, A.; Han, X.; Zong, N.; Li, H.; Choi, H.; Wang, D.; Acharya, L.; Du, L.; Vogt, P. K.; Ping, P.; Yates, J. R., 3rd. ProteinInferencer: confident protein identification and multiple experiment comparison for large scale proteomics projects. J. Proteomics 2015, 129 (3), 25-32.
(23) Makihara, H.; Nakai, S.; Ohkubo, W.; Yamashita, N.; Nakamura, F.; Kiyonari, H.; Shioi, G.; Jitsuki-Takahashi, A.; Nakamura, H.; Tanaka, F.; Akase, T.; Kolattukudy, P.; Goshima, Y. CRMP1 and CRMP2 have synergistic but distinct roles in dendritic development. Genes Cells 2016, 21 (9), 994-1005.

(24) Arnold, M.; Cross, R.; Singleton, K. S.; Zlatic, S.; Chapleau, C.; Mullin, A. P.; Rolle, I.; Moore, C. C.; Theibert, A.; Pozzo-Miller, L.; Faundez, V.; Larimore, J. The Endosome Localized Arf-GAP AGAP1 Modulates Dendritic Spine Morphology Downstream of the Neurodevelopmental Disorder Factor Dysbindin. Front. Cell. Neurosci. 2017, 22 (10), 218.
(25) Kalinowska, M.; Chavez, A. E.; Lutzu, S.; Castillo, P. E.; Bukauskas, F. F.; Francesconi, A. Actinin-4 governs dendritic spine dynamics and promotes their remodeling by metabotropic glutamate receptors. J. Biol. Chem. 2015, 290 (26), 15909-15920.
(26) Kasper, C.; Rasmussen, H.; Kastrup, S. J.; Ikemizu, S.; Yvonne Jones, E.; Berezin, V.; Bock, E.; Larsen, I. K. Structural basis of cell-cell adhesion by NCAM. Nat. Struct. Mol. Biol. 2000, 7, 389-393.
(27) Tojkander, S.; Gateva, G.; Lappalainen, P. Actin stress fibers – assembly, dynamics and biological roles. J. Cell Sci. 2012, 125, 1855-1864.
(28) Walensky, L. D.; Blackshaw, S.; Liao, D.; Watkins, C. C.; Weier, H. G.; Parra, M.; Huganir, R. L.; Conboy, J. G.; Mohandas, N.; Snyder, S. H. A novel neuron-enriched homolog of the erythrocyte membrane cytoskeletal protein 4.1. J. Neurosci. 1999, 19 (15), 6457-6467.

Figure Legends

Figure 1. Next-generation proteomic pipeline (nextPP) for finding novel protein variants. A customized database, constructed by concatenating the neXtProt, 3-frame-translated GENCODE, and neXtProt SAAV databases, was used to analyze C-HPP datasets from human hippocampus, cerebral cortex, spinal cord, fetal brain, testis, and spermatozoa.

Figure 2. Venn diagrams of GENCODE ASVs identified in C-HPP datasets from (A) neuronal tissues, namely hippocampus (Hippocampus_Hwang), cerebral cortex (Cortex_Bai and Cortex_Kim), spinal cord (S.Cord_Kim), and fetal brain (F.Brain_Kim), and (B) male reproductive tissues, namely testis (Testis_Kim and Testis_Zhang) and spermatozoa (Sperm_Vandenbrouk).

Figure 3. GENCODE ASVs identified in the C-HPP datasets Hippocampus_Hwang, Cortex_Bai, Cortex_Kim, S.Cord_Kim, F.Brain_Kim, Testis_Kim, Testis_Zhang, and Sperm_Vandenbrouk. (A) 67 GENCODE ASVs identified with two or more unique peptides and (B) 127 GENCODE ASVs identified with two peptides, including at least one unique peptide. The numbers indicate how many peptides were identified for each GENCODE ASV.

Figure 4. NCAM1-013 (encoded on Chr. 11), a novel alternative splicing variant that includes a novel exon. (A) Screenshot of the chromosomal region encoding NCAM1 in Ensembl; the blue circle indicates the novel exon inserted in NCAM1-013. (B) Protein sequence of NCAM1-013; the yellow region indicates the novel exon, and red letters indicate the sequences of identified peptides. (C) MS/MS spectra of SEAASVSTTNPSQGEDFK, which maps to the novel exon of NCAM1-013. (D) MS/MS spectra of synthetic SEAASVSTTNPSQGEDFK, used to validate the identification of NCAM1-013 by comparing the peptide fragmentation patterns.
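The database construction summarized in the Figure 1 legend (neXtProt entries, 3-frame-translated GENCODE transcripts, and SAAV-containing sequences combined into one search database) can be illustrated with a minimal Python sketch. It is not the nextPP implementation. It assumes transcript sequences already in transcript (5' to 3') orientation, the standard genetic code, and FASTA headers that mirror the `transcript ID|frame|transcript name` labels used in Figure 3. Stop codons are kept as `*`, because how stops were handled is not stated in this section. The identifiers `NX_TOY1`, `ENST_TOY.1`, and `TOY-001`, and all sequences, are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate codon by codon: '*' marks a stop, 'X' an ambiguous codon."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def three_frame_entries(transcript_id: str, name: str, nt: str) -> list[tuple[str, str]]:
    """Forward 3-frame translation, one (header, sequence) record per frame."""
    return [(f"{transcript_id}|frame{f}|{name}", translate(nt[f - 1:])) for f in (1, 2, 3)]


def apply_saav(protein: str, position: int, ref: str, alt: str) -> str:
    """Substitute one residue at a 1-based position after checking the reference residue."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at {position}, found {protein[position - 1]}")
    return protein[:position - 1] + alt + protein[position:]


def to_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Serialize (header, sequence) records as FASTA with fixed-width sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy concatenated database: reference entry + 3-frame GENCODE entries + one SAAV entry.
reference = [("NX_TOY1", "MKTAYIAK")]
gencode = three_frame_entries("ENST_TOY.1", "TOY-001", "ATGAAAACCGCTTAA")
saav = [("NX_TOY1_K2R", apply_saav("MKTAYIAK", 2, "K", "R"))]
fasta = to_fasta(reference + gencode + saav)
```

The three frames of the toy transcript translate to `MKTA*`, `*KPL`, and `ENRL`. A production database would also need decoy sequences and a policy for splitting at stop codons, both outside the scope of this sketch.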


Table 1. Comparison of protein variants identified in the human hippocampal tissue dataset using tiered and concatenated databases.
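The difference between the two database strategies compared in Table 1 can be shown with a toy model. In a concatenated search, each spectrum takes its best match across the combined database. In a tiered search, databases are searched in a fixed order, and a spectrum accepted in an earlier tier is not offered to later tiers. The sketch below is an illustration under simplifying assumptions, not a reproduction of the authors' searches. It uses hypothetical pre-computed `(peptide, score)` candidates per database and a fixed score cutoff instead of per-search FDR control, and the scan IDs and peptides are invented.

```python
# Hypothetical pre-scored candidates: spectrum -> {database name: (peptide, score)}.
Candidates = dict[str, dict[str, tuple[str, float]]]


def concatenated_assign(candidates: Candidates, min_score: float) -> dict[str, tuple[str, str]]:
    """One search over the union: keep each spectrum's best match across all databases."""
    assigned = {}
    for spectrum, by_db in candidates.items():
        if not by_db:
            continue
        db, (peptide, score) = max(by_db.items(), key=lambda item: item[1][1])
        if score >= min_score:
            assigned[spectrum] = (db, peptide)
    return assigned


def tiered_assign(candidates: Candidates, tiers: list[str], min_score: float) -> dict[str, tuple[str, str]]:
    """Search tiers in order; a spectrum accepted in one tier is not offered to later tiers."""
    assigned = {}
    for spectrum, by_db in candidates.items():
        for db in tiers:
            if db in by_db and by_db[db][1] >= min_score:
                assigned[spectrum] = (db, by_db[db][0])
                break
    return assigned


candidates: Candidates = {
    "scan1": {"neXtProt": ("PEPAK", 40.0), "GENCODE": ("PEPXK", 45.0)},
    "scan2": {"GENCODE": ("PEPYK", 38.0)},
    "scan3": {"neXtProt": ("PEPBK", 10.0)},
}
concat = concatenated_assign(candidates, 30.0)
tiered = tiered_assign(candidates, ["neXtProt", "GENCODE"], 30.0)
```

In this toy case, scan1 is assigned to a variant peptide by the concatenated search and to the reference peptide by the tiered search. This is the kind of divergence a comparison such as Table 1 can reveal.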


Table 2. Identification of missing proteins and protein variants from C-HPP studies using the next-generation proteomic pipeline (nextPP).
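Claims of missing-protein detection are usually judged against the HPP mass spectrometry data interpretation guidelines 2.1 (ref 4). Among other requirements, these guidelines ask for at least two uniquely mapping, non-nested peptides of at least nine residues. The sketch below checks only that peptide requirement, not the guidelines' other conditions, and it is not the authors' procedure. "Uniquely mapping" is approximated as occurring in exactly one entry of the search database, with isobaric Ile and Leu treated as identical. The entry names and sequences are hypothetical.

```python
def normalize(seq: str) -> str:
    # Treat isobaric Ile/Leu as identical when mapping peptides.
    return seq.upper().replace("I", "L")


def uniquely_mapping(peptides: list[str], protein: str, database: dict[str, str],
                     min_len: int = 9) -> list[str]:
    """Peptides of at least min_len residues that occur only in `protein` (normalized)."""
    norm_db = {header: normalize(seq) for header, seq in database.items()}
    result = set()
    for pep in peptides:
        npep = normalize(pep)
        if len(npep) < min_len:
            continue
        owners = {header for header, seq in norm_db.items() if npep in seq}
        if owners == {protein}:
            result.add(npep)
    return sorted(result)


def has_two_non_nested(peptides: list[str]) -> bool:
    """True if some pair of peptides has neither one contained in the other."""
    for i, a in enumerate(peptides):
        for b in peptides[i + 1:]:
            if a not in b and b not in a:
                return True
    return False


database = {
    "NX_MISSING": "MKAAAPEPTLDESEQKGGGVVVANTHERPEPKWWSECNDPEPTR",
    "NX_OTHER": "MKPEPTIDESEQK",
}
observed = ["PEPTIDESEQK", "GGGVVVANTHERPEPK", "VVVANTHERPEPK", "WWSECNDPEPTR", "SHRTK"]
unique = uniquely_mapping(observed, "NX_MISSING", database)
passes = has_two_non_nested(unique)
```

In the toy data, PEPTIDESEQK is rejected because, after I/L normalization, it also maps to `NX_OTHER`. SHRTK is rejected for length. Only the pair of GGGVVVANTHERPEPK and WWSECNDPEPTR satisfies the non-nested condition, because VVVANTHERPEPK lies inside the former.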


Figure 1.


Figure 2


Figure 3A. GENCODE ASVs, each labeled as Ensembl transcript ID|translation frame|transcript name (for example, ENST00000528158.2|frame2|NCAM1-013), listed with chromosome (Chr), start and end coordinates, and the number of identified peptides in Hippocampus_Hwang, Cortex_Bai, Cortex_Kim, S.Cord_Kim, F.Brain_Kim, Testis_Kim, Testis_Zhang, and Sperm_Vandenbrouk.
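The two evidence levels in Figure 3 (panel A: two or more unique peptides; panel B: two peptides including at least one unique peptide) can be expressed as a small classification rule. This is a minimal sketch, not the authors' code. It assumes that "unique" means the peptide occurs in no other entry of the search database (with Ile and Leu treated as identical), and that panel B applies only when panel A does not, since this section does not say whether the panels overlap. Entry names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def normalize(seq: str) -> str:
    # Leu and Ile have identical mass, so treat them as equivalent when matching.
    return seq.upper().replace("I", "L")


def map_peptides(peptides: list[str], database: dict[str, str]) -> dict[str, set[str]]:
    """Return, for each peptide, the set of database entries whose sequence contains it."""
    norm_db = {header: normalize(seq) for header, seq in database.items()}
    hits: dict[str, set[str]] = defaultdict(set)
    for pep in peptides:
        npep = normalize(pep)
        for header, seq in norm_db.items():
            if npep in seq:
                hits[pep].add(header)
    return hits


def classify_asv(asv: str, peptides: list[str], database: dict[str, str]) -> Optional[str]:
    """Assign an ASV to a Figure 3A-like ('A') or 3B-like ('B') evidence level, or None."""
    hits = map_peptides(peptides, database)
    matched = {p for p in peptides if asv in hits[p]}
    unique = {p for p in matched if hits[p] == {asv}}
    if len(unique) >= 2:
        return "A"  # two or more unique peptides
    if len(matched) >= 2 and unique:
        return "B"  # at least two peptides, at least one of them unique
    return None


database = {
    "ASV_TOY": "MSSHAREDKWNEWFIRSTKGGNEWSECNDR",
    "REF_TOY": "MSSHAREDKWAAAR",
}
level_all = classify_asv("ASV_TOY", ["SHAREDK", "WNEWFIRSTK", "GGNEWSECNDR"], database)
level_one_unique = classify_asv("ASV_TOY", ["SHAREDK", "WNEWFIRSTK"], database)
```

Here SHAREDK is shared with the reference entry, so it supports the ASV but does not count as unique. With both unique peptides the ASV reaches level A, and with one unique peptide it reaches level B.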


Figure 3B. GENCODE ASVs, each labeled as Ensembl transcript ID|translation frame|transcript name (for example, ENST00000513588.5|frame2|CTNND2-002), listed with chromosome (Chr), start and end coordinates, and the number of identified peptides in Hippocampus_Hwang, Cortex_Bai, Cortex_Kim, S.Cord_Kim, F.Brain_Kim, Testis_Kim, Testis_Zhang, and Sperm_Vandenbrouk.


Figure 4.
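Figure 4 (panels C and D) validates NCAM1-013 by comparing the fragmentation pattern of the endogenous SEAASVSTTNPSQGEDFK spectrum with that of the synthetic peptide. This section does not state how the comparison was scored. The sketch below computes theoretical singly charged b and y ions from standard monoisotopic residue masses, and uses a binned cosine similarity as one possible measure for comparing two spectra; that choice of measure is an assumption. No modifications are modeled, and the peak lists are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate
from math import sqrt

# Monoisotopic residue masses (Da) of the 20 standard amino acids; unmodified Cys.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values for an unmodified peptide."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    prefix = list(accumulate(residues))            # N-terminal cumulative residue masses
    suffix = list(accumulate(reversed(residues)))  # C-terminal cumulative residue masses
    b = [m + PROTON for m in prefix[:-1]]
    y = [m + WATER + PROTON for m in suffix[:-1]]
    return b, y


def cosine_similarity(spec_a, spec_b, bin_width: float = 0.02) -> float:
    """Cosine similarity of two (m/z, intensity) peak lists after fixed-width m/z binning."""
    def binned(peaks):
        bins = defaultdict(float)
        for mz, intensity in peaks:
            bins[int(mz // bin_width)] += intensity
        return bins

    bins_a, bins_b = binned(spec_a), binned(spec_b)
    dot = sum(value * bins_b.get(key, 0.0) for key, value in bins_a.items())
    norm_a = sqrt(sum(v * v for v in bins_a.values()))
    norm_b = sqrt(sum(v * v for v in bins_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


b_ions, y_ions = by_ions("SEAASVSTTNPSQGEDFK")
```

Fixed-width binning is the simplest choice and can split a peak pair that straddles a bin edge. Tolerance-based peak matching avoids this at the cost of a little more code.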


TOC only
