
Letter

Identification of Missing Proteins in Normal Human Cerebrospinal Fluid

Charlotte Macron, Lydie Lane, Antonio Núñez Galindo, and Loïc Dayon

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00194 • Publication Date (Web): 14 Aug 2018



Journal of Proteome Research

Identification of Missing Proteins in Normal Human Cerebrospinal Fluid

Charlotte Macron1, Lydie Lane2,3, Antonio Núñez Galindo1 and Loïc Dayon1,*

1 Proteomics, Nestlé Institute of Health Sciences, 1015 Lausanne, Switzerland
2 CALIPHO Group, SIB-Swiss Institute of Bioinformatics, CMU, rue Michel-Servet 1, 1211 Geneva 4, Switzerland
3 Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, rue Michel-Servet 1, 1211 Geneva 4, Switzerland

*To whom correspondence should be addressed.

Corresponding Author
Dr. Loïc Dayon
Nestlé Institute of Health Sciences SA
EPFL Innovation Park, Bâtiment H
1015 Lausanne
Switzerland
Email: [email protected]
Fax: +41 21 632 6499

Abstract

The cerebrospinal fluid (CSF) proteome dataset presented herein was obtained after immunodepletion of abundant proteins and off-gel electrophoresis fractionation of a commercial pool of normal human CSF; liquid chromatography tandem mass spectrometry analysis was performed on a linear ion trap-Orbitrap Elite. We report the identification of 12344 peptides mapping to 2281 proteins. In the context of the Chromosome-Centric Human Proteome Project (C-HPP), we propose validating the existence of seven missing proteins (MPs).

This dataset is available via the ProteomeXchange Consortium (http://www.proteomexchange.org/) with the dataset identifier PXD008029.

Keywords: cerebrospinal fluid; deep proteome; Human Proteome Project; mass spectrometry; LC-MS/MS; missing proteins; proteomics

Introduction

Cerebrospinal fluid (CSF) has been less studied by proteomics than other biological fluids such as blood. Yet this fluid, which is in direct contact with the brain, is a valuable source of biomarkers, for instance of neurodegenerative diseases. Tau protein (MAPT), its phosphorylated forms, and amyloid β (APP) peptides are well-established CSF biomarkers of core Alzheimer disease (AD) pathology, and CSF alpha-synuclein (SNCA) is used in Parkinson disease (PD) diagnosis. Because the human CSF proteome can still be further characterized by mass spectrometry (MS)-based proteomics, we hypothesized that CSF is a matrix of choice for the detection of “missing proteins” (MPs) predicted by genomic or transcriptomic analyses. Identification of MPs is a key goal of the Chromosome-Centric Human Proteome Project (C-HPP)1. We therefore searched a previously generated and recently released CSF dataset (PXD008029) for the presence of MPs.

Experimental Section

The shotgun proteomic workflow used to analyze a pooled sample of normal human CSF is summarized in Figure 1 and detailed in Supporting Information I. Briefly, a single commercial pool of CSF samples from healthy donors was depleted of 14 abundant proteins; the flow-through fraction was reduced, alkylated, and digested with trypsin. Peptides were labeled using the tandem mass tag (TMT) 6-plex technology. Off-gel electrophoresis (OGE) was used for sample fractionation, and the resulting 24 fractions were analyzed independently by nano-liquid chromatography (LC)-tandem MS (MS/MS) on a linear ion trap-Orbitrap Elite. Proteome Discoverer (version 1.4, Thermo Scientific) was used as the data processing interface. Identification was performed against the human UniProtKB/Swiss-Prot database (2017-07 release) supplemented with the β-lactoglobulin sequence (20225 sequences in total), with Mascot (version 2.4.2, Matrix Science, London, U.K.) as the search engine. Variable modifications were methionine oxidation, asparagine/glutamine deamidation, and 6-plex TMT labeling of the peptide amino terminus; 6-plex TMT labeling of lysine and carbamidomethylation of cysteine were set as fixed modifications. Trypsin was selected as the proteolytic enzyme, with a maximum of two missed cleavages. Peptide and fragment ion tolerances were set to 10 ppm and 0.02 Da, respectively. All Mascot result files were loaded into Scaffold Q+S 4.3.2 (Proteome Software, Portland, OR) and further searched with X! Tandem (The GPM, thegpm.org; version CYCLONE (2010.12.01.1)). The false discovery rate (FDR) in Scaffold was calculated with the method of Käll et al.2, by dividing the number of reverse hits by the total number of hits. The final FDRs were 1% for proteins, 0.2% for peptides, and 0.07% for peptide-spectrum matches (PSMs), i.e., 22 false positives at each level. Because the protein-level FDR relies on a target-decoy strategy, which cannot model all types of error, not all proteins that passed the threshold should be considered “confidently identified”. The dataset was originally acquired to generate a spectral library of TMT-labeled peptides for another study3, which is why the CSF samples were TMT-labeled; this quantitative information was not exploited in the present report.
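The FDR definition stated above (reverse hits divided by total hits) can be illustrated with a short Python sketch. This is a generic target-decoy calculation written for illustration, not Scaffold's implementation; the `Hit` class and the functions `decoy_fdr` and `score_threshold` are hypothetical names. Applied to the counts reported above, 22 decoy hits among 2281 proteins gives about 0.96%, and 22 among 12344 peptides about 0.18%, consistent with the rounded 1% and 0.2% figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical search result: a score (higher is better) and a decoy flag."""
    score: float
    is_decoy: bool  # True if the match is to a reversed (decoy) sequence


def decoy_fdr(n_decoy: int, n_total: int) -> float:
    """FDR as described in the text: number of reverse hits / total number of hits."""
    return n_decoy / n_total if n_total else 0.0


def score_threshold(hits: List[Hit], max_fdr: float) -> Optional[float]:
    """Walk hits from best to worst score and return the score of the last hit
    at which the running decoy/total ratio is still <= max_fdr.

    Hits ranked at or above that position form the accepted list.
    Ties in score are not treated specially in this sketch.
    """
    threshold = None
    n_decoy = 0
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    for n_total, hit in enumerate(ranked, start=1):
        n_decoy += int(hit.is_decoy)
        if decoy_fdr(n_decoy, n_total) <= max_fdr:
            threshold = hit.score
    return threshold


if __name__ == "__main__":
    # Counts taken from the text: 22 decoy hits at the protein and peptide levels.
    print(f"protein-level FDR: {decoy_fdr(22, 2281):.2%}")   # 0.96%
    print(f"peptide-level FDR: {decoy_fdr(22, 12344):.2%}")  # 0.18%
```

Scanning the ranked list and keeping the deepest position that still satisfies the FDR limit is the usual way a target-decoy threshold is chosen; a decoy hit partway down the list does not by itself stop acceptance of better-ranked hits further down.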

Results and Discussion

The present analysis of normal human CSF identified 12344 peptides mapping to 2281 proteins (peptide and protein lists are reported in Supporting Information Table S1). Among these proteins were several well-studied CSF biomarkers in the AD field, e.g., tau protein (MAPT), amyloid β (APP), neurogranin (NRGN), and apolipoprotein E (APOE); likewise, for PD, alpha-synuclein (SNCA) was identified.

The list of all identified proteins was compared with the MP list available on neXtProt4 (release 2018-01-17), in which 2186 proteins were still marked as missing. These include proteins annotated with a “protein existence” (PE) level of 2 (predicted from transcriptomic analysis), 3 (predicted from genomic analysis and having a homologue in distant species), and 4 (only predicted from genomic analysis). According to the HPP guidelines5, an MP can obtain the status “PE1”, i.e., evidence at protein level, if it is identified with at least two unique, non-overlapping peptides with a sequence length ≥ 9 amino acids (AAs). Peptide uniqueness must be checked with the peptide uniqueness checker tool available at neXtProt (https://www.nextprot.org/viewers/peptide-uniqueness-checker/app/index.html)6. Two proteins in our dataset were on the MP list and met the criteria for validation of their existence: augurin (C2orf40, Q9H1Z8) and shadow of prion protein (SPRN, Q5BIV9). The peptides used to identify these MPs are listed in Table 1, and their tandem mass spectra are given in Supporting Information II.

The spectra supporting the identification of augurin and shadow of prion protein were carefully checked. Apart from the augurin peptide AKEFLGSLK, whose spectrum covered only 4 of its 9 AAs, the spectra of all augurin peptides covered every amino acid in their sequences. Shadow of prion protein was identified with the detection of 14 of 25 AAs for VAAAGAAAGAAAGAAAGLAAGSGWR, with a good ion score of 35.4; for peptide YGAPGSSLR, the tandem mass spectrum allowed the identification of 7 of 9 AAs, with a good ion score of 36.8. As recommended by the HPP guidelines, for further confirmation, we synthesized the peptides supporting the identification of MPs in our dataset. The synthetic peptides were labeled with TMT and analyzed by MS. For all peptides except AKEFLGSLK, the endogenous and synthetic peptide tandem mass spectra matched well, which strengthens our MP identifications (see Supporting Information II). Augurin is a hormone-like protein mostly expressed in the choroid plexus that has been suggested to be involved in CSF homeostasis and brain injury7. Shadow of prion protein was shown to be associated with the outer leaflet of plasma membranes through a glycosylphosphatidylinositol anchor8.

For 14 proteins of the MP list, we identified only one unique peptide with a sequence length ≥ 9 AAs (Supporting Information Table S2); all the tandem mass spectra are given in Supporting Information II. Three of these peptides, mapping to transmembrane protein 178A (TMEM178A, Q8NBL3), protocadherin beta-4 (PCDHB4, Q9Y5E5), and four-jointed box protein 1 (FJX1, Q86VR8), are already reported in GPMdb (www.thegpm.org), also with TMT labels. Their identification in independent studies gives further confidence in these assignments; a detailed comparison of our spectra with those from GPMdb is provided in Supporting Information III. For these three proteins and three others, namely epidermal growth factor-like protein 6 (EGFL6, Q8IUX8), interferon-induced very large GTPase 1 (GVINP1, Q7Z2Y8), and multiple epidermal growth factor-like domains protein 11 (MEGF11, A6BM72), other unique peptides of nine or more AAs are reported in the latest release (2018-01) of PeptideAtlas9. These six proteins are presented in Table 2, and additional information on the complementary peptides is presented in Supporting Information IV, including PeptideAtlas identifiers, links to the spectra of the best PSM using Universal Spectrum Identifiers, the number of observations, and the tissue in which each peptide was detected. We also report the tissue in which gene expression is expected to be highest according to RNA sequencing data from the Human Protein Atlas (www.proteinatlas.org).

Four of these proteins are predominantly or exclusively expressed in the brain. Transmembrane protein 178A, encoded on chromosome 2, was shown to act as a negative regulator of osteoclast differentiation10. According to the Human Protein Atlas, it is highly expressed in the brain, where it might play an active role; interestingly, the additional unique peptide previously reported was found in the brain11. Protocadherin beta-4 is a potential calcium-dependent cell-adhesion protein that, according to neXtProt, is possibly involved in the establishment and maintenance of specific neuronal connections in the brain. Four-jointed box protein 1 is a secreted protein that regulates dendrite extension12. Epidermal growth factor-like protein 6, also called MAEG (MAM and EGF domains-containing gene protein), is a secreted protein reported to be involved in hair follicle morphogenesis13, osteoblast differentiation14, adipocyte differentiation15, and ovarian cancer development16. Although it is still considered missing in neXtProt because only one peptide has been found by MS (in the pituitary gland), its presence in serum from patients with different diseases has previously been monitored with ELISA17. Interferon-induced very large GTPase 1 is the human orthologue of mouse Gvin1, which plays a role in the immune system. Its identification was quite unexpected, since various genomic analyses concluded that the human locus had undergone pseudogenization18.
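The HPP criterion applied to the proteins in Tables 1 and 2 (at least two unique, non-overlapping peptides of ≥ 9 AAs, possibly pooled from this study and PeptideAtlas) can be checked mechanically once each peptide has been mapped to its start position in the protein. The Python sketch below assumes that uniqueness has already been verified (for example with the neXtProt peptide uniqueness checker) and that start positions are known; the `Peptide` class, the function names, and the toy sequences and positions in the example are hypothetical. Finding the largest set of non-overlapping peptides is an interval-scheduling problem, solved here greedily by earliest end position.

```python
from dataclasses import dataclass
from typing import List

MIN_LENGTH = 9  # HPP guideline as cited in the text: peptides of >= 9 AAs


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record of one peptide mapped onto a protein sequence."""
    sequence: str
    start: int   # 1-based position of the first residue in the protein (assumed known)
    source: str  # provenance, e.g. "this study" or "PeptideAtlas"

    @property
    def end(self) -> int:
        return self.start + len(self.sequence) - 1


def max_non_overlapping(peptides: List[Peptide]) -> List[Peptide]:
    """Largest set of peptides whose residue intervals do not overlap.

    Greedy interval scheduling: sort by end position and keep every peptide
    that starts after the last kept peptide ends.
    """
    kept: List[Peptide] = []
    last_end = 0
    for pep in sorted(peptides, key=lambda p: p.end):
        if pep.start > last_end:
            kept.append(pep)
            last_end = pep.end
    return kept


def meets_hpp_criterion(peptides: List[Peptide], min_peptides: int = 2) -> bool:
    """True if at least `min_peptides` non-overlapping peptides of >= 9 AAs exist.

    Uniqueness is NOT checked here; it is assumed to have been verified upstream.
    """
    long_enough = [p for p in peptides if len(p.sequence) >= MIN_LENGTH]
    return len(max_non_overlapping(long_enough)) >= min_peptides


if __name__ == "__main__":
    # Toy, hypothetical peptides and positions (not taken from Tables 1 or 2).
    ours = Peptide("AAAAKLLLLK", 10, "this study")         # residues 10-19
    overlapping = Peptide("LLLLKGGGGR", 15, "this study")  # residues 15-24, overlaps `ours`
    external = Peptide("MMMMMMMMMR", 40, "PeptideAtlas")   # residues 40-49
    print(meets_hpp_criterion([ours, overlapping]))            # False: the two overlap
    print(meets_hpp_criterion([ours, overlapping, external]))  # True once evidence is merged
```

The example mirrors the situation of Table 2, where a single qualifying peptide from one source becomes sufficient only when combined with a non-overlapping peptide from another; the interval test is the piece to adapt if a different overlap rule is required.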

The case for interferon-induced very large GTPase 1 is further weakened by our own data: the tandem mass spectrum we acquired for peptide MATGEHTPDDPLLRGK was noisy (see Supporting Information II, Figure S10), and the complementary information on peptide KDDFFTSFK, reported in plasma, did not allow us to reinforce this identification. Likewise, the tandem mass spectrum of the synthetic form of MATGEHTPDDPLLRGK did not support this identification (Supporting Information II, Figure S10); taken together, this information does not allow us to be confident in the validation of this MP (see footnote in Table 2). Multiple epidermal growth factor-like domains protein 11 is a type 1 transmembrane protein encoded on chromosome 15. In mouse, it is involved in the patterning of retinal neurons19. Detailed evolutionary analyses showed that MEGF11 underwent adaptive molecular evolution in primate lineages, implying that it might be involved in primate-specific traits20.

The HPP recommends using PeptideAtlas to support the detection of MPs, but searching other databases could reveal additional peptides21. For the 14 MPs identified with one unique peptide with a sequence length ≥ 9 AAs, we therefore performed a second exploration in GPMdb data. Additional high-quality peptides were found for 11 of them, including 5 for which an additional peptide had already been found in PeptideAtlas (Table 2 and Supporting Information V). The spectra of these 11 additional peptides were manually checked. However, it was difficult for us to trace the provenance of these peptides and to evaluate whether they would meet the HPP criteria for FDR control and could thus be used to validate the proteins.

In conclusion, we anticipate that our report will be used to directly validate the existence of two MPs in neXtProt, and of at least five others when merged with already available information.
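The endogenous-versus-synthetic comparisons described above were judged by inspecting annotated spectra. A common way to put a number on such a comparison is the normalized dot product (cosine similarity) of binned spectra; the Python sketch below illustrates that general technique and is not the procedure used in this study. The function names, the square-root intensity scaling, and the reuse of the 0.02 Da fragment tolerance as the bin width are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); intensities assumed non-negative


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.02) -> Dict[int, float]:
    """Sum intensities into fixed-width m/z bins, then square-root scale them
    so that a few dominant peaks do not drive the score."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return {k: math.sqrt(v) for k, v in binned.items()}


def spectral_similarity(a: List[Peak], b: List[Peak], bin_width: float = 0.02) -> float:
    """Normalized dot product (cosine) of two binned spectra: 1.0 for identical
    relative intensities, 0.0 when no bins are shared or a spectrum is empty."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical peak lists, for illustration only (not from this dataset).
    endogenous = [(175.119, 100.0), (262.151, 35.0), (349.183, 60.0), (500.250, 5.0)]
    synthetic = [(175.119, 90.0), (262.151, 40.0), (349.183, 55.0)]
    print(round(spectral_similarity(endogenous, synthetic), 3))
```

Because the score depends only on relative intensities, a synthetic spectrum acquired at a different overall signal level can still score close to 1. Fixed-width binning can split a peak lying near a bin edge between two bins; a more careful comparison would pair peaks within a tolerance instead.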

Supporting Information

Supporting Information I. MS-based proteomic workflow used for the analysis of human CSF samples.
Supporting Information II. Annotated tandem mass spectra for all peptides identifying a missing protein and the corresponding tandem mass spectra of their synthetic counterparts.
Supporting Information III. Comparison of our spectra with spectra present in GPMdb.
Supporting Information IV. Additional information on complementary peptides found in PeptideAtlas.
Supporting Information V. List of unique peptides of nine or more amino acids reported in GPMdb (referred to here as additional peptides).

Table S1. List of peptides and proteins identified in the dataset.
Table S2. List of missing proteins identified with only one unique peptide of 9 or more AAs.

Corresponding Author Proteomics, Nestlé Institute of Health Sciences, EPFL Innovation Park, Bâtiment H, 1015 Lausanne, Switzerland; Email: [email protected], Phone: +41 21 632 6114, Fax: +41 21 632 6499

Notes The authors declare no competing financial interest.

Abbreviations

AA: Amino Acids
AD: Alzheimer Disease
C-HPP: Chromosome-Centric Human Proteome Project
CSF: Cerebrospinal Fluid
ELISA: Enzyme-Linked Immunosorbent Assay
FDR: False Discovery Rate
LC: Liquid Chromatography
MP: Missing Protein
MS: Mass Spectrometry
MS/MS: Tandem Mass Spectrometry
OGE: Off-Gel Electrophoresis
PD: Parkinson Disease
PE: Protein Existence
TMT: Tandem Mass Tag

References

1. Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlén, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012, 30 (3), 221-3.
2. Käll, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S., Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res 2008, 7 (1), 29-34.
3. Corthésy, J.; Theofilatos, K.; Mavroudi, S.; Macron, C.; Cominetti, O.; Remlawi, M.; Ferraro, F.; Núñez Galindo, A.; Kussmann, M.; Likothanassis, S.; Dayon, L., An Adaptive Pipeline To Maximize Isobaric Tagging Data in Large-Scale MS-Based Proteomics. J Proteome Res 2018.
4. Gaudet, P.; Michel, P. A.; Zahn-Zabal, M.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Teixeira, D.; Zhang, Y.; Lane, L.; Bairoch, A., The neXtProt knowledgebase on human proteins: current status. Nucleic Acids Res 2015, 43 (Database issue), D764-70.
5. Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016, 15 (11), 3961-3970.


6. Schaeffer, M.; Gateau, A.; Teixeira, D.; Michel, P. A.; Zahn-Zabal, M.; Lane, L., The neXtProt peptide uniqueness checker: a tool for the proteomics community. Bioinformatics 2017, 33 (21), 3471-3472.
7. Gonzalez, A. M.; Podvin, S.; Lin, S. Y.; Miller, M. C.; Botfield, H.; Leadbeater, W. E.; Roberton, A.; Dang, X.; Knowling, S. E.; Cardenas-Galindo, E.; Donahue, J. E.; Stopa, E. G.; Johanson, C. E.; Coimbra, R.; Eliceiri, B. P.; Baird, A., Ecrg4 expression and its product augurin in the choroid plexus: impact on fetal brain development, cerebrospinal fluid homeostasis and neuroprogenitor cell response to CNS injury. Fluids and barriers of the CNS 2011, 8 (1), 6.
8. Watts, J. C.; Drisaldi, B.; Ng, V.; Yang, J.; Strome, B.; Horne, P.; Sy, M. S.; Yoong, L.; Young, R.; Mastrangelo, P.; Bergeron, C.; Fraser, P. E.; Carlson, G. A.; Mount, H. T.; Schmitt-Ulms, G.; Westaway, D., The CNS glycoprotein Shadoo has PrP(C)-like protective properties and displays reduced levels in prion infections. The EMBO journal 2007, 26 (17), 4038-50.
9. Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R., The PeptideAtlas project. Nucleic Acids Res 2006, 34 (Database issue), D655-8.
10. Decker, C. E.; Yang, Z.; Rimer, R.; Park-Min, K. H.; Macaubas, C.; Mellins, E. D.; Novack, D. V.; Faccio, R., Tmem178 acts in a novel negative feedback loop targeting NFATc1 to regulate bone mass. Proceedings of the National Academy of Sciences of the United States of America 2015, 112 (51), 15654-9.
11. Pinto, S. M.; Manda, S. S.; Kim, M. S.; Taylor, K.; Selvan, L. D.; Balakrishnan, L.; Subbannayya, T.; Yan, F.; Prasad, T. S.; Gowda, H.; Lee, C.; Hancock, W. S.; Pandey, A., Functional annotation of proteome encoded by human chromosome 22. J Proteome Res 2014, 13 (6), 2749-60.
12. Tisler, A.; Barna, I.; Chatel, R., [Comparative study of 24-hour ambulatory blood pressure monitors with standard zero and random zero sphygmomanometers]. Orvosi hetilap 1994, 135 (26), 1415-9.
13. Osada, A.; Kiyozumi, D.; Tsutsui, K.; Ono, Y.; Weber, C. N.; Sugimoto, N.; Imai, T.; Okada, A.; Sekiguchi, K., Expression of MAEG, a novel basement membrane protein, in mouse hair follicle morphogenesis. Exp Cell Res 2005, 303 (1), 148-59.
14. Chim, S. M.; Qin, A.; Tickner, J.; Pavlos, N.; Davey, T.; Wang, H.; Guo, Y.; Zheng, M. H.; Xu, J., EGFL6 promotes endothelial cell migration and angiogenesis through the activation of extracellular signal-regulated kinase. The Journal of biological chemistry 2011, 286 (25), 22035-46.
15. Oberauer, R.; Rist, W.; Lenter, M. C.; Hamilton, B. S.; Neubauer, H., EGFL6 is increasingly expressed in human obesity and promotes proliferation of adipose tissue-derived stromal vascular cells. Mol Cell Biochem 2010, 343 (1-2), 257-69.
16. Bai, S.; Ingram, P.; Chen, Y. C.; Deng, N.; Pearson, A.; Niknafs, Y.; O'Hayer, P.; Wang, Y.; Zhang, Z. Y.; Boscolo, E.; Bischoff, J.; Yoon, E.; Buckanovich, R. J., EGFL6 Regulates the Asymmetric Division, Maintenance, and Metastasis of ALDH+ Ovarian Cancer Cells. Cancer research 2016, 76 (21), 6396-6409.
17. Chuang, C. Y.; Chen, M. K.; Hsieh, M. J.; Yeh, C. M.; Lin, C. W.; Yang, W. E.; Yang, S. F.; Chou, Y. E., High Level of Plasma EGFL6 Is Associated with Clinicopathological Characteristics in Patients with Oral Squamous Cell Carcinoma. Int J Med Sci 2017, 14 (5), 419-424.
18. Li, G.; Zhang, J.; Sun, Y.; Wang, H.; Wang, Y., The evolutionarily dynamic IFN-inducible GTPase proteins play conserved immune functions in vertebrates and cephalochordates. Mol Biol Evol 2009, 26 (7), 1619-30.


19. Kay, J. N.; Chu, M. W.; Sanes, J. R., MEGF10 and MEGF11 mediate homotypic interactions required for mosaic spacing of retinal neurons. Nature 2012, 483 (7390), 465-9.
20. Tabata, H.; Hachiya, T.; Nagata, K.; Sakakibara, Y.; Nakajima, K., Screening for candidate genes involved in the production of mouse subventricular zone proliferative cells and an estimation of their changes in evolutionary pressure during primate evolution. Frontiers in neuroanatomy 2013, 7, 24.
21. Elguoshy, A.; Hirao, Y.; Xu, B.; Saito, S.; Quadery, A. F.; Yamamoto, K.; Mitsui, T.; Yamamoto, T., Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy. J Proteome Res 2017, 16 (12), 4403-4414.

Figure captions

Figure 1. MS-based proteomic workflow used for the analysis of human CSF. CSF samples were depleted of 14 proteins (①), the flow-through proteins were then digested (②), tryptic peptides were labeled with TMT 6-plex, and the samples were pooled (③). The pooled sample was separated into 24 fractions by OGE (④), and each fraction was analyzed by LC-MS/MS on a linear ion trap-Orbitrap Elite (⑤).

Table 1: List of unique peptides of nine or more amino acids allowing the validation of PE2 proteins.

Entry | Protein name | Unique peptide sequence | Chromosome
Q9H1Z8 | Augurin | AKEFLGSLK | 2q12.2
Q9H1Z8 | Augurin | EAPVPTKTK | 2q12.2
Q9H1Z8 | Augurin | HYDEDSAIGPR | 2q12.2
Q9H1Z8 | Augurin | TKVAVDENK | 2q12.2
Q9H1Z8 | Augurin | VAVDENKAK | 2q12.2
Q5BIV9 | Shadow of prion protein | VAAAGAAAGAAAGAAAGLAAGSGWR | 10q26.3
Q5BIV9 | Shadow of prion protein | YGAPGSSLR | 10q26.3

Table 2: List of unique peptides of nine or more amino acids allowing the validation of PE2 proteins thanks to the contribution of this study and previous peptide identifications reported in PeptideAtlas (referred to here as additional peptides).

Entry | Protein name | Unique peptide sequence | Chromosome
Q8NBL3 | Transmembrane protein 178A | TIQQDEWHLLHLR | 2p22.1
Additional peptide | | VLSDDDKQR |
Q9Y5E5 | Protocadherin beta-4 | MICSIPDNLPFILKPTLK | 5q31.3
Additional peptide | | AGADPPDQK |
Q86VR8 | Four-jointed box protein 1 | TELPASRPPEDR | 11p13
Additional peptides | | GAQWAQVQEELR |
Q8IUX8 | Epidermal growth factor-like protein 6 | VNLQPFNYEEIVSR | Xp22.2
Additional peptide | | GVCEATCEPGCK |
Q7Z2Y8* | Interferon-induced very large GTPase 1 | MATGEHTPDDPLLRGK | 11p15.4
Additional peptide | | KDDFFTSFK |
A6BM72 | Multiple epidermal growth factor-like domains protein 11 | ISPALGAER | 15q22.31
Additional peptide | | CDCHNGGQCSPTTGACECEPGYK |

*The identification of interferon-induced very large GTPase 1 was not confirmed with the synthetic form of MATGEHTPDDPLLRGK; we are therefore not confident enough to propose the validation of this MP.


Figure 1.
