Letter
Identification of Missing Proteins in Normal Human Cerebrospinal Fluid Charlotte Macron, Lydie Lane, Antonio Núñez Galindo, and Loïc Dayon J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00194 • Publication Date (Web): 14 Aug 2018 Downloaded from http://pubs.acs.org on August 15, 2018
Journal of Proteome Research
Identification of Missing Proteins in Normal Human Cerebrospinal Fluid Charlotte Macron1, Lydie Lane2,3, Antonio Núñez Galindo1 and Loïc Dayon1, *
1 Proteomics, Nestlé Institute of Health Sciences, 1015 Lausanne, Switzerland
2 CALIPHO Group, SIB-Swiss Institute of Bioinformatics, CMU, rue Michel-Servet 1, 1211 Geneva 4, Switzerland
3 Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, rue Michel-Servet 1, 1211 Geneva 4, Switzerland
*To whom correspondence should be addressed
Corresponding Author: Dr. Loïc Dayon, Nestlé Institute of Health Sciences SA, EPFL Innovation Park, Bâtiment H, 1015 Lausanne, Switzerland. Email: [email protected] Fax: +41 21 632 6499
Abstract
The cerebrospinal fluid (CSF) proteome dataset presented herein was obtained after immunodepletion of abundant proteins and off-gel electrophoresis fractionation of a commercial pool of normal human CSF; liquid chromatography tandem mass spectrometry analysis was performed with a linear ion trap-Orbitrap Elite. We report the identification of 12344 peptides mapping to 2281 proteins. In the context of the Chromosome-Centric Human Proteome Project (C-HPP), we propose that the existence of seven missing proteins (MPs) be validated.
This dataset is available to the ProteomeXchange Consortium (http://www.proteomexchange.org/) with the dataset identifier PXD008029.
Keywords: cerebrospinal fluid; deep proteome; Human Proteome Project; mass spectrometry; LC-MS/MS; missing proteins; proteomics
Introduction
Cerebrospinal fluid (CSF) has been less studied with proteomics than other biological fluids such as blood. Yet this fluid, in direct contact with the brain, is a significant source of biomarkers, for instance of neurodegenerative diseases. Tau protein (MAPT), its phosphorylated forms, and amyloid β (APP) peptides are well-established CSF biomarkers of core Alzheimer disease (AD) pathology. CSF alpha-synuclein (SNCA) is used for Parkinson disease (PD) diagnosis. Because the human CSF proteome can still be further characterized using mass spectrometry (MS)-based proteomics, we hypothesized that CSF represents a matrix of choice for the detection of “missing proteins” (MPs) predicted by genomic or transcriptomic analyses. Identification of MPs is one key goal of the Chromosome-Centric Human Proteome Project (C-HPP)1. Therefore, we investigated a previously generated and recently made public CSF dataset (PXD008029) for the presence of MPs.
Experimental Section
The shotgun proteomic workflow used to analyze a pooled sample of normal human CSF is summarized in Figure 1 and detailed in Supporting Information I. Briefly, a single commercial pool of CSF samples from healthy donors was depleted of 14 abundant proteins; the flow-through fraction was subjected to reduction, alkylation, and digestion with trypsin. Peptides were labeled using the tandem mass tag (TMT) 6-plex technology. Off-gel electrophoresis (OGE) was used for sample fractionation and the resulting 24 fractions were analyzed independently with nano-liquid chromatography (LC)-tandem MS (MS/MS) with a linear ion trap-Orbitrap Elite. Proteome Discoverer (version 1.4, Thermo Scientific) was used as data processing interface. Identification was performed against the human UniProtKB/Swiss-Prot database (2017-07 release) including the β-lactoglobulin sequence (20225 sequences in total). Mascot (version 2.4.2, Matrix Science, London, U.K.) was used as search engine. Variable amino acid modifications were oxidized methionine, deamidated asparagine/glutamine, and 6-plex TMT-labeled peptide amino terminus; 6-plex TMT-labeled lysine and carbamidomethylation of cysteine were set as fixed modifications. Trypsin was selected as the proteolytic enzyme, with a maximum of two potential missed cleavages. Peptide and fragment ion tolerances were set to 10 ppm and 0.02 Da, respectively. All Mascot result files were loaded into Scaffold Q+S 4.3.2 (Proteome Software, Portland, OR) to be further searched with X! Tandem (The GPM, thegpm.org; version CYCLONE (2010.12.01.1)). The false discovery rate (FDR) in Scaffold was calculated with the method of Kall et al.2 by dividing the number of reverse hits by the total number of hits. The final FDRs were 1% for proteins, 0.2% for peptides, and 0.07% for peptide-spectrum matches (corresponding to 22 false positives at each level). The FDR at protein level was based on the target-decoy strategy; this strategy cannot model all types of error. Therefore, not all proteins that passed the threshold are necessarily “confidently identified”. The dataset was initially used to generate a spectral library of TMT-labeled peptides for another study3. That is why CSF samples were labeled with TMT; this quantitative information was not exploited in the present report.
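As a rough illustration of the decoy-based FDR estimate described above (reverse hits divided by total hits), the following sketch recomputes the protein- and peptide-level rates from the counts given in the text. The function name `decoy_fdr` is hypothetical, and it is an assumption that the reported totals (2281 proteins, 12344 peptides) are the denominators; Scaffold's internal computation may differ in detail.

```python
def decoy_fdr(decoy_hits: int, total_hits: int) -> float:
    """Estimate the FDR as the fraction of reverse (decoy) hits among all hits."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return decoy_hits / total_hits


# 22 is the number of false positives stated in the text for each level.
# Using the reported identification counts as totals is an assumption.
protein_fdr = decoy_fdr(22, 2281)
peptide_fdr = decoy_fdr(22, 12344)

print(f"protein-level FDR: {protein_fdr:.2%}")
print(f"peptide-level FDR: {peptide_fdr:.2%}")
```

Under these assumptions the values come out close to the reported 1% (proteins) and 0.2% (peptides).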
Results and Discussion
The present analysis of normal human CSF allowed the identification of 12344 peptides mapping to 2281 proteins (peptide and protein lists are reported in Supporting Information Table S1). Among these proteins, we noted the presence of some well-studied CSF biomarkers in the AD field, e.g., tau protein (MAPT), amyloid β (APP), neurogranin (NRGN), and apolipoprotein E (APOE); the same was true for PD, with the identification of alpha-synuclein (SNCA).
The list of all identified proteins was searched against the MP list available on neXtProt4 (release 2018-01-17), where 2186 proteins were still marked as missing. These proteins
include proteins annotated with a “protein existence” (PE) score of 2 (predicted from transcriptomic analysis), 3 (predicted from genomic analysis and having a homologue in distant species), and 4 (only predicted from genomic analysis). According to the HPP guidelines5, an MP can obtain the status “PE1”, i.e., evidence at protein level, if it is identified with at least two unique, non-overlapping peptides with a sequence length ≥ 9 amino acids (AAs). The uniqueness of the peptide sequence needs to be checked with the peptide uniqueness checker tool available at neXtProt (https://www.nextprot.org/viewers/peptide-uniquenesschecker/app/index.html)6.
Two proteins of our dataset were on the list of MPs and met the required criteria for their existence validation, i.e., augurin (C2orf40, Q9H1Z8) and shadow of prion protein (SPRN, Q5BIV9). Peptides used for the identification of these MPs are listed in Table 1 and their respective tandem mass spectra are given in Supporting Information II. Spectra allowing the identification of augurin and shadow of prion protein were carefully checked. Apart from the augurin peptide AKEFLGSLK, for which the spectrum supported only 4 AAs out of 9, all other spectra of augurin peptides allowed the identification of all the amino acids in their sequences. Shadow of prion protein was identified with the detection of 14 AAs out of 25 for VAAAGAAAGAAAGAAAGLAAGSGWR with a good ion score of 35.4. For peptide
YGAPGSSLR, the tandem mass spectrum allowed the identification of 7 AAs out of 9, with a good ion score of 36.8. As recommended by the HPP guidelines, for further confirmation, we synthesized the peptides allowing the identification of MPs in our dataset. The synthetic peptides were labeled with TMT and analyzed with MS. For all peptides (except AKEFLGSLK), we obtained a good match between the endogenous and synthetic peptide tandem mass spectra; this strengthens our MP identifications (see Supporting Information II).
Augurin is a hormone-like protein mostly expressed in the choroid plexus, which has been suggested to be involved in CSF homeostasis and brain injury7. Shadow of prion protein was shown to be associated with the outer leaflet of plasma membranes through a glycosylphosphatidylinositol anchor8.
For 14 proteins of the MP list, we identified only one unique peptide with a sequence length ≥ 9 AAs (Supporting Information Table S2). All the tandem mass spectra are given in Supporting Information II. Three of these peptides, mapping to transmembrane protein 178A (TMEM178A, Q8NBL3), protocadherin beta-4 (PCDHB4, Q9Y5E5), and four-jointed box protein 1 (FJX1, Q86VR8), are already reported in GPMdb (www.thegpm.org), also with TMT labels. The fact that these peptides were identified in independent studies gives further confidence in their identification. The detailed comparison of our spectra with the spectra from GPMdb is provided in Supporting Information III. For these three proteins plus three other ones - epidermal growth factor-like protein 6 (EGFL6, Q8IUX8), interferon-induced very large GTPase 1 (GVINP1, Q7Z2Y8), and multiple epidermal growth factor-like domains protein 11 (MEGF11, A6BM72) - other unique peptides of nine AAs or more are reported in the latest release (2018-01) of PeptideAtlas9. These six proteins are presented in Table 2, and additional information on
complementary peptides is presented in Supporting Information IV, including PeptideAtlas identifiers, links to the spectra for the best PSM using Universal Spectrum Identifiers, the number of observations, and the tissue in which each peptide was detected. We also report the tissue in which the gene expression is expected to be the highest according to RNA sequencing data from the Human Protein Atlas (www.proteinatlas.org). Four of these proteins are predominantly or exclusively expressed in the brain.
Transmembrane protein 178A is a transmembrane protein encoded on chromosome 2, which was shown to act as a negative regulator of osteoclast differentiation10. According to the Human Protein Atlas, transmembrane protein 178A is highly expressed in the brain, where it might play an active role. Interestingly, the additional unique peptide previously reported was found in the brain11.
Protocadherin beta-4 is a potential calcium-dependent cell-adhesion protein that is possibly involved in the establishment and maintenance of specific neuronal connections in the brain according to neXtProt.
The four-jointed box protein 1 is a secreted protein that regulates dendrite extension12.
Epidermal growth factor-like protein 6, also called MAEG (MAM and EGF domains-containing gene protein), is a secreted protein that was reported to be involved in hair follicle morphogenesis13, osteoblast differentiation14, adipocyte differentiation15, and ovarian cancer development16. Although it is still considered as missing in neXtProt because only one peptide was found with MS (in pituitary gland), its presence in serum from patients with different diseases has previously been monitored with ELISA17.
Interferon-induced very large GTPase 1 is the human orthologue of the mouse Gvin1, which plays a role in the immune system. The identification of this protein was quite unexpected since various genomic analyses concluded that the human locus had undergone pseudogenization18.
Moreover, the tandem mass spectrum we acquired for peptide MATGEHTPDDPLLRGK identifying interferon-induced very large GTPase 1 was noisy (see Supporting Information II, Figure S10), and complementary information about the peptide KDDFFTSFK reported in plasma did not allow us to reinforce this identification. Likewise, the tandem mass spectrum of the synthetic form of MATGEHTPDDPLLRGK did not reinforce this identification (Supporting Information II, Figure S10); taken together, this information does not allow us to be confident in this MP validation (see footnote in Table 2).
Multiple epidermal growth factor-like domains protein 11 is a type 1 transmembrane protein encoded on chromosome 15. In mouse, it is involved in the patterning of retinal neurons19. Detailed evolutionary analyses showed that MEGF11 underwent adaptive molecular evolution in primate lineages, implying that it might be involved in primate-specific traits20.
The HPP recommends using PeptideAtlas to support the detection of MPs, but searching other databases could reveal additional peptides21. For the 14 MPs identified with one unique peptide with a sequence length ≥ 9 AAs, we performed a second exploration in GPMdb data. Additional peptides of high quality were found for 11 of them, including 5 for which an additional peptide was already found in PeptideAtlas (Table 2 and Supporting Information V). Spectra were manually checked for these 11 additional peptides. However, it was difficult for us to identify the provenance of these peptides and to evaluate whether they would match the HPP criteria in terms of FDR control and could be used to validate the proteins.
In conclusion, we anticipate that our report will be used to directly validate the existence of two MPs in neXtProt and at least 5 others when merged with already available information.
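The tally in the conclusion (two MPs validated directly, at least five more when merged with existing data) can be retraced with a small sketch. The decision rule in `classify` is a hypothetical simplification of the reasoning in the text, not the HPP procedure itself; peptide counts are taken from Tables 1 and 2, and the `confident` flag encodes the authors' reservation about interferon-induced very large GTPase 1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    protein: str
    own_peptides: int         # unique peptides >= 9 AAs found in this study
    additional_peptides: int  # additional PeptideAtlas peptides listed in Table 2
    confident: bool = True    # False when the spectra did not support the identification


def classify(ev: Evidence) -> str:
    """Hypothetical decision rule summarizing the reasoning in the text."""
    if not ev.confident:
        return "not proposed"
    if ev.own_peptides >= 2:
        return "validated by this dataset alone"
    if ev.own_peptides + ev.additional_peptides >= 2:
        return "validated when merged with PeptideAtlas"
    return "not proposed"


# Additional peptides are not needed for the Table 1 proteins, so 0 is a placeholder there.
EVIDENCE = [
    Evidence("Augurin", 5, 0),
    Evidence("Shadow of prion protein", 2, 0),
    Evidence("Transmembrane protein 178A", 1, 1),
    Evidence("Protocadherin beta-4", 1, 1),
    Evidence("Four-jointed box protein 1", 1, 1),
    Evidence("Epidermal growth factor-like protein 6", 1, 1),
    Evidence("Interferon-induced very large GTPase 1", 1, 1, confident=False),
    Evidence("Multiple epidermal growth factor-like domains protein 11", 1, 1),
]

tally = Counter(classify(ev) for ev in EVIDENCE)
for outcome, count in tally.items():
    print(f"{outcome}: {count}")
```

With these inputs the sketch yields two proteins in the first category and five in the second, matching the seven MPs mentioned in the abstract.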
Supporting Information
Supporting Information I. MS-based proteomic workflow used for the analysis of human CSF samples.
Supporting Information II. Annotated tandem mass spectra for all peptides identifying a missing protein and the tandem mass spectra of their respective synthetic peptides.
Supporting Information III. Comparison of our spectra with spectra present in GPMdb.
Supporting Information IV. Additional information on complementary peptides found in PeptideAtlas.
Supporting Information V. List of unique peptides of nine or more amino acids reported in GPMdb (referred to here as additional peptides).
Table S1. List of peptides and proteins identified in the dataset.
Table S2. List of missing proteins identified with only one unique peptide with 9 or more AAs.
Corresponding Author
Proteomics, Nestlé Institute of Health Sciences, EPFL Innovation Park, Bâtiment H, 1015 Lausanne, Switzerland; Email: [email protected], Phone: +41 21 632 6114, Fax: +41 21 632 6499
Notes The authors declare no competing financial interest.
Abbreviations
AA       Amino Acids
AD       Alzheimer Disease
C-HPP    Chromosome-Centric Human Proteome Project
CSF      Cerebrospinal Fluid
ELISA    Enzyme-Linked Immunosorbent Assay
FDR      False Discovery Rate
LC       Liquid Chromatography
MP       Missing Protein
MS       Mass Spectrometry
MS/MS    Tandem Mass Spectrometry
OGE      Off-Gel Electrophoresis
PD       Parkinson Disease
PE       Protein Existence
TMT      Tandem Mass Tag
References
1. Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012, 30 (3), 221-3.
2. Kall, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S., Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res 2008, 7 (1), 29-34.
3. Corthesy, J.; Theofilatos, K.; Mavroudi, S.; Macron, C.; Cominetti, O.; Remlawi, M.; Ferraro, F.; Nunez Galindo, A.; Kussmann, M.; Likothanassis, S.; Dayon, L., An Adaptive Pipeline To Maximize Isobaric Tagging Data in Large-Scale MS-Based Proteomics. J Proteome Res 2018.
4. Gaudet, P.; Michel, P. A.; Zahn-Zabal, M.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Teixeira, D.; Zhang, Y.; Lane, L.; Bairoch, A., The neXtProt knowledgebase on human proteins: current status. Nucleic Acids Res 2015, 43 (Database issue), D764-70.
5. Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016, 15 (11), 3961-3970.
6. Schaeffer, M.; Gateau, A.; Teixeira, D.; Michel, P. A.; Zahn-Zabal, M.; Lane, L., The neXtProt peptide uniqueness checker: a tool for the proteomics community. Bioinformatics 2017, 33 (21), 3471-3472.
7. Gonzalez, A. M.; Podvin, S.; Lin, S. Y.; Miller, M. C.; Botfield, H.; Leadbeater, W. E.; Roberton, A.; Dang, X.; Knowling, S. E.; Cardenas-Galindo, E.; Donahue, J. E.; Stopa, E. G.; Johanson, C. E.; Coimbra, R.; Eliceiri, B. P.; Baird, A., Ecrg4 expression and its product augurin in the choroid plexus: impact on fetal brain development, cerebrospinal fluid homeostasis and neuroprogenitor cell response to CNS injury. Fluids and barriers of the CNS 2011, 8 (1), 6.
8. Watts, J. C.; Drisaldi, B.; Ng, V.; Yang, J.; Strome, B.; Horne, P.; Sy, M. S.; Yoong, L.; Young, R.; Mastrangelo, P.; Bergeron, C.; Fraser, P. E.; Carlson, G. A.; Mount, H. T.; Schmitt-Ulms, G.; Westaway, D., The CNS glycoprotein Shadoo has PrP(C)-like protective properties and displays reduced levels in prion infections. The EMBO journal 2007, 26 (17), 4038-50.
9. Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R., The PeptideAtlas project. Nucleic Acids Res 2006, 34 (Database issue), D655-8.
10. Decker, C. E.; Yang, Z.; Rimer, R.; Park-Min, K. H.; Macaubas, C.; Mellins, E. D.; Novack, D. V.; Faccio, R., Tmem178 acts in a novel negative feedback loop targeting NFATc1 to regulate bone mass. Proceedings of the National Academy of Sciences of the United States of America 2015, 112 (51), 15654-9.
11. Pinto, S. M.; Manda, S. S.; Kim, M. S.; Taylor, K.; Selvan, L. D.; Balakrishnan, L.; Subbannayya, T.; Yan, F.; Prasad, T. S.; Gowda, H.; Lee, C.; Hancock, W. S.; Pandey, A., Functional annotation of proteome encoded by human chromosome 22. J Proteome Res 2014, 13 (6), 2749-60.
12. Tisler, A.; Barna, I.; Chatel, R., [Comparative study of 24-hour ambulatory blood pressure monitors with standard zero and random zero sphygmomanometers]. Orvosi hetilap 1994, 135 (26), 1415-9.
13. Osada, A.; Kiyozumi, D.; Tsutsui, K.; Ono, Y.; Weber, C. N.; Sugimoto, N.; Imai, T.; Okada, A.; Sekiguchi, K., Expression of MAEG, a novel basement membrane protein, in mouse hair follicle morphogenesis. Exp Cell Res 2005, 303 (1), 148-59.
14. Chim, S. M.; Qin, A.; Tickner, J.; Pavlos, N.; Davey, T.; Wang, H.; Guo, Y.; Zheng, M. H.; Xu, J., EGFL6 promotes endothelial cell migration and angiogenesis through the activation of extracellular signal-regulated kinase. The Journal of biological chemistry 2011, 286 (25), 22035-46.
15. Oberauer, R.; Rist, W.; Lenter, M. C.; Hamilton, B. S.; Neubauer, H., EGFL6 is increasingly expressed in human obesity and promotes proliferation of adipose tissue-derived stromal vascular cells. Mol Cell Biochem 2010, 343 (1-2), 257-69.
16. Bai, S.; Ingram, P.; Chen, Y. C.; Deng, N.; Pearson, A.; Niknafs, Y.; O'Hayer, P.; Wang, Y.; Zhang, Z. Y.; Boscolo, E.; Bischoff, J.; Yoon, E.; Buckanovich, R. J., EGFL6 Regulates the Asymmetric Division, Maintenance, and Metastasis of ALDH+ Ovarian Cancer Cells. Cancer research 2016, 76 (21), 6396-6409.
17. Chuang, C. Y.; Chen, M. K.; Hsieh, M. J.; Yeh, C. M.; Lin, C. W.; Yang, W. E.; Yang, S. F.; Chou, Y. E., High Level of Plasma EGFL6 Is Associated with Clinicopathological Characteristics in Patients with Oral Squamous Cell Carcinoma. Int J Med Sci 2017, 14 (5), 419-424.
18. Li, G.; Zhang, J.; Sun, Y.; Wang, H.; Wang, Y., The evolutionarily dynamic IFN-inducible GTPase proteins play conserved immune functions in vertebrates and cephalochordates. Mol Biol Evol 2009, 26 (7), 1619-30.
19. Kay, J. N.; Chu, M. W.; Sanes, J. R., MEGF10 and MEGF11 mediate homotypic interactions required for mosaic spacing of retinal neurons. Nature 2012, 483 (7390), 465-9.
20. Tabata, H.; Hachiya, T.; Nagata, K.; Sakakibara, Y.; Nakajima, K., Screening for candidate genes involved in the production of mouse subventricular zone proliferative cells and an estimation of their changes in evolutionary pressure during primate evolution. Frontiers in neuroanatomy 2013, 7, 24.
21. Elguoshy, A.; Hirao, Y.; Xu, B.; Saito, S.; Quadery, A. F.; Yamamoto, K.; Mitsui, T.; Yamamoto, T., Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy. J Proteome Res 2017, 16 (12), 4403-4414.
Figure captions
Figure 1. MS-based proteomic workflow used for the analysis of human CSF. CSF samples were depleted of 14 abundant proteins ①, the flow-through proteins were then digested ②, tryptic peptides were labeled with TMT 6-plex, and samples were pooled together ③. The pooled sample was separated into 24 fractions with OGE ④, and each fraction was analyzed with LC-MS/MS using a linear ion trap-Orbitrap Elite ⑤.
Table 1: List of unique peptides of nine or more amino acids allowing the validation of PE2 proteins.
Entry     Protein name             Unique peptide sequence    Chromosome
Q9H1Z8    Augurin                  AKEFLGSLK                  2q12.2
Q9H1Z8    Augurin                  EAPVPTKTK                  2q12.2
Q9H1Z8    Augurin                  HYDEDSAIGPR                2q12.2
Q9H1Z8    Augurin                  TKVAVDENK                  2q12.2
Q9H1Z8    Augurin                  VAVDENKAK                  2q12.2
Q5BIV9    Shadow of prion protein  VAAAGAAAGAAAGAAAGLAAGSGWR  10q26.3
Q5BIV9    Shadow of prion protein  YGAPGSSLR                  10q26.3
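The length requirement described in the Results (at least two unique peptides of ≥ 9 AAs per protein) can be checked against the Table 1 peptides with a short sketch. The helper names are hypothetical. Two simplifications are assumed: uniqueness is taken as already verified with the neXtProt tool, and the overlap check only tests whether one peptide is contained in another, because a position-based check would need the protein sequences, which are not reproduced here.

```python
from collections import defaultdict

MIN_LENGTH = 9  # minimum peptide length in amino acids

# (UniProtKB entry, peptide) pairs copied from Table 1.
TABLE_1 = [
    ("Q9H1Z8", "AKEFLGSLK"),
    ("Q9H1Z8", "EAPVPTKTK"),
    ("Q9H1Z8", "HYDEDSAIGPR"),
    ("Q9H1Z8", "TKVAVDENK"),
    ("Q9H1Z8", "VAVDENKAK"),
    ("Q5BIV9", "VAAAGAAAGAAAGAAAGLAAGSGWR"),
    ("Q5BIV9", "YGAPGSSLR"),
]


def group_long_peptides(rows, min_length=MIN_LENGTH):
    """Group peptides by protein entry, keeping only those of sufficient length."""
    grouped = defaultdict(list)
    for entry, peptide in rows:
        if len(peptide) >= min_length:
            grouped[entry].append(peptide)
    return dict(grouped)


def has_two_non_nested(peptides):
    """True if at least one pair of peptides exists where neither contains the other."""
    return any(
        a not in b and b not in a
        for i, a in enumerate(peptides)
        for b in peptides[i + 1:]
    )


grouped = group_long_peptides(TABLE_1)
for entry, peptides in grouped.items():
    print(entry, len(peptides), has_two_non_nested(peptides))
```

Both entries pass this simplified check: Q9H1Z8 retains five peptides and Q5BIV9 retains two.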
Table 2: List of unique peptides of nine or more amino acids allowing the validation of PE2 proteins thanks to the contribution of this study and previous peptide identifications reported in PeptideAtlas (referred to here as additional peptides).
Entry                Protein name                                              Unique peptide sequence  Chromosome
Q8NBL3               Transmembrane protein 178A                                TIQQDEWHLLHLR            2p22.1
Additional peptide                                                             VLSDDDKQR
Q9Y5E5               Protocadherin beta-4                                      MICSIPDNLPFILKPTLK       5q31.3
Additional peptide                                                             AGADPPDQK
Q86VR8               Four-jointed box protein 1                                TELPASRPPEDR             11p13
Additional peptides                                                            GAQWAQVQEELR
Q8IUX8               Epidermal growth factor-like protein 6                    VNLQPFNYEEIVSR           Xp22.2
Additional peptide                                                             GVCEATCEPGCK
Q7Z2Y8*              Interferon-induced very large GTPase 1                    MATGEHTPDDPLLRGK         11p15.4
Additional peptide                                                             KDDFFTSFK
A6BM72               Multiple epidermal growth factor-like domains protein 11  ISPALGAER                15q22.31
Additional peptide                                                             CDCHNGGQCSPTTGACECEPGYK
*The identification of interferon-induced very large GTPase 1 was not confirmed using the synthetic form of MATGEHTPDDPLLRGK. We are therefore not confident in proposing the validation of this MP.
Figure 1.
For TOC only