Exploring the uncharacterized human proteome using neXtProt


Article

Exploring the uncharacterized human proteome using neXtProt Paula Duek, Alain Gateau, Amos Bairoch, and Lydie Lane J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00537 • Publication Date (Web): 07 Sep 2018 Downloaded from http://pubs.acs.org on September 8, 2018



Journal of Proteome Research

Paula Duek1, Alain Gateau1, Amos Bairoch1,2 and Lydie Lane1,2*

1CALIPHO group, SIB-Swiss Institute of Bioinformatics, 2Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland

*Corresponding author: [email protected], +41 (0) 22 379 58 41

Abstract

20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them still lack functional annotation, whether predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led us to propose functional annotations for 113 proteins and to consolidate a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively to examine the landscape of the uncharacterized human proteome in terms of subcellular locations, protein-protein interactions, tissue expression, association with diseases and 3D structure. Finally, deep data mining in various publicly available resources allowed us to build functional hypotheses for 26 uncharacterized human proteins validated at protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, nervous system, immunity, inflammation, RNA metabolism and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progress, the pace of human protein characterization studies is still slow. It could be accelerated by better integration of existing knowledge resources, and by initiating large collaborative projects involving specialists from different fields of biology. We hope that our analysis will help lay the groundwork for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterizing uPE1 proteins.

Keywords

biocuration, cilium biology, data mining, functional annotation, human protein, knowledgebase, neXtProt, systems biology, SPARQL


Introduction

20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17). In recent years, high-throughput technologies have produced huge amounts of data on protein validation (shotgun and targeted MS), protein-protein interactions (affinity purification coupled with MS), genetic variants (genome-wide association studies, deep sequencing), gene/protein expression (RNA sequencing and antibody-based studies) and 3D structure (nuclear magnetic resonance, X-ray crystallography, cryo-electron microscopy). This progress has helped to draw a more precise picture of the human proteome, but many human proteins still have only a vague or speculative function annotated in UniProtKB/Swiss-Prot1 and about 10% of them have no function annotated at all, either predicted or experimentally confirmed. A comprehensive characterization of a protein requires defining not only its molecular function but also the context in which this activity is performed: its targets and interacting partners, its tissue and subcellular location, the pathways it belongs to or regulates and its role at the organism level. Characterizing a protein in human tissues is often difficult because of sample access restrictions, and most functional studies are performed on cell lines, which only partially reproduce in vivo conditions. Although new technologies of targeted genome editing accelerate the initial steps in protein characterization, understanding the function(s) of each protein in its biological context requires discrete, often time-consuming studies. The functional characterization of all human proteins is a huge challenge currently being undertaken using systems biology approaches that combine high-throughput omics technologies and bioinformatics. This strategy largely relies on data that is properly standardized, annotated and shared among the community.
The Human Protein Atlas (HPA)2 and neXtProt3, which focus on human proteins, provide important support at every step of characterization projects. neXtProt collects high-quality data at genomic, transcriptomic and proteomic levels for every human protein, and converts it into semantic annotations. This format allows querying not only neXtProt data but also any data from compatible resources through an advanced query tool based on the SPARQL language. neXtProt is the knowledge reference resource for the HUPO Human Proteome Project (HPP)4, an international consortium dedicated to cataloguing the 20,230 parts list, understanding the complexity of the human proteome and making human proteomics an integrated complement to genomics and other “omics” across the clinical, biomedical and life sciences. The aim of the present study is to explore the uncharacterized human proteome using neXtProt and a combination of bioinformatics resources in order to propose annotation updates and new functional hypotheses to be validated by expert laboratories.


Material and methods

neXtProt data exploration using the neXtProt SPARQL endpoint

Data was extracted from neXtProt using the advanced search functionality based on the SPARQL language on the neXtProt SNORQL interface (snorql.nextprot.org), as detailed in File S1. Logical operations on entry lists (AND/OR/NOT) were performed using the list management tool of neXtProt, freely accessible in the personalized mode of the platform (www.nextprot.org/user/protein/lists).
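The AND/OR/NOT operations on entry lists correspond to set intersection, union and difference. The following is a minimal Python sketch of that logic only; it is not the neXtProt list management tool itself. The accession numbers and list names are hypothetical placeholders, and in practice each list would be exported from a query result.

```python
# Hypothetical entry lists (placeholder accessions, not real query results).
with_gold_location = {"NX_X00001", "NX_X00002", "NX_X00003"}
disease_related = {"NX_X00002", "NX_X00004"}

def combine(list_a, list_b, operation):
    """Combine two entry lists with AND, OR or NOT (entries in A but not in B)."""
    if operation == "AND":
        return list_a & list_b
    if operation == "OR":
        return list_a | list_b
    if operation == "NOT":
        return list_a - list_b
    raise ValueError(f"unsupported operation: {operation}")

both = combine(with_gold_location, disease_related, "AND")
either = combine(with_gold_location, disease_related, "OR")
location_only = combine(with_gold_location, disease_related, "NOT")
print(sorted(both), len(either), sorted(location_only))
```

Chaining such calls reproduces compound filters, for example entries that have a localization annotation but no disease association.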

Mapping to ENSEMBL

Mapping of neXtProt entries to ENSEMBL genes (ENSG) (v90) was retrieved from the neXtProt ftp site (ftp://ftp.nextprot.org/pub/current_release/mapping/nextprot_ensg.txt). This mapping is a manually checked and corrected version of the UniProt mapping to ENSG.
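A mapping file of this kind can be loaded into a dictionary in a few lines. The sketch below assumes a two-column, tab-separated layout (neXtProt accession, then ENSG identifier); this layout is an assumption made for illustration and should be checked against the actual file. The sample rows use placeholder identifiers.

```python
import csv
import io
from collections import defaultdict

def parse_mapping(lines):
    """Return {neXtProt accession: set of ENSG identifiers}.

    Assumes tab-separated rows 'accession<TAB>ENSG' (assumed layout);
    blank, short and '#'-prefixed rows are skipped.
    """
    nx_to_ensg = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        nx_to_ensg[row[0].strip()].add(row[1].strip())
    return dict(nx_to_ensg)

# Placeholder rows standing in for the downloaded file.
sample = io.StringIO(
    "NX_X00001\tENSG00000000001\n"
    "NX_X00001\tENSG00000000002\n"
    "NX_X00002\tENSG00000000003\n"
)
mapping = parse_mapping(sample)
multi_gene_entries = sorted(nx for nx, genes in mapping.items() if len(genes) > 1)
print(multi_gene_entries)
```

Keeping a set of genes per entry makes it easy to flag entries that map to more than one gene, which may need separate handling when expression data is attached.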

Integration of HPA RNA-Seq data

Among the different resources that provide data on tissue levels of transcript expression, such as GTEx5, FANTOM6 and HPA2, we chose to use HPA (www.proteinatlas.org) because it classifies genes into different categories (expressed in all, mixed, not detected, tissue enriched, group enriched and tissue enhanced), which makes it possible to organize the uncharacterized human proteome by expression profile. HPA (V16.1) data was mapped to neXtProt entries using the ENSG identifiers whenever possible. neXtProt entries that do not map to any ENSG were mapped to HPA using gene names. Genes from the categories “expressed in all” and “mixed” have ubiquitous and broad expression, respectively. The expression profiles of the proteins belonging to the categories “group enriched” and “tissue enhanced” were closely examined in order to relate them to biological systems or functions. We considered as representative tissues for the hematopoietic system the bone marrow, spleen, tonsil, lymph node and appendix; for the digestive system the colon, rectum, duodenum, intestine, stomach, liver, gallbladder and pancreas; for male reproduction the testis, epididymis, prostate and seminal vesicle; and for cilia biology the testis and fallopian tube. A protein was classified as related to the hematopoietic system, digestive system, female reproduction, male reproduction or cilia biology when its expression was enhanced or enriched only in the relevant tissues. Some proteins in the “tissue enhanced” category are enhanced in only one tissue. In those cases, the tissue in the non-specific group (“TPM max in non-specific tissue”) must also be related to the system for the protein to be included in the corresponding category. We also considered the proteins that were “tissue enriched” in cerebral cortex, the only nervous tissue in the HPA panel, as potentially brain-related.
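The classification rule described above can be sketched in Python as follows. This is an illustrative reading of the rule, not the authors' code: the function name, its arguments and the decision to return every matching system (testis alone matches both male reproduction and cilia biology) are assumptions. The tissue sets are those listed in the text; female reproduction is omitted because its representative tissues are not listed.

```python
# Representative tissues per system, as listed in the text.
SYSTEM_TISSUES = {
    "hematopoietic system": {"bone marrow", "spleen", "tonsil", "lymph node", "appendix"},
    "digestive system": {"colon", "rectum", "duodenum", "intestine", "stomach",
                         "liver", "gallbladder", "pancreas"},
    "male reproduction": {"testis", "epididymis", "prostate", "seminal vesicle"},
    "cilia biology": {"testis", "fallopian tube"},
}

def classify(elevated_tissues, category, max_nonspecific_tissue=None):
    """Return the systems whose tissue set contains ALL elevated tissues.

    For a "tissue enhanced" gene elevated in a single tissue, the tissue with
    the highest TPM in the non-specific group (if provided) must also belong
    to the system, so it is added to the set being tested.
    """
    tissues = set(elevated_tissues)
    if not tissues:
        return []
    if (category == "tissue enhanced" and len(tissues) == 1
            and max_nonspecific_tissue is not None):
        tissues.add(max_nonspecific_tissue)
    return [system for system, relevant in SYSTEM_TISSUES.items()
            if tissues <= relevant]

print(classify(["testis", "epididymis"], "group enriched"))
print(classify(["spleen"], "tissue enhanced", max_nonspecific_tissue="liver"))
```

Using a subset test (`tissues <= relevant`) encodes the "only in the relevant tissues" condition: a single tissue outside the set is enough to exclude the protein from that system.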


Integration of InterPro entries

Mapping to InterPro was done with a SPARQL query that extracts the corresponding cross-references in neXtProt (File S1). An XML file containing all InterPro entries (interpro.xml.gz) was downloaded from the InterPro website (version 67.0, 1st March 2018).

Integration of orthology information

For the 26 proteins for which we propose a functional hypothesis, BLAST searches were performed against all of UniProtKB as the target database, with an E value threshold of 10, filtering for low complexity regions and retrieving the maximum number of hits (1,000). Two proteins were considered homologous if they were reciprocally found by BLAST and if the E value was [...]

[...] 1, one uniquely mapping peptide of at least 9 amino acids was identified in a human sample (Table S3, column G). This was not sufficient to validate the existence of these proteins according to the current HPP guidelines, but it can help to determine in which samples to try to validate them. Using the neXtProt advanced query tool, we checked the availability of reagents and determined that 466 out of the 675 proteins with unknown function annotated as PE>1 have at least two uniquely mapping synthetic peptides of at least 9 amino acids in length available from SRMAtlas16, and that 24 of these proteins have associated immunohistochemistry data in neXtProt from at least two antibodies from HPA (Table S3, columns H and I).

2) Association with diseases


According to neXtProt medical annotations, originating from UniProtKB/Swiss-Prot and Orphanet17, 29 protein-coding genes with unknown function are related to diseases (Table S3, column J). Given their potential medical relevance, finding the function of these proteins should be a high priority. Twelve have mutations located in their sequences that directly cause diseases (including single amino-acid variants, intron splicing donor sites, intron insertions and GAC expansions). Five of them (ATXN8, BEAN1, CWF19L1, TMEM240, and VWA3B) are involved in spinocerebellar ataxias. Eleven other genes have been found to be involved in chromosomal translocations - nine associated with cancers and two with other diseases. For six others, the potential association with diseases is less clear: they belong to genomic regions found to be deleted in diseases or have variants associated with disease susceptibility.

3) Phenotypes in mouse models

For proteins with mouse orthologs, functional clues can be obtained by analysing the phenotypes of mutant animals. Notably, out of the 1,862 entries from our set, 509 do not have a cross-reference to MGI18, indicating that the corresponding genes are not conserved in mouse, or that the orthology is difficult to establish, for example because of paralogs in at least one of the two organisms (Table S3, column K). This proportion (26%) is much higher than for all PE1-4 entries in neXtProt (9%). Lack of conservation in mouse may be one of the reasons why some proteins still lack functional characterization. However, according to MouseMine, knock-out mice have been established for 144 of the 1,353 genes with mapping to MGI (Table S3, column L). Combining the observed phenotypes with information from neXtProt can help to build or refine functional hypotheses, as will be discussed in the next sections.

4) Domains

Functional clues about uncharacterized proteins can sometimes be inferred from available information on characterized proteins with similar sequence features. InterPro combines sequence-based signatures provided by different resources to define protein families, structural and/or functional domains and protein features such as active sites or binding sites with minimal redundancy19. From our list of proteins without functional annotation, 1,394 have cross-references to InterPro, corresponding to 132 superfamilies, 630 families, 353 domains and 39 features (repeats, conserved sites and binding sites) (Table S3, column M). As described in the following sections, we used this data in combination with other information to build functional hypotheses.

5) Protein-protein interactions


Protein-protein interaction data has often been used to build functional hypotheses on proteins, based on the “guilt by association” principle. The main drawback of this kind of analysis is that methods used in interactomics, such as yeast two-hybrid or affinity-based proteomics, lead to numerous false positives. Therefore, it is crucial to limit such analysis to the interactions that were validated by independent experiments (labelled as “Gold” in neXtProt). 20% (369/1,862) of the uncharacterized proteins have “Gold” binary interactions (Table S3, column N), and the vast majority of the identified interactants (906 out of 980; data not shown) have a function annotated. Sixty-one entries have “Gold” information on binary interactions and protein complexes that was manually curated from papers by Swiss-Prot curators, and nine entries have “Gold” GO cellular component annotations reflecting their participation in protein complexes (exon-exon junction complex (EJC), tRNA-splicing ligase complex, ubiquitin ligase complex, ribosome, receptor complex, collagen trimer) (Table S3, column O). The interactome of uncharacterized proteins, combined with localization information or with other sequence features, can be used to draw functional hypotheses. For instance, MCRIP2 interacts with its characterized paralog MCRIP1 and with DDX6 and relocates from nucleus and cytoplasm to stress granules upon cellular stress20,21, suggesting that it may have a role in stress granule formation. While MCRIP1 is known to bind the transcriptional co-repressor CTBP(s) via its PXDLS motif, MCRIP2 lacks key residues of the PXDLS motif, suggesting that it would not bind CTBP(s). Experimental validation should always be performed before annotating the function of a protein based on protein-protein interactions. For example, the integral membrane protein CCDC90B was shown to interact with MCUR1 and MCU, two components of the MCU complex involved in mitochondrial Ca2+ uptake, but silencing of CCDC90B did not affect Ca2+ uptake, suggesting that it may have another function22.
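The “guilt by association” filtering described above can be illustrated with a short Python sketch. All protein names, quality labels and function strings below are hypothetical placeholders rather than neXtProt data, and the record layout is an assumption made for illustration. The output is a list of hints to be tested experimentally, not an annotation.

```python
# Hypothetical interaction records and partner annotations (placeholders only).
interactions = [
    {"protein": "UNCHAR_1", "partner": "PARTNER_A", "quality": "GOLD"},
    {"protein": "UNCHAR_1", "partner": "PARTNER_B", "quality": "SILVER"},
    {"protein": "UNCHAR_1", "partner": "PARTNER_C", "quality": "GOLD"},
]
partner_function = {
    "PARTNER_A": "placeholder function A",
    "PARTNER_B": "placeholder function B",
}

def candidate_functions(protein, interactions, partner_function):
    """Collect annotated functions of partners linked by Gold interactions only."""
    hints = {}
    for record in interactions:
        if record["protein"] != protein or record["quality"] != "GOLD":
            continue  # ignore other proteins and lower-confidence interactions
        function = partner_function.get(record["partner"])
        if function is not None:  # partners without annotation give no hint
            hints[record["partner"]] = function
    return hints

print(candidate_functions("UNCHAR_1", interactions, partner_function))
```

In this toy input only PARTNER_A contributes a hint: PARTNER_B is excluded by the quality filter and PARTNER_C has no annotated function.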

6) Structural information

According to neXtProt, 24 proteins with unknown function have structural information from X-ray or NMR studies in the Protein Data Bank (PDB)23 (Table S3, column P). neXtProt provides a dedicated tool, accessible from the structure page of a protein entry, that allows users to display the three-dimensional structure of that protein with sequence features such as active site residues, PTMs or variants mapped on the structure. In some cases, the structural information covers only part of the sequence, often a domain, but in most cases the structure of the full protein has been solved. Structural information on protein complexes sometimes confirms protein-protein interactions. Proteins sharing structural properties may share functional properties, even if their sequence similarity is low. Therefore, structural similarity search is often used to complement sequence-based similarity tools.


7) Subcellular localization

neXtProt contains “Gold” subcellular localization information (experimentally-based annotations or sequence-based predictions of transmembrane domains, signal peptide, transit peptide or lipid anchor) for 41% of the uncharacterized proteins (766/1,862) (Table S3, column Q). As expected, this is very low in comparison to the percentage of proteins with annotated “Gold” subcellular location in the whole human proteome (17,882/20,230 = 88%) and in the PE1 proteome (16,078/17,460 = 92%). As shown in Figure 4A, 409 of them are predicted to be membrane proteins and 108 are predicted to be secreted, while 249 proteins have other intracellular locations. The 1,096 remaining uncharacterized proteins are not predicted to be at membranes or secreted. Determining their location would be a first step toward their functional characterization.
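As a consistency check, the category counts given in the Figure 4 caption can be summed and compared with the totals and percentages quoted in the text. The short Python sketch below only redoes this arithmetic; the variable names are illustrative.

```python
# Counts taken from the text and the Figure 4 caption.
total_uncharacterized = 1862
transmembrane, lipid_anchored = 401, 8
secreted = 108
intracellular_gold = 249
without_gold_location = 1096

membrane = transmembrane + lipid_anchored
with_gold_location = membrane + secreted + intracellular_gold

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

print(membrane, with_gold_location, with_gold_location + without_gold_location)
print(percent(with_gold_location, total_uncharacterized),
      percent(17882, 20230), percent(16078, 17460))
```

The membrane, secreted and intracellular groups add up to the 766 proteins with “Gold” localization, and together with the 1,096 proteins without such annotation they account for all 1,862 entries.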

Figure 4: Subcellular location annotations for the 1,862 human proteins with unknown function. A. Distribution in the intracellular and extracellular compartments and at membranes. Membrane proteins are proteins having at least one transmembrane domain (401) or a lipid anchor (8), secreted proteins are those with a predicted signal sequence and without a transmembrane domain (108), intracellular proteins are those having “Gold” annotations based on experiments and no transmembrane domains, lipidation or signal sequence (249). The predicted intracellular proteins are the ones without the above-mentioned annotations (1,096). B. Detailed localization of the 249 intracellular proteins with experimental annotations. The numbers of proteins at each location are as follows: nucleus (84), cytoplasm (45), mitochondria (11), membrane (14), cytoplasmic vesicle, endoplasmic reticulum, Golgi apparatus or acrosome (14), centrosome, centriole, cilium, microtubules, microtubule organizing center, midbody, spindle, spindle pole (26), cytoskeleton, cell cortex, focal adhesion, cell junction (6), nucleus and cytoplasm (24), nucleus and at least one of the following: Golgi apparatus, cytoplasmic vesicle, postsynaptic density or plasma membrane (8), nucleus and cytoskeleton (4), nucleus and chromosome-associated components such as kinetochore, centromere or midbody (2), nucleus, cytoplasm and mitochondria (2), centrosome-related structures (centriole, cilium, flagellum, microtubule, microtubule organizing center) and other locations (plasma membrane, membrane, nucleus, cytoplasmic granule, cytoplasmic vesicle or Golgi apparatus) (8), endosome and mitochondria (1).

Membrane proteins


Four hundred and one uncharacterized proteins have at least one predicted transmembrane domain, among which 157 would span the membrane more than once (Table S3, column R). In addition, eight proteins associate with membranes through lipid anchors (Table S3, column R). Interestingly, the proportion of transmembrane proteins in our set (401/1,862, 22%) is slightly lower than the proportion of transmembrane proteins in the whole human proteome (5,207/20,230, 28%) or in the PE1 proteome (4,197/17,460, 24%). This is rather unexpected since transmembrane proteins are known to be difficult to detect and purify: they represent 42% of the missing proteins (927/2,186). This suggests that the function of transmembrane proteins is not more difficult to predict or to characterize than the function of soluble proteins. Fifty-nine integral or lipid-anchored membrane proteins have additional annotations that specify or suggest their localization at the plasma membrane or at organellar membranes (Table S3, column Q, in yellow). Conversely, 15 proteins have additional annotations that are apparently not compatible with the presence of transmembrane domains, such as nucleoplasm, cytoplasm, centrosome, spindle poles, or extracellular compartments (Table S3, column Q, in red). Some of these apparent discrepancies might be explained by the co-existence of splice isoforms lacking the transmembrane domain(s), as for LSMEM1, RMDN2, TMEM25 and TMEM134. Another explanation could be the generation of soluble proteoforms by ectodomain shedding. However, one cannot exclude that some of these annotations, even if considered as “Gold” quality, may correspond to artifacts from the experimental or curation workflows.

Secreted proteins

One hundred and eight soluble proteins have a predicted signal sequence (Table S3, column S), indicating that they may enter the secretory pathway. Some may be released in body fluids and represent interesting candidates for clinical applications, such as MSMB, which was proposed as a serum marker for prostate cancer24. The proportion of predicted secreted proteins in our set (108/1,862, 6%) is lower than the proportion in the whole human proteome (2,013/20,230, 10%) or in the PE1 set (1,785/17,460, 10%). This may indicate that the function of secreted proteins is easier to study or to predict than the function of intracellular soluble proteins and/or that these proteins have been more intensively studied because of their potential biomedical relevance. PLAC9, a small secreted protein (75 amino acids after cleavage of the predicted signal sequence) with enhanced expression in adipose tissue, was found to be up-regulated in obesity samples25, suggesting a function in metabolism. According to the International Mouse Phenotyping Consortium26, mice mutant for Plac9a, one of the two mouse paralogs, display increased total body fat amount, decreased mean corpuscular volume, decreased bone mineral content, decreased fasted circulating glucose level and cataract. No mouse mutant has been described so far for Plac9b.


Intracellular proteins

Two hundred and forty-nine proteins with “Gold” annotations are not integral or lipid-anchored to membranes and are not predicted to be secreted (Figure 4A). As shown in Figure 4B, most of them (200/249) localize to a single subcellular location or to structurally related locations, while the others localize to more than one subcellular location (Table S3, column T, indicated as single and multiple, respectively). Among the proteins that localize in one consensus location, 14 are annotated as membrane-associated, and 14 as associated with the secretory pathway (endoplasmic reticulum, Golgi apparatus, cytoplasmic vesicles or acrosomal vesicle and plasma membrane). Since these proteins do not have a predicted transmembrane domain, lipid anchor, transit peptide or signal peptide, they probably associate with these structures by interacting with other membrane proteins. Eleven are mitochondrial, including four that have a predicted transit peptide and are probably located in the mitochondrial matrix (Table S3, column S). Among them, C21orf33, previously known as HES1/KNP-I, was initially proposed to be a mitochondrial RNA-binding protein27. It is now called GATD3A because of the presence of a class I glutamine amidotransferase-like domain (IPR029062), and has structural homology with PARK7/DJ-1, a mitochondrial redox sensor involved in Parkinson's disease with multiple functions including transcription regulation, chaperone and glyoxalase/deglycase activities28. Since the two residues reported to be essential for PARK7 chaperone and enzymatic properties, Glu-18 and Cys-106 (ref. 28), are conserved in C21orf33 (Figure S1), we speculate that C21orf33 might have molecular functions similar to those of PARK7. PARK7 also seems to act in the maintenance of mitochondrial integrity by interfering with the PINK1/Parkin pathway at transcriptional and post-translational levels29,30.
A genome-wide RNAi screen suggested that C21orf33 may be a negative regulator of PINK1/Parkin translocation to damaged mitochondria31, supporting a role in mitochondrial maintenance as well. Interestingly, C21orf33 protein levels were shown to be two-fold elevated in Down Syndrome (DS) fetal brain cortex, suggesting a potential involvement of this protein in DS pathogenesis32. Mitochondrial morphological and functional defects, with impaired respiratory complex I activity and increased oxidative stress, are hallmarks of DS33. In mouse, knockdown of C21orf33 was shown to specifically increase the mt-mRNA levels of mitochondrial ND1, ND2 and ND3 complex I proteins34, suggesting that the overexpression of C21orf33 occurring in DS may result in the downregulation of complex I proteins.

The remaining proteins that localize in one consensus location were described in the cytoplasm (45), in the nucleus (84), or in cytoskeleton-associated structures (32). Twenty-six were found at centrosome, centriole, microtubule organizing center, microtubules, midbody, spindle, spindle pole or cilia; cilia are microtubule-based organelles that emerge from the mother centrioles found at the cell centrosomes upon cell cycle arrest35. One of them is CCDC146, which has been shown to be specifically located at the mother centriole in HeLa and hTERT-RPE1 cells36 and may play a role in cilia biology.

Out of the 84 proteins that localize in the nucleus, 14 contain domains present in proteins with nuclear activities such as transcription (IPR000313, IPR005062 and IPR009847), chromatin remodeling (IPR026947 and IPR029708), nucleosome assembly (IPR014840 and IPR026943), chromosome organization (IPR020839), or in proteins that interact with nucleic acids (IPR001374, IPR001781, IPR004910, IPR011124, IPR012340, IPR029269, IPR034068 and IPR036867) (Table S3, column M). For example, PWWP2B contains a PWWP domain (IPR000313) that recognizes both DNA and methylated histone lysines and functions as a chromatin methylation reader. The structure of the PWWP domain of PWWP2B solved by crystallography (PDB:4LD6) confirmed the presence of the three conserved aromatic residues that form the cage for methyl-lysine histone binding37. The closest paralog of PWWP2B is PWWP2A, which specifically binds histone H2A.Z-containing nucleosomes via a region distinct from the PWWP domain (422-574)38. This region is not conserved in PWWP2B, suggesting that it has a different nucleosome specificity.

Among the 49 proteins for which no consensus single localization could be determined, 24 are annotated as located in both the cytosol and the nucleus. Nucleocytoplasmic shuttling is a well-known phenomenon described for hormone receptors, transcription factors, cell cycle regulators, translation initiation factors and RNA binding proteins, and nucleocytoplasmic carriers39. Two other proteins, ANKRD37 and NIPSNAP3A, are annotated not only in the cytosol and the nucleus, but also in mitochondria. Distribution of proteins between nucleus and mitochondria is well documented for some nuclear transcription factors that translocate to the mitochondria upon stress and for mitochondrial biosynthetic enzymes, pro-apoptotic factors and transcription factors that translocate to the nucleus upon different cellular and environmental stimuli40. Two proteins, GPATCH11 and ZNF330, are annotated in the nucleus and at transient structures linked to chromosome segregation during cell division (kinetochore, centromere, midbody). GPATCH11 localizes at the kinetochore and at the nucleus and contains a G-patch domain suggested to have a function in RNA processing (IPR000467). It could play a role in the recently discovered crosstalk between the RNA processing machinery and kinetochore assembly41.

Finally, 8 proteins are annotated both in centrosome-derived structures and in another location. Together with the 26 soluble proteins for which the consensus location mentioned above is centrosome, centriole, cilium, spindle or microtubules and the two integral membrane proteins that are located at the cilium, they form a set of 36 proteins that may play a role in cilia (Table S3, column U). While nearly every non-cycling cell has a single immotile primary cilium acting as an environmental sensor, highly specialized forms of motile and immotile cilia play key roles in animal development and physiology42. Cilia dysfunction can cause a large spectrum of phenotypes including male infertility, nephropathies, cardiopathies, retinopathies, brain or skeletal defects and hearing impairment, depending on the type and location of affected cilia42.


Knock-out mice have been established for three proteins potentially involved in cilia or spindle pole function (CCDC77, CCDC92 and ODF3L2) (Table S3, column L). Ccdc77 knock-out mice have skeletal defects that may be explained by a function of this protein in primary cilia. Odf3l2 and Ccdc92 knock-out mice have an abnormal auditory brainstem response that might be compatible with a dysfunction of sensory cilia in the cochlea. ODF3L2 is an uncharacterized paralog of ODF3, a component of the sperm outer dense fibers43, which are filamentous structures located on the outside of the axoneme in the mammalian sperm tail. The ODF3 family includes two other uncharacterized proteins, ODF3L1 and ODF3B, which might play a role in cilia biology. CCDC92 interacts with CEP164, a ciliary protein involved in an oculo-renal ciliopathy (NPHP15)44, and with CEP76, a centrosomal protein. Other potential ciliary proteins have interactors known to play a role in cilia biology. CCDC172, localized in sperm flagellum and midpiece, was shown to be a potential interactor of TEKT2, a structural component of motile cilia45. CFAP45, localized both at the nucleus and cilium, interacts with the ciliary protein ENKUR, mutations of which were shown to cause situs inversus in human46 and subfertility in mice47.

8) Tissue localization

Out of the 1,862 uncharacterized neXtProt entries, 1,815 were mapped to HPA using their ENSG identifiers. Out of those, 206 do not have information in HPA and 8 correspond to proteins encoded by more than one gene with different expression profiles and were thus not integrated in the analysis. The 47 neXtProt entries that do not map to any ENSG were mapped to HPA using their official gene names, which allowed us to retrieve seven proteins with expression information. In total, RNA-Seq expression data from HPA were retrieved for 1,608 neXtProt entries (Table S3, columns V-Y). The 615 proteins from the “expressed in all” and “mixed” categories probably act in processes that are common to a large number of cell types, such as cell growth, cell division, apoptosis, respiration or homeostasis. The 932 proteins belonging to the categories “tissue enriched”, “group enriched” and “tissue enhanced” have more specific expression profiles, which can give clues to their possible function. For example, 364 proteins are classified as “tissue enriched”, “group enriched” or “tissue enhanced” in testis, epididymis, prostate and/or seminal vesicles, and may be important for male reproduction (Table S3, column Z). Despite their enrichment in testis, some of them (TEX37, CCDC73, C19orf18/2900092C05Rik, C19orf45/1700019B03Rik and ACTRT3) were shown to be dispensable for fertility in mice48,49 (see also http://www.mousephenotype.org/data/experiments?geneAccession=MGI:1923902&mpTermId=MP:0005389), suggesting either that their function is redundant with that of other genes, or that they perform other functions. One protein with a possible role in male reproduction is TEX44, which we recently showed by immunohistochemistry to be enriched in elongated spermatids50. TEX44 interacts


with the carnitine O-palmitoyltransferase CPT1B51 that localizes at the outer mitochondrial membrane. CPT1B is expressed in muscle and testis and is involved in the import of palmitoyl-CoA into the mitochondrial matrix. In rat, transient expression of Cpt1b was shown to be important for the maturation and/or function of sperm52. TEX44 and CPT1B were both identified in the sperm tail proteome in a study showing that inhibition of mitochondrial beta-oxidation results in reduced sperm motility53. Thus, we propose that TEX44 may act together with CPT1B in fatty acid metabolism during sperm maturation. Twenty-six other genes are enriched in fallopian tubes, placenta or ovary and may be important for female fertility and gestation (Table S3, column AA). Twenty-nine others are enriched in cerebral cortex and may be important for brain function (Table S3, column AB). This was confirmed for TMEM151B, for which the mouse mutant displays abnormal behavioral response to light and abnormal sleep behavior26. Its paralog TMEM151A is also enriched in cerebral cortex but no mouse mutant has been reported. Given that mice mutant for TMEM151B display a central nervous system phenotype in the presence of TMEM151A, we suggest that TMEM151 genes have evolved to acquire non-overlapping nervous system functions. Eighteen genes “group enriched” or “tissue enhanced” in hematopoietic tissues might be involved in hematopoiesis or immune function (Table S3, column AC). NKG7/GMP-17, whose expression is enhanced in bone marrow and spleen, is one of them. Bone marrow is the primary site of generation of B and NK cells, and spleen is an important secondary lymphoid organ where B cells mature and differentiate. NKG7 is a small integral membrane protein with four transmembrane domains which belongs to the PMP-22/EMP/MP20 family. This family has 19 members in human, including five other proteins with no known function (CLDND2, TMEM114, TMEM178B, TMEM182, and TMEM235) and nine auxiliary subunits for voltage-gated calcium channels. The CLDND2 and NKG7 genes are located in 19q13.41, within the extended leukocyte receptor complex, an important region for various immune functions54. The NKG7 protein localizes at cytoplasmic granules of cytolytic T-lymphocytes, NK cells, and neutrophils and has been shown to translocate to the plasma membrane upon induction of NK cell degranulation55. Knock-out mice have an abnormal electrocardiogram (http://www.mousephenotype.org/data/genes/MGI:1931250#section-associations) and decreased NK cell-mediated cytotoxicity (https://researchonline.jcu.edu.au/50541/). Thirty-two genes that are “group enriched” or “tissue enhanced” in tissues from the digestive system may be involved in gut function (Table S3, column AD). This is the case for the glycoproteins NXPE1, NXPE2 and NXPE4. They share structural properties with neuroexophilins, which are brain glycoproteins cleaved into neuropeptides that seem to act in specific neuronal circuits56. Genome-wide association studies identified a variant associated with inflammatory bowel disease near the NXPE1 gene57. The expression of NXPE4 increased in the gut of mice exposed to intestinal bacteria from a donor mouse58, and in the colon of a mouse model for multiple intestinal neoplasia that displayed increased gut


microbiota compared to control mice59. Taken together, these observations suggest a role of NXPE proteins in the gut-brain-microbiota axis60. The uncharacterized NXPE3 is the most divergent member of the family in terms of sequence. Since its expression is not enriched in the digestive system, a role in the gut should be considered with caution. Another protein with enhanced expression in the digestive system is TMEM82, which has eight transmembrane domains, is upregulated by fatty acids in heart61 and is co-expressed with proteins involved in lipid, hormone, xenobiotic and alcohol metabolic pathways according to the Genevestigator gene expression software (https://genevestigator.com/gv/) (data not shown), suggesting a function in such pathways. Eighty-six proteins are “group enriched” or “tissue enhanced” both in testis and fallopian tubes (Table S3, column AE), a property shared by many proteins involved in cilia function62. Only ten were experimentally shown to be associated with cilia, flagella or related structures such as microtubule, centrosome or centriole (Table S3, column U). We combined the tissue expression information with other available data, including the phenotypes of available mouse mutants, and found five ciliary candidates among the ones which were not already annotated as localized in cilia-related structures: C4orf22, C9orf116, CASC1, CCDC33 and LRRC23. LRRC23 is expressed during ciliogenesis63. Its expression is downregulated in bronchial tissues of primary ciliary dyskinesia patients, a behaviour shared by genes functionally related to cilia64. Although it was reported as nucleolar by HPA, LRRC23 was recently shown to be located in cilia in human kidney proximal tubule epithelial cells (HK-2)65. In contrast to Lrrc23 knock-out mice, which are infertile (International Mouse Strain Resource), zebrafish lrrc23 mutants are fertile. However, they have ear development defects, and cilia of the otic vesicle display motility defects66. Taken together, these observations suggest that human LRRC23 plays a role in cilia motility. CASC1 is also expressed during ciliogenesis in human63 and downregulated in bronchial tissues of primary ciliary dyskinesia patients67. Casc1 mutant mice are not infertile but have an increased incidence of lung tumors68, and it was shown that CASC1 siRNA in human cell lines induced mitotic spindle and microtubule polymerization defects69. Collectively, these observations suggest that CASC1 may be involved in ciliogenesis by regulating microtubule polymerization.
C4orf22 mutant mice have no male reproduction phenotype48 but show abnormal retinal pigmentation and a decreased threshold for auditory brainstem response, which may reflect an impaired function of sensory ciliated cells in the retina and the cochlea. C9orf116 knock-out mice have cardiac malformations and situs inversus11, which are frequently observed in primary ciliopathies42. Ccdc33 knock-out mice have skeletal defects, which are frequently observed in ciliopathies as well70. Ccdc33 interacts with Hook271, a protein involved in the morphogenesis of the primary cilium72. Therefore, we speculate that C4orf22, C9orf116 and CCDC33 may play a role in primary cilia.


The remaining proteins are enriched in other tissues where they might have specific roles. One caveat of this reasoning concerns genes expressed at a very low level in a single tissue. In these cases, the main function of the protein may not be in that particular tissue but in a tissue or a condition that was not tested in the considered panel. For 61 genes, transcript expression was below detection limits in all the 37 analyzed tissues, suggesting that their expression may occur mainly in tissues, conditions or developmental stages that are not part of the panel used for RNA sequencing, or that they are not coding for proteins. Nonetheless, fifteen of these genes have been validated at protein level (PE1). Some of them have been found by mass spectrometry in rarely studied human samples, such as LACTBL1, CCDC166 and TCHHL1 in retina73, sperm74, and hair follicles75, respectively. Similarly, ZAR1L has been validated by single cell proteomics in human oocytes76 and not found in any other human tissue, suggesting a specific function in oocyte biology or early embryonic development and confirming previous studies performed on mouse models77. Thirty-two other proteins have been validated at transcriptomics level (PE2) on the basis of the sequencing of the full or partial mRNA. Careful literature mining allowed us to find experimental evidence for the expression of SCGB1C1 (presently PE3) in nasal mucosa and its regulation by inflammatory cytokines78. SCGB1C1 has a paralog in human, SCGB1C2, differing by a single amino acid, also annotated as PE3. The rat ortholog of SCGB1C1/2 (called Ryd5) had been found to be specifically expressed in Bowman’s glands and proposed to bind to small, hydrophobic odorant molecules79. However, its presence in the bottlenose dolphin, which is devoid of olfaction80, suggests that it might perform another function. 
The 14 remaining genes classified as PE3-5 are still awaiting experimental proof for their coding potential by transcriptomics or proteomics analysis.
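The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the variable names are illustrative, and treating the 61 undetected genes as the third group that completes the 1,608 entries is an inference from the totals rather than something stated explicitly in the text.

```python
# Counts quoted in the tissue localization section (illustrative names).
total_uncharacterized = 1862
mapped_by_ensg = 1815
no_hpa_information = 206
multi_gene_excluded = 8
recovered_by_gene_name = 7

# Entries without an ENSG identifier, mapped by gene name instead.
unmapped_by_ensg = total_uncharacterized - mapped_by_ensg

# Entries for which RNA-Seq expression data could be retrieved.
with_expression = (
    mapped_by_ensg - no_hpa_information - multi_gene_excluded
    + recovered_by_gene_name
)

# Assumption: these three groups partition the entries with expression data.
expression_groups = {
    "expressed in all / mixed": 615,
    "tissue enriched / group enriched / tissue enhanced": 932,
    "below detection in all 37 tissues": 61,
}

# Protein existence levels of the 61 genes below detection limits.
undetected_by_pe = {"PE1": 15, "PE2": 32, "PE3-5": 14}

print(unmapped_by_ensg, with_expression)
print(sum(expression_groups.values()), sum(undetected_by_pe.values()))
```

Under these assumptions the figures are mutually consistent: the mapping steps and the three expression groups both add up to 1,608 entries, and the PE breakdown adds up to the 61 undetected genes.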

Refining functional hypotheses using orthology information

Taken together, our manual data mining study based on a combination of data from different sources allowed us to propose functional hypotheses for 26 uncharacterized proteins (Table 1). Thirteen proteins may be involved in cilia biology (C4orf22, C9orf116, CASC1, CCDC33, CCDC77, CCDC92, CCDC146, CCDC172, CFAP45, LRRC23, ODF3B, ODF3L1 and ODF3L2), one in spermatogenesis (TEX44), two in metabolism (PLAC9 and TMEM82), two in central nervous system (TMEM151A and TMEM151B), three in the gut-brain-microbiota axis (NXPE1, NXPE2 and NXPE4), one in immune functions (NKG7), one in mitochondrial function (C21orf33), one in stress granule formation (MCRIP2), one in chromatin biology (PWWP2B) and one in kinetochore and RNA-related processes (GPATCH11). The phylogenetic profiles of these proteins help to consolidate and refine some of these hypotheses, as detailed below.
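As a quick tally of the hypotheses listed above, the grouping can be written down as a small dictionary. This is a minimal sketch for checking the counts; the process labels are taken from the paragraph, and the data structure itself is only an illustration.

```python
# Proposed biological process -> uncharacterized proteins, as listed in the text.
hypotheses = {
    "cilia biology": [
        "C4orf22", "C9orf116", "CASC1", "CCDC33", "CCDC77", "CCDC92",
        "CCDC146", "CCDC172", "CFAP45", "LRRC23", "ODF3B", "ODF3L1", "ODF3L2",
    ],
    "spermatogenesis": ["TEX44"],
    "metabolism": ["PLAC9", "TMEM82"],
    "central nervous system": ["TMEM151A", "TMEM151B"],
    "gut-brain-microbiota axis": ["NXPE1", "NXPE2", "NXPE4"],
    "immune functions": ["NKG7"],
    "mitochondrial function": ["C21orf33"],
    "stress granule formation": ["MCRIP2"],
    "chromatin biology": ["PWWP2B"],
    "kinetochore and RNA-related processes": ["GPATCH11"],
}

# Flatten the lists to count proteins and detect accidental duplicates.
all_genes = [gene for genes in hypotheses.values() for gene in genes]
total = len(all_genes)
unique = len(set(all_genes))

for process, genes in hypotheses.items():
    print(f"{process}: {len(genes)}")
print("total:", total, "unique:", unique)
```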


The 13 genes with a putative role in cilia biology are all broadly conserved in vertebrates but absent in C. elegans, S. cerevisiae and Arabidopsis, which lack motile cilia. Six of them (C4orf22, CASC1, CCDC77, CCDC146, CFAP45 and LRRC23) are conserved in choanoflagellate, ciliate and trypanosomatid protozoans. LRRC23, CASC1 and C4orf22 orthologs were identified in the cilium and flagella proteomes of Paramecium81 and Trypanosoma brucei82. The CASC1 ortholog was also identified in the cilium of Tetrahymena thermophila83. The Chlamydomonas orthologs of C4orf22 and CCDC77, respectively called FBB9 and POC11, were identified in the flagella84 and in the basal body85. The Chlamydomonas ortholog of CCDC146, MBO2, was shown to be required for flagellar waveform conversion86. The ODF3 family has homologs in vertebrates, D. melanogaster, and in some choanoflagellates and ciliate protozoans. ODF3/ODF3B orthologs were found to be expressed in Xenopus laevis ciliated epidermis cells87 and in zebrafish pronephros multiciliated cells88 and spermatids89. The fly ortholog of ODF3/ODF3B (Dmel\CG8086) is expressed in the Johnston’s organ, composed of neurons that use sensory cilia to detect sounds, and flies with mutant CG8086 have defects in sound perception90. PLAC9 and TMEM82, proposed to act in metabolism, are conserved in a wide range of vertebrates, regardless of their environment (marine or terrestrial), diet, or body temperature (ectotherms or endotherms), suggesting that they act in basic metabolic processes. TMEM151A and TMEM151B are conserved in vertebrates, and homologs are also found in other chordates, in some arachnids and in C. elegans. The expression profile reported for their homolog in C. elegans in head and tail neurons, ventral nerve cords and nerve rings91 is consistent with their proposed function in the central nervous system. The NXPE family proteins proposed to act in the gut-brain-microbiota axis have homologs in chordates and in some gastropods. The orthology relationship is complex, as some organisms have two NXPE genes and others, such as Branchiostoma, have dozens. This was expected since the phylogenetic distribution of neuropeptides across the different phyla is known to be particularly complex92. For example, the ortholog of NXPE1 in mouse is a pseudogene, while four NXPE protein-coding genes exist in the mouse genome. The Xenopus laevis NXPE2 ortholog was shown to be strongly up-regulated in the adult intestine compared to the larval one, which might be associated with dietary change93. NKG7, proposed to play a role in immunity, has orthologs in mammals and reptiles (turtles and lizards), but not in fish or amphibians. Most features of the adaptive immune system are conserved in the different vertebrate classes, but only endotherms have germinal centers where B cell affinity maturation occurs94, and reptiles have “lymphoid aggregates” that might correspond to precursors of mammalian germinal centers95. NKG7 has been lost in bird lineages, as have many other genes on human chromosome 1996. Birds have lost many features of adaptive immunity, and their B-cells are not generated in bone marrow but in the bursa of Fabricius94. We speculate that NKG7 might play a role in mammalian B cell maturation and its


regulation by NK cells. C21orf33, which may share some functional properties with PARK7 in the protection against oxidative stress and/or mitochondrial maintenance, is conserved in a wide range of Metazoa, including porifera and placozoa, and in protists (Choanoflagellata, Stramenopiles, Alveolata, Rhizaria). In contrast to PARK7, it has no homolog in fungi or plants, and it seems to have been lost in endopterygotes (including Drosophila) and in nematodes. However, both C21orf33 and PARK7 homologs are found in proteobacteria. The three Escherichia coli genes (hchA, yajL, yhbO) that are homologous to PARK7, as well as elbB, which is homologous to C21orf33, were recently characterized as glyoxalases97, confirming our hypothesis that C21orf33 and PARK7 have similar enzymatic properties. Non-mammalian vertebrates such as zebrafish or Xenopus have several C21orf33 paralogs that may perform tissue-specific functions. For example, ES1, one of the zebrafish paralogs, is specifically expressed in the photoreceptor cells of the retina where it acts as a mitochondrial enlarging factor98. PWWP2B is present as a paralog of PWWP2A in vertebrates. PWWP2A was shown to be essential for neural crest cell differentiation and migration in Xenopus laevis and Xenopus tropicalis38, indicating that PWWP2B does not perform the exact same functions. This is consistent with the lack of similarity between the two proteins in the region that specifies H2A.Z nucleosome binding for PWWP2A. GPATCH11 was found to be the ortholog of S. cerevisiae CMG1/YLR271W. CMG1 interacts with the helicase PRP43 and stimulates both its RNA binding and ATPase activities, but the biological role of this interaction is still unclear99. It would be interesting to test whether GPATCH11 interacts with DHX15, the human ortholog of PRP43, and modulates its function(s).
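The reasoning used for the cilia candidates (presence in ciliated or flagellated protists, absence in species lacking motile cilia) can be sketched as a simple presence/absence test on phylogenetic profiles. The example below is an illustrative simplification, not the procedure used in this study: the species codes follow the Table 1 legend, the ortholog sets are transcribed from the putative orthologs column of Table 1 for a handful of proteins, and the function name and filtering rule are assumptions made for the example.

```python
# Species codes follow the Table 1 legend: M mouse, Z zebrafish, XT/XL Xenopus,
# F fly, W C. elegans, Y S. cerevisiae, A A. thaliana, C Chlamydomonas,
# Tb T. brucei, P Paramecium, Tt Tetrahymena.
LACK_MOTILE_CILIA = {"W", "Y", "A"}
CILIATED_PROTISTS = {"C", "Tb", "P", "Tt"}

# Putative ortholog species per protein, transcribed from Table 1.
profiles = {
    "C4orf22": {"M", "Z", "XT", "XL", "F", "C", "Tb", "P", "Tt"},
    "CASC1": {"M", "Z", "XT", "XL", "F", "C", "Tb", "P", "Tt"},
    "CCDC146": {"M", "Z", "XT", "XL", "C", "Tb", "P", "Tt"},
    "LRRC23": {"M", "Z", "XT", "XL", "Tb", "P", "Tt"},
    "GPATCH11": {"M", "Z", "XT", "XL", "F", "W", "Y", "A"},
}


def has_cilia_like_profile(species):
    """Hypothetical rule: no ortholog in species lacking motile cilia,
    and at least one ortholog in a ciliated or flagellated protist."""
    absent_where_expected = not (species & LACK_MOTILE_CILIA)
    present_in_protists = bool(species & CILIATED_PROTISTS)
    return absent_where_expected and present_in_protists


candidates = sorted(g for g, s in profiles.items() if has_cilia_like_profile(s))
print(candidates)
```

With these inputs the four cilia candidates pass the rule while GPATCH11, which has orthologs in C. elegans, yeast and Arabidopsis, does not. A profile of this kind only supports a hypothesis; it does not replace the experimental evidence discussed in the text.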


Gene / AC: C4orf22 (NX_Q6V702)
Chr: 4q21.21
Hypothesis: Primary cilia
Subcell. Loc.: -
Expression: Group enriched (fallopian tube, testis)
KO mice: no male reproduction phenotype but abnormal retinal pigmentation, decreased threshold for auditory brainstem response
Additional data: Orthologs found in cilium/flagella in Paramecium81, T. brucei82 and Chlamydomonas84
Putative orthologs: Q810M1 (M) A2AVJ0 (Z) XP_002934045 (XT) Q5PQ44 (XL) Q9VFY6 (F) A0A2K3D8T3 (C) Q382P6 (Tb) A0BR59 (P) Q23AB2 (Tt)
HPA Ab: HPA043383

Gene / AC: C9orf116 (NX_Q5BN46)
Chr: 9q34.3
Hypothesis: Primary cilia
Subcell. Loc.: -
Expression: Group enriched (epididymis, fallopian tube, testis)
KO mice: cardiac malformations and situs inversus11
Additional data: -
Putative orthologs: Q5BN45 (M) B0UXH9 (Z) A0A1B8XXY0 (XT) A0A1L8F128 (XL) Q0KI89 (F)
HPA Ab: HPA021439 HPA065287

Gene / AC: CCDC33 (NX_Q8N5R6)
Chr: 15q24.1
Hypothesis: Primary cilia
Subcell. Loc.: -
Expression: Group enriched (fallopian tube, testis)
KO mice: decreased caudal vertebrae number, short tibia
Additional data: Interacts with HOOK271
Putative orthologs: Q3ULW6 (M) A0A0R4IUA3 (Z) F6W813 (XT) A0A1L8GZY7 (XL)
HPA Ab: -

Gene / AC: CCDC77 (NX_Q9BR77)
Chr: 12p13.33
Hypothesis: Primary cilia
Subcell. Loc.: Centrosome
Expression: Mixed
KO mice: decreased lumbar vertebrae number, increased circulating LDL cholesterol level, increased circulating thyroxine level, increased sacral vertebrae number, sparse hair
Additional data: Chlamydomonas ortholog found in basal body85
Putative orthologs: Q9CZH8 (M) E7FBZ5 (Z) F6TXZ7 (XT) Q6DFC2 (XL) A0CNT2 (P) I7MMU5 (Tt) XP_001693122.1 (C)
HPA Ab: HPA038854

Gene / AC: CASC1 (NX_Q6TDU7)
Chr: 12p12.1
Hypothesis: Cilia biology
Subcell. Loc.: -
Expression: Tissue enhanced (fallopian tube, testis)
KO mice: no male reproduction phenotype but increased incidence of tumors by chemical induction, increased lung adenoma incidence68
Additional data: Expressed during ciliogenesis63. Downregulated in primary ciliary dyskinesia bronchial tissues67. CASC1 siRNA induces mitotic spindle and microtubule polymerization defects69. Orthologs found in cilium/flagella in Paramecium81, Tetrahymena83, T. brucei82
Putative orthologs: Q6TDU8 (M) A6H8T2 (Z) B3DLY0 (XT) A0A1L8GW00 (XL) Q9VWZ4 (F) C7A2A8 (C) C9ZLR1 (Tb) A0D9M0 (P) I7LV93 (Tt)
HPA Ab: HPA039662

Gene / AC: CCDC92 (NX_Q53HC0)
Chr: 12q24.31
Hypothesis: Cilia biology
Subcell. Loc.: Centriole, centrosome, nucleoplasm
Expression: Mixed
KO mice: decreased circulating alanine transaminase level, decreased circulating cholesterol level, decreased grip strength, increased or absent threshold for auditory brainstem response
Additional data: Interacts with CEP164 and CEP7651
Putative orthologs: Q8VDN4 (M) Q32PM2 (Z) Q0V9Z6 (XT) A0A1L8HQK7 (XL)
HPA Ab: HPA038560 HPA057580

Gene / AC: CCDC146 (NX_Q8IYE0)
Chr: 7q11.23
Hypothesis: Cilia biology
Subcell. Loc.: Mother centriole
Expression: Tissue enhanced (fallopian tube, testis)
KO mice: IMPC: mice produced, phenotype not described
Additional data: Chlamydomonas ortholog (MBO2) required for flagellar waveform conversion86
Putative orthologs: E9Q9F7 (M) Q5TYU2 (Z) F6PYL8 (XT) A0A1L8GZ96 (XL) Q8S4W6 (C) C9ZMK0 (Tb) A0DQB8 (P) I7MA06 (Tt)
HPA Ab: HPA020082 HPA020105

Gene / AC: CCDC172 (NX_P0C7W6)
Chr: 10q25.3
Hypothesis: Flagellum biology
Subcell. Loc.: Cilium, cytoplasm, sperm midpiece
Expression: Tissue enriched (testis)
KO mice: IMPC: ES produced
Additional data: Interacts with TEKT245 (structural component of motile cilia)
Putative orthologs: Q810N9 (M) B8JME7 (Z) F6QA03 (XT) Q3KPU6 (XL)
HPA Ab: -

Gene / AC: CFAP45 (NX_Q9UL16)
Chr: 1q23.2
Hypothesis: Cilia biology
Subcell. Loc.: cilium, nucleus
Expression: Group enriched (fallopian tube, testis)
KO mice: IMPC: mice produced, phenotype not described
Additional data: Interacts with ENKUR
Putative orthologs: Q9D9U9 (M) F1QH96 (Z) F6ZDR3 (XT) A0A1L8FCR1 (XL) Q9VNU3 (F) A8I9E8 (C) Q57UZ3 (Tb) A0D788 (P) W7XCX2 (Tt)
HPA Ab: HPA042204 HPA043618

Gene / AC: LRRC23 (NX_Q53EV4)
Chr: 12p13.31
Hypothesis: Cilia motility
Subcell. Loc.: nucleolus, cilium
Expression: Group enriched (fallopian tube, testis)
KO mice: male infertility
Additional data: Expressed during ciliogenesis63. Downregulated in primary ciliary dyskinesia bronchial tissues67. Orthologs found in cilium/flagella in Paramecium81 and T. brucei82. Zebrafish mutants have otolith formation defects during early ear development and cilia of the otic vesicle display motility defects66
Putative orthologs: O35125 (M) Q5XJM1 (Z) Q28CU0 (XT) A0A1L8FDX0 (XL) A0BZX1 (P) W7X6G9 (Tt) XP_822841.1 (Tb)
HPA Ab: HPA037766 HPA057533

Gene / AC: ODF3B (NX_A8MYP8)
Chr: 22q13.33
Hypothesis: Cilia biology
Subcell. Loc.: -
Expression: Tissue enhanced (fallopian tube)
KO mice: -
Additional data: Human paralog ODF3 in sperm outer dense fibers43. Zebrafish odf3/odf3b ortholog expressed in pronephros multiciliated cells88 and spermatids89. X. laevis odf3/odf3b ortholog expressed in ciliated epidermis cells87. Fly odf3/odf3b ortholog expressed in Johnston's organ and mutants with defects in sound perception90. Homologs found in the genome of some ciliate protozoans and choanoflagellates.
Putative orthologs: Q5M8M2 (M) A3KQA5 (Z) Q5EB30 (XT) Q8AVY1 (XL) Q9VLQ4 (F)
HPA Ab: HPA062837

Gene / AC: ODF3L1 (NX_Q8IXM7)
Chr: 15q24.2
Hypothesis: Cilia biology
Subcell. Loc.: -
Expression: Tissue enriched (testis)
KO mice: -
Additional data: Human paralog ODF3 in sperm outer dense fibers43. Homologs found in the genome of some ciliate protozoans and choanoflagellates.
Putative orthologs: Q810P2 (M)
HPA Ab: -

Gene / AC: ODF3L2 (NX_Q3SX64)
Chr: 19p13.3
Hypothesis: Cilia biology
Subcell. Loc.: Microtubule
Expression: Group enriched (skeletal muscle, testis)
KO mice: absent vibrissae, decreased startle reflex, increased or absent threshold for auditory brainstem response
Additional data: Human paralog ODF3 in sperm outer dense fibers43. Homologs found in the genome of some ciliate protozoans and choanoflagellates.
Putative orthologs: Q3TZ65 (M) B3DI43 (Z) Q568P2 (Z) F7BVB4 (XT) A0A1L8HWY3 (XL) Q9VCJ6 (F)
HPA Ab: HPA067973

Gene / AC: TEX44 (NX_Q53QW1)
Chr: 2q37.1
Hypothesis: Spermatogenesis
Subcell. Loc.: cytoplasm
Expression: Tissue enriched (testis)
KO mice: IMPC: ES produced
Additional data: Interacts with carnitine O-palmitoyltransferase CPT1B51. Enriched in elongated spermatids50
Putative orthologs: Q9DA60 (M)
HPA Ab: HPA049917 HPA056433

Gene / AC: PLAC9 (NX_Q5JTB6)
Chr: 10q22.3
Hypothesis: Metabolism
Subcell. Loc.: secreted
Expression: Tissue enhanced (adipose tissue)
KO mice: decreased fasted circulating glucose level, decreased mean corpuscular volume, increased total body fat amount (Plac9a)
Additional data: Up-regulated in obesity25
Putative orthologs: Q8K262 (M) XP_018094935.1 (XL) XP_017951140.1 (XT)
HPA Ab: HPA043469

Gene / AC: TMEM82 (NX_A0PJX8)
Chr: 1p36.21
Hypothesis: Metabolism
Subcell. Loc.: integral component of membrane
Expression: Tissue enhanced (duodenum, liver, small intestine)
KO mice: -
Additional data: Upregulated by fatty acids in heart61. Co-expressed with proteins involved in lipid, hormone, xenobiotic and alcohol metabolic pathways.
Putative orthologs: Q8R115 (M) X1WDB7 (Z) B0BMG8 (XT) Q5XG04 (XL)
HPA Ab: HPA060282

Gene / AC: TMEM151A (NX_Q8N4L1)
Chr: 11q13.2
Hypothesis: Central nervous system
Subcell. Loc.: integral component of membrane
Expression: Tissue enriched (cerebral cortex)
KO mice: -
Additional data: C. elegans homolog of TMEM151A/B expressed in head and tail neurons, ventral nerve cords and nerve rings91
Putative orthologs: Q6GQT5 (M) A0A2R8Q6K1 (Z) E7F9L1 (Z)
HPA Ab: HPA041035

Gene / AC: TMEM151B (NX_Q8IW70)
Chr: 6p21.1
Hypothesis: Central nervous system
Subcell. Loc.: integral component of membrane
Expression: Tissue enriched (cerebral cortex)
KO mice: abnormal behavior, abnormal behavioral response to light, abnormal sleep behavior, hyperactivity, increased circulating sodium level
Additional data: C. elegans homolog of TMEM151A/B expressed in head and tail neurons, ventral nerve cords and nerve rings91
Putative orthologs: Q68FE7 (M) F1RCC4 (Z) E7FFG7 (Z) F6VF59 (XT) A0A1L8G703 (XL)
HPA Ab: HPA055167

Gene / AC: NXPE1 (NX_Q8N323)
Chr: 11q23.2
Hypothesis: Gut-brain-microbiota axis
Subcell. Loc.: secreted
Expression: Group enriched (colon, rectum)
KO mice: None (pseudogene in mice)
Additional data: Shared structural properties with neuroexophilins. Variant associated with inflammatory bowel disease near the NXPE1 gene57.
Putative orthologs: pseudogene (M) F7C4B6 (XT) A0A1L8FLA7 (XL)
HPA Ab: HPA049133

Gene / AC: NXPE2 (NX_Q96DL1)
Chr: 11q23.3
Hypothesis: Gut-brain-microbiota axis
Subcell. Loc.: integral component of membrane
Expression: Tissue enhanced (colon, epididymis, rectum)
KO mice: IMPC: ES produced
Additional data: Shared structural properties with neuroexophilins. X. laevis ortholog strongly up-regulated in adult compared to larval intestine93
Putative orthologs: Q3U095 (M) F7BIV1 (XT) Q5U498 (XL)
HPA Ab: HPA039744 HPA039876

Gene / AC: NXPE4 (NX_Q6UWF7)
Chr: 11q23.2
Hypothesis: Gut-brain-microbiota axis
Subcell. Loc.: secreted
Expression: Group enriched (colon, rectum, salivary gland)
KO mice: IMPC: ES produced
Additional data: Shared structural properties with neuroexophilins. Mouse gut and colon expression increased upon bacteria exposure58,59.
Putative orthologs: Q52KP5 (M)
HPA Ab: HPA042801

Gene / AC: NKG7 (NX_Q16617)
Chr: 19q13.41
Hypothesis: Immune function
Subcell. Loc.: cell membrane, cytoplasmic granule membrane
Expression: Tissue enhanced (bone marrow, spleen)
KO mice: Abnormal electrocardiogram, decreased NK cell-mediated cytotoxicity
Additional data: Cytoplasmic granule localization in cytolytic T-lymphocytes, NK cells, and neutrophils. Translocates to plasma membrane upon NK cell degranulation55. Gene is part of the extended leukocyte receptor complex54.
Putative orthologs: Q99PA5 (M)
HPA Ab: HPA071454

Gene / AC: C21orf33 (NX_P30042)
Chr: 21q22.3
Hypothesis: Mitochondria redox-sensor, transcription regulation, chaperone or glyoxalase/deglycase activity. Mitochondria maintenance or biogenesis. Possible involvement in Down syndrome.
Subcell. Loc.: mitochondria
Expression: Expressed in all
KO mice: IMPC: ES produced
Additional data: Class I glutamine amidotransferase-like domain, structural similarity to PARK7. Bacterial homolog elbB characterized as glyoxalase. Candidate negative regulator of PINK1/Parkin translocation to damaged mitochondria by siRNA screen31. Fish paralog ES1 involved in retinal giant mitochondria biogenesis98. Higher mt-mRNA levels of complex I proteins upon C21orf33 downregulation34. Twofold elevated in Down syndrome32.
Putative orthologs: Q9D172 (M) F1QCN0 (Z) A9JRZ9 (XT) Q0P3R0 (XL) P0ABU5 (E. coli)
HPA Ab: HPA018517

Gene / AC: MCRIP2 (NX_Q9BUT9)
Chr: 16p13.3
Hypothesis: Stress granule formation
Subcell. Loc.: nucleus, cytoplasmic stress granule
Expression: Mixed
KO mice: IMPC: ES produced
Additional data: Core stress granule component. Interacts with DDX620,21.
Putative orthologs: Q9CQB2 (M) Q1MT54 (Z) F6TW17 (XT) Q801Q0 (XL) Q9VUP0 (F)
HPA Ab: HPA060363

Gene / AC: PWWP2B (NX_Q6NUJ5)
Chr: 10q26.3
Hypothesis: Chromatin biology (methylated histone binding)
Subcell. Loc.: nucleus
Expression: Mixed
KO mice: IMPC: ES produced
Additional data: PWWP domain.
Putative orthologs: E9Q9M8 (M) A0A0R4IGL9 (Z) F6ZNI1 (XT) A0A1L8FEB1 (XL)
HPA Ab: HPA038056 HPA038502

Gene / AC: GPATCH11 (NX_Q8N954)
Chr: 2p22.2
Hypothesis: Crosstalk between the RNA processing machinery and kinetochore assembly, possibly in interaction with DHX15
Subcell. Loc.: kinetochore, nucleus
Expression: Mixed
KO mice: IMPC: ES produced
Additional data: G-patch domain (RNA processing). Ortholog of yeast Cmg1/YLR271W, cofactor of the helicase Prp4399
Putative orthologs: Q3UFS4 (M) Q6DGZ0 (Z) Q6DF57 (XT) A0A1L8G7G4 (XL) Q9VI83 (F) Q22178 (W) Q06152 (Y) Q9M2Q7 (A)
HPA Ab: HPA038231 HPA038232

Table 1: Functional hypotheses on 26 uncharacterized proteins. These hypotheses were derived from the analysis of tissue and subcellular localization, interactions, phenotype of orthologs, presence of domains and homology with characterized proteins. This information and additional data are described in the following columns: A: Gene name, B: Chromosome location, C: Functional hypothesis, D: Subcellular location, E: Expression according to HPA, F: Knockout mice phenotype, G: Additional supporting data, H: Homologs in mouse (M), zebrafish (Z), Xenopus laevis (XL), Xenopus tropicalis (XT), Drosophila melanogaster (F), Caenorhabditis elegans (W), S. cerevisiae (Y), Arabidopsis thaliana (A), Chlamydomonas reinhardtii (C), Trypanosoma brucei (Tb), Paramecium tetraurelia (P), Tetrahymena thermophila (Tt), I: HPA Antibody.


Conclusion

Despite the constant effort of UniProtKB and neXtProt curators to keep up to date with the literature, annotation gaps and delays are unavoidable. The present extensive manual review of the uncharacterized human proteome will make it possible to fill gaps in functional annotation for 113 entries. The most represented biological processes are neurobiology (17 proteins, including CXorf56, which is associated with intellectual disability, and ZSWIM6, which is associated with acromelic frontonasal dysostosis with neurocognitive and motor delay), development (eleven proteins), cilium biology (nine proteins) and hematopoietic, immune and inflammatory processes (eight proteins) (Table S2). UniProtKB/Swiss-Prot curators already updated the function of 44 entries in UniProtKB release 2018_08. This survey will also lead to reconsidering the PE status of 26 proteins. Sixteen entries presently considered as dubious (PE5) might be reconsidered as valid protein-coding genes, whereas 10 entries presently considered as valid should be deprecated. Three PE1-4 entries (A6NCW3, P0DMU4 and A8MWS1) have already been deprecated to PE5 or deleted in UniProtKB release 2018_08, and one PE5 entry (Q3ZCU0) has been upgraded to PE2. These changes will appear in the next neXtProt release. More importantly, this study allowed us to propose a consolidated list of 1,862 uncharacterized proteins, including 1,187 uPE1, 659 uMP and 16 uPE5. Functional hypotheses could be built on 26 uPE1 based on a combination of data. These hypotheses will need to be experimentally tested in human and/or in model organisms with appropriate tools. Importantly, HPA antibodies are available for all of them except CCDC33, CCDC172 and ODF3L1 (Table 1). For PLAC9, TMEM151B and NKG7, large-scale studies on knock-out mice already suggest involvement in lipid metabolism, brain and immune functions, respectively. More focused studies using targeted genome editing technologies on mouse or human adipocytes, neuronal and immune cells should be performed to understand their mechanism of action. Knock-out mice are also available for 10 of the 13 genes proposed to have a role in cilia biology. For two of them the phenotype is not reported, but for the remaining ones, except Casc1, the phenotypes are indicative of cilia defects. More focused experiments on these mice could help explore cilia-related functions in more detail. Proteins expected to be involved in primary cilia function can be tested in a number of human ciliated cell lines, including hTERT-RPE1 cells100. Motile cilia function is more difficult to validate in human cell lines. In particular, there is no available human cell line to study sperm-specific proteins potentially involved in flagellum biology, such as CCDC172. Mice knock-out for TEX44, TMEM151A and TMEM82 are not yet available. Generating such models may be useful to confirm a potential involvement of these proteins in spermatogenesis, central nervous system and metabolic processes, respectively. The characterization of the NXPE family will be more difficult using knock-out approaches due to


probable overlapping functions of the different members. One alternative would be to produce mice knock-out for several NXPE proteins at once. Since they are secreted proteins, another strategy would be to produce recombinant NXPE proteins and inject them in animals. These two approaches have been successfully used to characterize the closely related NXPH family101,102. Involvement of MCRIP2 in stress granule formation could be validated by genome editing technologies on human cell lines. Functional studies of PWWP2B could be done in human cells and Xenopus embryos using the PWWP2A characterization study as a reference38. The possible role of GPATCH11 and its predicted partner helicase DHX15 in kinetochore assembly and mRNA processing could be studied in human cells, or in Xenopus egg extracts41. The role of C21orf33 in mitochondria redox-sensitive pathways and the hypothesis that its overexpression may participate to the general oxidative imbalance observed in DS cells would need to be tested in mouse DS models and in human cells overexpressing C21orf33 or C21orf33 mutants predicted to be catalytically inactive. In the last five years, we estimate that there were between 8 and 10 papers describing newly characterized human proteins published each month. At this pace, the number of uncharacterized proteins in five years will only decrease by 25%. The scientific community should encourage collaborative projects aiming at functionalizing uncharacterized proteins. The HPP project has just committed to move in this direction103. In its first phase, which was focused on protein validation, the HPP project federated mostly proteomics experts. To succeed in its second phase, it will need to recruit specialists in various other fields of human and model organism biology. Biocuration and integration of high quality data in databases are key aspects of this process. 
neXtProt will continue to collaborate with data providers and other bioinformatics resources to transform data into knowledge and provide tools to explore it. The neXtProt data and query tool are open source, and the scientific community is strongly encouraged to use them and to provide feedback and suggestions for improvement. The neXtProt team will be glad to collaborate with other resources and data providers to enhance interoperability, improve the quality of functional predictions, and speed up the functional characterization of the human proteome.
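
The pace estimate quoted in this section can be reproduced with a short back-of-the-envelope calculation. The sketch below is only illustrative: the counts come from the text, but the function name `projected_decrease` and the assumption that each paper characterizes exactly one protein are ours and are not stated in the article.

```python
# Back-of-the-envelope check of the characterization pace quoted in the text.
# Assumption (ours, not the article's): each paper characterizes one protein.

UNCHARACTERIZED = 1862                                     # consolidated list
CATEGORY_COUNTS = {"uPE1": 1187, "uMP": 659, "uPE5": 16}   # breakdown in text
PAPERS_PER_MONTH = (8, 10)                                 # estimated range
MONTHS = 5 * 12                                            # five-year horizon


def projected_decrease(total: int, papers_per_month: int, months: int) -> float:
    """Fraction of `total` characterized after `months` at a constant pace."""
    characterized = min(total, papers_per_month * months)
    return characterized / total


if __name__ == "__main__":
    # The three categories should add up to the consolidated total.
    assert sum(CATEGORY_COUNTS.values()) == UNCHARACTERIZED
    for pace in PAPERS_PER_MONTH:
        share = projected_decrease(UNCHARACTERIZED, pace, MONTHS)
        print(f"{pace} papers/month: {pace * MONTHS} proteins, {share:.1%} of the list")
```

Under this one-protein-per-paper assumption, the lower bound of 8 papers per month corresponds to 480 proteins in five years, roughly a quarter of the 1862 entries, which is consistent with the 25% figure given above; the upper bound of 10 papers per month would give about a third.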

Acknowledgements

This work was supported by the SIB Swiss Institute of Bioinformatics and the University of Geneva.

The neXtProt server is hosted by Vital-IT, the SIB Swiss Institute of Bioinformatics Competence Centre in Bioinformatics and Computational Biology. The authors thank all the neXtProt team and their collaborators for their commitment to their projects, Lionel Breuza and Sylvain Poux from the UniProtKB/Swiss-Prot team for their valuable feedback, and Nicolas Roggli for providing various scripts and help with parsing data.

Supporting information

The following supporting information is available free of charge at the ACS website http://pubs.acs.org

Supporting File S1 (PDF). Additional Materials and Methods section: SPARQL queries used to integrate neXtProt information into Table S3. Figure S1: alignment of C21orf33 (NX_P30042) and PARK7 (NX_Q99497) performed with HHpred.

Supporting File S2 (XLS). Table S1: list of 2,323 entries resulting from the NXQ_00022 query (“Proteins with no function annotated”) on neXtProt release 2018-01-17. Table S2: list of 113 proteins for which characterization papers were found. Table S3: consolidated list of 1,862 uncharacterized proteins obtained after removing the 333 entries with uncertain existence and the 113 entries corresponding to proteins recently characterized from Table S1.

References

(1)

UniProt Consortium. The Universal Protein Knowledgebase. Nucleic Acids Res. 2018, 46 (5), 2699.

(2)

Uhlén, M.; Fagerberg, L.; Hallström, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, Å.; Kampf, C.; Sjöstedt, E.; Asplund, A.; et al. Proteomics. Tissue-Based Map of the Human Proteome. Science 2015, 347 (6220), 1260419.

(3)

Gaudet, P.; Michel, P.-A.; Zahn-Zabal, M.; Britan, A.; Cusin, I.; Domagalski, M.; Duek, P. D.; Gateau, A.; Gleizes, A.; Hinard, V.; et al. The neXtProt Knowledgebase on Human Proteins: 2017 Update. Nucleic Acids Res. 2017, 45 (D1), D177–D182.

(4)

Legrain, P.; Aebersold, R.; Archakov, A.; Bairoch, A.; Bala, K.; Beretta, L.; Bergeron, J.; Borchers, C. H.; Corthals, G. L.; Costello, C. E.; et al. The Human Proteome Project: Current State and Future Direction. Mol. Cell. Proteomics 2011, 10 (7), M111.009993.

(5)

Lonsdale, J.; Thomas, J.; Salvatore, M.; Phillips, R.; Lo, E.; Shad, S.; Hasz, R.; Walters, G.; Garcia, F.; Young, N.; et al. The Genotype-Tissue Expression (GTEx) Project. Nat. Genet. 2013, 45 (6), 580–585.

(6)

Lizio, M.; Harshbarger, J.; Shimoji, H.; Severin, J.; Kasukawa, T.; Sahin, S.; Abugessaisa, I.; Fukuda, S.; Hori, F.; Ishikawa-Kato, S.; et al. Gateways to the FANTOM5 Promoter Level Mammalian Expression Atlas. Genome Biol. 2015, 16 (1), 22.

(7)

Famiglietti, M. L.; Estreicher, A.; Gos, A.; Bolleman, J.; Géhant, S.; Breuza, L.; Bridge, A.; Poux, S.; Redaschi, N.; Bougueleret, L.; et al. Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation. Hum. Mutat. 2014, 35 (8), 927–935.

(8)

Kanehisa, M.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. KEGG as a Reference Resource for Gene and Protein Annotation. Nucleic Acids Res. 2016, 44 (D1), D457-62.

(9)

Fabregat, A.; Sidiropoulos, K.; Garapati, P.; Gillespie, M.; Hausmann, K.; Haw, R.; Jassal, B.; Jupe, S.; Korninger, F.; McKay, S.; et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2016, 44 (D1), D481-7.

(10)

Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; et al. Gene Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25 (1), 25–29.

(11)

Sung, Y. H.; Baek, I.-J.; Kim, Y. H.; Gho, Y. S.; Oh, S. P.; Lee, Y. J.; Lee, H.-W. PIERCE1 Is Critical for Specification of Left-Right Asymmetry in Mice. Sci. Rep. 2016, 6 (1), 27932.

(12)

Komiya, Y.; Mandrekar, N.; Sato, A.; Dawid, I. B.; Habas, R. Custos Controls β-Catenin to Regulate Head Development during Vertebrate Embryogenesis. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (36), 13099–13104.

(13)

Farrell, C. M.; O’Leary, N. A.; Harte, R. A.; Loveland, J. E.; Wilming, L. G.; Wallin, C.; Diekhans, M.; Barrell, D.; Searle, S. M. J.; Aken, B.; et al. Current Status and New Features of the Consensus Coding Sequence Database. Nucleic Acids Res. 2014, 42 (Database issue), D865-72.

(14)

Deutsch, E. W. The PeptideAtlas Project. Methods Mol. Biol. 2010, 604, 285–296.

(15)

Omenn, G. S.; Lane, L.; Lundberg, E. K.; Overall, C. M.; Deutsch, E. W. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J. Proteome Res. 2017, 16 (12), 4281–4287.

(16)

Kusebauch, U.; Campbell, D. S.; Deutsch, E. W.; Chu, C. S.; Spicer, D. A.; Brusniak, M.-Y.; Slagel, J.; Sun, Z.; Stevens, J.; Grimes, B.; et al. Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome. Cell 2016, 166 (3), 766–778.

(17)

Rath, A.; Olry, A.; Dhombres, F.; Brandt, M. M.; Urbero, B.; Ayme, S. Representation of Rare Diseases in Health Information Systems: The Orphanet Approach to Serve a Wide Range of End Users. Hum. Mutat. 2012, 33 (5), 803–808.

(18)

Eppig, J. T.; Smith, C. L.; Blake, J. A.; Ringwald, M.; Kadin, J. A.; Richardson, J. E.; Bult, C. J. Mouse Genome Informatics (MGI): Resources for Mining Mouse Genetic, Genomic, and Biological Data in Support of Primary and Translational Research. Methods Mol. Biol. 2017, 1488, 47–73.

(19)

Finn, R. D.; Attwood, T. K.; Babbitt, P. C.; Bateman, A.; Bork, P.; Bridge, A. J.; Chang, H.-Y.; Dosztányi, Z.; El-Gebali, S.; Fraser, M.; et al. InterPro in 2017-beyond Protein Family and Domain Annotations. Nucleic Acids Res. 2017, 45 (D1), D190–D199.

(20)

Youn, J.-Y.; Dunham, W. H.; Hong, S. J.; Knight, J. D. R.; Bashkurov, M.; Chen, G. I.; Bagci, H.; Rathod, B.; MacLeod, G.; Eng, S. W. M.; et al. High-Density Proximity Mapping Reveals the Subcellular Organization of mRNA-Associated Granules and Bodies. Mol. Cell 2018, 69 (3), 517–532.e11.

(21)

Bish, R.; Cuevas-Polo, N.; Cheng, Z.; Hambardzumyan, D.; Munschauer, M.; Landthaler, M.; Vogel, C. Comprehensive Protein Interactome Analysis of a Key RNA Helicase: Detection of Novel Stress Granule Proteins. Biomolecules 2015, 5 (3), 1441–1466.

(22)

Tomar, D.; Dong, Z.; Shanmughapriya, S.; Koch, D. A.; Thomas, T.; Hoffman, N. E.; Timbalia, S. A.; Goldman, S. J.; Breves, S. L.; Corbally, D. P.; et al. MCUR1 Is a Scaffold Factor for the MCU Complex Function and Promotes Mitochondrial Bioenergetics. Cell Rep. 2016, 15 (8), 1673–1685.

(23)

Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242.

(24)

Sjöblom, L.; Saramäki, O.; Annala, M.; Leinonen, K.; Nättinen, J.; Tolonen, T.; Wahlfors, T.; Nykter, M.; Bova, G. S.; Schleutker, J.; et al. Microseminoprotein-Beta Expression in Different Stages of Prostate Cancer. PLoS One 2016, 11 (3), e0150241.

(25)

Font-Clos, F.; Zapperi, S.; La Porta, C. A. M. Integrative Analysis of Pathway Deregulation in Obesity. NPJ Syst. Biol. Appl. 2017, 3 (1), 18.

(26)

Dickinson, M. E.; Flenniken, A. M.; Ji, X.; Teboul, L.; Wong, M. D.; White, J. K.; Meehan, T. F.; Weninger, W. J.; Westerberg, H.; Adissu, H.; et al. High-Throughput Discovery of Novel Developmental Phenotypes. Nature 2016, 537 (7621), 508–514.

(27)

Ponamarev, M. V; She, Y.-M.; Zhang, L.; Robinson, B. H. Proteomics of Bovine Mitochondrial RNA-Binding Proteins: HES1/KNP-I Is a New Mitochondrial Resident Protein. J. Proteome Res. 2005, 4 (1), 43–52.

(28)

Biosa, A.; Sandrelli, F.; Beltramini, M.; Greggio, E.; Bubacco, L.; Bisaglia, M. Recent Findings on the Physiological Function of DJ-1: Beyond Parkinson’s Disease. Neurobiol. Dis. 2017, 108, 65–72.

(29)

Hauser, D. N.; Mamais, A.; Conti, M. M.; Primiani, C. T.; Kumaran, R.; Dillman, A. A.; Langston, R. G.; Beilina, A.; Garcia, J. H.; Diaz-Ruiz, A.; et al. Hexokinases Link DJ-1 to the PINK1/Parkin Pathway. Mol. Neurodegener. 2017, 12 (1), 70.

(30)

Requejo-Aguilar, R.; Lopez-Fabuel, I.; Jimenez-Blasco, D.; Fernandez, E.; Almeida, A.; Bolaños, J. P. DJ1 Represses Glycolysis and Cell Proliferation by Transcriptionally Up-Regulating Pink1. Biochem. J. 2015, 467 (2), 303–310.

(31)

Hasson, S. A.; Kane, L. A.; Yamano, K.; Huang, C.-H.; Sliter, D. A.; Buehler, E.; Wang, C.; Heman-Ackah, S. M.; Hessa, T.; Guha, R.; et al. High-Content Genome-Wide RNAi Screens Identify Regulators of Parkin Upstream of Mitophagy. Nature 2013, 504 (7479), 291–295.

(32)

Shin, J.-H.; Weitzdoerfer, R.; Fountoulakis, M.; Lubec, G. Expression of Cystathionine Beta-Synthase, Pyridoxal Kinase, and ES1 Protein Homolog (Mitochondrial Precursor) in Fetal Down Syndrome Brain. Neurochem. Int. 2004, 45 (1), 73–79.

(33)

Piccoli, C.; Izzo, A.; Scrima, R.; Bonfiglio, F.; Manco, R.; Negri, R.; Quarato, G.; Cela, O.; Ripoli, M.; Prisco, M.; et al. Chronic Pro-Oxidative State and Mitochondrial Dysfunctions Are More Pronounced in Fibroblasts from Down Syndrome Foeti with Congenital Heart Defects. Hum. Mol. Genet. 2013, 22 (6), 1218–1232.

(34)

Wolf, A. R.; Mootha, V. K. Functional Genomic Analysis of Human Mitochondrial RNA Processing. Cell Rep. 2014, 7 (3), 918–931.

(35)

Malicki, J. J.; Johnson, C. A. The Cilium: Cellular Antenna and Central Processing Unit. Trends Cell Biol. 2017, 27 (2), 126–140.

(36)

Firat-Karalar, E. N.; Sante, J.; Elliott, S.; Stearns, T. Proteomic Analysis of Mammalian Sperm Cells Identifies New Components of the Centrosome. J. Cell Sci. 2014, 127 (Pt 19), 4128–4133.

(37)

Qin, S.; Min, J. Structure and Function of the Nucleosome-Binding PWWP Domain. Trends Biochem. Sci. 2014, 39 (11), 536–547.

(38)

Pünzeler, S.; Link, S.; Wagner, G.; Keilhauer, E. C.; Kronbeck, N.; Spitzer, R. M.; Leidescher, S.; Markaki, Y.; Mentele, E.; Regnard, C.; et al. Multivalent Binding of PWWP2A to H2A.Z Regulates Mitosis and Neural Crest Differentiation. EMBO J. 2017, 36 (15), 2263–2279.

(39)

Gama-Carvalho, M.; Carmo-Fonseca, M. The Rules and Roles of Nucleocytoplasmic Shuttling Proteins. FEBS Lett. 2001, 498 (2–3), 157–163.

(40)

Lionaki, E.; Gkikas, I.; Tavernarakis, N. Differential Protein Distribution between the Nucleus and Mitochondria: Implications in Aging. Front. Genet. 2016, 7, 162.

(41)

Grenfell, A. W.; Heald, R.; Strzelecka, M. Mitotic Noncoding RNA Processing Promotes Kinetochore and Spindle Assembly in Xenopus. J. Cell Biol. 2016, 214 (2), 133–141.

(42)

Mitchison, H. M.; Valente, E. M. Motile and Non-Motile Cilia in Human Pathology: From Function to Phenotypes. J. Pathol. 2017, 241 (2), 294–309.

(43)

Petersen, C.; Aumüller, G.; Bahrami, M.; Hoyer-Fender, S. Molecular Cloning of Odf3 Encoding a Novel Coiled-Coil Protein of Sperm Tail Outer Dense Fibers. Mol. Reprod. Dev. 2002, 61 (1), 102–112.

(44)

Chaki, M.; Airik, R.; Ghosh, A. K.; Giles, R. H.; Chen, R.; Slaats, G. G.; Wang, H.; Hurd, T. W.; Zhou, W.; Cluckey, A.; et al. Exome Capture Reveals ZNF423 and CEP164 Mutations, Linking Renal Ciliopathies to DNA Damage Response Signaling. Cell 2012, 150 (3), 533–548.

(45)

Yamaguchi, A.; Kaneko, T.; Inai, T.; Iida, H. Molecular Cloning and Subcellular Localization of Tektin2-Binding Protein 1 (Ccdc 172) in Rat Spermatozoa. J. Histochem. Cytochem. 2014, 62 (4), 286–297.

(46)

Sigg, M. A.; Menchen, T.; Lee, C.; Johnson, J.; Jungnickel, M. K.; Choksi, S. P.; Garcia, G.; Busengdal, H.; Dougherty, G. W.; Pennekamp, P.; et al. Evolutionary Proteomics Uncovers Ancient Associations of Cilia with Signaling Pathways. Dev. Cell 2017, 43 (6), 744–762.e11.

(47)

Jungnickel, M. K.; Sutton, K. A.; Baker, M. A.; Cohen, M. G.; Sanderson, M. J.; Florman, H. M. The Flagellar Protein Enkurin Is Required for Mouse Sperm Motility and for Transport through the Female Reproductive Tract. Biol. Reprod. 2018 (in press).

(48)

Miyata, H.; Castaneda, J. M.; Fujihara, Y.; Yu, Z.; Archambeault, D. R.; Isotani, A.; Kiyozumi, D.; Kriseman, M. L.; Mashiko, D.; Matsumura, T.; et al. Genome Engineering Uncovers 54 Evolutionarily Conserved and Testis-Enriched Genes That Are Not Required for Male Fertility in Mice. Proc. Natl. Acad. Sci. U. S. A. 2016, 113 (28), 7704–7710.

(49)

Khan, M.; Jabeen, N.; Khan, T.; Hussain, H. M. J.; Ali, A.; Khan, R.; Jiang, L.; Li, T.; Tao, Q.; Zhang, X.; et al. The Evolutionarily Conserved Genes: Tex37, Ccdc73, Prss55 and Nxt2 Are Dispensable for Fertility in Mice. Sci. Rep. 2018, 8 (1), 4975.

(50)

Jumeau, F.; Com, E.; Lane, L.; Duek, P.; Lagarrigue, M.; Lavigne, R.; Guillot, L.; Rondel, K.; Gateau, A.; Melaine, N.; et al. Human Spermatozoa as a Model for Detecting Missing Proteins in the Context of the Chromosome-Centric Human Proteome Project. J. Proteome Res. 2015, 14 (9), 3606–3620.

(51)

Rolland, T.; Taşan, M.; Charloteaux, B.; Pevzner, S. J.; Zhong, Q.; Sahni, N.; Yi, S.; Lemmens, I.; Fontanillo, C.; Mosca, R.; et al. A Proteome-Scale Map of the Human Interactome Network. Cell 2014, 159 (5), 1212–1226.

(52)

Adams, S. H.; Esser, V.; Brown, N. F.; Ing, N. H.; Johnson, L.; Foster, D. W.; McGarry, J. D. Expression and Possible Role of Muscle-Type Carnitine Palmitoyltransferase I during Sperm Development in the Rat. Biol. Reprod. 1998, 59 (6), 1399–1405.

(53)

Amaral, A.; Castillo, J.; Estanyol, J. M.; Ballescà, J. L.; Ramalho-Santos, J.; Oliva, R. Human Sperm Tail Proteome Suggests New Endogenous Metabolic Pathways. Mol. Cell. Proteomics 2013, 12 (2), 330–342.

(54)

Barrow, A. D.; Trowsdale, J. The Extended Human Leukocyte Receptor Complex: Diverse Ways of Modulating Immune Responses. Immunol. Rev. 2008, 224 (1), 98–123.

(55)

Medley, Q. G.; Kedersha, N.; O’Brien, S.; Tian, Q.; Schlossman, S. F.; Streuli, M.; Anderson, P. Characterization of GMP-17, a Granule Membrane Protein That Moves to the Plasma Membrane of Natural Killer Cells Following Target Cell Recognition. Proc. Natl. Acad. Sci. U. S. A. 1996, 93 (2), 685–689.

(56)

Craig, A. M.; Kang, Y. Neurexin-Neuroligin Signaling in Synapse Development. Curr. Opin. Neurobiol. 2007, 17 (1), 43–52.

(57)

Hulur, I.; Gamazon, E. R.; Skol, A. D.; Xicola, R. M.; Llor, X.; Onel, K.; Ellis, N. A.; Kupfer, S. S. Enrichment of Inflammatory Bowel Disease and Colorectal Cancer Risk Variants in Colon Expression Quantitative Trait Loci. BMC Genomics 2015, 16 (1), 138.

(58)

Brodziak, F.; Meharg, C.; Blaut, M.; Loh, G. Differences in Mucosal Gene Expression in the Colon of Two Inbred Mouse Strains after Colonization with Commensal Gut Bacteria. PLoS One 2013, 8 (8), e72317.

(59)

Son, J. S.; Khair, S.; Pettet, D. W.; Ouyang, N.; Tian, X.; Zhang, Y.; Zhu, W.; Mackenzie, G. G.; Robertson, C. E.; Ir, D.; et al. Altered Interactions between the Gut Microbiome and Colonic Mucosa Precede Polyposis in APCMin/+ Mice. PLoS One 2015, 10 (6), e0127985.

(60)

Holzer, P.; Farzi, A. Neuropeptides and the Microbiota-Gut-Brain Axis. Adv. Exp. Med. Biol. 2014, 817, 195–219.

(61)

Georgiadi, A.; Boekschoten, M. V; Müller, M.; Kersten, S. Detailed Transcriptomics Analysis of the Effect of Dietary Fatty Acids on Gene Expression in the Heart. Physiol. Genomics 2012, 44 (6), 352–361.

(62)

Ivliev, A. E.; ’t Hoen, P. A. C.; van Roon-Mom, W. M. C.; Peters, D. J. M.; Sergeeva, M. G. Exploring the Transcriptome of Ciliated Cells Using in Silico Dissection of Human Tissues. PLoS One 2012, 7 (4), e35618.

(63)

Ross, A. J.; Dailey, L. A.; Brighton, L. E.; Devlin, R. B. Transcriptional Profiling of Mucociliary Differentiation in Human Airway Epithelial Cells. Am. J. Respir. Cell Mol. Biol. 2007, 37 (2), 169–185.

(64)

Geremek, M.; Ziętkiewicz, E.; Bruinenberg, M.; Franke, L.; Pogorzelski, A.; Wijmenga, C.; Witt, M. Ciliary Genes Are Down-Regulated in Bronchial Tissue of Primary Ciliary Dyskinesia Patients. PLoS One 2014, 9 (2), e88216.

(65)

Nevers, Y.; Prasad, M. K.; Poidevin, L.; Chennen, K.; Allot, A.; Kress, A.; Ripp, R.; Thompson, J. D.; Dollfus, H.; Poch, O.; et al. Insights into Ciliary Genes and Evolution from Multi-Level Phylogenetic Profiling. Mol. Biol. Evol. 2017, 34 (8), 2016–2034.

(66)

Han, X.; Xie, H.; Wang, Y.; Zhao, C. Radial Spoke Proteins Regulate Otolith Formation during Early Zebrafish Development. FASEB J. 2018, 32 (7), 3984–3992.

(67)

Geremek, M.; Bruinenberg, M.; Ziętkiewicz, E.; Pogorzelski, A.; Witt, M.; Wijmenga, C. Gene Expression Studies in Cells from Primary Ciliary Dyskinesia Patients Identify 208 Potential Ciliary Genes. Hum. Genet. 2011, 129 (3), 283–293.

(68)

Liu, P.; Wang, Y.; Vikis, H.; Maciag, A.; Wang, D.; Lu, Y.; Liu, Y.; You, M. Candidate Lung Tumor Susceptibility Genes Identified through Whole-Genome Association Analyses in Inbred Mice. Nat. Genet. 2006, 38 (8), 888–895.

(69)

Sinnott, R.; Winters, L.; Larson, B.; Mytsa, D.; Taus, P.; Cappell, K. M.; Whitehurst, A. W. Mechanisms Promoting Escape from Mitotic Stress-Induced Tumor Cell Death. Cancer Res. 2014, 74 (14), 3857–3869.

(70)

Yuan, X.; Yang, S. Cilia/Ift Protein and Motor-Related Bone Diseases and Mouse Models. Front. Biosci. (Landmark Ed.) 2015, 20, 515–555.

(71)

Rual, J.-F.; Venkatesan, K.; Hao, T.; Hirozane-Kishikawa, T.; Dricot, A.; Li, N.; Berriz, G. F.; Gibbons, F. D.; Dreze, M.; Ayivi-Guedehoussou, N.; et al. Towards a Proteome-Scale Map of the Human Protein-Protein Interaction Network. Nature 2005, 437 (7062), 1173–1178.

(72)

Baron Gaillard, C. L.; Pallesi-Pocachard, E.; Massey-Harroche, D.; Richard, F.; Arsanto, J.-P.; Chauvin, J.-P.; Lecine, P.; Krämer, H.; Borg, J.-P.; Le Bivic, A. Hook2 Is Involved in the Morphogenesis of the Primary Cilium. Mol. Biol. Cell 2011, 22 (23), 4549–4562.

(73)

Pinto, S. M.; Manda, S. S.; Kim, M.-S.; Taylor, K.; Selvan, L. D. N.; Balakrishnan, L.; Subbannayya, T.; Yan, F.; Prasad, T. S. K.; Gowda, H.; et al. Functional Annotation of Proteome Encoded by Human Chromosome 22. J. Proteome Res. 2014, 13 (6), 2749–2760.

(74)

Vandenbrouck, Y.; Lane, L.; Carapito, C.; Duek, P.; Rondel, K.; Bruley, C.; Macron, C.; Gonzalez de Peredo, A.; Couté, Y.; Chaoui, K.; et al. Looking for Missing Proteins in the Proteome of Human Spermatozoa: An Update. J. Proteome Res. 2016, 15 (11), 3998–4019.

(75)

Wilhelm, M.; Schlegl, J.; Hahne, H.; Gholami, A. M.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H.; et al. Mass-Spectrometry-Based Draft of the Human Proteome. Nature 2014, 509 (7502), 582–587.

(76)

Virant-Klun, I.; Leicht, S.; Hughes, C.; Krijgsveld, J. Identification of Maturation-Specific Proteins by Single-Cell Proteomics of Human Oocytes. Mol. Cell. Proteomics 2016, 15 (8), 2616–2627.

(77)

Hu, J.; Wang, F.; Zhu, X.; Yuan, Y.; Ding, M.; Gao, S. Mouse ZAR1-like (XM_359149) Colocalizes with mRNA Processing Components and Its Dominant-Negative Mutant Caused Two-Cell-Stage Embryonic Arrest. Dev. Dyn. 2010, 239 (2), 407–424.

(78)

Lu, X.; Wang, N.; Long, X.-B.; You, X.-J.; Cui, Y.-H.; Liu, Z. The Cytokine-Driven Regulation of Secretoglobins in Normal Human Upper Airway and Their Expression, Particularly That of Uteroglobin-Related Protein 1, in Chronic Rhinosinusitis. Respir. Res. 2011, 12 (1), 28.

(79)

Dear, T. N.; Boehm, T.; Keverne, E. B.; Rabbitts, T. H. Novel Genes for Potential Ligand-Binding Proteins in Subregions of the Olfactory Mucosa. EMBO J. 1991, 10 (10), 2813–2819.

(80)

Oelschläger, H. H. A. The Dolphin Brain--a Challenge for Synthetic Neurobiology. Brain Res. Bull. 2008, 75 (2–4), 450–459.

(81)

Arnaiz, O.; Malinowska, A.; Klotz, C.; Sperling, L.; Dadlez, M.; Koll, F.; Cohen, J. Cildb: A Knowledgebase for Centrosomes and Cilia. Database (Oxford). 2009, 2009, bap022.

(82)

Broadhead, R.; Dawe, H. R.; Farr, H.; Griffiths, S.; Hart, S. R.; Portman, N.; Shaw, M. K.; Ginger, M. L.; Gaskell, S. J.; McKean, P. G.; et al. Flagellar Motility Is Required for the Viability of the Bloodstream Trypanosome. Nature 2006, 440 (7081), 224–227.

(83)

Smith, J. C.; Northey, J. G. B.; Garg, J.; Pearlman, R. E.; Siu, K. W. M. Robust Method for Proteome Analysis by MS/MS Using an Entire Translated Genome: Demonstration on the Ciliome of Tetrahymena Thermophila. J. Proteome Res. 2005, 4 (3), 909–919.

(84)

Pazour, G. J.; Agrin, N.; Leszyk, J.; Witman, G. B. Proteomic Analysis of a Eukaryotic Cilium. J. Cell Biol. 2005, 170 (1), 103–113.

(85)

Keller, L. C.; Romijn, E. P.; Zamora, I.; Yates, J. R.; Marshall, W. F. Proteomic Analysis of Isolated Chlamydomonas Centrioles Reveals Orthologs of Ciliary-Disease Genes. Curr. Biol. 2005, 15 (12), 1090–1098.

(86)

Tam, L.-W.; Lefebvre, P. A. The Chlamydomonas MBO2 Locus Encodes a Conserved Coiled-Coil Protein Important for Flagellar Waveform Conversion. Cell Motil. Cytoskeleton 2002, 51 (4), 197–212.

(87)

Hayes, J. M.; Kim, S. K.; Abitua, P. B.; Park, T. J.; Herrington, E. R.; Kitayama, A.; Grow, M. W.; Ueno, N.; Wallingford, J. B. Identification of Novel Ciliogenesis Factors Using a New in Vivo Model for Mucociliary Epithelial Development. Dev. Biol. 2007, 312 (1), 115–130.

(88)

Marra, A. N.; Wingert, R. A. Epithelial Cell Fate in the Nephron Tubule Is Mediated by the ETS Transcription Factors Etv5a and Etv4 during Zebrafish Kidney Development. Dev. Biol. 2016, 411 (2), 231–245.

(89)

Assis, L. H. C.; Crespo, D.; Morais, R. D. V. S.; França, L. R.; Bogerd, J.; Schulz, R. W. INSL3 Stimulates Spermatogonial Differentiation in Testis of Adult Zebrafish (Danio Rerio). Cell Tissue Res. 2016, 363 (2), 579–588.

(90)

Senthilan, P. R.; Piepenbrock, D.; Ovezmyradov, G.; Nadrowski, B.; Bechstedt, S.; Pauls, S.; Winkler, M.; Möbius, W.; Howard, J.; Göpfert, M. C. Drosophila Auditory Organ Genes and Genetic Hearing Defects. Cell 2012, 150 (5), 1042–1054.

(91)

Hunt-Newbury, R.; Viveiros, R.; Johnsen, R.; Mah, A.; Anastas, D.; Fang, L.; Halfnight, E.; Lee, D.; Lin, J.; Lorch, A.; et al. High-Throughput in Vivo Analysis of Gene Expression in Caenorhabditis Elegans. PLoS Biol. 2007, 5 (9), e237.

(92)

Elphick, M. R.; Mirabeau, O.; Larhammar, D. Evolution of Neuropeptide Signalling Systems. J. Exp. Biol. 2018, 221 (Pt 3), jeb151092.

(93)

Heimeier, R. A.; Das, B.; Buchholz, D. R.; Fiorentino, M.; Shi, Y.-B. Studies on Xenopus Laevis Intestine Reveal Biological Pathways Underlying Vertebrate Gut Adaptation from Embryo to Adult. Genome Biol. 2010, 11 (5), R55.

(94)

Flajnik, M. F. A Cold-Blooded View of Adaptive Immunity. Nat. Rev. Immunol. 2018, 18 (7), 438–453.

(95)

Neely, H. R.; Flajnik, M. F. Emergence and Evolution of Secondary Lymphoid Organs. Annu. Rev. Cell Dev. Biol. 2016, 32, 693–711.

(96)

Lovell, P. V; Wirthlin, M.; Wilhelm, L.; Minx, P.; Lazar, N. H.; Carbone, L.; Warren, W. C.; Mello, C. V. Conserved Syntenic Clusters of Protein Coding Genes Are Missing in Birds. Genome Biol. 2014, 15 (12), 565.

(97)

Lee, C.; Lee, J.; Lee, J.; Park, C. Characterization of the Escherichia Coli YajL, YhbO and ElbB Glyoxalases. FEMS Microbiol. Lett. 2016, 363 (3), fnv239.

(98)

Masuda, T.; Wada, Y.; Kawamura, S. ES1 Is a Mitochondrial Enlarging Factor Contributing to Form Mega-Mitochondria in Zebrafish Cones. Sci. Rep. 2016, 6 (1), 22360.

(99)

Heininger, A. U.; Hackert, P.; Andreou, A. Z.; Boon, K.-L.; Memet, I.; Prior, M.; Clancy, A.; Schmidt, B.; Urlaub, H.; Schleiff, E.; et al. Protein Cofactor Competition Regulates the Action of a Multifunctional RNA Helicase in Different Pathways. RNA Biol. 2016, 13 (3), 320–330.

(100)

Katoh, Y.; Michisaka, S.; Nozaki, S.; Funabashi, T.; Hirano, T.; Takei, R.; Nakayama, K. Practical Method for Targeted Disruption of Cilia-Related Genes by Using CRISPR/Cas9-Mediated, Homology-Independent Knock-in System. Mol. Biol. Cell 2017, 28 (7), 898–906.

(101)

Kinzfogl, J.; Hangoc, G.; Broxmeyer, H. E. Neurexophilin 1 Suppresses the Proliferation of Hematopoietic Progenitor Cells. Blood 2011, 118 (3), 565–575.

(102)

Beglopoulos, V.; Montag-Sallaz, M.; Rohlmann, A.; Piechotta, K.; Ahmad, M.; Montag, D.; Missler, M. Neurexophilin 3 Is Highly Localized in Cortical and Cerebellar Regions and Is Functionally Important for Sensorimotor Gating and Motor Coordination. Mol. Cell. Biol. 2005, 25 (16), 7278–7288.

(103)

Paik, Y.-K.; Omenn, G. S.; Hancock, W. S.; Lane, L.; Overall, C. M. Advances in the Chromosome-Centric Human Proteome Project: Looking to the Future. Expert Rev. Proteomics 2017, 14 (12), 1059–1071.
