
Article

Bioinformatics annotation of human Y chromosome-encoded protein pathways and interactions Deivendran Rengaraj, Woo-Sung Kwon, and Myung-Geol Pang J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00491 • Publication Date (Web): 17 Aug 2015 Downloaded from http://pubs.acs.org on August 19, 2015


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Bioinformatics annotation of human Y chromosome-encoded protein pathways and interactions

Deivendran Rengaraj, Woo-Sung Kwon, and Myung-Geol Pang*

Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi-Do 456-756, Republic of Korea

*Author to whom correspondence should be addressed: Myung-Geol Pang, Chung-Ang University, Anseong, Gyeonggi-Do 456-756, Republic of Korea; Tel: +82.31.670.4841; Fax: +82.31.675.9001; E-mail: [email protected]

ACS Paragon Plus Environment


ABSTRACT

In this paper, we performed a comprehensive analysis of human Y chromosome-encoded proteins, their pathways, and their interactions using bioinformatics tools. From NCBI annotation release 107 of the human genome, we retrieved a total of 66 proteins encoded on the Y chromosome. Most of the retrieved proteins also matched proteins listed in the core databases of the Human Proteome Project, including neXtProt, PeptideAtlas, and the Human Protein Atlas. When we examined the pathways of human Y-encoded proteins through the KEGG database and the Pathway Studio software, many of the proteins fell into categories related to cell signaling pathways. Using the STRING program, we found a total of 49 human Y-encoded proteins showing strong/medium interaction with each other. Using the Pathway Studio software, we found a total of 16 proteins that interact with proteins encoded on other chromosomes. In particular, the SRY protein interacted with 17 proteins encoded on other chromosomes. Additionally, we aligned the sequences of human Y-encoded proteins with the sequences of chimpanzee and mouse Y-encoded proteins using the NCBI BLAST program. This analysis revealed a significant number of orthologous proteins among human, chimpanzee, and mouse. Collectively, our findings provide the scientific community with additional information on the human Y chromosome-encoded proteins.

KEYWORDS: Chromosome-Centric Human Proteome Project (C-HPP), Human Y chromosome, Protein annotation, Protein pathways, Protein interactions, Protein orthologs


ADDITIONAL ABBREVIATIONS

AANAT - aralkylamine N-acetyltransferase; ACE2 - angiotensin I converting enzyme 2; AMH - anti-Mullerian hormone; ANKRD27 - ankyrin repeat domain 27; AP3D1 - adaptor-related protein complex 3, delta 1 subunit; BET1 - blocked early in transport 1 homolog; CALR - calreticulin; CBX2 - chromobox homolog 2; CD40 - CD40 molecule, TNF receptor superfamily member 5; CD44 - CD44 molecule; CD81 - CD81 molecule; CDC6 - cell division cycle 6 homolog; CDX1 - caudal type homeobox 1; CDX2 - caudal type homeobox 2; CEBPA - CCAAT/enhancer binding protein (C/EBP), alpha; CRX - cone-rod homeobox; CSF2 - colony stimulating factor 2; CSF2RB - colony stimulating factor 2 receptor, beta; CTSB - cathepsin B; DAZAP1 - DAZ associated protein 1; EN1 - engrailed homeobox 1; EN2 - engrailed homeobox 2; EP300 - E1A binding protein p300; EZH2 - enhancer of zeste homolog 2; FGFR3 - fibroblast growth factor receptor 3; FOSL1 - FOS-like antigen 1; GNB2L1 - guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1; GSN - gelsolin; HDAC3 - histone deacetylase 3; HNRNPF - heterogeneous nuclear ribonucleoprotein F; HOXA9 - homeobox A9; HSPA1A - heat shock 70kDa protein 1A; ICAM1 - intercellular adhesion molecule 1; IFNG - interferon, gamma; IL2RG - interleukin 2 receptor, gamma; IL3 - interleukin 3; IL4 - interleukin 4; IL5 - interleukin 5; IL5RA - interleukin 5 receptor, alpha; IL7R - interleukin 7 receptor; AZF1 - azoospermia factor 1; IL9 - interleukin 9; INSL3 - insulin-like 3; ITGB1 - integrin, beta 1; JAK1 - janus kinase 1; JAK2 - janus kinase 2; KPNB1 - karyopherin beta 1; L1CAM - L1 cell adhesion molecule; LAMP1 - lysosomal-associated membrane protein 1; MAOA - monoamine oxidase A; MAP3K14 - mitogen-activated protein kinase kinase kinase 14; MAS1 - MAS1 oncogene; MBD2 - methyl-CpG binding domain protein 2; MECP2 - methyl CpG binding protein 2; MPP1 - membrane protein, palmitoylated 1, 55kDa; MTERF - mitochondrial transcription termination factor; NCL - nucleolin; NFKB2 - nuclear factor of kappa light polypeptide gene enhancer in B-cells 2; NPPB - natriuretic peptide B; NR5A1 - nuclear receptor subfamily 5, group A, member 1; NTF3 - neurotrophin 3; ORC5 - origin recognition complex, subunit 5; PCNA - proliferating cell nuclear antigen; PDE11A - phosphodiesterase 11A; PILRB - paired immunoglobin-like type 2 receptor beta; PLP1 - proteolipid protein 1; POLG - polymerase (DNA directed), gamma; PRDM1 - PR domain containing 1, with ZNF domain; PROX1 - prospero homeobox 1; PTGDS - prostaglandin D2 synthase 21kDa; PTPN11 - protein tyrosine phosphatase, non-receptor type 11; PTPN6 - protein tyrosine phosphatase, non-receptor type 6; RAB7A - RAB7A, member RAS oncogene family; RHBDF1 - rhomboid 5 homolog 1; RPS6 - ribosomal protein S6; RUNX1 - runt-related transcription factor 1; SOX9 - SRY (sex determining region Y)-box 9; SP1 - Sp1 transcription factor; SPI1 - spleen focus forming virus (SFFV) proviral integration oncogene spi1; SSBP1 - single-stranded DNA binding protein 1; STX12 - syntaxin 12; STX3 - syntaxin 3; STX4 - syntaxin 4; STX6 - syntaxin 6; STX7 - syntaxin 7; SUZ12 - suppressor of zeste 12 homolog; SYT7 - synaptotagmin VII; T - T, brachyury homolog; TBPL1 - TBP-like 1; TCF21 - transcription factor 21; TES - testis derived transcript; TGIF2LY - TGFB-induced factor homeobox 2-like, Y-linked; TLR9 - toll-like receptor 9; TNF - tumor necrosis factor; TNFRSF8 - tumor necrosis factor receptor superfamily, member 8; TP53 - tumor protein p53; TRA2B - transformer 2 beta homolog; TSLP - thymic stromal lymphopoietin; UBXN7 - UBX domain protein 7; VAMP2 - vesicle-associated membrane protein 2; VAMP4 - vesicle-associated membrane protein 4; VAMP8 - vesicle-associated membrane protein 8; VTI1B - vesicle transport through interaction with t-SNAREs homolog 1B; WDR5 - WD repeat domain 5; WT1 - Wilms tumor 1; ZBTB16 - zinc finger and BTB domain containing 16; ZNF143 - zinc finger protein 143

1. INTRODUCTION

The Y chromosome is a sex chromosome found primarily in male mammals. It is transmitted through the male gamete and determines male sex in humans, non-human primates, and other mammals. During fertilization, fusion of a male-derived gamete carrying the Y chromosome with the female-derived gamete carrying an X chromosome results in male sex determination, whereas the fusion of X chromosome-containing gametes from the male and female parents results in a female embryo. The human Y chromosome differs from the X chromosome and the rest of the chromosomes primarily in its size and its male sex-determining function. The human Y chromosome is about one-third the size of the X chromosome, and its male sex-determining function is exclusively located on the short arm.1 Male sex determination is a result of gonadal sex determination during embryonic development. In the presence of the Y chromosome, the embryonic gonads become testes, while in the absence of the Y chromosome, the gonads become ovaries.1,2 Several studies reported that deletions or mutations, particularly in the long arm of the Y chromosome, may cause male infertility and may also affect the reproductive performance of the sons of affected men.3-5 Conversely, translocation of Y chromosomal segments onto the X chromosome may cause gonadal dysgenesis in females.2,6,7 The hypothesis that exchange between X and Y chromosomes occurs during meiosis suggests that some individuals are XX males carrying Y chromosome fragments on one X chromosome, while others are XY females missing Y chromosome segments important for determining male sex.7 The mammalian Y chromosomes are subject to far more mutation, deletion, and insertion than other chromosomes,8 because they are exposed to an oxidative environment and undergo many more cell divisions in the testis. Another contributing factor may be the presence of repetitive sequences: recombination between homologous sequences in palindromes often removes several fertility genes on the Y chromosome.8

In contrast to deletion, the presence of more than one Y chromosome has been clinically identified in several cases. Studies of boys and adult men with the XYY karyotype revealed several disorders, including tall stature, psychiatric and behavioral problems, and genitourinary and reproductive abnormalities.9-12 The XYY karyotype may occur due to the failure of YY separation during meiosis in paternal gametogenesis.13 Similarly, studies of human males with an XYYY (Y trisomy) karyotype show significant developmental delay, mental and behavioral problems, and reproductive abnormalities.14-16 YYY sperm may be produced by secondary nondisjunction in meiosis II from an XYY primary spermatocyte.14,17 By comparison, an XYYYY (Y tetrasomy) karyotype was found in a few human males; they exhibited mental problems, hormonal disorders, and unusual facial features.18-20 Additionally, testicular insufficiency and low testosterone in affected boys, and hypogonadism with azoospermia in affected men, were reported.20,21 Nondisjunction during spermatogonial mitosis followed by meiotic nondisjunction might be a potential cause for the development of sperm carrying four Y chromosomes.21 Phenotypic features such as facial abnormalities and mental deficiency in multiple-Y individuals might reflect ascertainment bias, as such children or adults might be preferentially selected for cytogenetic analysis. During the last few decades, researchers have put enormous effort into discovering human Y chromosome-specific genes, their transcribed RNAs, and their encoded proteins. As a result, most of the genes and encoded proteins responsible for male-sex determination, testis development, and spermatogenesis have been discovered and characterized in humans.1,8

The international consortium for the Chromosome-Centric Human Proteome Project (C-HPP) was established in 2011.22,23 The major goal of C-HPP was to map and annotate all known and missing proteins encoded by the genes on each human chromosome using various approaches, such as localization in normal and diseased tissues, quantification by mass spectrometry, isoform identification, functional characterization, and antibody-based analysis.22,23 The guidelines for C-HPP were set, and international teams have selected chromosomes.23,24 Mapping the proteome of the human Y chromosome was undertaken by an Iranian team.24 In relation to C-HPP, Jangravi and colleagues24 reported a recent update of the protein-encoding genes in the male-specific region (MSY) of the human Y chromosome and their association with various traits and diseases. They also reported information about protein-protein interactions and post-translational modifications of protein-coding genes in the MSY.24 To add information relevant to C-HPP, we performed a comprehensive analysis of human Y chromosome-encoded proteins, their pathways, and their interactions using bioinformatics tools.

2. EXPERIMENTAL SECTION

2.1. Search for the genome annotated Y chromosomes in human and other mammals

In this study, we primarily used the National Center for Biotechnology Information (NCBI) eukaryotic genome annotation resource database25,26 to search for genome-annotated Y chromosomes in human and other mammals. The NCBI eukaryotic genome annotation resource database provided annotated chromosome information for 20 primates, 14 rodents, 17 even-toed ungulates and whales, and 34 other mammals. However, annotated information for the Y chromosome was only available for seven mammalian species: human (Homo sapiens, Annotation release 107), chimpanzee (Pan troglodytes, Annotation release 103), green monkey (Chlorocebus sabaeus, Annotation release 100), white-tufted-ear marmoset (Callithrix jacchus, Annotation release 102), mouse (Mus musculus, Annotation release 105), Norway rat (Rattus norvegicus, Annotation release 105), and pig (Sus scrofa, Annotation release 105)27-33 (Table 1). Information on the proteins encoded on the Y chromosomes of human, chimpanzee, and mouse was retrieved from the respective NCBI release and used for the subsequent analyses described below.

2.2. Analysis of annotated human Y chromosome-encoded proteins

The latest NCBI human genome annotation release 107 contains several isoforms per gene/protein. Since all isoforms represent one common gene/protein, we selected only the first isoform and the corresponding protein symbol for further analysis. The NCBI list of proteins encoded on the human Y chromosome was then compared with the lists of proteins available in core databases of the Human Proteome Project, including neXtProt (Release 2014-09-19) and PeptideAtlas (Build 2014-08), to identify the protein evidence and identification status, respectively. neXtProt is a web-based platform that provides extensive knowledge on human proteins.34 PeptideAtlas is also a web-based platform; it provides a list of human proteins with a stringent 1% false discovery rate.35 Additionally, the NCBI list of proteins was searched against the Human Protein Atlas (Version 13) database to identify the detection status. The Human Protein Atlas provides extensive knowledge on immunohistochemical or transcript-only expression of human proteins in all major tissues and organs.36 To identify the X homologs of human Y chromosome-encoded proteins, we submitted the FASTA sequences of human Y chromosome-encoded proteins to the NCBI Protein BLAST program, using the protein-protein BLAST (BLASTP) algorithm37 with the database set to non-redundant protein sequences and the organism set to Homo sapiens (taxid:9606). The protein BLAST program searches a protein database using a protein query and reports the percent identity and coverage scores between the query protein and the identified proteins.
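The isoform selection and list comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the placeholder isoform identifiers, the symbol GENE_A, and the toy database listing are all assumptions made for illustration, and real input would come from the NCBI, neXtProt, and PeptideAtlas exports.

```python
def first_isoform_per_gene(records):
    """Keep the first listed isoform for each gene symbol.

    `records` is an ordered iterable of (symbol, isoform_id) pairs.
    """
    selected = {}
    for symbol, isoform_id in records:
        # setdefault keeps the earlier isoform when a symbol repeats
        selected.setdefault(symbol, isoform_id)
    return selected


def compare_with_database(ncbi_symbols, db_symbols):
    """Split symbols into matched, NCBI-only, and database-only groups."""
    ncbi, db = set(ncbi_symbols), set(db_symbols)
    return {
        "matched": sorted(ncbi & db),
        "ncbi_only": sorted(ncbi - db),
        "db_only": sorted(db - ncbi),
    }


# Toy input: isoform identifiers and GENE_A are placeholders, not real records.
toy_records = [
    ("SRY", "isoform_1"),
    ("GENE_A", "isoform_1"),
    ("GENE_A", "isoform_2"),
    ("EIF1AY", "isoform_1"),
]
toy_database = ["SRY", "EIF1AY", "PRKY"]  # hypothetical database listing

selected = first_isoform_per_gene(toy_records)
report = compare_with_database(selected, toy_database)
print(report)
```

Using sets makes the comparison independent of list order, and the "db_only" group corresponds to entries such as pseudogenes that a database lists but the NCBI protein list does not.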

2.3. Analysis of human Y chromosome-encoded protein pathways and interactions

The major goal of this manuscript was to determine the cellular pathways and protein-protein interactions involving human Y chromosome-encoded proteins. Initially, the cellular pathways of proteins encoded on the human Y chromosome were searched using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway-mapping database with the organism-specific parameter (Homo sapiens: hsa). The KEGG pathway-mapping database provides a pathway search tool, where given genes or proteins are searched against KEGG pathway maps in an organism-specific manner.38,39 Concurrently, we entered the human Y chromosome-encoded proteins into the Pathway Studio software (Version 9.0). Pathway Studio is a software application used to analyze biological pathways, gene regulation networks, and protein interaction maps. It is primarily based on information extracted from the NCBI PubMed literature and matched against known human protein pathways.40-42 The cellular pathway results obtained from the KEGG database and the Pathway Studio software were then integrated. To analyze the protein-protein interactions of human Y chromosome-encoded proteins, we used the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, Version 10) database and the Pathway Studio software. The STRING database facilitates analysis of gene/protein interactions in an organism-specific manner using commonly available sources, including the NCBI PubMed literature database.43 We used the STRING database to determine interactions among human Y chromosome-encoded proteins. Concurrently, we used the Pathway Studio software to determine the interactions of human Y chromosome-encoded proteins with proteins encoded on other chromosomes (X or somatic).
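As a rough illustration of restricting an interaction list to pairs in which both partners are Y-encoded, the following Python sketch filters scored protein pairs and labels them by confidence. The cutoffs of 0.4 (medium) and 0.7 (high) follow the conventional STRING confidence bands; whether the strong/medium categories in this study used the same cutoffs is an assumption, and the scores below are invented toy values.

```python
def confidence_label(score, medium=0.4, high=0.7):
    """Label a combined score; cutoffs are assumed, not taken from the paper."""
    if score >= high:
        return "strong"
    if score >= medium:
        return "medium"
    return "weak"


def interacting_proteins(edges, y_proteins, min_score=0.4):
    """Return Y-encoded proteins with at least one qualifying Y-Y interaction."""
    members = set()
    for a, b, score in edges:
        both_on_y = a in y_proteins and b in y_proteins
        if both_on_y and a != b and score >= min_score:
            members.update((a, b))
    return members


# Toy data: the scores are hypothetical and do not come from STRING.
y_proteins = {"RPS4Y1", "RPS4Y2", "EIF1AY", "SRY"}
edges = [
    ("RPS4Y1", "EIF1AY", 0.9),
    ("RPS4Y1", "RPS4Y2", 0.5),
    ("SRY", "SOX9", 0.95),   # SOX9 is not Y-encoded, so this pair is skipped
    ("SRY", "EIF1AY", 0.2),  # below the medium cutoff
]

connected = interacting_proteins(edges, y_proteins)
labels = [confidence_label(score) for _, _, score in edges]
print(sorted(connected), labels)
```

Pairs with a partner outside the Y set, such as the SRY and SOX9 example, would belong to the separate analysis of interactions with proteins encoded on other chromosomes.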

2.4. Analysis of human Y chromosome-encoded protein orthologs in chimpanzee and mouse Y chromosomes

Complete or partial Y chromosome sequences are available from only seven mammalian species in the NCBI eukaryotic genome annotation resource database. Of note, the Y chromosomes of human, chimpanzee, and mouse provide a large proportion of localized genes and proteins. To find orthologs of human Y chromosome-encoded proteins in chimpanzee and mouse, we first retrieved all protein information for the chimpanzee (Annotation release 103) and mouse (Annotation release 105) Y chromosomes from the NCBI database. Next, we compared all the human protein symbols and their synonyms with the protein symbols and synonyms of chimpanzee and mouse. Then, the FASTA sequences of proteins sharing a symbol/synonym between human and chimpanzee/mouse were aligned using the NCBI Align two (or more) sequences using BLAST (bl2seq) program. The bl2seq program runs the BLASTP algorithm37 and reports percent identity and coverage scores between the aligned sequences. Finally, the interactions and functions of proteins commonly located on the Y chromosomes of human, chimpanzee, and mouse were searched using the Pathway Studio software.
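The symbol/synonym matching step and the percent identity score can be sketched in Python as follows. This is an illustrative sketch, not the bl2seq implementation: matching is made case-insensitive here as an assumption (mouse symbols are conventionally written with only the first letter capitalized), the entries and the aligned strings are toy data, and percent identity is taken as identical positions divided by alignment length.

```python
def names(entry):
    """Upper-cased symbol plus synonyms for one protein entry."""
    return {entry["symbol"].upper(), *(s.upper() for s in entry.get("synonyms", []))}


def candidate_pairs(human, other):
    """Pair entries that share at least one symbol or synonym."""
    pairs = []
    for h in human:
        for o in other:
            if names(h) & names(o):
                pairs.append((h["symbol"], o["symbol"]))
    return pairs


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap positions as a percentage of the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy entries and a toy alignment; none of this is real sequence data.
human = [{"symbol": "SRY", "synonyms": ["TDF"]}, {"symbol": "ZFY", "synonyms": []}]
mouse = [{"symbol": "Sry", "synonyms": []}, {"symbol": "Zfy1", "synonyms": []}]

pairs = candidate_pairs(human, mouse)
identity = percent_identity("MKT-A", "MKTLA")
print(pairs, identity)
```

In this toy case ZFY and Zfy1 are not paired because neither lists the other as a synonym, which shows why the synonym lists matter for this kind of name-based screening before alignment.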

3. RESULTS AND DISCUSSION

3.1. Genome annotated Y chromosomes in mammalian species

The NCBI eukaryotic genome annotation resource database provided annotated chromosome information for about 85 mammalian species. However, annotated information for the Y chromosome was only available for seven mammalian species: human, chimpanzee, green monkey, white-tufted-ear marmoset, mouse, Norway rat, and pig (Table 1). While the complete genomes of several mammalian species have been sequenced, their Y chromosomes have not. Sequencing the Y chromosome using the standard shotgun sequencing method can be difficult due to the high number of repeat sequences.8 The latest NCBI annotation releases for the above seven species include several characterized, as well as uncharacterized and predicted, genes/proteins. Gene-derived products such as RNAs and proteins may undergo several posttranscriptional and/or posttranslational modifications.44,45 Therefore, the number of chromosome-encoded genes and the number of proteins may differ within a species. Additionally, the size of a particular chromosome may influence variations in gene and protein number among different species. For example, the mouse Y chromosome (91.74 Mb) is much larger than that of other species, while the pig Y chromosome (1.64 Mb) is much smaller. Gene/protein symbols are necessary to search for bioinformatics information on particular genes or proteins in a public repository. After analyzing the genes and proteins encoded on the human Y chromosome, we found that most of them had been given a clear gene/protein symbol. In contrast, the symbols for most of the genes/proteins encoded on the Y chromosome of non-human mammalian species were unknown.

3.2. Human Y chromosome-encoded proteins

The human Y chromosome is approximately 57.23 Mb in size and contains structural genes (with coding function), pseudogenes (amplified copies), and repeat sequences (non-coding sequences). Of note, half of the heterochromatic long arm is composed of repeat sequences that, if deleted, display no phenotypic effect.8 Through NCBI annotation release 107 of the human genome, we identified a total of 66 proteins encoded on the human Y chromosome (Table 2). The identified 66 proteins were then compared with three core databases of the Human Proteome Project: neXtProt, PeptideAtlas, and the Human Protein Atlas. Among the identified 66 proteins, 44 had evidence in the neXtProt dataset, and 40 had an identification status in the PeptideAtlas dataset (Table 3). A few proteins listed in the neXtProt / PeptideAtlas datasets were not found in the NCBI protein list, including protein kinase, Y-linked, pseudogene (PRKY), solute carrier family 9, subfamily B, member 1 pseudogene 1 (SLC9B1P1), BCL6 corepressor pseudogene 1 (BCORP1), taxilin gamma pseudogene, Y-linked (TXLNGY), PTPN13-like, Y-linked pseudogene 3 (PRYP3), and PTPN13-like, Y-linked pseudogene 4 (PRYP4). This is because they were updated as pseudogenes with no coding function in the NCBI human genome annotation release 107. Also, members of the testis-specific transcript, Y-linked family (TTTY10, TTTY12, and TTTY13) listed in the neXtProt / PeptideAtlas datasets were not found in the NCBI protein list; they were updated as non-protein-coding genes in the NCBI human genome annotation release 107. On the other hand, about 22 - 26 proteins listed in the NCBI release were not found in the neXtProt / PeptideAtlas datasets. Most of these unmatched proteins were encoded on the pseudoautosomal regions of the human Y chromosome. We also speculate that some protein isoforms encoded on the MSY might have been characterized as new members of existing protein families, for instance VCY1B in the VCY family, CDY2B in the CDY family, and PRY2 in the PRY family. Furthermore, comparison of the 66 proteins with the Human Protein Atlas database showed 46 proteins with immunohistochemical evidence, 17 proteins with transcript-only evidence, and 3 proteins with no experimental evidence (Table 3). This analysis reinforces the need for immunological characterization of human proteins encoded on the Y chromosome.
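The counts reported in this section can be cross-checked with simple arithmetic. The short Python sketch below does so under the assumption that the 44 neXtProt and 40 PeptideAtlas matches are all members of the 66-protein NCBI list; the variable names are illustrative only.

```python
# Counts as reported in the text; variable names are illustrative.
TOTAL_NCBI = 66
with_nextprot_evidence = 44
with_peptideatlas_status = 40
hpa_counts = {"immunohistochemical": 46, "transcript_only": 17, "no_evidence": 3}

# Proteins in the NCBI list without a match in each database
not_in_nextprot = TOTAL_NCBI - with_nextprot_evidence
not_in_peptideatlas = TOTAL_NCBI - with_peptideatlas_status

# The three Human Protein Atlas categories should partition the full list
hpa_total = sum(hpa_counts.values())

print(not_in_nextprot, not_in_peptideatlas, hpa_total)
```

Under that assumption, the two differences reproduce the reported range of about 22 to 26 unmatched proteins, and the Human Protein Atlas categories account for all 66 proteins.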

Figure 1A shows a schematic diagram of the 66 proteins and their corresponding genes that are encoded on the human Y chromosome. The human Y chromosome is an acrocentric chromosome, containing a short arm (Yp) and a long arm (Yq) demarcated by a centromeric region essential for chromosome segregation.5 The classical and modern physical maps of the human Y chromosome are well established. According to the classical model, the Y chromosome was divided into a euchromatic short arm, a euchromatic long arm, and a heterochromatic long arm.1 H-Y antigen producing genes, such as UTY and TMSB4Y, were predominantly encoded on the euchromatic short arm.46,47 Similarly, SRY, the most crucial and evolutionarily conserved male-sex determining factor, was encoded on the euchromatic short arm.48 Most other protein-coding genes located on either the euchromatic short arm or the euchromatic long arm were shown to have specific roles in the testis, during spermatogenesis, and in male fertility.49-52 The modern physical map of the human Y chromosome includes a non-recombining region (NRY), otherwise known as MSY, which comprises 95% of the chromosomal length (Fig. 1B).53,54 Usually, no X-Y recombination occurs in this region during male meiosis. The MSY region is flanked by pseudoautosomal regions where X-Y recombination does occur during meiosis in males.54,55 The MSY region is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed (exhibiting 99% identity to the X chromosome), X-degenerate (remnants of ancient autosomes), and ampliconic (where sequence pairs show >99.9% identity).45,53,54 When we searched for X homologs of the 66 human Y chromosome-encoded proteins using the NCBI BLASTP program, most of the proteins from the pseudoautosomal regions of the human Y chromosome showed 100% identity with their X homologs. The proteins from the MSY region of the human Y chromosome showed varying percent identities (36 - 99%) with their X homologs. X homologs of the CDY, PRY, BPY, DAZ, and SPRY protein family members were not known (Table 3). Furthermore, the protein-coding genes on the human Y chromosome are classified according to their expression, function, and evolutionary history. For example, ZFY-like genes are single-copy, ubiquitously expressed, and remnants of ancient homology with the X chromosome. SRY-like genes are single-copy, ancient testis/male-specific functional genes, while DAZ-, RBMY-, and TSPY-like genes are multi-copy, ancient testis/spermatogenesis-specific functional genes.8,54,56

3.3. Human Y chromosome-encoded protein pathways

For the cellular pathway analysis, we entered the 66 proteins encoded on the human Y chromosome into the KEGG pathway-mapping database and the Pathway Studio software. The obtained pathway results were then integrated (Figure 2). Proteins involved in single pathways are shown in Supplementary Table 1 (Supporting Information). Our results showed that most Y chromosome genes fell into four major categories: cell signaling pathways, receptor signaling pathways, cellular processes, and metabolic pathways. The cell signaling pathways category was further divided into 26 subcategories, including B cell activation, cell cycle regulation, notch signaling pathway, translation control, apoptosis regulation, adipocytokine signaling pathway, Jak-STAT signaling pathway, guanylate cyclase pathway, T cell activation, insulin action, melanogenesis, NK and mast cell activation, gonadotrope cell activation, hedgehog signaling pathway, PI3K-Akt signaling pathway, adrenergic signaling in cardiomyocytes, calcium signaling pathway, AMPK signaling pathway, Wnt signaling pathway, cGMP-PKG signaling pathway, sphingolipid signaling pathway, NF-kB signaling pathway, gap junction regulation, skeletal myogenesis control, cell adhesion molecules, and regulation of actin cytoskeleton. The receptor signaling pathways category was further divided into 10 subcategories, including cytokine-cytokine receptor interaction, neuroactive ligand-receptor interaction, neurotransmitters/neuropeptides receptor signaling pathway, gonadotropin-releasing hormone receptor signaling pathway, epidermal growth factor receptor signaling pathway, glutamate receptor signaling pathway, cannabinoid receptor signaling pathway, cholinergic receptor signaling pathway, thrombin receptor signaling pathway, and erythropoietin receptor signaling pathway. The cellular processes category was further divided into 9 subcategories, including mRNA transcription and processing, histone acetylation, hematopoietic cell lineage processes, histone-mediated chromatin remodeling, rRNA transcription and processing, translation processes, SNARE interactions in vesicular transport, tight junction assembly, and circadian rhythm. Lastly, the metabolic pathways category was divided into 3 subcategories, including tryptophan metabolism, inositol phosphate metabolism, and glycerophospholipid and ether lipid metabolism.

In the cell signaling category, eight proteins (NLGN4Y, PLCXD1, RPS4Y2, EIF1AY, RPS4Y1, PCDH11Y, CD99, and VAMP7) were involved in B cell activation, while seven proteins each were involved in cell cycle regulation (PLCXD1, RPS4Y2, CDY2B, EIF1AY, CDY1, RPS4Y1, and ZBED1), the Notch signaling pathway (ASMTL, NLGN4Y, CDY2B, CDY1, PCDH11Y, CD99, and ASMT), and translation control (RPS4Y2, EIF1AY, RPS4Y1, IL3RA, CSF2RA, IL9R, and CRLF2). Within the receptor signaling category, four proteins (CRLF2, CSF2RA, IL3RA, and IL9R) were involved in cytokine-cytokine receptor interaction. Similarly, within the cellular processes category, four proteins (CDY2B, CDY1, PPP2R3B, and EIF1AY) were involved in mRNA transcription and processing. Conversely, few proteins were involved in metabolic pathways. It is difficult to discuss the role of every Y chromosome-encoded protein associated with these pathways. However, many of the identified pathways seem to have crucial roles in the testis or during spermatogenesis. We also speculate that many proteins encoded on the human Y chromosome interact with autosomal proteins in regulating these pathways.
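
The protein lists above are organized by pathway; inverting them gives, for each protein, the pathways in which it appears. The following Python sketch is illustrative only: the names `PATHWAY_PROTEINS`, `invert`, and `rank_by_pathway_count` are hypothetical, and the data are restricted to the six pathways named in this paragraph, so the resulting ranking does not reflect the full annotation in Supplementary Table 1.

```python
from collections import defaultdict

# Pathway memberships copied from the paragraph above (six pathways only).
PATHWAY_PROTEINS = {
    "B cell activation": [
        "NLGN4Y", "PLCXD1", "RPS4Y2", "EIF1AY",
        "RPS4Y1", "PCDH11Y", "CD99", "VAMP7",
    ],
    "cell cycle regulation": [
        "PLCXD1", "RPS4Y2", "CDY2B", "EIF1AY", "CDY1", "RPS4Y1", "ZBED1",
    ],
    "Notch signaling pathway": [
        "ASMTL", "NLGN4Y", "CDY2B", "CDY1", "PCDH11Y", "CD99", "ASMT",
    ],
    "translation control": [
        "RPS4Y2", "EIF1AY", "RPS4Y1", "IL3RA", "CSF2RA", "IL9R", "CRLF2",
    ],
    "cytokine-cytokine receptor interaction": [
        "CRLF2", "CSF2RA", "IL3RA", "IL9R",
    ],
    "mRNA transcription and processing": [
        "CDY2B", "CDY1", "PPP2R3B", "EIF1AY",
    ],
}


def invert(pathway_proteins):
    """Map each protein to the set of pathways that list it."""
    index = defaultdict(set)
    for pathway, proteins in pathway_proteins.items():
        for protein in proteins:
            index[protein].add(pathway)
    return dict(index)


def rank_by_pathway_count(index):
    """Sort proteins by number of pathways (descending), then by name."""
    pairs = [(protein, len(pathways)) for protein, pathways in index.items()]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for protein, n in rank_by_pathway_count(invert(PATHWAY_PROTEINS)):
        print(f"{protein}\t{n}")
```

The same inversion applied to a complete pathway table would identify the proteins annotated to the greatest number of pathways.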

According to our pathway annotation, a comparatively large number of proteins (eight: NLGN4Y, PLCXD1, RPS4Y2, EIF1AY, RPS4Y1, PCDH11Y, CD99, and VAMP7) were involved in B cell activation. According to the Human Protein Atlas database, immunohistochemical staining of these proteins is strong to medium in most normal human tissues. In particular, they are expressed in the liver, bone marrow, spleen, and lymph nodes, which are key lymphoid organs for generating B-lymphocytes (B cells).57,58 B-lymphocytes are specifically involved in antibody production and humoral immunity in humans and other vertebrates.59 Antibodies produced by B-lymphocytes also play a vital role in immunoregulation, acting either as an immunopotentiating influence or as negative feedback.59 Furthermore, Park and colleagues60 suggested that all kinds of lymphocytes in lymph nodes express CD99, and that CD99 expression is much stronger in IgG-producing B-lymphocytes than in IgM-producing B-lymphocytes. Thus, the findings of the pathway analysis may indicate that a notable number of proteins encoded on the human Y chromosome contribute to B cell activation and humoral immunity in humans.

We next examined which proteins were involved in the greatest number of pathways. We found that PLCXD1 was involved in 19 different pathways (Supplementary Table 1, Supporting Information). The gene encoding PLCXD1 is localized to the pseudoautosomal region of the human Y (Yp11.32) and X (Xp22.33) chromosomes, where active recombination occurs during meiosis. However, this gene is X chromosome-specific in ruminant species.61,62 The specific role of PLCXD1 in human male and female reproduction remains to be elucidated. According to the Human Protein Atlas database, PLCXD1 has been detected by immunohistochemistry in about 36 normal human tissues distributed across several systems, including the male and female reproductive, digestive, respiratory, immune, nervous, and endocrine systems. PLCXD1 is also expressed in the blood of male ischemic stroke patients,63 and has been shown to act as an efficient tumor suppressor in human malignant melanoma.64

3.4. Human Y chromosome-encoded protein interactions

In humans, most proteins are predicted to interact with other proteins during the regulation of signaling pathways, cellular processes, metabolism, and disease. In this section, our goal was to analyze the protein-protein interactions of human Y chromosome-encoded proteins using the STRING database and Pathway Studio software. When we submitted the 66 proteins encoded on the human Y chromosome to the STRING database, it did not recognize three proteins (LOC105379416, XKRY, and XKRY2). In a medium confidence search (score 0.4), no experimental evidence was found for the interaction of 14 Y chromosome-encoded proteins. The remaining 49 proteins showed strong or medium interactions with one or more Y chromosome-encoded proteins (Figure 3). The proteins exhibiting stronger associations were SRY, ZFY, AMELY, TBL1Y, TSPY1, TSPY10, UTY, TMSB4Y, KDM5D, DAZ1, DAZ4, USP9Y, DDX3Y, RPS4Y1, RPS4Y2, CDY1, CDY1B, CDY2A, CDY2B, EIF1AY, ASMT, ASMTL, RBMY1A1, AKAP17A, SLC25A6, HSFY2, DHRSX, GTPBP6, PRY, BPY2, SHOX, CD99, CRLF2, and P2RY8 (Figure 4). Theoretically, many mechanisms, including co-occurrence, co-expression, co-function, and co-regulation, promote interactions between translated proteins. The medium and high confidence interaction searches showed, for instance, that DAZ1 interacts with several testis-specific proteins. The multi-copy DAZ genes (DAZ1, DAZ2, DAZ3, and DAZ4) and several testis-specific genes are located in close proximity along the euchromatic long arm of the Y chromosome, particularly in the azoospermia factor regions (AZFa, AZFb, and AZFc).65-68 Microdeletions in the azoospermia factor regions of the human Y chromosome cause the loss of several AZF-associated proteins. This affects spermatogenesis and leads to male infertility.68,69
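
The confidence cutoffs used here (0.4 for the medium confidence view and 0.7 for the high confidence view) act as simple thresholds on an interaction score. The following Python sketch shows that filtering step under stated assumptions: the protein names `P1` to `P6` and their scores are invented placeholders, not STRING output, and the helper names are hypothetical. The last function only reproduces the bookkeeping reported in the text (66 submitted, 3 unrecognized, 14 without evidence).

```python
MEDIUM_CONFIDENCE = 0.4  # cutoff of the medium confidence view
HIGH_CONFIDENCE = 0.7    # cutoff of the high confidence view

# Placeholder edges (protein A, protein B, score). The names and scores are
# invented for illustration and are NOT results from the STRING database.
EXAMPLE_EDGES = [
    ("P1", "P2", 0.92),
    ("P3", "P4", 0.81),
    ("P2", "P5", 0.65),
    ("P1", "P6", 0.45),
    ("P5", "P6", 0.30),
]


def filter_edges(edges, min_score):
    """Keep only edges whose score meets the confidence cutoff."""
    return [edge for edge in edges if edge[2] >= min_score]


def connected_proteins(edges):
    """Return the sorted names of proteins that appear in any edge."""
    return sorted({name for a, b, _ in edges for name in (a, b)})


def remaining_after_exclusions(total, unrecognized, without_evidence):
    """Proteins left after removing unrecognized and unsupported entries."""
    return total - unrecognized - without_evidence


if __name__ == "__main__":
    medium = filter_edges(EXAMPLE_EDGES, MEDIUM_CONFIDENCE)
    high = filter_edges(EXAMPLE_EDGES, HIGH_CONFIDENCE)
    print(len(medium), "edges at medium confidence")
    print(len(high), "edges at high confidence")
    print(remaining_after_exclusions(66, 3, 14), "proteins with interactions")
```

Raising the cutoff can only remove edges, so the high confidence network is always a subset of the medium confidence network.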

When we entered the 66 proteins encoded on the human Y chromosome into the Pathway Studio software, it returned no results for 50 proteins. Interestingly, 16 proteins (SHOX, CSF2RA, VAMP7, ASMT, KDM5D, PPP2R3B, ZBED1, IL9R, CRLF2, IL3RA, USP9Y, UTY, RBMY1A1, DAZ1, CD99, and SRY) were shown to interact with one or more proteins derived from other human chromosomes (Figure 5). Among these proteins, SHOX, CSF2RA, VAMP7, ASMT, PPP2R3B, ZBED1, IL9R, CRLF2, IL3RA, and CD99 are believed to derive from the pseudoautosomal regions of the human Y chromosome and have 100% identity with human X chromosome sequences (Figure 1, Table 3). Therefore, their interaction with X chromosome- or autosome-derived proteins is to be expected. Although KDM5D, USP9Y, UTY, and RBMY1A1 are located in the MSY region of the human Y chromosome, they also share partial sequence identity with homologous X chromosome sequences and show widespread expression and function in humans. Intriguingly, the interactions of the spermatogenesis-specific DAZ1 (ref 70) and the male sex determination-specific SRY (ref 48) with proteins derived from other chromosomes provide vital information regarding their role in triggering diverse proteins that contribute to the development of the testis and spermatogenesis. Compared with DAZ1, the SRY protein interacted with 17 proteins encoded on other chromosomes, including NR5A1 (chromosome 9), EP300 (chromosome 22), HDAC3 (chromosome 5), CBX2 (chromosome 17), WDR5 (chromosome 9), FOSL1 (chromosome 11), AMH (chromosome 19), PTGDS (chromosome 9), PDE11A (chromosome 2), SOX9 (chromosome 17), NTF3 (chromosome 12), MAOA (chromosome X), KPNB1 (chromosome 17), TCF21 (chromosome 6), SP1 (chromosome 12), INSL3 (chromosome 19), and WT1 (chromosome 11). Once male sex is determined, SRY triggers proteins encoded on diverse chromosomes that contribute to the development of Sertoli cells, the testis, and sex cords. For example, testicular activation of AMH causes regression of the Müllerian duct in males. In females, the Müllerian duct differentiates into the uterus, fallopian tube, and upper part of the vagina. The regression of the Müllerian duct in males provides a path for the development of the Wolffian duct, which is the anlage of the vas deferens, seminal vesicles, and epididymis.71 The production of AMH by Sertoli cells is initiated by SRY in association with a few male-determining factors, including NR5A1, SOX9, and WT1.71-73 In addition, the 5′-flanking region of the human SRY gene has binding sites for SP1, NR5A1, SOX9, WT1, and cAMP.74 Thevenet et al.75 demonstrated that interaction of EP300 with SRY leads to SRY acetylation on one lysine residue, thus modifying SRY nuclear sublocalization. In addition, HDAC3 associates with, and subsequently deacetylates, SRY. The co-expression of EP300 and HDAC3 in somatic cells of the genital ridge, along with simultaneous SRY expression, may indicate that these proteins regulate SRY.75 Furthermore, Xu et al. demonstrated that SRY directly targets WDR5, which has multiple roles, including methylation of histone H3K4, osteoblast differentiation, and self-renewal of embryonic stem cells in humans.76
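
The chromosomal distribution of the 17 SRY interaction partners listed above can be tallied with a short script. The following Python sketch is illustrative; the names `SRY_PARTNER_CHROMOSOME` and `partners_per_chromosome` are hypothetical, and the gene-to-chromosome assignments are copied from the text rather than retrieved from a database.

```python
from collections import Counter

# SRY interaction partners and their chromosomes, as stated in the text.
SRY_PARTNER_CHROMOSOME = {
    "NR5A1": "9",
    "EP300": "22",
    "HDAC3": "5",
    "CBX2": "17",
    "WDR5": "9",
    "FOSL1": "11",
    "AMH": "19",
    "PTGDS": "9",
    "PDE11A": "2",
    "SOX9": "17",
    "NTF3": "12",
    "MAOA": "X",
    "KPNB1": "17",
    "TCF21": "6",
    "SP1": "12",
    "INSL3": "19",
    "WT1": "11",
}


def partners_per_chromosome(partner_chromosome):
    """Count how many interaction partners map to each chromosome."""
    return Counter(partner_chromosome.values())


if __name__ == "__main__":
    counts = partners_per_chromosome(SRY_PARTNER_CHROMOSOME)
    for chromosome, n in counts.most_common():
        print(f"chromosome {chromosome}: {n}")
```

In this tally, chromosomes 9 and 17 each contribute three partners, and MAOA is the only partner encoded on the X chromosome.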

The interaction of WDR5 and SRY, in turn, activates SOX9 expression, which is required for Sertoli cell differentiation and AMH production.73,76

3.5. Orthologs of human Y-encoded proteins in chimpanzee and mouse Y chromosomes

According to the NCBI genome annotation database, DNA sequencing of autosomes and X chromosomes has been completed in several mammalian species. However, Y chromosome sequencing was not conducted for many of them.24 As discussed earlier, this might be due to the presence of repeat sequences within the Y chromosome.8 Sequencing Y chromosomes in a range of mammalian species will provide the scientific community with comparative access to genes and proteins responsible for mammalian male reproduction. To our knowledge, complete or partial Y chromosome sequences are available for only seven mammalian species in the NCBI genome annotation database. When we searched for orthologs of human proteins in the Y chromosomes of chimpanzee and mouse, we found six (SRY, ZFY, DDX3Y, USP9Y, UTY, and KDM5D) in both chimpanzee and mouse. Apart from these six proteins, we found 14 human protein orthologs (AMELY, CDY1, DAZ1, DAZ2, EIF1AY, RBMY1F, RBMY1J, CD99, NLGN4Y, PRORY, RPS4Y1, RPS4Y2, TBL1Y, and TMSB4Y) in chimpanzee only, and one human protein ortholog (RBMY1A1) in mouse only (Figure 6A, Table 4). The orthologs of human proteins found in chimpanzee and mouse may differ because of divergence during primate and rodent evolution. For instance, multiple copies of DAZ proteins were reported in human and chimpanzee Y chromosomes,77,78 but there is no evidence for the existence of DAZ members in mouse. The DAZ-encoding genes arose from the ancestral autosomal genes boule-like RNA-binding protein (BOLL) and deleted in azoospermia-like (DAZL) during primate evolution.79-81 Using the Pathway Studio software, we examined the interactions and functions of the six proteins common to the Y chromosomes of human, chimpanzee, and mouse (Figure 6B). Among these six proteins, the evolutionarily conserved protein SRY was characterized as a crucial male sex determination factor.48 Before the identification of SRY, another evolutionarily conserved protein, ZFY, was considered the crucial male sex determination factor.82 However, later evidence refuted this hypothesis.83 The human spermatogenesis-associated proteins DDX3Y and USP9Y are localized to the AZFa region of the human Y chromosome. Deletion of the Y chromosomal region containing the protein-coding DDX3Y and USP9Y genes causes severe spermatogenic impairment.69 The ubiquitously expressed protein UTY is also localized to the AZFa region of the human Y chromosome. This protein is primarily characterized as a male-specific H-Y antigen.46,69 The human KDM5D protein is localized within the AZFb region of the human Y chromosome, and it was also characterized as a male-specific H-Y antigen. Expression of this protein can lead to rejection of male tissues by female recipients during tissue transplantation.67,84-85 According to this analysis, the number of proteins that are orthologous between human and chimpanzee or mouse is limited. This is likely because more genes and proteins remain uncharacterized on the chimpanzee and mouse Y chromosomes than on the human Y chromosome.
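
The ortholog comparison reported in this section (Figure 6A, Table 4) amounts to simple set operations on the protein lists for each species. The following Python sketch reproduces the three groups from the names given in the text; the variable and function names are hypothetical, and the sets contain only the orthologs named here, not the full annotation of either Y chromosome.

```python
# Human Y-encoded proteins with an ortholog on the Y chromosome of both
# chimpanzee and mouse, as listed in the text.
SHARED = {"SRY", "ZFY", "DDX3Y", "USP9Y", "UTY", "KDM5D"}

# Orthologs reported for each species (shared set plus species-only set).
CHIMPANZEE_Y_ORTHOLOGS = SHARED | {
    "AMELY", "CDY1", "DAZ1", "DAZ2", "EIF1AY", "RBMY1F", "RBMY1J",
    "CD99", "NLGN4Y", "PRORY", "RPS4Y1", "RPS4Y2", "TBL1Y", "TMSB4Y",
}
MOUSE_Y_ORTHOLOGS = SHARED | {"RBMY1A1"}


def compare_orthologs(chimpanzee, mouse):
    """Split ortholog sets into shared, chimpanzee-only, and mouse-only."""
    return {
        "both": chimpanzee & mouse,
        "chimpanzee_only": chimpanzee - mouse,
        "mouse_only": mouse - chimpanzee,
    }


if __name__ == "__main__":
    groups = compare_orthologs(CHIMPANZEE_Y_ORTHOLOGS, MOUSE_Y_ORTHOLOGS)
    for name, proteins in groups.items():
        print(f"{name} ({len(proteins)}): {', '.join(sorted(proteins))}")
```

With these inputs, the three groups contain 6, 14, and 1 proteins, matching the counts stated in the text.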

4. CONCLUSIONS

The human Y chromosome is significantly smaller than most other human chromosomes, and it contains several genes encoding proteins with an exclusive role in male sex determination and reproductive function. The bioinformatics analyses performed in this study revealed important information about human Y chromosome-encoded proteins, their cellular pathways, and their interactions that may be of interest to the broader scientific community. In this study, several human Y chromosome-encoded proteins were frequently associated with multiple cell-signaling pathways. Thus, focusing on these particular pathways and their related protein functions will be valuable in future studies. Regarding the interaction analysis, DAZ1 and USP9Y showed strong associations with many human Y chromosome-encoded proteins. Conversely, SRY and VAMP7 showed strong associations with proteins encoded on other chromosomes. These results suggest that human Y chromosome-encoded proteins interact at the molecular level to perform their specific roles in the testis. Induced downregulation or overexpression of these Y chromosome-encoded proteins may reveal novel effects on spermatogenesis. Lastly, our analysis strongly suggests that genome annotation of Y chromosomes and characterization of Y chromosome-encoded genes and proteins should be further developed in non-human mammals. In particular, there is little evidence for several members of the TSPY, VCY, XKRY, CDY, HSFY, RBMY, PRY, and BPY protein families in non-human mammals. These proteins play crucial roles within the male-specific region of the human Y chromosome.

SUPPORTING INFORMATION

Supplementary Table 1. Cellular pathways of proteins encoded on the human Y chromosome. This material is available free of charge via the Internet at http://pubs.acs.org

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: +82-31-670-4841. Fax: +82-31-675-9001

NOTES

The authors declare no conflict of interest.

ACKNOWLEDGMENTS

This work was carried out with the support of “Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01106101)”, Rural Development Administration, Republic of Korea.

REFERENCES

(1) Goodfellow, P.; Darling, S.; Wolfe, J. The human Y chromosome. J Med Genet. 1985, 22, 329−344.
(2) German, J.; Simpson, J. L.; Chaganti, R. S.; Summitt, R. L.; Reid, L. B.; Merkatz, I. R. Genetically determined sex-reversal in 46,XY humans. Science. 1978, 202, 53−56.
(3) Elliott, D. J.; Cooke, H. J. Y chromosome microdeletions and male infertility. Hum Fertil (Camb). 1998, 1, 64−68.
(4) Cooke, H. J. Y chromosome and male infertility. Rev Reprod. 1999, 4, 5−10.
(5) Krausz, C.; McElreavey, K. Y chromosome and male infertility. Front Biosci. 1999, 4, E1−8.
(6) Yunis, E.; Silva, R.; Ramirez, E.; Nossa, M. A. X/XYq - mosaicism and mixed gonadal dysgenesis. J Med Genet. 1977, 14, 262−265.
(7) Disteche, C. M.; Casanova, M.; Saal, H.; Friedman, C.; Sybert, V.; Graham, J.; Thuline, H.; Page, D. C.; Fellous, M. Small deletions of the short arm of the Y chromosome in 46,XY females. Proc Natl Acad Sci U S A. 1986, 83, 7841−7844.
(8) Graves, J. A. Sex chromosome specialization and degeneration in mammals. Cell. 2006, 124, 901−914.
(9) Fryns, J. P.; Kleczkowska, A.; Kubień, E.; Van den Berghe, H. XYY syndrome and other Y chromosome polysomies. Mental status and psychosocial functioning. Genet Couns. 1995, 6, 197−206.
(10) Bardsley, M. Z.; Kowal, K.; Levy, C.; Gosek, A.; Ayari, N.; Tartaglia, N.; Lahlou, N.; Winder, B.; Grimes, S.; Ross, J. L. 47,XYY syndrome: clinical phenotype and timing of ascertainment. J Pediatr. 2013, 163, 1085−1094.
(11) Kim, I. W.; Khadilkar, A. C.; Ko, E. Y.; Sabanegh, E. S. Jr. 47,XYY Syndrome and Male Infertility. Rev Urol. 2013, 15, 188−196.
(12) Margari, L.; Lamanna, A. L.; Craig, F.; Simone, M.; Gentile, M. Autism spectrum disorders in XYY syndrome: two new cases and systematic review of the literature. Eur J Pediatr. 2014, 173, 277−283.
(13) Hauschka, T. S.; Hasson, J. E.; Goldstein, M. N.; Koepf, G. F.; Sandberg, A. A. An XYY man with progeny indicating familial tendency to non-disjunction. Am J Hum Genet. 1962, 14, 22−30.
(14) Schoepflin, G. S.; Centerwall, W. R. 48,XYYY: a new syndrome? J Med Genet. 1972, 9, 356−360.
(15) Hori, N.; Kato, T.; Sugimura, Y.; Tajima, K.; Tochigi, H.; Kawamura, J. A male subject with 3 Y chromosomes (48, XYYY): a case report. J Urol. 1988, 139, 1059−1061.
(16) Teyssier, M.; Pousset, G. 46,XY/48,XYYY mosaicism case report and review of the literature. Genet Couns. 1994, 5, 357−361.
(17) Venkataraman, G.; Craft, I. Triple-Y syndrome following ICSI treatment in a couple with normal chromosomes: case report. Hum Reprod. 2002, 17, 2560−2563.
(18) Gigliani, F.; Gabellini, P.; Marcucci, L.; Petrinelli, P.; Antonelli, A. Peculiar mosaicism 47,XYY/48,XYYY/49,XYYYY in man. J Genet Hum. 1980, 28, 47−51.
(19) Sirota, L.; Zlotogora, Y.; Shabtai, F.; Halbrecht, I.; Elian, E. 49,XYYYY. A case report. Clin Genet. 1981, 19, 87−93.
(20) Shanske, A.; Sachmechi, I.; Patel, D. K.; Bishnoi, A.; Rosner, F. An adult with 49,XYYYY karyotype: case report and endocrine studies. Am J Med Genet. 1998, 80, 103−106.
(21) Frey-Mahn, G.; Behrendt, G.; Geiger, K.; Sohn, C.; Schäfer, D.; Miny, P. Y chromosomal polysomy: a unique case of 49,XYYYY in amniotic fluid cells. Am J Med Genet A. 2003, 118A, 184−186.

(22) Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol. 2012, 30, 221−223.
(23) Paik, Y. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; Na, K.; Jeong, S. K.; He, F.; Binz, P. A.; Nishimura, T.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. Standard guidelines for the chromosome-centric human proteome project. J Proteome Res. 2012, 11, 2005−2013.
(24) Jangravi, Z.; Alikhani, M.; Arefnezhad, B.; Sharifi Tabar, M.; Taleahmad, S.; Karamzadeh, R.; Jadaliha, M.; Mousavi, S. A.; Ahmadi Rastegar, D.; Parsamatin, P.; Vakilian, H.; Mirshahvaladi, S.; Sabbaghian, M.; Mohseni Meybodi, A.; Mirzaei, M.; Shahhoseini, M.; Ebrahimi, M.; Piryaei, A.; Moosavi-Movahedi, A. A.; Haynes, P. A.; Goodchild, A. K.; Nasr-Esfahani, M. H.; Jabbari, E.; Baharvand, H.; Sedighi Gilani, M. A.; Gourabi, H.; Salekdeh, G. H. A fresh look at the male-specific region of the human Y chromosome. J Proteome Res. 2013, 12, 6−22.
(25) Pruitt, K. D.; Tatusova, T.; Brown, G. R.; Maglott, D. R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012, 40 (Database issue), D130−135.
(26) Pruitt, K. D.; Brown, G. R.; Hiatt, S. M.; Thibaud-Nissen, F.; Astashyn, A.; Ermolaeva, O.; Farrell, C. M.; Hart, J.; Landrum, M. J.; McGarvey, K. M.; Murphy, M. R.; O'Leary, N. A.; Pujar, S.; Rajput, B.; Rangwala, S. H.; Riddick, L. D.; Shkeda, A.; Sun, H.; Tamez, P.; Tully, R. E.; Wallin, C.; Webb, D.; Weber, J.; Wu, W.; DiCuccio, M.; Kitts, P.; Maglott, D. R.; Murphy, T. D.; Ostell, J. M. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014, 42 (Database issue), D756−763.
(27) International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001, 409, 860−921.
(28) Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437, 69−87.
(29) Marmoset Genome Sequencing and Analysis Consortium. The common marmoset genome provides insight into primate biology and evolution. Nat Genet. 2014, 46, 850−857.
(30) Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420, 520−562.
(31) Rat Genome Sequencing Project Consortium. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428, 493−521.
(32) Schook, L. B.; Beever, J. E.; Rogers, J.; Humphray, S.; Archibald, A.; Chardon, P.; Milan, D.; Rohrer, G.; Eversole, K. Swine Genome Sequencing Consortium (SGSC): a strategic roadmap for sequencing the pig genome. Comp Funct Genomics. 2005, 6, 251−255.
(33) Groenen et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012, 491, 393−398.
(34) Lane, L.; Argoud-Puy, G.; Britan, A.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gaudet, P.; Gleizes, A.; Masselot, A.; Zwahlen, C.; Bairoch, A. neXtProt: a knowledge platform for human proteins. Nucleic Acids Res. 2012, 40, D76−83.
(35) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. The PeptideAtlas project. Nucleic Acids Res. 2006, 34, D655−658.
(36) Uhlen, M.; Oksvold, P.; Fagerberg, L.; Lundberg, E.; Jonasson, K.; Forsberg, M.; Zwahlen, M.; Kampf, C.; Wester, K.; Hober, S.; Wernerus, H.; Björling, L.; Ponten, F. Towards a knowledge-based Human Protein Atlas. Nat Biotechnol. 2010, 28, 1248−1250.
(37) Johnson, M.; Zaretskaya, I.; Raytselis, Y.; Merezhuk, Y.; McGinnis, S.; Madden, T. L. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008, 36, W5−9.
(38) Kanehisa, M.; Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28, 27−30.
(39) Aoki-Kinoshita, K. F.; Kanehisa, M. Gene annotation and pathway mapping in KEGG. Methods Mol Biol. 2007, 396, 71−91.
(40) Nikitin, A.; Egorov, S.; Daraselia, N.; Mazo, I. Pathway studio--the analysis and navigation of molecular networks. Bioinformatics. 2003, 19, 2155−2157.
(41) Kwon, W. S.; Rahman, M. S.; Lee, J. S.; Kim, J.; Yoon, S. J.; Park, Y. J.; You, Y. A.; Hwang, S.; Pang, M. G. A comprehensive proteomic approach to identifying capacitation related proteins in boar spermatozoa. BMC Genomics. 2014, 15, 897.
(42) Rengaraj, D.; Kwon, W. S.; Pang, M. G. Effects of motor vehicle exhaust on male reproductive function and associated proteins. J Proteome Res. 2015, 14, 22−37.
(43) Franceschini, A.; Szklarczyk, D.; Frankild, S.; Kuhn, M.; Simonovic, M.; Roth, A.; Lin, J.; Minguez, P.; Bork, P.; von Mering, C.; Jensen, L. J. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013, 41, D808−815.
(44) Kiss, T. Small nucleolar RNA-guided post-transcriptional modification of cellular RNAs. EMBO J. 2001, 20, 3617−3622.
(45) Mann, M.; Jensen, O. N. Proteomic analysis of post-translational modifications. Nat Biotechnol. 2003, 21, 255−261.
(46) Warren, E. H.; Gavin, M. A.; Simpson, E.; Chandler, P.; Page, D. C.; Disteche, C.; Stankey, K. A.; Greenberg, P. D.; Riddell, S. R. The human UTY gene encodes a novel HLA-B8-restricted H-Y antigen. J Immunol. 2000, 164, 2807−2814.
(47) Torikai, H.; Akatsuka, Y.; Miyazaki, M.; Warren, E. H. 3rd; Oba, T.; Tsujimura, K.; Motoyoshi, K.; Morishima, Y.; Kodera, Y.; Kuzushima, K.; Takahashi, T. A novel HLA-A*3303-restricted minor histocompatibility antigen encoded by an unconventional open reading frame of human TMSB4Y gene. J Immunol. 2004, 173, 7046−7054.
(48) Sinclair, A. H.; Berta, P.; Palmer, M. S.; Hawkins, J. R.; Griffiths, B. L.; Smith, M. J.; Foster, J. W.; Frischauf, A. M.; Lovell-Badge, R.; Goodfellow, P. N. A gene from the human sex-determining region encodes a protein with homology to a conserved DNA-binding motif. Nature. 1990, 346, 240−244.
(49) Quintana-Murci, L.; Krausz, C.; McElreavey, K. The human Y chromosome: function, evolution and disease. Forensic Sci Int. 2001, 118, 169−181.
(50) Ali, S.; Hasnain, S. E. Molecular dissection of the human Y-chromosome. Gene. 2002, 283, 1−10.
(51) Ali, S.; Hasnain, S. E. Genomics of the human Y-chromosome. 1. Association with male infertility. Gene. 2003, 321, 25−37.
(52) Noordam, M. J.; Repping, S. The human Y chromosome: a masculine chromosome. Curr Opin Genet Dev. 2006, 16, 225−232.
(53) Tilford, C. A.; Kuroda-Kawaguchi, T.; Skaletsky, H.; Rozen, S.; Brown, L. G.; Rosenberg, M.; McPherson, J. D.; Wylie, K.; Sekhon, M.; Kucaba, T. A.; Waterston, R. H.; Page, D. C. A physical map of the human Y chromosome. Nature. 2001, 409, 943−945.
(54) Skaletsky, H.; Kuroda-Kawaguchi, T.; Minx, P. J.; Cordum, H. S.; Hillier, L.; Brown, L. G.; Repping, S.; Pyntikova, T.; Ali, J.; Bieri, T.; Chinwalla, A.; Delehaunty, A.; Delehaunty, K.; Du, H.; Fewell, G.; Fulton, L.; Fulton, R.; Graves, T.; Hou, S. F.; Latrielle, P.; Leonard, S.; Mardis, E.; Maupin, R.; McPherson, J.; Miner, T.; Nash, W.; Nguyen, C.; Ozersky, P.; Pepin, K.; Rock, S.; Rohlfing, T.; Scott, K.; Schultz, B.; Strong, C.; Tin-Wollam, A.; Yang, S. P.; Waterston, R. H.; Wilson, R. K.; Rozen, S.; Page, D. C. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003, 423, 825−837.
(55) Willard, H. F. Tales of the Y chromosome. Nature. 2003, 423, 810−811.
(56) Lahn, B. T.; Page, D. C. Functional coherence of the human Y chromosome. Science. 1997, 278, 675−680.
(57) Asma, G. E.; Langlois van den Bergh, R.; Vossen, J. M. Development of pre-B and B lymphocytes in the human fetus. Clin Exp Immunol. 1984, 56, 407−414.
(58) Brown, A. R. Immunological functions of splenic B-lymphocytes. Crit Rev Immunol. 1992, 11, 395−417.
(59) Miller, J. F. Cellular basis of the immune response. Acta Endocrinol Suppl. 1975, 194, 55−76.
(60) Park, C. K.; Shin, Y. K.; Kim, T. J.; Park, S. H.; Ahn, G. H. High CD99 expression in memory T and B cells in reactive lymph nodes. J Korean Med Sci. 1999, 14, 600−606.
(61) Ross et al. The DNA sequence of the human X chromosome. Nature. 2005, 434, 325−337.
(62) Das, P. J.; Chowdhary, B. P.; Raudsepp, T. Characterization of the bovine pseudoautosomal region and comparison with sheep, goat, and other mammalian pseudoautosomal regions. Cytogenet Genome Res. 2009, 126, 139−147.
(63) Tian, Y.; Stamova, B.; Jickling, G. C.; Xu, H.; Liu, D.; Ander, B. P.; Bushnell, C.; Zhan, X.; Turner, R. J.; Davis, R. R.; Verro, P.; Pevec, W. C.; Hedayati, N.; Dawson, D. L.; Khoury, J.; Jauch, E. C.; Pancioli, A.; Broderick, J. P.; Sharp, F. R. Y chromosome gene expression in the blood of male patients with ischemic stroke compared with male controls. Gend Med. 2012, 9, 68−75.
(64) Mithani, S. K.; Smith, I. M.; Califano, J. A. Use of integrative epigenetic and cytogenetic analyses to identify novel tumor-suppressor genes in malignant melanoma. Melanoma Res. 2011, 21, 298−307.
(65) Reijo, R.; Lee, T. Y.; Salo, P.; Alagappan, R.; Brown, L. G.; Rosenberg, M.; Rozen, S.; Jaffe, T.; Straus, D.; Hovatta, O.; de la Chapelle, A.; Silber, S.; Page, D. C. Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Nat Genet. 1995, 10, 383−393.
(66) Yen, P. H.; Chai, N. N.; Salido, E. C. The human DAZ genes, a putative male infertility factor on the Y chromosome, are highly polymorphic in the DAZ repeat regions. Mamm Genome. 1997, 8, 756−759.
(67) Navarro-Costa, P.; Plancha, C. E.; Gonçalves, J. Genetic dissection of the AZF regions of the human Y chromosome: thriller or filler for male (in)fertility? J Biomed Biotechnol. 2010, 2010, 936569.
(68) Alechine, E.; Corach, D. High-throughput screening for spermatogenesis candidate genes in the AZFc region of the Y chromosome by multiplex real time PCR followed by high resolution melting analysis. PLoS One. 2014, 9, e97227.
(69) Foresta, C.; Ferlin, A.; Moro, E. Deletion and expression analysis of AZFa genes on the human Y chromosome revealed a major role for DBY in male infertility. Hum Mol Genet. 2000, 9, 1161−1169.
(70) Eberhart, C. G.; Maines, J. Z.; Wasserman, S. A. Meiotic cell cycle requirement for a fly homologue of human Deleted in Azoospermia. Nature. 1996, 381, 783−785.
(71) Haqq, C. M.; King, C. Y.; Ukiyama, E.; Falsafi, S.; Haqq, T. N.; Donahoe, P. K.; Weiss, M. A. Molecular basis of mammalian sexual determination: activation of Müllerian inhibiting substance gene expression by SRY. Science. 1994, 266, 1494−1500.

(72) de Santa Barbara, P.; Bonneaud, N.; Boizet, B.; Desclozeaux, M.; Moniot, B.; Sudbeck, P.; Scherer, G.; Poulat, F.; Berta, P. Direct interaction of SRY-related protein SOX9 and steroidogenic factor 1 regulates transcription of the human anti-Müllerian hormone gene. Mol Cell Biol. 1998, 18, 6653−6665.
(73) de Santa Barbara, P.; Moniot, B.; Poulat, F.; Berta, P. Expression and subcellular localization of SF-1, SOX9, WT1, and AMH proteins during early human testicular development. Dev Dyn. 2000, 217, 293−298.
(74) de Santa Barbara, P.; Méjean, C.; Moniot, B.; Malclès, M. H.; Berta, P.; Boizet-Bonhoure, B. Steroidogenic factor-1 contributes to the cyclic-adenosine monophosphate down-regulation of human SRY gene expression. Biol Reprod. 2001, 64, 775−783.
(75) Thevenet, L.; Méjean, C.; Moniot, B.; Bonneaud, N.; Galéotti, N.; Aldrian-Herrada, G.; Poulat, F.; Berta, P.; Benkirane, M.; Boizet-Bonhoure, B. Regulation of human SRY subcellular distribution by its acetylation/deacetylation. EMBO J. 2004, 23, 3336−3345.
(76) Xu, Z.; Gao, X.; He, Y.; Ju, J.; Zhang, M.; Liu, R.; Wu, Y.; Ma, C.; Ma, C.; Lin, Z.; Huang, X.; Zhao, Q. Synergistic effect of SRY and its direct target, WDR5, on Sox9 expression. PLoS One. 2012, 7, e34327.
(77) Saxena, R.; de Vries, J. W.; Repping, S.; Alagappan, R. K.; Skaletsky, H.; Brown, L. G.; Ma, P.; Chen, E.; Hoovers, J. M.; Page, D. C. Four DAZ genes in two clusters found in the AZFc region of the human Y chromosome. Genomics. 2000, 67, 256−267.
(78) Hughes, J. F.; Skaletsky, H.; Pyntikova, T.; Graves, T. A.; van Daalen, S. K.; Minx, P. J.; Fulton, R. S.; McGrath, S. D.; Locke, D. P.; Friedman, C.; Trask, B. J.; Mardis, E. R.; Warren, W. C.; Repping, S.; Rozen, S.; Wilson, R. K.; Page, D. C. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature. 2010, 463, 536−539.
(79) Saxena, R.; Brown, L. G.; Hawkins, T.; Alagappan, R. K.; Skaletsky, H.; Reeve, M. P.; Reijo, R.; Rozen, S.; Dinulos, M. B.; Disteche, C. M.; Page, D. C. The DAZ gene cluster on the human Y chromosome arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nat Genet. 1996, 14, 292−299.
(80) Xu, E. Y.; Moore, F. L.; Pera, R. A. A gene family required for human germ cell development evolved from an ancient meiotic gene conserved in metazoans. Proc Natl Acad Sci U S A. 2001, 98, 7414−7419.
(81) Yu, Y. H.; Lin, Y. W.; Yu, J. F.; Schempp, W.; Yen, P. H. Evolution of the DAZ gene and the AZFc region on primate Y chromosomes. BMC Evol Biol. 2008, 8, 96.
(82) Page, D. C.; Mosher, R.; Simpson, E. M.; Fisher, E. M.; Mardon, G.; Pollack, J.; McGillivray, B.; de la Chapelle, A.; Brown, L. G. The sex-determining region of the human Y chromosome encodes a finger protein. Cell. 1987, 51, 1091−1104.
(83) Palmer, M. S.; Sinclair, A. H.; Berta, P.; Ellis, N. A.; Goodfellow, P. N.; Abbas, N. E.; Fellous, M. Genetic evidence that ZFY is not the testis-determining factor. Nature. 1989, 342, 937−939.
(84) Wang, W.; Meadows, L. R.; den Haan, J. M.; Sherman, N. E.; Chen, Y.; Blokland, E.; Shabanowitz, J.; Agulnik, A. I.; Hendrickson, R. C.; Bishop, C. E.; Hunt, D. F.; Goulmy, E.; Engelhard, V. H. Human H-Y: a male-specific histocompatibility antigen derived from the SMCY protein. Science. 1995, 269, 1588−1590.
(85) Pennisi, E. Long-sought H-Y antigen found. Science. 1995, 269, 1515−1516.

Figure legends

Figure 1. Proteins encoded on the human Y chromosome. (A) Schematic diagram shows 66 proteins or their corresponding genes that are encoded on the human Y chromosome. (B) Physical map of the human Y chromosome.

Figure 2. Cellular pathways of proteins encoded on the human Y chromosome. The cellular pathways of 66 proteins encoded on the human Y chromosome were determined using the KEGG pathway mapping database and Pathway Studio software. The pathway results obtained from the KEGG database and Pathway Studio software were then integrated.

Figure 3. Interactions among human Y chromosome-encoded proteins. A medium confidence view (score 0.4) of the interaction among human Y chromosome-encoded proteins was prepared using the STRING database program. Thicker lines represent stronger associations, and thinner lines represent medium associations.

Figure 4. Interactions among human Y chromosome-encoded proteins. A high confidence view (score 0.7) of the interaction among human Y chromosome-encoded proteins was prepared using the STRING database program.

Figure 5. Interactions of human Y chromosome-encoded proteins (blue) with other chromosome-encoded proteins. This schematic was prepared based on the output of the Pathway Studio software.

Figure 6. Orthologs of human Y chromosome-encoded proteins. (A) Venn diagram showing the number of ortholog proteins found in the Y chromosomes of human, chimpanzee, and mouse. (B) Pathway Studio software-produced interactions and functions of six proteins commonly identified in human, chimpanzee, and mouse Y chromosomes.
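The legends of Figures 3 and 4 differ only in the confidence cutoff applied to the interaction scores (0.4 for the medium confidence view, 0.7 for the high confidence view). The following is a minimal sketch of that filtering step, not the authors' procedure. The protein labels and scores are made-up placeholders, and it assumes scores on a 0 to 1 scale with an inclusive cutoff.

```python
# Hypothetical edge list: (protein_a, protein_b, combined_score).
# Labels and scores are placeholders for illustration only;
# they are NOT values taken from the paper or from STRING.
edges = [
    ("P1", "P2", 0.92),
    ("P2", "P3", 0.75),
    ("P1", "P4", 0.45),
    ("P3", "P4", 0.30),
]

MEDIUM_CONFIDENCE = 0.4  # cutoff named in the Figure 3 legend
HIGH_CONFIDENCE = 0.7    # cutoff named in the Figure 4 legend


def filter_edges(edge_list, cutoff):
    """Keep interactions whose score is at least the cutoff (assumed inclusive)."""
    return [(a, b, s) for a, b, s in edge_list if s >= cutoff]


medium_view = filter_edges(edges, MEDIUM_CONFIDENCE)
high_view = filter_edges(edges, HIGH_CONFIDENCE)

# Every high-confidence edge also appears in the medium-confidence view.
print(len(medium_view), len(high_view))
```

Because the high confidence view is a subset of the medium confidence view, any edge drawn in Figure 4 would also be expected in Figure 3.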


Table 1. The National Center for Biotechnology Information (NCBI) reference sequence assembly and accession information of Y chromosomes in mammalian species.

Species | Common Name | Reference Sequence Assembly Accession* | Annotation Release | Reference Sequence Accession | Size (Mb)
Mammalian primates
Homo sapiens | Human | GCF_000001405.28 | 107 | NC_000024.10 | 57.23
Pan troglodytes | Chimpanzee | GCF_000001515.6 | 103 | NC_006492.3 | 26.34
Chlorocebus sabaeus | Green monkey | GCF_000409795.2 | 100 | NC_023672.1 | 6.18
Callithrix jacchus | Marmoset | GCF_000004665.1 | 102 | NC_013919.1 | 2.85
Mammalian rodents
Mus musculus | Mouse | GCF_000001635.23 | 105 | NC_000087.7 | 91.74
Rattus norvegicus | Norway rat | GCF_000001895.5 | 105 | NC_024475.1 | 3.31
Other mammals
Sus scrofa | Pig | GCF_000003025.5 | 104 | NC_010462.2 | 1.64

*Contributors: International Human Genome Sequencing Consortium (Human); Chimpanzee Sequencing and Analysis Consortium (Chimpanzee); Vervet Genomics Consortium (Green monkey); Marmoset Genome Sequencing and Analysis Consortium (Marmoset); Mouse Genome Sequencing Consortium (Mouse); Rat Genome Sequencing Project Consortium (Norway rat); and Swine Genome Sequencing Consortium (Pig).


Table 2. Mapping position of proteins encoded on the human Y chromosome. S. No.

Symbol

Protein description

1

PLCXD1

2 3

GTPBP6 PPP2R3B

4 5

SHOX CRLF2

6

CSF2RA

7

IL3RA

8 9

SLC25A6 LOC105379416

10

ASMTL

11 12 13 14

P2RY8 AKAP17A ASMT DHRSX

15 16 17 18 19 20 21 22 23 24 25 26 27 28

ZBED1 CD99 SRY RPS4Y1 ZFY TGIF2LY PCDH11Y TSPY2 AMELY TBL1Y TSPY4 TSPY8 TSPY3 TSPY1

Phosphatidylinositol-specific phospholipase C, X domain containing 1 Putative GTP-binding protein 6 Serine/threonine-protein phosphatase 2A regulatory subunit B" subunit beta Short stature homeobox protein isoform SHOXa Cytokine receptor-like factor 2 isoform 1 precursor Granulocyte-macrophage colony-stimulating factor receptor subunit alpha isoform a precursor Interleukin-3 receptor subunit alpha isoform 1 precursor ADP/ATP translocase 3 PREDICTED: serine/arginine repetitive matrix protein 1-like N-acetylserotonin O-methyltransferase-like protein isoform 1 P2Y purinoceptor 8 A-kinase anchor protein 17A isoform 1 Acetylserotonin O-methyltransferase isoform 1 Dehydrogenase/reductase SDR family member on chromosome X precursor Zinc finger BED domain-containing protein 1 CD99 antigen isoform a precursor Sex-determining region Y protein 40S ribosomal protein S4, Y-linked 1 isoform 1 Zinc finger Y-chromosomal protein isoform 1 Homeobox protein TGIF2LY Protocadherin-11 Y-linked isoform a Testis-specific Y-encoded protein 2 Amelogenin, Y isoform precursor F-box-like/WD repeat-containing protein TBL1Y Testis-specific Y-encoded protein 4 Testis-specific Y-encoded protein 8 Testis-specific Y-encoded protein 3 Testis-specific Y-encoded protein 1 isoform

NCBI Ref.Seq protein ID NP_060860.1

Protein Size 323

Gene ID

Start

Stop

Strand

55344

284188

299335

Plus (+)

NP_036359.3 NP_037371.2

516 575

8225 28227

305074 334367

318787 386691

Minus (-) Minus (-)

NP_000442.1 NP_071431.2

292 371

6473 64109

630898 1190897

644636 1212634

Plus (+) Minus (-)

NP_758448.1

400

1438

1282704

1309479

Plus (+)

NP_002174.1

378

3563

1341766

1382465

Plus (+)

NP_001627.2 XP_011543943.1

298 358

293 105379416

1386602 1392008

1392009 1395560

Minus (-) Plus (+)

NP_004183.2

621

8623

1403269

1452840

Minus (-)

NP_835230.1 NP_005079.2 NP_001164509.1 NP_660160.2

359 695 373 330

286530 8227 438 207063

1465479 1593463 1615200 2221041

1466558 1601594 1643014 2500925

Minus (-) Plus (+) Plus (+) Minus (-)

NP_001164606.1 NP_002405.1 NP_003131.1 NP_000999.1 NP_003402.2 NP_631960.1 NP_001265548.1 NP_072095.2 NP_001134.1 NP_150600.1 NP_001157943.1 NP_001230650.1 NP_001071165.2 NP_003299.2

694 185 204 263 801 185 1037 308 192 522 314 308 308 308

9189 4267 6736 6192 7544 90655 83259 64591 266 90665 728395 728403 728137 7258

2488635 2691361 2786989 2841625 2953937 3579245 5032697 6246269 6866073 7025085 9337510 9357843 9398467 9467001

2490719 2740804 2787603 2866894 2979993 3579802 5104361 6248825 6872608 7091492 9340090 9360405 9401029 9469561

Minus (-) Plus (+) Minus (-) Plus (+) Plus (+) Plus (+) Plus (+) Plus (+) Minus (-) Plus (+) Plus (+) Plus (+) Plus (+) Plus (+)


29 30

TSPY10 USP9Y

31 32 33 34 35 36 37 38 39 40 41

DDX3Y UTY TMSB4Y VCY VCY1B NLGN4Y XKRY CDY2B CDY2A XKRY2 HSFY1

42

HSFY2

43 44

KDM5D EIF1AY

45 46 47

RPS4Y2 PRORY RBMY1B

48

RBMY1A1

49

RBMY1D

50

RBMY1E

51 52

PRY2 RBMY1F

53

RBMY1J

54 55 56 57

PRY BPY2 DAZ1 DAZ2

TSPY-L Testis-specific Y-encoded protein 10 Probable ubiquitin carboxyl-terminal hydrolase FAF-Y ATP-dependent RNA helicase DDX3Y isoform 1 Histone demethylase UTY isoform 1 Thymosin beta-4, Y-chromosomal Variable charge, Y-linked Variable charge, Y-linked 1B Neuroligin-4, Y-linked isoform 1 precursor Testis-specific XK-related protein, Y-linked Testis-specific chromodomain protein Y 2B Testis-specific chromodomain protein Y 2A Testis-specific XK-related protein, Y-linked 2 Heat shock transcription factor, Y-linked 1 isoform 1 Heat shock transcription factor, Y-linked 2 isoform 1 Lysine-specific demethylase 5D isoform 1 Eukaryotic translation initiation factor 1A, Ychromosomal isoform 1 40S ribosomal protein S4, Y-linked 2 isoform 2 Proline-rich protein, Y-linked RNA-binding motif protein, Y chromosome, family 1 member B RNA-binding motif protein, Y chromosome, family 1 member A1 RNA-binding motif protein, Y chromosome, family 1 member D RNA-binding motif protein, Y chromosome, family 1 member E PTPN13-like protein, Y-linked 2 RNA-binding motif protein, Y chromosome, family 1 member F isoform 1 RNA-binding motif protein, Y chromosome, family 1 member J PTPN13-like protein, Y-linked Testis-specific basic protein Y 2 Deleted in azoospermia protein 1 Deleted in azoospermia protein 2 isoform 1

NP_001269398.1 NP_004645.2

308 2555

100289087 8287

9527926 12709448

9530488 12859416

Plus (+) Plus (+)

NP_001116137.1 NP_872601.1 NP_004193.1 NP_004670.1 NP_870996.1 NP_055708.3 NP_004668.2 NP_001001722.1 NP_004816.1 NP_001002906.1 NP_149099.2

660 1079 44 125 125 816 117 541 541 117 401

8653 7404 9087 9084 353513 22829 9082 203611 9426 353515 86614

12904937 13323114 13704336 13985871 14056290 14622120 17769544 17878267 18026115 18136112 18546788

12918122 13479665 13705259 13986440 14056859 14841262 17769897 17879892 18027740 18136465 18548471

Plus (+) Minus (-) Plus (+) Minus (-) Plus (+) Plus (+) Minus (-) Minus (-) Plus (+) Plus (+) Plus (+)

NP_714927.1

401

159119

18771935

18773618

Minus (-)

NP_001140177.1 NP_004672.2

1570 144

8284 9086

19705995 20575872

19744534 20592346

Minus (-) Plus (+)

NP_001034656.1 NP_001269400.1 NP_001006121.1

263 182 496

140032 100533178 378948

20756164 21383186 21513351

20781032 21386263 21525508

Plus (+) Minus (-) Plus (+)

NP_005049.1

496

5940

21536892

21549048

Plus (+)

NP_001006120.2

496

378949

21880354

21892513

Minus (-)

NP_001006118.2

496

378950

21903896

21916054

Minus (-)

NP_001002758.1 NP_689798.1

147 496

442862 159163

22072323 22168820

22084839 22180969

Minus (-) Minus (-)

NP_001006117.2

496

378951

22405448

22417603

Plus (+)

NP_004667.2 NP_004669.2 NP_004072.3 NP_065096.2

147 106 744 558

9081 9083 1617 57055

22501565 22992344 23135210 23219741

22514070 22998268 23198800 23285501

Plus (+) Plus (+) Minus (-) Plus (+)


58

CDY1B

59 60 61 62 63

BPY2B DAZ3 DAZ4 BPY2C CDY1

64 65 66

SPRY3 VAMP7 IL9R

Testis-specific chromodomain protein Y 1B isoform a Testis-specific basic protein Y 2B Deleted in azoospermia protein 3 Deleted in azoospermia protein 4 isoform 1 Testis-specific basic protein Y 2C Testis-specific chromodomain protein Y 1 isoform a Protein sprouty homolog 3 Vesicle-associated membrane protein 7 isoform 1 Interleukin-9 receptor isoform 1 precursor


NP_001003894.1

540

253175

24046066

24047688

Minus (-)

NP_001002760.1 NP_065097.2 NP_001005375.1 NP_001002761.1 NP_733841.1

106 438 579 106 540

442867 57054 57135 442868 9085

24626085 24768934 24834127 25038098 25622443

24632010 24813185 24901175 25044023 25624065

Plus (+) Minus (-) Plus (+) Minus (-) Plus (+)

NP_001291919.1 NP_005629.1 NP_002177.2

288 220 521

10251 6845 3581

56960392 57075987 57184280

56961258 57128471 57196929

Plus (+) Plus (+) Plus (+)
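The Start, Stop, and Protein Size columns of Table 2 can be cross-checked against each other. The sketch below is an illustrative check, not part of the authors' analysis. It assumes that Start and Stop are 1-based inclusive coordinates, which the table does not state, and it uses three rows copied from Table 2. For an intronless coding interval, the span should equal three nucleotides per residue plus one stop codon.

```python
# Rows copied from Table 2: (symbol, protein size in aa, start, stop).
rows = [
    ("SRY", 204, 2786989, 2787603),
    ("RPS4Y1", 263, 2841625, 2866894),
    ("TGIF2LY", 185, 3579245, 3579802),
]


def span_bp(start, stop):
    """Interval length, ASSUMING 1-based inclusive coordinates."""
    return stop - start + 1


def coding_bp(protein_size):
    """Nucleotides needed to encode the protein plus one stop codon."""
    return 3 * (protein_size + 1)


for symbol, size, start, stop in rows:
    span = span_bp(start, stop)
    # True only when the interval is exactly one uninterrupted coding sequence.
    print(symbol, span, coding_bp(size), span == coding_bp(size))
```

Under these assumptions the SRY and TGIF2LY intervals match their coding lengths exactly, whereas the RPS4Y1 interval is far longer than its coding length, as expected when an interval also spans introns.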


Table 3. Comparison of the NCBI human Y chromosome-encoded proteins with core datasets of the Human Proteome Project. S. No.

NCBI Protein (Y-linked)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

PLCXD1 GTPBP6 PPP2R3B SHOX CRLF2 CSF2RA IL3RA SLC25A6 LOC105379416 ASMTL P2RY8 AKAP17A ASMT DHRSX ZBED1 CD99 SRY RPS4Y1 ZFY TGIF2LY PCDH11Y TSPY2 AMELY TBL1Y TSPY4 TSPY8 TSPY3 TSPY1 TSPY10 USP9Y DDX3Y UTY TMSB4Y VCY VCY1B

neXtProt (Release 2014-09-19) Protein Evidence Protein level Protein level Protein level Transcript level Protein level Transcript level Transcript level Transcript level Homology Homology Homology Protein level Homology Protein level Protein level Protein level Protein level Protein level -

PeptideAtlas (Build 2014-08) Identification Status Not observed Canonical Possibly distinguished Subsumed by Q8IUE1 Indistinguishable to Q9BZA7 Canonical Not observed NTT subsumed by O60907 Indistinguishable to A6NKD2 Subsumed by A6NKD2 Indistinguishable to A6NKD2 Possibly distinguished Identical to P0CV99 Possibly distinguished Canonical Canonical Subsumed by P62328 Canonical -

The Human Protein Atlas (Version 13) Detection Status Immunohistochemistry Immunohistochemistry Transcript only Transcript only Transcript only Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Transcript only Immunohistochemistry Transcript only Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Transcript only Immunohistochemistry Immunohistochemistry Transcript only Immunohistochemistry Immunohistochemistry


NCBI BLASTP X-homologous PLCXD1 GTPBP6 PPP2R3B SHOX CRLF2 CSF2RA IL3RA SLC25A6 LOC105373102 ASMTL P2RY8 AKAP17A ASMT DHRSX ZBED1 CD99 SOX3 RPS4X ZFX TGIF2LX PCDH11X TSPYL2 AMELX TBL1X TSPYL2 TSPYL2 TSPYL2 TSPYL2 TSPYL2 USP9X DDX3X UTX TMSB4X VCX VCX

% Identities 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 74% 93% 92% 89% 98% 49% 88% 90% 49% 47% 49% 49% 49% 92% 92% 84% 93% 96% 96%


36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

NLGN4Y XKRY CDY2B CDY2A XKRY2 HSFY1 HSFY2 KDM5D EIF1AY RPS4Y2 PRORY RBMY1B RBMY1A1 RBMY1D RBMY1E PRY2 RBMY1F RBMY1J PRY BPY2 DAZ1 DAZ2 CDY1B BPY2B DAZ3 DAZ4 BPY2C CDY1 SPRY3 VAMP7 IL9R

Transcript level Transcript level Protein level Transcript level Protein level Protein level Protein level Protein level Protein level Transcript level Transcript level Protein level Transcript level Protein level Transcript level Transcript level Protein level Protein level Protein level Protein level Protein level Protein level Protein level Protein level Protein level Protein level -

Subsumed by Q8N0W4 Not observed Subsumed by Q9Y232 Not observed Canonical Possibly distinguished Possibly distinguished Possibly distinguished Not observed Possibly distinguished Possibly distinguished Possibly distinguished Canonical Possibly distinguished Not observed Not observed Indistinguishable to Q13117 Canonical Indistinguishable to Q9Y6F7 Indistinguishable to Q13117 Indistinguishable to Q13117 Indistinguishable to Q9Y6F7 -

Immunohistochemistry Transcript only Transcript only Transcript only Transcript only Immunohistochemistry Immunohistochemistry Immunohistochemistry Transcript only Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Immunohistochemistry Transcript only Immunohistochemistry Immunohistochemistry Transcript only Transcript only Immunohistochemistry Immunohistochemistry Transcript only Transcript only Immunohistochemistry Immunohistochemistry Immunohistochemistry


NLGN4X XKRX XKRX HSFX1 HSFX1 KDM5C EIF1AX RPS4X ASMT RBMX RBMX RBMX RBMX RBMX RBMX VAMP7 IL9R

98% 36% 36% 48% 48% 83% 99% 92% 89% 56% 56% 56% 56% 56% 56% 100% 100%


Table 4. Orthologs of human Y chromosome-encoded proteins in chimpanzee and mouse Y chromosomes.

S. No. | Human Protein | Chimpanzee Protein | % Identities | Gene ID | Mouse Protein | % Identities | Gene ID
1 | SRY | SRY | 97% | 449510 | SRY | 55% | 21674
2 | ZFY | ZFY | 99% | 449580 | ZFY1 | 69% | 22767
3 | DDX3Y | DDX3Y | 98% | 449508 | DDX3Y | 89% | 26900
4 | USP9Y | USP9Y | 97% | 465985 | USP9Y | 83% | 107868
5 | UTY | UTY | 98% | 449579 | UTY | 75% | 22290
6 | KDM5D | KDM5D | 97% | 449032 | KDM5D | 78% | 20592
7 | AMELY | AMELY | 98% | 473886 | - | - | -
8 | CDY1 | CDY1 | 97% | 744034 | - | - | -
9 | DAZ1 | DAZ1 | 92% | 749905 | - | - | -
10 | DAZ2 | DAZ2 | 85% | 738636 | - | - | -
11 | EIF1AY | EIF1AY | 99% | 449499 | - | - | -
12 | RBMY1F | RBMY1F | 95% | 736268 | - | - | -
13 | RBMY1J | RBMY1J | 94% | 736860 | - | - | -
14 | CD99 | CD99 | 98% | 751057 | - | - | -
15 | NLGN4Y | NLGN4Y | 96% | 449030 | - | - | -
16 | PRORY | PRORY | 94% | 101059107 | - | - | -
17 | RPS4Y1 | RPS4Y1 | 99% | 449509 | - | - | -
18 | RPS4Y2 | RPS4Y2 | 97% | 449636 | - | - | -
19 | TBL1Y | TBL1Y | 97% | 465981 | - | - | -
20 | TMSB4Y | TMSB4Y | 95% | 100608527 | - | - | -
21 | RBMY1A1 | - | - | - | RBMY1A1 | 44% | 19657
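The overlap counts behind a Venn diagram such as Figure 6A can be recomputed from Table 4 with plain set operations. The sketch below is an illustration, not the authors' method. It keys every ortholog by its human protein symbol (so mouse ZFY1 is listed under ZFY), and it assumes that the seven mouse entries belong to the human proteins of the same name, since the extracted table does not mark which rows are empty.

```python
# Human Y-encoded proteins listed in Table 4.
human = {
    "SRY", "ZFY", "DDX3Y", "USP9Y", "UTY", "KDM5D", "AMELY",
    "CDY1", "DAZ1", "DAZ2", "EIF1AY", "RBMY1F", "RBMY1J", "CD99",
    "NLGN4Y", "PRORY", "RPS4Y1", "RPS4Y2", "TBL1Y", "TMSB4Y", "RBMY1A1",
}

# Human proteins with a chimpanzee Y ortholog (all except RBMY1A1).
chimp = human - {"RBMY1A1"}

# Human proteins with a mouse Y ortholog (ASSUMED row alignment by name).
mouse = {"SRY", "ZFY", "DDX3Y", "USP9Y", "UTY", "KDM5D", "RBMY1A1"}

shared_all = chimp & mouse   # ortholog in both chimpanzee and mouse
chimp_only = chimp - mouse   # ortholog in chimpanzee only
mouse_only = mouse - chimp   # ortholog in mouse only

print(sorted(shared_all))
print(len(shared_all), len(chimp_only), len(mouse_only))
```

With this alignment, six proteins have orthologs on all three Y chromosomes, which agrees with the six proteins mentioned in the Figure 6 legend.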

Figure 1. Proteins encoded on the human Y chromosome. (A) Schematic diagram shows 66 proteins or their corresponding genes that are encoded on the human Y chromosome. (B) Physical map of the human Y chromosome.

Figure 2. Cellular pathways of proteins linked to the human Y chromosome. The cellular pathways of 66 proteins linked to the human Y chromosome were determined using the KEGG pathway mapping database and Pathway Studio software. The pathway results obtained from the KEGG and Pathway Studio databases were then integrated.

Figure 3. Interactions among human Y chromosome-linked proteins. A medium confidence view (score 0.4) of the interaction among human Y chromosome-linked proteins was prepared using the STRING program. Thicker lines represent stronger associations, and thinner lines represent medium associations.

Figure 4. Interaction among human Y chromosome-linked proteins. A high confidence view (score 0.7) of the interaction among human Y chromosome-linked proteins was prepared using the STRING program.

Figure 5. Interaction of human Y chromosome-linked proteins (blue) with other chromosome-linked proteins. This schematic was prepared based on the output of the Pathway Studio database.

Figure 6. Orthologs of human Y chromosome-linked proteins. (A) Venn diagram showing the number of ortholog proteins found in the Y chromosomes of human, chimpanzee, and mouse. (B) Pathway Studio software-produced interactions and functions of six proteins commonly identified in human, chimpanzee, and mouse Y chromosomes.