Novel Insights into the Evolution and Structural ... - ACS Publications

Dec 26, 2014 - 1876 Bernal, Buenos Aires, Argentine. •S Supporting Information. ABSTRACT: Dyskerin is a conserved nucleolar protein. Several...
1 downloads 0 Views 647KB Size
Article pubs.acs.org/jpr

Novel Insights into the Evolution and Structural Characterization of Dyskerin Using Comprehensive Bioinformatics Analysis Carolina Susana Cerrudo,*,† Diego Luis Mengual Gómez,‡ Daniel Eduardo Gómez,‡,§ and Pablo Daniel Ghiringhelli†,§ †

Laboratory of Genetic Engineering and Cellular and Molecular Biology, Department of Science and Technology, Quilmes National University, Roque Saenz Peña 352, 1876 Bernal, Buenos Aires, Argentine ‡ Laboratory of Molecular Oncology, Department of Science and Technology, Quilmes National University, Roque Saenz Peña 352, 1876 Bernal, Buenos Aires, Argentine S Supporting Information *

ABSTRACT: Dyskerin is a conserved nucleolar protein. Several related genetic diseases are caused by defects in dyskerin. We hypothesized that having a comprehensive bioinformatic analysis of dyskerin will help to develop new drugs for this diseases. We predicted protein domains and compared sequences and structures to detect the universe of dyskerin-like proteins. We identified conserved features of shared domains in the three superkingdoms. We analyzed the phylogenetic diversity, confirming that there is a strong structural conservation. Also, we studied the relationship of dyskerin-like proteins with other proteins through an integrative protein−protein interaction approach. Most of them are conserved among homologous eukaryotic and archaeal proteins. Our results highlighted the preservation of proteins interacting with dyskerin. We identified conserved dyskerin interactor proteins between the different eukaryotes organisms. Furthermore, we studied the existence of dyskerin-like proteins in different species. Also, we compared and analyzed the secondary structure with the hydrophobic profile, confirming that all have hydrophilic properties highly conserved among proteins. The greatest difference was observed in the NTE and CTE regions. Another aspect studied was the comparison and analysis of tertiary structures. In our knowledge, this is the first time that these analyses were performed in such a comprehensive manner. KEYWORDS: dyskerin-like protein, DKCLD, TruB_N, PUA, dyskerin phylogeny



INTRODUCTION Dyskerin is a vastly conserved nucleolar protein encoded in the human genome by the DKC1 gene at Xq28 and is required for the biogenesis of ribonucleoprotein particles (RNPs) that incorporate H/ACA RNAs.1 The DKC1 gene is a member of the H/ACA small nucleolar ribonucleoprotein (snoRNP) gene family. Other proteins, as GAR1, NHP2, and NOP10, assist dyskerin in the interaction and processing of H/ACAcontaining RNAs.2 The main function of Box H/ACA RNPs is to convert uridine (U) into pseudouridine (Ψ) posttranscriptionally at specific sites of rRNAs and snRNAs.3 Dyskerin is also needed for normal function of telomerase RNP and for telomere maintenance.4,5 Furthermore, some reports suggest a role for dyskerin in the regulation of microRNAs6 and in translational control of mRNA,7 mainly those containing internal ribosome entry site elements such as those encoding some antiapoptotic factors and tumor suppressors.8 Dyskerin is evolutionarily conserved in different species from the three superkingdoms. All dyskerin-like proteins present two wellcharacterized domains: TruB_N and PUA. Eukaryotes and archaeal dyskerin-like proteins also present another conserved © XXXX American Chemical Society

domain that is not detected in bacteria: DKCLD (dyskerin-like domain). Additionally, only eukaryotic dyskerin-like proteins have N and C terminal extension (NTE and CTE). In the PDB database structures of dyskerin-like proteins corresponding to bacteria, archaea, and yeast are filed, while other eukaryotic sequences have not been solved yet. Rocchi et al. have recently created a homology model of human dyskerin by using a template crystal structure from Saccharomyces cerevisiae (PDB ID: 3U28), which has a sequence identity of 73% with the human dyskerin sequence.9 However, the structures of eukaryotic dyskerins remain poorly understood. A number of related genetic diseases are caused by defects in the telomere maintenance machinery. These disorders, often called telomeropathies, share symptoms and molecular mechanisms, and mounting evidence indicates they are points along a spectrum of disease. Defects in genes involved in telomere maintenance result in a large overlapping spectrum of symptoms. The best known of these diseases is dyskeratosis Received: September 10, 2014

A

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research congenital disorder (DKC),10 aplastic anemia, Hoyeraal− Hreidarsson syndrome, idiopathic pulmonary fibrosis,11 and cancer. The symptoms of these disorders are extensive, and the age of onset is highly variable; however, the disorders share similar underlying molecular mechanisms and have overlapping, incompletely penetrant phenotypes. Recently, an increasing number of reports have identified syndromes that do not include the classic symptoms but are still caused by mutations in core telomere maintenance proteins such as Revesz syndrome and Coats plus syndrome.12 Despite this heterogeneity it is important to go a bit deeper in the role of dyskerin in cancer due to the incidence and mortality of this disease and the fact that dyskerin has been explored for the development of drugs to selectively or preferentially kill cancer cells. A pioneer study has reported dyskerin expression to be increased in several human cancer types, especially in breast cancers.13 In accord with the known biological functions of the protein, breast cancers with low dyskerin expression contained lower levels of pseudouridine and telomerase RNA than those with high expression. Moreover, cancers with high expression generally exhibited worse histopathological features and prognosis. In line with these findings, DKC1 overexpression is associated with prostate cancer progression.14 Also, DKC1 is downregulated in sporadic chronic lymphocytic leukemia.15 Eighty five percent of the malignant tumors are immortal due to the fact that they possess an enzyme called telomerase that maintains the telomeric length. The human holoenzyme telomerase is a RNP composed of a catalytic subunit, hTERT and a RNA component (hRT), which acts as a template for the addition of a short repetitive sequence d(TTAGGG)n in the 3′ end of the telomeric DNA, and species-specific accessory proteins. These accessory proteins regulate telomerase biogenesis, subcellular localization, and function in vivo. For instance, analysis of affinity-purified telomerase from HeLa cells has identified integral protein components of human telomerase: dyskerin, NHP2, NOP10, RUVBL1(pontin)/ RUVBL2(reptin), GAR1, and TCAB1.16 Dyskerin, NHP2, and NOP10 are required for the stability and accumulation of hTR in vivo. The heterotrimer of dyskerin, NOP10, and NHP2 is deposited onto each hairpin unit of the H/ACA motif in a highly chaperoned biogenesis process. Cotranscriptional association of the heterotrimer is followed by an exchange of biogenesis factors for the fourth core subunit, GAR1, to produce a biologically functional RNP. Pontin and reptin are two closed ATPases necessary for the stability of dyskerin and hTR in vivo. The current model is that dyskerin, pontin, and reptin form a scaffold that recruits and stabilizes hTR and assembles the telomerase ribonucleoprotein particle. Once this complex is formed, pontin and reptin are believed to dissociate from the complex and release the catalytically active enzyme.17 A serial of causal mutations have been identified in different genes encoding telomerase components.18 Although not as frequent, DKC has a direct correlation with a great number of mutations in the DKC1 gene (Table 1). These mutations are mainly single-amino missense mutations,21 and many are present at the NTE and CTE of dyskerin, which are conserved between yeast and human but are missing in archaea and bacteria. Even yet, DKC1 mutations are not randomly distributed in the other conserved domains. Only two mutations have been found in the TruB_N catalytic domain, one of which is located three residues from the essential aspartate required for pseudouridylation and is associated with a severe form of DKC.36 Mostly, DKC1 mutations are

Table 1. Mutations in the Complexes Associated with DKC and Related Diseasesa gene (protein) Telomerase DKC1 (dyskerin) dyskerin NTE domains DKCLD TruB

chromosomal location

additional references

Xq28

42b 12 6 2 1 8 11

3q21-q28

33

19, 20 21−24 20 25 26 20, 21, 27 19−21, 26−29 30

5p15.33

34

30

Chr 5

3

31

15q14-q15

1

32

17p13.1

4

33

14q12

18

34

PUA CTE TERC (telomerase RNA component) TERT (telomerase reverse transcriptase) NHP2 (nonhistone chromosome protein 2) NOP10 (nucleolar protein 10) WRD79 (TCAB1) Shelterin TINF2 (TIN2)

number of mutations

a

Modified from the telomerase database http://telomerase.asu.edu/ diseases.html.35 bTwo of these mutations occur in the upstream regulatory region.

concentrated in or near to the PUA domain, and these mutations decrease the affinity for RNA binding.37 Therefore, we hypothesized that it is very important to have a comprehensive bioinformatics characterization of dyskerin, which will allow us to expand the knowledge of its function, evolution, and secondary and tertiary structure, opening the way to the development of new drugs to be employed in the treatment of telomeropaties. Hence, our present study focuses on function, evolution, and structure of dyskerin-like proteins.



MATERIAL AND METHODS

Sequence Search

NCBI Server (National Center for Biotechnology Information, NIH, Bethesda) was used to recover sequences corresponding to dyskerin-like proteins using the human dyskerin amino acid sequence, BLASTP program,38 and a profile inclusion expectation (E) value threshold of 0.01. The sequences belonging to the following species were only recovered: Homo sapiens, Rattus norvegicus, Mus musculus, Bos taurus, Gallus gallus, Danio rerio, Drosophila melanogaster, Arabidopsis thaliana, Anopheles gambiae, Aedes aegypti, Caenorhabditis elegans, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Kluyveromyces lactis, Escherichia coli, Salmonella enterica, Shigella dysenteriae, Pseudomonas psychrotolerans, Pyrococcus f uriosus, Pyrococcus abyssi, and Methanocaldococcus jannaschii (GenBank accession nos. NP_001354, NP_596910, NP_001025478, NP_001098865, NP_001026286, NP_001028279.2, NP_525120, NP_191274, XP_318082.4, XP_001656912, NP_499370, NP_594878, NP_013276, XP_453273, NP_289742, WP_000089686, WP_000089712, WP_007162646, NP_579514, WP_010867646, and AAB98132). B

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research Table 2. Identification of Protein Domains in Dyskerin-Like Proteins protein name

accession numbers

NP_596910.1 P40615 NP_001025478.1 Q9ESX5 NP_001098865.1 A7YWH5 NP_001026286.1 Q5ZJH9 NP_001028279.2 Q498Z9 NP_525120.1 O44081

Rattus norvegicus

Idem H. sapiens

PUA

Mus musculus

Idem H. sapiens

Idem H. sapiens

Bos taurus

Idem H. sapiens

PUA LYS_RICH Idem H. sapiens

Gallus gallus

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

Danio rerio

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

Drosophila melanogaster

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

XP_318082.4 Q7PMU4 XP_001656912.1 Q0IG45 NP_499370.1 O17919 NP_594878.1 O14007

Anopheles gambiae

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

Aedes aegypti

Idem H. sapiens

PUA

Idem H. sapiens

Caenorhabditis elegans

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

Schizosaccharomyces pombe

Idem H. sapiens

PUA LYS_RICH NLS_BP PUA LYS_RICH 2 NLS_BP PUA LYS_RICH 2 NLS_BP Idem H. sapiens

Idem H. sapiens

O60832

NAP57

centromere/microtubule binding protein cbf5 protein K01G5.5 pseudouridylate synthase

CDD domains

PUA (PS50890) LYS_RICH (PS50318)

H/ACA small nucleolar ribonucleoprotein (H/ ACA snoRNP) subunit 4; nucleolar protein family A member 4; snoRNP protein DKC1; nopp140-associated protein of 57 kDa; cbf5p homologue; CBF5 homologue; nucleolar protein NAP57.

dyskerin fc87a02; fi24a05 Nop60B nucleolar protein 60B, isoform A; Nop60B-PA; dUhg-like 6; minifly AGAP004739-PA

PROSITE domains

DKCLD (PF08068.7) TruB_N (PF01509.13) PUA (PF01472.15)

NP_001354.1

H/ACA ribonucleoprotein complex subunit 4

PFAM domains

Homo sapiens

dyskerin

H/ACA ribonucleoprotein complex subunit 4 dyskeratosis congenital 1 H/ACA ribonucleoprotein complex subunit 4

organism

NX_O60832

PseudoU_synth superfamily (cl00130) PseudoU_synth_hDyskerin (cd02572) TruB_N superfamily (cl17104) DKCLD superfamily (cl06897) PUA superfamily (cl00607) CBF5 (TIGR00425) TruB (COG0130, PRK01550, TIGR00431) Idem H. sapiens

Idem H. sapiens

Cbf5p

NP_013276.1 P33322

Saccharomyces cerevisiae

Idem H. sapiens

hypothetical protein

XP_453273.1 O13473

Kluyveromyces lactis

Idem H. sapiens

H/ACA ribonucleoprotein complex subunit 4

NP_191274.1 Q9LD90 NP_289742.1 P60340

Arabidopsis thaliana

Idem H. sapiens

Escherichia coli

TruB_N TruB-C_2 (PF09157.6)

WP_000089686.1 H6P3V0 WP_000089712.1 Q32BG7 WP_007162646.1 H0JIW3 NP_579514.1 I6US58

Salmonella enterica

Idem E. coli

PseudoU_synth superfamily PseudoU_synth_hDyskerin TruB_N superfamily TruB-C_2 superfamily (cl08528) TruB Idem E. coli

Shigella dysenteriae

Idem E. coli

Idem E. coli

Pseudomonas psychrotolerans

Idem E. coli

Idem E. coli

Pyrococcus f uriosus

Cbf5 centromere/microtubule-binding protein

AAB98132.1 Q57612

Methanococcus jannaschii

DKCLD 2 TruB_N PUA Idem H. sapiens

Cbf5p

WP_010867646.1 Q9V1A5

Pyrococcus abyssi

TruB tRNA pseudouridine synthase B

TruB TruB TruB Cbf5p

C

Idem P. f uriosus

Idem H. sapiens

Idem H. sapiens

Idem H. sapiens

PUA

Idem H. sapiens

PUA HIS_RICH (PS50316) PUA

Idem H. sapiens

Idem H. sapiens

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 1. Sequence conservation in dyskerin-like protein domains. (A) Schematic representation of dyskerin-like proteins architecture showing different protein domains present in species of the three superkingdoms. Sequence logos showing amino acid variability in the three domains: (B) DKCLD (archaeas and eukaryotes), (C) TruB_N (bacteria, archaeas, and eukaryotes), arrow indicate the active site, and (D) PUA (bacteria, archaeas, and eukaryotes). Logos were constructed starting with alignments of each domain from different species. The height of the whole letter stack indicates the information content at that position, and the height of each letter is proportional to the relative frequency of the amino acid. Hydrophobic residues (L, V, I, W, M, F, P) are green, basic residues (R, K) are red, acidic residues (D, E) are blue, and all other residues are black. Asterisks and unfilled letters indicate the positions where human dyskerin mutations occur.

were performed using the WebLogo server version 2.8.2.44 Secondary structure predictions of regions not included in 3-D models were performed with JPRED 3 server.45 All servers are utilized with default parameters. Consensus hydrophobicity profiles were generated using a program designed ad hoc (Ghiringhelli, P.D.; unpublished). In brief, individual hydrophobicity profiles were obtained for each protein using a sliding windows strategy and standard hydrophobicity table. After this, gaps present in a multiple alignment of all considered proteins

Protein Characterization

To obtain all updated information about recovered proteins, we searched amino acid sequences against the UniProtKB/SwissProt database.39 In addition, neXProt40 was also used for Homo sapiens dyskerin. The identification of motifs, patterns, and domains in dyskerin-like proteins was performed using three known servers: Pfam,41 PROSITE,42 and CDD.43 Sequence Logos D

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research were automatically translated to each individual numeric profile. Average hydrophobicity and standard deviation values were calculated column by column. Average represents the consensus hydrophobicity profile and standard deviation represents the dispersion of hydrophobicity values in each column.

plug-in were integrated into a single network (189 nodes and 5818 edges) using Organic Layout. Different statistical parameters were evaluated considering undirected edges. We classified the main categories of proteins present in the network by using gene ontology (GO). GO term statistics were obtained via ClueGO58 plugin. To visualize the nodes and interactions, we select Betweenness Centrality for Map Node Size; Stress Centrality for Map Node Color; and Normalized Max Weight for Map Edge Size. In addition, STRING server56 was used to predict protein− protein interactions of a selected set of eukaryotic dyskerin-like proteins. We performed a STRING search for each organism (M. musculus, G. gallus, D. rerio, D. melanogaster, A. thaliana, C. elegans, S. cerevisiae) in the protein mode and using different keywords (DKC1, CBF5, NOP60B, and NAP57). In all cases the following prediction methods were activated: neighborhood, gene fusion, experiments, databases, and text mining, and only the interactions (associations) with a high confidence score (scores >0.7) were accepted. For all organisms no more than 50 interactors were analyzed.

Phylogeny

Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 546 with the complete dyskerin. The following parameters were used: UPGMA; Bootstrap with 1000 replicates; gap/Missing data = Pairwise deletion; Model = Amino (Dayhoff Matrix); patterns among sites = Same (Homogeneous); rates among sites = Different (Gamma Distributed); gamma parameter = 2.25. Ancestral Sequences Inference

Ancestral states for eukaryotic dyskerin-like proteins were inferred using MEGA version 5.46 The following parameters were used: Method = Maximum Likelihood; Model = Dayhoff matrix based; Initial tree = precomputed tree file; Rate heterogeneity among sites = Gamma distribution using 5 gamma categories; Gaps missing/data treatment = use all sites; Branch swap filter = moderate. Pairwise alignments were performed with the ClustalW2 software using default parameters.



RESULTS

Domains in Dyskerin-Like Proteins

As previously mentioned, dyskerin-like proteins are found in different species from the three superkingdoms. From this universe, we selected a set of 21 representative species to analyze the distribution of characteristic domains: Homo sapiens, Rattus norvegicus, Mus musculus, Bos taurus, Gallus gallusi, Danio rerio, Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, Caenorhabditis elegans, Arabidopsis thaliana, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Kluyveromyces lactis, Escherichia coli, Salmonella enterica, Shigella dysenteriae, Pseudomonas psychrotolerans, Pyrococcus f uriosus, Pyrococcus abyssi, and Methanocaldococcus jannaschii. By comparing these recovered sequences with the UniProtKB/ Swiss-Prot database, we were able to carry out a first general protein characterization. For human dyskerin we also used neXProt. Accession numbers of proteins in these databases are shown in Table 2. Eukaryotic dyskerin have three well-characterized domains (DKCLD, TruB_N, and PUA) besides nuclear and nucleolar localization signals (Figure 1). Archaeal and bacterial pseudouridine synthases obviously lack nuclear and nucleolar localization signals but keep the TruB_N and PUA domains. Finally, archaea pseudouridine synthases also retain DKCLD domain. DKCLD is an N-terminal domain present in 459 species, and 504 sequences were found deposited in Pfam under this name. This domain showed interactions with other three domains stored in the Pfam (Nop10p, TruB_N, and PUA). A sequence logo corresponding to this domain showed wide sequence conservation (Figure 1), especially at residues 23, 28, 31, 34, 43, 48−50, 53, and 55−58. This domain is detected by Pfam and CDD in all eukaryotic and archaeal studied proteins (Table 2). Six of the mutations described in human dyskerin are found in this domain20 (Table 1). TruB_N family (N terminal domain of TruB family pseudouridylate synthase) includes all eukaryotic, bacterial, and archaeal pseudouridine synthases similar to human dyskerin, minifly protein of Drosophila melanogaster, TruB of bacterial, and Cbf5 of archaeas and eukaryotes. These members are involved in modifying bases in RNA molecules. They carry

Homology Modeling and Structural Alignment

The BLAST program and H. sapiens dyskerin sequence (NP_001354) were used to identify putative dyskerin-like structures from Protein Data Bank (PDB). The top ranked record identified was the unique eukaryotic sequence, which has experimentally determined structure: the yeast Cbf5 structure (PDB ID: 3U28A; resolution of 1.90 Å), and this was selected for the following analysis. Thus, seven homology models were built for other selected eukaryotic proteins. Threedimensional structures of dyskerin-like eukaryotic proteins from Homo sapiens, Mus musculus, Gallus gallus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana were constructed by homology modeling using SWISSMODEL47 and the yeast Cbf5 structure (PDB ID: 3U28A) as a template. The stereochemical quality of the constructed models was assessed using two structure assessment tools: PROCHECK48 and PROSA-web.49 To further investigate structural relationships of the eukaryotic dyskerin-like proteins, we carried out structural superpositions and a multiple alignment using the MATRAS program50 and manual inspection and adjustments. ENDscript server51 was used to display the multiple structure alignment. Molecular representations were performed using Jmol. Interaction Network

For the generation of the complete network, we used the Cytoscape program,52 open source network analysis software. It features the possibility to analyze protein networks with plug-in software extensions, allowing analyzing and interpreting the mouse dyskerin network. We constructed a PPI network by integrating diverse interaction databases, IntAct,53 Molecular Interactions Database (MINT),54 GeneMania,55 and Search Tool for the Retrieval of Interacting Genes/Proteins (STRING).56 We used the PSICQUICUniversal57 Cytoscape plug-in to get the data. Interactions from different databases, starting the search with human DKC1, were recovered. After manually curated (delete repeated nodes and small molecules), networks of interactions obtained with PSICQUICUniversal57 E

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 2. Secondary structure and hydrophobic profiles present in dyskerin-like proteins. L: loop, H: α-helix, and E: β-sheet. Subscripts indicate the quantity of residues that constitute each motif of secondary structure. The secondary structure of NTE and CTE was predicted with Jpred3, and other structures were obtained from DALI Server alignments. Consensus hydrophobicity profiles were generated according to the described in Material and Methods section.

(Table 2). The PUA domain is also present in 4603 species, and 3369 sequences were found deposited under this name. This domain shows interactions with the other five domains stored in the Pfam (Nop10p, TGT, TruB_N, DKCLD, and DUF1947). PUA is a small, compact domain that consists of 64−96 amino acids and presents a common RNA recognition surface, revealed by the structures of several PUA−RNA complexes.59,60 As can be seen in the sequence logo, this domain presented residues that are even conserved in bacterial sequences (Figure 1); among them we can mention the hydrophobic residues and two highly conserved positions occupied by glycines (23 and 66). As previously mentioned, PUA domain accumulates the greatest amount of mutations described in human dyskerin (Table 1).20,21,27 Of all of the proteins analyzed in this work, the dyskerin-like bacterial proteins are the ones with the lowest sequence conservation in the PUA domain. Finally, eukaryotic dyskerin-like proteins also contain two intracellular localization signals: nuclear localization sequence (NLS) and nucleolar localization sequences (NoLSs). NLSs and NoLSs have a very similar amino acid composition with a high prevalence of basic residues in both cases. However, Scott et al. demonstrated that these two types of signals are recognized as different by the cell molecular machinery.62 These authors also classified NoLSs and NLSs signals in three

out the conversion of uracil bases to pseudouridine most in tRNAs and rRNA through TruB_N catalytic domain that contains the active site (Asp 29 in Figure 1C). The TruB_N domain is present in 4803 species, and 5359 sequences were found deposited under this name. This domain showed interactions with other five domains stored in the Pfam (Nop10p, PUA, TruB-C_2, TruB-C, and DKCLD). The analysis of the sequence logo corresponding to TruB_N domain (Figure 1) shows that even though this domain is present in all proteins the sequence conservation is very poor. In particular, because bacterial proteins have insertions in this domain they do not appear in the other sequences. Furthermore, this domain contains only two of the mutations previously described in human dyskerin25 (Table 1). TruB_N domain is detected by Pfam and CDD in all analyzed proteins (Table 2). PUA (pseudouridine synthase and archaeosine transglycosylase) domain is a highly conserved RNA-binding domain contained for many archaeal, bacterial, and eukaryotic proteins. These proteins include those involved in ribosome biogenesis and translation, enzymes that catalyze tRNA and rRNA posttranscriptional modifications, as well as in enzymes involved in proline biosynthesis, translation factors, and yeast glutamate kinases.59−61 Accordingly, PUA domain is detected in three databases (Pfam, Prosite, and CDD) in all studied sequences F

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 3. Phylogenetic inferences based on dyskerin-like proteins. (A) Standard phylogenetic tree. Complete dyskerin-like proteins of selected organisms from the three superkingdoms were multiple aligned and standard phylogeny was inferred. Each kingdom constitutes a superclade. Eukaryote superclade can be divided in three monophyletic clades: vertebrata (vert.), invertebrata (invert.), and plant and fungi (P&F). Numbers in the nodes indicate the node consistency. (B) Evolution from ancestral sequences. The last common ancestor sequence for each monophyletic clade was inferred and mutational changes needed to generate the known extant sequences were determined through pairwise alignment of each extant and the corresponding last common ancestor sequences. Different colors correspond to different protein regions: orange, NTE; red, DKLCD; violet, TruB_N; blue, PUA; green, CTE; gray, spacer regions. Letters indicate the type of mutational change: c = conservative change; n = nonconservative change; i = insertion; and d = deletion.

share an α-helix. The terminal region of the DKCLD domain begins with one α-helix of 11 or 12 residues (depending on the species), while TruB_N domain begins having the last four residues of the same α-helix. This could be the reason why some servers predict a longest TruB_N domain (which includes the complete α helix).

different types: NLS-only, NoLS-only, and joint NoLS−NLS. Proteins containing the joint NoLS−NLS usually are reported to have overlapping NLS and NoLS near its C-terminus, but other ones are reported to contain one of these in the Nterminus and other in the C-terminus.62 Eukaryotic dyskerinlike proteins belong to the later type. These motifs are also identified as lysine-rich sequences (LYS_RICH) by some servers (Table 2).

Phylogeny

Dyskerin belongs to a family highly conserved from archaea to eukaryotes. Members of this family are named Cbf5p in yeast, trypanosomes, and archaea; Minifly (MFL)/Nop60B in Drosophila; NAP57 in rat; and dyskerin in other mammals. In eukaryotes, proteins of this family accumulate mainly into nucleoli and are involved in several essential processes. Their high level of evolutionary conservation further testifies the importance of proteins belonging to the dyskerin family. For instance, human dyskerin and its Drosophila homologue (the MFL protein) share 64.5% of identity and 85.2% of similarity. With the exception of NAF1, homologues of dyskerin (aCbf5), NOP10 (aNop10), NHP2 (L7Ae), and GAR1 (aGar1) are found in archaea, where they assemble with sRNAs that are homologous to eukaryotic H/ACA snoRNAs. Archaeal sRNAs assemble directly into active pseudouridylating sRNPs, reflecting a simpler cellular organization. Previously, only two works reported partial phylogenetic analysis inside the dyskerin-like protein collection. Zucchini et al.63 analyzed the human TruB_N domain and Yokobori et al.64 studied only the archaeal dyskerin-like proteins. Therefore, this is the first phylogenetic analysis of all dyskerin-like proteins (Figure 3A). The tree topology obtained with complete dyskerin-like proteins matches species phylogeny. As can be seen in Figure 3, when we considered human dyskerin the proteins that are more distant phylogenetically are bacterial TruB (dyskerin-like proteins from E. coli, S. enterica, S. dysenteriae, and P. psychrotolerans). This is in agreement with previous works because although both proteins exhibit high

Secondary Structure and Hydrophobicity Conservation

As seen in Figure 2, all dyskerin-like proteins are highly hydrophilic proteins and exhibit quite similar hydrophobicity profiles (represented by the low dispersion in the graphs) when analyzed by superkingdoms. In Figure 2 it can also be seen that there is a strong conservation of secondary structure (derived from PDB structures) for each of the domains among proteins of different groups (eukaryotic, archaea, and bacteria). As previously mentioned, the eukaryotic dyskerin-like proteins have the three characteristic domains (DKCLD, TruB_N, and PUA) and also have the NTE and CTE domains. Although the last two domains are present in all eukaryotic species, these are the most variable ones. They have great length differences (NTE length varies from 16 to 56 residues and the CTE between residues 87−204), and residue conservation is not as marked as in the other domains. The secondary structure of NTE domain was predicted with the Jpred3 server because it was not in the structures stored in the PDB. Jpred3 predicts that NTE is a low-structured region, mainly loop regions, except in some proteins where α helix or a β sheet (of 4 or 3 residues) are predicted. CTE domain also has high variability in length and sequence, but the nearest region to the PUA domain has a secondary structure highly conserved in all analyzed proteins (according to the PDB and Jpred3): two α helices of 9 and 5 residues were detected. A remarkable feature of the secondary structure of the analyzed domains is that both DKCLD and TruB_N domains G

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 4. Comparison of eukaryote structural models of selected dyskerin-like proteins. (A) Structural superposition and sequence conservation mapping of the dyskerin-like protein models. Red positions are maximally conserved and white positions are less conserved. (B) Key geometric parameters used for validation of the models, see in the text for explanation. (C) Amino acid sequence conservation in each protein region. Identities and conservative changes are expressed as percentages.

melanogaster, Anopheles gambiae, and Aedes aegypti have a range of 54.5 to 80.5% in amino acid identity and 76.2 to 92.5% in amino acid similarity with respect to other extant sequences analyzed.

structural similarity they have low sequence similarity. Moreover, bacterial proteins have TruB_N domains longer than archaea or eukaryotes, and their PUA domain is shorter with a smaller amount of residues involving loss of α-helix, which is conserved in archaea and eukaryotes. These differences could be related to the fact that PUA domain of bacterial dyskerin-like proteins makes non sequence-specific contacts with the acceptor stem of tRNA, in contrast with the PUA domain of dyskerin-like proteins in archaea and eukaryotes.36

Structural Analysis of Eukaryotic Proteins

Three-dimensional structures of dyskerin-like proteins from H. sapiens, M. musculus, G. gallus, D. rerio, D melanogaster, A. thaliana, and C. elegans were built by homology modeling using SWISS-MODEL with the yeast Cbf5 structure (PDB ID: 3U28A) as a template (Figure 4A). The predicted models quality was evaluated by the PROCHECK and PROSA-web programs (Figure 4B). PROCHECK program predicts the percentage of residues present in core, allowed, generally allowed, and disallowed regions (based on Ramachandran statistics) and the goodness factor (G factor), which indicates how typical or atypical the residue’s location is on the Ramachandran plot. Additionally, PROSA-web program checks the model quality by the PROSA Z-score of the structure, which is representative of overall model quality and is used to test whether the model structure is within the range of scores usually found in proteins of similar size. The PROCHECK analysis of all dyskerin-like protein models showed that the percentage of residues predicted in the disallowed regions was acceptable (-0.5 for a good model and was >−0.19 in the seven models. The PROSA Z-score for all dyskerin models has a range between −8.07 and −7.10, being the value for 3U28A structure of −7.22. PROSA tool also showed that the seven structures were within the acceptable range of X-ray and NMR studies (data not shown). Overall results obtained by PROCHECK and PROSA analyses showed good results for the seven models built by SWISS-MODEL, concluding that they were reliable for further studies. The quality of models was also assessed by comparing predicted structures to the experimentally solved structure (3U28A). By superimposition; a consensus structure was created (Figure 4A) and rmsd was calculated using MATRAS. A model can be considered accurate when its rmsd is ≤2 Å.65 In all pairs the obtained values were very good (Homo sapiens rmsd = 0.10 Å; Mus musculus rmsd = 0.10 Å;

Inference of Ancestral Eukaryotic Sequences

Evolutionary relationships between eukaryotic dyskerin-like proteins can be analyzed considering the three monophyletic groups that they constitute (Figure 3A). Vertebrata, invertebrata, and plant and fungi each have a last common ancestor that we called Va, Ia, and PFa, respectively (Figure 3B). To investigate the sequence variability throughout the evolution of each monophyletic group, we carried out a reconstruction of ancestral sequences starting with extant sequences and using as a guide tree the fractions corresponding to the tree previously shown (Figure 3A). Later, a pairwise alignment between each extant sequence and the corresponding last common ancestor was performed, and conservative changes, nonconservative changes, insertions, and deletions were analyzed. As can be seen in Figure 3B, in the three monophyletic groups, the DKCLD, TruB_N, and PUA domains have lesser variability than NTE and CTE domains. Insertions and deletions occur only in the NTE and CTE domains but ever preserving the existence of NLSs and NoLSs. Vertebrata clade globally exhibits a less divergent evolution, ranging between 76.3 and 87.7% in amino acid identity and 90.0 and 94.6% in amino acid similarity. Plant and fungi clades occupy intermediate positions having a range of 67.0 to 77.7% in amino acid identity and 82.6 to 89.7% in amino acid similarity with respect to its last common ancestor. The invertebrata clade is the most divergent, ranging between 55.7 and 88.4% in amino acid identity and 76.5 and 94.0% in amino acid similarity with respect to Ia. In this clade, the dyskerin-like protein of Caenorhabditis elegans is the most divergent eukaryotic sequence, ranging between 49.0 and 57.7% in amino acid identity and 69.7 and 79.4% in amino acid similarity, whereas dyskerin-like proteins of Drosophila H

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 5. Eukaryotic dyskerin-like proteins interaction network. (A) Interaction network directly linked to dyskerin. Node size represents the “Betweenness centrality”; Node colors the “Stress centrality” (both of which indicate node importance in the network), Edge size the “Normalized max weight Edge Size”, and Edge colors the interaction type: interacts_with (light blue edges); in_same_component (gray edges); reacts_with (blue edges); state_change (red edges). Networks obtained by STRING for the different eukaryotic organism are also shown: (B) Vertebrata, (C) Invertebrata, (D) Plant and Fungi. The thickness of the lines represents the strength of the association (confidence view). For each node we provided the canonical name and the original STRING name.

Gallus gallus rmsd = 0.16 Å; Danio rerio rmsd = 0.09 Å; Drosophila melanogaster rmsd = 0.10 Å; Caenorhabditis elegans rmsd = 0.09 Å; and Arabidopsis thaliana rmsd = 0.09 Å). In general, structure was more conserved than the protein sequence. Regions with higher structural and sequence conservation are shown in the consensus structure (Figure 4A) and in a multiple alignment form (Figure S1, Supporting Information). The protein region comprised between DKCLD domain and PUA domain has an overall sequence conservation of 81.1% (Figure 4C).

Here we present a network model of eukaryotic dyskerin interactions based on information extracted from diverse PPI databases (MINT, IntAct, GeneMania, and STRING) using PSICQUICUniversal plug-in in Cytoscape. After integrating all components from database searches, the merged protein interactions network (Figure 5) contains 189 nodes and 5818 edges (representing different kinds of interactions: interacts with in same component; reacts with state change). Some of the main topological concepts of PPI are node degree, that is the number of interacting partners per protein; clustering coef f icient, which shows the ratio between the number of neighbors of node that interact with each other and the maximum number of possible interactions between neighboring nodes; and many centrality measures, involving structural attribute nodes that show the importance of them in the network. Stress centrality is an attribute that counts the number of shortest paths passing through a node. Betweenness centrality of the node reflects the amount of control that this node exerts over the interactions of other nodes in the network and favors nodes that join communities. High values of these two parameters (Stress and Betweenness centrality) indicate that the protein has great influence over information flows in the whole network.66,67 Thus, we analyzed the topological features and attributes with the Network Analysis plug-in in Cytoscape and observed that all nodes in this network are interconnected and the average node degree of the extended network is 124.3, and the network clustering coef f icient (average of clustering coefficients for all nodes) was 0.84. For dyskerin, clustering coeff icient was 0.15 and node degree was 58. Betweenness and stress centrality

Protein−Protein Interactions of Eukaryotic Dyskerin-Like Proteins

As mentioned in the Introduction, dyskerin is part of the Box H/ACA RNPs but is also involved in other functions such as telomere maintenance,4,5 regulation of microRNAs,6 and in translational control of mRNA.7,8 This supports the concept that dyskerin has dynamic interactions with associated proteins, instead of forming a static complex with a fixed number of members. The involvement of dyskerin in such a variety of processes has led to an enormous research interest in this protein. In recent years, several techniques have been employed to identify complex of proteins that could lead to a better understanding of the protein function. Therefore, an integrative protein−protein interaction (PPI) approach could be useful to reveal the complexity of dyskerin interactions. It should be stressed that not all dyskerin-like proteins described in literature are well-represented in single protein databases such as IntAct, MINT, GeneMania, and STRING. I

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Then, we proceeded to find their homologous proteins, being able to detect KRR1 in A. thaliana (AAM64563, expected value: 7 × 10−105) and NOP10 in D. rerio (NP_001003868, expected value: 3 × 10−34). Finally, TERT homologues in S. cerevisiae (EST2, CAL36058) and D. melanogaster (RT_like, AAR86939) were detected with a very low E value (4 × 10−17 in the first case and 0.012 in the second).

values of dyskerin (0.13 and 13 104, respectively) were in the range of SMG5 (0.15 and 18 978), while EST1A (0.37 and 32 114) and TERT (0.26 and 25 724) showed the highest values. All measures indicated significant positive biases for dyskerin, SMG5, EST1A, and TERT (i.e., more betweenness and stress centrality), showing for this proteins higher connectivity and high frequent participation in protein complexes. ClueGO analysis results showed that dyskerin interactors are clustered into four main groups according to their biological processes and molecular functions: telomere maintenance, double-stranded telomeric DNA binding, Cajal body (CB), and nuclear-transcribed mRNA catabolic process, with the bulk of the direct interactors belonging to the first two groups. CBs are suborganelles found in the nucleus of proliferative cells like embryonic cells and tumor cells. In contrast with cytoplasmic organelles, CBs lack any phospholipid membrane that would separate their content, mainly consisting of proteins and RNA, from the surrounding nucleoplasm. These suborganelles have been implicated in RNA-related metabolic processes such as snRNPs biogenesis, maturation and recycling, histone mRNA processing, and telomere maintenance. Experimental evidence indicates that CBs contribute to the biogenesis of telomerase enzyme. Two of the most important components of telomerase (hTR and hTERT) accumulate within CBs in all cells in which telomerase is active but not in primary or ALT cells (where little or no hTERT is present). CBs are involved in the assembly and function of human telomerase.68 Additionally, we evaluated coexpression of DKC1 and other proteins at Xq28 using STRING and GeneMania servers (data not shown). This search allowed us to detect coexpression of DKC1 and 48 proteins located at the Xq28. However, we have not found specific publications where interaction between DKC1 and any of them was validated. Then, we constructed seven additional protein interactions networks using the STRING server, which is a database of known and predicted protein interactions based on the sources derived from the genomic context, high-throughput experiments, and previous knowledge. There are bioinformatics tools (algorithms) that allow predicting PPIs, such as gene neighborhood, gene fusion, phylogenetic profile, and interologs. In the last approach, the associations between two proteins are predicted when they have homologous proteins in other species, and it has been experimentally shown to interact. Thus, we analyzed the undirected protein−protein networks from seven eukaryotic organisms to predict the existence of homologous proteins that interact with dyskerin and evaluated if besides conservation in sequence and structure of proteins there was also a conservation of interactors somehow linked to dyskerin. Figure 5 shows the networks of interactions with a confidence score higher than 0.7, and no more than 10 interactors were shown for chart simplicity. We also performed the same analysis allowing up to 50 interactors and analyzing the interactions with dyskerin that were supported on a confidence score greater than 0.8. The analysis of these results allowed us to infer a group of stably preserved interactors in the eukaryotic organisms studied: GAR1, NAF1, NHP2, NOP56, NOP58, and RUVBL1 (interactions between these proteins and dyskerin were directly found with the STRING server); RUVBL2, FBL, and NOP10 (which have no homologues in G. Gallus); and SHQ1 (for which we did not find a homologue in A. thaliana). Furthermore, KRR1, NOP10, and TERT proteins were present on almost all networks inferred with STRING.



DISCUSSION Human dyskerin is a conserved nucleolar protein involved in at least three basic processes: maintenance of telomere integrity, biogenesis and function of the ribosome, and pseudouridylation of various cellular RNAs. Dyskerin is widely studied because its mutations are implicated in DKC, cancer, and other pathological disorders. Therefore, we hypothesized that having a comprehensive bioinformatic characterization of this protein will help us to develop new drugs for treatment of those diseases. We started with a sequence characterization, recovering the dyskerin-like protein sequences of 20 one different representative species from eukaryotes, archaea, and bacteria. The comprehensive analysis of these sequences allowed us to identify conserved features of shared domains in the three superkingdoms (TruB_N and PUA domains) and shared domains in archaea and eukaryotes (DKCLD). Although present in eukaryotes, archaea, and bacteria, the domain with the lowest sequence conservation is TruB_N. Both DKCLD and PUA domains exhibited the greater sequence conservation, with PUA domain even presenting residues that are conserved in bacteria. Then, we compared and analyzed the secondary structure with the hydrophobic profile of dyskerin-like proteins. Almost all of them have highly conserved hydrophilic properties between proteins of different superkingdoms. The domain having the greater divergence in the hydrophobic profile was TruB_N, the one with the lower sequence conservation. The greatest difference among all analyzed dyskerin-like proteins was observed in the NTE and CTE regions. These two regions are present only in eukaryotes, and even among them there are differences in length and sequence. Regarding this aspect, all modifications annotated so far by high-throughput mass spectrometry (MS) experiments39 reside at the NTE and CTE regions. We should mention the elimination of initiator methionine and N-acetylation of residue 2, phosphorylation of serines 21, 451, 453, 455, 485, 494, 513 and of threonine 458, and sumoylation of lysines 16, 39, 46, and 448 evolutively preserved at variable degree. Conservation analysis of phosphorylated residues is weakened by the absence of known effector kinases. For example, even if no specific function has been attributed so far to any phosphorylation event, it is interesting to note that three of them (at Ser 21, 485, and 494) are reported to occur specifically at mitosis and thus might be related to cell proliferation.69 Moreover, in the NTE region of human dyskerin, several pathogenic mutations (Table 1) and two post-translational modifications (N-acetylation and phosphorylation) were described and experimentally determined.40 One of the mutations (A2V) occurs in the alanine that undergoes post-translational modification (N-acetylation), and the clinical presentation is recessive DKC. It is also very important to take into account the present information about sumoylation, which could be relevant for both localization and assembly of the protein. Modification of dyskerin by SUMO1 and SUMO2/3, first revealed by high-throughput MS experiJ

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

the CTE. In this regard, amino acids in the NTE and CTE of human dyskerin are considered disordered by DISOPRED3 in PSIPRED Server.75 Disordered regions are dynamic and can be, partially or completely, extended in solution. Native disorder also exists in global structures, which have regular secondary structure but have not condensed into a stable globular fold. It has been speculated that disordered regions function as molecular recognition elements of proteins and nucleic acids, allowing recognition of several targets with high specificity and low affinity. Therefore, NTE residues could be involved in the recognition of the RNA by the PUA domain.73 Furthermore, the absence of a static structure could be the reason why neither NTE nor CTE appear in the 3-D structure of yeast’s dyskerin. Also, we demonstrated through a phylogenetic study of the complete dyskerin-like proteins its wide distribution and conservation. The existence of dyskerin-like proteins in different species and the presence of highly conserved regions indicate the preservation of the main biological functions of the dyskerin-like proteins. In addition, we explored the evolutive history of each eukaryotic dyskerin-like protein through the reconstruction of ancestral sequences. Vertebrata, invertebrata, and plant and fungi constitute individual monophyletic clades, and each has a last common ancestor. In the vertebrata clade, G. gallus and R. norvegicus dyskerin-like proteins are the extant sequences that have required the fewest and the most mutational steps to be derived from the last common ancestor, respectively. In the invertebrata clade, C. elegans is the more divergent extant sequence derived from the last common ancestor, requiring between 2.4 and 4 times more mutational steps than the other invertebrates. While in plant and fungi clade the quantity of mutational steps from the last common ancestor are the most homogeneous. Finally, because dyskerin is involved in a variety of biological processes we performed an integrative protein−protein interaction approach. Most PPI are conserved among homologous eukaryotic and archaeal proteins, although some adaptive changes have occurred in eukaryotes; for instance, human dyskerin contains additional terminal extensions not found in aCbf5.37 Only interactions described in eukaryotic organisms were used for the analysis of PPI; the network graphically shows the involvement of dyskerin in different complexes. According to the literature, our PPI results showed that dyskerin interacts with proteins involved in a variety of processes such as telomere maintenance, double-stranded telomeric DNA binding, Cajal body binding, and nucleartranscribed mRNA catabolic process. Our results also highlight the preservation of proteins interacting with dyskerin because most of them are present in the different analyzed networks. Also, we identified conserved dyskerin interactor proteins among the different eukaryotes organisms analyzed using ortholog-based and comparing PPI network approaches. Likewise, we found that current PPI data of diverse organisms leads to complementary predictions.

ments at four residues, provided a direct correlation between pathologic mutations and sumoylation defects. Not only was the lack of sumoylation on residues mutated in X-DC patients shown to reduce protein stability, but also it was found that the K39R mutation could be rescued by a chimeric SUMO3dyskerin product in which this modification was mimicked, demonstrating its fundamental effect on dyskerin functionality.69 All of these findings indicate regions that the NTE region may have functional relevance and could play a role in the regulation of dyskerin, modulating its activity. Predicted secondary structures in the NTE region of eukaryotic dyskerin-like proteins showed a disordered region (only in few sequences is the existence of an α-helix predicted), while for the CTE region the presence of two conserved α-helix motifs in the same position is revealed The functional relevance of CTE region is related to the existence of bipartite NLS. Initially, it has been described that dyskerin localized in the nucleoplasm and then translocates to the nucleolus and to the CBs.70 However, after a detailed bioinformatics and molecular analysis of the DKC1 transcriptional activity, Angrisani et al.71 were able to show that DKC1 encodes a new alternatively spliced mRNA (isoform 3) able to direct the synthesis of a variant dyskerin with unexpected cytoplasmic localization that may have biological functions in cell growth and adhesion.40,69 Intriguingly, when overexpressed in HeLa cells, the new isoform promotes cell-to-cell and cell-to-substratum adhesion, increases the cell proliferation rate, and leads to cytokeratin hyperexpression. In addition, the other three alternative dyskerin isoforms, lacking the CTE region and PUA domains, have been described.72 Another aspect studied of dyskerin-like proteins was the comparison and analysis of tertiary structures. We built seven models of eukaryotic dyskerin-like proteins by homology modeling using SWISS-MODEL and the tertiary structure stored in the PDB corresponding to yeast dyskerin (3U28A) as template. Then, the seven models were superimposed to 3U28A using MATRAS program, and this analysis allowed us to confirm that there is a strong structural conservation beyond the sequence differences, mainly in the PUA domain. The function of the PUA domain seems to be that of anchoring in the correct position the RNA helix upstream of the target uridine. Using the archaeal Cbf5 structure as template, a structural model for the human dyskerin was proposed by Rashid et al.36 Even if this model lacks the dyskerin N-terminal (NTE, 1−35 aa) and C-terminal (CTE, 359−513 aa) regions, this shows that most of dyskerin pathologic mutations converge on the same side of the PUA domain.36 Moreover, as previously mentioned, the other region with accumulation of mutations is the NTE. In agreement with this, the crystal structure of the yeast CBF5-NOP10-GAR1 complex showed that the NTE folds into a new structural layer covering on the PUA domain.73 This finding suggests that these mutations may affect the binding to substrate RNAs or to a still unidentified partner of the complex. The same structure confirms the importance of CTE in binding SHQ1 protein (an early assembly factor whose C-terminal SHQ1-specific domain interacts with the PUA domain of CBF5).74 Recently, a structural model has been reported based on the yeast CBF5 of human cytoplasmatic dyskerin (isoform 3),71 and the same template was used to build human dyskerin.9 This models supports the conservation of the domains required for both NAF1 and SHQ1 binding but again does not cover the presence of the NTE and lacks part of



CONCLUSIONS As a general conclusion we carried out a novel bioinformatics insight into dyskerin that allowed us to describe the phylogeny of dyskerin-like proteins and PPI of human dyskerin, reflecting their conservation, functional diversity, and therefore the central role that this protein has in the dynamics of cellular processes. Hence, our present study focuses on function, evolution, and structure of dyskerin-like proteins. K

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

D.; Pandolfi, P. P.; Derenzini, M. Dyskerin expression influences the level of ribosomal RNA pseudo-uridylation and telomerase RNA component in human breast cancer. J. Pathol. 2006, 210 (1), 10−18. (14) Sieron, P.; Hader, C.; Hatina, J.; Engers, R.; Wlazlinski, A.; Müller, M.; Schulz, W. A. DKC1 overexpression associated with prostate cancer progression. Br. J. Cancer 2009, 101 (8), 1410−1416. (15) Poncet, D.; Belleville, A.; t’kint de Roodenbeke, C.; Roborel de Climens, A.; Ben Simon, E.; Merle-Beral, H.; Callet-Bauchu, E.; Salles, G.; Sabatier, L.; Delic, J.; Gilson, E. Changes in the expression of telomere maintenance genes suggest global telomere dysfunction in Bchronic lymphocytic leukemia. Blood 2008, 111 (4), 2388−2391. (16) Fu, D.; Collins, K. Purification of human telomerase complexes identifies factors involved in telomerase biogenesis and telomere length regulation. Mol. Cell 2007, 28, 773−785. (17) Veinteicher, A. S.; Meng, Z.; Mason, P. J.; Veenstra, T. D.; Artandi, S. E. Identification of ATPases pontin and reptin as telomerase components essential for holenzyme activity. Cell 2008, 132, 945−957. (18) Walne, A. J.; Dokal, I. Advances in the understanding of dyskeratosis congenita. Br. J. Hamaetol. 2009, 145, 164−172. (19) Heiss, N. S.; Knight, S. W.; Vulliamy, T. J.; Klauck, S. M.; Wiemann, S.; Mason, P. J.; Poustka, A.; Dokal, I. X-linked dyskeratosis congenita is caused by mutations in a highly conserved gene with putative nucleolar functions. Nat. Genet. 1998, 19, 32−38. (20) Vulliamy, T. J.; Marrone, A.; Knight, S. W.; Walne, A.; Mason, P. J.; Dokal, I. Mutations in dyskeratosis congenita: their impact on telomere length and the diversity of clinical presentation. Blood 2006, 107, 2680−2685. (21) Knight, S. W.; Heiss, N. S.; Vulliamy, T. J.; Greschner, S.; Stavrides, G.; Pai, G. S.; Lestringant, G.; Varma, N.; Mason, P. J.; Dokal, I.; Poustka, A. X-Linked Dyskeratosis Congenita Is Predominantly Caused by Missense Mutations in the DKC1 Gene. Am. J. Hum. Genet. 1999, 65, 50−58. (22) Hassock, S.; Vetrie, D.; Giannelli, F. Mapping and characterization of the X-linked dyskeratosis congenita (DKC) gene. Genomics 1999, 55, 21−27. (23) Cossu, F.; Vulliamy, T. J.; Marrone, A.; Badiali, M.; Cao, A.; Dokal, I. A novel DKC1 mutation, severe combined immunodeficiency (T+B-NK- SCID) and bone marrow transplantation in an infant with Hoyeraal-Hreidarsson syndrome. Br. J. Haematol. 2002, 119, 765−768. (24) Wong, J. M.; Kyasa, M. J.; Hutchins, L.; Collins, K. Telomerase RNA deficiency in peripheral blood mononuclear cells in X-linked dyskeratosis congenita. Hum. Genet. 2004, 115, 448−455. (25) Knight, S. W.; Heiss, N. S.; Vulliamy, T. J.; Aalfs, C. M.; McMahon, C.; Richmond, P.; Jones, A.; Hennekam, R. C.; Poustka, A.; Mason, P. J.; Dokal, I. Unexplained aplastic anaemia, immunodeficiency, and cerebellar hypoplasia (Hoyeraal-Hreidarsson syndrome) due to mutations in the dyskeratosis congenita gene, DKC1. Br. J. Haematol. 1999, 107, 335−339. (26) Knight, S. W.; Vulliamy, T. J.; Morgan, B.; Devriendt, K.; Mason, P. J.; Dokal, I. Identification of novel DKC1 mutations in patients with dyskeratosis congenita: implications for pathophysiology and diagnosis. Hum. Genet. 2001, 108, 299−303. (27) Marrone, A.; Mason, P. J. Dyskeratosis congenita. Cell. Mol. Life Sci. 2003, 60, 507−517. (28) Hiramatsu, H.; Fujii, T.; Kitoh, T.; Sawada, M.; Osaka, M.; Koami, K.; Irino, T.; Miyajima, T.; Ito, M.; Sugiyama, T.; Okuno, T. A novel missense mutation in the DKC1 gene in a Japanese family with X-linked dyskeratosis congenita. Pediatr. Haematol. Oncol. 2002, 19, 413−419. (29) Ding, Y. G.; Zhu, T. S.; Jiang, W.; Yang, Y.; Bu, D. F.; Tu, P.; Zhu, X. J.; Wang, B. X. Identification of a novel mutation and a de novo mutation in DKC1 in two Chinese pedigrees with Dyskeratosis congenita. J. Invest. Dermatol. 2004, 123, 470−473. (30) Du, H. Y.; Pumbo, E.; Ivanovich, J.; An, P.; Maziarz, R. T.; Reiss, U. M.; Chirnomas, D.; Shimamura, A.; Vlachos, A.; Lipton, J. M.; Goyal, R. K.; Goldman, F.; Wilson, D. B.; Mason, P. J.; Bessler, M. TERC and TERT gene mutations in patients with bone marrow failure

The data presented in this work could be of importance because it will allow us to expand the knowledge of dyskerin function, opening the way to the development of new drugs to be employed in the treatment of telomeropaties.



ASSOCIATED CONTENT

S Supporting Information *

Alignment of eukaryote structural models of selected dyskerinlike proteins. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Tel: 541143657100, ext. 5635. Fax: 541143657132 E-mail: [email protected]; [email protected]. Author Contributions §

D.E.G. and P.D.G. have equally contributed to the work.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS D.E.G. and P.D.G. are researchers from CONICET. D.E.G. is Director of the National Cancer Institute of Argentina. C.S.C has a postdoctoral fellowship from CONICET. This work was supported by Grants from Quilmes National University, CONICET and MINCyT (Argentina).



REFERENCES

(1) Grozdanov, P. N.; Fernandez-Fuentes, N.; Fiser, A.; Meier, U. T. Pathogenic NAP57 mutations decrease ribonucleoprotein assembly in dyskeratosis congenital. Hum. Mol. Genet. 2009, 18, 4546−4551. (2) Wang, C.; Meier, U. T. Architecture and assembly of mammalian H/ACA small nucleolar and telomerase ribonucleoproteins. EMBO J. 2004, 23, 1857−1867. (3) Kiss, T.; Fayet-Lebaron, E.; Jády, B. E. Box H/ACA small ribonucleoproteins. Mol. Cell 2010, 37, 597−606. (4) Mitchell, J. R.; Cheng, J.; Collins, K. A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3′ end. Mol. Cell. Biol. 1999, 19, 567−576. (5) Mitchell, J. R.; Wood, E.; Collins, K. A telomerase component is defective in the human disease dyskeratosis congenita. Nature 1999, 402, 551−555. (6) Alawi, F.; Lin, P. Loss of dyskerin reduces the accumulation of a subset of H/ACA snoRNA-derived miRNA. Cell Cycle 2010, 9, 2467− 2469. (7) Bellodi, C.; Kopmar, N.; Ruggero, D. Deregulation of oncogeneinduced senescence and p53 translational control in X-linked dyskeratosis congenita. EMBO J. 2010, 29, 1865−1876. (8) Rocchi, L.; Pacilli, A.; Sethi, R.; Penzo, M.; Schneider, R. J.; Treré, D.; Brigotti, M.; Montanaro, L. Dyskerin depletion increases VEGF mRNA internal ribosome entry site-mediated translation. Nucleic Acids Res. 2013, 41, 8308−8318. (9) Rocchi, L.; Barbosa, A. J. M.; Onofrillo, C.; Del Rio, A.; Montanaro, L. Inhibition of Human Dyskerin as a New Approach to Target Ribosome Biogenesis. PLoS One 2014, 9 (7), e101971. (10) Walne, A. J.; Dokal, I. Dyskeratosis Congenita: a historical perspective. Mech. Ageing Dev. 2008, 129, 48−59. (11) Armanios, M. Telomerase and idiopathic pulmonary fibrosis. Mutat. Res. 2012, 730, 52−58. (12) Holohan, B.; Wright, W. E.; Shay, J. W. Cell biology of disease: Telomeropathies: an emerging spectrum disorder. J. Cell Biol. 2014, 205 (3), 289−299. (13) Montanaro, L.; Brigotti, M.; Clohessy, J.; Barbieri, S.; Ceccarelli, C.; Santini, D.; Taffurelli, M.; Calienni, M.; Teruya-Feldstein, J.; Trerè, L

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research and the significance of telomere length measurements. Blood 2009, 113, 309−316. (31) Vulliamy, T.; Beswick, R.; Kirwan, M.; Marrone, A.; Digweed, M.; Walne, A.; Dokal, I. Mutations in the telomerase component NHP2 cause the premature ageing syndrome dyskeratosis congenita. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 8073−8078. (32) Walne, A. J.; Vulliamy, T.; Marrone, A.; Beswick, R.; Kirwan, M.; Masunari, Y.; Al-Qurashi, F. H.; Aljurf, M.; Dokal, I. Genetic heterogeneity in autosomal recessive dyskeratosis congenita with one subtype due to mutations in the telomerase-associated protein NOP10. Hum. Mol. Genet. 2007, 16, 1619−1629. (33) Zhong, F.; Savage, S. A.; Shkreli, M.; Giri, N.; Jessop, L.; Myers, T.; Chen, R.; Alter, B. P.; Artandi, S. E. Disruption of telomerase trafficking by TCAB1 mutation causes dyskeratosis congenita. Genes Dev. 2011, 25, 11−16. (34) Walne, A. J.; Vulliamy, T.; Beswick, R.; Kirwan, M.; Dokal, I. TINF2 mutations result in very short telomeres: analysis of a large cohort of patients with dyskeratosis congenita and related bone marrow failure syndromes. Blood 2008, 112, 3594−3600. (35) Podlevsky, J. D.; Bley, C. J.; Omana, R. V.; Qi, X.; Chen, J. J. The telomerase database. Nucleic Acids Res. 2008, 36 (Database issue), D339−D343. (36) Rashid, R.; Liang, B.; Baker, D. L.; Youssef, O. A.; He, Y.; Phipps, K.; Terns, R. M.; Terns, M. P.; Li, H. Crystal structure of a Cbf5-Nop10-Gar1 complex and implications in RNA-guided pseudouridylation and dyskeratosis congenita. Mol. Cell 2006, 21, 249− 260. (37) Trahan, C.; Martel, C.; Dragon, F. Effects of dyskeratosis congenita mutations in dyskerin, NHP2 and NOP10 on assembly of H/ACA pre-RNPs. Hum. Mol. Genet. 2010, 19, 825−836. (38) Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389−3402. (39) The UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2014, 42 (Database issue), D191−D198. (40) Gaudet, P.; Argoud-Puy, G.; Cusin, I.; Duek, P.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Zahn-Zabal, M.; Zwahlen, C.; Bairoch, A.; Lane, L. neXtProt: organizing protein knowledge in the context of human proteome projects. J. Proteome Res. 2013, 12 (1), 293−298. (41) Finn, R. D.; Bateman, A.; Clements, J.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Heger, A.; Hetherington, K.; Holm, L.; Mistry, J.; Sonnhammer, E. L.; Tate, J.; Punta, M. Pfam: the protein families database. Nucleic Acids Res. 2014, 42 (Database issue), D222−D230. (42) Sigrist, C. J.; de Castro, E.; Cerutti, L.; Cuche, B. A.; Hulo, N.; Bridge, A.; Bougueleret, L.; Xenarios, I. New and continuing developments at PROSITE. Nucleic Acids Res. 2013, 41 (Database issue), D344−D347. (43) Marchler-Bauer, A.; Zheng, C.; Chitsaz, F.; Derbyshire, M. K.; Geer, L. Y.; Geer, R. C.; Gonzales, N. R.; Gwadz, M.; Hurwitz, D. I.; Lanczycki, C. J.; Lu, F.; Lu, S.; Marchler, G. H.; Song, J. S.; Thanki, N.; Yamashita, R. A.; Zhang, D.; Bryant, S. H. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res. 2013, 41 (Database issue), D348−D352. (44) Crooks, G. E.; Hon, G.; Chandonia, J. M.; Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 2004, 14, 1188− 1190. (45) Cole, C.; Barber, J. D.; Barton, G. J. The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008, 36 (Web Server issue), W197−W201. (46) Tamura, K.; Peterson, D.; Peterson, N.; Stecher, G.; Nei, M.; Kumar, S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 2011, 28, 2731−2739. (47) Biasini, M.; Bienert, S.; Waterhouse, A.; Arnold, K.; Studer, G.; Schmidt, T.; Kiefer, F.; Cassarino, T. G.; Bertoni, M.; Bordoli, L.; Schwede, T. SWISS-MODEL: modelling protein tertiary and

quaternary structure using evolutionary information. Nucleic Acids Res. 2014, 42 (Web Server issue), W252−W258. (48) Laskowski, R. A.; MacArthur, M. W.; Moss, D.; Thornton, J. M. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 1993, 26, 283−291. (49) Wiederstein, M.; Sippl, M. J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007, 35, W407−W410. (50) Kawabata, T. MATRAS: A program for protein 3D structure comparison. Nucleic Acids Res. 2003, 31 (13), 3367−3369. (51) Gouet, P.; Robert, X.; Courcelle, E. ESPript/ENDscript: Extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic Acids Res. 2003, 31 (13), 3320−3323. (52) Smoot, M. E.; Ono, K.; Ruscheinski, J.; Wang, P. L.; Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27, 431−432. (53) Kerrien, S.; Alam-Faruque, Y.; Aranda, B.; Bancarz, I.; Bridge, A.; Derow, C.; Dimmer, E.; Feuermann, M.; Friedrichsen, A.; Huntley, R.; Kohler, C.; Khadake, J.; Leroy, C.; Liban, A.; Lieftink, C.; Montecchi-Palazzi, L.; Orchard, S.; Risse, J.; Robbe, K.; Roechert, B.; Thorneycroft, D.; Zhang, Y.; Apweiler, R.; Hermjakob, H. IntAct-open source resource for molecular interaction data. Nucleic Acids Res. 2007, 35 (Database issue), D561−D565. (54) Zanzoni, A.; Montecchi-Palazzi, L.; Quondam, M.; Ausiello, G.; Helmer-Citterich, M.; Cesareni, G. MINT: a Molecular INTeraction database. FEBS Lett. 2002, 513, 135−140. (55) Mostafavi, S.; Ray, D.; Warde-Farley, D.; Grouios, C.; Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008, 9 (1), S4. (56) Szklarczyk, D.; Franceschini, A.; Kuhn, M.; Simonovic, M.; Roth, A.; Minguez, P.; Doerks, T.; Stark, M.; Muller, J.; Bork, P.; Jensen, L. J.; von Mering, C. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39 (1), D561−D568. (57) Aranda, B.; Blankenburg, H.; Kerrien, S.; Brinkman, F. S.; Ceol, A.; Chautard, E.; Dana, J. M.; De Las Rivas, J.; Dumousseau, M.; Galeota, E.; Gaulton, A.; Goll, J.; Hancock, R. E.; Isserlin, R.; Jimenez, R. C.; Kerssemakers, J.; Khadake, J.; Lynn, D. J.; Michaut, M.; O’Kelly, G.; Ono, K.; Orchard, S.; Prieto, C.; Razick, S.; Rigina, O.; Salwinski, L.; Simonovic, M.; Velankar, S.; Winter, A.; Wu, G.; Bader, G. D.; Cesareni, G.; Donaldson, I. M.; Eisenberg, D.; Kleywegt, G. J.; Overington, J.; Ricard-Blum, S.; Tyers, M.; Albrecht, M.; Hermjakob, H. PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat. Methods 2011, 8, 528−529. (58) Bindea, G.; Mlecnik, B.; Hackl, H.; Charoentong, P.; Tosolini, M.; Kirilovsky, A.; Fridman, W. H.; Pagès, F.; Trajanoski, Z.; Galon, J. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 2009, 25 (8), 1091−1093. (59) Aravind, L.; Koonin, E. V. Novel predicted RNA-binding domains associated with the translation machinery. J. Mol. Evol. 1999, 48, 291−302. (60) Pérez-Arellano, I.; Gallego, J.; Cervera, J. The PUA domain - a structural and functional overview. FEBS J. 2007, 274, 4972−4984. (61) Cerrudo, C. S.; Ghiringhelli, P. D.; Gomez, D. E. Protein universe containing a PUA RNA-binding domain. FEBS J. 2014, 281, 74−87. (62) Scott, M. S.; Boisvert, F.; McDowall, M. D.; Lamond, A. I.; Barton, G. J. Characterization and prediction of protein nucleolar localization sequences. Nucleic Acids Res. 2010, 38, 7388−7399. (63) Zucchini, C.; Strippoli, P.; Biolchi, A.; Solmi, R.; Lenzi, L.; D’Addabbo, P.; Carinci, P.; Valvassori, L. The human TruB family of pseudouridine synthase genes, including the Dyskeratosis Congenita 1 gene and the novel member TRUB1. Int. J. Mol. Med. 2003, 11 (6), 697−704. (64) Yokobori, S.; Itoh, T.; Yoshinari, S.; Nomura, N.; Sako, Y.; Yamagishi, A.; Oshima, T.; Kita, K.; Watanabe, Y. Gain and loss of an intron in a protein-coding gene in Archaea: the case of an archaeal RNA pseudouridine synthase gene. BMC Evol. Biol. 2009, 9, 198. M

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research (65) Rayan, A. New tips for structure prediction by comparative modeling. Bioinformation 2009, 3, 263−267. (66) Brandes, U. A faster algorithm for betweenness centrality. J. Mathematical Sociol. 2001, 25, 163−177. (67) Barabási, A. L.; Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101−113. (68) Zhu, Y.; Tomlinson, R. L.; Lukowiak, A. A.; Terns, R. M.; Terns, M. P. Telomerase RNA accumulates in Cajal bodies in human cancer cells. Mol. Biol. Cell 2004, 15 (1), 81−90. (69) Angrisani, A.; Vicidomini, R.; Turano, M.; Furia, M. Human dyskerin: beyond telomeres. Biol. Chem. 2014, 395 (6), 593−610. (70) Heiss, N. S.; Girod, A.; Salowsky, R.; Wiemann, S.; Pepperkok, R.; Poustka, A. Dyskerin localizes to the nucleolus and its mislocalization is unlikely to play a role in the pathogenesis of dyskeratosis congenita. Hum. Mol. Genet. 1999, 8, 2515−2524. (71) Angrisani, A.; Turano, M.; Paparo, L.; Di Mauro, C.; Furia, M. A new human dyskerin isoform with cytoplasmic localization. Biochim. Biophys. Acta 2011, 1810, 1361−1368. (72) Turano, M.; Angrisani, A.; Di Maio, N.; Furia, M. Intron retention: a human DKC1 gene common splicing event. Biochem. Cell Biol. 2013, 91, 506−512. (73) Li, S.; Duan, J.; Li, D.; Yang, B.; Dong, M.; Ye, K. Reconstitution and structural analysis of the yeast box H/ACA RNA-guided pseudouridine synthase. Genes Dev. 2011, 25, 2409−2421. (74) Li, S.; Duan, J.; Li, D.; Ma, S.; Ye, K. Structure of the Shq1Cbf5-Nop10-Gar1 complex and implications for H/ACA RNP biogenesis and dyskeratosis congenita. EMBO J. 2011, 30, 5010−5020. (75) Buchan, D. W. A.; Minneci, F.; Nugent, T. C. O.; Bryson, K.; Jones, D. T. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res. 2013, 41 (W1), W340−W348.

N

DOI: 10.1021/pr500956k J. Proteome Res. XXXX, XXX, XXX−XXX