
May 10, 2016 - Protein methylation is a post-translational modification with important roles in transcriptional regulation and other biological proces...
Article pubs.acs.org/jpr

An Integrative Proteomic Approach Identifies Novel Cellular SMYD2 Substrates

Hazem Ahmed,† Shili Duan,‡ Cheryl H. Arrowsmith,†,‡ Dalia Barsyte-Lovejoy,† and Matthieu Schapira*,†,§

† Structural Genomics Consortium, University of Toronto, 101 College Street, MaRS Centre, South Tower, Toronto, Ontario M5G 1L7, Canada
‡ Princess Margaret Cancer Centre and Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
§ Department of Pharmacology and Toxicology, University of Toronto, 1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada

ABSTRACT: Protein methylation is a post-translational modification with important roles in transcriptional regulation and other biological processes, but the enzyme−substrate relationship between the 68 known human protein methyltransferases and the thousands of reported methylation sites is poorly understood. Here, we propose a bioinformatic approach that integrates structural, biochemical, cellular, and proteomic data to identify novel cellular substrates of the lysine methyltransferase SMYD2. Of the 14 novel putative SMYD2 substrates identified by our approach, six were confirmed in cells by immunoprecipitation: MAPT, CCAR2, EEF2, NCOA3, STUB1, and UTP14A. Treatment with the selective SMYD2 inhibitor BAY-598 abrogated the methylation signal, indicating that methylation of these novel substrates was dependent on the catalytic activity of the enzyme. We believe that our integrative approach can be applied to other protein lysine methyltransferases and can help to understand how lysine methylation participates in wider signaling processes.

KEYWORDS: SMYD2, non-histone substrates, lysine methylation, data integration, pan-methyl lysine antibody



INTRODUCTION

Protein lysine methylation is an important post-translational modification (PTM) that impacts many biological processes1 and is associated with multiple human diseases, particularly cancer.2,3 Recent mass spectrometry experiments4 suggest that more than 1500 non-histone lysine methyltransferase (KMT) substrates are modified by over 60 KMT enzymes characterized so far.1 However, fewer than 10−15% of these modification sites have been associated with their corresponding KMT enzymes in the literature to date.5 The goal of this study is to propose a systematic bioinformatic approach to predict missing enzyme/substrate relationships, with a particular focus on the lysine methyltransferase SMYD2.

SMYD2 is one of five characterized SMYD family proteins (SET and MYND domain-containing proteins)6 and is known to catalyze the methylation of 13 published non-histone protein substrates. Previously reported SMYD2 substrates are TP53,7 RB1,8,9 HSP90,10 ERα,11 and PARP1.12 Additionally, Lanouette et al. recently identified SIX1, SIX2, SIN3B, and DHX15 as novel SMYD2 substrates using a multistate computational protein design strategy.13 Most recently, Olsen et al. used a SILAC-based proteomic approach to confirm the methylation of four more SMYD2 substrates: BTF3, PDAP1, AHNAK, and AHNAK2.14 Here, we apply an integrative bioinformatic approach to discover six additional substrates, increasing the total number of known SMYD2 substrates.

Our bioinformatic approach is based on the integration of biochemical, structural, and biological network analyses. The main hypothesis of our approach is that intersecting multiple tolerant, orthogonal selection methods is more powerful than imposing one or two over-restrictive filters. Interrogated data sources included physical protein−protein interactions, protein functional associations (co-expression and biological pathways), published substrates in the literature, methylation sites detected by mass spectrometry experiments, multiple sequence alignment, and cellular localization. Machine learning algorithms, such as Markov graph clustering,15 the overlapping neighbor extended-based clustering algorithm,16 and text mining,17 were used to analyze this heterogeneous body of data (Figure 1). Six of the putative SMYD2 substrate candidates were methylated in cells in the absence but not in the presence of BAY-598, a selective SMYD2 inhibitor.18 Our results reflect the diversity of non-histone protein targets modified by SMYD2, and uncover potential relationships of SMYD2 with novel biological processes and disease-related proteins. We believe our bioinformatic substrate selection process is broadly applicable to other protein methyltransferases (PMTs), and is complementary to proteomics-based experimental approaches14,19 and structure-based computational predictions.13

© XXXX American Chemical Society



METHODS

Databases

Mass spectrometry data used in our selection process were downloaded from the PhosphoSitePlus database,4 in addition to recent high-throughput mass spectrometry experiments.20 Interaction data with SMYD2 were retrieved from published experimental results of human-chromatin-related protein interactions,21 as well as from the STRING database v9.1.17 Localization data were retrieved from the UniProt Knowledgebase22 and the GeneCards database (http://www.genecards.org/). Expression data are from the Human Protein Atlas23 and the Human Proteome Map.24 Functional association data were retrieved from the Gene Ontology database,25 as well as the STRING database v9.1.17

Figure 1. Integration of multiple data sources. The substrate prediction method presented here relies on the integration of a diverse array of data.

Received: March 13, 2016

DOI: 10.1021/acs.jproteome.6b00220 J. Proteome Res. XXXX, XXX, XXX−XXX

Journal of Proteome Research

Data Integration

We made use of protein networks17 to integrate multiple selection criteria as color-coded edges (PPI, Co-Expression, Pathways, and Text Mining) (Figure 2). Network analysis was then conducted using machine learning tools and clustering algorithms to derive interesting patterns from protein interaction and functional networks. We initially used the Markov graph clustering (MCL) algorithm to highlight biologically related clusters of proteins. MCL is a flow-based network clustering algorithm based on the hypothesis that the number of paths between two arbitrary proteins lying in a natural cluster is expected to be higher than the number of paths between two proteins lying in different natural clusters.15 We then gave preference to hits that are closely related to the SMYD2 cluster, but not necessarily known to interact directly or be associated with SMYD2 itself (Figure 2). The use of cluster-based indirect association inference is another distinctive feature of our selection process compared to previous related work,10 which restricted the proteomic analyses to direct (or immediate) interaction partners shared between SMYD family proteins.

As with many other clustering techniques, Markov graph clustering assumes that each protein on a graph node belongs to one, and only one, biologically related cluster. Nonetheless, proteins can have multiple functions and could therefore belong to more than one group of biologically related proteins (or clusters). We used the Overlapping Neighbor Extended-based Clustering (ClusterONE) algorithm16 implemented in Cytoscape (Version 3.2.1) to allow for cluster overlap. The main idea of the ClusterONE algorithm is to seek densely internally connected clusters of nodes (or proteins) that are weakly connected externally; in other words, to seek strongly connected clusters whose internal weight is much higher than their external weight. The internal weight is defined as the sum of the weights of the edges contained entirely within the identified cluster, whereas the external weight is defined as the total weight of the edges that connect the cluster with the rest of the network.16 Another advantage of the ClusterONE algorithm is that it can make use of weights on network edges. For the purpose of this task, each node in the network represents a protein that could belong to the shortlist of SMYD2 candidate substrates, and each weighted edge represents the interaction score (confidence) between the corresponding protein pair. The use of the ClusterONE algorithm allowed us to retrieve a more flexible and realistic set of biologically related protein clusters, and helped us to focus on potential substrate candidates for SMYD2 with no previously known direct association. We believe the integration of biological network analysis into our multiple selection filters and criteria is what made our shortlist of selected substrate candidates more diverse, yet biologically relevant to SMYD2 (Table 1).
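The comparison between internal and external weight that ClusterONE relies on can be illustrated with a minimal Python sketch. The toy network, the protein labels P1 to P5, the edge scores, and the helper names are all hypothetical, and the simple ratio used here leaves out refinements of the published algorithm. It is meant only to show how a cluster's internal weight is set against its external weight, and how one node (P3) can sit in two overlapping clusters.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def make_network(edges: Iterable[Tuple[str, str, float]]) -> Dict[Edge, float]:
    """Store an undirected weighted network as {frozenset({a, b}): weight}."""
    return {frozenset((a, b)): weight for a, b, weight in edges}


def internal_external_weight(
    network: Dict[Edge, float], cluster: Set[str]
) -> Tuple[float, float]:
    """Return (internal, external) weight of a cluster.

    internal: sum of weights of edges with both endpoints in the cluster.
    external: sum of weights of edges with exactly one endpoint in the cluster.
    """
    internal = 0.0
    external = 0.0
    for edge, weight in network.items():
        endpoints_inside = len(edge & cluster)
        if endpoints_inside == 2:
            internal += weight
        elif endpoints_inside == 1:
            external += weight
    return internal, external


def cohesiveness(network: Dict[Edge, float], cluster: Set[str]) -> float:
    """Simplified ratio internal / (internal + external); an assumption,
    not the exact scoring function of the ClusterONE implementation."""
    internal, external = internal_external_weight(network, cluster)
    total = internal + external
    return internal / total if total else 0.0


# Hypothetical interaction scores (confidence values between 0 and 1).
network = make_network([
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.8),
    ("P2", "P3", 0.7),
    ("P3", "P4", 0.2),
    ("P4", "P5", 0.9),
])

tight_cluster = {"P1", "P2", "P3"}   # dense inside, one weak link outside
loose_cluster = {"P3", "P4", "P5"}   # overlaps the first cluster at P3

for name, cluster in (("tight", tight_cluster), ("loose", loose_cluster)):
    print(name, round(cohesiveness(network, cluster), 3))
```

A cluster such as the first one, where most of the edge weight stays inside the group, scores higher than the second, which leaks most of its weight to neighbors outside it.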

Cell-Based Assays

The putative substrate cDNAs from the MGC collection were cloned into the pAcGFP vector using the In-Fusion system (Clontech). P53 K370me1 antibody was generated in rabbits using the KLH-conjugated peptide GSRAHSSHLKmeSKKGQSTSRHK at the University of Toronto, Division of Comparative Medicine. To examine the methylation status of putative substrates in HEK293 cells, 2 × 10⁵ cells were seeded in 6-well plates and co-transfected with GFP-tagged substrate and FLAG-tagged SMYD2 (a kind gift from Dr. Shelley Berger) using Lipofectamine 2000 (Invitrogen). The SMYD2 inhibitor BAY-598 was added 4 h after transfection.18 The day after transfection, cells were harvested and lysed in total lysis buffer (20 mM Tris-HCl, pH 8, 150 mM NaCl, 1 mM EDTA, 10 mM MgCl2, 0.5% Triton X-100, 12.5 U/mL benzonase (Sigma), and complete EDTA-free protease inhibitor cocktail (Roche)). Cell lysates were immunoprecipitated overnight with anti-GFP antibody (Invitrogen), and the washed immunoprecipitates were subjected to 5−12% SDS-PAGE and transferred to a PVDF membrane. Monoclonal anti-GFP antibody (Clontech) and rabbit anti-TP53 K370me1 antibody were used, followed by detection with IRDye680RD goat anti-mouse IgG (LI-COR Biosciences) and IRDye800CW goat anti-rabbit IgG (LI-COR Biosciences) on the Odyssey Imager (LI-COR Biosciences). Immunofluorescence microscopy was performed on HEK293 cells grown on coverslips, stained with DAPI (Sigma), and analyzed on a Zeiss spinning disc confocal microscope (Zeiss AxioVert 200M).

Figure 2. Network analysis of known and putative SMYD2 substrate candidates. Known SMYD2 substrates (green star) and select proteins from the human methylome that matched our SMYD2 sequence specificity profile are mapped in a refined protein interaction network. Nontested (due to failed cloning or poor expression), tested, and experimentally confirmed SMYD2 substrate candidates are indicated. Some edges are shown as dashed lines to visually dissociate the two main clusters. The unrefined version of the cluster, including all 127 proteins that matched our sequence motif, is shown in Figure S-1.

RESULTS

A distinctive feature of our selection process is the use of multiple orthogonal, permissive filters and selection criteria rather than a few restrictive filters. This reduces the risk of missing hits that would not perfectly comply with prerequisites that are too demanding, such as adherence to a very specific amino acid sequence motif. We first filtered out methylation sites that did not satisfy two prerequisites: the sites must be part of the known methylome and must match a predefined sequence motif. A set of five criteria was then used to rank substrate candidates: physical interaction with SMYD2, cellular localization matching that of SMYD2, tissue expression matching that of SMYD2, pathway analysis, and text mining supporting functional interaction with SMYD2 (Figure 3). Some of the selection criteria are more important than others and are therefore assigned greater selection power (Table S-1). For example, previously reported physical interaction with SMYD2 has a greater weight than cellular co-localization with the enzyme. Weighting selection criteria by their contribution to the selection process mimics the decision tree algorithms that are commonly used in machine learning.26

Figure 3. Filters and criteria used in the SMYD2 substrate selection process. The attrition rates resulting from the two hard filters imposed on putative SMYD2 methylation sites are indicated. The compliance of the resulting 127 proteins with the five subsequent selection criteria is shown as a Venn diagram (black font). The numbers of substrates experimentally tested and confirmed are indicated (yellow).

The first filter applied in our selection process was that the substrate should be present in the experimental methylome, a collection of 2248 lysine methylation sites from the human proteome identified by mass spectrometry in 1528 proteins (as of Dec 2014) and available in the PhosphoSitePlus database.4 This ensured that the predicted substrate lysines were indeed available for methylation. Experimental methylation sites were then matched against a sequence motif focused on the PMT of interest. SMYD2 substrates exhibit unexpected sequence diversity,14,27 and we enriched a selectivity profile previously reported by Lanouette et al.13 with a number of known SMYD2 substrates (Figure 4). The motif published by Lanouette et al.13 was helpful to our selection process, not only because it provided some insight into SMYD2 substrate preference but, equally important, because it defined the location of the most critical side chains flanking the substrate lysine, namely positions −1, +1, and +2. However, as mentioned by Lanouette et al., the motif would not have predicted the methylation of HSP90 and RB1 by SMYD2. This is probably because the motif is derived from the structure of SMYD2 in complex with a TP53 peptide, while it has been shown that SMYD2 is capable of adopting alternate binding modes.10 We therefore introduced additional acceptable side chains flanking the substrate lysine (Figure 4, bold characters), which rescued substrate candidates (Figure 5) that were later confirmed experimentally. The main benefit of using the proposed enriched motif is that it implicitly integrates the binding modes of multiple SMYD2 substrates published in the literature.

Figure 4. Enrichment of a previous SMYD2 substrate signature. (A) Structure-based SMYD2 specificity profile.13 (B) Known SMYD2 substrates. Side chains that do not comply with the profile shown in A are in bold. (C) Enriched SMYD2 specificity profile used in this work. Added side chains are underlined and in bold.

Figure 5. Predicted methylation sites of the confirmed novel SMYD2 substrates. The substrate lysine (gray) and amino acids added to our enriched SMYD2 specificity profile (bold) are highlighted.

Of the 2248 lysine methylation sites in the methylome, 160 matched our permissive SMYD2 selectivity profile in 127 proteins (Figure 3, Table S-2). These 127 proteins were then mapped onto protein interaction networks to identify proteins likely to interact directly or indirectly with SMYD2 (Figure S-1). Two sources of human protein−protein interactions were used: (i) a recently published collection of protein−protein interactions focused on chromatin factors21 and (ii) interaction networks from the STRING database.17 Next, we refined our hits using cellular localization information and tissue expression profiles. SMYD2 is known to be mainly cytoplasmic,28,29 and highly expressed in heart and brain.30 Proteins localized in the cytoplasm and those expressed in the heart or brain were therefore prioritized, representing 43 and 55 of our 127 substrate candidates, respectively. While we prioritized cytoplasmic substrates, we did not ignore nuclear targets altogether, as 20% of SMYD2 is localized in the nucleus:29 putative nuclear substrates that were supported by multiple alternative selection criteria were rescued in our selection process (Figure 3). Finally, we gave preference to hits that are functionally associated with SMYD2 based on shared biological pathways (KEGG and Reactome databases), shared molecular function (Gene Ontology), or text mining (substrate candidate and SMYD2 present in the same abstract in PubMed).

Integrating these multiple selection criteria resulted in a refined network (Figure 2) composed of 61 proteins mostly organized in two clusters, from which 14 novel SMYD2 substrate candidates satisfying most selection criteria were selected. Ideally, SMYD2 substrate candidates would not only be in the human methylome and have a known methylation site that matches our predefined SMYD2 substrate sequence motif (Figure 3, filters 1 and 2), but would also physically interact with SMYD2, be co-localized and co-expressed with SMYD2, be functionally related to SMYD2, and be co-cited with SMYD2 in the published literature. We find that none of the 127 proteins passing the first two filters meets all of these prioritization criteria (Figure 3). Fifteen met four criteria, 17 met three, and 33 met two. To increase our chances of success, we selected putative substrates that met diverse combinations of the five prioritization criteria (Table 1, top).

Table 1. Confirmation Rate of Predicted SMYD2 Substrates^a

^a The combined predictive support (dark blue, high; light blue, medium; pink, low; white, no support) coming from different selection criteria is indicated. Major mismatches to our substrate sequence motif (Figure 4C) are not (A) or are (B) tolerated. *: the experimental SMYD2 interaction score is indicated.21

The selected SMYD2 substrate candidates were tested in cells. We took advantage of a broad-selectivity methyl lysine antibody that was originally generated against a K370 TP53 peptide7,29 for rapid confirmation of the substrates. In addition to recognizing K370me1 TP53, the antibody also detected other SMYD2 substrates in the overexpression system (Figure 6A). Importantly, it is possible that the antibody does not recognize methyllysines in the context of sequences that are too divergent from the original TP53 antigen peptide; consequently, putative substrates that were not confirmed in this assay cannot be definitively ruled out as SMYD2 substrates. We overexpressed the putative substrate GFP fusions in HEK293 cells together with FLAG-tagged SMYD2 and performed immunoprecipitation using anti-GFP antibody (Figure 6B). Microtubule Associated Protein Tau (MAPT), Cell Cycle and Apoptosis Regulator 2 (CCAR2), Eukaryotic Translation Elongation Factor 2 (EEF2), Nuclear Receptor Coactivator 3 (NCOA3), STIP1 homology and U-box containing protein 1 (STUB1), and U3 Small Nucleolar RNA-Associated Protein 14 Homologue A (UTP14A) were identified as methylated by SMYD2 in this system (Figure 6C). Treatment with 1 μM BAY-598, a potent and selective SMYD2 inhibitor,18 abrogated the methylation signal, indicating that methylation was directly dependent on the catalytic activity of SMYD2 (Figure 6D). We also investigated the cellular localization of the confirmed SMYD2 substrates. CCAR2 and UTP14A were nuclear, while EEF2, MAPT, NCOA3, and STUB1 preferentially localized to the cytoplasm in the overexpressed setting. These findings are consistent with the predominantly cytoplasmic localization of SMYD2 (Figure S-2).29

Retrospective analysis of the experimentally confirmed hits provides some indications of the most successful selection strategy. Importantly, we find that, out of 14 SMYD2 substrate candidates selected for experimental validation, all 6 validated substrates were either previously reported to interact directly with SMYD2 or were clustered with the enzyme in protein interaction networks (Figure 2), suggesting that this selection criterion should be a must-have rather than a nice-to-have in future work. Network analysis that gives preference to proteins clustered with SMYD2 (but not necessarily known to be directly associated with SMYD2) was utilized as one of our selection factors, but was not used as a hard filter in our selection process, as protein−protein interaction databases are not exhaustive.

We next imposed direct interaction with SMYD2 as a hard filter on all putative substrates, and asked whether the expected increase in accuracy could allow us to rescue previously missed substrates by making our required substrate sequence motif more permissive. In this new selection strategy, we cross-referenced proteins from the experimental human methylome with proteins reported to physically interact directly with SMYD2 by affinity purification coupled with mass spectrometry,21 and allowed one or more major mismatches from our previously defined SMYD2 substrate sequence motif (Figure 4C). Major mismatches were defined as amino acids that were found neither in Lanouette’s SMYD2 specificity profile13 nor in published SMYD2 substrates, such as K and S at position −1, or P at position +1 (Figure S-3). We were unable to experimentally detect methylation by SMYD2 of the four resulting substrate candidates (BTF3, NACA, CBWD2, and CKAP5) (Figure 6), suggesting that major mismatches to our predefined sequence motif were not acceptable.

Interestingly, BTF3 was later reported to be a valid SMYD2 substrate by Olsen et al.14 Differences in the methods used to experimentally detect methylation by SMYD2 probably underlie these conflicting results. In particular, the antibody that we used to detect methylated lysines was raised against a well-documented SMYD2 substrate methylation site on TP53. It is possible that this antibody does not recognize methylated lysines with flanking sequences that are too dissimilar to TP53, such as the ones tested in this exercise, which include major mismatches to our predefined motif. Commercially available pan monomethyl lysine antibodies are of poor quality and were not found useful in this study.

Taken together, these results indicate that the integration of multiple permissive selection methods derived from heterogeneous data sources can identify novel methyltransferase substrates with methylation site sequences sometimes significantly dissimilar from known substrates. This bioinformatic method made use of a sequence motif previously derived from a SMYD2 crystal structure,13 and relies on prior knowledge of valid substrates. It is therefore complementary to experimental approaches such as peptide arrays or SILAC-based proteomics.14,31,32
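The two-stage logic of the selection process, two hard filters followed by a weighted ranking on five criteria, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the allowed residues at positions −1, +1, and +2, the criterion weights, the candidate names, and the sequence windows are all hypothetical placeholders, since the enriched profile (Figure 4C) and the weights (Table S-1) are not reproduced in the text. The only constraint taken from the text is that physical interaction outweighs co-localization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

CENTER = 3  # index of the substrate lysine in a 7-residue window (-3..+3)

# Hypothetical permissive motif: allowed residues per position relative to K.
# Placeholder values, NOT the enriched profile of Figure 4C.
ALLOWED: Dict[int, Set[str]] = {
    -1: set("LFMI"),
    +1: set("SAGK"),
    +2: set("KRQS"),
}

# Hypothetical weights; the text only states that physical interaction
# carries more weight than cellular co-localization.
WEIGHTS: Dict[str, int] = {
    "physical_interaction": 3,
    "pathway": 2,
    "text_mining": 2,
    "localization": 1,
    "expression": 1,
}


@dataclass
class Site:
    protein: str
    window: str                      # residues -3..+3 around the lysine
    in_methylome: bool               # hard filter 1
    criteria: Set[str] = field(default_factory=set)  # criteria that are met


def passes_hard_filters(site: Site, allowed: Dict[int, Set[str]]) -> bool:
    """Hard filters: site is in the methylome and matches the motif."""
    if not site.in_methylome or site.window[CENTER] != "K":
        return False
    return all(site.window[CENTER + offset] in residues
               for offset, residues in allowed.items())


def score(site: Site, weights: Dict[str, int]) -> int:
    """Weighted sum of the soft criteria met by the site."""
    return sum(weights[name] for name in site.criteria)


def rank_candidates(sites: List[Site], allowed: Dict[int, Set[str]],
                    weights: Dict[str, int]) -> List[Site]:
    """Apply the hard filters, then rank survivors by weighted score."""
    kept = [s for s in sites if passes_hard_filters(s, allowed)]
    return sorted(kept, key=lambda s: score(s, weights), reverse=True)


# Hypothetical candidates for illustration only.
SITES = [
    Site("CAND_A", "AHLKSKK", True, {"physical_interaction", "localization"}),
    Site("CAND_B", "GGPKPEE", True, {"physical_interaction", "pathway"}),
    Site("CAND_C", "TEFKAKR", True, {"pathway", "text_mining", "expression"}),
    Site("CAND_D", "AHLKSKK", False, {"physical_interaction"}),
]

for site in rank_candidates(SITES, ALLOWED, WEIGHTS):
    print(site.protein, score(site, WEIGHTS))
```

In this toy run, CAND_B fails the motif filter and CAND_D is absent from the methylome, so neither is ranked however strong its other support. Keeping the hard filters few and permissive, and leaving everything else to the weighted score, is the design choice the text argues for.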



Figure 6. Antibody-based confirmation of cellular SMYD2 substrates. (A) An antibody generated against a TP53 peptide (GSRAHSSHLKmeSKKGQSTSRHK) recognizes many substrates that are methylated by ectopic SMYD2. HEK293 cells were transfected with P53-Flag and control or SMYD2-Flag plasmids and immunoblotted with methyl K370 antibody. (B) Schematic for investigating the methylation of cellular substrates by SMYD2. (C) SMYD2 methylation of cellular substrates. Fourteen putative substrates were cloned as GFP fusions and transfected into HEK293 cells with control or SMYD2-Flag plasmid. (D) Confirmation that substrate methylation is dependent on SMYD2 catalytic activity. After transfection, cells were treated with 1 μM of the SMYD2 inhibitor BAY-598,18 and substrate methylation was evaluated as above.

DISCUSSION

By integrating a diverse array of orthogonal data sets, and by limiting our search to the experimental human methylome, we identified novel SMYD2 substrates (Table 1). While we believe that the systematic integration of multiple orthogonal data sources was beneficial to our selection process, the availability and quality of some of the data that were used can be a limitation of our approach. Knowledge of previously reported substrates is a prerequisite to generate a signature motif, but is not available for all PMTs. For instance, our selection strategy could not be applied to PRDMs, a poorly characterized branch of the human methyltransferase phylogenetic tree. Additionally, current knowledge of the human interactome and the human methylome is incomplete and sometimes inaccurate. It is estimated that existing human protein−protein interaction (PPI) networks include as many as 64% false positives and between 43% and 71% false negatives.33 While integration of PPI networks is an important component of our substrate selection strategy, it can be partially misleading. We note that K314 of RelA passed all available selection filters and criteria as a SMYD2 substrate candidate, but was not confirmed experimentally (Table 1). Interestingly, this site is already reported to be monomethylated by SETD7/9 in vitro and in vivo.34 Retrospectively, this suggests that sites previously identified as substrates for other PMTs should be deprioritized as putative methylation substrates for novel enzymes.

Some of the novel SMYD2 substrates identified here have well-established disease associations, and the effect of methylation by SMYD2 on protein function remains to be investigated. For example, lysine methylation of MAPT (tau protein) is believed to protect against pathological aggregation of tau in Alzheimer’s disease.35 CCAR2, another SMYD2 substrate revealed here, competes with MDM2 for TP53 binding and acts as a tumor suppressor in breast cancer through TP53 regulation.36 We find that SMYD2 methylates CCAR2, and predict that the targeted side chain is K123, a lysine that can be ubiquitinated.37,38




(5) Biggar, K. K.; Li, S. S.-C. Non-histone protein methylation as a regulator of cellular signalling and function. Nat. Rev. Mol. Cell Biol. 2015, 16 (1), 5−17. (6) Xu, G.; Liu, G.; Xiong, S.; Liu, H.; Chen, X.; Zheng, B. The Histone Methyltransferase Smyd2 Is a Negative Regulator of Macrophage Activation by Suppressing Interleukin 6 (IL-6) and Tumor Necrosis Factor α (TNF-α) Production. J. Biol. Chem. 2015, 290 (9), 5414−5423. (7) Huang, J.; Perez-Burgos, L.; Placek, B. J.; Sengupta, R.; Richter, M.; Dorsey, J. A.; Kubicek, S.; Opravil, S.; Jenuwein, T.; Berger, S. L. Repression of p53 activity by Smyd2-mediated methylation. Nature 2006, 444 (7119), 629−632. (8) Saddic, L. A.; West, L. E.; Aslanian, A.; Yates, J. R.; Rubin, S. M.; Gozani, O.; Sage, J. Methylation of the retinoblastoma tumor suppressor by SMYD2. J. Biol. Chem. 2010, 285 (48), 37733−37740. (9) Cho, H.-S.; Hayami, S.; Toyokawa, G.; Maejima, K.; Yamane, Y.; Suzuki, T.; Dohmae, N.; Kogure, M.; Kang, D.; Neal, D. E. RB1 methylation by SMYD2 enhances cell cycle progression through an increase of RB1 phosphorylation. Neoplasia 2012, 14 (6), 476−IN8. (10) Abu-Farha, M.; Lanouette, S.; Elisma, F.; Tremblay, V.; Butson, J.; Figeys, D.; Couture, J.-F. Proteomic analyses of the SMYD family interactomes identify HSP90 as a novel target for SMYD2. J. Mol. Cell. Biol. 2011, 3 (5), 301−308. (11) Zhang, X.; Tanaka, K.; Yan, J.; Li, J.; Peng, D.; Jiang, Y.; Yang, Z.; Barton, M. C.; Wen, H.; Shi, X. Regulation of estrogen receptor α by histone methyltransferase SMYD2-mediated protein methylation. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (43), 17284−17289. (12) Piao, L.; Kang, D.; Suzuki, T.; Masuda, A.; Dohmae, N.; Nakamura, Y.; Hamamoto, R. The histone methyltransferase SMYD2 methylates PARP1 and promotes poly (ADP-ribosyl) ation activity in cancer cells. Neoplasia 2014, 16 (3), 257−264.e2. (13) Lanouette, S.; Davey, J. A.; Elisma, F.; Ning, Z.; Figeys, D.; Chica, R. A.; Couture, J.-F. 
Discovery of Substrates for a SET Domain Lysine Methyltransferase Predicted by Multistate Computational Protein Design. Structure 2015, 23 (1), 206−215. (14) Olsen, J. B.; Cao, X. J.; Han, B.; Chen, L. H.; Horvath, A.; Richardson, T. I.; Campbell, R. M.; Garcia, B. A.; Nguyen, H. Quantitative profiling of the activity of protein lysine methyltransferase SMYD2 using SILAC-based proteomics. Mol. Cell. Proteomics 2016, 15, 892−905. (15) Brandes, U.; Gaertler, M.; Wagner, D. Experiments on Graph Clustering Algorithms; LNCS Springer, 2003; Vol. 2832. (16) Nepusz, T.; Yu, H.; Paccanaro, A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat. Methods 2012, 9 (5), 471−472. (17) Franceschini, A.; Szklarczyk, D.; Frankild, S.; Kuhn, M.; Simonovic, M.; Roth, A.; Lin, J.; Minguez, P.; Bork, P.; von Mering, C. STRING v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013, 41 (D1), D808− D815. (18) Eggert, E.; Hillig, R. C.; Kohr, S.; Stockigt, D.; Weiske, J.; Barak, N.; Mowat, J.; Brumby, T.; Christ, C. D.; Ter Laak, A.; Lang, T.; Fernandez-Montalvan, A. E.; Badock, V.; Weinmann, H.; Hartung, I. V.; Barsyte-Lovejoy, D.; Szewczyk, M.; Kennedy, S.; Li, F.; Vedadi, M.; Brown, P. J.; Santhakumar, V.; Arrowsmith, C. H.; Stellfeld, T.; Stresemann, C. Discovery and Characterization of a Highly Potent and Selective Aminopyrazoline-Based in vivo Probe (BAY-598) for the Protein Lysine Methyltransferase SMYD2. J. Med. Chem. 2016, DOI: 10.1021/acs.jmedchem.5b01890. (19) Carlson, S. M.; Gozani, O. Emerging technologies to map the protein methylome. J. Mol. Biol. 2014, 426 (20), 3350−3362. (20) Wu, Z.; Cheng, Z.; Sun, M.; Wan, X.; Liu, P.; He, T.; Tan, M.; Zhao, Y. A chemical proteomics approach for global analysis of lysine monomethylome profiling. Mol. Cell. Proteomics 2015, 14 (2), 329−39. (21) Marcon, E.; Ni, Z.; Pu, S.; Turinsky, A. L.; Trimble, S. S.; Olsen, J. 
B.; Silverman-Gavrila, R.; Silverman-Gavrila, L.; Phanse, S.; Guo, H. Human-chromatin-related protein interactions identify a demethylase complex required for chromosome segregation. Cell Rep. 2014, 8 (1), 297−310.

CONCLUSIONS Continued progress in connecting the ∼60 human PMTs to thousands of known methylation sites will reveal unsuspected regulatory mechanisms, signaling pathways, and opportunities in drug discovery. The bioinformatic approach presented here, where multiple data sources are interrogated with permissive filters, and intersected, can be customized toward different PMTs based on data availability. The method will gain in efficiency as more data becomes available. For instance, previous knowledge of specific methylation sites of an enzyme can be used to broaden or refine its substrate sequence signature. Conversely, bioinformatic predictions can accelerate the discovery of novel substrates. We therefore believe that experimental and computational approaches can complement each other and that a combined implementation of both methodologies is a valid path forward.



ASSOCIATED CONTENT

* Supporting Information S

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.6b00220. Figure S-1: Protein interaction map of the 127 hits. Figure S-2: Cellular localization of novel substrates. Figure S-3: Permissive SMYD2 selectivity profile. Table S-1: Summary of filters implemented and associated data source. Table S-2: partial table of 160 sites selected from the methylome matching our permissive SMYD2 signature motif (PDF) Full list of 160 sites selected from the methylome matching our permissive SMYD2 signature motif (XLSX)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The SGC is a registered charity (number 1097737) that receives funds from AbbVie, Bayer, Boehringer Ingelheim, Genome Canada through the Ontario Genomics Institute [OGI-055], GlaxoSmithKline, Janssen, Lilly Canada, the Novartis Research Foundation, the Ontario Ministry of Economic Development and Innovation, Pfizer, Takeda, and the Wellcome Trust [092809/Z/10/Z]. We thank Arin Dunning for the antibody generation and Mani Ravichandran for help with the antibody purification. Our thanks to Dr. Shelley Berger for providing Flagtagged SMYD2.


