Construction and Deciphering of Human Phosphorylation-Mediated


Article

Construction and deciphering of human phosphorylation-mediated signaling transduction networks

Menghuan Zhang, Hong Li, Ying He, Han Sun, Li Xia, Li-Shun Wang, Bo Sun, Liangxiao Ma, Guoqing Zhang, Jing Li, Yi-Xue Li, and Lu Xie

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00249 • Publication Date (Web): 26 May 2015



Journal of Proteome Research


Construction and deciphering of human phosphorylation-mediated signaling transduction networks

Menghuan Zhang1,2, Hong Li2,3, Ying He2,3, Han Sun2,3, Li Xia4, Lishun Wang4, Bo Sun1,2, Liangxiao Ma2, Guoqing Zhang2, Jing Li1, Yixue Li1,2,3*, Lu Xie2*

1 Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China

2 Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai, 201203, China

3 Key Laboratory of Systems Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China

4 Key Laboratory of Cell Differentiation and Apoptosis of National Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200025, China

Correspondence should be addressed to Dr. Lu Xie at E-mail: [email protected]. Phone: +86 13301670946, or to Dr. Yixue Li at Email: [email protected]. Tel: +86 13916378087.


Abstract

Protein phosphorylation is the most abundant reversible covalent modification. Human protein kinases participate in almost all biological pathways, and approximately half of the kinases are associated with disease. PhoSigNet was designed to store and display human phosphorylation-mediated signal transduction networks, with additional information related to cancer. It contains 11,976 experimentally validated directed edges and 216,871 phosphorylation sites. Moreover, 3,491 differentially expressed proteins in human cancer from dbDEPC, 18,907 human cancer variation sites from CanProVar, and 388 hyperphosphorylation sites from PhosphoSitePlus were collected as annotation information. Compared with other phosphorylation-related databases, PhoSigNet not only takes kinase-substrate regulatory relationship pairs into account but also extends regulatory relationships up- and downstream (e.g., from ligand to receptor, from G protein to kinase, and from transcription factor to targets). Furthermore, PhoSigNet allows the user to investigate the impact of phosphorylation modifications on cancer. Using one set of in-house time series phosphoproteomics data, we illustrate the reconstruction of a conditional and dynamic phosphorylation-mediated signaling network. We expect PhoSigNet to be a useful database and analysis platform, benefiting both proteomics and cancer studies.

Keywords: phosphorylation, signal transduction, network, database, cancer


Introduction

The great diversity of the proteome relative to the comparatively small number of genes is achieved mainly through posttranslational modifications (PTMs). The most frequent type of PTM is phosphorylation. It is estimated that up to 30% of proteins in a mammalian cell are phosphorylated at any given time [1]. Reversible protein phosphorylation not only changes the physicochemical properties of proteins but also affects every basic cellular process (e.g., metabolism, differentiation, motility, membrane transport, and immunity) [2]. Protein kinases and protein phosphatases are key components of regulatory pathways, many of which have been explored in depth. Accumulating evidence suggests that the phosphorylation-mediated signaling network is not static, but can be dynamically rewired in different samples and diseases or following different treatments [3,4].

Previous studies have revealed the central roles of phosphorylation in human health and disease. For example, phosphorylation of pRB1 has been associated with tumorigenesis through the control of cell division [5]. Given this important role, phosphorylated proteins have been regarded as potential disease biomarkers or therapeutic targets.

Although protein phosphorylation affects an estimated one-third of all proteins and is the most widely studied PTM [1,6], only a small subset of total in vivo sites has been discovered to date. Therefore, the development of global and quantitative methods for elucidating dynamic phosphorylation events is essential for a systematic understanding of cellular behavior. Mass spectrometry (MS) has become a powerful
technology for proteomics and the method of choice for unbiased (i.e., hypothesis-free) analysis of in vivo phosphorylation [7-12]. As a result, the simultaneous identification and quantification of thousands of phosphopeptides from one sample has become something of a ‘routine’ assay [11,12].

Although studies based on protein–protein interactions (PPIs), genetic interactions, and motif-based predictions have uncovered important clues regarding the organization and regulation of kinase-substrate protein networks, their use in understanding the complexity of cellular signaling regulation mediated by phosphorylation is limited. As data on signal transduction accumulate, it has become possible to reconstruct complete phosphorylation-mediated signaling networks by considering not only kinase-substrate relationships but also their corresponding up- or downstream interactions. To accomplish this, three types of information should be integrated: phosphorylation sites, kinase-substrate relationships, and up- and downstream signaling interactions. At the same time, the high-throughput nature and complexity of MS/MS data pose computational challenges for proteome-scale phosphorylation analyses in a biological context. A pure data repository is insufficient for such tasks; computational tools must accompany data repositories to allow knowledge extraction.

In this work, we developed a systematic resource for phosphorylation-mediated signaling networks (PhoSigNet) that consists of a background network database and two analysis tools. Our database incorporates the existing features of numerous previous databases, with an emphasis on collecting kinase-substrate relationships and
their up- or downstream interactions. Accordingly, PhoSigNet can be used to investigate signal transduction networks of single or multiple proteins of interest, to understand the collaborative functions of these proteins rather than their isolated functions, to construct signaling networks for proteomics data under a specific biological state, and to compare signaling networks under different biological states. To facilitate the analysis of dynamic phosphoproteomic data, we integrated a time series clustering tool into PhoSigNet. Moreover, a dedicated receptor enrichment query interface for cancer has been implemented because the database incorporates a large amount of cancer-related genetic variation and expression information in addition to phosphorylation data. This interface gives the platform an additional application branch. PhoSigNet is freely available in the public domain at http://lifecenter.sgst.cn/PhoSigNet/.

Methods

System Configuration

PhoSigNet consists of a relational database and a dynamic web interface. The framework of the database and web server is shown in Figure 1. The database was implemented using MySQL 5 (http://www.mysql.com/). The web interface was implemented in PHP.

Data Collection

Three types of data were integrated: phosphorylation site information, signal transduction information and protein node annotation information. In addition to general functional annotation, cancer-related annotation was integrated for every protein node.

Phosphorylation site information: A single protein can be phosphorylated at multiple sites, and such sites may constitute the modification switches of a signaling network [1,13-17]. Phosphorylation site information is therefore an important resource for studying signal transduction networks. In the current version of PhoSigNet, site information was automatically retrieved from five databases (SysPTM 2.0 [17], HPRD [18], PhosphoSitePlus [19], PhosphoELM [20], and Swiss-Prot [21]). The site information will be updated every time a new major database version is released.

Signal transduction information: In a signaling network, each edge represents a direction of signal transduction performed by protein-protein interaction partners. The interaction pairs that form the signal transduction edges of PhoSigNet were collected in the order of signal transduction cascades (i.e., ligand-receptor, kinase-substrate, phosphatase-substrate, GPCR-G protein-effector, and transcription factor-target gene) (Table 1). The first layer consists of ligand-receptor interactions collected from the Human Plasma Membrane Receptome database (HPMR) [22]. The second layer consists of the GPCR-G protein-effector cascade. G proteins are important intracellular signal-transducing molecules that are activated by G protein-coupled receptors (GPCRs) spanning the cell membrane; in turn, G proteins activate a cascade of signaling events that can change the cell’s fate [23]. Interactions
among GPCRs, G proteins and effectors were collected from gpDB [24] and GPCRDB [25]. During signal transmission, the signal is passed from a receptor to the kinase; alternatively, the receptor may itself be a kinase that phosphorylates the corresponding substrates. Kinase-substrate interactions are the core edges of our phosphorylation-mediated signal transduction network. Therefore, we collected kinase-substrate edges from RegPhos [26], the Human Protein Reference Database (HPRD) [18], PhosphoNetworks [27], the PATHWAY database in KEGG [28], and human signal transduction pathways in Netpath [29]. In contrast to kinases, phosphatases dephosphorylate substrates and can reverse their functions. The interactions between phosphatases and substrates were also collected from HPRD [18], the PATHWAY database in KEGG [28], and human signal transduction pathways in Netpath [29]. The last indispensable effectors of signaling networks are transcription factors, which transmit the signal to target genes and affect gene expression, thereby eventually altering cellular processes. Experimentally verified transcription factor-target gene interactions were collected from KEGG [28] and the TRANSFAC database [30].

Protein node annotation information: In our phosphorylation-mediated signaling network, each node represents a protein with phosphorylation site information. Comprehensive protein annotation information was incorporated to allow detailed functional analysis of any network module in PhoSigNet. Protein attribute information, including kinases, phosphatases and transcription factors, was collected from Swiss-Prot [21]. Other general information was collected from BIOMART in
Ensembl [31], including UniProt/SwissProt ID, protein description, chromosome location of the gene and gene ontology. Pathway involvement information was collected from KEGG [28].

Previously, our group constructed two cancer-specific protein information resources: the updated CanProVar, with cancer-related protein sequence variations (crVARs) caused by gene mutations [32], and dbDEPC, with cancer-related protein expression changes (crDEPs) [33]. Both protein sequence variations and expression changes are important molecular phenotypes in human cancer, and they have often been used as cancer biomarkers or drug targets [34-36]. Phosphorylation dysregulation, such as hyperphosphorylation sites (HPS), is also involved in human diseases such as cancer [37-39]. Therefore, in addition to general protein annotations, we incorporated cancer-related protein information to develop an expanded cancer analysis platform in PhoSigNet. crVAR data were collected from the human Cancer Proteome Variation Database (CanProVar, updated version 2.0) (http://lifecenter.sgst.cn/CanProVar/) [32]. crDEP data were collected from the updated database of differentially expressed proteins in human cancers (vs. normal tissues, or vs. other cancers) (dbDEPC 2.0) [33]. Hyperphosphorylation datasets were downloaded from PhosphoSitePlus [19]. Protein domain information was downloaded from PFAM (version Pfam 27.0) [40].

PhoSigNet data integration: As described above, the PhoSigNet data were collected from diverse resources, resulting in various protein identifiers from different databases. To integrate heterogeneous data and avoid redundancy, we downloaded different protein IDs from Biomart of Ensembl [31], including the UniProt/SwissProt ID, UniProt/SwissProt Accession, UniProt Gene Name, EntrezGene ID, RefSeq Protein
ID, RefSeq mRNA, Ensembl Gene ID, Ensembl Transcript ID, Ensembl Protein ID, and associated gene name. Finally, the proteins were uniformly mapped to their UniProt/SwissProt IDs, and redundant entries were removed.
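
The identifier unification described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the PhoSigNet pipeline: the cross-reference table `ID_MAP`, its handful of rows, and the function names `to_uniprot` and `unify_edges` are hypothetical and chosen for the example. In practice the table would be built from a BioMart export covering all the identifier types listed above.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical cross-reference table (any alias -> UniProt/Swiss-Prot ID).
# The rows are illustrative; a real table would come from a BioMart export.
ID_MAP: Dict[str, str] = {
    "SHC1": "SHC1_HUMAN",    # gene name
    "P29353": "SHC1_HUMAN",  # UniProt accession
    "6464": "SHC1_HUMAN",    # Entrez Gene ID
    "SRC": "SRC_HUMAN",      # gene name
    "P12931": "SRC_HUMAN",   # UniProt accession
}

Edge = Tuple[str, str, str]  # (source, target, edge type)


def to_uniprot(identifier: str) -> Optional[str]:
    """Return the unified ID, or None when the identifier is unknown."""
    return ID_MAP.get(identifier.strip())


def unify_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Map both endpoints to unified IDs; drop duplicates and unmappable edges."""
    seen: Set[Edge] = set()
    unified: List[Edge] = []
    for source, target, edge_type in edges:
        s, t = to_uniprot(source), to_uniprot(target)
        if s is None or t is None:
            continue  # an endpoint could not be mapped
        key = (s, t, edge_type)
        if key not in seen:
            seen.add(key)
            unified.append(key)
    return unified


raw_edges = [
    ("SRC", "SHC1", "kinase-substrate"),       # given as gene names
    ("P12931", "P29353", "kinase-substrate"),  # same edge, given as accessions
    ("SRC", "UNKNOWN1", "kinase-substrate"),   # unmappable endpoint
]
print(unify_edges(raw_edges))
```

Keying the de-duplication on the unified triple means the same relationship reported by two source databases under different identifier schemes collapses into a single edge.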

Analysis Tools

Phosphorylation is a dynamic and reversible protein modification. The PhoSigNet database allows for static browsing and mapping of phosphorylated proteins and sites; however, other tools are needed to analyze dynamic phosphoproteomics data. We therefore developed ExpCluster as an expanded analysis platform in PhoSigNet. Building on the incorporated cancer-related protein annotation information, we also developed the CanReceptor platform to analyze the impact of phosphorylation on cancer receptor signaling.

1) ExpCluster

ExpCluster performs clustering analysis on quantitative dynamic phosphoproteomics data to group proteins with similar dynamic expression or phosphorylation patterns, using the STEM software. STEM (Short Time-series Expression Miner) is a software package for analyzing short time series expression data (approximately 8 time points or fewer); it can also be used for other types of sequential experiments, such as dose response and temperature response experiments [41,42]. Users can choose either the STEM clustering method or k-means [41,42]. Before clustering, the protein/gene expression time series must be transformed so that it starts at 0 [41]. Three types of transformations are provided to accomplish this
change: Log normalized data, Normalized data, or No normalization/add 0 [41]. Given a time series vector of protein/gene expression values $(v_0, v_1, v_2, \ldots, v_n)$, the transformations are as follows:

- Log normalized data: transforms the vector to $\left(0, \log_2\frac{v_1}{v_0}, \log_2\frac{v_2}{v_0}, \ldots, \log_2\frac{v_n}{v_0}\right)$

- Normalized data: transforms the vector to $(0, v_1 - v_0, v_2 - v_0, \ldots, v_n - v_0)$

- No normalization/add 0: transforms the vector to $(0, v_0, v_1, v_2, \ldots, v_n)$

2) CanReceptor

CanReceptor was designed to identify cancer-triggering receptors by calculating whether cancer-related proteins are enriched in the downstream pathway or interaction partners mediated by the queried receptor. Because we integrated a whole set of cancer-related protein annotation information into PhoSigNet, we could develop such an application platform for cancer research. The CanReceptor enrichment algorithm is based on the hypergeometric test, which uses the hypergeometric distribution to measure statistical significance. A random variable X follows the hypergeometric distribution if its probability mass function is given by:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where N is the total number of proteins in all pathways or interaction partners mediated by the 414 cancer-related receptors collected in our database; K is the number of cancer-related proteins in this population; n is the number of proteins in the downstream pathway or interaction partners mediated by the queried receptor; k is the number of cancer-related proteins among these n proteins; and $\binom{a}{b}$ is a binomial coefficient. Finally, we calculated Benjamini and Hochberg corrected p-values.
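
The enrichment calculation can be sketched in a few lines of standard-library Python. Two points are assumptions of this sketch rather than statements from the text: the p-value is taken as the one-sided upper tail P(X >= k), which is the usual choice for over-representation tests, and the function names and all numbers in the usage lines are made up for illustration.

```python
from math import comb
from typing import List


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): N proteins in total, K cancer-related, n downstream of the receptor."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_pvalue(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail probability P(X >= k) (assumed one-sided enrichment test)."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers: 10 proteins, 4 cancer-related, 3 downstream, 2 of them cancer-related.
print(round(enrichment_pvalue(2, 10, 4, 3), 4))  # 0.3333

# Hypothetical raw p-values for four queried receptors.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Exact integer binomials from `math.comb` avoid overflow and rounding problems in the factorials; for very large populations a log-space implementation would be the more typical choice.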

Case Study

A set of previously published in-house quantitative phosphoproteomics data, generated by SILAC-based experiments on proteins during U937 cell death triggered by PKCδ-CF, was employed as a case data set to demonstrate the clustering analysis in PhoSigNet [43]. The U937 human acute myeloid leukemia (AML) cell line was lysed in RPMI-1640 medium (Sigma-Aldrich, St. Louis, MO). The pTRE2hyg PKCδ-CF plasmid was transferred to the U937 cells to establish the U937-PKCδ-CF stable transformant. The U937-PKCδ-CF cells were encoded with heavy arginine and lysine isotopes and cultured in tetracycline-free medium for 2, 3, 4 and 5 days. Cells from each day were mixed with cells from day 0 encoded with light Arg/Lys isotopes (Experiment 1), or heavy Arg/Lys-labeled cells at day 0 were mixed with the light Arg/Lys-labeled cells in tetracycline-free medium for 2, 3, 4 and 5 days (Experiment 2). The equally mixed cells were separated into cytoplasmic and nuclear fractions, and the proteins were enzymatically digested. In this study, 3000 unique phosphosites on 2531 phosphopeptides corresponding to 1544 non-redundant phosphoproteins were identified. Among these phosphoproteins, 1173 proteins containing 2070 phosphosites in the cytoplasm and 698 proteins containing 1544 phosphosites in the nucleus were detected [43].


Results

Contents and Statistics

PhoSigNet is a systematic resource integrating a phosphorylation-mediated signaling network database and two analysis tools. The structure of PhoSigNet is depicted in Figure 1. The PhoSigNet database currently houses 11,976 experimentally determined signaling edges of 6 main edge types among 3,241 proteins (Supplementary Table 1). The number of different nodes collected in PhoSigNet is shown in Figure 2A. The signal-transferring edges run mainly from ligand to receptor, GPCR to G protein, G protein to effector, kinase to substrate, phosphatase to substrate and TF to target gene (Fig. 2B). Many edge types are signal transduction-specific; however, some edge types with different attributes may share the same signal transduction process. Kinase-substrate is the most frequent edge type, with 7,223 relationships. During signal transduction in a cell, many enzymes may be involved in addition to kinases or phosphatases; therefore, it is necessary to add these relationships to construct a more realistic signal transduction network. Edges directed from enzyme to enzyme or from protein to protein were collected from KEGG [28] and Netpath [29] and classified as ‘other edges.’ There are 216,871 phosphorylation sites (Supplementary Table 2) and 258 dephosphorylation sites in our database (Fig. 2C). Additionally, cancer-related information, including 18,907 protein variations (crVARs) (Supplementary Table 3), 3,491 abnormally expressed proteins (crDEPs) (Supplementary Table 4) and 388 disease-related hyperphosphorylation sites (HPS) (Supplementary Table 5), was also integrated into PhoSigNet.
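
A network of this kind, made of directed edges that each carry an edge type, can be represented with a very small data structure. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and the five edges are examples chosen for readability, not records exported from PhoSigNet.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    source: str     # upstream protein
    target: str     # downstream protein
    edge_type: str  # e.g. "kinase-substrate"


class SignalingNetwork:
    """Directed, typed edge list with upstream/downstream lookups."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self.edges: List[Edge] = list(edges)
        self._up: Dict[str, Set[str]] = defaultdict(set)
        self._down: Dict[str, Set[str]] = defaultdict(set)
        for edge in self.edges:
            self._down[edge.source].add(edge.target)
            self._up[edge.target].add(edge.source)

    def edge_type_counts(self) -> Counter:
        """Number of edges per edge type."""
        return Counter(edge.edge_type for edge in self.edges)

    def upstream(self, protein: str) -> Set[str]:
        """Direct regulators of a protein."""
        return set(self._up.get(protein, ()))

    def downstream(self, protein: str) -> Set[str]:
        """Direct targets of a protein."""
        return set(self._down.get(protein, ()))


# Illustrative edges only; not records exported from PhoSigNet.
network = SignalingNetwork([
    Edge("EGF", "EGFR", "ligand-receptor"),
    Edge("SRC", "SHC1", "kinase-substrate"),
    Edge("CSK", "SHC1", "kinase-substrate"),
    Edge("CDK1", "LMNB1", "kinase-substrate"),
    Edge("PTPRC", "LCK", "phosphatase-substrate"),
])

print(network.edge_type_counts().most_common(1))  # [('kinase-substrate', 3)]
print(sorted(network.upstream("SHC1")))           # ['CSK', 'SRC']
```

Storing both an upstream and a downstream index costs a little memory but makes either direction a single dictionary lookup.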


Data Accessibility

An online version of PhoSigNet is available (http://lifecenter.sgst.cn/PhoSigNet/), where users can browse database information using three query approaches: single protein query, protein group query and background network query. Users may query the database by gene name, Swiss-Prot/UniProt ID/AC, Entrez ID, Ensembl ID or NCBI protein GI. The background network browse engine allows users to access protein interaction pairs for each edge type.

Protein Query: The Protein query result is returned as eight sections (Supplementary Fig. 1). “Protein Summary” provides basic protein information, including the UniProt ID (which links to public databases), protein description, chromosome location, and node attribute in our network. “Phosphorylation site” and “Dephosphorylation site” list the phosphorylation sites and dephosphorylation sites, along with the Pfam domain, the catalytic kinase and the data source for each site. “Mutation site,” “Differential expressed protein” and “Hyperphosphorylation site” indicate the variation site, expression change and hyperphosphorylation site, respectively, along with the corresponding cancer type name and data source. “Direct Interaction Pairs” provides all interaction pairs, the pair attributes and data sources. “Function Annotation” provides the associated KEGG pathway and Gene Ontology information.

Protein Group Query: Protein Group allows users to search signaling networks based on a list of proteins of interest or proteins identified in a phosphoproteomics
experiment. By submitting two data sets from phosphoproteomics experiments, users can also construct state-specific signal transduction networks and perform comparative analysis. The Protein Group query result is returned as two sections (Fig. 3). “Protein Summary” shows the protein UniProt ID, which links to more detailed protein information, such as phosphorylation status, dephosphorylation status, cancer-related information and node attribute in our network. “Upstream and Downstream” shows direct interaction pairs of the queried proteins in a table or graph format (Figs. 3B and 3C). Detailed information on each node in the interaction network can also be shown.

Background Network: This function is shown on the Browse page as “Search by Edge Type” (Supplementary Fig. 2). Through this page, users can browse interaction pairs by edge type and download our data to construct a new network. Because the relationships between kinases and substrates are too numerous to show in one graph, we list all kinases and the number of interaction pairs they mediate (Supplementary Fig. 3). Clicking a kinase of interest in the graph or table allows users to browse the interaction pairs between that kinase and all of its substrates.

Phosphorylation Peptide Sequence Database: Finally, we also provide a protein sequence database that includes phosphorylation site information to facilitate peptide modification detection in shotgun proteomics. The database is available in .fasta format, with each modified peptide included as an independent entry. The compilation of phosphosites from all of the different studies contained in the curated databases is likely to contain a larger number of false positives than the typically
reported 1-2%. We suggest that confidence metrics be checked in the data source (i.e., in vitro or in vivo) or the original publication (PubMed ID) that we provide with each site. This information allows users to filter identified phosphorylation sites. This data set of phosphorylation peptide sequences with phosphorylation site information can be downloaded from http://lifecenter.sgst.cn/PhoSigNet/download.html.
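
The suggested filtering step can be illustrated with a short sketch. Everything concrete here is an assumption: the record fields, the FASTA header layout written by `to_fasta`, and the placeholder proteins, peptides and PubMed IDs are invented for the example and do not reflect the actual layout of the downloadable file.

```python
from typing import Dict, Iterable, List, Optional, Sequence

SiteRecord = Dict[str, Optional[str]]


def filter_sites(
    records: Iterable[SiteRecord],
    allowed_evidence: Sequence[str] = ("in vivo",),
    require_pmid: bool = True,
) -> List[SiteRecord]:
    """Keep sites with an accepted evidence type and, optionally, a PubMed ID."""
    kept: List[SiteRecord] = []
    for record in records:
        if record["evidence"] not in allowed_evidence:
            continue
        if require_pmid and not record["pmid"]:
            continue
        kept.append(record)
    return kept


def to_fasta(records: Iterable[SiteRecord]) -> str:
    """Write one FASTA entry per modified peptide (hypothetical header layout)."""
    lines: List[str] = []
    for r in records:
        lines.append(f">{r['protein']}|{r['site']}|{r['evidence']}")
        lines.append(str(r["peptide"]))
    return "".join(line + "\n" for line in lines)


# Placeholder records; '#' marks the phosphorylated residue as in the text.
sites: List[SiteRecord] = [
    {"protein": "EXAMPLE1_HUMAN", "site": "S15", "peptide": "AAGS#PEER",
     "evidence": "in vivo", "pmid": "PMID-PLACEHOLDER-1"},
    {"protein": "EXAMPLE1_HUMAN", "site": "T42", "peptide": "LLT#PVNK",
     "evidence": "in vitro", "pmid": "PMID-PLACEHOLDER-2"},
    {"protein": "EXAMPLE2_HUMAN", "site": "Y7", "peptide": "DDPSY#VNK",
     "evidence": "in vivo", "pmid": None},
]

print(to_fasta(filter_sites(sites)), end="")
```

With the default settings only the first record survives; passing `allowed_evidence=("in vivo", "in vitro")` or `require_pmid=False` relaxes the filter.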

Dynamic Phosphoproteomics Data Analyzer: ExpCluster

One major characteristic of phosphorylation is its dynamics. Protein phosphorylation status changes across temporal and spatial intervals; therefore, many phosphoproteomics studies measure two or more biological time points. We integrated a comparative and dynamic phosphoproteomics data analysis tool into PhoSigNet; this tool is presented through the ‘ExpCluster’ interface (Fig. 4A). The statistical significance of the clustering predictions is evaluated using STEM methods, and the expression trends of the predicted clusters are plotted as line charts and displayed as continuous values (Fig. 4B).

We exemplify the use of this tool with a set of previously published in-house data generated by a SILAC-based quantitative phosphoproteomics experiment on the U937 cell death process triggered by PKCδ-CF expression [43]. Protein kinase C delta type (KPCD) is a serine/threonine kinase that possesses both pro-apoptotic properties during DNA damage-induced apoptosis and anti-apoptotic properties during cytokine receptor-initiated cell death. We analyzed the U937-PKCδ-CF cell response data using cells encoded with heavy arginine and
lysine isotopes and cultured in tetracycline-free medium for 5 time points: 0, 2, 3, 4 and 5 days [43]. The quantitative phosphoproteome dataset was analyzed with the Census software developed by John R. Yates’s lab [44], with the fold threshold against cells at day 0 set at |log2(H/L)| = 0.58 (H and L indicate the heavy and light isotopes, respectively; this corresponds to a change of about 1.5-fold) to identify significantly regulated phosphopeptides. Based on the time series data, regulated phosphopeptides in both the cytoplasm and nucleus had similar trends at the same time point. While phosphopeptides at the late stages were predominantly downregulated, the numbers of down- and upregulated phosphopeptides during the early stages were similar. These KPCD-induced time series data were submitted to the ExpCluster platform in PhoSigNet.

The STEM clustering method first defines a set of distinct and representative model temporal expression profiles that are independent of the data [42]. These model profiles correspond to possible profiles of a protein's change in expression over time. All model profiles start at 0; then, between two successive time points, a model profile can hold steady or increase or decrease by an integral number of units. Protein expression time series are transformed to start at 0, and each protein is assigned to the model profile that its time series most closely matches based on the correlation coefficient. Of the 50 model profiles, 8 profiles were identified as significant in the cytoplasm and 10 profiles were identified as significant in the nucleus; these profiles were defined as the ‘cytoCluster’ and ‘nuclCluster,’ respectively.
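
The two steps just described, transforming a series so that it starts at 0 and assigning it to the best-correlated model profile, can be sketched as follows. This is a simplified illustration and not the STEM implementation: the function names are hypothetical, the third model profile and the H/L ratios are made up, and the permutation-based significance testing that STEM performs is omitted.

```python
from math import log2, sqrt
from typing import Dict, List, Sequence


def log_normalize(values: Sequence[float]) -> List[float]:
    """(v0, ..., vn) -> (0, log2(v1/v0), ..., log2(vn/v0))."""
    return [log2(v / values[0]) for v in values]


def normalize(values: Sequence[float]) -> List[float]:
    """(v0, ..., vn) -> (0, v1 - v0, ..., vn - v0)."""
    return [v - values[0] for v in values]


def add_zero(values: Sequence[float]) -> List[float]:
    """(v0, ..., vn) -> (0, v0, v1, ..., vn)."""
    return [0.0] + list(values)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a flat series is given correlation 0 here."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def assign_profile(series: Sequence[float],
                   profiles: Dict[str, Sequence[float]]) -> str:
    """Name of the model profile with the highest correlation to the series."""
    return max(profiles, key=lambda name: pearson(series, profiles[name]))


MODEL_PROFILES: Dict[str, Sequence[float]] = {
    "profile 9": (0, -1, -2, -3, -4),  # trend reported in the text
    "profile 41": (0, 1, 2, 3, 4),     # trend reported in the text
    "up-then-down": (0, 1, 2, 1, 0),   # hypothetical extra profile
}

ratios = [1.0, 0.8, 0.55, 0.4, 0.3]  # made-up H/L ratios at days 0, 2, 3, 4, 5
series = log_normalize(ratios)
print(assign_profile(series, MODEL_PROFILES))  # profile 9
```

Because correlation ignores scale, a steadily declining series matches the ‘0, -1, -2, -3, -4’ profile regardless of how large the decline is; only the shape of the trajectory matters.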

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 52

Supplementary Figure 4A presents the clusters in the cytoplasm. The network of proteins in each model can be plotted to show more detailed interactions and facilitate biological function illustration. Taking ‘model profile 9’ as an example, this profile is one of the significant clusters in the cytoplasm, with an expression value trend of ‘0, -1, -2, -3, -4’ (Supplementary Fig. 4B). A total of 153 proteins can be mapped to the background network, such as SHC1, CDK1, TBB5, MK01, ARBK1, and STK4 (_HUMAN) (Supplementary Fig. 4C). Three proteins in ‘profile #9’ have experimental evidence of phosphorylation sites: K.FLEESVSMS#PEER.A (UCHL3), R.ADLNQGIGEPQS#PSR.R (EFHD2) and R.ELFDDPSY#VNVQNLDK.A (SHC1). However, only the protein with the SHC1 site can be mapped to our network. SHC1 is a signaling adapter that couples activated growth factor receptors to signaling pathways45.

Once phosphorylated, isoform p46Shc (R.ELFDDPSY#272VNVQNLDK.A) and isoform p52Shc (R.ELFDDPSY#317VNVQNLDK.A) of SHC1 couple activated receptor tyrosine kinases to Ras via the recruitment of the GRB2/SOS complex; moreover, these isoforms have been implicated in the cytoplasmic propagation of mitogenic signals45. Thus, isoform p46Shc and isoform p52Shc may function as initiators of the Ras signaling cascade in various non-neuronal systems45. Isoform p66Shc of SHC1 (R.ELFDDPSY#427VNVQNLDK.A) does not mediate Ras activation, but is involved in signal transduction pathways that regulate the cellular response to oxidative stress and life span45. Isoform p66Shc acts as a downstream target of the tumor suppressor p53 and is indispensable for the ability of stress-activated p53 to induce the elevation of intracellular oxidants, cytochrome c release and apoptosis45. The expression of isoform p66Shc has been correlated with life span45. Supplementary Fig. 4D displays the upstream regulatory ‘small network’ of protein SHC1. SRC, CSK, PKCA, FYN, LCK, and JAK1 are tyrosine-protein kinases that can phosphorylate SHC1. In ‘profile #9,’ the phosphorylation level of SHC1 gradually declines and is correlated with the expression level change of GNAI2, CSK and PTPRC. These phenomena may underlie the mechanisms of apoptosis in the U937 cell line.

Supplementary Figure 5A presents the clusters in the nucleus. Likewise, the network of proteins in each model can be plotted. Taking ‘model profile 41’ as an example, this profile is one of the significant clusters in the nucleus, with the expression value trend of ‘0, 1, 2, 3, 4’ (Supplementary Fig. 5B). A total of 50 proteins can be mapped to our background network, such as MYH9, HS90A, HS90B, NONO, DDX3X, ANXA6, and FLNA (_HUMAN) (Supplementary Fig. 5C). Only protein LMNB1 in ‘profile #41’ has a phosphorylation site; based on our experimental data, this site is R.LS#278SEMNTSTVNSAR.E. Additionally, LMNB1 can be mapped to our network. LMNB1 belongs to the “lamins,” components of the nuclear lamina that are thought to provide a framework for the nuclear envelope and may also interact with chromatin. Increased phosphorylation of the lamins occurs before envelope disintegration and most likely plays a role in regulating lamin associations46-51. Supplementary Fig. 5D displays the upstream regulatory ‘small network’ of protein LMNB1. KPCB, CDK1, and ZAGL2 are kinases that may phosphorylate LMNB1. In profile #41, the phosphorylation level of LMNB1 gradually increases along with the expression levels of IF6, DDX3X, MYH9, HS90A, HS90B, and HSPB1. These proteins with gradually increased expression levels may be regarded as biomarkers for apoptosis.

In summary, the cluster analysis run by PhoSigNet on an in-house proteomic phosphorylation dataset demonstrated that time series proteomic data can be analyzed systematically by combining high-throughput experiments and appropriate computational tools. Proteins or genes in one cluster may have similar biological roles. Moreover, to explore the protein/gene roles in phosphorylation-mediated cellular signal transduction, these proteins/genes can be mapped to PhoSigNet to construct their signaling interaction network, and detailed phosphorylation site functions could be analyzed with this background network.
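Mapping clustered proteins onto a background network to obtain an upstream ‘small network’ can be illustrated with a short sketch. The edge list below contains only the kinase-to-substrate relationships named in the text for SHC1 and LMNB1; the tuple layout and the function names are assumptions made for this illustration and do not reflect the actual PhoSigNet schema.

```python
from collections import defaultdict

# (source, target, edge type) triples taken from the relationships named in
# the text; the layout is a hypothetical stand-in for a background network table.
EDGES = [
    ("SRC", "SHC1", "kinase-substrate"),
    ("CSK", "SHC1", "kinase-substrate"),
    ("PKCA", "SHC1", "kinase-substrate"),
    ("FYN", "SHC1", "kinase-substrate"),
    ("LCK", "SHC1", "kinase-substrate"),
    ("JAK1", "SHC1", "kinase-substrate"),
    ("KPCB", "LMNB1", "kinase-substrate"),
    ("CDK1", "LMNB1", "kinase-substrate"),
    ("ZAGL2", "LMNB1", "kinase-substrate"),
]


def build_upstream_index(edges):
    """Map each target protein to the set of (source, edge type) pairs acting on it."""
    upstream = defaultdict(set)
    for source, target, edge_type in edges:
        upstream[target].add((source, edge_type))
    return upstream


def small_network(cluster_proteins, edges):
    """Keep only the edges whose target belongs to the cluster of interest."""
    members = set(cluster_proteins)
    return [edge for edge in edges if edge[1] in members]


index = build_upstream_index(EDGES)
shc1_regulators = sorted(source for source, _ in index["SHC1"])
print(shc1_regulators)
print(small_network(["LMNB1"], EDGES))
```

A dictionary keyed by target keeps the upstream lookup simple; a graph library would be a natural choice once shortest paths or layout are needed.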

Cancer-Triggering Receptor Analyzer: CanReceptor

Because we collected cancer-related information at the phosphorylation site level, sequence mutation level and protein expression level in addition to the signal transduction data for PhoSigNet, we developed another analysis platform (CanReceptor) to perform cancer-triggering receptor inference based on downstream enrichment analysis. Two types of enrichment analyses were designed using the hypergeometric test: receptor downstream interaction partner enrichment (Supplementary Table 6) and receptor downstream pathway element enrichment (Supplementary Table 7). When either downstream interaction partners or pathway elements are enriched with cancer-related annotation, the upstream mediating receptor might represent a cancer-triggering receptor.

In the receptor-mediated pathway element enrichment analysis, thirteen cancer types were found to have triggering receptors after the Benjamini and Hochberg corrected significance calculation (Supplementary Fig. 6). These cancers consisted of ovarian cancer, breast cancer, head and neck cancer, lung cancer, hepatocellular carcinoma, central nervous system neoplasms, gastric cancer, intestinal cancer, colorectal cancer, leukemia, lymphoma, thyroid carcinoma and neoplasms. In some cancer types, multiple receptors were found to activate downstream pathways. Using the p-value distribution of 414 receptors in lung cancer as an example (Fig. 5A), we highlight the ERBB2_HUMAN and ALK_HUMAN receptors as the most prominent cancer-triggering receptors in lung cancer. Activating mutations of receptor tyrosine-protein kinase ERBB2 have been identified in a majority of lung tumors52,53, and the ALK tyrosine kinase receptor is used as a signature for non-small cell lung cancer54,55. Fig. 5B shows the ALK_HUMAN-mediated pathway in lung cancer.

Some receptors were found to trigger downstream pathways in multiple cancers. For example, frizzled-2 (FZD2) is a candidate cancer-causing receptor in five different cancer types: lymphoma, intestinal cancer, colorectal cancer, lung cancer and central nervous system neoplasms. The somatostatin receptor (SS-R) is a candidate in three different cancer types: colorectal cancer, lung cancer and lymphoma. FZD2 is a membrane protein and a receptor for the Wnt proteins. Many cancers have been reported to be associated with FZD2, such as Wilms’ tumor, melanoma and lung cancer36,56-58. SS-R was reported to be related to medullary thyroid carcinomas (MTC), small cell lung carcinomas, brain tumors and breast tumors59.

In direct interaction partner layer enrichment analysis, significant receptors were found in almost all cancer types (Fig. 6). A total of 13 receptors with the most significant p-values (p-value ≤0.001) are depicted (Fig. 6). Among these receptors, 10 receptors (LRP1, LSHR, FZD1, GPR63, GPR12, PE2R3, PAR4, EDNRB, SDC1, and TRAF2 (_HUMAN)) were also discovered in downstream pathway element enrichment analysis. Prolow-density lipoprotein receptor-related protein 1 (LRP1_HUMAN) is an endocytic receptor involved in endocytosis and phagocytosis of apoptotic cells. Recent research revealed that the LRP-1 precursor on secreted proteins of human hepatocellular carcinoma cells possessed three N-glycosylation sites60. LRP-1 was also identified as a differentially expressed protein in pancreatic tumors61. Prostaglandin E2 receptor EP3 subtype (PE2R3_HUMAN) participates in signaling that regulates tumor-associated angiogenesis and tumor growth62. Downregulated expression of syndecan-1 (SDC1_HUMAN) was associated with a high metastatic potential in human hepatocellular carcinoma63. The expression level of syndecan-1 could also serve as a signature in breast cancer, prostate cancer and lung cancer64-66. We marked proteins that had been reported as signatures in each corresponding cancer type in the previous literature with red font in Figure 6.

In summary, to facilitate studying the cancer impact of a phosphorylation-mediated network, we built the cancer-triggering analyzer platform CanReceptor. In this platform, three query methods are provided: protein query, protein group query and cancer-triggering receptor query (Supplementary Fig. 7). Proteins can be mapped to the phosphorylation-mediated signaling network of interest for a given cancer type using the protein query and protein group query methods. In the cancer-triggering receptor query, the number of potential cancer-causing receptors for each cancer type can be explored. Users can click the number to show the detailed receptors (p-value ≤0.05). Furthermore, the shortest paths or direct interaction pairs mediated by the receptor can be displayed. For example, there are many proteins with variations in direct interaction pairs with EGFR_HUMAN in lung cancer (Supplementary Fig. 8A). We can see that EGFR_HUMAN can function as both the kinase and substrate. Among the downstream proteins, EZRI_HUMAN is a downregulated protein in lung cancer (Supplementary Fig. 8B).
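The enrichment test and multiple-testing correction underlying CanReceptor can be sketched in a few lines. This is a minimal illustration of a one-sided hypergeometric test followed by the Benjamini-Hochberg adjustment, not the platform's actual code: the function names are introduced here, and the counts used in the example are made up rather than taken from PhoSigNet.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn without replacement from
    `population` proteins, of which `annotated` carry a cancer annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up example: 10 background proteins, 4 with a cancer annotation; a
# receptor has 3 downstream proteins, 2 of which are annotated.
p_single = hypergeom_pvalue(population=10, annotated=4, sample=3, hits=2)
print(p_single)

# Made-up raw p-values for four hypothetical receptors in one cancer type.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(adjusted)
```

In this reading, the population is the set of proteins in the background network, the sample is the set of downstream partners or pathway elements of one receptor, and the adjustment is applied across all receptors tested for one cancer type.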

Discussion

The introduction of mass spectrometry as a primary technology in the field of proteomics has accelerated research on phosphorylation modification and signal transduction. However, the majority of signaling network studies have focused on the construction of kinase-substrate interaction networks. In practice, kinase-substrate interactions represent only one step in the phosphorylation-mediated pathways that are the main skeleton controlling cellular biological processes. Additionally, pathway-pathway crosstalk makes signal transduction a multi-step, multi-dimensional network action. In view of this, a comprehensive background signal transduction network involving all phosphorylation-mediated pathways is desirable and would improve both the overview of signal transduction and local investigation in the context of cell biology. Therefore, in this work we developed PhoSigNet, the human phosphorylation-mediated signaling network. PhoSigNet integrates phosphorylation modification and pathway information from different sources plus two analysis platforms.

The PhoSigNet database is composed of 11,976 experimentally determined signaling edges with six main edge types on 3,241 proteins; these include 60,855 phosphorylation sites on 2,902 proteins. The six signal-transferring edge types were integrated from nine databases and comprise ligand to receptor, kinase to substrate, phosphatase to substrate, GPCR to G protein, G protein to effector, and TF to target gene interaction relationships. Phosphorylation sites and phosphoprotein information were also integrated. PhoSigNet is the most comprehensive human phosphorylation-mediated signaling regulatory network from a holistic perspective; however, without data mining tools a vast data source may be data-rich but information-poor. Therefore, two analysis platforms (ExpCluster for dynamic phosphoproteomics data analysis and CanReceptor for static inference of cancer-triggering receptors) have been developed along with the database in PhoSigNet.

Instantaneity and reversibility are the two main characteristics of phosphorylation modification. Although at some time points as many as 30% of proteins may be phosphorylated, the composition of this phosphorylated set may shift over time. Olsen et al. detected temporal dynamic changes in the EGF signaling network in HeLa cells, and measured phosphorylated protein groups at five different time points using quantitative phosphoproteomics strategies46. This represented the first application of an overall signaling network study for mammalian cells using a quantitative phosphoproteomics approach, and demonstrated that proteomics research has tremendous value in the field of signaling networks46. The construction of a dynamic phosphorylation network capable of demonstrating the cascades of phosphorylation events would be ideal. However, this is a non-trivial job. In our previous works, we developed a linear regression model to analyze a microRNA-initiated primary and secondary transcription regulation network67. We also developed a time delay linear regression model to relate regulator expression levels at a given time point to the expression levels of their target genes at a later time point68. We attempted to incorporate these regression models into our analysis of dynamic phosphoproteomics data. Unfortunately, both algorithms were established based on miRNA and/or mRNA level time series expression profiles. Although they performed well in constructing transcription regulatory networks, they failed in the construction of protein-level signal transduction networks. This is most likely because transcription regulation and signal transduction act with different mechanisms; the former is based on a sequence-lock cascade effect, while the latter relies more on on/off switch phosphorylation mediation. We attempted to use Bayesian modeling and Differential Equation modeling, but found that both could only be appropriately applied to single signaling pathway reconstructions due to the complex parameters and intensive computation required.

Finally, we developed the ExpCluster platform by incorporating the simple clustering algorithm STEM41-43 to analyze dynamic phosphoproteomics data. ExpCluster performs the analysis in three steps: first, transform protein expression levels or phosphorylation status at each time point to numeric parameters compared to the original starting status; second, perform clustering to search for similar patterns; and third, analyze each pattern’s biological function. Results in ExpCluster provide a quick view of all cluster patterns in combination with the corresponding p-values. ExpCluster allows the visualization of the protein/gene relationships in the cluster results. Clusters of proteins with similar expression levels in time series data may contain functionally related members and be named to reflect their specific components. The overall process of using ExpCluster was demonstrated using a case study. By analyzing an in-house phosphorylation dataset identified by MS/MS, we showed that these KPCD-induced time series data have six significant clusters (p-value ≤0.05) in the cytoplasm and four significant clusters (p-value ≤0.05) in the nucleus. Moreover, the cluster datasets could be mapped to PhoSigNet to construct a small network where detailed phosphorylation site functions could be analyzed when the site information was available.

The second tool, CanReceptor, takes advantage of the enriched cancer-related protein annotation information in PhoSigNet, collected from our previously established resources dbDEPC and CanProVar32,33. dbDEPC harbors cancer-related protein abnormal expression information, while CanProVar provides cancer-related protein sequence variation information. Pathway enrichment and interaction partner enrichment analysis methods were designed for CanReceptor based on the hypothesis that an initiating receptor might be “cancer-triggering” when either downstream pathway elements or immediate interaction partners mediated by an upstream receptor are enriched with cancer-related abnormal proteins. Our results listed the significantly inferred cancer-triggering receptors from these two enrichment analyses after statistical adjustment. Previous studies and biological function analyses provided the rationale for many of the results. However, it should be noted that these results are inferred from the static information currently existing in PhoSigNet, including 414 receptors and all related information from dbDEPC 2.0 and CanProVar. Therefore, results might be subject to change when more data are added. Future development may support user data input and enrichment analysis.

Although we only exemplified enrichment for receptors, other types of protein nodes such as protein kinases could also be attractive drug targets, as has been emphasized in recent studies. Protein kinases have become favorable targets in the quest for ‘molecularly targeted’ cancer chemotherapeutics69. Therefore, algorithms to predict drug targets for any type of proteins in PhoSigNet, including ligands, receptors, and kinases, may be designed in the future. With the growth of knowledge on protein modification, protein localization and protein interaction, PhoSigNet may undergo multiple-dimension expansions with more layers of annotation and more sophisticated data mining platforms.
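For readers unfamiliar with the time delay linear regression idea mentioned in this Discussion, the following sketch shows its simplest form: the level of a target at a later time point is regressed on the level of a regulator at an earlier one. It is a generic one-regulator least-squares fit written for illustration; the function name, the single-regulator restriction and the example values are assumptions, and it is not the published model of reference 68.

```python
def time_delay_regression(regulator, target, lag=1):
    """Least-squares fit of target[t + lag] = intercept + slope * regulator[t]."""
    if lag < 1 or len(regulator) != len(target) or len(regulator) - lag < 2:
        raise ValueError("need equal-length series with at least lag + 2 points")
    x = regulator[:-lag]  # regulator levels at the earlier time points
    y = target[lag:]      # target levels `lag` steps later
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("regulator series is constant over the fitted window")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up series: the target follows 1 + 2 * regulator one time step later.
regulator_levels = [1, 2, 3, 4, 5]
target_levels = [0, 3, 5, 7, 9]
intercept, slope = time_delay_regression(regulator_levels, target_levels, lag=1)
print(intercept, slope)
```

A fit of this kind assumes a graded, delayed dependence between regulator and target, which is consistent with the authors' remark that it suits transcriptional regulation better than switch-like phosphorylation events.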


Author contribution

Menghuan Zhang performed the data collection, database and web page construction, and bioinformatics analysis. Menghuan Zhang, Hong Li and Lu Xie conceived the study. Menghuan Zhang and Lu Xie wrote the manuscript. Ying He, Han Sun, Li Xia, Lishun Wang, Bo Sun, Jing Li, and Yixue Li helped with data collection or bioinformatics. Liangxiao Ma and Guoqing Zhang are IT support persons in charge of biological databases in our institute. All authors read and approved the final manuscript.

Competing interests

The authors declare no competing financial interests.

Declarations

The publication costs for this article were funded by the National Hi-Tech Program (2012AA020201).

Acknowledgements

This work was funded by the National Hi-Tech Program (2012AA020201), Key Infectious Disease Project (2012ZX10002012-014), National Key Basic Research Program (2011CB910204, 2010CB912702), National Natural Science Foundation of China (31070752, 31000582), and Shanghai Pudong Science and Technology Committee project PKJ2013-D08.


Figure Legends

Figure 1. Database construction. The framework of PhoSigNet.

Figure 2. Data content of PhoSigNet. A. Node numbers for each attribute in the network. B. Edge numbers for each attribute in the network. C. Numbers of phosphorylation sites and dephosphorylation sites.

Figure 3. Protein group query result. B. Interaction pairs in graph form. C. Interaction pairs in table form.

Figure 4. Workflow of dynamic data analysis by ExpCluster. A. ExpCluster tool query interface. B. Using the STEM method, the clustering results show different expression models and statistical significance. C. Protein/gene expression values in ‘Model Profile 37.’ D. Interaction network. Mapping proteins/genes in ‘Model Profile 37’ to the background network in PhoSigNet.

Figure 5. Workflow of the cancer receptor inference by pathway enrichment analysis in CanReceptor exemplified by lung cancer analysis. A. P-values of hypergeometric tests in pathway enrichment analysis for 414 receptors in lung cancer were plotted. Points above the dashed line represent receptors with p-values ≤0.05. Red dots represent the two most significant receptors: ALK_HUMAN and ERBB2_HUMAN. B. The ALK_HUMAN-mediated pathway in lung cancer. There are many cancer-related proteins in this pathway. Pink node: ALK_HUMAN; red nodes: kinase; orange nodes: kinase and substrate; blue nodes: substrate. Triangle nodes: proteins with variation number; diamond nodes: upregulated proteins; hexagon nodes: downregulated proteins; octagon nodes: proteins with hyperphosphorylation sites; square nodes: proteins with at least two cancer-related dysfunctions (e.g., one protein with sequence variation and an upregulated protein).

Figure 6. Workflow of the cancer receptor inference by interaction partner enrichment analysis in CanReceptor. Using hypergeometric analysis, the p-values of 414 receptors in 45 cancer types are plotted. Proteins marked with red font have been reported as signatures for each corresponding cancer type. Receptors with significant p-values (p-value ≤0.001) are shown with their gene names.

Table Legends

Table 1. Network data source of PhoSigNet.

Sources | Web link | Information | Count
HPMR | http://receptome.stanford.edu/HPMR/ | Ligand-Receptor Interaction | 610
gpDB | http://bioinformatics.biol.uoa.gr/gpDB | GPCR-GProtein Interaction | 616
gpDB | | GProtein-Effector Interaction | 1360
GPCRDB | http://www.gpcr.org/7tm/ | G protein-coupled receptors | 806
HPRD | http://www.hprd.org/ | Kinase-Substrate Interaction | 1918
HPRD | | Phosphatase-Substrate Interaction | 135
RegPhos | http://regphos.mbc.nctu.edu.tw/ | Kinase-Substrate Interaction | 2087
PhosphoNetworks | http://phosphonetworks.org/ | Kinase-Substrate Interaction | 3814
KEGG | http://www.genome.jp/kegg/pathway.html | Enzyme Catalysis | 534
KEGG | | Kinase-Substrate Interaction | 334
KEGG | | Phosphatase-Substrate Interaction | 44
KEGG | | Transcription Factor-Target Gene Interaction | 95
NetPath | http://www.netpath.org/ | Enzyme Catalysis | 83
NetPath | | Kinase-Substrate Interaction | 889
NetPath | | Phosphatase-Substrate Interaction | 44
NetPath | | Transduction relations | 151
TRANSFAC | http://www.gene-regulation.com/pub/databases.html | Transcription Factor-Target Gene Interaction | 1036
SwissProt | http://www.uniprot.org/downloads | Kinase | 457
SwissProt | | Phosphatase | 129
SwissProt | | Transcription Factor | 413

Supporting Information Available: This material is available free of charge via the Internet at http://pubs.acs.org.

Supplementary Figure 1. Protein query result. The Protein query result is returned as eight sections. “Protein Summary” provides basic protein information. “Phosphorylation site” and “Dephosphorylation site” list the phosphorylation sites and dephosphorylation sites, along with Pfam domain, the catalytic kinase and data source for each site. “Mutation site,” “Differential expressed protein” and “Hyperphosphorylation site” indicate the variation site, expression change and hyperphosphorylation site, respectively, along with corresponding cancer type name and data source. “Direct Interaction Pairs” provides all interaction pairs, the pair attributes and data sources. “Function Annotation” provides the correlated KEGG pathway and Gene Ontology information.

Supplementary Figure 2. Background network query. This function is shown on the Browse page as “Search by Edge Type”. Through this page, users can browse interaction pairs by each edge type and download our data to construct a new network.

Supplementary Figure 3. Kinase summary. As relationships between kinases and substrates are too large to show in one graph, we list all kinases and the number of interaction pairs they mediate. Clicking the graph or table to identify one kinase of interest allows the users to browse the interaction pairs of that kinase for all of its substrates.

Supplementary Figure 4. Case study: dynamic pattern analysis of time series phosphoproteomics data of the effect of PKCD treatment on the U937 human acute myeloid leukemia (AML) cell line. (A) Clustering results for protein time series expression data in the cytoplasm. We only show significant clusters (p-value ≤ 0.05). The profile model depicts the expression patterns of proteins at five time points. (B) Protein expression in profile #9. (C) Network mapping results of proteins in profile #9. A total of 153 proteins were found in the PhoSigNet background network; direct interaction pairs of these proteins are shown. (D) The relationships between SHC1 and some of its upstream proteins are highlighted. Pink nodes: clustered proteins in profile #9; red nodes: kinase; orange nodes: kinase and substrate; blue nodes: substrate.

Supplementary Figure 5. Clustering results for time series protein expression data in the nucleus. (A) We only show significant clusters (p-value ≤0.05). The profile model depicts the expression pattern of proteins at five time points. (B) Protein expression in profile #41. (C) Network mapping results of proteins in profile #41. A total of 50 proteins were found in the PhoSigNet background network; direct interaction pairs of these proteins are shown. (D) The relationships between LMNB1 and some of its upstream proteins are highlighted. Pink nodes: clustered proteins in profile #41; red nodes: kinase; orange nodes: kinase and substrate; blue nodes: substrate.

Supplementary Figure 6. P-value distribution of 414 receptors in 45 cancer types in pathway enrichment analysis by CanReceptor. Points above the dashed line represent receptors with p-values ≤0.05. Receptors in 13 cancer types had higher p-values (the 13 cancer types are highlighted with red). The histogram presents the cancer-related protein annotation numbers.

Supplementary Figure 7. “Cancer-triggering” receptor platform. (A) Query methods available in the ‘cancer’ application platform. (B) Protein query. (C) Protein group query. (D) Triggering cancer receptors query.

Supplementary Figure 8. Direct interaction pairs of EGFR_HUMAN in lung cancer. (A) There are many proteins with variations in direct interaction pairs with EGFR_HUMAN in lung cancer. (B) We can see that EGFR_HUMAN can function as both the kinase and substrate. Among the downstream proteins, EZRI_HUMAN is a downregulated protein in lung cancer.

Supplementary Table 1. Experimentally determined signaling edges in the PhoSigNet database with 6 main edge types on 3,241 proteins.

Supplementary Table 2. Phosphorylation sites in the PhoSigNet database. There are 216,871 phosphorylation sites.

Supplementary Table 3. Cancer-related variations in the PhoSigNet database. There are 18,907 protein variations (crVARs).

Supplementary Table 4. Cancer-related differentially expressed proteins in the PhoSigNet database. There are 3,491 abnormally expressed proteins (crDEPs).

Supplementary Table 5. Disease-related hyperphosphorylation sites in the PhoSigNet database. There are 388 disease-related hyperphosphorylation sites (HPS).

Supplementary Table 6. Results of receptor downstream interaction partner enrichment analysis.

Supplementary Table 7. Results of receptor downstream pathway element enrichment analysis.

References

(1) Cohen, P., The regulation of protein function by multisite phosphorylation--a 25 year update. Trends Biochem Sci 2000, 25, (12), 596-601.

(2) Arends, M. J.; Wyllie, A. H., Apoptosis: mechanisms and roles in pathology. Int Rev Exp Pathol 1991, 32, 223-54.

(3) Oppermann, F. S.; Grundner-Culemann, K.; Kumar, C.; Gruss, O. J.; Jallepalli, P. V.; Daub, H., Combination of chemical genetics and phosphoproteomics for kinase signaling analysis enables confident identification of cellular downstream targets. Mol Cell Proteomics 2012, 11, (4), O111 012351.

(4) Krueger, K. E.; Srivastava, S., Posttranslational protein modifications: current implications for cancer detection, prevention, and therapeutics. Mol Cell Proteomics 2006, 5, (10), 1799-810.

(5) Cohen, P., The role of protein phosphorylation in human health and disease. The Sir Hans Krebs Medal Lecture. Eur J Biochem 2001, 268, (19), 5001-10.

(6) Ptacek, J.; Devgan, G.; Michaud, G.; Zhu, H.; Zhu, X.; Fasolo, J.; Guo, H.; Jona, G.; Breitkreutz, A.; Sopko, R.; McCartney, R. R.; Schmidt, M. C.; Rachidi, N.; Lee, S. J.; Mah, A. S.; Meng, L.; Stark, M. J.; Stern, D. F.; De Virgilio, C.; Tyers, M.; Andrews, B.; Gerstein, M.; Schweitzer, B.; Predki, P. F.; Snyder, M., Global analysis of protein phosphorylation in yeast. Nature 2005, 438, (7068), 679-84.

(7) Aebersold, R.; Mann, M., Mass spectrometry-based proteomics. Nature 2003, 422, (6928), 198-207.

(8) Mumby, M.; Brekken, D., Phosphoproteomics: new insights into cellular signaling. Genome Biol 2005, 6, (9), 230.

(9) Stasyk, T.; Huber, L. A., Mapping in vivo signal transduction defects by phosphoproteomics. Trends Mol Med 2012, 18, (1), 43-51.

(10) Nilsson, C. L., Advances in quantitative phosphoproteomics. Anal Chem 2012, 84, (2), 735-46.

(11) Ren, J.; Gao, X.; Liu, Z.; Cao, J.; Ma, Q.; Xue, Y., Computational analysis of phosphoproteomics: progresses and perspectives. Curr Protein Pept Sci 2011, 12, (7), 591-601.

(12) Metodiev, M.; Alldridge, L., Phosphoproteomics: A possible route to novel biomarkers of breast cancer. Proteomics Clin Appl 2008, 2, (2), 181-94.

(13) Hoffmann, I.; Clarke, P. R.; Marcote, M. J.; Karsenti, E.; Draetta, G., Phosphorylation and activation of human cdc25-C by cdc2--cyclin B and its involvement in the self-amplification of MPF at mitosis. EMBO J 1993, 12, (1), 53-63.

(14) Salazar, C.; Hofer, T., Multisite protein phosphorylation--from molecular mechanisms to kinetic models. FEBS J 2009, 276, (12), 3177-98.

(15) Breitkreutz, A.; Choi, H.; Sharom, J. R.; Boucher, L.; Neduva, V.; Larsen, B.; Lin, Z. Y.; Breitkreutz, B. J.; Stark, C.; Liu, G.; Ahn, J.; Dewar-Darch, D.; Reguly, T.; Tang, X.; Almeida, R.; Qin, Z. S.; Pawson, T.; Gingras, A. C.; Nesvizhskii, A. I.; Tyers, M., A global protein kinase and phosphatase interaction network in yeast. Science 2010, 328, (5981), 1043-6.

(16) Li, H.; Xing, X.; Ding, G.; Li, Q.; Wang, C.; Xie, L.; Zeng, R.; Li, Y., SysPTM: a systematic resource for proteomic research on post-translational modifications. Mol Cell Proteomics 2009, 8, (8), 1839-49.

(17) Li, J.; Jia, J.; Li, H.; Yu, J.; Sun, H.; He, Y.; Lv, D.; Yang, X.; Glocker, M. O.; Ma, L.; Yang, J.; Li, L.; Li, W.; Zhang, G.; Liu, Q.; Li, Y.; Xie, L., SysPTM 2.0: an updated systematic resource for post-translational modification. Database (Oxford) 2014, 2014, bau025.

(18) Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.; Balakrishnan, L.; Marimuthu, A.; Banerjee, S.; Somanathan, D. S.; Sebastian, A.; Rani, S.; Ray, S.; Harrys Kishore, C. J.; Kanth, S.; Ahmed, M.; Kashyap, M. K.; Mohmood, R.; Ramachandra, Y. L.; Krishna, V.; Rahiman, B. A.; Mohan, S.; Ranganathan, P.; Ramabadran, S.; Chaerkady, R.; Pandey, A., Human Protein Reference Database--2009 update. Nucleic Acids Res 2009, 37, (Database issue), D767-72.

(19) Hornbeck, P. V.; Kornhauser, J. M.; Tkachev, S.; Zhang, B.; Skrzypek, E.; Murray, B.; Latham, V.; Sullivan, M., PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 2012, 40, (Database issue), D261-70.

(20) Dinkel, H.; Chica, C.; Via, A.; Gould, C. M.; Jensen, L. J.; Gibson, T. J.;

Diella, F., Phospho.ELM: a database of phosphorylation sites--update 2011. Nucleic Acids Res 2011, 39, (Database issue), D261-7. (21)

Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.;

Gasteiger, E.; Martin, M. J.; Michoud, K.; O'Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31, (1), 365-70. (22)

Ben-Shlomo, I.; Yu Hsu, S.; Rauch, R.; Kowalski, H. W.; Hsueh, A. J.,

Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction. Sci STKE 2003, 2003, (187), RE9. (23)

Gilman, A. G., G proteins: transducers of receptor-generated signals. Annu

Rev Biochem 1987, 56, 615-49. (24)

Theodoropoulou, M. C.; Bagos, P. G.; Spyropoulos, I. C.; Hamodrakas, S. J.,

gpDB: a database of GPCRs, G-proteins, effectors and their interactions. Bioinformatics 2008, 24, (12), 1471-2. (25)

Vroling, B.; Sanders, M.; Baakman, C.; Borrmann, A.; Verhoeven, S.; Klomp, 36

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

J.; Oliveira, L.; de Vlieg, J.; Vriend, G., GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res 2011, 39, (Database issue), D309-19. (26)

Lee, T. Y.; Bo-Kai Hsu, J.; Chang, W. C.; Huang, H. D., RegPhos: a system

to explore the protein kinase-substrate phosphorylation network in humans. Nucleic Acids Res 2011, 39, (Database issue), D777-87. (27)

Newman, R. H.; Hu, J.; Rho, H. S.; Xie, Z.; Woodard, C.; Neiswinger, J.;

Cooper, C.; Shirley, M.; Clark, H. M.; Hu, S.; Hwang, W.; Jeong, J. S.; Wu, G.; Lin, J.; Gao, X.; Ni, Q.; Goel, R.; Xia, S.; Ji, H.; Dalby, K. N.; Birnbaum, M. J.; Cole, P. A.; Knapp, S.; Ryazanov, A. G.; Zack, D. J.; Blackshaw, S.; Pawson, T.; Gingras, A. C.; Desiderio, S.; Pandey, A.; Turk, B. E.; Zhang, J.; Zhu, H.; Qian, J., Construction of human activity-based phosphorylation networks. Mol Syst Biol 2013, 9, 655. (28)

Kanehisa, M.; Goto, S., KEGG: kyoto encyclopedia of genes and genomes.

Nucleic Acids Res 2000, 28, (1), 27-30. (29)

Kandasamy, K.; Mohan, S. S.; Raju, R.; Keerthikumar, S.; Kumar, G. S.;

Venugopal, A. K.; Telikicherla, D.; Navarro, J. D.; Mathivanan, S.; Pecquet, C.; Gollapudi, S. K.; Tattikota, S. G.; Mohan, S.; Padhukasahasram, H.; Subbannayya, Y.; Goel, R.; Jacob, H. K.; Zhong, J.; Sekhar, R.; Nanjappa, V.; Balakrishnan, L.; Subbaiah, R.; Ramachandra, Y. L.; Rahiman, B. A.; Prasad, T. S.; Lin, J. X.; Houtman, J. C.; Desiderio, S.; Renauld, J. C.; Constantinescu, S. N.; Ohara, O.; Hirano, T.; Kubo, M.; Singh, S.; Khatri, P.; Draghici, S.; Bader, G. D.; Sander, C.; Leonard, W. J.; Pandey, A., NetPath: a public resource of curated signal transduction pathways. Genome Biol 2010, 11, (1), R3. 37

ACS Paragon Plus Environment

Page 38 of 52

Page 39 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

(30) Matys, V.; Fricke, E.; Geffers, R.; Gossling, E.; Haubrock, M.; Hehl, R.; Hornischer, K.; Karas, D.; Kel, A. E.; Kel-Margoulis, O. V.; Kloos, D. U.; Land, S.; Lewicki-Potapov, B.; Michael, H.; Munch, R.; Reuter, I.; Rotert, S.; Saxel, H.; Scheer, M.; Thiele, S.; Wingender, E., TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31, (1), 374-8.

(31) Flicek, P.; Ahmed, I.; Amode, M. R.; Barrell, D.; Beal, K.; Brent, S.; Carvalho-Silva, D.; Clapham, P.; Coates, G.; Fairley, S.; Fitzgerald, S.; Gil, L.; Garcia-Giron, C.; Gordon, L.; Hourlier, T.; Hunt, S.; Juettemann, T.; Kahari, A. K.; Keenan, S.; Komorowska, M.; Kulesha, E.; Longden, I.; Maurel, T.; McLaren, W. M.; Muffato, M.; Nag, R.; Overduin, B.; Pignatelli, M.; Pritchard, B.; Pritchard, E.; Riat, H. S.; Ritchie, G. R.; Ruffier, M.; Schuster, M.; Sheppard, D.; Sobral, D.; Taylor, K.; Thormann, A.; Trevanion, S.; White, S.; Wilder, S. P.; Aken, B. L.; Birney, E.; Cunningham, F.; Dunham, I.; Harrow, J.; Herrero, J.; Hubbard, T. J.; Johnson, N.; Kinsella, R.; Parker, A.; Spudich, G.; Yates, A.; Zadissa, A.; Searle, S. M., Ensembl 2013. Nucleic Acids Res 2013, 41, (Database issue), D48-55.

(32) Li, J.; Duncan, D. T.; Zhang, B., CanProVar: a human cancer proteome variation database. Hum Mutat 2010, 31, (3), 219-28.

(33) He, Y.; Zhang, M.; Ju, Y.; Yu, Z.; Lv, D.; Sun, H.; Yuan, W.; He, F.; Zhang, J.; Li, H.; Li, J.; Wang-Sattler, R.; Li, Y.; Zhang, G.; Xie, L., dbDEPC 2.0: updated database of differentially expressed proteins in human cancers. Nucleic Acids Res 2012, 40, (Database issue), D964-71.

(34) van Rhijn, B. W.; van Tilborg, A. A.; Lurkin, I.; Bonaventure, J.; de Vries, A.; Thiery, J. P.; van der Kwast, T. H.; Zwarthoff, E. C.; Radvanyi, F., Novel fibroblast growth factor receptor 3 (FGFR3) mutations in bladder cancer previously identified in non-lethal skeletal disorders. Eur J Hum Genet 2002, 10, (12), 819-24.

(35) Dong, J. T., Prevalent mutations in prostate cancer. J Cell Biochem 2006, 97, (3), 433-47.

(36) Pode-Shakked, N.; Metsuyanim, S.; Rom-Gross, E.; Mor, Y.; Fridman, E.; Goldstein, I.; Amariglio, N.; Rechavi, G.; Keshet, G.; Dekel, B., Developmental tumourigenesis: NCAM as a putative marker for the malignant renal stem/progenitor cell population. J Cell Mol Med 2009, 13, (8B), 1792-808.

(37) Liu, P.; Gan, W.; Inuzuka, H.; Lazorchak, A. S.; Gao, D.; Arojo, O.; Liu, D.; Wan, L.; Zhai, B.; Yu, Y.; Yuan, M.; Kim, B. M.; Shaik, S.; Menon, S.; Gygi, S. P.; Lee, T. H.; Asara, J. M.; Manning, B. D.; Blenis, J.; Su, B.; Wei, W., Sin1 phosphorylation impairs mTORC2 complex integrity and inhibits downstream Akt signalling to suppress tumorigenesis. Nat Cell Biol 2013, 15, (11), 1340-50.

(38) Sayan, M.; Shukla, A.; MacPherson, M. B.; Macura, S. L.; Hillegass, J. M.; Perkins, T. N.; Thompson, J. K.; Beuschel, S. L.; Miller, J. M.; Mossman, B. T., Extracellular signal-regulated kinase 5 and cyclic AMP response element binding protein are novel pathways inhibited by vandetanib (ZD6474) and doxorubicin in mesotheliomas. Am J Respir Cell Mol Biol 2014, 51, (5), 595-603.

(39) Koganti, S.; Hui-Yuen, J.; McAllister, S.; Gardner, B.; Grasser, F.; Palendira, U.; Tangye, S. G.; Freeman, A. F.; Bhaduri-McIntosh, S., STAT3 interrupts ATR-Chk1 signaling to allow oncovirus-mediated cell proliferation. Proc Natl Acad Sci U S A 2014, 111, (13), 4946-51.

(40) Finn, R. D.; Bateman, A.; Clements, J.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Heger, A.; Hetherington, K.; Holm, L.; Mistry, J.; Sonnhammer, E. L.; Tate, J.; Punta, M., Pfam: the protein families database. Nucleic Acids Res 2014, 42, (Database issue), D222-30.

(41) Ernst, J.; Bar-Joseph, Z., STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 2006, 7, 191.

(42) Ernst, J.; Nau, G. J.; Bar-Joseph, Z., Clustering short time series gene expression data. Bioinformatics 2005, 21 Suppl 1, i159-68.

(43) Xia, L.; Wang, T. D.; Shen, S. M.; Zhao, M.; Sun, H.; He, Y.; Xie, L.; Wu, Z. X.; Han, S. F.; Wang, L. S.; Chen, G. Q., Phosphoproteomics study on the activated PKCdelta-induced cell death. J Proteome Res 2013, 12, (10), 4280-301.

(44) Park, S. K.; Venable, J. D.; Xu, T.; Yates, J. R., 3rd, A quantitative analysis software tool for mass spectrometry-based proteomics. Nat Methods 2008, 5, (4), 319-22.

(45) Audero, E.; Cascone, I.; Maniero, F.; Napione, L.; Arese, M.; Lanfrancone, L.; Bussolino, F., Adaptor ShcA protein binds tyrosine kinase Tie2 receptor and regulates migration and sprouting but not survival of endothelial cells. J Biol Chem 2004, 279, (13), 13224-33.

(46) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M., Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127, (3), 635-48.

(47) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P., A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 2006, 24, (10), 1285-92.

(48) Dephoure, N.; Zhou, C.; Villen, J.; Beausoleil, S. A.; Bakalarski, C. E.; Elledge, S. J.; Gygi, S. P., A quantitative atlas of mitotic phosphorylation. Proc Natl Acad Sci U S A 2008, 105, (31), 10762-7.

(49) Mayya, V.; Lundgren, D. H.; Hwang, S. I.; Rezaul, K.; Wu, L.; Eng, J. K.; Rodionov, V.; Han, D. K., Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions. Sci Signal 2009, 2, (84), ra46.

(50) Olsen, J. V.; Vermeulen, M.; Santamaria, A.; Kumar, C.; Miller, M. L.; Jensen, L. J.; Gnad, F.; Cox, J.; Jensen, T. S.; Nigg, E. A.; Brunak, S.; Mann, M., Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci Signal 2010, 3, (104), ra3.

(51) Rigbolt, K. T.; Prokhorova, T. A.; Akimov, V.; Henningsen, J.; Johansen, P. T.; Kratchmarova, I.; Kassem, M.; Mann, M.; Olsen, J. V.; Blagoev, B., System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci Signal 2011, 4, (164), rs3.

(52) Stephens, P.; Hunter, C.; Bignell, G.; Edkins, S.; Davies, H.; Teague, J.; Stevens, C.; O'Meara, S.; Smith, R.; Parker, A.; Barthorpe, A.; Blow, M.; Brackenbury, L.; Butler, A.; Clarke, O.; Cole, J.; Dicks, E.; Dike, A.; Drozd, A.; Edwards, K.; Forbes, S.; Foster, R.; Gray, K.; Greenman, C.; Halliday, K.; Hills, K.; Kosmidou, V.; Lugg, R.; Menzies, A.; Perry, J.; Petty, R.; Raine, K.; Ratford, L.; Shepherd, R.; Small, A.; Stephens, Y.; Tofts, C.; Varian, J.; West, S.; Widaa, S.; Yates, A.; Brasseur, F.; Cooper, C. S.; Flanagan, A. M.; Knowles, M.; Leung, S. Y.; Louis, D. N.; Looijenga, L. H.; Malkowicz, B.; Pierotti, M. A.; Teh, B.; Chenevix-Trench, G.; Weber, B. L.; Yuen, S. T.; Harris, G.; Goldstraw, P.; Nicholson, A. G.; Futreal, P. A.; Wooster, R.; Stratton, M. R., Lung cancer: intragenic ERBB2 kinase mutations in tumours. Nature 2004, 431, (7008), 525-6.

(53) Arcila, M. E.; Chaft, J. E.; Nafa, K.; Roy-Chowdhuri, S.; Lau, C.; Zaidinski, M.; Paik, P. K.; Zakowski, M. F.; Kris, M. G.; Ladanyi, M., Prevalence, clinicopathologic associations, and molecular spectrum of ERBB2 (HER2) tyrosine kinase mutations in lung adenocarcinomas. Clin Cancer Res 2012, 18, (18), 4910-8.

(54) Rikova, K.; Guo, A.; Zeng, Q.; Possemato, A.; Yu, J.; Haack, H.; Nardone, J.; Lee, K.; Reeves, C.; Li, Y.; Hu, Y.; Tan, Z.; Stokes, M.; Sullivan, L.; Mitchell, J.; Wetzel, R.; Macneill, J.; Ren, J. M.; Yuan, J.; Bakalarski, C. E.; Villen, J.; Kornhauser, J. M.; Smith, B.; Li, D.; Zhou, X.; Gygi, S. P.; Gu, T. L.; Polakiewicz, R. D.; Rush, J.; Comb, M. J., Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 2007, 131, (6), 1190-203.

(55) Soda, M.; Choi, Y. L.; Enomoto, M.; Takada, S.; Yamashita, Y.; Ishikawa, S.; Fujiwara, S.; Watanabe, H.; Kurashina, K.; Hatanaka, H.; Bando, M.; Ohno, S.; Ishikawa, Y.; Aburatani, H.; Niki, T.; Sohara, Y.; Sugiyama, Y.; Mano, H., Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 2007, 448, (7153), 561-6.

(56) Rhee, C. S.; Sen, M.; Lu, D.; Wu, C.; Leoni, L.; Rubin, J.; Corr, M.; Carson, D. A., Wnt and frizzled receptors as potential targets for immunotherapy in head and neck squamous cell carcinomas. Oncogene 2002, 21, (43), 6598-605.

(57) Bazhin, A. V.; Tambor, V.; Dikov, B.; Philippov, P. P.; Schadendorf, D.; Eichmuller, S. B., cGMP-phosphodiesterase 6, transducin and Wnt5a/Frizzled-2-signaling control cGMP and Ca(2+) homeostasis in melanoma cells. Cell Mol Life Sci 2010, 67, (5), 817-28.

(58) Lee, E. H.; Chari, R.; Lam, A.; Ng, R. T.; Yee, J.; English, J.; Evans, K. G.; Macaulay, C.; Lam, S.; Lam, W. L., Disruption of the non-canonical WNT pathway in lung squamous cell carcinoma. Clin Med Oncol 2008, 2008, (2), 169-179.

(59) Reubi, J. C.; Laissue, J.; Krenning, E.; Lamberts, S. W., Somatostatin receptors in human cancer: incidence, characteristics, functional correlates and clinical implications. J Steroid Biochem Mol Biol 1992, 43, (1-3), 27-35.

(60) Cao, J.; Shen, C.; Wang, H.; Shen, H.; Chen, Y.; Nie, A.; Yan, G.; Lu, H.; Liu, Y.; Yang, P., Identification of N-glycosylation sites on secreted proteins of human hepatocellular carcinoma cells with a complementary proteomics approach. J Proteome Res 2009, 8, (2), 662-72.

(61) Turtoi, A.; Musmeci, D.; Wang, Y.; Dumont, B.; Somja, J.; Bevilacqua, G.; De Pauw, E.; Delvenne, P.; Castronovo, V., Identification of novel accessible proteins bearing diagnostic and therapeutic potential in human pancreatic ductal adenocarcinoma. J Proteome Res 2011, 10, (9), 4302-13.

(62) Amano, H.; Hayashi, I.; Endo, H.; Kitasato, H.; Yamashina, S.; Maruyama, T.; Kobayashi, M.; Satoh, K.; Narita, M.; Sugimoto, Y.; Murata, T.; Yoshimura, H.; Narumiya, S.; Majima, M., Host prostaglandin E(2)-EP3 signaling regulates tumor-associated angiogenesis and tumor growth. J Exp Med 2003, 197, (2), 221-32.

(63) Matsumoto, A.; Ono, M.; Fujimoto, Y.; Gallo, R. L.; Bernfield, M.; Kohgo, Y., Reduced expression of syndecan-1 in human hepatocellular carcinoma with high metastatic potential. Int J Cancer 1997, 74, (5), 482-91.

(64) Zellweger, T.; Ninck, C.; Mirlacher, M.; Annefeld, M.; Glass, A. G.; Gasser, T. C.; Mihatsch, M. J.; Gelmann, E. P.; Bubendorf, L., Tissue microarray analysis reveals prognostic significance of syndecan-1 expression in prostate cancer. Prostate 2003, 55, (1), 20-9.

(65) Joensuu, H.; Anttonen, A.; Eriksson, M.; Makitaro, R.; Alfthan, H.; Kinnula, V.; Leppa, S., Soluble syndecan-1 and serum basic fibroblast growth factor are new prognostic factors in lung cancer. Cancer Res 2002, 62, (18), 5210-7.

(66) Maeda, T.; Alexander, C. M.; Friedl, A., Induction of syndecan-1 expression in stromal fibroblasts promotes proliferation of human breast cancer cells. Cancer Res 2004, 64, (2), 612-21.

(67) Tu, K.; Yu, H.; Hua, Y. J.; Li, Y. Y.; Liu, L.; Xie, L.; Li, Y. X., Combinatorial network of primary and secondary microRNA-driven regulatory mechanisms. Nucleic Acids Res 2009, 37, (18), 5969-80.

(68) Huang, T.; Liu, L.; Qian, Z.; Tu, K.; Li, Y.; Xie, L., Using GeneReg to construct time delay gene regulatory networks. BMC Res Notes 2010, 3, (1), 142.

(69) Warmuth, M.; Kim, S.; Gu, X. J.; Xia, G.; Adrian, F., Ba/F3 cells and their use in kinase drug discovery. Curr Opin Oncol 2007, 19, (1), 55-60.


Figure 1 Database Construction. The framework of PhoSigNet.


Figure 2 Data content of PhoSigNet. A. Node numbers for each attribute in the network. B. Edge numbers for each attribute in the network. C. Numbers of phosphorylation sites and dephosphorylation sites.


Figure 3 Protein group query result. B. Interaction pairs in graph form. C. Interaction pairs in table form.


Figure 4 Workflow of dynamic data analysis by ExpCluster. A. ExpCluster tool query interface. B. Clustering result obtained with the STEM method, showing the different expression models and their statistical significance. C. Protein/gene expression values in ‘Model Profile 37’. D. Interaction network obtained by mapping the proteins/genes in ‘Model Profile 37’ to the background network in PhoSigNet.


Figure 5 Workflow of cancer receptor inference by pathway enrichment analysis in CanReceptor, exemplified by lung cancer analysis. A. P-values of the hypergeometric test in the pathway enrichment analysis for 414 receptors in lung cancer were plotted. Points above the dashed line represent receptors with p-value