
Technical Note pubs.acs.org/jpr

PhosphoSiteAnalyzer: A Bioinformatic Platform for Deciphering Phospho Proteomes Using Kinase Predictions Retrieved from NetworKIN

Martin V. Bennetzen,*,† Jürgen Cox,‡ Matthias Mann,‡ and Jens S. Andersen*,†

† Center for Experimental BioInformatics, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Campusvej 55, DK-5230 Odense M, Denmark
‡ Department of Proteomics and Signal Transduction, Max-Planck-Institute of Biochemistry, Munich, Am Klopferspitz 18, DE-82152 Martinsried, Germany

Supporting Information

ABSTRACT: Phosphoproteomic experiments are routinely conducted in laboratories worldwide, and because of the fast development of mass spectrometric techniques and efficient phosphopeptide enrichment methods, researchers frequently end up with lists of tens of thousands of phosphorylation sites for further interrogation. To answer biologically relevant questions from these complex data sets, it becomes essential to apply computational, statistical, and predictive analytical methods. Here we provide an advanced bioinformatic platform termed “PhosphoSiteAnalyzer” to explore large phosphoproteomic data sets that have been subjected to kinase prediction using the previously published NetworKIN algorithm. NetworKIN applies sophisticated linear motif analysis and contextual network modeling to obtain kinase−substrate associations with high accuracy and sensitivity. PhosphoSiteAnalyzer provides an algorithm to retrieve kinase predictions from the public NetworKIN webpage in a semiautomated way and then applies advanced statistics to facilitate a user-tailored in-depth analysis of the phosphoproteomic data sets. The interface of the software provides a high degree of analytical flexibility and is designed to be intuitive for most users. PhosphoSiteAnalyzer is a freeware program available at http://phosphosite.sourceforge.net.

KEYWORDS: phosphorylation, networks, computational prediction, NetworKIN, statistical analysis, kinase−substrate association, bioinformatics tool



Received: January 8, 2012
© XXXX American Chemical Society
Journal of Proteome Research
dx.doi.org/10.1021/pr300016e | J. Proteome Res. XXXX, XXX, XXX−XXX

INTRODUCTION

During the past decade, development of mass spectrometric methods has made it possible to identify and quantify proteins and their post-translational modifications on a global scale in cells and tissues. In particular, highly efficient phosphopeptide enrichment methods primarily based on TiO2 (ref 1) and IMAC (ref 2) have made it possible to identify thousands of phosphoproteins and specific phosphorylation sites in a single mass spectrometry-based proteomic study (refs 3−8). Thus, researchers routinely obtain a long list of identified and quantified phosphorylation sites for statistical analysis and biological interpretation. As a consequence, community databases have been designed to integrate information on published phosphorylation sites, like Phospho.ELM (ref 9), Phosida (ref 10), and PhosphoSite (ref 11). While several highly advanced bioinformatic tools exist to analyze large data sets on the protein level, like GProX (ref 12), DAVID (ref 13), Ingenuity (Ingenuity Systems), and ProteinCenter (Proxeon/Thermo Scientific), user-friendly tools designed for the analysis of global, quantitative, and site-specific protein phosphorylation data sets are more limited. Given the massive amount of phosphoproteomic data produced in laboratories around the world, user-friendly software that integrates quantitative information on phosphorylation sites with kinase predictions, kinase comodulation, and consensus-motif analysis is expected to be highly beneficial for the scientific community.

One of the major challenges in phosphoproteomic analysis is how to associate kinases with substrates identified in the form of individual phosphorylation sites. Considering data size and complexity, computational power is needed to predict associations in a probabilistic way. This has been accomplished by computational predictive tools such as ScanSite (ref 14), KinasePhos (ref 15), NetPhosK (ref 16), and NetPhorest (ref 17), where kinases are predicted from short linear consensus motifs in the region around the phosphorylated residue. Additional contextual information (such as protein−protein interactions, colocalization, coexpression, and comentioning in Pubmed abstracts) has been taken into account by the NetworKIN resource (ref 18). Briefly, NetworKIN first uses the NetPhorest algorithm (ref 17) to predict kinase families that can be responsible for the phosphorylation of a given substrate based on linear motif analysis using artificial neural networks and position-specific scoring matrices. NetworKIN then models an association network around the substrates, the specific kinases of the predicted kinase families, and connecting nodes using the STRING algorithm (ref 19). A motif score and a context score are then calculated for each kinase−substrate association. An advantage of combining linear motif and network information is a 2.5-fold gain in prediction accuracy compared with motif-based prediction alone (ref 20).

NetworKIN is accessible as a web-based tool but without an interface to analyze large-scale phosphoproteomic data sets. To address this issue, we developed PhosphoSiteAnalyzer as presented here. PhosphoSiteAnalyzer utilizes the publicly available web version of NetworKIN to retrieve kinase predictions from a list of phosphorylation sites in a semiautomated way, thus making the power of NetworKIN more compatible with complex data sets. Files are uploaded to the webpage one at a time to avoid violating the NetworKIN license. The NetworKIN server will therefore not (and should not) suffer from parallel storms of uploaded files. The retrieved kinase predictions then serve as metadata for further statistical analysis using the computing power of the statistical environment R (ref 21). PhosphoSiteAnalyzer is constructed in a modular way where each statistical module facilitates different aspects of phosphoproteomic analysis as described below.

All R-scripts are freely available and readable, so the user can inspect the statistical formulas in detail and, with only minor programming experience, adjust them if needed. A user manual and test files are available with the program as well.



Figure 1. Schematic outline of the infrastructure of PhosphoSiteAnalyzer. The heart of the program is written in C#, which orchestrates the execution of R-scripts and access to the NetworKIN webpage behind the scenes. A user-friendly interface is constructed in C# as well. Two basic operations are present in the PhosphoSiteAnalyzer modules: (1) retrieval of predicted kinases via the NetworKIN webpage (red arrows) and (2) statistical analysis (blue arrows).



INFRASTRUCTURE AND PROGRAMMING

PhosphoSiteAnalyzer is written as a Windows application in C# using the .NET 4.0 framework. The C# code functions as a platform for execution of a console version of the statistical environment R, which applies R-scripts for statistical analysis to the phosphoproteomic data uploaded by the user. The output figures from R are then shown in the PhosphoSiteAnalyzer user interface via execution of Adobe Reader. Relevant .txt files produced by R are saved in a temporary folder and can be used for analysis in other programs such as R and Excel. The kinase predictions are retrieved using a specific html-operational C# script that uploads small .fasta and .txt files to the NetworKIN server in a semiautomated way. The only required user interaction is saving the NetworKIN result files manually. The infrastructure of PhosphoSiteAnalyzer is outlined in Figure 1. R-scripts are provided by PhosphoSiteAnalyzer and R version 2.11.1, whereas relevant R-packages (‘seqinr’, ‘gplots’, ‘igraph’, and ‘plotrix’), Adobe Reader, and the .NET 4.0 framework can be installed via the PhosphoSiteAnalyzer interface. Additionally, the NetworKIN license can be read and accepted via the PhosphoSiteAnalyzer interface.

Figure 2. Schematic workflow of PhosphoSiteAnalyzer. Phosphoproteomic data representing a list of annotated phosphorylation sites and a list of associated site-specific signal powers are uploaded to the program. PhosphoSiteAnalyzer then uses NetworKIN to predict kinases responsible for phosphorylation of the sites. Sites associated with at least one predicted kinase are subjected to thresholding and further bioinformatical and statistical analysis.

INPUT FILES AND BASIC CONCEPTS

PhosphoSiteAnalyzer uses four different input files (Table 1): (1) a .fasta file that includes protein sequences, (2) a 'phosphoSiteInformation' file (.txt) that includes protein identifiers corresponding to entries in the .fasta file and information about the position of modified and phosphorylated amino acid residues, (3) an 'annotation' file (.txt) that includes the same information as the 'phosphoSiteInformation' file combined with a subset index column, and (4) an 'annotation/expression' file (.txt) that includes the same information as the 'annotation' file combined with a signal power column. The 'subset index' indicates phosphorylation sites grouped into subsets that are required for statistical testing. Examples of subsets are sites associated with a particular cluster, sites annotated as 'regulated' (up/down/not) or 'responsive to a perturbation' (yes/no), and sites associated with classes of proteins based on metadata (e.g., biological functions and cellular localization). Additionally, each phosphorylation site is associated with one signal power value, which is used for the two 'signal power' statistical modules. The signal power could be a ratio, a log-transformed ratio, a P-value, a fold change, or a max fold-change across a time course experiment.

The structure of the files explicitly constitutes the basic analytical structure and novel concepts of PhosphoSiteAnalyzer, where the kinase predictions from NetworKIN are applied to further statistical operations that make hypothesis testing possible based on phosphorylation site subset association and signal power. Moreover, PhosphoSiteAnalyzer makes use of the phosphorylation site information (calculated scores and sequence windows) returned by NetworKIN in the result files. This information is applied to user-defined thresholding and filtering operations prior to statistical analysis and network visualizations.

Table 1. Structure and Format of the Four Input Files for PhosphoSiteAnalyzer

file type | structure | used for | format
.fasta file | >[id] [protein sequence A] >[id] [protein sequence B] > .. | kinase prediction (NetworKIN) | .fasta
PhosphoSite Information file | three columns: [id] [position] [amino acid] | kinase prediction (NetworKIN) | .txt
Annotation file | four columns: [id] [position] [amino acid] [subset index] | statistical analysis | .txt
Annotation/Expression file | five columns: [id] [position] [amino acid] [subset index] [signal power] | statistical analysis | .txt
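As a minimal illustration of the input formats listed in Table 1, the following Python sketch parses a .fasta text and an 'annotation/expression' table and checks that every site points to an existing sequence and residue. It is not part of PhosphoSiteAnalyzer, which is implemented in C# and R. The function names, the tab delimiter, the absence of a header row, and the 1-based positions are assumptions made for this example only.

```python
from typing import Dict, List, NamedTuple


class Site(NamedTuple):
    protein_id: str
    position: int        # assumed to be 1-based within the protein sequence
    amino_acid: str
    subset: str          # subset index, e.g. a cluster number
    signal_power: float  # e.g. a ratio or fold change


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each identifier (first token after '>') to its sequence."""
    sequences: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            sequences[current] = ""
        elif current is not None:
            sequences[current] += line
    return sequences


def parse_sites(text: str, sep: str = "\t") -> List[Site]:
    """Read five columns: id, position, amino acid, subset index, signal power."""
    sites = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        pid, pos, aa, subset, power = raw.split(sep)
        sites.append(Site(pid, int(pos), aa, subset, float(power)))
    return sites


def check_sites(sites: List[Site], sequences: Dict[str, str]) -> List[str]:
    """Return one message per site that does not match the .fasta content."""
    problems = []
    for s in sites:
        seq = sequences.get(s.protein_id)
        if seq is None:
            problems.append(f"{s.protein_id}: identifier not found in .fasta")
        elif not 1 <= s.position <= len(seq):
            problems.append(f"{s.protein_id}: position {s.position} out of range")
        elif seq[s.position - 1] != s.amino_acid:
            problems.append(
                f"{s.protein_id}: expected {s.amino_acid} at {s.position}, "
                f"found {seq[s.position - 1]}"
            )
    return problems


# Toy data invented for the example.
FASTA_TEXT = ">P1\nMSQDS\n>P2\nAATYQ\n"
SITES_TEXT = "P1\t2\tS\t1\t2.5\nP2\t3\tT\t2\t0.8\nP2\t5\tS\t2\t1.1\n"

if __name__ == "__main__":
    for message in check_sites(parse_sites(SITES_TEXT), parse_fasta(FASTA_TEXT)):
        print(message)
```

A check of this kind before upload can catch identifier or position mismatches between the .fasta file and the site tables early, since the downstream modules rely on both describing the same sites.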

RETRIEVAL AND PROCESSING OF NETWORKIN PREDICTIONS

PhosphoSiteAnalyzer retrieves kinase predictions from the publicly accessible NetworKIN webpage by sequential uploading of small .fasta files with a maximum of 100 protein sequences each and .txt files containing the corresponding phosphorylation site information. The breakdown of the phosphorylation site information into small files greatly increases the processing speed of the NetworKIN server. The .fasta and .txt files are produced via an R-script based on user-defined input files and then uploaded automatically to the NetworKIN webpage one by one to minimize NetworKIN webpage accessibility constraints. The user-friendly interface of PhosphoSiteAnalyzer makes it possible to specify all parameters for the kinase prediction, such as selection of organism, selection of preferred version of NetworKIN (alpha or beta version), and inclusion of phosphorylation sites from Phospho.ELM and PhosphoSite. Therefore, the user does not have to interact with the NetworKIN webpage when uploading files for kinase prediction. However, the user will need to save the result files manually and to accept the NetworKIN license, which can be done via the PhosphoSiteAnalyzer interface. Thus, PhosphoSiteAnalyzer provides not only a statistical framework for phosphorylation site analysis but also an efficient semiautomated way to directly retrieve kinase predictions from the NetworKIN webpage.

When kinase predictions are retrieved, PhosphoSiteAnalyzer can be used to filter the predictions by thresholds based on the motif score corresponding to the NetPhorest posterior probability score, the context score derived from STRING, and the NetworKIN score, which is a combination of the motif and context score introduced in the beta version. Additionally, kinase predictions for each site can be filtered out when the NetworKIN score is below a user-defined percentage of the highest scoring kinase prediction. A processed prediction table is thus produced by applying these four threshold options and used for all further analysis. PhosphoSiteAnalyzer produces figures that visualize the effect of each threshold step, the score distributions, and the kinase distribution. Additionally, a consistency check of the predictions is performed based on the ATM/ATR, CDK2/3, and p38 predictions (for further explanation, see the Supporting Information).
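The two data-handling steps of this section, splitting the input into files of at most 100 protein sequences and applying the four score thresholds, can be sketched in Python as follows. This is an illustration only, not the C#/R code of PhosphoSiteAnalyzer. The names `chunk`, `Prediction`, and `filter_predictions`, the score values, and the choice to take the highest NetworKIN score per site over all predictions before any other threshold is applied are assumptions of the example.

```python
from typing import Dict, Iterator, List, NamedTuple, Sequence, Tuple


def chunk(items: Sequence[str], max_per_file: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most max_per_file items each."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    for start in range(0, len(items), max_per_file):
        yield list(items[start:start + max_per_file])


class Prediction(NamedTuple):
    site: Tuple[str, int]  # (protein id, position)
    kinase: str
    motif: float           # motif score
    context: float         # context score
    networkin: float       # combined NetworKIN score


def filter_predictions(
    predictions: Sequence[Prediction],
    min_motif: float,
    min_context: float,
    min_networkin: float,
    min_fraction_of_best: float,
) -> List[Prediction]:
    """Keep predictions that pass all four thresholds.

    The fourth threshold compares each prediction with the highest
    NetworKIN score reported for the same site.
    """
    best: Dict[Tuple[str, int], float] = {}
    for p in predictions:
        best[p.site] = max(best.get(p.site, float("-inf")), p.networkin)

    kept = []
    for p in predictions:
        if (
            p.motif >= min_motif
            and p.context >= min_context
            and p.networkin >= min_networkin
            and p.networkin >= min_fraction_of_best * best[p.site]
        ):
            kept.append(p)
    return kept


# Toy predictions with invented scores.
EXAMPLE = [
    Prediction(("P1", 2), "ATM", 0.40, 0.90, 10.0),
    Prediction(("P1", 2), "ATR", 0.35, 0.80, 6.0),
    Prediction(("P1", 2), "CDK2", 0.30, 0.70, 2.0),
    Prediction(("P2", 3), "p38", 0.05, 0.95, 8.0),
]

if __name__ == "__main__":
    batches = list(chunk([f"P{i}" for i in range(250)]))
    print([len(b) for b in batches])
    for p in filter_predictions(EXAMPLE, 0.1, 0.5, 1.0, 0.5):
        print(p.site, p.kinase, p.networkin)
```

In this toy input, CDK2 is dropped because its score is below half of the best score for the same site, and p38 is dropped by the motif threshold alone.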



BIOINFORMATIC MODULES

PhosphoSiteAnalyzer has a modular design where each module represents a statistical analysis method for extracting various biological features from the phosphoproteomic data set. Currently, nine analytical modules are provided by PhosphoSiteAnalyzer (Figure 2, green box). A step-by-step protocol on how to use each module together with a detailed description and a test data set is provided in the Supporting Information. Each module has multiple parameter settings for user-defined analyses and provides informative figures representing the performed bioinformatics analyses displayed directly in the PhosphoSiteAnalyzer interface. The figures are saved as .pdf files (vector graphics) in a temporary folder, enabling import into, e.g., Adobe Illustrator for further graphical processing.

Subset-Specific Kinase Enrichment Analysis

When phosphorylation sites are divided into subsets (based on clustering, biological context, etc.), it is possible to test for overrepresentation and/or underrepresentation of specific kinases associated with these by hypergeometric testing (Fisher’s Exact test) and subsequent Benjamini−Hochberg multiple testing adjustment. Enrichment analysis can be performed relative to all phosphorylation sites or to sites associated with a particular subset. As an illustrative example, applying PhosphoSiteAnalyzer to data from our previous investigation of the nuclear DNA damage response (ref 22), kinase enrichment analysis reveals that ATM and ATR substrates are significantly overrepresented in subset 2 (Cluster 2, k-means clustering of temporal phosphorylation profiles, Figure 3A,B).

Figure 3. Examples of phosphosite analyses using the bioinformatic modules of PhosphoSiteAnalyzer. (A and B) Example of subset-wise kinase enrichment analysis based on temporal clusters (subsets) of phosphorylation site profiles. Phosphorylation sites identified and quantified from mass spectrometry data are clustered and each site is then associated with one of the clusters. This information is included in the Annotation file. By applying hypergeometric testing using a user-defined statistical background, kinase enrichment analysis is performed based on the subset distribution of kinase predictions obtained from NetworKIN. The example shows that the predicted ATM/ATR kinases are enriched among the phosphorylation sites of subset 2. (C) Heatmap of kinases predicted to phosphorylate serine residues within SQ-motifs and associated with subset 2. The color code visualizes the NetworKIN score associated with each kinase−site association. (D) Kinase−substrate network based on NetworKIN predictions where the phosphorylation sites are located within SQ-motifs and where scoring thresholds (motif, context, and NetworKIN) were applied. The arrows are directed from kinase to substrate and the edge color code visualizes to which subset the phosphorylation site belongs. (E) Heatmap visualization of the landscape of kinases that are predicted to phosphorylate other kinases (given that the kinases are included in the NetworKIN repertoire). (F) Heatmap of position-specific amino acids associated with significantly higher signal powers extracted via rank-based statistical testing (see text and manual). Signal powers are included in the Annotation/Expression file. It is observed that serines located within an SQ-motif are associated with significantly high signal powers (phosphorylation ratio). For more information, see text and manual.

Linear Motif Analysis

When dealing with subset-specific phosphorylation sites, linear motif analysis is often performed to search for consensus motifs. The web-based tool Motif-X (ref 23) is widely used and offers several options for user-tailored motif analysis. PhosphoSiteAnalyzer also provides a module that can be applied to phosphorylation sites predicted to be associated with at least one kinase. For linear motif analysis, a novel algorithm was written in which the position-specific amino acid background distribution for hypothesis testing is estimated by a bootstrapping procedure and a P-value is derived from the bootstrap replicates (Figure 4). Since bootstrapping is computationally intensive, the computational time heavily depends on the number of bootstrap replicates used for background estimation. Significant motifs can be extracted directly from the analysis and are similar to the results obtained by Motif-X. PhosphoSiteAnalyzer thus provides the advantage that the user will not need to interact with external motif extraction servers because the algorithm is already included in an R-script stored on the local computer and because the sequence windows are already extracted by NetworKIN. Additionally and orthogonal to this analysis, a special motif-enrichment analysis module is included in the software as well, where simple hypergeometric testing of specific motifs of interest (defined by the user) is performed subset-wise. The difference between the global and specific motif analysis is that the latter is biased in the sense that the user searches for enrichment of specific motifs.

Figure 4. Linear motif enrichment analysis module. Linear motifs are extracted on the basis of a bootstrapping algorithm implemented in PhosphoSiteAnalyzer. Several options for the statistical testing procedure (statistical fore- and background, bootstrap replicates, significance, and minimal frequency) are available within the program interface to facilitate user-tailored analysis as shown above. For more information, see text and manual.

Kinase-Motif Association Analysis

Biological processes often depend on phosphorylation-dependent signaling via phosphorylation of sites located within conserved consensus motifs mediated by specific kinases that recognize this specific motif (refs 17 and 20). A 'kinase-motif association analysis module' of PhosphoSiteAnalyzer is constructed to extract information about predicted kinases and their association with consensus motifs of interest. Hierarchical clustering is subsequently applied to obtain global information about the kinase−substrate landscape (Figure 3C). The user can thereby get a global overview of kinases predicted to phosphorylate user-defined substrates of interest. Information on kinase and score distributions associated with the kinase-motif relationships is extracted as well.

Network Visualization

In addition to kinase-motif landscape extraction as described above, substrate−kinase networks can be visualized based on kinases, motifs, substrates, scores, and subsets of interest defined by the user and by applying a user-defined score threshold (Figure 3D). Tables suitable for network visualization using programs like Cytoscape (ref 24) are produced and saved in a temporary folder. Thus, the user can use the produced tables as material for further network studies and apply other network analysis modules within Cytoscape. In particular, in-depth analysis of kinase−substrate networks based on the results from NetworKIN and subsequent processing via PhosphoSiteAnalyzer can be performed by adding layers of network information from STRING, which is used by the NetworKIN algorithm.

Kinase Modulation Analysis

When phosphorylation-dependent signaling takes place, kinases are often modulated themselves to mediate signaling via their substrates. This occurs, for example, during Ephrin-mediated signaling (ref 6) and is termed comodulation. PhosphoSiteAnalyzer is able to extract phosphorylated kinases associated with other predicted kinases (if the phosphorylated kinase is included in the NetworKIN kinase set). The kinase−kinase landscape is illustrated in two ways that include orthogonal information: a subset−kinase and a kinase−kinase heatmap (Figure 3E). The applied hierarchical clustering analysis gives direct information about global and coclustered kinase−kinase associations and networks. If the subsets represent states of regulation, then comodulation can be extracted as well. Additionally, the kinase−kinase network is visualized including subset information.

Signal Power Analysis

In many phosphoproteomic experiments, quantitative information on the identified sites is available. Thus, correlation between quantitative information (ratio, fold change, etc.), consensus motif, and predicted kinases associated with each single site gives higher dimensional insight into the phosphorylation signaling network. As a novel feature, PhosphoSiteAnalyzer is able to analyze signal power statistics of substrates associated with the predicted kinases and of the amino acid positions surrounding the central phosphorylated amino acid. This is done based on input from the 'Annotation/Expression' file uploaded by the user, which is subjected to rank-based statistical testing (Wilcoxon test) and Benjamini−Hochberg multiple testing adjustment. Global substrate signal power analysis and targeted analysis based on substrates associated with specific kinases and motifs are performed by PhosphoSiteAnalyzer, enabling the user to learn about signal power dependencies. Since rank-based statistics are applied, the results obtained do not depend on what kind of signal power is used (ratio, fold change, etc.) as long as the nature of the signal power does not affect the ranking of the signal power values (this would be the case if, for instance, absolute value transformation is performed).
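The rank-based testing and multiple testing adjustment described above can be illustrated with a short Python sketch. It is not the R code shipped with PhosphoSiteAnalyzer: the function names are hypothetical, the comparison of one kinase's substrates against the remaining sites is an assumed design, and the rank-sum test uses a normal approximation with tie correction and no continuity correction, so its P-values may differ slightly from those of an exact test.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(group: Sequence[float], rest: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of group, P-value)."""
    n1, n2 = len(group), len(rest)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = midranks(list(group) + list(rest))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for r in ranks:
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy signal powers (ratios) invented for the example.
GROUP = [4.0, 3.5, 5.2, 2.9]           # substrates of one predicted kinase
REST = [1.1, 0.9, 1.3, 2.0, 1.0, 0.8]  # all remaining sites

if __name__ == "__main__":
    print(rank_sum_test(GROUP, REST))
    # A log transform keeps the ranking, so the test result is unchanged.
    print(rank_sum_test([math.log2(x) for x in GROUP], [math.log2(x) for x in REST]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The second call shows the invariance noted in the text: a monotone transformation such as a logarithm leaves the ranks, and therefore the test result, unchanged, whereas an absolute value transformation could reorder the values.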

Amino Acid Position-Dependent Signal Power Analysis

Complementary to the subset-wise motif analysis based on bootstrapping, a motif enrichment analysis module is also provided by PhosphoSiteAnalyzer in which linear motifs associated with substrates of significant signal powers are extracted. The subset-wise motif analysis (described above) depends on the selected subsets, which can be based on various kinds of categorical annotation. In this module, however, motifs are extracted solely based on quantitative information. Briefly, signal powers associated with each amino acid at each position in the position matrix are tested (Wilcoxon) against the global signal power distribution, and subsequent multiple testing adjustment is performed. PhosphoSiteAnalyzer provides a novel statistical framework for motif enrichment and kinase associations based on quantitative data (and not metadata like cluster association). An example of amino acid position-dependent signal power analysis is shown in Figure 3F.

CONCLUDING REMARKS

PhosphoSiteAnalyzer is a computational tool created with the aim of facilitating complex kinase−substrate network analysis in a user-friendly and user-tailored way. The program and a detailed step-by-step protocol, which includes a description of all aspects of the analysis modules, are available at http://phosphosite.sourceforge.net, where updates will be available too. All statistical scripts are open-source and can be inspected and modified in a text editor by the user, if necessary. PhosphoSiteAnalyzer is initially written in C# for Windows platforms, but we plan to make it compatible with Unix platforms in the future.

ASSOCIATED CONTENT

Supporting Information

PhosphoSiteAnalyzer v. 1.4 manual. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected], [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

Developers of NetworKIN (R. Linding, L. J. Jensen, et al.) are acknowledged for making the superior NetworKIN algorithm available to the public via their webpage. The developers of the statistical environment R (The R Development Core Team) and R-packages used for PhosphoSiteAnalyzer (‘seqinr’, ‘gplots’, ‘plotrix’, and ‘igraph’) are acknowledged for providing essential tools for bioinformatical and statistical analysis to the public. S. V. Rørkær, L. M. Harder, and D. Pultz from the University of Southern Denmark are acknowledged for performing initial testing and providing essential feedback on the early version of PhosphoSiteAnalyzer. This work was supported in part by grants from the European Community’s Seventh Framework Program (FP7/2007-2013) under grant agreement no. HEALTH-F4-2007-200767 for the collaborative project APOSYS and HEALTH-F4-2008-201648 for the collaborative project PROSPECTS. The Danish Ministry for Science, Technology and Innovation is acknowledged for awarding MVB the EliteResearch stipend 2010, which in part supported this work as well.

REFERENCES

(1) Larsen, M. R.; Thingholm, T. E.; Jensen, O. N.; Roepstorff, P.; Jorgensen, T. J. Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol. Cell. Proteomics: MCP 2005, 4 (7), 873−86.
(2) Villen, J.; Gygi, S. P. The SCX/IMAC enrichment approach for global phosphorylation analysis by mass spectrometry. Nat. Protoc. 2008, 3 (10), 1630−8.
(3) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127 (3), 635−48.
(4) Olsen, J. V.; Vermeulen, M.; Santamaria, A.; Kumar, C.; Miller, M. L.; Jensen, L. J.; Gnad, F.; Cox, J.; Jensen, T. S.; Nigg, E. A.; Brunak, S.; Mann, M. Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci. Signaling 2010, 3 (104), ra3.
(5) Rigbolt, K. T.; Prokhorova, T. A.; Akimov, V.; Henningsen, J.; Johansen, P. T.; Kratchmarova, I.; Kassem, M.; Mann, M.; Olsen, J. V.; Blagoev, B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signaling 2011, 4 (164), rs3.
(6) Jorgensen, C.; Sherman, A.; Chen, G. I.; Pasculescu, A.; Poliakov, A.; Hsiung, M.; Larsen, B.; Wilkinson, D. G.; Linding, R.; Pawson, T. Cell-specific information processing in segregating populations of Eph receptor ephrin-expressing cells. Science 2009, 326 (5959), 1502−9.
(7) Matsuoka, S.; Ballif, B. A.; Smogorzewska, A.; McDonald, E. R., 3rd; Hurov, K. E.; Luo, J.; Bakalarski, C. E.; Zhao, Z.; Solimini, N.; Lerenthal, Y.; Shiloh, Y.; Gygi, S. P.; Elledge, S. J. ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science 2007, 316 (5828), 1160−6.
(8) Bodenmiller, B.; Wanka, S.; Kraft, C.; Urban, J.; Campbell, D.; Pedrioli, P. G.; Gerrits, B.; Picotti, P.; Lam, H.; Vitek, O.; Brusniak, M. Y.; Roschitzki, B.; Zhang, C.; Shokat, K. M.; Schlapbach, R.; Colman-Lerner, A.; Nolan, G. P.; Nesvizhskii, A. I.; Peter, M.; Loewith, R.; von Mering, C.; Aebersold, R. Phosphoproteomic analysis reveals interconnected system-wide responses to perturbations of kinases and phosphatases in yeast. Sci. Signaling 2010, 3 (153), rs4.
(9) Diella, F.; Cameron, S.; Gemund, C.; Linding, R.; Via, A.; Kuster, B.; Sicheritz-Ponten, T.; Blom, N.; Gibson, T. J. Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinf. 2004, 5, 79.
(10) Gnad, F.; Ren, S.; Cox, J.; Olsen, J. V.; Macek, B.; Oroshi, M.; Mann, M. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007, 8 (11), R250.
(11) Hornbeck, P. V.; Chabra, I.; Kornhauser, J. M.; Skrzypek, E.; Zhang, B. PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 2004, 4 (6), 1551−61.
(12) Rigbolt, K. T.; Vanselow, J. T.; Blagoev, B. GProX, a User-Friendly Platform for Bioinformatics Analysis and Visualization of Quantitative Proteomics Data. Mol. Cell. Proteomics: MCP 2011.


(13) Huang da, W.; Sherman, B. T.; Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4 (1), 44−57.
(14) Obenauer, J. C.; Cantley, L. C.; Yaffe, M. B. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 2003, 31 (13), 3635−41.
(15) Huang, H. D.; Lee, T. Y.; Tzeng, S. W.; Horng, J. T. KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res. 2005, 33 (Web Server issue), W226−9.
(16) Blom, N.; Sicheritz-Ponten, T.; Gupta, R.; Gammeltoft, S.; Brunak, S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004, 4 (6), 1633−49.
(17) Miller, M. L.; Jensen, L. J.; Diella, F.; Jorgensen, C.; Tinti, M.; Li, L.; Hsiung, M.; Parker, S. A.; Boordeaux, J.; Sicheritz-Ponten, T.; Olhovsky, M.; Pasculescu, A.; Alexander, J.; Knapp, S.; Blom, N.; Bork, P.; Li, S.; Cesareni, G.; Pawson, T.; Turk, B. E.; Yaffe, M. B.; Brunak, S.; Linding, R. Linear motif atlas for phosphorylation-dependent signaling. Sci. Signaling 2008, 1 (35), ra2.
(18) Linding, R.; Jensen, L. J.; Pasculescu, A.; Olhovsky, M.; Colwill, K.; Bork, P.; Yaffe, M. B.; Pawson, T. NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res. 2008, 36 (Database issue), D695−9.
(19) von Mering, C.; Jensen, L. J.; Snel, B.; Hooper, S. D.; Krupp, M.; Foglierini, M.; Jouffre, N.; Huynen, M. A.; Bork, P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005, 33 (Database issue), D433−7.
(20) Linding, R.; Jensen, L. J.; Ostheimer, G. J.; van Vugt, M. A.; Jorgensen, C.; Miron, I. M.; Diella, F.; Colwill, K.; Taylor, L.; Elder, K.; Metalnikov, P.; Nguyen, V.; Pasculescu, A.; Jin, J.; Park, J. G.; Samson, L. D.; Woodgett, J. R.; Russell, R. B.; Bork, P.; Yaffe, M. B.; Pawson, T. Systematic discovery of in vivo phosphorylation networks. Cell 2007, 129 (7), 1415−26.
(21) R Development Core Team. R: A Language and Environment for Statistical Computing. http://www.R-project.org.
(22) Bennetzen, M. V.; Larsen, D. H.; Bunkenborg, J.; Bartek, J.; Lukas, J.; Andersen, J. S. Site-specific phosphorylation dynamics of the nuclear proteome during the DNA damage response. Mol. Cell. Proteomics: MCP 2010, 9 (6), 1314−23.
(23) Schwartz, D.; Gygi, S. P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 2005, 23 (11), 1391−8.
(24) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11), 2498−504.
