Technical Note pubs.acs.org/jpr
MsViz: A Graphical Software Tool for In-Depth Manual Validation and Quantitation of Post-translational Modifications
Trinidad Martín-Campos,†,# Roman Mylonas,†,‡,# Alexandre Masselot,† Patrice Waridel,‡ Tanja Petricevic,§ Ioannis Xenarios,† and Manfredo Quadroni*,‡
† Vital-IT Group, Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
‡ Protein Analysis Facility, Center for Integrative Genomics, University of Lausanne, CH-1015 Lausanne, Switzerland
§ Institute of Pathology, University of Lausanne and Centre Hospitalier Universitaire Vaudois, CH-1011 Lausanne, Switzerland
Supporting Information
ABSTRACT: Mass spectrometry (MS) has become the tool of choice for the large scale identification and quantitation of proteins and their post-translational modifications (PTMs). This development has been enabled by powerful software packages for the automated analysis of MS data. While data on PTMs of thousands of proteins can nowadays be readily obtained, fully deciphering the complexity and combinatorics of modification patterns even on a single protein often remains challenging. Moreover, functional investigation of PTMs on a protein of interest requires validation of the localization and the accurate quantitation of its changes across several conditions, tasks that often still require human evaluation. Software tools for large scale analyses are highly efficient but are rarely conceived for interactive, in-depth exploration of data on individual proteins. We here describe MsViz, a web-based and interactive software tool that supports manual validation of PTMs and their relative quantitation in small- and medium-size experiments. The tool displays sequence coverage information, peptide-spectrum matches, tandem MS spectra and extracted ion chromatograms through a single, highly intuitive interface. We found that MsViz greatly facilitates manual data inspection to validate PTM location and quantitate modified species across multiple samples. KEYWORDS: tandem mass spectrometry, data-dependent acquisition, post-translational modification, phosphorylation, site localization, extracted ion chromatograms, data visualization, manual validation
1. INTRODUCTION
Considerable efforts in the last three decades have gone into the development of software tools for automated analysis of large scale bottom-up proteomics data sets to efficiently extract protein identifications and quantitation of both total protein levels and biologically relevant PTMs. Software packages built around database search engines such as Sequest,1 MASCOT,2 MaxQuant/Andromeda3 and many others have powered the field and enabled experiments that would otherwise simply be unfeasible. However, while high performance, large scale tools can manage impressive amounts of data, they tend to offer somewhat limited functionalities for visualization and manual inspection of the obtained results. Such functionalities are often provided by third party software conceived for general purposes (e.g., Scaffold) or built ad hoc for specific types of applications (e.g., Skyline4 for targeted quantitative proteomics). Up to now, understandably, most of the efforts of developers have been directed toward increasing the performance and
Modern proteomics is largely based on a bottom-up approach combining protease digestion with data-dependent acquisition (DDA) of tandem mass spectra of peptide precursors. The fragmentation patterns obtained by DDA are matched to sequence databases, leading to identification of peptide sequences and a more or less complete mapping of protein sequences. While rich in structural information, processed bottom-up proteomics data is inherently complex, containing multiple levels of information, e.g., MS2 spectra, precursor intensities, LC retention times, matching scores and protein positional information. Further intrinsic layers of complexity are added by the imperfect nature of protease digestion, LC separation and MS detection, which combine to produce overlapping and redundant peptide-spectrum matches (PSMs). On top of all this, the presence of both biological and artifactual post-translational modifications (PTMs) increases sample and data complexity even further. Considering that biological experimentation typically requires the comparison of data on multiple samples, it is easy to understand that the representation of large DDA data sets is a very challenging task.
© 2017 American Chemical Society
Received: April 5, 2017 Published: June 21, 2017
DOI: 10.1021/acs.jproteome.7b00194 J. Proteome Res. 2017, 16, 3092−3101
Journal of Proteome Research
robustness of the tools developed, in order to cope with the increasing mass of data produced by more powerful and sensitive MS instruments used for increasingly sophisticated experiments. The second major focus has thus been to streamline and automate increasingly complex tasks to save investigators' time and labor. Major efforts were also required to implement within comprehensive software packages different methods of quantitation based on either isotope labeling or label-free approaches5 that were developed in parallel. In this dynamic but also very heterogeneous experimental and analytical landscape, data representation for manual expert inspection has been somewhat neglected, most likely because in very large data sets the amount of information simply appears overwhelming for a human operator. While large scale protein identification/data analysis tools produce highly reliable results, there are cases in which manual validation of results on individual proteins or PTMs is needed. This is especially true for studies centered on PTMs such as phosphorylation, in which the exact localization of the modification site and the quantitation of changes among conditions are essential for data interpretation and assessment. At the same time, the exact localization of phosphorylation sites based on phospho-peptide CID spectra can be a challenging task6 and in a significant fraction of cases the search engine(s) fail to localize a site exactly within a peptide, thus producing ambiguous results. Before embarking on time-consuming functional studies on individual protein targets, biologists typically require manual validation of a particular finding to obtain a high degree of confidence.
Furthermore, manual expert inspection of data still has the power to reveal new and unexpected phenomena, as has recently been demonstrated.7,8 Being frequently confronted with the question of manual expert validation, we attempted to use available software platforms and came to the conclusion that, while most offer sophisticated functions, none is able to display all necessary information in a truly convenient and efficient manner so as to maximally facilitate data exploration by experts and nonexperts alike. Typically, tools like Skyline,4 which includes advanced visualization functionalities, also have a relatively steep learning curve so that the time required to master the tool is disproportionately large for simple validation tasks. We describe here MsViz, an open-source software tool which attempts to fill this gap in the spectrum of available proteomics software by providing a very intuitive web-based interface that allows easy but nevertheless comprehensive visualization of small and medium bottom-up proteomics data sets. MsViz is not a large scale proteomics tool and does not perform complex data processing automatically. Rather, it is intended as a supporting platform for manual validation and detailed examination of data mapped to individual proteins of interest. The tool links sequence coverage, peptide-spectrum matches (PSMs), MS2 spectra and precursor quantitation in a very direct manner using only three types of graphic panels in a single browser window. Using pop-up functions, MsViz summarizes the most important parameters for the evidence covering a sequence of interest and allows inspection of spectra, retention times (RT) and the extraction of precursor ion chromatograms (XICs) for specific peptides. Using MsViz, quantitative values can be easily exported as tables while extracted spectra and XICs can be saved as graphics.
We found that MsViz greatly facilitates and accelerates validation of bottom-up proteomics results, in particular those relating to the localization and quantitation of complex PTMs such as phosphorylation.
2. MATERIALS AND METHODS
2.1. Sample Preparation
OSBL8 was overexpressed as a C-terminally FLAG-tagged protein in HeLa cells and purified with anti-FLAG M2 affinity resin. To study differential phosphorylation under different cellular conditions, OSBL8-FLAG was immunoprecipitated from cells treated either with nocodazole (100 ng/mL, 18 h), to obtain a fraction of OSBL8 at mitosis (FLAG-M), or with DMSO, to purify OSBL8 from mainly interphase cells (FLAG-I). Prior to collection, DMSO-treated cell dishes were washed with PBS to detach the mitotic cells and so minimize their carryover. After affinity purification, elution from the beads with SDS, polyacrylamide gel electrophoresis and Coomassie blue staining, the OSBL8 band, representing approximately 2 μg of protein with an apparent molecular weight of 145 kDa, was digested in-gel with trypsin as described9 and the obtained peptides were analyzed by nanoLC-MS with DDA on a Q-Exactive Plus mass spectrometer (Thermo Scientific). MS1 and MS2 spectra were acquired with a resolution of 70 000 and 17 500, respectively (resolution at m/z = 200).
2.2. Protein Identification
Database searches were carried out with MASCOT 2.6 using a precursor mass tolerance of 10 ppm, a fragment tolerance of 0.02 Da, carbamidomethylation of cysteine as a fixed modification, and phosphorylation on serine and threonine, oxidation of methionine and N-terminal protein acetylation as variable modifications. MASCOT searches were carried out against the 2015.12 version of SWISSPROT, human taxonomy, containing 20194 entries, and included a decoy database search for false discovery rate (FDR) calculation. Using a significance threshold of 0.05 for PSMs, the FDR was below 2%. Similarly, standard parameters were used for MaxQuant (version 1.5.3.30), which applied precursor and fragment tolerances of 5 and 20 ppm, respectively. The database used for MaxQuant searches was the release UP000005640_9606 of the curated UNIPROT human proteome (http://www.uniprot.org/proteomes/), containing 21 038 entries (released October 2015). Identifications were filtered at 1% FDR against a decoy database search. Additionally, MaxQuant applied a 1% site FDR to all PTM-modified peptide matches as per default parameters. For both MASCOT and MaxQuant searches, up to two missed trypsin cleavages were allowed and the databases were supplemented with a set of common protein contaminants as distributed with the MaxQuant releases. The sequence of OSBL8 (UniProt AC: Q9BZF1) was identical in the databases used for the two search engines. MS raw data and search engine outputs have been deposited to the ProteomeXchange Consortium via the PRIDE10 partner repository with the data set identifier PXD006254.
2.3. Data Preparation and Loading
MsViz imports full MS1 and MS2 spectral data as well as protein identification results (Figure 1). Vendor-specific raw MS file formats have to be converted into a valid mzML format before uploading to MsViz. Thermo raw files were converted into mzML files using MsConvert from ProteoWizard version 3.0.6447 64-bit.11 The “peak picking” option of ProteoWizard was set to “true” for all MS levels (“1−”), so that the vendor-specific algorithm was used for centroiding, as recommended by ProteoWizard. No filter on peak intensity or S/N ratio was applied. All other ProteoWizard parameters were left at their default values.
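The conversion step described above can be scripted. The following is a minimal sketch, not the procedure used by the authors: it assumes that the `msconvert` executable is available on the PATH, and that the filter string `peakPicking true 1-` is the command-line equivalent of the settings described (this is the legacy spelling; more recent ProteoWizard builds may expect a different form). The function names and paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir, peak_filter="peakPicking true 1-"):
    """Return the argument list for converting one vendor file to centroided mzML."""
    return [
        "msconvert",
        str(raw_file),
        "--mzML",                  # write mzML output
        "--filter", peak_filter,   # assumed spelling of the peak picking filter
        "-o", str(out_dir),        # output directory
    ]


def convert(raw_file, out_dir):
    """Run msconvert; raises CalledProcessError if the conversion fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_msconvert_command(raw_file, out_dir), check=True)
```

Separating command construction from execution makes it possible to inspect or log the exact command before any file is converted.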
files were removed from the folder and the parent folder was ZIP compressed before submission to MsViz. Databases of human proteins used for Mascot and MaxQuant searches were downloaded from UniProt (www.uniprot.org) and uploaded to MsViz as FASTA files.
2.4. Software Availability
A public demo version of the MsViz server can be found at http://msviz-public.vital-it.ch. All data sets presented and discussed here were loaded to the public version. The current documentation for MsViz can be found at http://msviz-docs.vital-it.ch/. MsViz can be easily installed on a desktop or server computer using Docker images. The images and installation instructions can be found at https://github.com/vitalit-sib/msviz-docker. The source code is available under the GPL 2+ license and can be found at https://github.com/vitalit-sib/msviz-backend.
2.5. Implementation
MsViz is implemented using a three-tier architecture. The presentation tier was developed in Javascript using the AngularJS framework (https://angularjs.org). For the protein visualization the pviz library was used.12 For the visualization of the XIC and fragment spectra the fishtones library was used (https://github.com/Genentech/fishtones-js). The logic tier was written in the Scala programming language (http://www.scala-lang.org) using the Play framework. Scala is a mixed object oriented and functional programming language for the Java Virtual Machine. Its functional programming style simplifies parallelism for CPU intensive calculations. The data tier was implemented using MongoDB,13 which allows heterogeneous data to be handled and ensures a fast access for search requests.
Figure 1. Data preparation workflow for MsViz. Spectral data is converted into the generic mzML format using MsConvert (Proteowizard package). MASCOT results are exported as mzIdentML files (mzid) and uploaded together with their corresponding mzML files in the .zip format. For MaxQuant results the “txt” folder is ZIP compressed together with the corresponding mzML files.
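The packaging step of this workflow (identification results and mzML files compressed together) can be sketched as follows. This is an illustration under stated assumptions: the function name, the default suffixes and the choice to keep the parent folder name inside the archive are hypothetical, and the exact archive layout expected by MsViz should be checked against its documentation.

```python
import zipfile
from pathlib import Path


def package_search(folder, archive_path, suffixes=(".mzid", ".mzML")):
    """Zip the identification and spectral files found in `folder`.

    Returns the names of the files that were added to the archive.
    """
    folder = Path(folder)
    wanted = {s.lower() for s in suffixes}
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in wanted:
                # Keep the parent folder name as a prefix inside the archive (assumption).
                archive.write(path, arcname=f"{folder.name}/{path.name}")
                names.append(path.name)
    return names
```

Filtering by suffix leaves unrelated files out of the upload; for a MaxQuant search the same idea would apply to the required “txt” tables instead of .mzid files.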
Currently, outputs from Mascot (http://www.matrixscience.com/) and MaxQuant (www.maxquant.org) can be loaded into MsViz. Mascot results were exported as mzIdentML files with default parameters from the Mascot web interface. The mzIdentML files were moved to a new directory together with the corresponding spectral data in mzML format. The folder was ZIP compressed and then uploaded to MsViz. MaxQuant PSM data are imported into MsViz from the following tables: evidence.txt, msms.txt, parameters.txt, peptides.txt, proteinGroups.txt and summary.txt. The entire txt folder was copied to a new folder together with the corresponding spectral data in mzML format. Unused “txt”
3. RESULTS
3.1. Analysis of the Phosphorylation State of OSBL8
We tested the functionalities of MsViz to carry out an in-depth analysis of the phosphorylation state of oxysterol-binding protein-related protein 8 (OSBL8, UniProt AC: Q9BZF1), a
Figure 2. Graphical overview of the OSBL8 protein (Q9BZF1) sequence coverage in samples FLAG-I and FLAG-M (MASCOT search). The thickness of the green bars is a function of the number of PSMs matching the sequence region, while modification sites are labeled and shown as circles with size proportional to the number of PSMs matching a given position.
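The two quantities that drive this overview, the number of PSMs covering each residue and the number of PSMs supporting each modification site, can be computed as in the sketch below. It is an illustration only and not the MsViz implementation; the tuple layout `(start, end, modified_positions)` and the function names are assumptions.

```python
from collections import Counter


def coverage_depth(protein_length, psms):
    """Count PSMs covering each residue (1-based positions, inclusive ranges)."""
    depth = [0] * protein_length
    for start, end, _sites in psms:
        for pos in range(start, end + 1):
            depth[pos - 1] += 1
    return depth


def site_counts(psms):
    """Count PSMs supporting each modified position."""
    counts = Counter()
    for _start, _end, sites in psms:
        counts.update(sites)
    return counts


# Hypothetical PSMs given as (start, end, modified positions) on the protein sequence.
example_psms = [(58, 76, [65]), (58, 76, [68]), (58, 76, [63, 65]), (50, 76, [54])]
depth = coverage_depth(100, example_psms)
sites = site_counts(example_psms)
```

In a display of this kind, `depth` would determine the bar thickness along the sequence and `sites` the circle size at each modified position.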
Figure 3. Example of MsViz display for the peptide DLHQPSLSPASPHSQGFER (residues 58−76 in OSBL8 (Q9BZF1) protein) from a MASCOT output. After zooming into the protein sequence, all PSMs covering the region of interest can be displayed by clicking a green bar or a blue circle. A tooltip can be used to show PSM details (A). Confidently localized phosphosites are shown as red circles in the PSM bars. A gray circle indicates a site of phosphorylation identified with a lower confidence for the PSM considered, while orange circles are used to show alternate phosphosites
identified with the same localization score as the best one. An annotated MS2 spectrum, together with XIC of its precursor m/z in all samples, is shown when clicking on PSMs (here scan numbers 7921 and 8155) (B). A red bar in the XIC shows the retention time of the selected PSM, while green bars correspond to other PSMs with the same precursor m/z and purple bars to other PSMs with the same precursor mass (i.e., with a different charge). Additional gray bars show unidentified or subthreshold PSMs with the same precursor m/z. After a zoom into a MS2 spectrum or a XIC trace, a link button allows the alignment of all other annotated spectra or XIC traces currently shown by MsViz. Intensities of selected precursor masses in the XIC (green area) can be exported to text (.csv) file via a basket button.
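The XIC and its quantitation described in this legend can be illustrated with a short sketch. It is not the algorithm used by MsViz, which is not detailed here: the 10 ppm extraction tolerance, the data layout (a list of `(retention_time, [(mz, intensity), ...])` MS1 scans) and all function names are assumptions. The helper `mz_for_charge` shows why PSMs with a different charge can share the same precursor mass.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_for_charge(neutral_mass, charge):
    """m/z observed for a given neutral precursor mass and charge state."""
    return (neutral_mass + charge * PROTON) / charge


def extract_xic(ms1_scans, target_mz, tol_ppm=10.0):
    """Return (retention_time, intensity) pairs for peaks within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm / 1e6
    trace = []
    for rt, peaks in ms1_scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        trace.append((rt, intensity))
    return trace


def quantify(trace, rt_start, rt_end):
    """Peak height and trapezoidal area of the trace between two retention times."""
    window = [(rt, i) for rt, i in trace if rt_start <= rt <= rt_end]
    if not window:
        return 0.0, 0.0
    height = max(i for _, i in window)
    area = sum((r2 - r1) * (i1 + i2) / 2
               for (r1, i1), (r2, i2) in zip(window, window[1:]))
    return height, area
```

The retention time window passed to `quantify` plays the role of the region selected by the user on the XIC trace; both height and area are returned because either can serve as the exported intensity.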
attributes, i.e., scan number, precursor m/z, charge, peptide sequence, retention time and score (Figure 3A). When one PSM is clicked, the annotated MS2 spectrum and the XIC of the selected m/z in all samples are displayed in the right half of the window. The time of acquisition of the PSM is marked on the XIC trace by a red bar, along with those of other spectra taken for the same m/z or for other m/z values corresponding to the same total precursor mass (see legend to Figure 3B). Contextual pop-ups display additional information such as precursor m/z, charge state and PTM localization probability when the cursor is moved over the bars. It is possible to zoom in and align multiple XIC traces. With a mouse drag the user can readily extract and export quantitative data on selected XIC peaks (Figure 3B, green area) to a “results basket” that can be saved at the end of the analysis as a text (.csv) file. The content of the basket is the main output of MsViz analyses and only contains values for those peptides specifically selected by the user. It is thus important to select carefully the modified peptide species to quantitate across samples, and to include some unmodified peptides that will be used for normalizing total levels of the protein of interest. We used MsViz to compare phosphorylation sites of the OSBL8 protein in FLAG-I and FLAG-M samples. As an example, PSMs were matched by MASCOT to phosphorylated species on both Ser 65 and Ser 68. As residues 65 and 68 are in the same tryptic peptide, the first question was whether both positions were really phosphorylated (hence two different phosphoforms existed) or whether one of the two localizations was an artifact of spectrum matching. Clicking on the two monophosphorylated PSMs (scan nr 7921 and 8155) with distinct assignments of the phosphate group displayed the MS2 spectra, and their inspection allowed us to validate the localization assigned by the search engine to both sites.
Moreover, two distinct peaks were visible in the XIC of the precursor, the first one corresponding to DLHQPSLS*PASPHSQGFER (pS65), the second to DLHQPSLSPAS*PHSQGFER (pS68) (Figure 3B) (residues 58−76). Quantitation of the XIC peaks was directly possible from the same interface, showing that the first eluting phosphopeptide (pS65) was much more abundant in the FLAG-I sample (>10-fold difference), while the second one (pS68) was still more abundant in this sample, but with a lower fold change (∼2×). Notably, a third phosphorylation site (pS63) was identified in the same peptide, but it only appeared in multiphosphorylated species, suggesting that it was the result of sequential modifications. With MsViz it was then straightforward to examine these multiphosphorylated PSMs in the same view, and to show that the doubly phosphorylated peptide pS63, pS65 was also more abundant in the I sample (∼3×), whereas the triply phosphorylated form (pS63, pS65, pS68) was only observed in the M sample (see Figure 6). In order to take into account possible differences in the amounts of loaded peptides between samples, we also measured XIC intensities of two OSBL8 “reference” peptides, i.e., unmodified peptides not
member of the oxysterol binding protein (OSBP)-related protein (ORP) family.14 OSBL8 is an endoplasmic reticulum (ER) and nuclear envelope bound protein that has been described to take part in a variety of cellular functions.15−17 To understand if and how OSBL8 is regulated by PTMs, we compared the data obtained for OSBL8 purified from interphase (FLAG-I) vs mitotic (FLAG-M) cells, to identify and accurately localize phosphorylation sites and quantitate differences in levels of phosphorylation linked to cell cycle progression.
3.2. MsViz Presents a Comprehensive View on MS Data Matching Individual Proteins
Data-dependent tandem MS data were submitted for protein identification using both MASCOT and MaxQuant (Supplementary Tables S1 and S2). Mascot .dat files containing protein identifications were converted to mzIdentML files (.mzid) and raw MS data files to .mzML format to import identifications and MS intensities into MsViz. MaxQuant result files in .txt format were parsed directly by MsViz. When searches for the two samples FLAG-I (SearchID: MSC_FLAG-I_7557) and FLAG-M (SearchID: MSC_FLAGM_7558) are imported into MsViz, they become visible in the “Searches” page and can be selected for comparison (http://msviz-public.vital-it.ch). A list of protein IDs is displayed, aggregating identifications, including numbers of PSMs and scores, from all chosen database searches. Selecting the protein of interest leads to the main MsViz view (Figure 2), which displays the protein sequence with a summary of the coverage observed in the different samples considered (green bars with variable thickness as a function of the number of PSMs). Strong differences in sequence coverage between samples can be readily identified in this view. A positional information display and a magnifying glass help in navigating the sequence and identifying regions of interest in which to zoom in. More importantly, a pull-down menu allows the selection of PTMs that were considered in the database search and can now be displayed in MsViz. For example, if phosphorylation is selected, modification sites are labeled and shown as circles with size proportional to the number of PSMs matching a given position (Figure 2). Thus, MsViz exploits the PTM localization information provided by the search engine while at the same time summarizing information from all PSMs to give a preliminary overview of the most probable modification sites (blue circles in Figure 2) and quantitative differences between samples.
3.3. MsViz Enables Detailed Examination of Individual PSMs and XIC-Based Quantitation of Precursor Intensities
Clicking on a modification site or anywhere on the sequence displays all PSMs covering that particular stretch in a stacked bar format, with modified residues color- and symbol-coded according to the probability of localization determined by the search engine. Hovering over a PSM of interest with the mouse displays a tooltip box containing all PSM
Figure 4. MsViz analysis of the doubly phosphorylated peptide GYSSPEPDIQDSSGSEAQSVKPSTR (residues 796−820) of OSBL8 (Q9BZF1). Annotated MS2 spectra and XIC of the precursor m/z allow the assignment of the different isomers to their respective elution peaks. In the window of annotated spectra, there is the possibility of showing the Mascot identification assignment and localization score of lower rank matches. Assignment of the different positional isomers of phosphosites could be manually confirmed by zooming into the annotated spectra to the region containing the discriminating fragment ions y14++, y15++, b6 and their corresponding phospho neutral losses.
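The reason why ions such as b6, y14 and y15 discriminate between positional isomers can be checked with a small calculation of theoretical fragment m/z values. The sketch below is illustrative only: it uses standard monoisotopic residue masses, ignores neutral losses, and counts phosphosite positions from the peptide N-terminus (so that pS798, pS799, pS807 and pS808 correspond to positions 3, 4, 12 and 13 of the peptide starting at residue 796). Function names are hypothetical.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO, WATER, PROTON = 79.96633, 18.01056, 1.007276


def fragment_mz(sequence, phospho_sites, ion, index, charge=1):
    """m/z of b<index> or y<index>; phospho_sites are 1-based peptide positions."""
    n = len(sequence)
    positions = range(1, index + 1) if ion == "b" else range(n - index + 1, n + 1)
    mass = sum(MONO[sequence[p - 1]] for p in positions)
    mass += PHOSPHO * sum(1 for p in positions if p in phospho_sites)
    if ion == "y":
        mass += WATER  # y ions retain the C-terminal OH plus one extra hydrogen
    return (mass + charge * PROTON) / charge


def discriminating_ions(sequence, sites_a, sites_b, charge=1, min_delta=0.01):
    """List (label, m/z for isomer A, m/z for isomer B) for ions that differ."""
    out = []
    for ion in ("b", "y"):
        for index in range(1, len(sequence)):
            mz_a = fragment_mz(sequence, sites_a, ion, index, charge)
            mz_b = fragment_mz(sequence, sites_b, ion, index, charge)
            if abs(mz_a - mz_b) > min_delta:
                out.append((f"{ion}{index}", mz_a, mz_b))
    return out


PEPTIDE = "GYSSPEPDIQDSSGSEAQSVKPSTR"
ISOMER_A = {4, 12}    # pS799 and pS807
ISOMER_B = {3, 4}     # pS798 and pS799
ISOMER_C = {12, 13}   # pS807 and pS808
```

For these isomers, b6 spans positions 3 and 4 and y14 spans positions 12 and 13, so each carries a different number of phosphate groups depending on the isomer, whereas fragments that contain all candidate sites, or none, are uninformative.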
covering the protein sequence containing phosphorylations. Comparison of these reference peptide intensities showed very similar amounts of OSBL8 protein between FLAG-M and FLAG-I samples (see Figure 6).
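One possible way of using such reference peptides is sketched below: a normalization factor is derived from the unmodified peptides and applied to the fold change of a phosphopeptide. The normalization formula is not specified in the text, so the geometric mean of the reference ratios is an assumption, as are the function names and all intensity values, which are invented for illustration.

```python
import math


def normalization_factor(ref_a, ref_b):
    """Geometric-mean ratio (sample A / sample B) over shared reference peptides."""
    shared = [p for p in ref_a if p in ref_b]
    log_ratios = [math.log(ref_a[p] / ref_b[p]) for p in shared]
    return math.exp(sum(log_ratios) / len(log_ratios))


def normalized_fold_change(intensity_a, intensity_b, factor):
    """Fold change A/B after correcting for the difference in protein amount."""
    return (intensity_a / intensity_b) / factor


# Hypothetical XIC intensities of two reference peptides in two samples.
ref_sample_a = {"ref_pept_1": 2.0e8, "ref_pept_2": 8.0e7}
ref_sample_b = {"ref_pept_1": 1.0e8, "ref_pept_2": 4.0e7}
factor = normalization_factor(ref_sample_a, ref_sample_b)
fold = normalized_fold_change(1.0e7, 1.0e6, factor)
```

When the reference peptides indicate similar protein amounts in both samples, as reported here, the factor is close to 1 and the raw fold changes can be interpreted directly.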
Figure 5. MsViz display of the OSBL8 (Q9BZF1) peptide EAYPTPTKDLHQPSLSPASPHSQGFER (residues 50−76) in the sample I (Interphase) analyzed by Mascot (MSC_FLAG-I-7557) or MaxQuant (MXQ_I-7557). The monophosphorylated form of the peptide was identified by Mascot with the same localization score for the phosphorylation on Thr 54 or Thr 56, while MaxQuant confidently localized the modification on Thr 54.
Figure 6. Comparison of phosphopeptide intensities (in log10 scale) of OSBL8 (Q9BZF1) in interphase (FLAG-I) vs mitosis (FLAG-M) samples. This graph was based on the intensity data (peak height) contained in a text (.csv) file exported from MsViz. Results from quantitation can be found in Supplementary Table S3. Intensities of two unmodified peptides (ref. pept.) were also extracted to evaluate the differences in OSBL8 amounts between the samples.
and pS799, scan nr 8938) and GYSSPEPDIQDS*S*GSEAQSVKPSTR (pS807 and pS808, scan nr 8652) to three distinct peaks in the XIC of FLAG-M and FLAG-I samples (Figure 4). Furthermore, zooming into and aligning the annotated MS/MS spectra allowed manual confirmation of the assignment of the different positional isomers of phosphosites, through the presence of the specific fragment ions y14++, y15++, b6 and their corresponding phospho group neutral losses. We could thus highlight the different quantitative behavior of the three phosphoforms between FLAG-I and FLAG-M samples, with
In another example, we used MsViz to get a more detailed view of OSBL8 phosphorylation in a stretch of sequence covered by the peptide GYSSPEPDIQDSSGSEAQSVKPSTR (residues 796−820). This peptide was mostly identified in multiphosphorylated forms, and the presence of 8 potential sites of phosphorylation made their localization and the quantification of the different positional isomers quite challenging. With MsViz, however, we could for example quickly assign the three doubly phosphorylated isomers GYSS*PEPDIQDS*SGSEAQSVKPSTR (pS799 and pS807, scan nr 8521), GYS*S*PEPDIQDSSGSEAQSVKPSTR (pS798
PSMs, extract XICs and show sequence coverages for complex experiments.18 However, it appears that the viewer itself was not meant to be used for interactive data analysis, since it is not possible to intervene manually to, e.g., modify the integration of an XIC peak. Also, the possibility of displaying sequence coverages and precursor intensities in different samples side-by-side is limited or nonexistent in the MaxQuant Viewer, and comparison of two or more MS2 spectra in the same view is, to the best of our knowledge, not possible. On the other hand, MaxQuant automatically carries out an aggregated quantification of phosphorylation at the site level, which is not possible with MsViz. Another very popular and powerful tool is Skyline.4 Due to its focus on targeted proteomics, Skyline has very sophisticated functions for extracting and integrating XIC traces for identified precursors. However, it cannot display an overview of protein sequence coverage by PSMs for several samples simultaneously. More importantly, for modified peptides Skyline groups (and therefore quantitates) precursors based on the site localization provided by the search engine. While this is mostly correct, in case of localization ambiguities (which often arise, for example, with di- and triphosphopeptides) this can lead to quantitation artifacts. MsViz, on the other hand, does not further process the PSMs produced by the search engine and purely provides a visualization platform, leaving to the operator the choice of which peak(s) to integrate for which precursor(s). Clearly, the differences between MsViz on the one hand and MaxQuant and Skyline on the other reflect a fundamental difference in the scope for which such tools were developed, i.e., low throughput detailed analysis vs high throughput handling of large experiments/data sets.
At the other end of the spectrum, several software tools have been specifically developed for manual and semiautomated validation of individual PSMs, de novo sequencing results and PTMs.19−21 Again, these tools are highly focused on the identification step and/or the validation of PTMs but usually do not provide a link to XICs and quantitative data. PeptideShaker22 is a sophisticated, noncommercial suite of tools for high throughput protein identification using numerous search engines, which provides meta-scoring and advanced options for result visualization, but is not explicitly focused on PTMs or on their quantitation. Other recently published free tools, Peptigram23 and X!TandemPipeline,24 are closer to the range of applications covered by MsViz but again provide only some of the functionalities of our tool (while at the same time satisfying other, different needs). Some popular commercial tools do have comprehensive, well developed GUIs for data display. Proteome Discoverer, which incorporates SEQUEST and can link to other search engines, can display all PSMs and show XICs for selected matches, with the additional bonus of automatic quantitation and flexibility in upstream data processing. Obviously the scope of such a large commercial package is much broader than that of MsViz. And yet even Proteome Discoverer does not offer some of the key visualization features of MsViz, such as the ability to display multiple spectra and XICs side-by-side and the graphical mapping on the protein sequence of all PSMs obtained in multiple samples. Similar considerations apply to PEAKS (www.bioinfor.com).
a higher abundance of the pS807,pS808 peptide in the interphase sample, and a parallel decrease of the other isomers. With MsViz it was also possible to compare the results obtained with the two supported search engines, either for confirming PTM identification, or for resolving localization ambiguities. For example, the OSBL8 peptide EAYPTPTKDLHQPSLSPASPHSQGFER (residues 50−76) was identified by Mascot with a phosphorylation on Thr 54 or Thr 56 in the FLAG-I sample, but without a clear discrimination between the two positions. Here, MaxQuant gave a clearer result, localizing the phosphorylation on Thr 54 with a good score (Figure 5), a result that was validated by manual inspection.
3.4. OSBL8 Results
Overall, the analysis of phosphosites on OSBL8 by Mascot identified 18 confident phosphosites (localization score >70%: T13, S14, S32, T39, S63, S65, S68, T235, S305, S314, T660, T774, S778, S798, S799, S807, S808, S832) and 2 lower confidence ones (localization score 50−70%: S364, S810), while highlighting 3 potential sites with ambiguous localization scores (T54 or T56, S88 or S89, S91 or S93). As of March 31st 2017, four of the confident phosphosites (T235, T660, S778, S832) and one ambiguous phosphorylation (S88 or S89) were not listed in the Phosphosite.org (considering sites with 2 or more PSMs), Swissprot or Phospho.ELM databases (Supplementary Table S3). The comparison of phosphopeptide intensities between FLAG-I and FLAG-M samples showed an overall increase of phosphorylation in mitosis (Figure 6). In regions of the protein containing several phosphosites, a higher intensity of the multiply phosphorylated forms of the peptides could also be observed in mitosis, in parallel with a decrease of the less phosphorylated species (see for example residues 63−68). A detailed look at the results nevertheless revealed a more complex pattern of site-specific and domain-specific changes of phosphorylation stoichiometry between interphase and mitosis. For example, the predicted disordered region 799−835 showed almost invariant levels of modification; a majority of other sites underwent partial changes and a few sites (namely Thr774 and Thr778) appeared to be uniquely phosphorylated during mitosis. While the biological interpretation of OSBL8 phosphorylation levels is beyond the scope of this work, we found MsViz to be especially useful for unraveling complex situations involving multiphosphorylated peptides with quantitative changes. The ability to locate and accurately quantitate phosphosites was instrumental for all attempts to infer the possible protein kinases responsible for the modification(s).
3.5. Comparison with Other Tools and Unique Features of MsViz
A bewildering variety of software tools have been created (a list can be found on https://omictools.com/ or https://bio.tools/) that handle MS-based proteomics data for postprocessing and for extracting the most diverse sorts of information. The existence of so many tools underscores (i) the complexity and variety of the data, (ii) the need to tailor data analysis to the special needs of the investigator, and (iii) the lack of perfect, “universal” software solutions. In particular, as mentioned previously, visualization for manual data inspection has rarely been a central concern of software developers. We performed a quick survey of mainstream and/or recent noncommercial software packages to compare their data visualization capabilities with those present in MsViz. MaxQuant, for example, possesses a sophisticated viewer that allows to display
4. CONCLUSIONS AND PERSPECTIVES
After surveying the landscape of available proteomics software tools, we felt that there was a need for a simple tool for data visualization able to extract and display a selected set of parameters for small and medium data sets. Compared to all the other tools mentioned so far, MsViz has two more specific advantages. First, it is web-based and thus platform- and workstation-independent, and this makes it ideal for results sharing and multioperator work, namely within core proteomics facilities and between facilities and their customers. Second, once data are imported, using MsViz does not require any significant training, at least for users familiar with the structure of proteomics data. We thus believe that there is a category of applications for which MsViz can significantly facilitate the work of the data analyst and thus lead to better and faster extraction of relevant information.

■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00194.
Table S1: Mascot searches output tables for FLAG-I and FLAG-M samples (XLSX)
Table S2: MaxQuant searches output tables for FLAG-I and FLAG-M samples (XLSX)
Table S3: MsViz phosphorylation results for OSBL8 (XLSX)

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]. Phone: +41 21 692 39 47. Fax: +41 21 692 39 05.
ORCID
Manfredo Quadroni: 0000-0002-2720-4084
Author Contributions
#T.M.-C. and R.M. contributed equally to this work.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
This work was funded by internal funding from the University of Lausanne, Faculty of Biology and Medicine. This work was supported through core funding from the University of Lausanne (RM, PW, TP, MQ) and the Swiss Institute of Bioinformatics (TMC, RM, AM, IX). We would like to thank the VITAL-IT support team at the Swiss Institute of Bioinformatics for help with implementation of the public server version. Thanks to Alexandra Potts and Jachen Barblan at the Protein Analysis Facility for technical help and sample and data processing.

■ REFERENCES
(1) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976−989.
(2) Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S.; Perkins, D. N. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551−3567.
(3) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011, 10, 1794−1805.
(4) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966−968.
(5) Bantscheff, M.; Schirle, M. Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 2007, 389, 1017−1031.
(6) Fermin, D.; Walmsley, S. J.; Gingras, A.-C.; Choi, H.; Nesvizhskii, A. I. LuciPHOr: Algorithm for Phosphorylation Site Localization with False Localization Rate Estimation Using Modified Target-Decoy Approach. Mol. Cell. Proteomics 2013, 12, 3409−3419.
(7) Leidecker, O.; Bonfiglio, J. J.; Colby, T.; Zhang, Q.; Atanassov, I.; Zaja, R.; Palazzo, L.; Stockum, A.; Ahel, I.; Matic, I. Serine is a new target residue for endogenous ADP-ribosylation on histones. Nat. Chem. Biol. 2016, 12, 998−1000.
(8) Bhogaraju, S.; Kalayil, S.; Liu, Y.; Bonn, F.; Colby, T.; Matic, I.; Dikic, I. Phosphoribosylation of Ubiquitin Promotes Serine Ubiquitination and Impairs Conventional Ubiquitination. Cell 2016, 167, 1636−1649.e13.
(9) Shevchenko, A.; Tomas, H.; Havlis, J.; Olsen, J. V.; Mann, M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 2006, 1, 2856−2860.
(10) Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32, 223−226.
(11) Chambers, M. C.; MacLean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012, 30, 918−920.
(12) Mukhyala, K.; Masselot, A. Visualization of protein sequence features using JavaScript and SVG with pViz.js. Bioinformatics 2014, 30, 3408−3409.
(13) Chodorow, K.; Dirolf, M. MongoDB: The Definitive Guide, 2nd ed.; Sebastopol, CA, 2013.
(14) Kentala, H.; Weber-Boyvat, M.; Olkkonen, V. M. OSBP-Related Protein Family: Mediators of Lipid Transport and Signaling at Membrane Contact Sites. Int. Rev. Cell Mol. Biol. 2016, 321, 299−340.
(15) Yan, D.; Mäyränpää, M. I.; Wong, J.; Perttilä, J.; Lehto, M.; Jauhiainen, M.; Kovanen, P. T.; Ehnholm, C.; Brown, A. J.; Olkkonen, V. M. OSBP-related protein 8 (ORP8) suppresses ABCA1 expression and cholesterol efflux from macrophages. J. Biol. Chem. 2008, 283, 332−340.
(16) Zhou, T.; Li, S.; Zhong, W.; Vihervaara, T.; Béaslas, O.; Perttilä, J.; Luo, W.; Jiang, Y.; Lehto, M.; Olkkonen, V. M.; et al. OSBP-related protein 8 (ORP8) regulates plasma and liver tissue lipid levels and interacts with the nucleoporin Nup62. PLoS One 2011, 6, e21078.
(17) Galmes, R.; Houcine, A.; van Vliet, A. R.; Agostinis, P.; Jackson, C. L.; Giordano, F. ORP5/ORP8 localize to endoplasmic reticulum-mitochondria contacts and are involved in mitochondrial function. EMBO Rep. 2016, 17, 800−810.
(18) Tyanova, S.; Temu, T.; Carlson, A.; Sinitcyn, P.; Mann, M.; Cox, J. Visualization of LC-MS/MS proteomics data in MaxQuant. Proteomics 2015, 15, 1453−1456.
(19) Curran, T. G.; Bryson, B. D.; Reigelhaupt, M.; Johnson, H.; White, F. M. Computer aided manual validation of mass spectrometry-based proteomic data. Methods 2013, 61, 219−226.
(20) Lahesmaa-Korpinen, A.-M.; Carlson, S. M.; White, F. M.; Hautaniemi, S. Integrated data management and validation platform for phosphorylated tandem mass spectrometry data. Proteomics 2010, 10, 3515−3524.
(21) Helsens, K.; Timmerman, E.; Vandekerckhove, J.; Gevaert, K.; Martens, L. Peptizer, a tool for assessing false positive peptide identifications and manually validating selected results. Mol. Cell. Proteomics 2008, 7, 2364−2372.
(22) Vaudel, M.; Burkhart, J. M.; Zahedi, R. P.; Oveland, E.; Berven, F. S.; Sickmann, A.; Martens, L.; Barsnes, H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015, 33, 22−24.
(23) Manguy, J.; Jehl, P.; Dillon, E. T.; Davey, N. E.; Shields, D. C.; Holton, T. A. Peptigram: A Web-Based Application for Peptidomics Data Visualization. J. Proteome Res. 2017, 16, 712−719.
(24) Langella, O.; Valot, B.; Balliau, T.; Blein-Nicolas, M.; Bonhomme, L.; Zivy, M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J. Proteome Res. 2017, 16, 494−503.

DOI: 10.1021/acs.jproteome.7b00194 J. Proteome Res. 2017, 16, 3092−3101