
Technical Note

MsViz, a graphical software tool for in-depth manual validation and quantitation of post-translational modifications
Trinidad Martin Campos, Roman Mylonas, Alexandre Masselot, Patrice Waridel, Tanja Petricevic, Ioannis Xenarios, and Manfredo Quadroni
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00194 • Publication Date (Web): 21 Jun 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

MsViz, a graphical software tool for in-depth manual validation and quantitation of post-translational modifications

Trinidad Martín-Campos1,#, Roman Mylonas1,2,#, Alexandre Masselot1, Patrice Waridel2, Tanja Petricevic3, Ioannis Xenarios1, Manfredo Quadroni2,*

1 Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland

2 Protein Analysis Facility, University of Lausanne, CH-1015 Lausanne, Switzerland

3 Institute of Pathology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland

# These authors contributed equally to this work

*To whom correspondence should be addressed E-mail: [email protected]. Phone: + 41 21 692 39 47. Fax: + 41 21 692 39 05.

Abstract

Mass spectrometry (MS) has become the tool of choice for the large-scale identification and quantitation of proteins and their post-translational modifications (PTMs). This development has been enabled by powerful software packages for the automated analysis of MS data. While data on PTMs of thousands of proteins can nowadays be readily obtained, fully deciphering the complexity and combinatorics of modification patterns even on a single protein often remains challenging. Moreover, functional investigation of PTMs on a protein of interest requires validation of the localization and accurate quantitation of its changes across several conditions, tasks that often still require human evaluation. Software tools for large-scale analyses are highly efficient but are rarely conceived for interactive, in-depth exploration of data on individual proteins. We here describe MsViz, a web-based and interactive software tool that supports manual validation of PTMs and their relative quantitation in small- and medium-size experiments. The tool displays sequence coverage information, peptide-spectrum matches, tandem MS spectra and extracted ion chromatograms through a single, highly intuitive interface. We found that MsViz greatly facilitates manual data inspection to validate PTM location and quantitate modified species across multiple samples.


Keywords: Tandem mass spectrometry; data-dependent acquisition; post-translational modification; phosphorylation; site localization; extracted ion chromatograms; data visualization; manual validation

Introduction

Modern proteomics is largely based on a bottom-up approach combining protease digestion with data-dependent acquisition (DDA) of tandem mass spectra of peptide precursors. The fragmentation patterns obtained by DDA are matched to sequence databases, leading to identification of peptide sequences and a more or less complete mapping of protein sequences.

While rich in structural information, processed bottom-up proteomics data is inherently complex, containing multiple levels of information, e.g. MS2 spectra, precursor intensities, LC retention times, matching scores and protein positional information. Further intrinsic layers of complexity are added by the imperfect nature of protease digestion, LC separation and MS detection, which combine to produce overlapping and redundant peptide-spectrum matches (PSMs). On top of all this, the presence of both biological and artefactual post-translational modifications (PTMs) increases sample and data complexity even further. Considering that biological experimentation typically requires the comparison of data on multiple samples, it is easy to understand that the representation of large DDA datasets is a very challenging task. Considerable effort in the last three decades has gone into the development of software tools for automated analysis of large-scale bottom-up proteomics datasets to efficiently extract protein identifications and quantitation of both total protein levels and biologically relevant PTMs. Software packages built around database search engines such as Sequest 1, MASCOT 2, MaxQuant/Andromeda 3 and many others have powered the field and enabled experiments that would otherwise simply be unfeasible. However, while high-performance, large-scale tools can manage impressive amounts of data, they tend to offer somewhat limited functionalities for visualization and manual inspection of the obtained results. Such tasks are often performed using third-party software conceived for general purposes (e.g. Scaffold™) or built ad hoc for specific types of applications (e.g. Skyline 4 for targeted quantitative proteomics).


Up to now, understandably, most of the efforts of developers have been directed towards increasing the performance and robustness of the tools developed, in order to cope with the increasing mass of data produced by more powerful and sensitive MS instruments used for increasingly sophisticated experiments. The second major focus has thus been to streamline and automate increasingly complex tasks to save investigators' time and labor. Major efforts were also required to implement, within comprehensive software packages, different methods of quantitation based on either isotope labelling or label-free approaches 5 that were developed in parallel. In this dynamic but also very heterogeneous experimental and analytical landscape, data representation for manual expert inspection has been somewhat neglected, most likely because in very large datasets the amount of information simply appears overwhelming for a human operator.

While large-scale protein identification and data analysis tools produce highly reliable results, there are cases in which manual validation of results on individual proteins or PTMs is needed. This is especially true for studies centered on PTMs such as phosphorylation, in which the exact localization of the modification site and the quantitation of changes among conditions are essential for data interpretation and assessment. At the same time, the exact localization of phosphorylation sites based on phosphopeptide CID spectra can be a challenging task 6, and in a significant fraction of cases the search engine(s) fail to localize a site exactly within a peptide, thus producing ambiguous results. Before embarking on time-consuming functional studies on individual protein targets, biologists typically require manual validation of a particular finding to obtain a high degree of confidence. Furthermore, manual expert inspection of data still has the power to reveal new and unexpected phenomena, as has been recently demonstrated 7,8.

Being frequently confronted with the question of manual expert validation, we attempted to use available software platforms and came to the conclusion that, while most offer sophisticated functions, none is able to display all necessary information in a truly convenient and efficient manner so as to maximally facilitate data exploration by experts and non-experts alike. Typically, tools like Skyline 4, which includes advanced visualization functionalities, also have a relatively steep learning curve, so that the time required to master the tool is disproportionately large for simple validation tasks. We describe here, and release as open source, MsViz, a software tool which attempts to fill this gap in the spectrum of available proteomics software by providing a very intuitive web-based interface that allows easy but nevertheless comprehensive visualization of small and medium bottom-up proteomics datasets. MsViz is not a large-scale proteomics tool and does not perform complex data processing automatically. Rather, it is intended as a supporting platform for manual validation and detailed examination of data mapped to individual proteins of interest. The tool links sequence coverage, peptide-spectrum matches (PSMs), MS2 spectra and precursor quantitation in a very direct manner using only three types of graphic panels in a single browser window. Using pop-up functions, MsViz summarizes the most important parameters for the evidence covering a sequence of interest and allows inspection of spectra, retention times (RT) and the extraction of precursor ion chromatograms (XICs) for specific peptides. Using MsViz, quantitative values can be easily exported as tables while extracted spectra and XICs can be saved as graphics. We found that MsViz greatly facilitates and accelerates validation of bottom-up proteomics results, especially those relating to the localization and quantitation of complex PTMs, phosphorylation in particular.

2. Materials and Methods

2.1 Sample preparation

OSBL8 was overexpressed as a C-terminally FLAG-tagged protein in HeLa cells and purified with anti-Flag M2 affinity resin. To study differential phosphorylation in different cellular conditions, OSBL8-Flag was immunoprecipitated from cells treated with nocodazole (100 ng/ml, 18 h) to obtain a fraction of OSBL8 at mitosis (FLAG-M), or with DMSO to purify OSBL8 from mainly interphase cells (FLAG-I). Prior to collection, DMSO-treated cell dishes were washed with PBS to detach the mitotic cells so as to minimize their carry-over. After affinity purification, elution from the beads with SDS, polyacrylamide gel electrophoresis and Coomassie blue staining, the OSBL8 band, representing approx. 2 µg of protein with an apparent molecular weight of 145 kDa, was in-gel digested with trypsin as described 9 and the obtained peptides were analysed by nanoLC-MS with DDA on a Q-Exactive Plus mass spectrometer (Thermo Scientific). MS1 and MS2 spectra were acquired with a resolution of 70,000 and 17,500, respectively (resolution at m/z = 200).

2.2 Protein identification


Database searches were carried out with MASCOT 2.6 using a precursor mass tolerance of 10 ppm, a fragment tolerance of 0.02 Da, carbamidomethylation of cysteine as a fixed modification, and phosphorylation of serine and threonine, oxidation of methionine and N-terminal protein acetylation as variable modifications. MASCOT searches were carried out against the 2015.12 version of SWISSPROT, human taxonomy, containing 20194 entries, and included a decoy database search for false discovery rate (FDR) calculation. Using a significance threshold of 0.05 for PSMs, the FDR was below 2%. Similarly, standard parameters were used for MaxQuant (version 1.5.3.30), which applied precursor and fragment tolerances of 5 and 20 ppm, respectively, after recalibration. The database used for MaxQuant searches was the release UP000005640_9606 of the curated UniProt human proteome (http://www.uniprot.org/proteomes/), containing 21038 entries (released October 2015). Identifications were filtered at 1% FDR against a decoy database search. Additionally, MaxQuant applied a 1% site FDR to all PTM-modified peptide matches as per default parameters. For both MASCOT and MaxQuant searches, two possible trypsin missed cleavages were considered and the databases were supplemented with a set of common protein contaminants as distributed with the MaxQuant releases. The sequence of OSBL8 (UniProt AC: Q9BZF1) was identical in the databases used for the two search engines. MS raw data and search engine outputs have been deposited to the ProteomeXchange Consortium via the PRIDE 10 partner repository with the dataset identifier PXD006254.
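To make the search tolerances quoted above more concrete, the following minimal sketch converts between relative (ppm) and absolute (Da) mass tolerances. The function names are illustrative and are not part of MASCOT, MaxQuant or MsViz; the m/z values in the usage lines are arbitrary examples.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute tolerance (in m/z units) that corresponds to `ppm` at a given m/z."""
    return mz * ppm * 1e-6


def da_to_ppm(mz: float, delta: float) -> float:
    """Relative tolerance (in ppm) that corresponds to an absolute `delta` at a given m/z."""
    return delta / mz * 1e6


def mass_window(mz: float, ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds accepted for a precursor at `mz` with a +/- ppm tolerance."""
    delta = ppm_to_da(mz, ppm)
    return mz - delta, mz + delta


if __name__ == "__main__":
    # A 10 ppm precursor tolerance at an arbitrary m/z of 800
    print(mass_window(800.0, 10.0))
    # A 0.02 Da fragment tolerance expressed in ppm at an arbitrary fragment m/z of 1000
    print(da_to_ppm(1000.0, 0.02))
```

Because a ppm tolerance scales with m/z while a Da tolerance does not, the 0.02 Da fragment tolerance used for MASCOT and the 20 ppm fragment tolerance applied by MaxQuant coincide only near m/z 1000.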

2.3 Data preparation and loading

MsViz imports full MS1 and MS2 spectral data as well as protein identification results (Figure 1). Vendor-specific raw MS file formats have to be converted into a valid mzML format before uploading to MsViz. Thermo raw files were converted into mzML files using MsConvert from ProteoWizard version 3.0.6447 64-bit 11. The “peak picking” option of ProteoWizard was set to “true” for all MS levels (“1-”), using the vendor-specific algorithm recommended by ProteoWizard for centroiding. No filter on peak intensity or S/N ratio was applied. All other parameters of ProteoWizard were left at their default values. Currently, outputs from Mascot (http://www.matrixscience.com/) and MaxQuant (www.maxquant.org) can be loaded into MsViz. Mascot results were exported as mzIdentML files with default parameters from the Mascot web interface. The mzIdentML files were moved to a new directory together with the corresponding spectral data in mzML format. The folder was ZIP-compressed and then uploaded to MsViz. MaxQuant PSM data are imported into MsViz from the following tables: evidence.txt, msms.txt, parameters.txt, peptides.txt, proteinGroups.txt and summary.txt. The entire txt folder was copied to a new folder together with the corresponding spectral data in mzML format. Unused “txt” files were removed from the folder and the parent folder was ZIP-compressed before submission to MsViz. Databases of human proteins used for Mascot and MaxQuant searches were downloaded from UniProt (www.uniprot.org) and uploaded to MsViz as FASTA files.
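The MaxQuant packaging steps described above can be sketched as a short script. This is only an illustration under stated assumptions: the function name `build_msviz_zip` is hypothetical, and the archive layout (the six tables under `txt/`, the mzML files next to that folder) is our reading of the description, so it should be checked against the MsViz documentation before use.

```python
import zipfile
from pathlib import Path

# The six MaxQuant tables that the text lists as being read by MsViz.
MAXQUANT_TABLES = [
    "evidence.txt",
    "msms.txt",
    "parameters.txt",
    "peptides.txt",
    "proteinGroups.txt",
    "summary.txt",
]


def build_msviz_zip(txt_dir, mzml_files, out_zip):
    """Bundle the required MaxQuant tables and mzML files into one ZIP archive.

    Assumed layout (hypothetical): tables under 'txt/', mzML files at the top level.
    Any other file present in txt_dir is left out of the archive.
    """
    txt_dir = Path(txt_dir)
    missing = [name for name in MAXQUANT_TABLES if not (txt_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing MaxQuant tables: {missing}")

    out_zip = Path(out_zip)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in MAXQUANT_TABLES:
            archive.write(txt_dir / name, arcname=f"txt/{name}")
        for mzml in map(Path, mzml_files):
            archive.write(mzml, arcname=mzml.name)
    return out_zip


# Example call (paths are placeholders):
# build_msviz_zip("combined/txt", ["FLAG-I.mzML", "FLAG-M.mzML"], "maxquant_upload.zip")
```

Listing the required tables explicitly, rather than deleting unused files by hand, makes it harder to upload an incomplete folder by mistake.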

2.4 Software availability

A public demo version of the MsViz server can be found at http://msviz-public.vital-it.ch. All datasets presented and discussed here were loaded into the public version. The current documentation for MsViz can be found at http://msviz-docs.vital-it.ch/. MsViz can be easily installed on a desktop or server computer using Docker images. The images and installation instructions can be found at https://github.com/vitalit-sib/msviz-docker. The source code is available under the GPL 2+ license and can be found at https://github.com/vitalit-sib/msviz-backend.

2.5 Implementation

MsViz is implemented using a three-tier architecture. The presentation tier was developed in JavaScript using the AngularJS framework (https://angularjs.org). For the protein visualization the pviz library was used 12. For the visualization of the XIC and fragment spectra the fishtones library was used (https://github.com/Genentech/fishtones-js). The logic tier was written in the Scala programming language (http://www.scala-lang.org) using the Play framework. Scala is a mixed object-oriented and functional programming language for the Java Virtual Machine. Its functional programming style simplifies parallelism for CPU-intensive calculations. The data tier was implemented using MongoDB 13, which allows heterogeneous data to be handled and ensures fast access for search requests.

3. Results

3.1 Analysis of the phosphorylation state of OSBL8


We tested the functionalities of MsViz by carrying out an in-depth analysis of the phosphorylation state of oxysterol-binding protein-related protein 8 (OSBL8, UniProt AC: Q9BZF1), a member of the oxysterol-binding protein (OSBP)-related protein (ORP) family 14. OSBL8 is an endoplasmic reticulum (ER) and nuclear envelope bound protein that has been described to take part in a variety of cellular functions 15,16,17. To understand if and how OSBL8 is regulated by PTMs, we compared the data obtained for OSBL8 purified from interphase (FLAG-I) vs mitotic (FLAG-M) cells, to identify and accurately localize phosphorylation sites and quantitate differences in levels of phosphorylation linked to cell cycle progression.

3.2 MsViz presents a comprehensive view on MS data matching individual proteins

Data-dependent tandem MS data were submitted for protein identification using both MASCOT and MaxQuant (Supplementary Tables S1 and S2). Mascot .dat files containing protein identifications were converted to mzIdentML files (.mzid) and raw MS data files to .mzML format to import identifications and MS intensities into MsViz. MaxQuant result files in .txt format were parsed directly by MsViz. When searches for the two samples FLAG-I (SearchID: MSC_FLAG-I_7557) and FLAG-M (SearchID: MSC_FLAG-M_7558) are imported into MsViz, they become visible in the “Searches” page and can be selected for comparison (http://msviz-public.vital-it.ch). A list of protein IDs is displayed, aggregating identifications, including numbers of PSMs and scores, from all chosen database searches. Selecting the protein of interest leads to the main MsViz view (Figure 2), which displays the protein sequence with a summary of the coverage observed in the different samples considered (green bars whose thickness is a function of the number of PSMs). Strong differences in sequence coverage between samples can be readily identified in this view. A positional information display and a magnifying glass help in navigating the sequence and identifying regions of interest on which to zoom in. More importantly, a pull-down menu allows the selection of PTMs that were considered in the database search and can now be displayed in MsViz. For example, if phosphorylation is selected, modification sites are labelled and shown as circles with size proportional to the number of PSMs matching a given position (Figure 2). Thus MsViz exploits the PTM localization information provided by the search engine while at the same time summarizing information from all PSMs to give a preliminary overview of the most probable modification sites (blue circles in Figure 2) and quantitative differences between samples.
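The per-site summary described above (circles sized by the number of PSMs matching a position) amounts to counting PSMs per modified protein residue. The sketch below illustrates that idea only; the record fields `start` and `mod_positions` are hypothetical and do not reflect the actual MsViz data model.

```python
from collections import Counter


def psm_counts_per_site(psms):
    """Count how many PSMs place a modification on each protein position.

    Each PSM is a dict with (hypothetical) keys:
      'start'         : 1-based protein position of the first peptide residue
      'mod_positions' : 1-based positions of modified residues within the peptide
    """
    counts = Counter()
    for psm in psms:
        for offset in psm["mod_positions"]:
            counts[psm["start"] + offset - 1] += 1
    return counts


if __name__ == "__main__":
    # Toy records: peptide 58-76 (DLHQPSLSPASPHSQGFER) and peptide 50-76 of OSBL8
    toy_psms = [
        {"start": 58, "mod_positions": [8]},   # pS65
        {"start": 58, "mod_positions": [8]},   # pS65
        {"start": 58, "mod_positions": [11]},  # pS68
        {"start": 50, "mod_positions": [16]},  # pS65 seen on a longer peptide
    ]
    for position, n in sorted(psm_counts_per_site(toy_psms).items()):
        print(position, n)
```

Mapping peptide-level offsets back to protein coordinates is what lets overlapping peptides, such as those arising from missed cleavages, contribute to the same site.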

3.3 MsViz enables detailed examination of individual PSMs and XIC-based quantitation of precursor intensities

Clicking on a modification site or anywhere on the sequence displays all PSMs covering that particular stretch in a stacked bar format, with modified residues color- and symbol-coded according to the probability of localization determined by the search engine. Hovering over a PSM of interest with the mouse opens a tooltip containing all PSM attributes, i.e. scan number, precursor m/z, charge, peptide sequence, retention time and score (Figure 3A). When a PSM is clicked, the annotated MS2 spectrum and the XIC of the selected m/z in all samples are displayed in the right half of the window. The time of acquisition of the PSM is marked on the XIC trace by a red bar, along with those of other spectra taken for the same m/z or other m/z values corresponding to the same total precursor mass (see legend to Figure 3B). Contextual pop-ups display additional information such as precursor m/z, charge state and PTM localization probability when moving the cursor over the bars. It is possible to zoom in and align multiple XIC traces. With a mouse drag, the user can readily extract and export quantitative data on selected XIC peaks (Figure 3B, green area) to a “results basket” that can be saved at the end of the analysis as a text file in .csv format. The content of the basket is the main output of MsViz analyses and only contains values for those peptides specifically selected by the user. It is thus important to select carefully the modified peptide species to quantitate across samples, including some unmodified peptides that will be used for normalizing total levels of the protein of interest.

We used MsViz to compare phosphorylation sites of the OSBL8 protein in FLAG-I and FLAG-M samples. As an example, PSMs were matched by MASCOT to phosphorylated species on both Ser 65 and Ser 68. As residues 65 and 68 are in the same tryptic peptide, the first question was whether both positions were really phosphorylated (hence two different phosphoforms existed) or whether one of the two localizations was an artefact of spectrum matching. Clicking on the two monophosphorylated PSMs (scan nr 7921 and 8155) with distinct assignment of the phosphate group displayed the MS2 spectra, and their inspection allowed validation of the localization made by the search engine on both sites. Moreover, two distinct peaks were visible in the XIC of the precursor, the first one corresponding to DLHQPSLS*PASPHSQGFER (pS65), the second to DLHQPSLSPAS*PHSQGFER (pS68) (residues 58-76, Figure 3B). Quantitation of the XIC peaks was directly possible from the same interface, showing that the first eluting phosphopeptide (pS65) was much more abundant in the FLAG-I sample (>10-fold difference), while the second one (pS68) was still more abundant in this sample, but with a lower fold change (~2x). Notably, a third phosphorylation site (pS63) was identified in the same peptide, but it only appeared in multi-phosphorylated species, suggesting that it was the result of sequential modifications. With MsViz it was then very easy to examine these multi-phosphorylated PSMs in the same view, and to show that the doubly phosphorylated peptide pS63,pS65 was also more abundant in the FLAG-I sample (~3x), whereas the triply phosphorylated form (pS63, pS65, pS68) was only observed in the FLAG-M sample (Figure 6). In order to account for possible differences in the amounts of loaded peptides between samples, we also measured XIC intensities of two OSBL8 “reference” peptides, i.e. unmodified peptides not covering regions of the protein sequence containing phosphorylations. Comparison of these reference peptide intensities showed very similar amounts of OSBL8 protein in the FLAG-M and FLAG-I samples (Figure 6).
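The XIC-based quantitation used in this section can be illustrated with a generic sketch: MS1 intensities within a ppm window around the precursor m/z are summed per scan, and a peak is quantitated over a user-chosen retention time range. The source does not describe how MsViz computes these values internally, so the trapezoidal integration, the 10 ppm default and the data layout below are assumptions made purely for illustration.

```python
def extract_xic(scans, target_mz, ppm=10.0):
    """Build an extracted ion chromatogram.

    scans: iterable of (retention_time, [(mz, intensity), ...]) MS1 scans,
           ordered by retention time (assumed layout).
    Returns a list of (retention_time, summed_intensity) points.
    """
    tolerance = target_mz * ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        xic.append((rt, intensity))
    return xic


def integrate_peak(xic, rt_start, rt_end):
    """Trapezoidal area of the XIC between rt_start and rt_end (inclusive)."""
    points = [(rt, i) for rt, i in xic if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(points, points[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area


if __name__ == "__main__":
    # Toy MS1 scans; the signal at m/z 600 lies outside the window and is ignored
    toy_scans = [
        (0.0, [(500.000, 0.0)]),
        (1.0, [(500.001, 10.0), (600.000, 99.0)]),
        (2.0, [(500.000, 10.0)]),
        (3.0, [(500.000, 0.0)]),
    ]
    xic = extract_xic(toy_scans, target_mz=500.0)
    print(integrate_peak(xic, 0.0, 3.0))
```

Choosing the integration range by hand, as the mouse drag in MsViz does, is what allows two co-extracted positional isomers such as pS65 and pS68 to be quantitated separately even though they share the same precursor m/z.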

In another example we used MsViz to get a more detailed view of OSBL8 phosphorylation in a stretch of sequence covered by the peptide GYSSPEPDIQDSSGSEAQSVKPSTR (residues 796-820). This peptide was mostly identified in multi-phosphorylated forms, and the presence of 8 potential phosphorylation sites made their localization and the quantification of the different positional isomers quite challenging. With MsViz, however, we could for example quickly assign the three doubly phosphorylated isomers GYSS*PEPDIQDS*SGSEAQSVKPSTR (pS799 and pS807, scan nr 8521), GYS*S*PEPDIQDSSGSEAQSVKPSTR (pS798 and pS799, scan nr 8938) and GYSSPEPDIQDS*S*GSEAQSVKPSTR (pS807 and pS808, scan nr 8652) to three distinct peaks in the XIC of FLAG-M and FLAG-I samples (Figure 4). Furthermore, by zooming in and aligning the annotated MS/MS spectra, the assignment of the different positional isomers was manually confirmed by the presence of the specific fragment ions y14++, y15++ and b6 and their corresponding phospho group neutral losses. We could thus highlight the different quantitative behavior of the three phosphoforms between FLAG-I and FLAG-M samples, with a higher abundance of the pS807,pS808 peptide in the interphase sample and a parallel decrease of the other isomers.

With MsViz it was also possible to compare the results obtained with the two supported search engines, either for confirming PTM identification or for resolving localization ambiguities. For example, the OSBL8 peptide EAYPTPTKDLHQPSLSPASPHSQGFER (residues 50-76) was identified by Mascot with a phosphorylation on Thr 54 or Thr 56 in the FLAG-I sample, but without a clear discrimination between the two positions. Here, MaxQuant gave a clearer result, localizing the phosphorylation on Thr 54 with a good score (Figure 5), a result that was validated by manual inspection.
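The fragment-ion argument used above for the three doubly phosphorylated isomers of GYSSPEPDIQDSSGSEAQSVKPSTR can be checked with a small calculation. The sketch below computes theoretical b6 and y14++ m/z values from standard monoisotopic residue masses; it ignores neutral losses and is not the annotation code used by MsViz or the fishtones library. Site positions are given relative to the peptide (residue 796 is position 1).

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633   # HPO3 added to a phosphorylated residue
WATER = 18.01056
PROTON = 1.00728


def residue_masses(seq, phospho_sites):
    """Residue masses with phosphate added at the given 1-based peptide positions."""
    return [MONO[aa] + (PHOSPHO if pos in phospho_sites else 0.0)
            for pos, aa in enumerate(seq, start=1)]


def b_ion(seq, phospho_sites, n, charge=1):
    """m/z of the b ion made of the first n residues."""
    mass = sum(residue_masses(seq, phospho_sites)[:n])
    return (mass + charge * PROTON) / charge


def y_ion(seq, phospho_sites, n, charge=1):
    """m/z of the y ion made of the last n residues."""
    mass = sum(residue_masses(seq, phospho_sites)[-n:]) + WATER
    return (mass + charge * PROTON) / charge


PEPTIDE = "GYSSPEPDIQDSSGSEAQSVKPSTR"  # OSBL8 residues 796-820
ISOMERS = {
    "pS798,pS799": {3, 4},
    "pS799,pS807": {4, 12},
    "pS807,pS808": {12, 13},
}

if __name__ == "__main__":
    for name, sites in ISOMERS.items():
        print(name,
              round(b_ion(PEPTIDE, sites, 6), 4),
              round(y_ion(PEPTIDE, sites, 14, charge=2), 4))
```

The b6 ion (GYSSPE) spans S798 and S799, while y14 starts at S807 and therefore contains S807 and S808. Each isomer carries a different number of phosphate groups on these two fragments (2/0, 1/1 and 0/2), which is why this pair of ions is sufficient to tell the three isomers apart.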

3.4 OSBL8 results

Overall, the analysis of phosphosites on OSBL8 by Mascot identified 18 confident phosphosites (localization score > 70%: T13, S14, S32, T39, S63, S65, S68, T235, S305, S314, T660, T774, S778, S798, S799, S807, S808, S832) and 2 lower-confidence ones (localization score 50-70%: S364, S810), while highlighting 3 potential sites with ambiguous localization scores (T54 or T56, S88 or S89, S91 or S93). As of March 31st 2017, four of the confident phosphosites (T235, T660, S778, S832) and one ambiguous phosphorylation (S88 or S89) were not listed in the Phosphosite.org (considering sites with 2 or more PSMs), Swissprot or Phospho.ELM databases (Supplementary Table S3).
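The confidence tiers used above can be written down as a simple rule. The 70% and 50% cut-offs are those quoted in the text; treating every site that scores below 50% as “ambiguous” is an assumption of this sketch, and the function name is illustrative.

```python
def classify_site(localization_score: float) -> str:
    """Assign a confidence tier to a phosphosite from its localization score (0-1)."""
    if localization_score > 0.70:
        return "confident"
    if localization_score >= 0.50:
        return "lower confidence"
    return "ambiguous"  # assumption: anything below 50% is reported as ambiguous


if __name__ == "__main__":
    # Scores below are placeholders, not values from the OSBL8 dataset
    for score in (0.95, 0.60, 0.45):
        print(score, classify_site(score))
```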

The comparison of phosphopeptide intensities between FLAG-I and FLAG-M samples showed an overall increase of phosphorylation in mitosis (Figure 6). In regions of the protein containing several phosphosites, a higher intensity of the multiply phosphorylated forms of the peptides could also be observed in mitosis, in parallel with a decrease of the less phosphorylated species (see for example residues 63-68). A detailed look at the results nevertheless revealed a more complex pattern of site-specific and domain-specific changes of phosphorylation stoichiometry between interphase and mitosis. For example, the predicted disordered region 799-835 showed almost invariant levels of modification; a majority of other sites underwent partial changes and a few sites (namely Thr774 and Ser778) appeared to be uniquely phosphorylated during mitosis. While the biological interpretation of OSBL8 phosphorylation levels is beyond the scope of this work, we found MsViz to be especially useful for unraveling complex situations involving multiply phosphorylated peptides with quantitative changes. The ability to locate and accurately quantitate phosphosites was instrumental for all attempts to infer the possible protein kinases responsible for the modification(s).
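The comparison between samples relies on normalizing phosphopeptide intensities to unmodified reference peptides of the same protein. A minimal sketch of that arithmetic follows; the function name and the use of the mean reference ratio are assumptions, and the intensities in the usage lines are made-up numbers, not measurements from this study.

```python
from statistics import mean


def normalized_fold_change(pep_interphase, pep_mitosis, ref_interphase, ref_mitosis):
    """Mitosis/interphase ratio of a phosphopeptide, corrected for protein amount.

    ref_interphase and ref_mitosis are XIC intensities of the same unmodified
    reference peptides, in the same order, in the two samples.
    """
    reference_ratio = mean(m / i for i, m in zip(ref_interphase, ref_mitosis))
    return (pep_mitosis / pep_interphase) / reference_ratio


if __name__ == "__main__":
    # Illustrative intensities only
    print(normalized_fold_change(
        pep_interphase=100.0, pep_mitosis=300.0,
        ref_interphase=[10.0, 20.0], ref_mitosis=[20.0, 40.0],
    ))
```

When the reference peptides show very similar intensities in both samples, as reported for OSBL8, the correction factor is close to 1 and the raw XIC ratio is already a fair estimate of the change in phosphorylation.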

3.5 Comparison with other tools and unique features of MsViz

A bewildering variety of software tools have been created (a list can be found on https://omictools.com/ or https://bio.tools/) that handle MS-based proteomics data for post-processing and for extracting the most diverse sorts of information. The existence of so many tools underscores i) the complexity and variety of the data, ii) the need to tailor data analysis to the special needs of the investigator and iii) the lack of perfect, “universal” software solutions. In particular, as mentioned previously, visualization for manual data inspection has rarely been at the centre of interest of software developers. We performed a quick survey of mainstream and/or recent non-commercial software packages to compare their data visualization capabilities with those present in MsViz.

MaxQuant, for example, possesses a sophisticated viewer that can display PSMs, extract XICs and show sequence coverages for complex experiments 18. However, it appears that the viewer itself was not meant to be used for interactive data analysis, since it is not possible to intervene manually to e.g. modify the integration of an XIC peak. Also, the possibility of displaying sequence coverages and precursor intensities in different samples side by side is limited or non-existent in the MaxQuant Viewer, and comparison of two or more MS2 spectra in the same view is, to the best of our knowledge, not possible. On the other hand, MaxQuant automatically carries out an aggregated quantification of phosphorylation at the site level, which is not possible with MsViz.

Another very popular and powerful tool is Skyline 4. Due to its focus on targeted proteomics, Skyline has very sophisticated functions for extracting and integrating XIC traces for identified precursors. However, it cannot display an overview of protein sequence coverage by PSMs for several samples simultaneously. More importantly, for modified peptides Skyline groups (and therefore quantitates) precursors based on the site localization provided by the search engine. While this is mostly correct, in case of localization ambiguities (which often arise, for example, with di- and tri-phosphopeptides) this can lead to some quantitation artefacts. MsViz, on the other hand, does not further process the PSMs produced by the search engine and purely provides a visualization platform, leaving to the operator the choice of which peak(s) to integrate for which precursor(s). Clearly the differences between MsViz on the one hand and MaxQuant and Skyline on the other reflect a fundamental difference in the scope for which such tools were developed, i.e. low-throughput detailed analysis vs. high-throughput handling of large experiments/datasets.

At the other end of the spectrum, several software tools have been specifically developed for manual and semi-automated validation of individual PSMs, de novo sequencing results and PTMs 19–21. Again, these tools are highly focused on the identification step and/or the validation of PTMs but usually do not provide a link to XICs and quantitative data. PeptideShaker 22 is a sophisticated, non-commercial suite of tools for high-throughput protein identification using numerous search engines, which provides meta-scoring and advanced options for result visualization, but is not explicitly focused on PTMs nor on their quantitation. Other recently published free tools, Peptigram 23 and X!TandemPipeline 24, are closer to the range of applications covered by MsViz but again provide only some of the functionalities of our tool (while at the same time satisfying other, different needs).

Some popular commercial tools do have comprehensive, well-developed GUIs for data display. Proteome Discoverer™, which incorporates SEQUEST and can link to other search engines, can display all PSMs and show XICs for selected matches, with the additional bonus of automatic quantitation and flexibility in upstream data processing. Obviously the scope of such a large commercial package is much broader than that of MsViz. And yet even Proteome Discoverer does not offer some of the key visualization features of MsViz, such as the ability to display multiple spectra and XICs side by side and the graphical mapping on the protein sequence of all PSMs obtained in multiple samples. Similar considerations apply to PEAKS (www.bioinfor.com).

4.0 Conclusions and perspectives

After surveying the landscape of available proteomics software tools, we felt that there was a need for a simple data visualization tool able to extract and display a selected set of parameters for small and medium datasets. Compared to all the other tools mentioned so far, MsViz has two more specific advantages. First, it is web-based and thus platform- and workstation-independent, which makes it ideal for sharing results and for multi-operator work, in particular within core proteomics facilities and between facilities and their customers. Second, once data are imported, using MsViz does not require any significant training, at least for users familiar with the structure of proteomics data. We thus believe that there is a category of applications for which MsViz can significantly facilitate the work of the data analyst and thus lead to better and faster extraction of relevant information.

5.0 Supporting information

The following files are available free of charge at the ACS website http://pubs.acs.org:

Supplementary Table S1. Mascot searches output tables for FLAG-I and FLAG-M samples (.xlsx file)

Supplementary Table S2. MaxQuant searches output tables for FLAG-I and FLAG-M samples (.xlsx file)

Supplementary Table S3. MsViz phosphorylation results for OSBL8 (.xlsx file)

6.0 Acknowledgements

This work was funded by internal funding from the University of Lausanne, Faculty of Biology and Medicine. We would like to thank the VITAL-IT support team at the Swiss Institute of Bioinformatics for help with implementation of the public server version. Thanks to Alexandra Potts and Jachen Barblan at the Protein Analysis Facility for technical help and sample and data processing.

7.0 Funding

This work was supported through core funding from the University of Lausanne (RM, PW, TP, MQ) and the Swiss Institute of Bioinformatics (TMC, RM, AM, IX).

8.0 References

(1) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.

(2) Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S.; Perkins, D. N. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551–3567.

(3) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011, 10, 1794–1805.

(4) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966–968.


(5) Bantscheff, M.; Schirle, M. Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 2007, 1017–1031.

(6) Fermin, D.; Walmsley, S. J.; Gingras, A.-C.; Choi, H.; Nesvizhskii, A. I. LuciPHOr: Algorithm for Phosphorylation Site Localization with False Localization Rate Estimation Using Modified Target-Decoy Approach. Mol. Cell. Proteomics 2013, 12, 3409–3419.

(7) Leidecker, O.; Bonfiglio, J. J.; Colby, T.; Zhang, Q.; Atanassov, I.; Zaja, R.; Palazzo, L.; Stockum, A.; Ahel, I.; Matic, I. Serine is a new target residue for endogenous ADP-ribosylation on histones. Nat. Chem. Biol. 2016, 12, 998–1000.

(8) Bhogaraju, S.; Kalayil, S.; Liu, Y.; Bonn, F.; Colby, T.; Matic, I.; Dikic, I. Phosphoribosylation of Ubiquitin Promotes Serine Ubiquitination and Impairs Conventional Ubiquitination. Cell 2016, 167, 1636–1649.e13.

(9) Shevchenko, A.; Tomas, H.; Havlis, J.; Olsen, J. V.; Mann, M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 2006, 1, 2856–2860.

(10) Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32, 223–226.

(11) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012, 30, 918–920.

(12) Mukhyala, K.; Masselot, A. Visualization of protein sequence features using JavaScript and SVG with pViz.js. Bioinformatics 2014, 30, 3408–3409.

(13) Chodorow, K.; Dirolf, M. MongoDB: The Definitive Guide, 2nd ed.; O’Reilly: Sebastopol, CA, 2013.

(14) Kentala, H.; Weber-Boyvat, M.; Olkkonen, V. M. OSBP-Related Protein Family: Mediators of Lipid Transport and Signaling at Membrane Contact Sites. Int. Rev. Cell Mol. Biol. 2016, 321, 299–340.

(15) Yan, D.; Mäyränpää, M. I.; Wong, J.; Perttilä, J.; Lehto, M.; Jauhiainen, M.; Kovanen, P. T.; Ehnholm, C.; Brown, A. J.; Olkkonen, V. M. OSBP-related protein 8 (ORP8) suppresses ABCA1 expression and cholesterol efflux from macrophages. J. Biol. Chem. 2008, 283, 332–340.

(16) Zhou, T.; Li, S.; Zhong, W.; Vihervaara, T.; Béaslas, O.; Perttilä, J.; Luo, W.; Jiang, Y.; Lehto, M.; Olkkonen, V. M.; et al. OSBP-related protein 8 (ORP8) regulates plasma and liver tissue lipid levels and interacts with the nucleoporin Nup62. PLoS One 2011, 6, e21078.

(17) Galmes, R.; Houcine, A.; van Vliet, A. R.; Agostinis, P.; Jackson, C. L.; Giordano, F. ORP5/ORP8 localize to endoplasmic reticulum-mitochondria contacts and are involved in mitochondrial function. EMBO Rep. 2016, 17, 800–810.

(18) Tyanova, S.; Temu, T.; Carlson, A.; Sinitcyn, P.; Mann, M.; Cox, J. Visualization of LC-MS/MS proteomics data in MaxQuant. Proteomics 2015, 15, 1453–1456.

(19) Curran, T. G.; Bryson, B. D.; Reigelhaupt, M.; Johnson, H.; White, F. M. Computer aided manual validation of mass spectrometry-based proteomic data. Methods 2013, 61, 219–226.

(20) Lahesmaa-Korpinen, A.-M.; Carlson, S. M.; White, F. M.; Hautaniemi, S. Integrated data management and validation platform for phosphorylated tandem mass spectrometry data. Proteomics 2010, 10, 3515–3524.


(21) Helsens, K.; Timmerman, E.; Vandekerckhove, J.; Gevaert, K.; Martens, L. Peptizer, a tool for assessing false positive peptide identifications and manually validating selected results. Mol. Cell. Proteomics 2008, 7, 2364–2372.

(22) Vaudel, M.; Burkhart, J. M.; Zahedi, R. P.; Oveland, E.; Berven, F. S.; Sickmann, A.; Martens, L.; Barsnes, H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015, 33, 22–24.

(23) Manguy, J.; Jehl, P.; Dillon, E. T.; Davey, N. E.; Shields, D. C.; Holton, T. A. Peptigram: A Web-Based Application for Peptidomics Data Visualization. J. Proteome Res. 2017, 16, 712–719.

(24) Langella, O.; Valot, B.; Balliau, T.; Blein-Nicolas, M.; Bonhomme, L.; Zivy, M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J. Proteome Res. 2017, 16, 494–503.


Figure legends

Figure 1 The data preparation workflow for MsViz. Spectral data is converted into the generic mzML format using MsConvert (Proteowizard package). MASCOT results are exported as mzIdentML files (mzid) and uploaded together with their corresponding mzML files in the .zip format. For MaxQuant results the “txt” folder is ZIP compressed together with the corresponding mzML files.
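The packaging step in this workflow can be scripted. The following is a minimal sketch using only the Python standard library; the function name `bundle_for_upload` and all file names are hypothetical, and the exact layout MsViz expects inside the archive is not specified here, so the flat layout (with a MaxQuant "txt" folder kept as a sub-folder) is an assumption.

```python
import zipfile
from pathlib import Path


def bundle_for_upload(result_paths, mzml_paths, archive_path):
    """Write search results and their mzML files into one .zip archive.

    Plain files are stored under their own name. Directories (for example
    a MaxQuant "txt" folder) are added recursively, with the folder name
    kept as a prefix inside the archive. This layout is an assumption.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, list(result_paths) + list(mzml_paths)):
            if path.is_dir():
                for child in sorted(path.rglob("*")):
                    if child.is_file():
                        inner = Path(path.name) / child.relative_to(path)
                        zf.write(child, arcname=inner.as_posix())
            else:
                zf.write(path, arcname=path.name)
    return archive_path
```

For a Mascot search one would pass the exported .mzid files as `result_paths`; for MaxQuant, the "txt" folder itself.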

Figure 2 Graphical overview of the OSBL8 protein (Q9BZF1) sequence coverage in samples FLAG-I and FLAG-M (MASCOT search). The thickness of the green bars is a function of the number of PSMs matching the sequence region, while modification sites are labelled and shown as circles with size proportional to the number of PSMs matching a given position.

Figure 3 Example of MsViz display for the peptide DLHQPSLSPASPHSQGFER (residues 58-76 in OSBL8 (Q9BZF1) protein) from a MASCOT output. After zooming into the protein sequence, all PSMs covering the region of interest can be displayed by clicking a green bar or a blue circle. A tooltip can be used to show PSM details (A). Confidently localized phosphosites are shown as red circles in the PSM bars. A grey circle indicates a site of phosphorylation identified with a lower confidence for the PSM considered, while orange circles are used to show alternate phosphosites identified with the same localization score as the best one. An annotated MS2 spectrum, together with the XIC of its precursor m/z in all samples, is shown when clicking on PSMs (here scan numbers 7921 and 8155) (B). A red bar in the XIC shows the retention time of the selected PSM, while green bars correspond to other PSMs with the same precursor m/z and purple bars to other PSMs with the same precursor mass (i.e. with a different charge). Additional grey bars show unidentified or subthreshold PSMs with the same precursor m/z. After zooming into an MS2 spectrum or an XIC trace, a link button allows the alignment of all other annotated spectra or XIC traces currently shown by MsViz. Intensities of selected precursor masses in the XIC (green area) can be exported to a text (.csv) file via a basket button.
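The colour rule for the XIC bars described in this legend can be written down as a small classification function. This is an illustrative sketch only, not MsViz source code: the `Psm` record, the function names and the 10 ppm matching tolerance are all assumptions, and scan numbers are assumed to identify a PSM uniquely.

```python
from dataclasses import dataclass
from typing import Optional

PROTON_MASS = 1.007276  # Da


@dataclass
class Psm:
    scan: int
    precursor_mz: float
    charge: int
    identified: bool  # False for unidentified or subthreshold matches


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a precursor from its m/z and charge state."""
    return charge * (mz - PROTON_MASS)


def marker_colour(psm: Psm, selected: Psm, tol_ppm: float = 10.0) -> Optional[str]:
    """Return the XIC bar colour for `psm` relative to the selected PSM."""
    if psm.scan == selected.scan:
        return "red"  # the selected PSM itself
    mz_tol = selected.precursor_mz * tol_ppm * 1e-6
    if abs(psm.precursor_mz - selected.precursor_mz) <= mz_tol:
        # same precursor m/z: green if identified, grey otherwise
        return "green" if psm.identified else "grey"
    sel_mass = neutral_mass(selected.precursor_mz, selected.charge)
    mass_diff = abs(neutral_mass(psm.precursor_mz, psm.charge) - sel_mass)
    if psm.identified and psm.charge != selected.charge and mass_diff <= sel_mass * tol_ppm * 1e-6:
        return "purple"  # same precursor mass, different charge
    return None  # unrelated PSM, no bar drawn
```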


Figure 4 MsViz analysis of the doubly phosphorylated peptide GYSSPEPDIQDSSGSEAQSVKPSTR (residues 796-820) of OSBL8 (Q9BZF1). Annotated MS2 spectra and the XIC of the precursor m/z allow the assignment of the different isomers to their respective elution peaks. The annotated-spectra window can also show the Mascot identification assignment and localization score of lower rank matches. Assignment of the different positional isomers of phosphosites could be manually confirmed by zooming into the region of the annotated spectra containing the discriminating fragment ions y14++, y15++, b6 and their corresponding phospho neutral losses.

Figure 5 MsViz display of the OSBL8 (Q9BZF1) peptide EAYPTPTKDLHQPSLSPASPHSQGFER (residues 50-76) in sample I (interphase) analyzed by Mascot (MSC_FLAG-I-7557) or MaxQuant (MXQ_I7557). The mono-phosphorylated form of the peptide was identified by Mascot with the same localization score for phosphorylation on Thr 54 or Thr 56, while MaxQuant confidently localized the modification on Thr 54.

Figure 6 Comparison of phosphopeptide intensities (in log10 scale) of OSBL8 (Q9BZF1) in interphase (FLAG-I) vs. mitosis (FLAG-M) samples. This graph was based on the intensity data (peak height) contained in a text (.csv) file exported from MsViz. Results from quantitation can be found in Supplementary Table S3. Intensities of two unmodified peptides (Ref. pept.) were also extracted to evaluate the differences in OSBL8 amounts between the samples.
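A comparison of this kind can be reproduced from an exported intensity table with a few lines of Python. The sketch below is an assumption-laden illustration: the column names `peptide`, `sample` and `intensity` are hypothetical (the actual layout of the MsViz .csv export is not described here), and the function name `log10_ratios` is ours.

```python
import csv
import io
import math


def log10_ratios(csv_text, sample_a="FLAG-I", sample_b="FLAG-M"):
    """Return {peptide: (log10 A, log10 B, log10(B/A))}.

    Only peptides with a positive intensity in both samples are kept.
    Column names are hypothetical, not the documented MsViz export format.
    """
    intensities = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row["intensity"])
        if value > 0:
            intensities.setdefault(row["peptide"], {})[row["sample"]] = value
    result = {}
    for peptide, by_sample in intensities.items():
        if sample_a in by_sample and sample_b in by_sample:
            log_a = math.log10(by_sample[sample_a])
            log_b = math.log10(by_sample[sample_b])
            result[peptide] = (log_a, log_b, log_b - log_a)
    return result
```

The same function can be applied to the unmodified reference peptides; their log10 ratio gives an estimate of the difference in protein amount between the two samples, against which the phosphopeptide ratios can be read.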
