TOPPView: An Open-Source Viewer for Mass Spectrometry Data
Marc Sturm* and Oliver Kohlbacher
Center for Bioinformatics, Eberhard Karls University Tübingen, Sand 14, 72076 Tübingen, Germany
Received February 20, 2009
Abstract: Visualization of complex mass spectrometric data sets is becoming increasingly important in proteomics and metabolomics. We present TOPPView, an integrated data visualization and analysis tool for mass spectrometric data sets. TOPPView allows the visualization and comparison of individual mass spectra, two-dimensional LC-MS data sets and their accompanying metadata. Because TOPPView supports standardized XML-based data exchange formats, data from any type of mass spectrometer can be imported. The integrated analysis tools of The OpenMS Proteomics Pipeline (TOPP) allow efficient data analysis from within TOPPView through a convenient graphical user interface. TOPPView runs on all major operating systems and is available free of charge under an open-source license at http://www.openms.de.
Keywords: mass spectrometry • viewer • visualization • proteomics • metabolomics
Introduction
Liquid chromatography coupled with mass spectrometry (LC-MS) has become a standard technique for proteomics and metabolomics experiments. Today, it is widely used for high-throughput biomarker discovery and for the identification of potential drug targets. The analysis of proteomics and metabolomics samples is typically a multistage process. First, MS survey spectra, which are mostly used for quantitation, are recorded. In the second step, the most abundant peaks are selected and fragmented to obtain tandem MS (MS/MS) spectra, which are used to identify the compounds. Although LC-MS is a well-established technique, many factors influence the quality of the resulting data. Visual inspection of the data can quickly reveal errors made during sample preparation, problems with the LC, or contamination of the sample. Because a single proteomics or metabolomics experiment produces a huge amount of data, computational tools are crucial for analyzing it. The main analysis steps are the identification and quantitation of compounds, and visual inspection of the quantitation results is often needed to validate the output of the analysis algorithms. MS instrument software could be used for most of these tasks, but it has several drawbacks. Instrument software is typically not freely available or redistributable because of licensing restrictions. Moreover, most instrument software cannot import data from other instruments or vendors, so different software has to be used for different data sets.
* Corresponding author. E-mail: [email protected].
3760 Journal of Proteome Research 2009, 8, 3760–3763 Published on Web 05/08/2009
Viable alternatives to instrument software are freely available viewers. Pep3D¹ displays LC-MS maps as 2D density plots with highlighted MS/MS precursor positions. Another viewer, msInspect,² offers both visualization and several data processing algorithms; its visualization consists of 2D plots of LC-MS maps and plots of single spectra. Insilicos Viewer³ displays the chromatogram of an LC-MS map and the spectrum corresponding to a time point selected in the chromatogram. All of these viewers support visual exploration of the data by zooming into the data representations. However, they are often not flexible enough and do not support more complex tasks. In this work, we introduce TOPPView, a visualization and analysis tool for mass spectrometric data, particularly data generated in proteomics and metabolomics applications. It offers a variety of views for LC-MS data and can be used to visualize profile data, centroided data and the results of automated analysis pipelines. TOPPView shares many features with the viewers mentioned above and introduces several new concepts that set it apart. Each data set can be displayed in its own window, but several data sets can also be overlaid in the same window. The displayed data can be filtered according to user-defined rules based on the properties of the data points or on annotated metadata. Finally, TOPPView offers the functionality of the TOPP command-line tools through a convenient graphical interface.
Features
TOPPView has been designed as a versatile tool for many frequently occurring tasks involving LC-MS data. Depending on the task at hand, different views are used: single spectra are visualized as a plot of intensity against mass-to-charge ratio (1D view); LC-MS maps are displayed in a 2D view from a bird's-eye perspective with color-coded intensities; and selected regions of LC-MS maps can also be displayed in a 3D view. Examples of the views are shown in Figures 1 and 3. A multiple document interface allows several views to be opened in one TOPPView instance. The user can interact with the visualized data in many ways: zooming into the data, moving the currently displayed region and measuring distances between data points are all possible. Peak colors and other properties of the views are configurable by the user. The 2D view offers several special features: (1) MS/MS precursor peaks can be highlighted, and the corresponding MS/MS spectra can be opened in a 1D view; (2) projections of the currently displayed data points onto the m/z and retention time axes can be displayed. These projections are very useful for close inspection of the elution profiles and isotopic patterns of a feature. In contrast to other viewers, which also contain some of the described features, TOPPView offers them all in one consistent tool.
10.1021/pr900171m CCC: $40.75 © 2009 American Chemical Society
Figure 1. TOPPView main window with an open 2D view and a 1D view in a second tab. The layers and filters are shown on the right. The view contains a peak layer (colored dots), a feature layer (black polygons) and a layer that is currently hidden. The feature layer is selected (blue highlighting), and two filters restricting the quality and charge of the displayed features are set.
Figure 2. GUI used for calling TOPP tools from TOPPView. The Gaussian noise filter is selected and its main parameters are shown. Advanced parameters can be shown using the check box at the bottom. Below the parameters, the description of the currently selected parameter is displayed.
Next, we describe the features that are unique to TOPPView.
Layer Interface. The most important feature of TOPPView is the layer interface. Each view supports displaying several layers (i.e., several data sets) on top of each other. In many cases, this is superior to displaying only one data set. In the 1D view, profile data and centroided data can be displayed together (see Figure 3), which is particularly useful for verifying the centroiding, checking for saturation effects, or recalibrating spectra. Another use case is comparing recorded MS/MS spectra to theoretical MS/MS spectra. In the 2D view, the layer concept becomes even more powerful. Map alignment can be validated visually by superimposing two peak maps. The output of quantitation algorithms can be checked by displaying the peptide feature centroids and boundaries on top of the underlying peak data (see Figure 1). Quantitative data are supported through the OpenMS featureXML format, and they can also be curated manually in TOPPView.
Data Filtering. Another key concept of TOPPView is data filtering. It is often desirable to show only those data points that meet certain criteria. Therefore, filters that hide part of the data, for example all peaks below an intensity threshold, can be defined for each layer. Filters can also be based on peak metadata, such as an annotated signal-to-noise level. Quantitation result layers can be filtered by intensity, charge, quality and metadata. Hiding unwanted data lets the user concentrate on the important features of the data and also improves the performance of the viewer.
Metadata Visualization. Besides displaying the actual spectral data, TOPPView can also be used to display and edit metadata. This is especially useful when submitting data to a public repository such as PRIDE.⁴ Often, at least part of the metadata required by MIAPE⁵ or by journal submission guidelines is missing, and the graphical user interface (GUI) of TOPPView offers a very convenient way to add the missing information.
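The layer and filter concepts described under Layer Interface and Data Filtering can be illustrated with a minimal Python sketch. It is not OpenMS code. The `Layer` and `Filter` classes, the dictionary-based data points, and the rule that a point lacking the filtered property (for example, a peak without an annotated signal-to-noise value) is hidden are all assumptions made for illustration. All filters of one layer are combined with a logical AND, and a hidden layer shows nothing.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass, field
from typing import Any, Callable

# Comparison operators available to the (hypothetical) filter rules.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class Filter:
    """One user-defined rule, e.g. intensity >= 1000 or charge == 2."""
    key: str    # data property ("intensity", "charge", ...) or metadata name ("sn")
    op: str     # one of the keys of OPS
    value: Any

    def accepts(self, item: dict[str, Any]) -> bool:
        # Assumption: items that lack the filtered property are hidden.
        if self.key not in item:
            return False
        return OPS[self.op](item[self.key], self.value)


@dataclass
class Layer:
    """One data set (peaks or features) drawn on top of the others in a view."""
    name: str
    items: list[dict[str, Any]]
    filters: list[Filter] = field(default_factory=list)
    visible: bool = True

    def displayed(self) -> list[dict[str, Any]]:
        """Items of a visible layer that pass all of its filters (logical AND)."""
        if not self.visible:
            return []
        return [it for it in self.items if all(f.accepts(it) for f in self.filters)]


# A peak layer filtered by intensity and by an annotated signal-to-noise value.
peaks = Layer("peaks", [
    {"mz": 500.2, "intensity": 120.0, "sn": 1.5},
    {"mz": 500.7, "intensity": 5400.0, "sn": 12.0},
    {"mz": 501.2, "intensity": 2100.0},  # no S/N annotation, so it is hidden
])
peaks.filters += [Filter("intensity", ">=", 1000.0), Filter("sn", ">=", 3.0)]

# A feature layer restricted by charge and quality, as in Figure 1.
features = Layer("features", [
    {"rt": 1200.0, "mz": 722.3, "charge": 2, "quality": 0.9},
    {"rt": 1310.5, "mz": 645.8, "charge": 3, "quality": 0.4},
])
features.filters += [Filter("charge", "==", 2), Filter("quality", ">=", 0.5)]

print([p["mz"] for p in peaks.displayed()])
print([f["rt"] for f in features.displayed()])
```

Keeping each rule as data (property, operator, value) rather than as code makes the rules easy to list, edit and toggle in a side panel like the one in Figure 1.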
Figure 3. This figure shows the 1D view (left) and the 3D view (right) of TOPPView. The 1D view has two layers with profile data (blue) and the corresponding centroided data (red). The 3D view shows part of an LC-MS map containing several peptide features.
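Figure 3 overlays profile data with the centroided data derived from it. To make that comparison concrete, the Python sketch below turns a profile spectrum into centroids and reruns the step with different thresholds, in the spirit of the parameter-tuning loop described under TOPP Integration. The algorithm is a deliberately simple stand-in, not the TOPP peak picker. Every local intensity maximum at or above a hypothetical `min_intensity` threshold becomes a peak, and its m/z is the intensity-weighted mean of the apex and its two neighbors. The function name, the parameter and the example spectrum are illustrative assumptions.

```python
from __future__ import annotations


def centroid(mz: list[float], intensity: list[float],
             min_intensity: float) -> list[tuple[float, float]]:
    """Toy centroiding of a profile spectrum (not the TOPP algorithm).

    A point is a peak apex if it is at or above min_intensity, strictly higher
    than its left neighbor and not lower than its right neighbor (so a flat
    top is reported once). The centroid m/z is the intensity-weighted mean of
    the apex and its two neighbors; the apex intensity is kept.
    Intensities are assumed to be non-negative, so the weight sum is positive.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        left, apex, right = intensity[i - 1], intensity[i], intensity[i + 1]
        if apex >= min_intensity and apex > left and apex >= right:
            weights = (left, apex, right)
            positions = (mz[i - 1], mz[i], mz[i + 1])
            c = sum(x * w for x, w in zip(positions, weights)) / sum(weights)
            peaks.append((c, apex))
    return peaks


# Profile data: one strong peak around m/z 100.2 and a small one at m/z 100.5.
profile_mz = [100.0, 100.1, 100.2, 100.3, 100.4, 100.5, 100.6]
profile_int = [0.0, 10.0, 50.0, 10.0, 0.0, 4.0, 0.0]

# Rerun with different thresholds and compare the centroids with the profile.
for threshold in (1.0, 5.0):
    print(threshold, centroid(profile_mz, profile_int, threshold))
```

Raising the threshold from 1 to 5 drops the small peak at m/z 100.5. In TOPPView, the corresponding check is the visual comparison of the overlaid profile and centroid layers.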
TOPP Integration. In addition to visualizing data, TOPPView can also be used to process the displayed data. The OpenMS Proteomics Pipeline (TOPP),⁶ a collection of command-line tools for computational proteomics, can be accessed directly from the viewer. TOPP offers very rich functionality, ranging from noise filtering and baseline reduction, through peak picking, to differential quantitation and peptide/protein identification. When a TOPP tool is called from TOPPView, its parameters are set through a GUI (see Figure 2). Processing is done in the background, so the user can continue working; when execution has finished, the output is opened in TOPPView. The GUI for the TOPP tools is especially useful for finding good algorithm parameters, which can then be stored and used to process large data sets with the TOPP command-line tools. The integration of TOPP makes TOPPView a very powerful analysis tool. As an example, we now describe how the parameters of a centroiding algorithm can be optimized in the viewer. After an LC-MS profile data map has been opened, a single spectrum can be selected and displayed in a 1D view. This spectrum is then run through the centroiding algorithm, and the resulting centroided data are displayed together with the profile data, which allows a quick visual assessment of the algorithm's performance. The user can then alter the algorithm parameters and rerun the algorithm until suitable parameters are found. These brief examples illustrate some of the features of TOPPView. A more detailed description of TOPPView's capabilities, with additional examples and in-depth explanations of the user interface and of the individual features, is available in the comprehensive tutorial provided on the project's Web site. To achieve broad applicability of mass spectrometry tools, it is crucial to support as many file formats as possible.
TOPPView supports most nonproprietary data formats. The preferred formats are mzData⁷ and the recently released mzML⁸ format, both developed by the Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI). Many other formats, such as mzXML,⁹ Sequest input files (DTA) and ANDI/MS files, are supported as well. Quantitation results and peptide/protein identifications are supported through the OpenMS formats featureXML, consensusXML and idXML.
Having described the main features of TOPPView, we now briefly turn to technical details. TOPPView is developed as open-source software under the GNU Lesser General Public License (LGPL) and hosted at SourceForge.¹⁰ It is distributed as part of TOPP, which is based on OpenMS, an open-source software framework for mass spectrometry.¹¹ Many components of TOPPView, for example the views, are part of OpenMS and can therefore be reused in other GUIs. Because it is written in ANSI C++, TOPPView is platform-independent: the source code package can be compiled on most operating systems, for example various Linux distributions, Mac OS and Windows. For convenience, binary installers of the TOPP tools are provided for several operating systems. The source and binary packages, as well as the TOPPView tutorial, are available from the OpenMS Web site at http://www.openms.de.
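The Python sketch below gives one illustration of what reading the preferred formats involves by decoding the peak data of an mzML file. In mzML, m/z and intensity arrays are stored as base64-encoded, little-endian binary inside `<binaryDataArray>` elements. PSI-MS controlled-vocabulary terms give the precision (MS:1000521, 32-bit float; MS:1000523, 64-bit float) and the compression (MS:1000574, zlib). The function `decode_binary_array` is a hypothetical helper, not part of OpenMS or TOPPView. XML parsing and other compression schemes are omitted.

```python
from __future__ import annotations

import base64
import struct
import zlib

# PSI-MS controlled-vocabulary accessions that describe a binaryDataArray.
FLOAT32 = "MS:1000521"  # 32-bit float
FLOAT64 = "MS:1000523"  # 64-bit float
ZLIB = "MS:1000574"     # zlib compression


def decode_binary_array(payload: str, accessions: set[str]) -> list[float]:
    """Decode the text of an mzML <binary> element into floats.

    `accessions` holds the cvParam accessions found in the enclosing
    <binaryDataArray>; only zlib or no compression is handled here.
    """
    raw = base64.b64decode(payload)
    if ZLIB in accessions:
        raw = zlib.decompress(raw)
    if FLOAT64 in accessions:
        code = "d"
    elif FLOAT32 in accessions:
        code = "f"
    else:
        raise ValueError("binary data array has no precision term")
    count = len(raw) // struct.calcsize("<" + code)
    # mzML binary data is little-endian, hence the "<" prefix.
    return list(struct.unpack(f"<{count}{code}", raw))


# Round trip: encode three m/z values as 64-bit, zlib-compressed data, then decode.
mz_values = [400.25, 401.5, 402.75]
encoded = base64.b64encode(zlib.compress(struct.pack("<3d", *mz_values))).decode("ascii")
print(decode_binary_array(encoded, {FLOAT64, ZLIB}))
```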
Discussion and Outlook
We have introduced TOPPView, a versatile viewer for mass spectrometry data. TOPPView addresses the needs of both scientists working with MS data and developers of data analysis algorithms. It offers multiple configurable views to choose from, depending on the data and the task to perform. Each view can display several data sets as separate layers, and the data in each layer can be filtered according to user settings. Additionally, a convenient GUI for the TOPP tools is provided. These features make TOPPView a very powerful tool for the whole mass spectrometry community. TOPPView is under active development, and several new features are planned for the next releases. An improved 1D view is currently being implemented; it will facilitate the validation of MS/MS identifications and support manual annotation and identification of peptide mass spectra. Further planned extensions include support for chromatograms and export of the displayed data as scalable vector graphics (SVG) for publication. Finally, we would like to emphasize that TOPPView is open-source software: anyone can influence its future development through feedback, bug reports, feature requests or contributions to the source code.
References
(1) Li, X. J.; Pedrioli, P. G.; Eng, J.; Martin, D.; Yi, E. C.; Lee, H.; Aebersold, R. A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Anal. Chem. 2004, 76, 3856–3860.
(2) Bellew, M.; Coram, M.; Fitzgibbon, M.; Igra, M.; Randolph, T.; Wang, P.; May, D.; Eng, J.; Fang, R.; Lin, C.; Chen, J.; Goodlett, D.; Whiteaker, J.; Paulovich, A.; McIntosh, M. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 2006, 22, 1902–1909.
(3) Insilicos Viewer. http://www.insilicos.com/Insilicos_Viewer.html.
(4) Jones, P.; Côté, R. G.; Martens, L.; Quinn, A. F.; Taylor, C. F.; Derache, W.; Hermjakob, H.; Apweiler, R. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 2006, 34, D659–663.
(5) Taylor, C. F.; Paton, N. W.; Lilley, K. S.; Binz, P. A.; Julian, R. K.; Jones, A. R.; Zhu, W.; Apweiler, R.; Aebersold, R.; Deutsch, E. W.; Dunn, M. J.; Heck, A. J.; Leitner, A.; Macht, M.; Mann, M.; Martens, L.; Neubert, T. A.; Patterson, S. D.; Ping, P.; Seymour, S. L.; Souda, P.; Tsugita, A.; Vandekerckhove, J.; Vondriska, T. M.; Whitelegge, J. P.; Wilkins, M. R.; Xenarios, I.; Yates, J. R.; Hermjakob, H. The minimum information about a proteomics experiment (MIAPE). Nat. Biotechnol. 2007, 25, 887–893.
(6) Kohlbacher, O.; Reinert, K.; Gröpl, C.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Sturm, M. TOPP – the OpenMS proteomics pipeline. Bioinformatics 2007, 23 (2), 191–197.
(7) Orchard, S.; Hermjakob, H.; Taylor, C. F.; Binz, P. A.; Hoogland, C.; Julian, R.; Garavelli, J. S.; Aebersold, R.; Apweiler, R. Autumn 2005 Workshop of the Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI), Geneva, September 4–6, 2005. Proteomics 2006, 6 (3), 738–741.
(8) Deutsch, E. mzML: a single, unifying data format for mass spectrometer output. Proteomics 2008, 8, 2776–2777.
(9) Pedrioli, P. G. A.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 2004, 22 (11), 1459–1466.
(10) SourceForge. http://www.sourceforge.net.
(11) Sturm, M.; Bertsch, A.; Gröpl, C.; Hildebrandt, A.; Hussong, R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.; Kohlbacher, O. OpenMS – an open-source software framework for mass spectrometry. BMC Bioinf. 2008, 9, 163.