Mass++: A Visualization and Analysis Tool for Mass Spectrometry

Technical Note pubs.acs.org/jpr

Satoshi Tanaka,*,† Yuichiro Fujita,† Howell E. Parry,†,⊥ Akiyasu C. Yoshizawa,† Kentaro Morimoto,† Masaki Murase,† Yoshihiro Yamada,† Jingwen Yao,†,# Shin-ichi Utsunomiya,† Shigeki Kajihara,† Mitsuru Fukuda,‡,§ Masayuki Ikawa,‡,§ Tsuyoshi Tabata,‡,∇ Kentaro Takahashi,‡ Ken Aoshima,‡,○ Yoshito Nihei,∥ Takaaki Nishioka,∥ Yoshiya Oda,‡ and Koichi Tanaka†

† Koichi Tanaka Laboratory of Advanced Science and Technology, Shimadzu Corporation, Kyoto 604-8511, Japan
‡ Eisai Product Creation Systems, Eisai Co., Ltd., Tsukuba, Ibaraki 300-2635, Japan
§ iBioTech Co., Tsukuba, Ibaraki 300-0031, Japan
∥ Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan

Supporting Information

ABSTRACT: We have developed Mass++, a plug-in style visualization and analysis tool for mass spectrometry. Its plug-in style enables users to customize it and to develop original functions. Mass++ has several kinds of plug-ins, including rich viewers and analysis methods for proteomics and metabolomics. Plug-ins for supporting vendors’ raw data are currently available; hence, Mass++ can read several data formats. Mass++ is both a desktop tool and a software development platform. Original functions can be developed without editing the Mass++ source code. Here, we present this tool’s capability to rapidly analyze MS data and develop functions by providing examples of label-free quantitation and implementing plug-ins or scripts. Mass++ is freely available at http://www.first-ms3d.jp/english/.

KEYWORDS: Mass++, MassBank, identification, quantitation, platform, plug-in, mass spectrometry



INTRODUCTION

Today, mass spectrometry (MS) plays an important role as an analysis technology in life sciences such as proteomics and metabolomics. After data acquisition by a mass spectrometer, various software products are used for visualizing, processing, and analyzing the raw data. Commercial software supplied with the MS instrument is usually used for these purposes. However, such software products usually cannot be controlled by third-party software. Therefore, even if researchers have original ideas such as new algorithms and automation systems, those ideas are not easy to implement.

Many free tools for the visualization, processing, and analysis of mass spectrometry data are now being distributed (e.g., Trans Proteomic Pipeline (TPP),1,2 Open MS,3,4 MaxQuant,5,6 MZmine,7,8 XCMS,9,10 ProteoWizard,11,12 Skyline,13,14 mzAPI,15 pyteomics,16 and CoreFlow17). These tools provide powerful functions; however, some of them do not allow the integration of additional code to implement researchers’ ideas, or, even if they allow integration of additional code, they require users to write a large amount of code for the implementation. (See the Supporting Information for data analysis tools and development environments.)

Here, we report on a new software application, Mass++ (Mass plus plus), for viewing and manipulating any type of mass spectrometry data. It is capable of performing a wide variety of manual or automatic tasks such as peak detection, smoothing, and automatic data submission into identification search engines such as Mascot and X! Tandem. Mass++ can read various mass spectrometer file formats, so users can analyze several kinds of data files in the same way. Mass++ can convert sample data from these formats to common formats such as AIA (netCDF), mzXML, and mzML.

Mass++ is plug-in software designed to satisfy diverse needs, and users can develop new automatic routines, viewers, and algorithms as plug-ins. Thus, each user can customize Mass++ according to their objectives. Mass++ plug-ins are written in C++, VB.NET, or C#.NET, and the necessary functions can be added using these languages without reference to the Mass++ source code. Additionally, Mass++ has a script console that enables creation of simple programs using a script language.

We have focused on developing comprehensive identification and quantitation analysis. Mass++ can directly post peak lists and parameters to various search engines and register the results into an internal database, which users can confirm afterward. Quantitation results are also saved in the internal database.

Received: February 23, 2014
Published: June 26, 2014
© 2014 American Chemical Society

dx.doi.org/10.1021/pr500155z | J. Proteome Res. 2014, 13, 3846−3853

Journal of Proteome Research


Users can determine the differences between samples using the peak matrix, distribution plot, and overlapping view functions. Mass++ contributes to metabolomics by improving the chemical identification of small molecules, which is the principal bottleneck in metabolomics studies. For chemical identification after automatic peak detection of LC-MS raw data, users can submit a batch of large peak data sets from Mass++ to MassBank, which is a public repository of reference mass spectra. They can also contribute to the database by using the provided function to generate MassBank-formatted records from the peak data semiautomatically.

SOFTWARE STRUCTURE OF MASS++

Mass++ is implemented in a plug-in style to simplify customizing and adding functions (see the Supporting Information for the plug-in structure). A plug-in is a software component that can be added to another software application. For example, web browsers such as Internet Explorer or Firefox cannot read PDF files immediately after installation; however, they can read PDF files after Adobe Reader is installed. In this case, Adobe Reader works as a plug-in for these web browsers. Another example is an add-in (or add-on) for Excel, i.e., a small program that adds a specific feature to Excel. Such an add-in is an actual instance of a plug-in. Users can customize Mass++ according to their objectives, adding functions or removing unnecessary functions as needed to increase performance or simplify the software.

IMPLEMENTATION

Supported Data Formats

Mass++ supports various instrument data formats and common formats: LCMSsolution, GCMSsolution (Shimadzu), Xcalibur (Thermo Scientific), Analyst, AnalystQS (AB Sciex), MassLynx (Waters), MassHunter (Agilent), LaunchPad (Kratos), mzXML,18 mzML,19 and netCDF.20 (See the Supporting Information for supported data formats.) All functions for reading file formats are also implemented as Mass++ plug-ins. Hence, Mass++ can be made to read any new file format by developing a plug-in for the corresponding format. This structure simplifies supporting newly released mass spectrometer file formats. Mass++ can export sample data as mzXML, mzML, and mass spectrum binary (MSB) files. mzXML and mzML are already standard formats for mass spectrometry and are available in many tools. However, they have data access problems, such as when generating a chromatogram.

MSB File Format

MSB is an original and lossless data file format for Mass++. This format has been designed with a focus on improving file-reading performance while retaining both spectrum information and chromatogram information. It is a binary format, which contributes to improved reading performance; however, its most distinctive feature is its data structure. In general, the most time-consuming process in reading mass spectrometry data is the reconstruction of the chromatogram. In the MSB format, the chromatogram data is stored as data objects; the data of a chromatogram is divided into multiple fragments, and statistical values that are precalculated for each fragment, such as the maximum intensity value and the total intensity value, are stored within the MSB file. Thus, especially when reading chromatogram data across intervals corresponding to the data fragments, calculations of chromatogram values can be omitted by reading these precalculated values, so file-reading performance is improved without any loss of content (see the Supporting Information for the details of the MSB file format). In particular, the MSB format greatly improves performance when a large LC-MS data file or multiple LC-MS data files are analyzed. (See the Supporting Information for the performance test results.)

Basic Functions

Mass++ is general-purpose software capable of performing a wide variety of tasks. The program contains several peak-detection algorithms (Table 1) and can display the peaks on an m/z scale or a time scale. One of the goals in designing Mass++ was to create a program for viewing mass spectra, so Mass++ has several functions for this purpose such as file input/output functions, a waveform viewer, a 3D viewer, zoom-in functionality, peak detection, smoothing, and retention time (RT) alignment (Figure 1). (See the Supporting Information for the full list of functions.)

Figure 1. Mass++ has various functions for displaying, manipulating, and analyzing MS raw data, all implemented as plug-ins.

Table 1. Peak-Detection Algorithms(a)
algorithm | description
MWD | Peak-detection algorithm suitable for identification.
GION | Peak-detection algorithm suitable for quantitation.
AB3D | Peak-detection algorithm for detecting peaks from 2D data; suitable for label-free quantitation.
local maximum | Very simple peak-detection algorithm: it simply picks local maximum points.
(a) Peak-detection algorithms are implemented for different purposes. For more details, refer to the Supporting Information.

Identification and Quantitation

For proteome data analysis, users can invoke identification and quantitation functions. Conventionally, the standard protocol for identifying proteins from MS data begins with extracting peaks, saving them to a text file, and then posting this file to a search engine; it is thus quite time-consuming. In contrast, Mass++ can directly post peak lists and parameters to certain search engines such as Mascot21 and X! Tandem,22 and search results are stored in the Mass++ internal database and can be displayed via the viewing functionality of Mass++. In addition, Mass++ provides quantitation data for the peaks and manages quantitation results using a “peak matrix” in which each row represents a peak and each column represents a sample. Users can create a peak matrix step-by-step using a wizard. The quantitation results are also stored in the internal database and linked to the corresponding identification results. Peaks related to target substances can therefore be found easily in the original mass spectrometric data.

Quantitation processing using Mass++ creates a peak matrix containing the differential analysis results of all peaks, in a table format. As an example, we prepared 11 samples of a protein mixture containing five proteins from yeast, which are described below. All of them contained 200 fmol of yeast extracts. Three of them were spiked with 10 fmol of BSA tryptic peptides, five of them were spiked with 50 fmol of BSA tryptic peptides, and the other three samples were not spiked with BSA tryptic peptides (these contained only yeast extracts). The samples were analyzed by LC-MS on an LC-MS-IT-TOF system (Shimadzu). The peak matrix is created via the following steps in Mass++:

(1) Registering samples in groups.
(2) Normalization.
(3) Retention time (RT) alignment.
(4) Peak-position determination.
(5) Peak-value calculation.
(6) Other analysis.

In step 1, sample data are classified into groups, usually according to the properties of samples such as a control group and a treatment group. In this example, groups are classified according to the amount of BSA (0, 10, and 50 fmol). After the sample data is read, the columns of the peak matrix are created. In step 2, samples are normalized in order to correct the intensity gaps that often appear between different samples. Mass++ includes normalization methods using internal standards, totalized peak intensities, and so forth. (See the Supporting Information for the full list of methods.) In step 3, RT alignment is performed using a dynamic programming algorithm in order to correct the RT gaps that appear between different samples. In step 4, the peak positions are determined. Mass++ includes various methods for this such as detecting peaks (label-free), importing from a file (targeted peaks), and MRM (see the Supporting Information for the full list of methods). In this example, the peak positions are determined by detecting peaks from 11 samples. After the peak positions are determined, the peak matrix rows are generated. In step 5, the peak intensities, or areas under the waveform of the peak, are calculated from the spectra or chromatograms. After this process, all elements in the peak matrix are fixed. In step 6, if needed, users can perform various analyses on the peak matrix such as statistical analysis and identification, which are used for annotating each substance peak. In this example, an analysis of variance (ANOVA) and identification are performed.

Figure 2 presents the results of the quantitation. The p value and substance columns are appended after ANOVA and identification are performed. The p values of peaks annotated as BSA are small enough that the differences between groups are noticeable (p value < 0.0001). Users can visually check peaks using a distribution plot and an overlapping view by clicking on peak rows in the peak matrix. The distribution plot displays the distribution of peak intensities or areas, and it can show a boxplot. The overlapping view presents the spectrum or chromatogram waveform, so the peak shape can be confirmed. In this case, we can confirm that peak intensities and areas increase with increasing BSA.
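The peak matrix and the ANOVA step of the example above can be illustrated with a small sketch. This is not the Mass++ implementation; it is a minimal plain-Python illustration, assuming a peak matrix held as a list of rows (one per peak) with one value per sample, and a hypothetical `GROUPS` mapping from group name to sample columns that mirrors the 3/3/5 split of the BSA example. The intensity values are invented for illustration and are not data from this study. Only the F statistic is computed; converting it to a p value requires the F distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value groups."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical grouping of the 11 sample columns by spiked BSA amount.
GROUPS = {
    "BSA 0 fmol": [0, 1, 2],
    "BSA 10 fmol": [3, 4, 5],
    "BSA 50 fmol": [6, 7, 8, 9, 10],
}

# Each row is a peak, each entry of "values" is one sample (invented numbers).
peak_matrix = [
    {"rt": 21.4, "mz": 582.32,
     "values": [0.0, 0.1, 0.0, 10.2, 9.8, 10.1, 50.3, 49.7, 50.1, 49.9, 50.0]},
    {"rt": 35.0, "mz": 714.85,
     "values": [100.0, 101.0, 99.0, 100.5, 99.5, 100.2,
                99.8, 100.1, 100.3, 99.7, 100.0]},
]


def add_f_statistics(matrix, groups):
    """Append an 'f_stat' entry to every peak row of the matrix."""
    for row in matrix:
        per_group = [[row["values"][i] for i in cols] for cols in groups.values()]
        row["f_stat"] = one_way_anova_f(per_group)
    return matrix


add_f_statistics(peak_matrix, GROUPS)
for row in peak_matrix:
    print("RT %.1f  m/z %.2f  F = %.2f" % (row["rt"], row["mz"], row["f_stat"]))
```

A row whose values track the spiked amount yields a much larger F statistic than a row that is flat across groups, which is the pattern the p value column of the peak matrix is meant to expose. If SciPy is available, `scipy.stats.f_oneway` returns both the F statistic and the p value.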

Figure 2. Quantitation results are shown in a table called “Peak Matrix”. Each row represents a peak. The first several columns present peak RT position, m/z position, substance, and p value. The remaining columns present peak intensities or areas for each sample. The overlapping view (which displays computationally aligned chromatograms), group plot, and box plot for a specified peak are displayed by double-clicking a peak row after creating the peak matrix. In this example, users can confirm the areas or intensities of peaks that are annotated as BSA or are within different groups.

MassBank

MassBank,23 http://www.massbank.jp/, is a public repository of mass spectra of small molecules that currently contains 40 064 mass spectra contributed by 27 laboratories (as of January 2014). It is one of the most referenced databases for chemically identifying small molecules detected by GC- and LC-MS analysis of biological samples. Mass++ has been optimized to visualize and analyze metabolome as well as proteome data by collaborating with the MassBank project.

MassBank is both a simple database and a platform containing a search engine; rich search functions are available via a web browser (Table 2). However, users previously had to extract peak data in text format and paste it into the web browser because raw data is not accepted as a query for the MassBank spectral search. MassBank also provides a simple object access protocol (SOAP) application programming interface (API) that enables applications to be written without any web browser. Mass++ can perform a search in MassBank through its SOAP API; hence, users can search MassBank data by simply selecting a raw data spectrum as a query.

Mass++ is very helpful for constructing a user’s private mass spectral library. MassBank system installers for Windows and Linux are available as open-source software, so anyone can construct their own MassBank private library in a laboratory. All spectral data with metadata in a MassBank record should be prepared by following the “MassBank Record Format”. Creating MassBank records has been very time-consuming because researchers had to manually extract sample information and peaks from several kinds of raw data. Furthermore, the extraction method differs with the MS instrument because the supplied software differs for each instrument. Now, Mass++ can semiautomatically export MassBank records that are generated from various data formats using a wizard. These records can then be easily registered in MassBank. The public MassBank server also distributes public MassBank records, so anyone can create original databases from public spectrum data and private spectra acquired in a laboratory. Furthermore, Mass++ users can search databases using peak information or their sample data (Figure 3).

Table 2. Database Search Services for MassBank Supported in Mass++(a)
service | detail
spectrum search | Search similar spectra on a peak-by-peak basis.
peak search | Search spectra by m/z values.
peak-difference search | Search spectra by m/z differences.
batch search | Search similar spectra in a batch process.
(a) Mass++ calls certain MassBank search functions. “Batch Search” is implemented as a selectable search engine in the identification function in Mass++.

DEVELOPMENT OF USER FUNCTIONS

Plug-in Development

The Mass++ standard development kit (SDK), an additional package, enables users with programming skills to create Mass++ functions without needing the Mass++ source code, using C++ or .NET technologies. The SDK contains files, library files, and documents needed for developing Mass++ plug-ins. Each Mass++ plug-in function is called by an event, which is specified by an attribute named “call type”. For example, when a spectrum is displayed, a DRAW_SPEC event occurs, and any plug-in can respond by displaying additional information on the graph.

Some libraries used for accessing MS raw data are publicly available, enabling us to implement new functionality ideas. However, the following steps are usually required to process or analyze data: (i) open a raw data file, (ii) find or select the target object, (iii) process/analyze the data, (iv) output the result, and (v) close the raw data file. In Mass++ plug-in development, the program can access objects already displayed; it is thus sufficient for a developer to write a program that consists of just the processing/analyzing and output parts.

All files for a given plug-in are installed in a folder containing the plug-in definition file (plugin.xml) and the dynamic link library (DLL) file, namely, the program itself. The parameter settings file, written in XML, and other resources such as icons and help files are also installed in the same folder.
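The call-type mechanism described under Plug-in Development, in which plug-in functions are invoked in response to named events such as DRAW_SPEC, can be sketched as a small event registry. This is only an illustration of the general pattern in plain Python, not the Mass++ SDK: the `PluginManager` class, the `annotate_spectrum` handler, and the dictionary-based spectrum and canvas are all hypothetical names introduced here, and real plug-ins declare their call types in plugin.xml rather than in code.

```python
from collections import defaultdict


class PluginManager:
    """Hypothetical registry mapping a call type (event name) to handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, call_type, func):
        # Several plug-ins may respond to the same call type.
        self._handlers[call_type].append(func)

    def fire(self, call_type, **context):
        # Call every handler registered for this call type, in order.
        return [func(**context) for func in self._handlers.get(call_type, [])]


def annotate_spectrum(spectrum, canvas):
    """Hypothetical handler: add a label for the most intense point."""
    mz, intensity = max(spectrum["points"], key=lambda p: p[1])
    canvas.append(("label", mz, "base peak"))
    return mz


manager = PluginManager()
manager.register("DRAW_SPEC", annotate_spectrum)

# The host application owns the displayed objects and passes them to handlers,
# so a handler contains only the processing and output parts.
canvas = []
spectrum = {"points": [(175.1, 30.0), (304.2, 95.0), (433.2, 60.0)]}
results = manager.fire("DRAW_SPEC", spectrum=spectrum, canvas=canvas)
print(results, canvas)
```

The design point is that the handler never opens or closes a data file: the host fires the event with the objects already on screen, which is why a plug-in can consist of just the analysis and output steps.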


Figure 3. Mass++ provides two functions for MassBank: searching and building a MassBank database. Mass++ can export MassBank record files to register into an in-house MassBank database. Mass++ can also search for similar spectra in a MassBank database (in-house or the public one).
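As an illustration of the record export shown in Figure 3, the sketch below formats a peak list into the peak section of a MassBank-style text record. It is a minimal sketch, not the Mass++ exporter: the function name `format_peak_section` is hypothetical, the peak values are invented, and the field names (`PK$NUM_PEAK`, `PK$PEAK`), the 999 scaling of relative intensity, and the `//` terminator are assumptions based on the publicly documented MassBank Record Format that should be checked against the current specification. A complete record also needs metadata fields that are omitted here.

```python
def format_peak_section(peaks, max_rel=999):
    """Format (m/z, intensity) pairs as the peak section of a text record."""
    base = max(intensity for _mz, intensity in peaks)
    lines = [
        "PK$NUM_PEAK: %d" % len(peaks),
        "PK$PEAK: m/z int. rel.int.",
    ]
    for mz, intensity in sorted(peaks):
        # Relative intensity is scaled so that the base peak equals max_rel.
        rel = int(round(max_rel * intensity / base))
        lines.append("  %.4f %.1f %d" % (mz, intensity, rel))
    lines.append("//")
    return "\n".join(lines)


# Invented example peaks, given out of m/z order on purpose.
peaks = [(145.0495, 960.0), (85.0284, 120.0), (127.0390, 240.0)]
record_text = format_peak_section(peaks)
print(record_text)
```

Sorting by m/z and deriving the relative intensity from the base peak are the two steps that are tedious to do by hand when peaks come from different vendors' software, which is the manual work the export wizard is described as replacing.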

After the Mass++ SDK is installed, the Mass++ plug-in development wizard for Microsoft Visual Studio is automatically installed. Mass++ plug-ins can be easily developed using this wizard. The Mass++ SDK is freely distributed, as is the main software. Mass++ plug-ins can be developed using Microsoft Visual Studio 2010, which is an integrated development environment (IDE). This is a commercial software product, but the Express Edition24 of Visual Studio is freeware; anyone can download and install it. Mass++ requires some program libraries such as boost,25 xerces-c,26 and wxWidgets,27 but these are freely available on the Internet. Therefore, an environment for Mass++ plug-in development can be built for free.

Figure 4 presents an example of plug-in development. This example demonstrates how to create a function for drawing lines at product ion positions on the spectrum waveform view. The plug-in information and user interface, including the call type (see the Supporting Information for examples of call type), function names, resource files, and menu structures, are defined in the plug-in definition file (plugin.xml). The call type is the trigger for calling functions, such as executing a menu, drawing a waveform, opening a spectrum or a chromatogram, clicking a mouse button, pushing a key, detecting peaks, and so on. Libraries in the SDK have various kinds of functions for accessing and analyzing MS data, the graphical user interface (GUI), database access, and network connections; these are described in the API document contained in the Mass++ SDK. In this case, the “drawProductPos” function is called in response to the “DRAW_SPEC_FG” event, which is fired while drawing additional information in the foreground of a spectrum waveform. Next, the appropriate code for drawing lines at product ion positions is written in the drawProductPos function. Using the Visual Studio debug function, we can confirm that green lines are drawn at product ion positions on the spectrum waveform.

Some algorithms such as peak detection, normalization, peak filter, and RT alignment are implemented as Mass++ plug-ins. Plug-in development makes it possible to add new algorithms such as these to Mass++. The plug-in structure allows users to implement a new algorithm/methodology without reading the entire source code and understanding the encoded logic of Mass++ itself; this is the original and fundamental philosophy of Mass++ development.

Scripting

Many computer scientists/engineers can use the C/C++ or .NET Framework languages, but most chemists and biologists are unfamiliar with these programming languages. Additionally, developing plug-ins is often too time-consuming for implementing a simple test program or a temporary program. In such cases, script languages are a better choice because some researchers have experience with scripting languages such as Perl, Python, or Ruby. Users can implement simple functionality via Mass++’s script console. At present, Mass++ supports IronPython,28 a Python language implementation for the .NET framework environment. In the script console, .NET framework classes in the Mass++ SDK, in the kome.clr namespace, can be used. The specification of these classes can be checked in the Mass++ SDK documentation produced by Doxygen29 and via tutorial documents and sample programs (Figure 5).
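To give a feel for the size of program the script console is aimed at, the following is a minimal sketch of the kind of calculation described for Figure 5, namely summing the intensities of one spectrum. It is written in plain Python and deliberately does not use the kome.clr classes, whose names are not listed in this article; the `active_spectrum` list of (m/z, intensity) pairs is a hypothetical stand-in for the spectrum object that a real script would obtain from Mass++, and the values are invented.

```python
def total_intensity(points):
    """Sum the intensity of every (m/z, intensity) point of a spectrum."""
    return sum(intensity for _mz, intensity in points)


# Hypothetical stand-in for the spectrum currently shown in the viewer.
active_spectrum = [(100.05, 1200.0), (150.10, 350.5), (200.20, 49.5)]

total = total_intensity(active_spectrum)
print("Total intensity: %.1f" % total)
```

A calculation of this size would need a project, a plug-in definition file, and a compiled DLL if written as a plug-in, which is the overhead the script console avoids for quick tests.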


Figure 4. Mass++ plug-in packaged as a folder containing a plug-in definition file (XML), a program file (DLL), a parameter definition file (XML), and other resources such as icons, help files, and documents. Developers can implement and test their original plug-ins using Microsoft Visual Studio. The plug-in development wizard for Visual Studio is also distributed for free on the Internet.

Figure 5. Example of a script program written in IronPython and employing several classes of the Mass++ SDK. This program calculates and displays the total intensity of the active spectrum.

DISCUSSION

Simply stated, Mass++ has four predominant features. The first feature is a basic data management system for mass spectra. Currently, the basic feature set of Mass++ contains various functions for visualizing annotated mass spectra and for reading/writing multiple data file formats. Additionally, it has peak-detection functions using newly developed algorithms. These functions make Mass++ a powerful and universal viewer for mass spectra.

The second feature is its wide variety of analysis/analysis-assistance functions for omics data. Although it does not currently provide its own original search engine for proteome research, Mass++ obviates time-consuming manual processes by automatically posting the peak lists to existing search engines. Furthermore, Mass++ can read AXIMA MALDI data, which is useful for glycan analysis. The plug-ins for both of these functions are contained in Mass++; aside from Mass++, no other freeware with functions focusing on MALDI data is available.

The third feature is a notable one, namely, the MassBank collaboration functions. Mass++ offers an alternative to the time-consuming process of manually formatting the data for MassBank records and simplifies posting extracted peak lists to the MassBank web site for spectra search in the same manner as the proteome search. These features enable Mass++ to be utilized as analysis/analysis-assistance software for both proteomics and metabolomics. Mass++ is thus valuable front-end software for the integrated analysis of proteome data and metabolome data.

The fourth and last key Mass++ feature is its flexible plug-in structure. All functions are implemented as plug-ins, making it easy to add or remove functions. Moreover, Mass++ provides a plug-in development environment, so users can develop their own functions with programming languages or easy-to-use script languages.

Mass++ was originally developed to resolve the problems described in the Introduction via the following features. Regarding the problem that commercial software cannot be controlled by third-party software, Mass++ has a plug-in structure, so anyone familiar with programming can develop a new algorithm, workflow, and so on using the freely available plug-in development system. Users can employ a scripting language for plug-in development; it is thus relatively easy to control Mass++ or to add functions. Regarding the difficulties in adding user code to implement functionality using a development environment distributed as freeware, this is essentially the same as the problem of commercial software and is dealt with by Mass++ as described above.

Many tools are publicly distributed as open-source software to which developers can add their original functions. However, developers not only have to implement new functions but also typically have to read large amounts of source code and write programs in multiple parts according to events such as mouse clicking, drawing, closing a sample, and so on (i.e., typical software development that includes editing the open-source software). This makes development and maintenance difficult. In Mass++ plug-in development, developers do not have to understand Mass++’s source code itself and can manage their own source code as they like because responses to events can be defined with functions using call types. In addition, developers can use functions contained in existing plug-ins. This new development system results in lower development costs when compared with creating a new tool from scratch or editing open-source software source code.

The plug-in structure realizes many of Mass++’s distinguishing features. First, it enables users to add or remove functions to/from their own copy of Mass++; hence, it is possible to build an original optimized version of Mass++ for each user according to their needs. Second, if users do not find their desired functions, they can implement them themselves with relative ease utilizing existing rich plug-in functions. Moreover, there is no technical restriction on implementing “wrapper” plug-ins for other programs; thus, it is possible to use other tools from Mass++ via plug-ins. For instance, users can employ proteome search engines Mascot and X! Tandem and proteome analysis programs PeptideProphet,30 ProteinProphet,31 and iProphet32 in TPP via Mass++; plug-ins for using these software products are distributed with the Mass++ main software. Hence, when developing plug-ins, users can utilize Mass++ as a kind of “glue” program for required functions and/or tools. Note that the software licenses for plug-ins are mutually independent; according to the Mass++ user’s license, users can release their self-implemented plug-ins under any license except for the copy-left type, which is incompatible with the Mass++ license. Open-source plug-ins and commercial plug-ins can thus be used together in a single instance of Mass++, specifically, in the same plug-in execution environment.

In recent times, parallel computing and its specialized forms (cloud computing) are hot topics, and many researchers are investigating opportunities to apply them to omics studies. Mass++ cannot run under a parallel computing environment at this time. However, Mass++ has a command line input/output mode, and using this mode, it can work as an analysis manager under a proper job management system. We think this suggests the future direction of our development.

In conclusion, Mass++ is not just a simple visualization/analysis program, but it is a more general software platform oriented toward mass spectrometry. We believe that Mass++ not only helps researchers such as biologists, chemists, and bioinformaticians but also all developers in the mass spectrometry field, and we hope that many researchers will develop their own ideas as plug-ins. Mass++ is everyone’s software, developed by everyone.



ASSOCIATED CONTENT

Supporting Information

Plug-in structure, Mass++ menus, supported data formats, peak detection algorithms, normalization methods, peak position determination methods, examples of call types in Mass++, MSB file format, results of performance tests for MSB file format, example of Mass++ plug-in development using C#, table of MS data analysis tools, and table of development environments. This material is available free of charge via the Internet at http://pubs.acs.org. Mass++ runs on 32-bit and 64-bit Windows and can be downloaded free of charge via the Internet at http://www.first-ms3d.jp/english/. A Mass++ community is operated as a Google Group (http://groups.google.com/group/massplusplus/).



AUTHOR INFORMATION

Corresponding Author

*Phone: +81-75-823-2897. Fax: +81-75-823-2900. E-mail: [email protected].

Present Addresses

⊥ (H.E.P.) Thermo Fisher Scientific, Altrincham, Cheshire WA14 5TP, United Kingdom.
# (J.Y.) Shimadzu Research Laboratory Ltd., Trafford Wharf Road, Manchester M17 1GP, United Kingdom.
∇ (T.T.) Department of Molecular and Cellular Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan.
○ (K.A.) Biostatistics Clinical Science, Japan Biostatistics, CCLO, Eisai Product Creation Systems, Bunkyo-ku, Tokyo 112-8088, Japan.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was originally funded by the Japan Science and Technology Agency (CREST). This work is currently funded by the Japan Society for the Promotion of Science (JSPS) through the “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)”, initiated by the Council for Science and Technology Policy (CSTP).



ABBREVIATIONS

MS, mass spectrometry; TPP, Trans Proteomic Pipeline; LC-MS, liquid chromatography-mass spectrometry; 3D, three-dimensional; RT, retention time; MRM, multiple reaction monitoring; ANOVA, analysis of variance; GC, gas chromatography; SOAP, simple object access protocol; API, application programming interface; SDK, standard development kit; DLL, dynamic link library; IDE, integrated development environment; XML, extensible markup language

dx.doi.org/10.1021/pr500155z | J. Proteome Res. 2014, 13, 3846−3853

Journal of Proteome Research



Technical Note

REFERENCES

(1) Keller, A.; Eng, J.; Zhang, N.; Li, X. J.; Aebersold, R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 2005, 1, 0017.
(2) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Farrah, T.; Lam, H.; Tasman, N.; Sun, Z.; Nilsson, E.; Pratt, B.; Prazen, B.; Eng, J. K.; Martin, D. B.; Nesvizhskii, A. I.; Aebersold, R. A guided tour of the Trans-Proteomic Pipeline. Proteomics 2010, 10, 1150−9.
(3) Sturm, M.; Bertsch, A.; Gröpl, C.; Hilderbrandt, R.; Hussong, R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.; Kohlbacher, O. Open MS − an open-source software framework for mass spectrometry. BMC Bioinf. 2008, 9, 163.
(4) Weisser, H.; Nahnsen, S.; Grossmann, J.; Nilse, L.; Quandt, A.; Brauer, H.; Sturm, M.; Kenar, E.; Kohlbacher, O.; Aebersold, R.; Malmström, L. An automated pipeline for high-throughput label-free quantitative proteomics. J. Proteome Res. 2013, 12, 1628−44.
(5) Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, 1367−72.
(6) Cox, J.; Matic, I.; Hilger, M.; Nagaraj, N.; Selbach, M.; Olsen, J. V.; Mann, M. A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat. Protocol. 2009, 4, 698−705.
(7) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinf. 2010, 11, 395.
(8) Pluskal, T.; Uehara, T.; Yanagida, M. Highly accurate chemical formula prediction tool utilizing high-resolution mass spectra, MS/MS fragmentation, heuristic rules, and isotope pattern matching. Anal. Chem. 2012, 84, 4396−403.
(9) Benton, H. P.; Wong, D. M.; Trauger, S. A.; Siuzdak, G. XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization. Anal. Chem. 2008, 80, 6382−9.
(10) Tautenhahn, R.; Patti, G. J.; Rinehart, D.; Siuzdak, G. XCMS online: a web-based platform to process untargeted metabolomic data. Anal. Chem. 2012, 84, 5035−9.
(11) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534−2536.
(12) Chambers, M. C.; MacLean, B.; Burke, R.; Amode, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M. Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012, 30, 918−20.
(13) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966−8.
(14) Schilling, B.; Rardin, M. J.; MacLean, B. X.; Zawadzka, A. M.; Frewen, B. E.; Cusack, M. P.; Sorensen, D. J.; Bereman, M. S.; Jing, E.; Wu, C. C.; Verdin, E.; Kahn, C. R.; Maccoss, M. J.; Gibson, B. W. Platform-independent and label-free quantitation of proteomic data using MS1 extracted ion chromatograms in skyline: application to protein acetylation and phosphorylation. Mol. Cell. Proteomics 2012, 11, 202−14.
(15) Askenazi, M.; Parikh, J. R.; Marto, J. A. mzAPI: a new strategy for efficiently sharing mass spectrometry data. Nat. Methods 2009, 6, 240−1.
(16) Goloborodko, A. A.; Levitsky, L. I.; Ivanov, M. V.; Gorshkov, M. V. Pyteomics−a Python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass Spectrom. 2013, 24, 301−4.
(17) Pasculescu, A.; Schoof, E. M.; Creixell, P.; Zheng, Y.; Olhovsky, M.; Tian, R.; So, J.; Vanderlaan, R. D.; Pawson, T.; Linding, R.; Colwill, K. CoreFlow: a computational platform for integration, analysis and modeling of complex biological data. J. Proteomics 2014, 100, 167−73.
(18) Pedrioli, P. G.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 2004, 22, 1459−66.
(19) Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W. H.; Römpp, A.; Neumann, S.; Pizarro, A. D.; Montecchi-Palazzi, L.; Tasman, N.; Coleman, M.; Reisinger, F.; Souda, P.; Hermjakob, H.; Binz, P. A.; Deutsch, E. W. mzML−a community standard for mass spectrometry data. Mol. Cell. Proteomics 2011, 10, R110.000133.
(20) Standard specification for analytical data interchange protocol for chromatographic data; ASTM International: West Conshohocken, PA, 2009; pp E1947−98.
(21) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551−67.
(22) Fenyö, D.; Beavis, R. C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 2003, 75, 768−74.
(23) Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass. Spectrom. 2010, 45, 703−714.
(24) Microsoft Visual Studio. http://www.visualstudio.com/ (accessed Jan 29, 2014).
(25) Boost C++ Libraries. http://www.boost.org/ (accessed Jan 29, 2014).
(26) Xerces-C++ XML Parser. http://xerces.apache.org/xerces-c/ (accessed Jan 29, 2014).
(27) wxWidgets. http://www.wxwidgets.org/ (accessed Jan 29, 2014).
(28) IronPython. http://ironpython.codeplex.com/ (accessed Jan 29, 2014).
(29) Doxygen. http://www.stack.nl/~dimitri/doxygen/ (accessed Jan 29, 2014).
(30) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383−92.
(31) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75, 4646−58.
(32) Shteynberg, D.; Deutsch, E. W.; Lam, H.; Eng, J. K.; Sun, Z.; Tasman, N.; Mendoza, L.; Moritz, R. L.; Aebersold, R.; Nesvizhskii, A. I. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 2011, 10, M111.007690.
