iMonDB: Mass Spectrometry Quality Control through Instrument

Mar 23, 2015 - To this end, we introduce the Instrument MONitoring DataBase (iMonDB), which allows us to automatically extract, store, and manage the ...
1 downloads 12 Views 1MB Size
Subscriber access provided by Northern Illinois University

Technical Note

iMonDB: Mass spectrometry quality control through instrument monitoring Wout Bittremieux, Hanny Willems, Pieter Kelchtermans, Lennart Martens, Kris Laukens, and Dirk Valkenborg J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00127 • Publication Date (Web): 23 Mar 2015 Downloaded from http://pubs.acs.org on March 25, 2015

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

iMonDB: Mass spectrometry quality control through instrument monitoring Wout Bittremieux,†,‡ Hanny Willems,¶ Pieter Kelchtermans,§,k,¶ Lennart Martens,§,k Kris Laukens,∗,†,‡,@ and Dirk Valkenborg∗,¶,⊥,#,@ Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium, Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp / Antwerp University Hospital, Antwerp, Belgium, Flemish Institute for Technological Research (VITO), Mol, Belgium, Department of Medical Protein Research, VIB, Ghent, Belgium, Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium, CFP, University of Antwerp, Antwerp, Belgium, and I-BioStat, Hasselt University, Hasselt, Belgium E-mail: [email protected]; [email protected] Phone: +32 (0) 3 265 33 10; +32 (0) 14 33 52 37. Fax: +32 (0) 3 265 37 77; +32 (0) 14 33 55 99

Abstract ∗

To whom correspondence should be addressed Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium ‡ Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp / Antwerp University Hospital, Antwerp, Belgium ¶ Flemish Institute for Technological Research (VITO), Mol, Belgium § Department of Medical Protein Research, VIB, Ghent, Belgium k Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium ⊥ CFP, University of Antwerp, Antwerp, Belgium # I-BioStat, Hasselt University, Hasselt, Belgium @ Shared senior authorship †

1

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Over the past few years, awareness has risen that for mass-spectrometry-based proteomics methods to mature into everyday analytical and clinical practices, extensive quality assessment is mandatory. A currently overlooked source of qualitative information originates from the mass spectrometer itself. Apart from the actual mass spectral data, raw-data objects also contain parameter settings and sensory information about the mass instrument. This information gives a detailed account on the operation of the instrument, which eventually can be related to observations in mass spectral data. The advantage of instrument information at the lowest level is the high sensitivity to detect emerging defects in a timely fashion. To this end, we introduce the Instrument MONitoring DataBase (iMonDB), which allows to automatically extract, store, and manage the instrument parameters from raw-data objects into a highly efficient database structure. This enables us to monitor the instrument parameters over a considerable time period. Time course information about the instrument performance is necessary to define the normal range of operation and to detect anomalies that may correlate with instrument failure. The proposed tools foster an additional handle on quality control and are released as open source under the permissive Apache 2.0 license. The tools can be downloaded from https://bitbucket.org/proteinspector/imondb.

Keywords mass spectrometry, quality control, instrument monitoring, database, iMonDB, jMonDB

Introduction Over the past few years, awareness has risen that for mass-spectrometry-based proteomics methods to mature into everyday analytical and clinical practices, extensive quality assessment is mandatory. 1 As a result, this awareness has recently enabled a shift in focus toward a “quality by design” paradigm. 2 In order to assess the quality of a mass spectrometry experiment, several different quality control metrics have been defined 3–6 . These metrics can 2

ACS Paragon Plus Environment

Page 2 of 22

Page 3 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

be computed from the experimental spectral-derived data, which may optionally include the peptide identification results, and they capture the important operational characteristics of a mass spectrometer. However, besides these metrics based on the experimental data, there is a separate part of qualitative information available in the form of the mass spectrometer instrument parameters. Possible parameters are for example the status of the ion source, the vacuum, or a turbo pump, depending on the type of instrument. This information provides complementary insights into the operational characteristics of a mass spectrometer compared to the aforementioned quality control metrics that originate from the mass spectral data. Metrics based on the experimental data can capture a wide range of possible problems during a mass spectrometry experiment; however, if these metrics are seen to deviate from normal or previously observed values, this behavior still has to be related to the malfunctioning of a particular element in the proteomics workflow or defect in the mass spectrometry setup. On the other hand, the instrument parameters directly indicate which part of the instrument is outside its normal range of operation, and this information can subsequently be related to the interpretation of the experimental data. As such, metrics based on the experimental data and the instrument parameters are offering complementary information: both are able to indicate shortcomings of the mass instrumentation, yet they operate on different layers of information. An important advantage of monitoring instrument parameters is that emerging problems can be spotted in time and remedial measures can be suggested, for example, to replace the inert gas in the collision cell. When looking at statistics based on the mass spectral data alone, an emerging problem could remain undetected while the system deteriorates further, since most instruments are cleverly constructed to compensate for the malfunctioning of a component in the mass spectrometry process, for example, by increasing the ion target for fragmentation. Usually instrument parameters have been largely ignored for the analysis of mass spectral data and quality assessment. The main reason is that instrument parameters are recorded in

3

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

a vendor-specific raw data format, but they are not retained after conversion to an open file format, such as the mzML format. 7 Although the instrument parameters can be read from a raw file using vendor software, this functionality is often cumbersome and rarely advertised prominently. Therefore, Scheltema and Mann 8 developed an approach to monitor the status of the peripheral equipment, such as the liquid chromatography and the electrospray devices, based on a limited number of instrument parameters, with the aim of taking automated real-time action if a failure situation was detected. To our knowledge, extensive longitudinal monitoring of the mass spectrometer normal range of operation to discover typical and atypical instrument behavior has not been conducted due to a lack of access to sensory data and therefore this important source of quality information remains largely unexploited. In this paper we present an approach to automatically extract the instrument sensor information from raw data files, along with a highly efficient database structure to store this information. Currently this functionality is limited to the extraction of instrument parameters from Thermo Scientific raw files, but a considerable part of the technical workflow is kept generalized, with the expansion toward other vendors in mind. Using specialized processing and visualization tools, historical instrument parameters can be monitored for a large number of experiments over time in order to confirm and detect instrument failure, fostering additional methods of quality control.

Experimental section The core of the instrument monitoring functionality consists of a database that is optimized to store the vast amount of instrument parameters. Figure 1 gives an overview of the different steps required to set up the Instrument MONitoring DataBase (iMonDB) and how the database interfaces with external components. This central data storage enables the integration of several software tools that interact with the database, such as the iMonDB Collector, a Java application to set up and populate the database in an automated fashion.

4

ACS Paragon Plus Environment

Page 4 of 22

Page 5 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Each step in this workflow will be discussed in more detail in the next subsection.

Instrument monitoring database The central database, called iMonDB, is a relational database used to store historical mass spectrometer instrument parameters. Care has been taken to provide a general database structure that can accommodate a wide range of instrument types and different parameter settings. In addition, the database is optimized to perform the most common queries in a computationally efficient manner. The entity-relationship diagram of the iMonDB is shown in Figure 2 and contains four types of information pertaining to the monitoring and interpreting of instrument parameters. First, the main node of information is an individual instrument of a certain model. By making use of a controlled vocabulary (CV), such as the PSI-MS controlled vocabulary, 9 an instrument can be unambiguously cataloged, while at the same time the database structure supports all instrument models that have been formally defined. Next, associated with a particular instrument are all the runs (or a specific subset thereof) that were performed on it. A run represents a sample that was analyzed on the mass spectrometer, and is represented by a single output file containing the raw measurement values. Various metadata can be associated with a run, detailing specific characteristics of the way a run was carried out, such as, for example, the sample type and content, the column features, etc. The metadata cannot be automatically retrieved, but should be provided manually by the user in a key-value format, and it can originate, for example, from an electronic lab notebook. Metadata can subsequently be used to retrieve data that adheres to a specific characteristic, for example, to visualize instrument performance on only specific standard runs. Third, the raw file associated with a run contains the instrument parameters that were in effect during the execution of the experiment. The parameters that are recorded over the course of the experiment obviously depend on the model type of the instrument and 5

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

are explicitly specified by a property, which denominates a single parameter, and is defined in a controlled vocabulary. Most mass spectrometry experiments have a time dimension, for example, the column retention time when combined with liquid chromatography. Therefore, a particular instrument parameter has multiple values that are associated with this time dimension. In fact, every scan that records a mass spectrum can be associated with an instrument parameter analogue. As the main application of the iMonDB is to monitor instrument behavior over a long time interval (i.e. years) and not to monitor instrument stability over the period of a single LC-MS run (i.e. hours), summary statistics are computed for the repeated time course observations instead of storing each observation individually in the database. The advantage of this operation is that the data volume for quality control is significantly reduced, while the summary statistics provide sufficient information about the operational characteristics of the instrument. The parameter values of a single experiment are represented by their mean and median value, the first/third quartile, the minimum/maximum value, and the standard deviation. Finally, another type of information that is related to the operation of a mass spectrometer are external events that may occur. For example, machine calibrations, periodic maintenance events, or even unexpected incidents that an operator likes to report, such as, for example, if an unusual sound is produced by a turbo pump. Unlike the instrument parameters, this type of information cannot be automatically retrieved and instead it has to be manually provided by the user. However, this information is vital when interpreting the evolution of the instrument parameters over time. The previous sources of information are stored in a database structure that is both general and expressive. On the one hand it does not impose any restrictions on the mass spectrometer instrument manufacturer or even the specific instrument model type, as it is conceived as a flexible framework such that new instrument parameters can be added easily. On the other hand, the iMonDB is a rigid framework as instrument parameters can be unambiguously defined by making use of specialized controlled vocabularies. By optionally combining the

6

ACS Paragon Plus Environment

Page 6 of 22

Page 7 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

instrument parameters with a more advanced laboratory information management system (LIMS), such as, for example, colims, 10 all information relevant to a mass spectrometry experiment can be structurally stored. Combined with external events, this data integration would enable highly advanced interpretations of the instrument parameters, for example, when contrasted to the protein coverage, identification scores, etc.

Software implementations In order to set up and populate the iMonDB several Java tools have been developed. First, there is a Java application programming interface (API) geared toward developers. Based on this API there are processing and visualization tools available. All tools are released as open source and are available from the project website (https://bitbucket.org/ proteinspector/imondb/). API The jMonDB Java API features a high-level API for use by bioinformaticians and developers. The API provides all required functionality to extract instrument parameters from experimental raw files, and to store and retrieve this extracted data from the iMonDB. This API allows other developers to easily provide full iMonDB support in their own applications or to rapidly prototype powerful new tools. The extraction of the instrument parameters from experimental raw files is based on functionality provided by the ProteoWizard library. 11 Making use of the abstraction layer ProteoWizard provides for experimental raw files, lightweight extraction tools have been implemented. Currently this functionality is limited to the extraction of instrument parameters from Thermo Scientific raw files, but the functionality to extract instrument parameters for other vendors is planned. Table 1 lists all instrument models that are presently supported, however, other Thermo Scientific instruments might be supported implicitly, and explicit support for additional instruments is added continuously. 7

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Specifically, two sources of instrument parameters are extracted from the raw files: the tune method information and the status log information. The tune method contains several configurable values for numerous instrument parameters, such as, for example, the flow of the sheath gas and the auxiliary gas for an electrospray ionization (ESI) probe, the voltage and temperature of the ESI capillary, the voltage of the multipole and various different lenses, etc. Obviously, the set values of these tune parameters depend on the application of the experiment. Furthermore, the status log contains information on all the sensors in the mass spectrometer, for which the values are recorded at the same time intervals as the mass spectral data. The status log includes the actual values of some parameters specified in the tune method, as well as additional instrument parameters. For example, Figure 3 shows the temperature of the ESI capillary in degrees Celsius, as configured in the tune method and as measured over the course of the experiments in the status log. The measured temperature roughly equals the set value in the tune method, however, it slightly deviates during the course of the experiments, as evidenced by the quartile values and the extreme values, which are used to summarize the repeated sensor readouts. Additional status log values include, for example, the pressure of the vacuum; the speed, power, and temperature of several turbo pumps; the flow rate of the syringe pump; etc. To interact with the iMonDB, either to store extracted instrument parameters or to retrieve previously stored data, the jMonDB API makes use of the Java Persistence API (JPA), 12 and specifically the Hibernate implementation, 13 to convert between records in the database and Java objects. This allows the user to work with high-level Java objects, instead of having to manage the various tables in the database and other technicalities. Retrieving or storing of instrument parameters in the iMonDB is implemented by executing the corresponding JPA persistence methods, while more advanced queries can easily be performed using the Java Persistence Query Language (JPQL). 14

8

ACS Paragon Plus Environment

Page 8 of 22

Page 9 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Data collection The iMonDB Collector is a computer program that can be scheduled to keep the information in the iMonDB up to date. Experimental raw files for which the instrument parameters are not present in the iMonDB yet are gathered, after which these parameters are extracted from the raw files and stored in the database. The Collector can be run on a daily or weekly basis, and is fully configurable. For example, based on a lab’s particular file naming scheme, specific metadata can be extracted from the file names and directory structures by making use of user-specified regular expressions. Extensive documentation for the full capabilities of the data collector are available on the project website. Data visualization The iMonDB Viewer forms a basic GUI application to visualize instrument parameters retrieved from the iMonDB. Each instrument parameter can be visualized over time, as is shown in Figure 4. For each experiment the median value is shown in black, while the dark gray band depicts the first quartile and the third quartile, and the light gray band depicts the minimum and maximum values. This type of graph visualizes several elements of the summary statistics in an intuitive manner. Additionally, it is possible to select only the experiments that are associated with specific metadata to visualize only a subset of the experiments that are associated with a particular instrument. Next, the vertical lines in Figure 4 illustrate the external events submitted by the operator. It should be noticed that there are four different colors that represent predefined event categories. A user is able to add comments on instrument calibration, maintenance and service, downtime and errors, and an undefined category to store information on miscellaneous events, such as, for example, operator intuition. The causal effects of these events can be visually related to the trends in the instrument parameters. Obviously, these events cannot be extracted from the experimental raw files, therefore the iMonDB Viewer offers a reporting tool to add these events to the database and visualize them, as is shown in Figure 5. Several 9

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

event types can occur at a specific date, and can be augmented with customizable information on the observed problem, such as service reports or links to electronic lab notebooks, and the actions that have been undertaken to solve the problem. This simple tool allows for a structured recording of all the events that occurred on an instrument and yield an invaluable source of information for the downstream interpretation and analysis of the instrument parameters. Furthermore, the events are stored in the centralized iMonDB, which forms a single point of updated information, while users can easily retrieve this information on their personal computer by connecting to the iMonDB.

Case study Instrument failure detection A case study is presented to illustrate how the instrument parameters can be employed to detect/predict instrument failure. For this case study the application of the iMonDB was restricted to incorporate only the LC-MS runs originating from standard quality control samples, and more particularly BSA samples. The advantage of such QC runs is that they are measured on a daily basis and that they easily allow us to assess the operation of a mass spectrometer because of their low sample complexity and controlled sample content. It should be noticed that in general the computation of quality control metrics is not restricted to a specific sample type, but a homogeneous sample list may be required when interpreting some of the instrument characteristics. If LC-MS runs vary over time due to a change in the organism or a change in the wet lab protocol then it is difficult to link detected anomalies to a particular instrument artifact. The graph displayed by the iMonDB Viewer in Figure 4 shows the FT turbo pump 4 of a Thermo Scientific Orbitrap Velos on standard BSA LC-MS runs. From the graph, it can be noticed that the power consumption started to increase at the beginning of June 2012. The reason for this abnormal incline is that the turbo pump was deteriorating. In this case, intelligent electronics tried to increase the power consumption to remain fully functional. 10

ACS Paragon Plus Environment

Page 10 of 22

Page 11 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

However, the increased power consumption was insufficient to continuously achieve the required speed and overcome the friction, after which the instrument finally broke down (down time: red line). After replacement of the turbo pump, the power consumption returned to its original value, and correct operation of the instrument could be resumed. Note that the instrument operator observed a high-pitched noise before the turbo pump finally broke down and reported this in the lab notebook (undefined event: yellow line). While this is a clear-cut example of instrument malfunctioning, which eventually will lead to an intervention, by consistently monitoring the instrument parameters, this malfunctioning could have been predicted as the increase in power consumption serves as a proxy for turbo pump retrogression. Hence, by closely monitoring the instrument parameters and defining a normal range of operation, a suspected malfunctioning can be diagnosed early on and be reported. Next, based on this diagnosis, a timely intervention can prevent a highly undesirable loss of precious sample, analysis time, and effort. qcML export Recently the qcML format 15 has been proposed as a standard data format for mass spectrometry quality control information. Even though the primary purpose of the iMonDB is to provide longitudinal data storage, the qcML format can be used to easily interchange instrument parameters. By making use of the jMonDB API and the equivalent jqcML API, 16 it is straightforward to export data from the iMonDB to the XML-based qcML format. Supplementary Figure 1 contains the full Java code that is required to store a specific run, extracted from the iMonDB, in a qcML file. Optionally the iMonDB can be forgone as well, and the instrument parameters can be extracted from an experimental raw file and directly stored in a qcML file by making use of the appropriate API functions.

11

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Conclusions Although quality control has recently been rightly identified as a vital element of a mass spectrometry experiment, instrument parameters have been largely ignored. In contrast, these parameters are an invaluable source of quality control information: they are often directly related to the performance of the mass spectrometer and are able to identify possible problems in an early stage. In this paper, we have presented the instrument monitoring database (iMonDB) and associated tools to easily store instrument parameters in the database and visualize information from the database. Furthermore, an open-source Java API is provided to facilitate the handling of instrument parameters by other developers. Currently this API is only able to extract instrument parameters from Thermo Scientific experimental raw files, other vendors are not (yet) supported. Nevertheless, the iMonDB database structure is kept general but expressive, and allows for integration with other mass spectrometer vendors as well. The highlighted case study has illustrated the importance and potential benefits that can be gained by carefully monitoring the instrument parameters. Tracking this information can help to assure a consistent and high-level quality of the experimental results. Furthermore, instrument downtime and sample loss can be avoided by scheduling a targeted maintenance when specific parts of the instrument are outside their normal range of operation, but have not completely broken down yet. Monitoring instrument parameters has a big potential to contribute to the stability of mass spectrometry experiments, which is of vital importance for the development of robust proteome analysis workflows. Furthermore, for industry, the iMonDB can be supportive to manage and automate data provenance. As future work, we intent to augment the sensor information with the traditional quality metrics that are derived from mass spectral data itself. Both sources of quality control data can provide complementary views on the performance of a mass spectrometry experiment. A combined analysis of these data sources could lead to stronger differentiators on the 12

ACS Paragon Plus Environment

Page 12 of 22

Page 13 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

instrument performance. Furthermore, the current visualization tool only shows historical data in a univariate manner, however, possibly more advanced knowledge can be gleaned by looking at the data in a multivariate fashion 17 or by making use of advanced data mining techniques. The iMonDB, its accompanying Java API, and various tools are freely available and are released as open source under the permissive Apache 2.0 license. The binaries, source code, and extensive documentation can be accessed at the project website at https://bitbucket. org/proteinspector/imondb.

Acknowledgement We would like to thank Geert Baggerman, An Staes, and Michiel Akeroyd for sharing their extensive knowledge on the various aspects of the instrument operation, the insightful discussions of the instrument settings, and testing the software in their labs at the Center for Proteomics (Antwerp), the Kris Gevaert Lab (Ghent), and DSM (Delft). This work was supported by SBO grant “InSPECtor” (120025) of the Flemish agency for Innovation by Science and Technology (IWT). L.M. also acknowledges support from the PRIME-XS project, grant agreement number 262067, funded by the European Union 7th Framework Program.

Supporting Information Available A Java code snippet to export instrument parameters retrieved from the iMonDB to the qcML XML-based interchange format.

This material is available free of charge via the

Internet at http://pubs.acs.org/.

13

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References (1) Martens, L. Bringing proteomics into the clinic: The need for the field to finally take itself seriously. PROTEOMICS - Clinical Applications 2013, 7, 388–391. (2) Tabb, D. L. Quality assessment for clinical proteomics. Clinical Biochemistry 2013, 46, 411–420. (3) Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 2010, 9, 225–241. (4) Pichler, P.; Mazanek, M.; Dusberger, F.; Weilnb¨ock, L.; Huber, C. G.; Stingl, C.; Luider, T. M.; Straube, W. L.; K¨ocher, T.; Mechtler, K. SIMPATIQCO: A serverbased software suite which facilitates monitoring the time course of LC-MS performance metrics on Orbitrap instruments. Journal of Proteome Research 2012, 11, 5540–5547. (5) Taylor, R. M.; Dance, J.; Taylor, R. J.; Prince, J. T. Metriculator: quality assessment for mass spectrometry-based proteomics. Bioinformatics 2013, 29, 2948–2949. (6) Bereman, M. S.; Johnson, R.; Bollinger, J.; Boss, Y.; Shulman, N.; MacLean, B.; Hoofnagle, A. N.; MacCoss, M. J. Implementation of statistical process control for proteomic experiments via LC MS/MS. Journal of The American Society for Mass Spectrometry 2014, 25, 581–587. (7) Martens, L. et al. mzML—a community standard for mass spectrometry data. Molecular & Cellular Proteomics 2011, 10, R110.000133–R110.000133. (8) Scheltema, R. A.; Mann, M. SprayQc: A real-time LC-MS/MS quality monitoring system to maximize uptime using off the shelf components. Journal of Proteome Research 2012, 11, 3458–3466.

14

ACS Paragon Plus Environment

Page 14 of 22

Page 15 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

(9) Mayer, G. et al. The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary. Database 2013, 2013, bat009–bat009. (10) colims - CompOmics LIMS system. https://code.google.com/p/colims/, Accessed: 03/11/2014. (11) Chambers, M. C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nature Biotechnology 2012, 30, 918–920. (12) Java

Persistence

API.

https://jcp.org/en/jsr/detail?id=317,

Accessed:

03/11/2014. (13) Hibernate. http://hibernate.org/, Accessed: 03/11/2014. (14) Enterprise JavaBeans 3.0. https://jcp.org/en/jsr/detail?id=220,

Accessed:

03/11/2014. (15) Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics 2014, 13, 1905–1913. (16) Bittremieux, W.; Kelchtermans, P.; Valkenborg, D.; Martens, L.; Laukens, K. jqcML: An open-source Java API for mass spectrometry quality control data in the qcML format. Journal of Proteome Research 2014, 13, 3484–3487. (17) Wang, X.; Chambers, M. C.; Vega-Montoto, L. J.; Bunk, D. M.; Stein, S. E.; Tabb, D. L. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 2014, 86, 2497–2509.

15

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 16 of 22

Page 17 of 22

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 18 of 22

Page 19 of 22

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 20 of 22

Page 21 of 22

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Table 1: List of explicitly supported instruments, along with their accession number in the PSI-MS controlled vocabulary. 9 Instrument name LTQ LTQ Orbitrap LTQ Orbitrap XL LTQ Velos TSQ Vantage LTQ Orbitrap Velos Q-Exactive Orbitrap Fusion

PSI-MS CV accession number MS:1000447 MS:1000449 MS:1000556 MS:1000855 MS:1001510 MS:1001742 MS:1001911 MS:1002416

21

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 22 of 22