
Technical Note

rawDiag - an R package supporting rational LC-MS method optimization for bottom-up proteomics
Christian Trachsel, Christian Panse, Tobias Kockmann, Witold E. Wolski, Jonas Grossmann, and Ralph Schlapbach
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00173 • Publication Date (Web): 06 Jul 2018
Downloaded from http://pubs.acs.org on July 7, 2018


Journal of Proteome Research is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

rawDiag – an R package supporting rational LC-MS method optimization for bottom-up proteomics
Christian Trachsel¹, Christian Panse¹,∗, Tobias Kockmann¹, Witold E. Wolski¹, Jonas Grossmann¹, and Ralph Schlapbach¹
∗ correspondence
1. Functional Genomics Center Zurich, Swiss Federal Institute of Technology Zurich | University of Zurich, Winterthurerstr. 190, CH-8057 Zurich, SWITZERLAND
E-mail: [email protected]
Phone: +41 44 63 53912

Abstract

Optimizing methods for liquid chromatography coupled to mass spectrometry (LC-MS) is a non-trivial task. Here we present rawDiag, a software tool supporting rational method optimization by providing MS operator-tailored diagnostic plots of scan-level metadata. rawDiag is implemented as an R package and can be executed on the R command line or, for less experienced users, through a graphical user interface (GUI). The code is platform independent and, as we show in our benchmark, can process a hundred raw files in less than three minutes on current consumer hardware. To demonstrate the functionality of our package, we include a real-world example taken from our daily core facility business.


ACS Paragon Plus Environment


Keywords mass-spectrometry, R-package, multi-platform, method-optimization, reproducible research, quality-control, visualization

1 Introduction

Over the last decade, liquid chromatography coupled with mass spectrometry (LC-MS) has evolved into the method of choice in the field of proteomics. 1–3 During a typical bottom-up LC-MS measurement, a complex mixture of analytes is separated by a liquid chromatography system coupled to a mass spectrometer (MS) through an ion source interface. This interface converts the analytes, which elute from the chromatography system over time, into a beam of ions. From this ion beam, the MS records a series of mass spectra containing detailed information on the analyzed sample. 4,5 The resulting raw data consists of the mass spectra and their metadata, typically recorded in a vendor-specific binary format. During a measurement, the mass spectrometer applies internal heuristics that enable the instrument to adapt to sample properties, e.g., sample complexity or the amount of ions, in near real time. Still, the method parameters controlling these heuristics need to be set before the measurement. Optimal measurement results require a careful balancing of instrument parameters, but their complex interactions make LC-MS method optimization a challenging task.

Here we present rawDiag, a platform-independent software tool implemented in R that supports LC-MS operators during the process of empirical method optimization. Our work builds on the ideas of the discontinued software rawMeat (Vast Scientific). Our application is currently tailored towards spectral data acquired on Thermo Fisher Scientific instruments (raw format), with a particular focus on Orbitrap mass analyzers (Exactive or Fusion instruments). These instruments are heavily used in the field of bottom-up proteomics to analyze complex peptide mixtures derived from enzymatic digests of proteomes.


rawDiag is meant to run after MS acquisition, ideally as an interactive R shiny application, and produces a series of diagnostic plots visualizing the impact of method parameter choices on the acquired data across injections. If static reports are required, PDF files can be generated using R markdown. In this manuscript, we present the architecture and implementation of our tool. We provide example plots, show how plots can be redesigned to meet different requirements, and discuss the application based on a typical use case.

[Figure: architecture diagram. Recoverable labels: Input (mass spectrometry raw data, filename); Import Layer (New Thermo Scientific RawFileReader library, read.raw(…); msConvert, mzR; other vendor libraries); Processing Layer (data.frame with columns such as scan number and basepeak; as.rawDiag(…)); Output Visualization (PlotCycleLoad(…), PlotCycleTime(…), PlotLockMassCorrection(…); standard graphic device; Rmarkdown, PDF; shiny application; exploration, mining, debugging, analytics).]

[R console panel of the diagram, truncated in extraction: R> library(rawDiag) R> MSM MSM.summary gp]

R> library(rawDiag)
R> data(WU163763)
R> PlotMassDistribution(WU163763)
R> PlotMassDistribution(WU163763, method = 'overlay') +
+    theme(legend.position = 'none')
R> PlotMassDistribution(WU163763, method = 'violin') +
+    theme(legend.position = 'none')


Interactive visualization is achieved by embedding the plot functions into an R shiny application. Static versions of the plots can be generated with the provided R markdown file, which produces PDF reports.

[Figure 2 panels. A) counts versus neutral mass [Da], one histogram per raw file (04_S174020, 05_S174020, 07_S174020, 08_S174020, 09_S174020, 11_S174020, 12_S174020, 13_S174020, 16_S174020); B) density versus neutral mass [Da]; C) neutral mass [Da] versus charge state.]

Figure 2: Concurrent metadata visualization applying PlotMassDistribution to nine raw files acquired in DDA mode (sample was 1 µg HeLa digest). The data are available on MassIVE ftp://massive.ucsd.edu/MSV000082389/ or as a data set of the rawDiag package as WU163763. A) method trellis; the mass distribution is plotted as a histogram and the color code represents the charge states of the precursors. B) method overlay; the mass distribution is graphed as a density function and the color code represents the different raw files. C) method violin; the mass distribution is displayed as a violin plot and the colors indicate the different raw files.

2.5 MS data acquisition and database search

LC-MS data was recorded on a Q Exactive HF-X (Thermo Fisher Scientific) operated in line with an Acquity M-Class (Waters) UPLC. In short, peptides were loaded onto a nanoEase M/Z Symmetry C18 100 Å, 5 µm, 180 µm x 20 mm trap column (Waters, part # 186008821) and separated by running a piecewise linear gradient from 5% to 24% B in 50 min and 24% to 36% B in 10 min over the nanoEase M/Z C18 T3 Col 100 Å, 1.8 µm, 75 µm x 250 mm analytical column (Waters, part # 186008818) at a flow rate of 300 nL/min (buffer A: water incl. 0.1% formic acid; buffer B: acetonitrile incl. 0.1% formic acid). Eluting peptides were ionized by electrospray ionization on a Digital PicoView 550 (New Objective) nano source equipped with silica emitters (New Objective, part # FS36020-10-N-20-C12 DOM). Data-dependent analysis (DDA) was conducted by recording MS1 spectra at 60k resolution over the scan range of 350 to 1400 m/z. MS2 scans were acquired at 7500 resolution (AGC target: 1e5, maxIT: 11 ms) for the most abundant (topN: 18, 36 or 72) precursor signals using an isolation window of 1.3 Da and an NCE of 28 for peptide fragmentation. Dynamic exclusion was set to 10 s. Ions having a charge below 2 or above 6, as well as isotopes, were excluded from further analysis.

Protein and peptide identification results were generated by Proteome Discoverer version 2.2 using the following search settings: data was searched against the human UniProt reference proteome using SEQUEST HT (tryptic enzyme specificity including 2 missed cleavages; precursor mass tolerance: 20 ppm; fragment mass tolerance: 0.5 Da; static modification: carbamidomethyl; dynamic modifications: methionine oxidation, N-terminal protein acetylation). Resulting identifications were filtered to 1% peptide FDR using Percolator.

Table 2: The rawDiag cheatsheet lists the functions of the package using a subset of the provided ‘WU163763’ dataset. Each thumbnail gives an impression of the plot function’s result. The column names ‘trellis’, ‘overlay’ and ‘violin’ correspond to the values of the method argument.

[Thumbnail columns (Trellis, Overlay, Violin) not reproduced.]

Function name: Description
PlotChargeState: Shows the charge state distributions as bar charts with absolute counts.
PlotCycleLoad: Shows the number of MS2 scans per MS1 scan as a function of retention time (RT) (scatter plots) or its density (violin).
PlotCycleTime: Graphs the time difference between two consecutive MS1 scans (cycle time) with respect to RT (scatter plots) or its density (violin). A smooth curve graphs the trend. The 95th percentile is indicated by a red dashed line.
PlotInjectionTime: Displays the injection time as a function of RT. A smooth curve graphs the trend. The maximum is indicated by a red dashed line.
PlotLockMassCorrection: Graphs the lock mass deviations along RT (note: this example data was acquired without lock mass correction).
PlotMassDistribution: Shows the mass distribution with respect to charge state (trellis) or filenames (overlay, violin).
PlotMassHeatmap: Draws a hexagon-binned heatmap of the charge-deconvoluted mass along RT.
PlotMzDistribution: Scatter plot of m/z versus RT on MS1 level (no density; with overplotting). The violin plot displays the m/z density in each file.
PlotPrecursorHeatmap: Like PlotMassHeatmap, but displaying convoluted data (actual m/z values).
PlotScanFrequency: Graphs scan frequency versus RT, or the scan frequency density for violin.
PlotScanTime: Plots scan time as a function of RT for each MSn level. A smooth curve displays the trend. A solid red line indicates the transient time of the Orbitrap analyzer.
PlotTicBasepeak: Displays the total ion chromatogram (TIC) and the base peak chromatogram.

3 Results and discussion

3.1 Data loading and performance

Our application rawDiag acts as an interface to file reader libraries from MS vendors. These libraries can access the scan data, as well as the scan metadata, stored in the proprietary file formats. In its current configuration, rawDiag can read data from Thermo Fisher Scientific raw files via a C# executable. This executable extracts the information stored in the raw file via the platform-independent RawFileReader .NET assembly, which makes the application fast when reading directly from the raw file. The data integrity is checked by the is.rawDiag function and, if required, the data is coerced by the as.rawDiag function into the proper format for the plot functions.
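The check-then-coerce step can be pictured with a small base R sketch. The column names, the `mapping` argument and the helper names `is_scan_table` and `as_scan_table` are hypothetical and only illustrate the idea; they are not the rawDiag implementation of is.rawDiag and as.rawDiag.

```r
# Hypothetical set of required columns; the real rawDiag schema may differ.
required_columns <- c("filename", "scan_number", "ms_level", "rt")

# Integrity check: a data.frame that carries all required columns.
is_scan_table <- function(x, required = required_columns) {
  is.data.frame(x) && all(required %in% names(x))
}

# Coercion: rename source columns according to `mapping`
# (names = source column names, values = target column names), then re-check.
as_scan_table <- function(x, mapping, required = required_columns) {
  stopifnot(is.data.frame(x), is.character(mapping), !is.null(names(mapping)))
  hits <- names(x) %in% names(mapping)
  names(x)[hits] <- unname(mapping[names(x)[hits]])
  if (!is_scan_table(x, required)) {
    stop("missing columns: ",
         paste(setdiff(required, names(x)), collapse = ", "))
  }
  x
}

# Toy input using made-up reader column names.
raw <- data.frame(File = "a.raw", ScanNumber = 1:3,
                  MSOrder = c(1, 2, 2), StartTime = c(0.1, 0.2, 0.3))
tab <- as_scan_table(raw, c(File = "filename", ScanNumber = "scan_number",
                            MSOrder = "ms_level", StartTime = "rt"))
```

Keeping the check separate from the coercion means plot functions can validate their input cheaply and only pay for the renaming when a reader delivers a different layout.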

[Figure 3 panels. A) overall runtime [sec] versus number of used processes; B) processing frequency [Hz] versus number of used processes. Legend: method (mzR, RawFileReader); platforms: Apple, Linux.]

Figure 3: Import layer benchmark. Panel A shows the overall runtime, on a logarithmic scale, required to process 128 raw files. The magenta curve displays the runtime as a function of the number of CPUs used when reading directly from the raw files, and the blue curve indicates the performance when reading from mzML files. Panel B depicts the corresponding I/O throughput in scans extracted per second. Again, the CPU-dependent throughput for processing raw files is shown in magenta and the throughput for reading mzML files in blue. The plots illustrate that both systems, server and laptop, can analyze 95 GB of instrument data in less than three minutes when reading directly from the raw files. Please note that we skipped the benchmark on the Apple platform due to the impractically long runtime and the resulting heating issues.

For fast method optimization, the processing time between the end of the MS acquisition and the resulting visualizations should be short. To test whether our software is fast enough for this task, we performed a benchmark analysis. Panel A of Figure 3 depicts how the overall runtime depends on the number of CPUs used. From it we derived the processing frequency, in spectra loaded per second, shown in panel B of Figure 3. The processing speed on both tested systems is high when reading directly from the raw file. The performance drops by at least an order of magnitude when using the mzML files as input. We skipped the benchmark on the Apple platform due to the impractically long runtime and the resulting heating issues. Nevertheless, for processing a small number of files (the typical case during method optimization), the processing speed of both systems, reading from either file format, is high enough to provide the diagnostic visualizations in a short time. Processing of a single file containing ≈ 800 000 spectra finishes in less than 50 s.
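The processing frequency used in the benchmark is simply the number of spectra divided by the wall-clock runtime. The following base R sketch shows that arithmetic for the single-file figures quoted above; the function names and the `time_reader` helper are illustrative assumptions and not part of rawDiag.

```r
# Spectra per second from a spectrum count and a wall-clock runtime in seconds.
processing_frequency <- function(n_spectra, runtime_sec) {
  stopifnot(is.numeric(n_spectra), is.numeric(runtime_sec),
            all(runtime_sec > 0))
  n_spectra / runtime_sec
}

# Hypothetical helper: time any reader function applied to a vector of files.
time_reader <- function(reader, files) {
  elapsed <- system.time(results <- lapply(files, reader))[["elapsed"]]
  list(results = results, elapsed_sec = elapsed)
}

# About 800 000 spectra in under 50 s corresponds to at least this frequency.
single_file_hz <- processing_frequency(8e5, 50)
single_file_hz  # 16000
```

Because the quoted runtime is an upper bound, the 16 kHz obtained here is a lower bound on the throughput for that file.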

3.2 Visualizations

As soon as the data is extracted and loaded into the R session, the different plot functions (see Table 2) can be called on the data to visualize LC-MS run characteristics. The generated diagnostic plots help the MS operator to draw conclusions about the chosen MS method parameters on a rational basis. Based on these conclusions, a hypothesis for the method optimization can be formulated, and data from the optimized methods can be visualized to check whether the method adaptation led in the right direction. A use case example of this process is discussed in the following section. Because a single visualization technique can lose its usability in some situations, most plot functions are implemented in three different versions to circumvent overplotting issues or to help detect trends across multiple files. Table 2 lists the currently implemented plot functions, and Figure 2 gives an example of the flexibility of choosing different visualization styles.

In the interactive mode, the application runs as an R shiny server and generates the same plots as in the command line usage but does not require profound R knowledge from the user. All implemented plots can be inspected in different tabs in the shiny instance, and radio buttons allow switching between the different plot styles. The size of the plots can easily be adjusted to the screen size, and an additional table is generated that summarizes all the loaded data at a single glance.
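As an illustration of the kind of quantity these plots are built on, the following base R sketch derives the cycle time (the time difference between consecutive MS1 scans) and its 95th percentile from a scan table. The data frame, its column names `ms_level` and `rt_sec`, and the function name are assumptions made for this example; the sketch does not reproduce the internals of PlotCycleTime.

```r
# Cycle time: time difference between two consecutive MS1 scans.
cycle_time <- function(scans) {
  stopifnot(is.data.frame(scans),
            all(c("ms_level", "rt_sec") %in% names(scans)))
  ms1 <- scans[scans$ms_level == 1, , drop = FALSE]
  ms1 <- ms1[order(ms1$rt_sec), , drop = FALSE]
  data.frame(rt_sec = ms1$rt_sec[-1],
             cycle_time_sec = diff(ms1$rt_sec))
}

# Toy scan table: three MS1 scans with MS2 scans in between.
scans <- data.frame(
  ms_level = c(1, 2, 2, 1, 2, 2, 2, 1),
  rt_sec   = c(0.0, 0.3, 0.5, 1.0, 1.2, 1.4, 1.6, 2.5)
)

ct  <- cycle_time(scans)   # cycle times of 1.0 s and 1.5 s
q95 <- quantile(ct$cycle_time_sec, probs = 0.95, names = FALSE)
```

A per-file value such as `q95` is what a reference line in a cycle time plot would mark; plotting `cycle_time_sec` against `rt_sec` gives the scatter view.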

3.3 Use case example

Starting from an initial method template described by Kelstrup et al., 13 we analyzed 1 µg of a commercial tryptic HeLa digest on a Q Exactive HF-X instrument using classical shotgun heuristics. Subsequently, the resulting raw data was mined using rawDiag. Inspection of the cycle time plot (not shown) suggested that analysis time was distributed suboptimally between the different scan levels (precursor and fragment ions) under the applied chromatographic conditions. To test this hypothesis, we ramped the parameter controlling the number of dependent scans per instrument cycle (TopN) in two steps and tested the two resulting methods by analyzing the same material in technical triplicates. Visualization with rawDiag confirmed that all three methods exploit the maximum number of dependent scans (18, 36 and 72) during the separation phase of the gradient (see Figure 4B). Concurrently, the MS2 scan speed increased from ≈30 Hz to ≈36 Hz and ≈38 Hz, respectively (see Figure 4A). In the modified methods, the instrument spends 5 and 10 min more time on MS2 scans during the main peptide elution phase than in the initial “Top18” method (see Figure 4E). The optimized methods not only showed better run characteristics but ultimately also resulted in more peptide and protein identifications, as shown in Figure 4C and 4D. Database searches were performed with Proteome Discoverer using the search settings described in the experimental procedures section.
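Whether a method exhausts its TopN setting can be checked by counting the MS2 scans that follow each MS1 scan. The base R sketch below does this for a toy vector of MS levels; the function name, the input vector and the `top_n` value are assumptions for illustration and do not reflect how PlotCycleLoad is implemented.

```r
# Number of MS2 scans in each instrument cycle (a cycle starts at an MS1 scan).
cycle_load <- function(ms_level) {
  cycle <- cumsum(ms_level == 1)            # cycle index grows at every MS1
  in_cycle <- ms_level == 2 & cycle > 0     # ignore MS2 before the first MS1
  tabulate(cycle[in_cycle], nbins = max(cycle, 0))
}

# Toy acquisition order: MS1, 2x MS2, MS1, 3x MS2, MS1.
ms_level <- c(1, 2, 2, 1, 2, 2, 2, 1)
ms2_per_cycle <- cycle_load(ms_level)       # 2 3 0

# Fraction of cycles that reach a hypothetical TopN setting of 3.
top_n <- 3
fraction_at_top_n <- mean(ms2_per_cycle >= top_n)
```

A fraction close to one during the separation phase indicates that the instrument consistently uses every dependent scan the method allows, as observed for the three methods above.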


[Figure 4 panels. A) frequency [Hz] versus time [min]; B) number of MS2 per cycle [counts] versus time [min]; C) counts of # proteins and # peptides [div by 10] versus TopN (18, 36, 72); D) counts of # PSM and # MS2 scans versus TopN (18, 36, 72); E) time [min] versus TopN for each replicate (18, 18, 18, 36, 36, 36, 72, 72, 72).]

Figure 4: A) Moving average of the scan speed of triplicate measurements of “Top18” (blue), “Top36” (green) and “Top72” (orange). B) Number of MS2 scans for each scan cycle for “Top18” (blue), “Top36” (green) and “Top72” (orange). C) Number of Proteins (orange) and peptides (blue) for the different TopN settings (note: number of peptides is divided by 10 in this plot due to scaling reasons). D) Number of PSM (blue) and MS2 scans (orange) for the different TopN settings. E) Time spent on MS1 (blue) and MS2 (orange) for the different TopN settings. Time range for calculation is the elution phase of the peptides between 15-70 min.

Interestingly, the number of confident peptide-spectrum matches (PSM) reaches a plateau in the “Top72” method compared with the “Top36” method (“Top36” has a slightly higher number of significant PSM than “Top72”). One potential explanation for this finding is the quality of the recorded spectra. Since the “Top72” method samples the precursors in our sample so deeply, the injection time available to accumulate the ion packages for many low-abundance species might be too short. A low number of ions for fragmentation can reduce the quality of the resulting fragment spectrum and hence lead to an unassigned spectrum. Since “Top72” still acquires many more MS2 spectra than the “Top36” method, this is not yet seen in the number of significant peptide and protein hits (more spectra in general lead to more peptide and protein assignments), but it indicates a suboptimal usage of the instrument. This finding could lead directly to a further refinement cycle of the method: reducing the number of MS2 scans while increasing the injection time parameter could further improve the method. Care should be taken to keep the cycle time of the methods constant. This new hypothesis could be tested by acquiring data with the newly designed methods and comparing them with the “Top72” method using rawDiag.

3.4 Related work

To our knowledge, the only alternative tool able to extract and visualize metadata from raw files at single-scan granularity is rawMeat (Vast Scientific). Unfortunately, it was discontinued years ago and was built on the outdated MSFileReader libraries from Thermo Fisher Scientific (MS Windows only). This means that it does not fully support the latest generation of qOrbi instruments. Other loosely related tools 16–19 are tailored towards longitudinal data recording and serve the purpose of quality control 20 (monitoring of instrument performance) rather than method optimization.

4 Conclusion

In this manuscript, we present rawDiag, an R package to visualize characteristics of LC-MS measurements. Through its diagnostic plots, rawDiag supports scientists during empirical method optimization by providing a rational basis for choosing appropriate data acquisition parameters. The software is fast, interactive and easy to operate through an R shiny GUI application, even for users without prior R knowledge. More advanced users can fully customize the appearance of the visualizations by executing code from the R command line. Therefore, the integration of rawDiag into more complex environments, e.g., data analysis pipelines embedded into LIMS systems, is possible. Currently, the software can directly read from the Thermo Fisher Scientific raw file format, but its architecture allows for adaptation towards other MS data formats. An exciting showcase would be the novel Bruker tdf 2.0 format (introduced for the timsTOF Pro), which stores scan data in an SQLite database directly accessible to R. Future extensions of rawDiag could access additional metadata not stored in the raw files, such as scan metadata generated by database search engines, e.g., peptide sequences or identification scores. Such additional data would allow the visualization of assignment rates and score distributions across injections. By linking primary and derived metadata, the nature of rawDiag would change from a simple diagnostic tool to an application that allows big data analysis similar to MassIVE (https://massive.ucsd.edu, March 2018) but with the possibility of bypassing the currently necessary conversion to open data formats like mzML.

5 Availability

The package vignette, 21 the R package itself, and a Dockerfile that builds the entire architecture from scratch are accessible through a git repository at https://github.com/fgcz/rawDiag. An automatic build of the provided docker recipe is available at https://hub.docker.com/r/cpanse/rawdiag/. The data used in section 3.3 are available on MassIVE (ftp://massive.ucsd.edu/MSV000082389/), through the lab information management system bfabric 22 (https://fgcz-bfabric.uzh.ch, Sample ID 174020), or as a data set of the package as WU163763. A demo system including all data shown in this manuscript is available through http://fgcz-ms-shiny.uzh.ch:8080/rawDiag-demo/.

6 Funding Sources

The work has been supported by ETH Zurich and University of Zurich.


7 Conflict of interest

The authors declare no competing financial interest.

Acknowledgement

The authors thank Jim Shofstahl for his support regarding the New Thermo Fisher RawFileReader library. We thank Sven Brehmer from Bruker Daltonics for the discussions of the timsTOF file format. We thank Lilly van de Venn for the package logo design. Two anonymous reviewers and editor Susan T. Weintraub have made numerous suggestions and recommendations leading to a much improved presentation of the work. We thank Jay Tracy and our colleagues at the Functional Genomics Center Zurich for proofreading our manuscript, and the Swiss Federal Institute of Technology Zurich and the University of Zurich for the support of our work.

References

(1) Cox, J.; Mann, M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 2011, 80, 273–299.
(2) Mallick, P.; Kuster, B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 2010, 28, 695–709.
(3) Bensimon, A.; Heck, A. J.; Aebersold, R. Mass spectrometry-based proteomics and network biology. Annu. Rev. Biochem. 2012, 81, 379–405.
(4) Savaryn, J. P.; Toby, T. K.; Kelleher, N. L. A researcher’s guide to mass spectrometry-based proteomics. Proteomics 2016, 16, 2435–2443.
(5) Matthiesen, R.; Bunkenborg, J. In Mass Spectrometry Data Analysis in Proteomics. Methods in Molecular Biology (Methods and Protocols); Matthiesen, R., Ed.; Humana Press, Totowa, NJ, 2013.
(6) Wickham, H. Tidy Data. Journal of Statistical Software 2014, 59, DOI: 10.18637/jss.v059.i10.
(7) Shofstahl, J. New RawFileReader from Thermo Fisher Scientific. 2018; http://planetorbitrap.com/rawfilereader#.WvWESK3QPmE, version 4.0.22.
(8) Wilkinson, L. The Grammar of Graphics (Statistics and Computing); Springer-Verlag New York, Inc.: Secaucus, NJ, USA, 2005; DOI: 10.1007/0-387-28695-0.
(9) R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2017.
(10) Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer-Verlag New York, 2009; DOI: 10.1007/978-0-387-98141-3.
(11) Fischer, B.; Neumann, S. mzR. 2017; https://doi.org/10.18129/b9.bioc.mzr.
(12) Neumann, S.; Gatto, L. msdata. 2017; https://doi.org/10.18129/b9.bioc.msdata.
(13) Kelstrup, C. D.; Bekker-Jensen, D. B.; Arrey, T. N.; Hogrebe, A.; Harder, A.; Olsen, J. V. Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics. J. Proteome Res. 2018, 17, 727–738, DOI: 10.1021/acs.jproteome.7b00602.
(14) Cleveland, W. S. Visualizing Data, 1st ed.; Hobart Press, Summit, New Jersey, U.S.A, 1993.
(15) Sarkar, D. Lattice: Multivariate Data Visualization with R; Springer: New York, 2008; DOI: 10.1007/978-0-387-75969-2.


(16) Bittremieux, W.; Tabb, D. L.; Impens, F.; Staes, A.; Timmerman, E.; Martens, L.; Laukens, K. Quality control in mass spectrometry-based proteomics. Mass Spectrom Rev 2017,
(17) Bittremieux, W.; Willems, H.; Kelchtermans, P.; Martens, L.; Laukens, K.; Valkenborg, D. iMonDB: Mass Spectrometry Quality Control through Instrument Monitoring. J. Proteome Res. 2015, 14, 2360–2366.
(18) Chiva, C.; Olivella, R.; Borras, E.; Espadas, G.; Pastor, O.; Sole, A.; Sabido, E. QCloud: A cloud-based quality control system for mass spectrometry-based proteomics laboratories. PLoS ONE 2018, 13, e0189209, DOI: 10.1371/journal.pone.0189209.
(19) Bereman, M. S.; Beri, J.; Sharma, V.; Nathe, C.; Eckels, J.; MacLean, B.; MacCoss, M. J. An Automated Pipeline to Monitor System Performance in Liquid Chromatography-Tandem Mass Spectrometry Proteomic Experiments. J. Proteome Res. 2016, 15, 4763–4769.
(20) Pichler, P.; Mazanek, M.; Dusberger, F.; Weilnböck, L.; Huber, C. G.; Stingl, C.; Luider, T. M.; Straube, W. L.; Köcher, T.; Mechtler, K. SIMPATIQCO: A Server-Based Software Suite Which Facilitates Monitoring the Time Course of LC–MS Performance Metrics on Orbitrap Instruments. J. Proteome Res. 2012, 11, 5540–5547, DOI: 10.1021/pr300163u.
(21) Trachsel, C.; Kockmann, T.; Panse, C. An Introduction to the rawDiag R Package. 2018; https://github.com/fgcz/rawDiag, version 0.0.3.
(22) Türker, C.; Akal, F.; Joho, D.; Panse, C.; Barkow-Oesterreicher, S.; Rehrauer, H.; Schlapbach, R. B-Fabric: The Swiss Army Knife for Life Sciences. Proceedings of the 13th International Conference on Extending Database Technology. New York, NY, USA, 2010; pp 717–720, DOI: 10.1145/1739041.1739135.
