Quantitative Evaluation of Algorithms for Isotopic Envelope Extraction


Quantitative Evaluation of Algorithms for Isotopic Envelope Extraction via Extracted Ion Chromatogram Clustering

Mathew Gutierrez, Kyle Handy, and Rob Smith

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00451 • Publication Date (Web): 28 Sep 2018

Downloaded from http://pubs.acs.org on October 1, 2018


Journal of Proteome Research

Mathew Gutierrez1, Kyle Handy1, Rob Smith1*

1 Department of Computer Science, University of Montana, Missoula, MT, USA

[email protected]

Abstract

LC-MS precursor (MS1) data is used increasingly often in conjunction with MS/MS data for quantification, validation, and other computational mass spectrometry tasks. The efficacy of MS1 data in downstream tasks depends on the coverage and accuracy of the MS1 isotopic envelope extraction algorithms that delineate envelopes from the dense backgrounds common in complex samples. Although several algorithms for extracted ion chromatogram (XIC) clustering exist, their performance has not yet been quantified, in part due to the difficulty of obtaining, isolating, and running some algorithms, and in part due to the lack of quantitative MS1 ground truth. Using a newly available manually annotated ground truth data set, we measure the performance of several popular XIC clustering algorithms in terms of time, coverage, and accuracy of the resulting isotopic envelopes. We intend this work to provide a benchmark against which future algorithms can be scored.

Key Words: Mass Spectrometry, Quantitative, analysis, XICs, clustering, Envelopes, Features, parameters, performance, machine learning.

Introduction

Mass spectrometry (MS) is a common chemical analysis technique that scientists and researchers use to elucidate chemical composition by separating, ionizing, and detecting the constituent parts of a sample. MS has applications in any field where knowing the composition of a sample is important. Instrumental analysis generates data consisting of points in 3D space, each with a retention time (RT), mass-to-charge ratio (m/z), and abundance. Points appear in 3D clusters called isotopic envelopes (see Figure 1, right), whose locations in m/z and RT correspond to the mass and column elution time, respectively, of a given compound (or group of compounds in the event of isomers, two or more different molecules with nearly or exactly the same mass). Isotopic envelopes are in turn composed of extracted ion chromatograms (XICs), also known as traces (see Figure 1, left), which occur as a result of mass differences between isotopic variants of the compound’s atomic formula.

Typical analysis workflows consist of intermittent (Data Dependent Acquisition) or constant (Data Independent Acquisition) MS/MS events,4 which generate fragmentation spectra (see Figure 2) that can be used to identify the compounds manifesting at the precursor m/z and RT of the event. MS1 feature detection can provide useful information that can increase the sensitivity, coverage, and accuracy of MS/MS analysis. Applications include using MS1 feature abundance for quantification, MS1 relative ratios for validating MS/MS identifications, enumerating non-isomer co-fragmenting compounds, and assessing charge state.

Figure 1: Traces (left) are signals created by isotopes of a molecule that collectively form what is known as an isotopic envelope (right).

There are two principal approaches for processing raw mass spectrometry data into isotopic envelopes: (1) isotope pattern searching and (2) two-stage segmentation.


Isotope pattern searching attempts to filter MS1 data using expected signal shapes. Most often, it is the wave-like characteristic of the isotopic envelope that is the target of the search. Isotope pattern searches first identify regions of interest (see Figure 3). This focus on a subset of the entire run is necessary due to the computational complexity of the extraction process, which in some forms requires iterative fitting of a 3D shape to the raw data. MS/MS-based isotopic envelope pattern detection is a special variant of this approach that uses MS/MS fragmentation spectra identifications to specify the isotopic pattern of the precursor envelope. Though this technique is faster and more specific than general pattern matching, it is constrained by the completeness of the MS/MS annotated features.

Figure 2: In typical MS analysis, MS/MS fragmentation spectra (left) are used to identify the type of molecule manifested in each MS1 envelope of interest.

Figure 3: Isotope pattern searching identifies regions of interest (red X’s, left) and then applies a pattern matching technique to delineate the bounds of each isotopic envelope. These methods tend to be parameter-sensitive and biased towards high abundance envelopes.

Two-stage segmentation takes an incremental approach to MS1 data annotation. Points are first clustered into XICs, and XICs are then clustered into isotopic envelopes (see Figure 1). There are many existing software packages that attempt to solve the problem of high-coverage automated MS segmentation, including OpenMS Feature Finder Centroided (FFC),7 SuperHIRN,8 MaxQuant,9 msInspect,10 moFF,11 and flashLFQ.12 OpenMS FFC, SuperHIRN, moFF, and flashLFQ employ isotope pattern searching, while MaxQuant and msInspect employ two-stage segmentation (XIC delineation and XIC clustering).

Several published methods were excluded from the study. SuperHIRN was excluded because it is no longer available. moFF was excluded because it uses experimentally derived MS/MS annotated features to quantify the raw output signals, making it impossible to evaluate independently of the quality of the MS/MS annotations. flashLFQ uses an experimental in-silico digest to do essentially the same. Proprietary options were also excluded, as access to the source code is a prerequisite for evaluating the algorithms; they may be included in further studies pending access to source code. Additionally, a focus of the study is an investigation into next-generation clustering techniques that use machine learning as a potential replacement for user parameter selection.

Methods

This section describes the algorithms that were evaluated, the metrics used to measure performance, the ground truth data set against which methods were scored, and details of the evaluation such as how parameter settings were handled.

Algorithms


OpenMS FFC (version 2.0.1) employs isotope pattern searching in order to recover isotopic envelopes. This process begins with a specially tailored peak picking wavelet function that is specifically designed to correlate with isotopic patterns contained in MS data,13 the basic building block of which is given by:

where Θ denotes the Heaviside function, mn is the mass of a neutron, μ represents the charge state, and λ is a low-rank polynomial describing the mean mass signal. This function is employed to reduce the noise and isolated signals that result from electronic and chemical aberrance in the mass spectrometry data, as the wave function will only fit isotopic envelopes that comprise multiple signals. Once the isotope wavelet step is complete, the file is considered “seeded”: peaks of interest have been identified by convolving the signal with the wavelet function. These seeds are then passed to a heuristic extension routine, in which the algorithm identifies pronounced regions around the seeds. A model of isotope profile and retention time is then iteratively fit over these data points, excluding low probability points. The abundance of the envelope is then given by the sum of all data points in its included regions.13

msInspect (1.0) accepts standard mzXML data input from any instrument with high enough resolution to detect isotopic XICs. msInspect is written in Java, with alignment and normalization routines written in the statistical programming language R.14 Co-eluting isotopes are grouped by the following process. First, the maximum abundance of an isotope with mass m and charge z is denoted by I(r), where r = m/z. The d-1 isotopes that have higher m/z values and lie within tolerance of (r+x)xz, where x = 1, …, (d-1), are identified as potential XICs. By default, d = 6; if fewer than six co-eluting isotopes are identified, the remaining ones are assigned the value of the background abundance. The observed isotopic distribution (OID) of the candidate peptide is formed by the following:13


MaxQuant’s (version 1.5.8.0) approach to determining isotopic envelopes is to model each XIC as a node in an undirected graph. An edge is inserted between two peaks when their difference in mass equals the difference in isotope mass of an average amino acid and when their retention time overlap is high enough to be significant. Edges are added with error margins that account for lack of knowledge of atomic composition. The resulting undirected XIC graph is referred to as “pre-isotopic” because it is not yet consistent with regard to charge state. MaxQuant’s algorithm then iteratively determines the longest consistent sub-graphs and identifies those as the most likely peptide features.10 The result is a much reduced data set of the most likely features.

Parameter Modulation

Parameter configuration is an important variable to measure in any quantitative evaluation. The effect of user modulation of parameters was measured by iterating through feasible parameter settings. OpenMS FFC and MaxQuant both have parameters that can be modified by the end user, while msInspect has only the default settings and was therefore excluded from this portion of the study. Programmatic parameter modulation was implemented for OpenMS FFC and for MaxQuant. Each parameter of both algorithms was given a range from its minimum viable value to double its default value. For MaxQuant, all possible parameter permutations were tested. For OpenMS FFC, the full set of permutations is intractable, so eighty permutations were selected at random within the viable range to sample performance.
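The graph construction described for MaxQuant earlier in this section can be illustrated with a small sketch. This is not MaxQuant's implementation: the `Xic` record, the isotope spacing constant, the m/z tolerance, the overlap threshold, and the candidate charge states are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

# Assumed average mass difference between adjacent isotopes (Da).
ISOTOPE_SPACING = 1.00287


@dataclass(frozen=True)
class Xic:
    mz: float
    rt_start: float
    rt_end: float


def rt_overlap(a, b):
    """Fraction of the shorter XIC's RT span that both XICs share."""
    shared = min(a.rt_end, b.rt_end) - max(a.rt_start, b.rt_start)
    shorter = min(a.rt_end - a.rt_start, b.rt_end - b.rt_start)
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def build_edges(xics, charges=(1, 2, 3, 4), mz_tol=0.01, min_overlap=0.5):
    """Return (i, j, charge) edges for XIC pairs spaced like adjacent isotopes."""
    edges = []
    for i, j in combinations(range(len(xics)), 2):
        if rt_overlap(xics[i], xics[j]) < min_overlap:
            continue  # not co-eluting strongly enough
        gap = abs(xics[i].mz - xics[j].mz)
        for z in charges:
            if abs(gap - ISOTOPE_SPACING / z) <= mz_tol:
                edges.append((i, j, z))
    return edges


xics = [
    Xic(500.0000, 10.0, 12.0),
    Xic(500.5014, 10.1, 12.1),  # one isotope step away at charge 2
    Xic(501.0029, 10.2, 11.9),  # two isotope steps away at charge 2
    Xic(500.3000, 30.0, 31.0),  # elutes much later, unrelated
]
edges = build_edges(xics)
```

In this toy input the first and third XICs are also linked by a charge 1 edge, since their spacing matches one isotope step at charge 1. That is the kind of charge inconsistency the text refers to as “pre-isotopic”; selecting charge-consistent sub-graphs would be a separate step that this sketch does not attempt.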

Metrics

One measure of the goodness of an XIC clustering algorithm is the percent of correctly clustered XICs:

Determining correctly clustered XICs for msInspect and MaxQuant is relatively straightforward: each XIC “votes” on which envelope it belongs to. If the majority of the XICs in a resultant envelope of an algorithm are assigned to the correct ground truth envelope, the XICs are considered correctly clustered.
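The voting rule above can be sketched as follows. This is an illustrative reading rather than the evaluation code used in the study: it assumes a strict majority is required, that only the XICs agreeing with the winning ground truth envelope are counted as correct, and that the percentage is taken over all ground truth XICs. The function and variable names are hypothetical.

```python
from collections import Counter


def correctly_clustered(result_envelopes, truth_of):
    """Return the set of XIC ids counted as correctly clustered.

    result_envelopes: list of lists of XIC ids produced by an algorithm.
    truth_of: dict mapping XIC id -> ground truth envelope id.
    """
    correct = set()
    for envelope in result_envelopes:
        # Each XIC votes for the ground truth envelope it belongs to.
        votes = Counter(truth_of[x] for x in envelope if x in truth_of)
        if not votes:
            continue
        winner, count = votes.most_common(1)[0]
        # Assumption: a strict majority of the envelope must agree.
        if count * 2 > len(envelope):
            correct.update(x for x in envelope if truth_of.get(x) == winner)
    return correct


def percent_correct(result_envelopes, truth_of):
    """Percent of ground truth XICs that were correctly clustered."""
    if not truth_of:
        return 0.0
    return 100.0 * len(correctly_clustered(result_envelopes, truth_of)) / len(truth_of)


truth_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
result = [["a", "b", "d"], ["c", "e"], ["f"]]
score = percent_correct(result, truth_of)
```

Here the first resultant envelope has a two-to-one majority for ground truth envelope 1, the second is tied and so contributes nothing, and the third is a single correct XIC.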


Figure 4: Resultant envelopes (left) shown incorrectly clustered (above) by failing to coincide by majority with the ground truth envelopes (right). Correctly clustered traces (left, below) shown in majority agreement with the ground truth cluster.

As OpenMS FFC employs a single-step annotation process, its output is a featureXML file containing both resultant envelopes and resultant XICs. The XICs must then be matched to the ground truth XICs as determined by the following match metric.

The match metric favors XICs that show high overlap in the RT dimension and nearness in the m/z dimension. If there is contention over an XIC, the contest is resolved in favor of the XIC with the higher match quality, and the conceding XIC is considered incorrectly clustered. Once the resultant XICs have been matched to ground truth XICs, the evaluation can continue as above by comparing which XICs are correctly clustered.

The performance of each experiment is recorded and, where appropriate (MaxQuant, OpenMS FFC), the user parameters are modified programmatically and the experiment is run again. Time to process is measured for each algorithm. Each analysis was run on a desktop with a Linux 4.4 kernel and 32 GB of RAM, without any other processes running.
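The matching of resultant XICs to ground truth XICs described above might look like the sketch below. The scoring formula is an assumption, not the paper's match metric: it multiplies the RT intersection-over-union by a linear m/z closeness term with a hypothetical 10 ppm tolerance. Contention is resolved as the text describes, with the lower-scoring XIC left unmatched.

```python
def match_quality(res, truth, ppm_tol=10.0):
    """Hypothetical score in [0, 1]: RT overlap (IoU) scaled by m/z closeness."""
    shared = min(res["rt_end"], truth["rt_end"]) - max(res["rt_start"], truth["rt_start"])
    union = max(res["rt_end"], truth["rt_end"]) - min(res["rt_start"], truth["rt_start"])
    if shared <= 0 or union <= 0:
        return 0.0
    ppm = abs(res["mz"] - truth["mz"]) / truth["mz"] * 1e6
    return (shared / union) * max(0.0, 1.0 - ppm / ppm_tol)


def match_xics(results, truths):
    """Map result index -> ground truth index; contested matches go to the best score."""
    best_for_truth = {}  # truth index -> (score, result index)
    for i, res in enumerate(results):
        scores = [(match_quality(res, t), j) for j, t in enumerate(truths)]
        score, j = max(scores, default=(0.0, None))
        if score <= 0:
            continue  # no usable candidate for this result XIC
        if j not in best_for_truth or score > best_for_truth[j][0]:
            best_for_truth[j] = (score, i)  # the previous holder concedes
    return {i: j for j, (score, i) in best_for_truth.items()}


truths = [
    {"mz": 500.0, "rt_start": 10.0, "rt_end": 12.0},
    {"mz": 600.0, "rt_start": 20.0, "rt_end": 22.0},
]
results = [
    {"mz": 500.001, "rt_start": 10.0, "rt_end": 12.0},  # close match to truth 0
    {"mz": 500.004, "rt_start": 10.5, "rt_end": 12.0},  # weaker rival for truth 0
    {"mz": 600.0, "rt_start": 20.0, "rt_end": 22.0},    # exact match to truth 1
]
matched = match_xics(results, truths)
```

In this example the second result XIC loses the contest for the first ground truth XIC and stays unmatched, so it would be counted as incorrectly clustered.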

Ground Truth Data

One of the great barriers to the quantitative evaluation of automated annotation techniques is the lack of a ground truth data set against which to judge performance. To this end, a complete manually annotated UPS2 dataset has been created to provide a benchmark by which to score the performance of existing XIC clustering solutions. Manual annotation is one technique for securing ground truth data, but it is prohibitively time intensive. Using novel software designed to make this process more tractable (Annotator, Prime Labs, Inc.), over 1,000 hours were spent manually annotating over 11,000 isotopic envelopes in a centroided run of the Sigma peptide standard UPS2. Though the annotation is certainly not perfect, it is considered ground truth for the purposes of the experiment, as higher quality annotation is not currently feasible through other means.

The ground truth dataset has points grouped into XICs and XICs clustered into envelopes. This two-part structure lends itself well to testing just the XIC clustering stage of the chosen annotation solutions. The entire XIC set is provided to the clustering packages of msInspect and MaxQuant. However, as FFC employs a single-step approach, it must be given the unlabeled dataset. The output of each algorithm contains the resultant set of envelopes, which are then compared to the ground truth envelopes.

Results

The output of the clustering algorithms was collected by running command line versions of each algorithm and recording the results. MaxQuant and OpenMS FFC were run iteratively to observe the effect of modulating user parameters. MaxQuant’s XIC clustering module has two integer and three continuous parameters; FFC has twelve integer and ten continuous parameters. Integer parameters were tested on a range from zero to twice the default value. Continuous parameters were tested at five values, likewise spanning from zero to twice the default value. The resulting set for MaxQuant contains approximately 3500 configurations, all of which were tested. The resulting set for FFC is intractable, and therefore 80 configurations were selected randomly from the parameter space surrounding the default settings as a representative set.

Once all the clustering algorithms were completed, the mean performance of each was collected by averaging the clustering accuracy across all runs. The optimal performance and the performance at default settings were also noted (see Figures 5 and 6). The effect of user parameter modulation on clustering accuracy was analyzed, and each permutation was binned with other permutations that performed similarly. The effect of parameter modulation can be seen in Figure 7.

Each algorithm required a different amount of time to conduct the experiments. MsInspect performed the same analysis faster than MaxQuant, while both vastly outperformed OpenMS FFC with regard to time (see Figure 8).

Figure 5: Illustration of performance of each clustering algorithm for all intensities. This figure illustrates the challenge associated with choosing appropriate parameters, as many options produce suboptimal results.

Figure 6: Illustration of optimal performance of each clustering algorithm for all intensities using optimal settings recovered from the set of 3500 parameter configurations for MaxQuant and 80 parameter configurations for FFC. MsInspect is parameter free. Default parameters are denoted by a red line. Average parameter performance is denoted by a green line.
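The two parameter search strategies described in this section, an exhaustive grid where it is tractable and random sampling where it is not, can be sketched as below. The parameter names and default values are hypothetical placeholders, not the actual MaxQuant or OpenMS FFC settings, so the grid size here does not reproduce the counts reported above.

```python
import itertools
import random


def integer_range(default):
    """All integers from zero to twice the default, inclusive."""
    return list(range(0, 2 * default + 1))


def continuous_range(default, steps=5):
    """`steps` evenly spaced values from zero to twice the default."""
    return [2 * default * k / (steps - 1) for k in range(steps)]


def full_grid(ranges):
    """Every combination of values; feasible only for a handful of parameters."""
    names = list(ranges)
    return [dict(zip(names, combo)) for combo in itertools.product(*ranges.values())]


def random_sample(ranges, n, seed=0):
    """Draw n random configurations when the full grid is intractable."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in ranges.items()}
            for _ in range(n)]


# Hypothetical parameter names and defaults, for illustration only.
ranges = {
    "int_a": integer_range(2),        # 5 values
    "int_b": integer_range(3),        # 7 values
    "cont_a": continuous_range(0.5),  # 5 values
    "cont_b": continuous_range(1.0),  # 5 values
    "cont_c": continuous_range(4.0),  # 5 values
}
grid = full_grid(ranges)            # 5 * 7 * 5 * 5 * 5 configurations
sample = random_sample(ranges, 80)  # fixed-size sample of the same space
```

With two integer and three continuous parameters the grid stays in the low thousands, which is why exhaustive testing is practical. With twenty-two parameters the same product grows far beyond what can be run, which motivates the fixed-size random sample.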


Figure 7: The effect of suboptimal parameter performance on envelope clustering accuracy. The two algorithms with user-defined parameters are shown as solid continuous lines (blue and red), where each datum is the count of user-chosen parameter configurations at a given clustering accuracy across binned envelope intensities. MsInspect does not have user-selectable parameters and as such can only be displayed at one point (pink dashed line).


Figure 8: Time across all experiments (note log scale); from left to right: OpenMS FFC, MaxQuant, msInspect. Note that OpenMS FFC only output meaningful data in the five experiments that took substantially more time.

Discussion

Key distinctions between segmentation and isotope pattern searching include that segmentation considers all data in the set and mitigates the computational complexity of pattern matching by dividing the task into XIC delineation and the clustering of XICs into isotopic envelopes. Two-stage segmentation is less reliant on user input and leverages a larger portion of the available information in the mass spectrometry output than isotope pattern searching. Not surprisingly, OpenMS FFC, an isotope pattern matching algorithm, was outperformed by MaxQuant and msInspect, both of which are two-stage segmentation approaches.

With the increasing demand for high sensitivity and high fidelity mass spectrometry analysis techniques, it is important to understand the capacity of existing solutions. Across the algorithms tested, we observed a clear trend: high abundance XICs are more often correctly assigned to the true isotopic envelope than are low abundance XICs. High abundance XICs are also more sensitive to deviation from default settings than are low abundance XICs. The result that high abundance signals are more often correctly clustered is not surprising. However, the finding that modulation of user-controlled parameters has a disproportionately strong effect on high abundance XICs is unexpected and informative, and the mechanics by which this effect occurs are worth exploring further. Overall, the clustering algorithms tested featured an average envelope clustering accuracy of approximately 50% and an optimal accuracy of 70%. MaxQuant featured the highest accuracy, but only under specific parameter configurations that are difficult to attain on real data, where the ground truth clustering is unknown.

The conclusion that can be drawn from parameter modulation is that optimal settings for envelope clustering accuracy are difficult to find, and sub-optimal settings can yield significantly poorer results. Typically, at least some users seem to trust that default settings are a good approximation for parameter searches, which are impossible without ground truth. However, for this dataset, the average parameter configuration for MaxQuant outperformed the default configuration. The opposite is true of OpenMS FFC, where the vast majority of parameter configurations resulted in invalid experimental results. These experiments show that, for this dataset at least, the default parameters of MaxQuant are poorly assigned, and that modulating parameters to optimize the experiment requires knowledge of the dataset and an in-depth understanding of the algorithms in order to match need with the ability of the program.

It is clear that for one-step pattern matching, the lack of a data reduction step punishes the performance of an algorithm with regard to time, with both MaxQuant and msInspect performing more than an order of magnitude faster. It is also clear that most user parameter settings result in sub-optimal performance for the evaluated algorithms on the given data set. This study shows that existing algorithms cannot be tuned without (1) knowing the answer to the problem at hand a priori and (2) spending a great deal of time evaluating the many equally likely parameter permutations. Even after optimization, performance varies wildly within a run. This suggests that future algorithms ought to be not only flexible enough to apply broadly to the variety of signals across a run, but also sophisticated enough to tune their own parameters without ground truth.
Constructing a complex algorithm with many tunable parameters in an attempt to increase the flexibility of the algorithm across multiple data sets, or even among features in the same file seems to be a popular approach. While it enables an algorithm to approach optimal envelope clustering accuracy with sufficient tuning, it also results in a brittle algorithm that is highly sensitive to improper parameter modulation and comes with a high expectation of user mastery.

Conclusions

Our effort to quantify the performance of existing XIC clustering solutions through the creation of and evaluation against ground truth data yielded significant insights into the current state of the art of mass spectrometry analysis. It is clear that there is room for improvement in automated annotation of MS1 data, particularly with regard to low abundance XICs. Next-generation clustering algorithms will need increased sensitivity to lower abundance XICs. Further, the generally negative effect of user-adjusted parameters on performance suggests that an ideal algorithm would function independently of user-modifiable parameters. Because expecting the ground truth necessary to optimize parameters for each experiment is highly impractical, we strongly suggest that any new algorithm for envelope clustering either be parameterless or use an automated (e.g., machine learning) approach to optimize envelope clustering accuracy on a data set by data set basis.

Funding sources: This research was supported by National Science Foundation Grant 366208 to R.S.

Supporting Information: ffc_parameter_search.py (documenting the parameter permutations for OpenMS FFC), maxquant_parameter_search.cs (documenting the parameter permutations for MaxQuant), and FeatureXMLEvaluator.java (metrics code for evaluation).

References:

1. Mann, M. Functional and quantitative proteomics using SILAC, Nature Reviews Molecular Cell Biology 2006, 7, 952–958.

2. Wiese, S.; Reidegeld, K. A.; Meyer, H. E.; Warscheid, B. Protein labeling by iTRAQ: a new tool for quantitative mass spectrometry in proteome research, Proteomics 2007, 7, 340–350.

3. Horvatovich, P.; Mischoff, R. Current technological challenges in biomarker discovery and validation, European Journal of Mass Spectrometry 2009, 16, 101.

4. Nesvizhskii, A. I.; Vitek, O.; Aebersold, R. Analysis and validation of proteomic data generated by tandem mass spectrometry, Nature Methods 2007, 4, 787–797.

5. Michalski, A.; Cox, J.; Mann, M. More than 100,000 Detectable Peptide Species Elute in Single Shotgun Proteomics Runs but the Majority is Inaccessible to Data-Dependent LC-MS/MS, Journal of Proteome Research 2011, 10, 1785–1793.

6. Röst, H. L.; Schmitt, U.; Aebersold, R.; Malmström, L. Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry, PLoS ONE 2015, 10, e0125108.

7. Bertsch, A.; Gröpl, C.; Reinert, K.; Kohlbacher, O. Data Mining in Proteomics: From Standards to Applications; Springer Press: Germany, Berlin, 2011; 353–367.

8. Mueller, L. N.; Rinner, O.; Schmidt, A.; Letarte, S.; Bodenmiller, B.; Brusniak, M.-Y.; Vitek, O.; Aebersold, R.; Müller, M. SuperHirn – a novel tool for high resolution LC-MS-based peptide/protein profiling, Proteomics 2007, 7, 3470–3480.

9. Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification, Nature Biotechnology 2008, 26, 1367–1372.

10. Bellew M.; Coram M.; Fitzgibbon M.; Igra M.; Randolph T.; Wang P.; May D.; Eng J.; Fang R.; Lin C. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS, Bioinformatics 2006, 22 (15), 1902–1909.

11. Andrea A.; Ludger G.; Kenneth V.; Niels H.; An S.; Lieven C.; Lennart M. moFF: a robust and automated approach to extract peptide ion intensities, Nature Methods 2016, 13, 964.

12. Millikin R.; Solntsev S.; Shortreed M.; Smith L. Ultrafast Peptide Label-Free Quantification with FlashLFQ, Journal of Proteome Research 2018, 17 (1), 386–391.

13. R. Hussong, A. Hildebr, The isotope wavelet: a signal theoretic framework for analyzing mass spectrometry data, in: Eleventh Annual International Conference on Research in Computational Molecular Biology, 2008.

14. Sturm M.; Kohlbacher O. TOPPView: An Open-Source Viewer for Mass Spectrometry Data, Journal of Proteome Research 2009, 8 (7), 3760–3763.
