
An adaptive pipeline to maximize isobaric tagging data in large-scale MS-based proteomics John Corthesy, Konstantinos Theofilatos, Seferina Mavroudi, Charlotte Macron, Ornella Cominetti, Mona Remlawi, Francesco Ferraro, Antonio Núñez Galindo, Martin Kussmann, Spiridon Likothanassis, and Loïc Dayon J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00110 • Publication Date (Web): 26 Apr 2018 Downloaded from http://pubs.acs.org on April 26, 2018



Journal of Proteome Research

An adaptive pipeline to maximize isobaric tagging data in large-scale MS-based proteomics

John Corthésy1,§, Konstantinos Theofilatos2,§, Seferina Mavroudi2,3, Charlotte Macron1, Ornella Cominetti1, Mona Remlawi1, Francesco Ferraro1, Antonio Núñez Galindo1, Martin Kussmann1,#, Spiridon Likothanassis2,4,*, and Loïc Dayon1,*

1 Nestlé Institute of Health Sciences, Lausanne, Switzerland
2 InSybio Ltd., London, United Kingdom
3 Department of Social Work, School of Sciences of Health and Care, Technological Educational Institute of Patras, Patras, Greece
4 Department of Computer Engineering and Informatics, University of Patras, Patras, Greece

#Current address: Liggins Institute, The University of Auckland, New Zealand
§These authors contributed equally to this study
*To whom correspondence should be addressed


Abstract

Isobaric tagging is the method of choice in Mass Spectrometry (MS)-based proteomics for comparing several conditions at a time. Despite its multiplexing capabilities, drawbacks appear when multiple experiments are merged for comparison in large sample-size studies, because of missing values that result from the stochastic nature of the Data-Dependent Acquisition (DDA) mode. Another, indirect cause of data incompleteness might derive from the typical proteomic data processing workflow, which first identifies proteins in individual experiments and then quantifies only those identified proteins, leaving a large number of unmatched spectra with quantitative information unexploited. Inspired by untargeted metabolomic and label-free proteomic workflows, we developed a quantification-driven bioinformatic pipeline (Quantify then Identify – QtI) that optimizes the processing of isobaric Tandem Mass Tag (TMT) data from large-scale studies. This pipeline includes innovative features, such as Peak Filtering with a self-adaptive pre-processing pipeline optimization method, Peptide Match Rescue (PMR), and Optimized Post-Translational Modification (OPTM). QtI outperforms a classical benchmark workflow in terms of quantification and identification rates, significantly reducing missing data while preserving unmatched features for quantitative comparison. The number of unexploited tandem mass spectra was reduced by 77% and 62% for two human cerebrospinal fluid (CSF) and plasma datasets, respectively.

Keywords : Algorithms; Bioinformatics; Biomarkers; Discovery; Isobaric tagging; Machine learning; Protein identification; Quantification; Tandem mass spectrometry; Tandem mass tag


Introduction

Mass Spectrometry (MS)-based shotgun proteomics can generate large datasets, composed of millions of tandem mass spectra, that are usually first matched by comparison with theoretical fragmentation spectra of defined proteolytic peptide sequences [1]. In many workflows, protein identification based on spectral matching is followed by quantification of the identified proteins under different biological conditions [2]. However, this broadly used data processing pipeline, hereafter called the “classical workflow”, can prove rather inefficient considering the large amount of unexploited spectral information. For instance, only a fraction of all acquired spectra, matched in independent experiments, is further used to provide complete quantitative information. The prevalence of the protein identification step can be predominantly attributed to historical reasons, because of the initial role of MS to identify proteins after gel electrophoresis [3]. Protein identification may have inappropriately remained the first processing task performed, to the detriment of quantification in many workflows, especially those employing isobaric labeling. The yield of spectral matching in MS-based proteomics is rather limited, and several reports have indicated typical success rates between 20 and 50% [4, 5], due, for instance, to sequence isoforms and variants not present in databases, the presence of unexpected and multiple modifications, or low tandem spectral information (e.g., low spectral representation or low peak intensity depending on the nature of the peptides but also the timing of their fragmentation during the chromatographic elution). These figures also apply when isobaric tagging technologies (e.g., isobaric Tags for Relative and Absolute Quantitation (iTRAQ) [6] and Tandem Mass Tag (TMT) [7]) are used for relative protein quantification.
Importantly, many unmatched tandem mass spectra from such labeling experiments harbor information of true peptides possibly assignable to peptide sequences [5], and contain quantitative information in the form of reporter-ions. Based on these observations, the order of processing steps should be reversed, i.e., quantification may become the first task, followed by spectral matching and peptide/protein identification as the second operation. This order is routinely followed in metabolomic studies, either nuclear magnetic resonance (NMR)- or MS-based, where unidentified features (also often called “known unknowns”) [8] are also preserved, for instance as biomarker candidates, and their identity potentially resolved afterward with complementary tools and experiments [9].

Data incompleteness due to missing values is a recognized limitation of current proteomic workflows based on stochastic Data-Dependent Acquisition (DDA) [10, 11]. This issue generally worsens as the number of analyzed samples increases. Additionally, performing protein identification at an early stage of data processing also contributes to the incompleteness of the data matrices. Several data processing workflows have been developed to alleviate this issue, but have been mainly applied to label-free proteomic approaches [12-14]. For instance, the “Match Between Runs” method of the MaxQuant computational platform [15] reduces missing values by aligning spectra among different runs based on retention times and precursor masses. Shotgun isobaric tagging data suffer from the same issues but require ad hoc treatments and solutions to maximize their output. Moreover, a large variety of preprocessing methods exists for denoising, normalization, peak detection and filtering [16-18], and it is common practice to use a standard method with a default set of parameters. These practices usually result in suboptimal or biased solutions that might affect all subsequent analyses.

Computational pipelines supporting isobaric tagging data analysis have been previously developed, such as the Trans-Proteomics Pipeline (TPP) [19], MSnbase [20], Iquant [21] and pyQuant [22]. All these pipelines are based on the classical workflow that discards unassigned
spectra despite the presence of reporter-ions; additionally, their analysis flow includes manually tuned parameters and steps, with increased risks of over-fitting and/or underperforming. To address such drawbacks, we have developed, and present herein, the “Quantify then Identify” (QtI) pipeline, which optimizes the processing of mass spectral data obtained with isobaric tags, such as TMT. A multi-objective evolutionary algorithm is used to select the optimal settings for each step of the data treatment and analysis, their optimal order and their fine-tuned parameters (see Supporting Information I), allowing for a more accurate and unbiased quantification analysis. Multi-objective optimization has been previously applied to proteomic data [23], but it is used here for the first time to optimize the preprocessing pipeline of every dataset with a data-driven adaptive approach that explores the search space and can approach the globally optimal solutions while minimizing the risk of getting trapped in local optima. Moreover, QtI focuses on the tandem MS (MS/MS) level (i.e., the tandem mass spectra containing both the TMT reporter-ions for quantitation and the peptide fragment-ions for identification) and performs quantification first and identification second. It preserves unidentified spectra in a high-quality consensus spectral list that can be later exploited at minimal computational cost. Different modules, such as the Peptide Match Rescue (PMR) and the Optimized Post-Translational Modification (OPTM), allow extracting more quantitative information from datasets than a classical workflow.
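The notion of choosing among competing preprocessing configurations can be illustrated with Pareto dominance, the comparison rule on which multi-objective evolutionary algorithms typically build. The sketch below is not the algorithm used in QtI (which is described in Supporting Information I): the configuration names and the two objective scores are hypothetical, and both objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True when score tuple `a` is no worse than `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of the non-dominated candidates.

    `candidates` maps a configuration name to a tuple of objective scores.
    """
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical preprocessing configurations scored on two made-up objectives.
configurations = {
    "config_A": (0.9, 0.2),
    "config_B": (0.5, 0.5),
    "config_C": (0.4, 0.4),  # worse than config_B on both objectives
    "config_D": (0.2, 0.9),
}
front = pareto_front(configurations)
```

In this toy case only `config_C` is discarded, because `config_B` is better on both objectives; the three remaining configurations represent different trade-offs among which an evolutionary search would keep exploring.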

Experimental Section

Experimental Design

The raw datasets used in this report were previously generated [24, 25]. A commercial pool of human cerebrospinal fluid (CSF) samples from healthy control donors, obtained from Analytical Biological Services (Wilmington, DE), was used. Pooled human plasma samples were obtained from the DiOGenes project (http://www.diogenes-eu.org/) [26], as previously reported [24]. The DiOGenes project was supported by a contract (FP6-2005-513946) from the European Commission Food Quality and Safety Priority of the Sixth Framework Program. Local sponsors (full list at www.diogenes-eu.org/sponsors/) made financial contributions to the shop centers, which also received a number of foods free of charge from food manufacturers. All samples were prepared using a highly automated proteomic workflow that includes spiking of the β-lactoglobulin (LACB) protein standard, abundant protein depletion, buffer exchange, reduction, alkylation, digestion, TMT sixplex labeling, sample pooling, and purification. Reversed-Phase Liquid Chromatography (RP-LC)-MS/MS was performed with hybrid Linear ion Trap-OrbiTrap (LTQ-OT) Elite instruments coupled to Ultimate 3000 RSLC nano systems (Thermo Scientific, San Jose, CA), as previously described [24, 25, 27]. The “CSF” [25] and “Plasma” [24] experiments both consist of 16 replicate TMT sixplex experiments measuring 96 identical CSF and plasma samples, respectively, analyzed in triplicate, resulting in a total of 48 raw files each.

Spectral Library Generation

Pools of the 96 previous CSF and plasma samples were prepared, and both were fractionated using off-gel electrophoresis [28]. The 24 fractions from each pooled sample were analyzed in duplicate with RP-LC-MS/MS using hybrid LTQ-OT Elite instruments coupled to Ultimate 3000 RSLC nano systems, as previously described [24-26]. The spectral library was then generated
from the raw files acquired and contained 54213 spectra, corresponding to 37393 peptide sequences. For more details about the spectral library generation, refer to Supporting Information II.

Data Availability

Part of the MS proteomic data was previously deposited to the ProteomeXchange Consortium [29] (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository [30] with the dataset identifier PXD003024. The complementary part of the data has been deposited to ProteomeXchange with identifiers PXD005206 (Username: [email protected], Password: KBdzJfA8) and PXD008029 (Username: [email protected] @ebi.ac.uk, Password: qfpERiud).

Classical Data Processing

Proteome Discoverer (version 1.4, Thermo Scientific) was used as the data processing interface. Identification was performed against the human UniProtKB/Swiss-Prot database (release 07/2016, 20206 proteins) including the LACB sequence. Mascot [31] (version 2.4.0, Matrix Science, London, UK) was used as the search engine. The variable amino acid modifications included in the search were oxidized methionine, deamidated asparagine/glutamine, and sixplex TMT-labeled peptide amino terminus. Sixplex TMT-labeled lysine and carbamidomethylation of cysteine were set as fixed modifications. Trypsin was selected as the proteolytic enzyme, with a maximum of two potential missed cleavages. Peptide and fragment ion tolerances were set to 10 ppm and 0.02 Da, respectively. All Mascot result files were loaded into Scaffold Q+S 4.2.1 (Proteome Software, Portland, OR) to be further searched with X! Tandem. Both peptide and protein False Discovery Rates (FDRs) were fixed at 1%, with a two-unique-peptide
criterion to report protein identification. Relative quantitative protein values were exported from Scaffold Q+S.

QtI Pipeline Data Processing

The overall pipeline consists of five analysis steps, as depicted in Figure 1. An initial conversion of the raw files to mzML format is performed using the ProteoWizard MSConvert tool [32], with parameters set to 32-bit precision, no zlib compression, HCD activation, MS levels 1 & 2 and zero sample filter levels 1 & 2. The first step, i.e., Step 1 or Peak Filtering, automatically generates a preprocessing pipeline and applies it to all tandem mass spectra of a dataset. Preprocessing analysis is performed using machine learning algorithms and includes many crucial steps (e.g., normalization, smoothing/denoising, and peak finding), mainly to remove systematic errors and noise from the MS/MS peak lists obtained from an experiment. The second step of the QtI workflow, i.e., Step 2 or Quantification & Merging, performs quantification at the tandem mass spectral level, filters out spectra with too many missing quantification values, and generates a unified spectral list. To create the unified spectral list, a spectral alignment is performed for all spectra of all experiments. This alignment is based on a retention time alignment (with a user-defined tolerance) followed by the calculation of a similarity score based on the peak lists of every pair of spectra. This similarity score extends the metric proposed by Stein and Scott [33] to improve calculation speed and performance (for the detailed equations, see Supporting Information III). A missing quantification value tolerance is set by the user (see Supporting Information III); in this study we retained only tandem mass spectra with at least 3 out of 6 TMT reporter-ions.
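The two checks described for Step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the QtI implementation: the function names, the bin width, and the use of a plain binned cosine similarity are all hypothetical stand-ins, and the actual score (an extension of the Stein and Scott metric) is given in Supporting Information III.

```python
import math
from collections import defaultdict


def has_enough_reporters(reporter_intensities, min_present=3):
    """Keep a spectrum only if at least `min_present` reporter-ion channels
    carry a signal (3 out of 6 in this study)."""
    present = sum(1 for v in reporter_intensities if v is not None and v > 0)
    return present >= min_present


def within_rt_tolerance(rt_a, rt_b, tolerance):
    """Retention-time pre-filter: only pairs within the user-defined
    tolerance are scored."""
    return abs(rt_a - rt_b) <= tolerance


def binned(peaks, bin_width):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return bins


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Plain cosine similarity of two binned peak lists (a stand-in for
    the score used by the authors, which is not reproduced here)."""
    a, b = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

A pair of spectra from two experiments would first be checked with `within_rt_tolerance` and scored with `cosine_similarity` only if it passes, which keeps the number of pairwise comparisons manageable.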
The third step, i.e., Step 3 or Identification, is used to identify peptides and proteins, using here a combination of Mascot and Scaffold software, with the same parameters used for the classical workflow to benchmark. Step 4 or PTM
Optimization, selects an optimized set of PTMs to be used for searching the unidentified quantified spectra. The fifth step, i.e., Step 5 or PMR & Reporting, includes the PMR module and the generation of reports. The PMR module optimizes the consensus spectral list to alleviate the problem of missing values. The method starts by processing each unmatched tandem mass spectrum included in the consensus list. The different experiments are parsed to locate a matched spectrum, i.e., a spectrum whose similarity score exceeds a pre-defined similarity value (the similarity scoring is based on a distance metric of the peaks in the tandem mass spectra; it does not consider the m/z value of the precursors but takes into account retention time information). If such a spectrum is found, its peak list is selected as the representative peak list for the unmatched spectrum in the consensus list. If more than one such spectrum exists in different experiments, the one with the highest similarity score is selected to update the consensus spectral list. Finally, a report is generated to summarize the results of the overall analysis. This report, based on the consensus spectral list, contains the quantification values, information from the Mascot searches, and protein identification and quantification information from Scaffold. A detailed description of the algorithms, the implementation and the parameters of the QtI pipeline is presented in Supporting Information III.
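The selection rule of the PMR module described above can be sketched as follows. The sketch makes several assumptions that are not taken from the pipeline itself: the `Spectrum` record, the `rescue_match` function, and the shared-peak fraction used as a similarity score are hypothetical, and retention time is handled here as a simple tolerance window.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Spectrum:
    experiment: str
    rt: float                               # retention time (minutes)
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity) pairs
    peptide: Optional[str] = None           # None when no match was assigned


def shared_peak_fraction(a, b, decimals=1):
    """Toy similarity: fraction of rounded m/z values shared by two spectra."""
    mz_a = {round(mz, decimals) for mz, _ in a.peaks}
    mz_b = {round(mz, decimals) for mz, _ in b.peaks}
    union = mz_a | mz_b
    return len(mz_a & mz_b) / len(union) if union else 0.0


def rescue_match(unmatched, candidates, threshold, rt_tolerance,
                 similarity=shared_peak_fraction):
    """Return the matched candidate with the highest similarity above
    `threshold`, or None when no candidate qualifies."""
    best, best_score = None, threshold
    for candidate in candidates:
        if candidate.peptide is None:
            continue  # only matched spectra can lend their identity
        if abs(candidate.rt - unmatched.rt) > rt_tolerance:
            continue  # outside the retention-time window
        score = similarity(unmatched, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

When several experiments contain a qualifying spectrum, the loop keeps only the best-scoring one, which mirrors the rule that the spectrum with the highest similarity score updates the consensus spectral list.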

Results and Discussion

Increasing the Number of Exploited Spectra

Figure 2 shows the cumulative spectral metrics obtained with the classical workflow (combination of Mascot and Scaffold) versus the QtI pipeline for both examined datasets, i.e., CSF (Figure 2a) and Plasma (Figure 2b). The QtI pipeline reduced the number of
unexploited spectra (in grey) by 77% and 62% for the CSF and Plasma datasets, respectively, compared to the classical workflow. Using QtI, only 16.5% of the CSF spectra remained unexploited. QtI also significantly increased the number of quantified and identified spectra compared with the classical workflow. The proportion of concomitantly quantified and identified spectra (in orange) increased by 55% and 75% for the CSF and Plasma datasets, respectively. Using QtI, the category of “Not quantified-Identified” spectra (in blue) disappeared, while the “Quantified-Not identified” category (in yellow) appeared. The spectral features in this new category are also referred to as the “known unknowns”, as in metabolomics, and might be exploited afterwards with various methods and tools. For example, the recent method by Griss et al. could be used to complete spectral identification [34]. De novo tools or searches in spectral libraries, as well as faster tools for identification, such as MSFragger [35], which speed up searching in several protein databases and take into account more PTMs, could be employed to match such tandem mass spectra.

Interestingly, the QtI also reduced storage requirements and search times. Specifically, the mzML files output from Step 2 were approximately one fifth of the size of the initial raw dataset, and the Mascot search times were reduced by 46% compared to the time needed to search the original file (data not shown). The MS/MS/MS acquisition mode, used to alleviate the co-fragmentation/interference issue for relative protein quantification with isobaric tagging, is gaining popularity. While our pipeline only supports MS/MS data for now, upstream merging of the peptide sequence-ion m/z range from
tandem mass spectra and the reporter-ion m/z range from MS/MS/MS spectra into combined spectra [36] should allow using QtI with such data. Further analysis validated the quantitative performance of the QtI pipeline when selectively reconstructing calibration curves from a two-proteome model. Proteins, peptides, and rescued peptides were accurately quantified with the QtI (Supporting Information IV). These results demonstrated the specificity of the similarity scoring (see next section). Additional results are shown in Supporting Information V.

Reducing Missing Values with the PMR Module

The QtI pipeline can exploit unmatched tandem mass spectra with quantitative values. Once tandem mass spectra with TMT reporter-ions are confirmed, using alignment and similarity scores, to represent true peptides rather than experimental artifacts, those spectra are combined in a consensus list of quantified tandem mass spectra across all experiments (Figure 1). The problem with the classical approach is that a peptide sequence that is not independently and consistently identified across all TMT-based experiments can generate missing values in the dataset at both peptide and protein levels. The PMR module searches for similar tandem mass spectra between experiments. The unified list of quantified tandem mass spectra across all experiments is exploited to rescue spectra with poor probability scores, which usually result in unassigned peptide matches. The PMR method applies a similarity score to indirectly reconstruct and assign a peptide match to an unmatched spectrum for which an analogous spectrum is found in the list of identified peptides. Figure 3 displays heatmaps representing part of the proteins quantified across the 16 TMT sixplex experiments constituting the Plasma dataset for both the classical workflow and the QtI
pipeline. This figure illustrates the impact of the PMR module on the completeness of the data matrix. With the classical workflow, 16.99% and 22.01% of protein quantification values were missing in the CSF and Plasma datasets, respectively. Applying the PMR module reduced the percentage of missing values in these experiments to 16.95% and 17.74%, respectively. The enhanced performance of the QtI was also observable for the proteins constantly identified and quantified in the 16 TMT sixplex experiments. Indeed, the QtI quantification-driven pipeline increased the number of proteins quantified across all experiments from 225 (i.e., 71% of all identified proteins) to 273 (i.e., 90% of all identified proteins) for the Plasma dataset (Figure 4). No improvement was observed for the CSF dataset in the proportion of constantly quantified proteins (Figure 4), but the absolute number rose from 538 to 565 proteins (Supporting Information V). The missing value recovery therefore had a direct impact not only on the total number but also on the intersection of peptides/proteins identified and quantified in the samples. Figure 5 shows the intersection of covered data between experiments in both the CSF and Plasma datasets at peptide and protein levels. The number of peptides commonly quantified across more than 90% of the experiments (we arbitrarily considered here 10% of missing data per protein to be the maximum acceptable for subsequent statistical analyses) increased considerably when employing the QtI pipeline, with 37% more peptides quantified for the CSF dataset and 58% more for the Plasma dataset. This time, considering the 90% criterion set above, the completeness of the CSF dataset was also markedly improved at the protein level (i.e., 5% increase), showing the practical advantage of the QtI pipeline to exploit more data.
Missing quantification values were preferentially recovered for the most abundant proteins in the samples,
allowing them to reach the 90% completeness criterion (Supporting Information VI). Additional figures of merit are provided in Supporting Information V.
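The two completeness measures used in this section, the overall percentage of missing values and the 90% per-protein criterion, can be computed from a protein-by-experiment matrix as sketched below. The matrix layout (a dictionary of rows with `None` for missing values), the function names, and the toy values are assumptions made for illustration; the sketch also treats exactly 90% as passing.

```python
def missing_percentage(matrix):
    """Percentage of missing entries (None) in a feature-by-experiment matrix."""
    values = [v for row in matrix.values() for v in row]
    return 100.0 * sum(v is None for v in values) / len(values)


def meets_completeness(matrix, min_fraction=0.9):
    """Names of features quantified in at least `min_fraction` of experiments."""
    return [
        name
        for name, row in matrix.items()
        if sum(v is not None for v in row) / len(row) >= min_fraction
    ]


# Toy matrix: three hypothetical proteins measured across ten experiments.
toy = {
    "protein_1": [1.0] * 10,
    "protein_2": [1.0] * 9 + [None],
    "protein_3": [1.0] * 8 + [None, None],
}
overall_missing = missing_percentage(toy)
kept = meets_completeness(toy)
```

With 16 TMT sixplex experiments, as in the datasets analyzed here, a protein would need values in at least 15 experiments to satisfy this criterion, since 14 out of 16 corresponds to 87.5%.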

Identifying More Peptides/Proteins from the “Quantified-Not Identified” Spectra of the Consensus Spectral List

The “Quantified-Not identified” tandem mass spectra kept by the QtI pipeline correspond to real peptides, even if they were not identified because of insufficient MS/MS information for the search engine. The QtI method, in its final step, outputs a consensus spectral list containing only the representative quantified spectra. This enables the application of complementary search techniques to the unidentified spectra of the consensus spectral list, with reduced computational complexity since they are applied only to a small subset of the spectra. To exploit the strength of this approach, we used the OPTM module to search the unidentified spectra from the consensus spectral list with additional PTMs and performed a spectral library search on the remaining unidentified spectra. With the OPTM search, unmatched quantified tandem mass spectra can be further exploited to deliver additional peptide identifications. Classically, the peptide identification rate can be enhanced by exploring PTMs during database search. However, the number of PTMs that could be investigated is extremely high; searching for all of them simultaneously is unrealistic and detrimental to both the search time and the FDR control [37]. To improve the identification rate without increasing the search space, the OPTM method was embedded into Step 4 (PTM Optimization) of the QtI pipeline. Step 4 consists of selecting a subset of PTMs to be searched on the spectra left unmatched at the end of Step 3 (Identification) (the details are provided in Supporting Information III). Further Mascot searches with the optimized set of PTMs on the
unidentified spectra of the consensus spectral list showed that a significant number of additional peptide matches can be retrieved using four well-selected PTMs (Table 1). It is noteworthy that, since the OPTM module requires a search on the consensus spectral list and not on the whole dataset, the time requirements were smaller than those necessary for re-searching the whole dataset once (58 minutes on average for each dataset on a Mascot Server with 4 processors and 32 GB of memory). For the CSF and Plasma datasets, this module identified 12506 and 3120 supplementary peptides, respectively. This module showed enhanced identification rates, from 1% to 3%, for both datasets at the peptide level. The second search with optimized PTMs was performed here with Mascot but can easily be done using more PTM-oriented search engines, such as Byonic [38]. As an alternative method to identify more peptides from the pool of remaining unidentified quantified spectra, we searched those spectra against an in-house TMT-based spectral library (1% FDR threshold) with the SpectraST and PeptideProphet/iProphet tools. This supplementary search identified 1860 peptides for the CSF dataset and 1145 peptides for the Plasma dataset. It is important to mention that the PMR module can subsequently be applied to the results of both of these additional searches to further increase the comprehensiveness of the exploitable datasets (data not shown).

Conclusion

The QtI pipeline was demonstrated to provide a viable and valuable solution for exploiting more spectral data from isobaric TMT-based experiments. Enhanced bioinformatic tools are of crucial importance for biomarker discovery, especially in large datasets where the problem of missing
identifications and quantifications is key. With preprocessing analysis of peak lists using machine learning algorithms and different innovative modules, such as the PMR and the OPTM, our adaptive QtI pipeline outperformed a classical workflow for the processing of isobaric tagging data. Our proof-of-principle study showed superior performance of the QtI in terms of numbers of exploited spectra, identified/quantified peptides and proteins, and data completeness. A new category of spectra, i.e., “Quantified-Not identified”, is kept to allow further analysis using de novo sequencing, various search engines or proteogenomic approaches. These known unknown features can be retained as biomarker candidates. Very recently, Skillback et al. [39] reported a quantification-driven strategy, also using isobaric tagging, applied to peptidomics. The authors employed a spectral clustering method and demonstrated the potential of reversing the identification and quantification steps. Our pipeline is, however, quite different and offers extended features: it is applied here to proteomics in general and includes a full processing workflow that allows the generation of more complete datasets and their downstream examination with any statistical method for biomarker discovery. Our future plans aim at simplifying the use of the QtI with a simple command line interface. This will enable the proteomic and bioinformatic communities to test and use it, as well as to develop new methods, extending it beyond TMT labeling applications and improving its performance and usability.

Supporting Information

Supporting Information I – Performance analysis of evolutionary multi-objective framework.
Supporting Information II – Spectral library generation.
Supporting Information III – Detailed description of the QtI pipeline.
Supporting Information IV – Quantification analysis with E. coli mixed datasets (two-proteome model).
Supporting Information V – Detailed comparative results.
Supporting Information VI – Number of rescued spectra as a function of protein concentration in plasma.
Supporting Information VII – Tuning absolute similarity threshold and retention time threshold.

Corresponding Authors

Prof. Spiridon Likothanassis
Department of Computer Engineering and Informatics, University of Patras
Building B, University Campus Rio
Patras, 26500 Greece
Email: [email protected]
Fax: +30 2610338798

Dr. Loïc Dayon
Nestlé Institute of Health Sciences SA
EPFL Innovation Park, Bâtiment H
1015 Lausanne
Switzerland
Email: [email protected]
Fax: +41 21 632 6499

Author Contributions

J.C. and K.T. implemented the strategy and performed the data analysis. J.C. supervised the project. K.T. wrote the QtI code and developed software. F.F. and M.R. reviewed and optimized the QtI code. A.N.G., J.C., and L.D. collected proteomic datasets. J.C., K.T., S.M., C.M., O.C., S.L., and L.D. discussed and interpreted the results. O.C. and L.D. conceived the idea. J.C., K.T., S.M., C.M., and L.D. wrote the manuscript, with contributions from O.C., F.F., M.K., and S.L. All authors reviewed the manuscript.

Notes

The authors declare no competing financial interests. J.C., C.M., O.C., F.F., M.R., A.N.G., M.K., and L.D. are employees of Nestlé Institute of Health Sciences S.A. Nestlé Institute of Health Sciences S.A. contracted InSyBio Ltd. to write the QtI code and develop the software.

Acknowledgements

We thank Dr. Ivan Montoliu and Dr. Christine Chichester for support and fruitful discussions, and Maarten Warndorff for advice on legal aspects. We thank the DiOGenes consortium for providing the samples. InSyBio Ltd. participates in the NBG Business Seeds program of the National Bank of Greece.

Abbreviations

CSF        Cerebrospinal Fluid
DDA        Data-Dependent Acquisition
FDR        False Discovery Rate
iTRAQ      isobaric Tags for Relative and Absolute Quantitation
MS         Mass Spectrometry
MS/MS      Tandem Mass Spectrometry
MS/MS/MS   Triple-Stage Mass Spectrometry
NMR        Nuclear Magnetic Resonance
OPTM       Optimized Post-Translational Modification
PMR        Peptide Match Rescue
PTM        Post-Translational Modification
QtI        Quantify then Identify
RP-LC      Reversed-Phase Liquid Chromatography
TMT        Tandem Mass Tag
TPP        Trans-Proteomic Pipeline


References

1. Marcotte, E. M. (2007) How do shotgun proteomics algorithms identify proteins? Nat. Biotechnol. 25, 755-757

2. Matzke, M. M., Brown, J. N., Gritsenko, M. A., Metz, T. O., Pounds, J. G., Rodland, K. D., Shukla, A. K., Smith, R. D., Waters, K. M., McDermott, J. E., and Webb-Robertson, B. J. (2013) A comparative analysis of computational approaches to relative protein quantification using peptide peak intensities in label-free LC-MS proteomics experiments. Proteomics 13, 493-503

3. Wilkins, M. R., Pasquali, C., Appel, R. D., Ou, K., Golaz, O., Sanchez, J. C., Yan, J. X., Gooley, A. A., Hughes, G., Humphery-Smith, I., Williams, K. L., and Hochstrasser, D. F. (1996) Proteins to proteomes: Large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Nat. Biotechnol. 14, 61-65

4. Pirmoradian, M., Budamgunta, H., Chingin, K., Zhang, B., Astorga-Wells, J., and Zubarev, R. A. (2013) Rapid and deep human proteome analysis by single-dimension shotgun proteomics. Mol. Cell. Proteomics 12, 3330-3338

5. Chick, J. M., Kolippakkam, D., Nusinow, D. P., Zhai, B., Rad, R., Huttlin, E. L., and Gygi, S. P. (2015) A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat. Biotechnol. 33, 743-749

6. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154-1169

7. Dayon, L., Hainard, A., Licker, V., Turck, N., Kuhn, K., Hochstrasser, D. F., Burkhard, P. R., and Sanchez, J. C. (2008) Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags. Anal. Chem. 80, 2921-2931

8. Baker, M. (2011) Metabolomics: From small molecules to big ideas. Nat. Methods 8, 117-121

9. Alonso, A., Marsal, S., and Julià, A. (2015) Analytical methods in untargeted metabolomics: state of the art in 2015. Front. Bioeng. Biotechnol. 3, 23


10. Yates III, J. R., Eng, J. K., McCormack, A. L., and Schieltz, D. (1995) Method to Correlate Tandem Mass Spectra of Modified Peptides to Amino Acid Sequences in the Protein Database. Anal. Chem. 67, 1426-1436

11. Stahl, D. C., Swiderek, K. M., Davis, M. T., and Lee, T. D. (1996) Data-controlled automation of liquid chromatography/tandem mass spectrometry analysis of peptide mixtures. J. Am. Soc. Mass Spectrom. 7, 532-540

12. America, A. H. P., and Cordewener, J. H. G. (2008) Comparative LC-MS: A landscape of peaks and valleys. Proteomics 8, 731-749

13. De Costa, D., Broodman, I., VanDuijn, M. M., Stingl, C., Dekker, L. J. M., Burgers, P. C., Hoogsteden, H. C., Smitt, P. A. E. S., Van Klaveren, R. J., and Luider, T. M. (2010) Sequencing and quantifying IgG fragments and antigen-binding regions by mass spectrometry. J. Proteome Res. 9, 2937-2945

14. Zhang, B., Käll, L., and Zubarev, R. A. (2016) DeMix-Q: Quantification-Centered Data Processing Workflow. Mol. Cell. Proteomics 15, 1467-1478

15. Tyanova, S., Temu, T., and Cox, J. (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 11, 2301-2319

16. Smith, R., Mathis, A. D., Ventura, D., and Prince, J. T. (2014) Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view. BMC Bioinformatics 15(7), S9

17. Codrea, M. C., Jimenez, C. R., Heringa, J., and Marchiori, E. (2007) Tools for computational processing of LC-MS datasets: a user's perspective. Comput. Methods Programs Biomed. 86, 281-290

18. Perez-Riverol, Y., Wang, R., Hermjakob, H., Müller, M., Vesada, V., and Vizcaíno, J. A. (2014) Open source libraries and frameworks for mass spectrometry-based proteomics: a developer's perspective. Biochim. Biophys. Acta, Proteins Proteomics 1844, 63-76

19. Deutsch, E. W., Mendoza, L., Shteynberg, D., Slagel, J., Sun, Z., and Moritz, R. L. (2015) Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin. Appl. 9, 745-754


20. Gatto, L., and Lilley, K. S. (2012) MSnbase-an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics 28, 288-289

21. Wen, B., Zhou, R., Feng, Q., Wang, Q., Wang, J., and Liu, S. (2014) IQuant: An automated pipeline for quantitative proteomics based upon isobaric tags. Proteomics 14, 2280-2285

22. Mitchell, C. J., Kim, M. S., Na, C. H., and Pandey, A. (2016) PyQuant: A versatile framework for analysis of quantitative mass spectrometry data. Mol. Cell. Proteomics 15, 2829-2838

23. Bradbury, J., Genta-Jouve, G., Allwood, J. W., Dunn, W. B., Goodacre, R., Knowles, J. D., He, S., and Viant, M. R. (2015) MUSCLE: automated multi-objective evolutionary optimization of targeted LC-MS/MS analysis. Bioinformatics 31, 975-977

24. Dayon, L., Núñez Galindo, A., Corthésy, J., Cominetti, O., and Kussmann, M. (2014) Comprehensive and scalable highly automated MS-based proteomic workflow for clinical biomarker discovery in human plasma. J. Proteome Res. 13, 3837-3845

25. Núñez Galindo, A., Kussmann, M., and Dayon, L. (2015) Proteomics of Cerebrospinal Fluid: Throughput and Robustness Using a Scalable Automated Analysis Pipeline for Biomarker Discovery. Anal. Chem. 87, 10755-10761

26. Larsen, T. M., Dalskov, S. M., Van Baak, M., Jebb, S. A., Papadaki, A., Pfeiffer, A. F. H., Martinez, J. A., Handjieva-Darlenska, T., Kunešová, M., Pihlsgård, M., Stender, S., Holst, C., Saris, W. H. M., and Astrup, A. (2010) Diets with high or low protein content and glycemic index for weight-loss maintenance. New Engl. J. Med. 363, 2102-2113

27. Cominetti, O., Núñez Galindo, A., Corthésy, J., Oller Moreno, S., Irincheeva, I., Valsesia, A., Astrup, A., Saris, W. H. M., Hager, J., Kussmann, M., and Dayon, L. (2016) Proteomic Biomarker Discovery in 1000 Human Plasma Samples with Mass Spectrometry. J. Proteome Res. 15, 389-399

28. Dayon, L., and Sanchez, J. C. (2012) Chapter 9: Relative protein quantification by MS/MS using the tandem mass tag technology. Methods in Molecular Biology vol. 893, 115-127

29. Vizcaíno, J. A., Deutsch, E. W., Wang, R., Csordas, A., Reisinger, F., Ríos, D., Dianes, J. A., Sun, Z., Farrah, T., Bandeira, N., Binz, P. A., Xenarios, I., Eisenacher, M., Mayer, G., Gatto, L., Campos, A., Chalkley, R. J., Kraus, H. J., Albar, J. P., Martinez-Bartolomé, S., Apweiler, R., Omenn, G. S., Martens, L., Jones, A. R., and Hermjakob, H. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223-226

30. Vizcaíno, J. A., Côté, R. G., Csordas, A., Dianes, J. A., Fabregat, A., Foster, J. M., Griss, J., Alpi, E., Birim, M., Contell, J., O'Kelly, G., Schoenegger, A., Ovelleiro, D., Pérez-Riverol, Y., Reisinger, F., Ríos, D., Wang, R., and Hermjakob, H. (2013) The Proteomics Identifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 41, D1063-D1069

31. Perkins, D. N., Pappin, D. J. C., Creasy, D. M., and Cottrell, J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-3567

32. Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. (2008) ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24, 2534-2536

33. Prakash, A., Mallick, P., Whiteaker, J., Zhang, H., Paulovich, A., Flory, M., Lee, H., Aebersold, R., and Schwikowski, B. (2006) Signal maps for mass spectrometry-based comparative proteomics. Mol. Cell. Proteomics 5, 423-432

34. Griss, J., Perez-Riverol, Y., Lewis, S., Tabb, D. L., Dianes, J. A., del-Toro, N., Rurik, M., Walzer, M., Kohlbacher, O., Hermjakob, H., Wang, R., and Vizcaíno, J. A. (2016) Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat. Methods 13, 651-656

35. Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D., and Nesvizhskii, A. I. (2017) MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513-520

36. Dayon, L., Pasquarello, C., Hoogland, C., Sanchez, J. C., and Scherl, A. (2010) Combining low- and high-energy tandem mass spectra for optimized peptide quantification with isobaric tags. J. Proteomics 73, 769-777

37. Elias, J. E., and Gygi, S. P. (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207-214

38. Bern, M., Kil, Y. J., and Becker, C. (2012) Byonic: Advanced Peptide and Protein Identification Software. Curr. Protoc. Bioinformatics Chapter 13, Unit 13.20


39. Skillback, T., Mattsson, N., Hansson, K., Mirgorodskaya, E., Dahlen, R., van der Flier, W., et al. (2017) A novel quantification-driven proteomic strategy identifies an endogenous peptide of pleiotrophin as a new biomarker of Alzheimer's disease. Sci. Rep. 7, 13333

Figure captions

Figure 1. Flowchart of the QtI processing pipeline applied to TMT-based proteomic datasets. Raw files converted to mzML format are processed through five sequential steps: Step 1: Peak Filtering, Step 2: Quantification & Merging, Step 3: Identification, Step 4 (optional): PTM Optimization, and Step 5: PMR & Reporting.

Figure 2. Exploitation of tandem mass spectra using the classical workflow and the QtI pipeline for the CSF and plasma datasets. The results of classical data processing and QtI are based on Scaffold outputs using 1% FDR at both peptide and protein levels, with a minimum two-peptide criterion to report a protein identification. The tandem mass spectra left unexploited by the QtI mainly correspond to spectra that do not pass the missing quantification value tolerance (i.e., in the present study, fewer than 3 of 6 TMT reporter ions detected). Both datasets are composed of 16 TMT sixplex experiments (i.e., 96 samples) analyzed in triplicate with RPLC-MS/MS.

Figure 3. Principle of the PMR module illustrated on a subset of proteins and its effect on missing value reduction. A subset of proteins identified by both the classical workflow and the QtI pipeline in the plasma dataset is shown. In particular, the hyaluronan-binding protein 2 (HABP2) was identified and quantified in 13/16 TMT experiments with the classical workflow. Using the QtI pipeline including the PMR module, the protein was also quantified and identified in the remaining three TMT sixplex experiments (i.e., all 96 plasma samples in the dataset could now be compared with regard to this protein).

Figure 4. Percentage of proteins commonly quantified in all samples (n = 96) for the classical workflow and the QtI pipeline. The proportion of proteins with complete quantitative data for the 16 TMT sixplex experiments is given for both CSF and plasma datasets. Absolute protein coverage numbers are provided in Supporting Information V.

Figure 5. Comparative results between the classical workflow and the QtI pipeline in terms of peptides/proteins commonly quantified in at least 90% of the experiments. The number of commonly quantified peptides is given, as well as the number of commonly quantified proteins, for both CSF and plasma datasets.
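The two counting rules mentioned in the captions (the missing quantification value tolerance of Figure 2 and the completeness criteria of Figures 4 and 5) can be illustrated with a minimal Python sketch. This is not the QtI code: the function names, the data layout, and the definition of a "detected" reporter ion (a non-missing, strictly positive intensity) are assumptions made only for illustration.

```python
from typing import Dict, Optional, Sequence, Set

# Tolerance stated in the Figure 2 caption: at least 3 of 6 reporter ions.
MIN_DETECTED = 3


def passes_tolerance(reporter_intensities: Sequence[Optional[float]],
                     min_detected: int = MIN_DETECTED) -> bool:
    """Return True if enough TMT reporter ions were detected in a spectrum.

    Assumption: a channel counts as detected when its intensity is
    present (not None) and strictly positive.
    """
    detected = sum(1 for x in reporter_intensities if x is not None and x > 0)
    return detected >= min_detected


def completeness(quantified_in: Dict[str, Set[int]],
                 n_experiments: int,
                 min_fraction: float = 1.0) -> float:
    """Fraction of proteins quantified in at least `min_fraction` of the
    experiments (1.0 mimics Figure 4, 0.9 mimics Figure 5).

    `quantified_in` maps a protein name to the set of experiment indices
    in which it was quantified (hypothetical layout).
    """
    if not quantified_in:
        return 0.0
    needed = min_fraction * n_experiments
    kept = sum(1 for exps in quantified_in.values() if len(exps) >= needed)
    return kept / len(quantified_in)


if __name__ == "__main__":
    # Toy input: one protein seen in 13 of 16 experiments (as HABP2 with the
    # classical workflow) and a hypothetical protein seen in all 16.
    toy = {"HABP2": set(range(13)), "PROTEIN_X": set(range(16))}
    print(passes_tolerance([1.0, None, 2.0, 0.0, 3.0, None]))
    print(completeness(toy, n_experiments=16, min_fraction=1.0))
    print(completeness(toy, n_experiments=16, min_fraction=0.9))
```

With 16 experiments, a 90% criterion requires quantification in at least 15 experiments, so a protein quantified in 13/16 experiments counts as incomplete under both criteria until its missing values are rescued.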


Table 1. Number of quantified peptides added by the use of the OPTM module. The OPTM module allows searching the consensus spectra list with well-selected PTMs and identifying supplementary peptides.


Figure 1.


Figure 2.


Figure 3.


Figure 4.


Figure 5.


For TOC only
