Accurate precursor mass assignment improves peptide identification


Article

Accurate precursor mass assignment improves peptide identification in data-independent acquisition mass spectrometry (DIA-MS)
Dong-Gi Mun, Dowoon Nam, Hokeun Kim, Akhilesh Pandey, and Sang-Won Lee
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b01474 • Publication Date (Web): 03 Jun 2019



Analytical Chemistry

Accurate precursor mass assignment improves peptide identification in data-independent acquisition mass spectrometry (DIA-MS)

Dong-Gi Mun1ǂ, Dowoon Nam1ǂ, Hokeun Kim1, Akhilesh Pandey2,3, and Sang-Won Lee1*

1Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul, 136-701, Republic of Korea
2Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55902, USA
3Manipal Academy of Higher Education (MAHE), Manipal 576104, Karnataka, India

ABSTRACT: Proteomics research today no longer simply seeks exhaustive protein identification; increasingly, it is also desirable to obtain robust, large-scale quantitative information. To accomplish this, data-independent acquisition (DIA) has emerged as a promising strategy largely owing to developments in advanced mass spectrometers and sophisticated data analysis methods. Nevertheless, the highly complex multiplexed MS/MS spectra produced by DIA remain challenging to interpret. Here, we present a novel strategy to analyze DIA data, based on unambiguous precursor mass assignment through the mPE-MMR (multiplexed postexperimental monoisotopic mass refinement) procedure, and combined with complementary multi-stage database searching. Compared to conventional spectral library searching, the accuracy and sensitivity of peptide identification were significantly increased by incorporating precise precursor masses in DIA data. We demonstrate identification of additional peptides absent from spectral libraries, including sample-specific mutated peptides and post-translationally-modified peptides using MS-GF+ and MODa/MODi multi-stage database searching. This first use of unambiguously-determined precursor masses to mine DIA data demonstrates considerable potential for further exploitation of this type of experimental data.

Liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis with data-dependent acquisition (DDA) has been the cornerstone of proteome profiling for several years now1, 2. In DDA analysis, peptides separated by liquid chromatography are ionized and subsequently detected on an MS survey scan. Ions are consecutively isolated for fragmentation, usually based on their intensity in the survey scan and within a narrow isolation window. The resulting fragment ions are recorded as MS/MS spectra from which peptide sequence information can be extracted. The recorded MS/MS spectra are searched against theoretical MS/MS spectra generated in silico from protein databases, applying a precursor mass tolerance. This approach is referred to as spectrum-centric3, and is used in many common tools (e.g., SEQUEST4, Mascot5, and MS-GF+6). With the advent of high-performance mass spectrometers7, 8, the information provided by the DDA approach has become increasingly comprehensive. However, stochastic sampling during the precursor ion selection step represents an inherent limitation. MS/MS analysis is biased towards detecting more abundant peptides, and the stochastic selection approach decreases the reproducibility of technical replicates as the complexity of the proteome contained in the sample increases9. Recently, data-independent acquisition (DIA) has emerged as an alternative strategy allowing both comprehensive identification and accurate quantification10. All fragment ions derived from multiple precursor ions within a pre-defined isolation window, which is typically wider than that used in DDA methods, are recorded as MS/MS spectra, regardless of precursor ion intensity. Consequently, the MS/MS spectra acquired by DIA are highly complex because they contain fragments from multiple precursor ions. This co-fragmentation

hampers accurate and sensitive peptide identification when applying conventional DDA-adapted spectrum-centric database search strategies. To overcome this problem, novel sophisticated approaches have been developed for DIA data analysis3, 11. For example, peptide-centric tools have been developed that trace evidence for the existence of a peptide (OpenSWATH12, Skyline13, and Spectronaut10). In these chromatogram-scoring methods, targeted extraction is performed for several fragment ions produced from a peptide in a spectral library and matched against the whole elution time or a specific window of elution time. Spectral library search tools have also been reported which directly assess similarity between DIA MS/MS spectra and library MS/MS spectra based on dot products (MSPLIT-DIA14 and FT-ARM15). A major limitation of spectral library-based approaches is that peptides absent from these libraries, such as modified or mutated peptides, cannot be identified. In addition, the number of peptide identifications is highly dependent on the size of the spectral library as well as its contents16. As a result, extra time for DDA experiments is needed to construct a sample-specific spectral library. These problems may be circumvented by using spectral-library-free targeted extraction tools such as PECAN17 and DIA-Umpire18. In addition, precursor ion mass information can be exploited when analyzing DIA data, and a few tools have been specifically developed to do so. For instance, DIA-Umpire18 offers an alternative strategy that detects precursor-fragment groups and generates pseudo-MS/MS spectra which can be used with conventional spectrum-centric tools. In addition, He et al.19 developed an algorithm to extract precursor information directly from the full MS scan, without recourse to the fragment ion information. Their algorithm, named RawConverter,

ACS Paragon Plus Environment


considers all possible theoretical isotopic envelopes for charges +1 to +6 of each peak in the isolation window and matches them with experimental isotopic distributions to determine their precursor masses. However, few studies have comprehensively identified peptides from DIA data based on precursor ion mass, while systematically assessing the accuracy and sensitivity of the peptides identified. In this paper, we present a novel DIA data analysis strategy exploiting unambiguously-determined precursor ion masses. Accurate monoisotopic masses of multiple precursor ions corresponding to multiplexed MS/MS spectra produced by DIA were determined using the mPE-MMR20 procedure, which was initially developed and successfully applied to interpret multiplexed MS/MS spectra found in DDA data. MS/MS data for which precursor masses had been determined were searched against spectral libraries using an in-house spectral library search (SLS) tool, with specific precursor mass and normalized retention time tolerances (1st stage search). The impact of precursor mass assignment on peptide identification by SLS of DIA data was assessed by comparing results to those obtained by conventional SLS, which only considers fragment ions. The spectrum-level false discovery rate estimation was improved when precursor masses were assigned to MS/MS data, resulting in ~30% more peptide identifications. The remaining MS/MS data, for which no peptides were identified during the 1st stage SLS, were directly subjected to sequential spectrum-centric search engines, MS-GF+ (2nd stage search)6 followed by MODa/MODi (3rd stage search)21, 22, which was previously demonstrated to effectively improve the extent of peptide identification23.
When using sample-specific protein databases constructed from the matching genomics data24, the 2nd stage search was demonstrated to be especially effective in identifying sample-specific mutated peptides, including single nucleotide variants (SNVs) and insertions and deletions (INDELs), which would not be included in the spectral library. In addition, the 3rd stage search by MODa/MODi allowed post-translationally modified peptides (with modifications including phosphorylation, acetylation and methylation, among many others) to be accurately characterized from the DIA data. This level of characterization would be impossible with conventional spectral library-based analysis methods of DIA data.

EXPERIMENTAL SECTION

Sample preparation. The gastric tumor tissue sample was collected from a patient with gastric cancer. Full details of the tissue collection, protein extraction, and digestion procedures along with institutional review board approval can be found elsewhere25. The gastric tumor tissue peptides were used to perform DIA analysis. The iRT calibration kit (Biognosys, ZH, Switzerland) was added to all experiments for retention time calibration during SLS.

DIA data acquisition. Peptides (5 µg) derived from tumor tissue were analyzed individually using a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) linked to a dual-online UPLC system consisting of two binary nanoflow pumps (Ultimate 3000 NCP-3200RS, Dionex, Germering, Germany), an autosampler (Ultimate 3000 WPS-3000TPL RS, Dionex, Germering, Germany), two analytical columns (75 µm × 100 cm) and two trap columns (150 µm × 3 cm), as previously described26. The analytical columns were maintained at 60 °C. Solvent A was 0.1% formic acid (FA) in water, and solvent B was 0.1% FA in acetonitrile.


A 180-min gradient (from 10% to 37.5% B over 161 min, from 37.5% to 80% B over 5 min, 80% B for 12 min, and 10% B for 2 min) was used for the unfractionated DIA analysis. Eluted peptides were ionized using an in-house nano-electrospray source. The electric potential of electrospray ionization was maintained at 2.4 kV and the temperature of the desolvation capillary was set to 250 °C. For DIA experiments, sequential MS/MS scans were acquired at a resolution of 15,000 with an automatic gain control (AGC) value of 3.0 × 10^6 and a maximum injection time of 22 ms. A 9-Th isolation window was used over a range from 385 to 1,015 Th. Normalized collision energy (NCE) was set to 28. Full MS scans were acquired every 35 MS/MS scans with a mass range of 350-1,500 Th and a resolution of 60,000, with an AGC value of 5.0 × 10^6 and a maximum injection time of 20 ms (Figure 1A).

Generating MS/MS data with assigned precursor masses. DIA data were processed by mPE-MMR following the same procedure as used for DDA data analysis20. Briefly, charge-state multiplexing and 13C mass correction were performed for peaks detected in the corresponding isolation window of the previous MS scan. Assumed monoisotopic mass candidates were matched to unique mass classes (UMCs), each of which corresponds to a group of similar monoisotopic masses along the chromatographic elution time. When matches occurred within 10 ppm and 5 MS scans, MS/MS data were generated with UMC masses and recorded in the MGF file format. When no match was found, peaks were discarded (Figure 1B).

Construction of the spectral library. The spectral library was constructed using data from the fractionated LC-MS/MS analysis of unlabeled peptides obtained from gastric tissue. For this study, eight online or offline fractionated LC-MS/MS experiments were performed in DDA mode to produce the spectral library. Offline fractionation was performed as described previously25, producing 24 fractions.
Online fractionation was performed using a previously reported online two-dimensional non-contiguous concatenation and fractionation reverse-phase/reverse-phase liquid chromatography system (online 2D NCFC-RP/RPLC)26 which was modified to generate 24 online NCFC fractions. The ion optics parameters used with the Q Exactive HF-X mass spectrometer were the same as those used in the DIA experiment. Full MS scans were acquired over a mass range of 400-2,000 Th. DDA was performed by the top-24 method at an NCE of 28 and applying an isolation window of 1.2 Th. Resolution was 60,000 for MS and 15,000 for MS/MS scans. The maximum injection time used was 20 ms for MS scans and 23 ms for MS/MS scans. The target AGC value was set to 5.0 × 10^6 for both MS and MS/MS scans. All LC-MS/MS data were processed with mPE-MMR20 and searched against a sample-specific protein database using the MS-GF+ search engine6, 24. A PSM-FDR of 1% was applied for peptide identification. A summary of the experimental conditions and protein database search results for the eight LC-MS/MS experiments can be found in Table S1. The peptides identified in the eight DDA experiments were then merged to produce a spectral library (Figure S1A). The MS/MS spectrum with the highest search score (i.e., -log(specE-value)) was selected as the representative MS/MS spectrum for each peptide ion. Theoretical fragment ions for each peptide identified were matched to peaks on the MS/MS spectrum within 0.01 Da.
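The fragment annotation step described above (matching theoretical fragment ions to observed peaks within 0.01 Da) can be illustrated with a minimal Python sketch. It is not the authors' in-house tool: it assumes singly charged b- and y-type ions only (a-type ions and higher charge states are omitted), unmodified residues, and standard monoisotopic residue masses. The function names `by_ions` and `annotate` and the toy peptide are hypothetical.

```python
# Monoisotopic residue masses (Da); cysteine is listed unmodified here,
# so the static carbamidomethylation used in the study is NOT applied.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def annotate(peaks, peptide, tol_da=0.01):
    """Keep peaks within tol_da of a theoretical fragment; discard the rest."""
    ions = by_ions(peptide)
    annotated = []
    for mz, intensity in peaks:
        for name, theoretical in ions.items():
            if abs(mz - theoretical) <= tol_da:
                annotated.append((name, mz, intensity))
                break
    return annotated


# Toy spectrum for the hypothetical peptide "GAK": two real fragments, one noise peak.
toy_peaks = [(147.1128, 100.0), (129.0660, 40.0), (200.0000, 5.0)]
toy_annotation = annotate(toy_peaks, "GAK")
print(toy_annotation)
```

In this toy case the peak at 200.0 Th has no theoretical counterpart and is dropped, mirroring how unannotated fragment peaks are discarded before library construction.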


Figure 1. Overall workflow of DIA data acquisition and analysis. (A) Experimental overview of DIA data acquisition. Seventy MS/MS scans were acquired covering 385-1,015 Th with an isolation window of 9 Th. (B) Illustration of how mPE-MMR determines multiple precursor masses. Five MS/MS data were generated with unambiguously-determined precursor masses (i.e., 998.09 Th (blue), 1000.48 Th (yellow), 1001.51 Th (green), 1002.48 Th (red), and 1004.52 Th (purple)) after charge-state multiplexing, 13C mass correction, and matching to UMC mass. (C) Workflow for the multi-stage search, involving spectral library search (1st stage), MS-GF+ (2nd stage), and MODa/MODi (3rd stage). Two peptides, SDFASNCCSINSPPLYCDSEIDAELK (blue) and SLSQPTPPPMPILSQSEAK (purple), were identified by spectral library searching with the precursor mass and retention time tolerances applied (±10 ppm and ±0.025 NET, respectively). Peptide VLYSNM*LGEENTYLWR (red), where M* denotes oxidized methionine, was identified during the second-stage MS-GF+ search. The third-stage search using MODa/MODi identified peptide P+12SSINYMVAPVTGNDVGIR (green), where P+12 denotes addition of a carbon atom to the N-terminus. (D) Annotated MS/MS spectrum showing the peptides identified during the three stages.

Annotated fragment ions (a-, b- and y-type ions) were thus obtained, and other fragment peaks without annotation were discarded. Peptides which had more than ten annotated fragment ion peaks were used to construct the spectral library. Although protein database searching of DDA data was performed considering some variable modifications (methionine oxidation and asparagine/glutamine deamidation), only unmodified peptides were used to build the spectral library for this study. The elution times for peptides were normalized according to the NET scale using the 11 reference iRT peptides27. The NETs observed for the 11 reference peptides were calculated using the predicted NETs obtained from the NET prediction utility tool28. The calibration function obtained was applied to determine the normalized retention time for all peptides, expressed as an observed NET value. When the same peptide ions were identified in multiple LC-MS/MS analyses, the median NET value was used as the representative NET value after iterative Grubbs testing to eliminate outliers (Figure S1B). The resulting spectral library consisted of 438,239 peptide ions corresponding to 301,520 peptides. Decoy peptide sequences were obtained by shuffling, and artificial fragment ions were generated by repositioning annotated fragment ion peaks29. These sequences were concatenated with the target spectral library (Figure S1C).

Spectral library searching and target-decoy analysis. The resulting MS/MS data with accurately-determined precursor masses were subjected to the SLS procedure (Figure S2). For each DIA MS/MS run, MS/MS spectra from the spectral library within a retention time tolerance of ±0.025 NET and a precursor mass tolerance of ±10 ppm were listed to assess spectrum similarity. MS/MS spectra in the spectral library were defined as A = [a1, a2, …, an], where ai is a vector of m/z and absolute

intensity for each annotated fragment ion peak. The absolute intensity of ai was rescaled as a relative intensity, with the maximum intensity equal to 100. Each ai for an MS/MS spectrum within the retention time and precursor mass tolerance was searched against the DIA MS/MS spectrum, allowing an m/z tolerance of ±0.01 Th, to generate B = [b1, b2, …, bn], where bi is a vector of m/z and absolute intensity. The absolute intensity of bi was also rescaled such that its maximum intensity was equal to 100. Finally, intensities were normalized pairwise for each pair of A and B, (ai, bi). For example, if the intensity of a pair was (50, 25), it was normalized to (100, 50). A similarity score was calculated according to the dot product-based function:

$$\text{Similarity score} = \frac{\sum_{i=1}^{n} y_{A_i} \times y_{B_i}}{10^{4}}$$

where y_{A_i} and y_{B_i} represent the normalized intensities of ai and bi, respectively, and n is the number of annotated fragment ion peaks in the library spectrum. Target-decoy analysis was performed to estimate the FDR. The spectrum-to-spectrum matches (SSMs) with the highest similarity score for each MS/MS spectrum were considered. The FDR was calculated as the number of decoy hits over the number of target hits, and SSMs with an FDR equal to or less than 0.01 were used for subsequent analysis.

MS-GF+ and MODa/MODi search and protein grouping. The MS-GF+ (v9949) search engine (available at https://omics.pnl.gov/software/ms-gf) was used with the following options: semi-tryptic, precursor mass tolerance of 10 ppm, static modification as carbamidomethylation (+57.021 Da) of cysteine, variable modification as oxidation (+15.995 Da) of methionine and deamidation (+0.984 Da) of asparagine and glutamine. A PSM FDR was estimated by target-decoy analysis. MS/MS data without peptide identification after the first- and second-stage searches were submitted to MODa/MODi for


unrestrictive modification searching (available at https://omictools.com/moda-2-tool)21, 22. A precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.01 Da were used, and 59 variable modifications were allowed (Table S2). The FDR was estimated for each subgroup determined based on their charge state and number of tryptic termini (NTT), as described above. The biological significance of observed modifications was classified according to the Unimod website (http://www.unimod.org)30. All peptides identified through the multi-stage search were clustered into protein groups as described before25. Protein groups with two or more peptides were reported.

Data and software availability. All mass spectrometric data were deposited to the ProteomeXchange Consortium via the PRIDE31 partner repository with the dataset identifier PXD012744. The software mPE-MMR was written in the Java language and is available at https://omics.pnl.gov.
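The dot product-based similarity score and the target-decoy threshold described in the spectral library searching subsection can be sketched in Python as follows. This is only an illustration under stated assumptions, not the in-house SLS tool: when several DIA peaks fall within ±0.01 Th of a library fragment, the most intense one is taken (an assumption, since the text does not say how ties are resolved); unmatched library fragments contribute zero; and the function names and toy spectra are hypothetical.

```python
def rescale(intensities):
    """Scale a list of intensities so that the largest value becomes 100."""
    top = max(intensities)
    return [100.0 * x / top for x in intensities]


def similarity_score(library_peaks, dia_peaks, tol_th=0.01):
    """Sum of pairwise-normalized intensity products divided by 10^4.

    library_peaks, dia_peaks: non-empty lists of (m/z, intensity) tuples.
    """
    lib = rescale([inten for _, inten in library_peaks])
    matched = []
    for mz, _ in library_peaks:
        hits = [inten for m, inten in dia_peaks if abs(m - mz) <= tol_th]
        matched.append(max(hits) if hits else 0.0)  # assumption: keep most intense hit
    if max(matched) == 0.0:
        return 0.0  # no library fragment was found in the DIA spectrum
    dia = rescale(matched)
    score = 0.0
    for a, b in zip(lib, dia):
        if max(a, b) == 0.0:
            continue
        scale = 100.0 / max(a, b)  # pairwise normalization, e.g. (50, 25) -> (100, 50)
        score += (a * scale) * (b * scale) / 1e4
    return score


def score_threshold(ssms, max_fdr=0.01):
    """Lowest score at which decoys/targets stays at or below max_fdr.

    ssms: iterable of (score, is_decoy), one best match per MS/MS spectrum.
    Returns None if no score satisfies the FDR limit.
    """
    decoys = targets = 0
    threshold = None
    for score, is_decoy in sorted(ssms, key=lambda s: s[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


toy_library = [(200.0, 50.0), (300.0, 100.0), (400.0, 25.0)]
toy_dia = [(200.004, 500.0), (300.0, 1000.0), (350.0, 800.0)]
toy_score = similarity_score(toy_library, toy_dia)

toy_ssms = [(9.0, False), (8.0, False), (7.0, False), (6.0, False),
            (5.0, False), (4.0, True), (3.0, False)]
toy_threshold = score_threshold(toy_ssms, max_fdr=0.1)
print(toy_score, toy_threshold)
```

Because each pairwise-normalized product is at most 100 × 100, every matched fragment contributes at most 1 to the score, so the score is bounded by the number of annotated fragments in the library spectrum.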

RESULTS AND DISCUSSION

Overall workflow for DIA data analysis with unambiguous precursor mass assignment. DIA was performed as illustrated in Figure 1A. Seventy consecutive MS/MS scans were acquired, covering 385-1,015 Th with an isolation window of 9 Th. MS scans were acquired every 35 MS/MS scans to maintain an interval between MS scans of about 1.5 s. The cycle time when acquiring MS/MS scans with the same isolation window was about 3.0 s. Precursor masses were unambiguously assigned using a slight modification of the mPE-MMR procedure previously applied to DDA data20. Figure 1B presents an overview of how multiple precursor masses were assigned to MS/MS spectra. As previously described, all monoisotopic masses measured from the entire MS data were calculated and clustered into a unique mass class (UMC) list, where a UMC is a collection of monoisotopic masses of a peptide measured at different elution times and different charge states. Subsequently, an m/z peak in the 9 Th isolation window was selected and subjected to “charge multiplexing”, “13C mass corrections” and “range scanning”. For example, as demonstrated in Figure 1B, the lowest m/z peak, 998.09 Th, within the 9-Th isolation window on the previous MS spectrum was multiplexed into seven different charge states (i.e., +1 to +7). For each of the seven masses corresponding to the seven charge states, 13C mass corrections were applied by adding 1.00235 Da (i.e., +1 Da) and subtracting multiples of 1.00235 Da (i.e., -1, -2, and -3 Da). The resulting 35 assumed monoisotopic masses (seven charge states, each with the uncorrected mass and four corrected masses) were matched to the UMC list, with a mass tolerance of 10 ppm and within a 5-scan range. The match presented in the figure corresponds to UMC #63574 with a +3 charge state. Subsequently, mPE-MMR removes the isotopic peaks (marked in blue in Figure 1B) corresponding to the matched ion from the isolation window (“matching isotope envelope removal”).
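The charge multiplexing, 13C mass correction and UMC matching steps described above can be sketched in Python. This is an illustrative reading of the text rather than the mPE-MMR implementation (which is written in Java): the UMC records, their masses and scan ranges are hypothetical, the scan-range rule (MS scan within 5 scans of the UMC's observed scan range) is an assumption, and isotope envelope removal is not shown.

```python
PROTON = 1.007276      # Da
C13_SHIFT = 1.00235    # Da, isotope spacing quoted in the text


def candidate_masses(mz, charges=range(1, 8), shifts=(1, 0, -1, -2, -3)):
    """Assumed monoisotopic neutral masses for one m/z peak.

    Seven charge states times five 13C offsets gives 35 candidates.
    """
    candidates = []
    for z in charges:
        neutral = (mz - PROTON) * z
        for k in shifts:
            candidates.append((z, k, neutral + k * C13_SHIFT))
    return candidates


def match_umc(candidates, umcs, ms_scan, ppm_tol=10.0, scan_tol=5):
    """Return (umc_id, charge, 13C offset, ppm error) for every candidate hit."""
    hits = []
    for z, k, mass in candidates:
        for umc in umcs:
            ppm = (mass - umc["mass"]) / umc["mass"] * 1e6
            in_range = (umc["first_scan"] - scan_tol
                        <= ms_scan
                        <= umc["last_scan"] + scan_tol)
            if abs(ppm) <= ppm_tol and in_range:
                hits.append((umc["id"], z, k, ppm))
    return hits


# Hypothetical UMC list; IDs, masses and scan numbers are made up for illustration.
toy_umcs = [
    {"id": "umc_a", "mass": 2991.2500, "first_scan": 98, "last_scan": 110},
    {"id": "umc_b", "mass": 1500.0000, "first_scan": 98, "last_scan": 110},
]
toy_candidates = candidate_masses(998.09)
toy_hits = match_umc(toy_candidates, toy_umcs, ms_scan=104)
print(len(toy_candidates), toy_hits)
```

With these made-up UMCs, only the +3 candidate without 13C correction falls within 10 ppm of a UMC mass, which is the kind of single, unambiguous match the procedure relies on.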
The next lowest m/z peak after removal of matching isotopes, 1000.48 Th, was then chosen to apply the same procedures for charge multiplexing, 13C mass corrections, range scanning, mapping to UMC, and removal of isotopic peaks. The whole process was repeated until all the peaks within the isolation window had been accounted for. In this example, a total of five monoisotopic masses were unambiguously determined (i.e., 998.09 Th, 1000.48 Th, 1001.51 Th, 1002.48 Th, and 1004.52 Th) and assigned to the MS/MS spectrum. MS/MS data produced by applying the mPE-MMR process not only provided accurately-determined precursor masses and


charge states for precursor ions which were co-isolated within the same isolation window, but could also determine the number of multiplexed precursor ions. Previously, a wide isolation window (i.e., 20-30 Th) covering a relatively small range (i.e., 500 to 900 Th) was used to acquire mass spectrometry data in DIA mode. We performed mPE-MMR analysis with publicly available DIA data10 acquired with a broad isolation window using a Q Exactive instrument, determined the number of multiplexed ions and compared these to our DIA data (Figure S3A). By this analysis, we found that a large proportion (60.9%) of MS/MS scans contained more than 30 precursor ions contributing to multiplexed spectra. This high level of multiplexing makes these data incompatible with spectrum-centric database search engines, and for this reason peptide-centric approaches with spectral libraries were introduced. With the introduction of high scan rate mass spectrometers such as the Q Exactive HF-X used in this study, however, it is becoming more feasible to use a narrow isolation window (e.g., 9 Th in this study) for DIA experiments. With this narrow window, DIA MS/MS spectra are becoming less complex and could in principle be analyzed with a spectrum-centric tool, MS-GF+, as the first-stage search for peptide identification. To test this, we attempted to follow the typical DDA data analysis procedure using MS-GF+, without the 1st stage SLS. This approach of bypassing SLS identified 30,647 peptides from 241,494 PSMs (Figure S3B), which is considerably fewer than the 51,194 identified by the mPE-MMR SLS approach. In addition, the percentage of scans producing multiple peptide identifications (35.27%) was far lower than in the mPE-MMR SLS result (68.92%) (Figure S3B and S3C). This result suggested that MS/MS spectra produced by DIA with a 9-Th isolation window were still too complex for accurate interpretation, even though MS-GF+ is known to be only mildly affected by the presence of co-fragmented ions6, 32.
Based on these findings, we chose to apply SLS as the first-stage search method in this study. The resulting DIA MS/MS data with their assigned precursor masses were subsequently used in a multi-stage search involving SLS (1st stage), MS-GF+ (2nd stage) and MODa/MODi (3rd stage) searches (Figure 1C). Initially, MS/MS data were searched against a spectral library built from previously acquired DDA data (Figure S1, Table S1). Because the precursor masses were accurately determined with UMC masses, a stringent precursor mass tolerance (i.e., 10 ppm instead of 9 Th) could be applied in addition to a tight retention time tolerance (i.e., 0.025 NET). Based on spectrum similarity (Figure S2), two peptides (SDFASNCCSINSPPLYCDSEIDAELK and SLSQPTPPPMPILSQSEAK) were identified. The remaining MS/MS data for which peptides were not yet identified (i.e., 1001.51 Th, 1002.48 Th, and 1000.48 Th) were used as input for the MS-GF+ search, with a precursor ion mass tolerance of 10 ppm. Among the three remaining precursor masses, the precursor mass of 1002.48 Th was matched to VLYSNM*LGEENTYLWR, where M* denotes methionine oxidation (Figure 1C). Because modified peptide sequences were not included in the spectral library used in this study, peptides with oxidized methionine residues could not be identified during the 1st stage SLS, but could be identified in the 2nd stage owing to their accurately determined precursor masses. For the remaining uninterpreted MS/MS data from the multiplexed MS/MS spectrum (i.e., 1001.51 Th and 1000.48 Th), a MODa/MODi search was performed with a precursor mass tolerance of 10 ppm and identified the precursor mass of


1001.51 Th as P+12SSINYMVAPVTGNDVGIR, where P+12 indicates addition of a carbon atom to the peptide’s N-terminus. This modification is often observed in tandem mass spectrometry data21. Most of the fragment ions in the multiplexed MS/MS spectrum were interpreted in a complementary fashion through the three search stages (Figure 1D). The precursor of 1000.48 Th did not result in any peptide identification, probably because the correct sequence was absent from the database, or because insufficient fragment ions were available for confident identification.

Figure 2. Comparison of conventional SLS with mPE-MMR SLS. (A) Venn diagram of peptides identified by conventional SLS and mPE-MMR SLS. (B) Rate of observation of commonly identified peptides over triplicate experiments. Among the 38,754 commonly identified peptides, 80.6% were observed in all three experiments through mPE-MMR SLS (blue bar), compared to 67.0% with conventional SLS (pink bar). (C) Distribution of mass measurement accuracies for common peptides (purple), mPE-MMR SLS-only peptides (blue) and conventional SLS-only peptides (pink). (D) Example of peptide IHFSTAPIQVFSTYSNEDYDR, which was incorrectly identified by conventional SLS. The mPE-MMR SLS method correctly identified this spectrum as peptide DDKESVPISDTIIPAVPPPTDLR. Annotated MS/MS spectra and representative MS/MS spectra from the spectral library of the two peptides are presented. Fragment ion peak 398.22 Th with charge state +2 was incorrectly interpreted as a b3 ion through conventional SLS. The previous MS1 scan, zoomed in on the isolation window, is shown at the bottom. (E) Example of dependent identification occurring in conventional SLS. Peptide PFETLLSQNQGGK was arbitrarily identified by conventional SLS, but was in fact part of the sequence of peptide ALPGQLKPFETLLSQNQGGK, which was identified by both methods. The MS scan for the isolation window and the annotated MS/MS spectrum of ALPGQLKPFETLLSQNQGGK are presented.
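The control flow of the three-stage search described above (Figure 1C), in which each stage only receives the MS/MS data left unidentified by the previous stage, can be sketched in Python. The sketch is schematic: the three lookup tables merely stand in for the SLS tool, MS-GF+ and MODa/MODi, and the function name `multi_stage_search` is hypothetical. Per-stage FDR filtering is omitted. The precursor m/z values and peptide assignments are those of the Figure 1 example.

```python
def multi_stage_search(spectra, stages):
    """Run search stages in order on the spectra that are still unidentified.

    spectra: hashable spectrum identifiers (here, precursor m/z values).
    stages: list of (stage_name, search_fn); search_fn returns a peptide or None.
    Returns (identifications, unidentified spectra).
    """
    identifications = {}
    remaining = list(spectra)
    for stage_name, search_fn in stages:
        still_unidentified = []
        for spectrum in remaining:
            peptide = search_fn(spectrum)
            if peptide is None:
                still_unidentified.append(spectrum)
            else:
                identifications[spectrum] = (stage_name, peptide)
        remaining = still_unidentified
    return identifications, remaining


# Stand-ins for the three engines, keyed by the precursor m/z from Figure 1.
sls = {998.09: "SDFASNCCSINSPPLYCDSEIDAELK", 1004.52: "SLSQPTPPPMPILSQSEAK"}.get
msgf = {1002.48: "VLYSNM*LGEENTYLWR"}.get
moda = {1001.51: "P+12SSINYMVAPVTGNDVGIR"}.get

precursors = [998.09, 1000.48, 1001.51, 1002.48, 1004.52]
identified, unidentified = multi_stage_search(
    precursors, [("SLS", sls), ("MS-GF+", msgf), ("MODa/MODi", moda)]
)
print(identified)
print(unidentified)
```

Ordering the stages from the most restrictive search space (spectral library) to the least restrictive (unrestrictive modification search) means the larger search spaces are only explored for spectra that the earlier stages could not explain.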
Advantage of accurate precursor mass assignment to DIA data: sensitive and accurate SLS peptide identification. The effect of the availability of unambiguously-determined precursor masses for DIA data on peptide identification by SLS was assessed by comparing the results obtained by our three-stage approach to those obtained by conventional spectral library searches (conventional SLS). Only an elution time tolerance was applied in conventional SLS; thus, all possible MS/MS spectra with a precursor mass in the corresponding isolation window (i.e., 9 Th) were examined to determine spectrum similarity. A spectrum-to-spectrum match (SSM)-level false discovery rate was determined and the result of conventional SLS was compared to that obtained with the spectral library search after the mPE-MMR process (mPE-MMR SLS). As shown by the target-decoy distributions based on similarity scores, the threshold score was lower with mPE-MMR SLS (4.732) than with the conventional SLS approach (5.752) (Figure S4 and Table S3). Thus, as summarized in Table S3, for all three replicates performed, a greater number of peptides was identified by the mPE-MMR SLS step (49,432; 51,194; 48,996) than by conventional SLS (36,740; 38,390; 36,852). In all, 47,013 distinct peptides were identified in the triplicate experiments by conventional SLS, while the mPE-MMR SLS approach identified 61,287 peptides, an increase of ca. 30%. When these peptide sets were compared, 38,754 peptides (55.7%) were found to be identified by both approaches, whereas 8,259 peptides (11.9%) were unique to conventional SLS and 22,533 peptides (32.4%) were only identified by mPE-MMR SLS (Figure 2A). The frequency of observation among triplicate LC-MS/MS runs was examined for peptides identified in both approaches


(Figure 2B). Among the 38,754 commonly identified peptides, 25,983 (67.0%) were identified in all three LC-MS/MS runs using conventional SLS, whereas 31,230 (80.6%) were identified in all three experiments when using the mPE-MMR SLS strategy. In addition, just 3,526 (9%) were identified in only a single LC-MS/MS run when mPE-MMR SLS analysis was applied, compared to 6,968 (18%) when using conventional SLS. This result indicates that peptide identification was not only more sensitive but also more robust when precursor masses were assigned to DIA data. We next compared the accuracy of peptide identifications of the two methods. Experimental mass measurement accuracies (MMAs) could not be calculated directly for the conventional SLS approach because experimental masses are not normally determined. We therefore calculated the experimental mass post hoc by matching the theoretical mass of the peptide identified by SLS with neighboring UMCs within a mass tolerance of 10 ppm and an elution time tolerance of ±5 min. Of the 8,259 conventional SLS-only peptides examined, 4,693 had matches, and a mass error was calculated between the UMC mass and the theoretical mass of the peptides identified. The MMA distributions obtained for peptides common to both approaches and for mPE-MMR SLS-only peptides displayed a normal profile, whereas conventional SLS-only peptides were randomly distributed, suggesting that many of these peptide identifications were random matches (Figure 2C). As shown in Figure 2D, conventional SLS and mPE-MMR SLS sometimes yielded different peptide sequences for the same spectrum, here IHFSTAPIQVFSTYSNEDYDR (similarity score 5.95) and DDKESVPISDTIIPAVPPPTDLR (similarity score 33.56), respectively. The failure of conventional SLS was due to the

Page 6 of 12

fact that the correct monoisotopic peak of the precursor ion was outside the isolation window (shown in the expanded MS spectrum). mPE-MMR, on the other hand, could still assign the correct precursor mass thanks to 13C mass correction and the UMC matching process. Other conventional-only peptides demonstrated the weakness of the conventional SLS. For example, as shown in Figure 2E, peptide ALPGQLKPFETLLSQNQGGK was identified by both conventional and mPE-MMR SLS, with a precursor mass of 709.39129 Th and a +3 charge state. However, conventional SLS additionally identified peptide PFETLLSQNQGGK with a charge state of +2. The m/z of this ion is 709.8682 Th which is within the isolation window. Since conventional SLS considers all possible peptide ions from the spectral library which are contained within the isolation window, PFETLLSQNQGGK also became a candidate and its MS/MS spectrum was similar to that of ALPGQLKPFETLLSQNQGGK as major fragments were produced in the sequence of PFETLLSQNQGGK. This issue has been previously reported when applying fragment-based similarity methods with multiplexed MS/MS spectra, and led to the development of a filtering-out algorithm included in MSPLIT-DIA14. The mPE-MMR process also solved this issue as no peak at 709.8682 Th was observed in the corresponding MS spectrum and no matching UMC for 709.8682 Th was available (Figure 2E). The improved sensitivity and accuracy in peptide identification by mPE-MMR SLS over the conventional SLS stems from a reduction in search space allowed by accurate precursor mass assignments, as was described previously33.
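The co-isolation example above can be reproduced with a short calculation. The following Python sketch is illustrative only and is not the mPE-MMR implementation: the function names, the isolation window bounds (708.5 to 710.5 Th), and the use of a bare m/z list in place of UMCs with elution times are all assumptions made for this example. It computes the theoretical monoisotopic m/z of both peptide ions from standard residue masses, shows that both fall inside the assumed window, and then keeps only candidates supported by an observed precursor within 10 ppm.

```python
# Monoisotopic residue masses (Da) for the amino acids used in this example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "L": 113.08406, "N": 114.04293, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
WATER = 18.010565   # H2O accounting for the free N- and C-termini
PROTON = 1.007276   # mass of a proton


def precursor_mz(sequence, charge):
    """Theoretical monoisotopic m/z of an unmodified peptide ion."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def filter_candidates(candidates, window, observed_mz, tol_ppm=10.0):
    """Return (candidates inside the window, candidates with an observed precursor)."""
    low, high = window
    in_window = [c for c in candidates if low <= precursor_mz(*c) <= high]
    supported = [
        c for c in in_window
        if any(abs(ppm_error(obs, precursor_mz(*c))) <= tol_ppm for obs in observed_mz)
    ]
    return in_window, supported


candidates = [("ALPGQLKPFETLLSQNQGGK", 3), ("PFETLLSQNQGGK", 2)]
window = (708.5, 710.5)      # hypothetical isolation window bounds (Th)
observed_mz = [709.39129]    # precursor m/z quoted in the text for the +3 ion

in_window, supported = filter_candidates(candidates, window, observed_mz)
for seq, z in candidates:
    print(f"{seq} {z}+  theoretical m/z = {precursor_mz(seq, z):.4f}")
print("inside the isolation window:", in_window)
print("supported by an observed precursor:", supported)
```

Under these assumptions, window-only matching retains both ions as candidates, whereas requiring an observed precursor within the mass tolerance leaves only the +3 ion, which mirrors the reduction in search space described in the text.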

Figure 3. Identification of mutated and modified peptides not present in the spectral library. (A) Summary of the numbers of peptides and protein groups identified in triplicate experiments. On average, 56,549 peptides corresponding to 7,426 protein groups were identified. (B) Percentage of peptides identified in each MS/MS scan for replicate II. Among 199,842 MS/MS scans, about 70% were found to contain spectra for multiple peptides. (C) Mass measurement accuracy distribution for the 498,475 SSMs/PSMs (corresponding to 57,834 peptides) shows a normal mass error distribution with an offset of 0.29 ppm and a standard deviation of 1.28 ppm. (D) Annotated MS/MS spectrum of a mutated peptide identified during the second-stage search. Germline-mutated peptide YTAIPIVGQVFQSQCK was identified with a single amino acid variant (H→Q). Elution time traces are shown for three abundant ions (y12, y9, and y10). (E) Bar chart of the five most frequently observed biologically significant PTMs in triplicate experiments from the third-stage search. The number of identified peptides for each PTM is shown along with the modified residues. (F) Annotated MS/MS spectrum of an N-terminal acetylated peptide with a germline mutation identified through the third-stage search. N-terminal acetylated peptide MDILVSECSAR was identified with a single amino acid variant (V→I). Elution time traces are shown for three abundant ions (b3, y6, and y7).


Advantage of accurate precursor mass assignment to DIA data: Identification of mutated and modified peptides not present in the spectral library. Assignment of precursor masses to DIA MS/MS data makes it possible to analyze DIA data with conventional spectrum-centric database search strategies developed for DDA. Thus, this approach allows identification of biologically relevant peptides that might otherwise be missed, including less abundant and post-translationally modified peptides. MS/MS data for which no peptide was identified by the first-stage SLS were subjected to a multi-stage database search involving MS-GF+ and MODa/MODi. These two search engines were previously shown to be complementary, and their combined use was demonstrated to provide more comprehensive proteome analysis in the context of a proteogenomics study (ref 23). A sample-specific protein database (ref 24) constructed from matching exome-sequencing and mRNA-sequencing data can be used to identify sample-specific mutated peptides that could not be considered during the initial SLS. On average for the triplicate experimental data, 3,778 additional peptides were identified by the second-stage MS-GF+ search (Figure 3A). The remaining MS/MS data for which no peptide identification could be assigned were further searched by the MODa/MODi approach, aiming to identify post-translationally modified peptides, which were not considered in the two previous search stages. On average, an additional 2,897 peptides were identified by this approach (Figure 3A). By applying the three-stage search strategy, an average of 56,549 peptides were identified, corresponding to 7,426 protein groups with more than two sibling peptides (Figure 3A). In total, 72,282 distinct peptides were identified in the triplicate DIA experiments, a 54% increase over the conventional approach (Tables S4, S5, and S6).
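The control flow of the three-stage strategy, in which only MS/MS data left unidentified by one stage are passed to the next, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: `cascade_search` and the dictionary lookups standing in for SLS, MS-GF+, and MODa/MODi are hypothetical placeholders and do not call the real search engines. The final lines check that the per-stage averages quoted in the text add up to the reported three-stage average.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A stage maps a precursor-assigned spectrum ID to a peptide, or None if unidentified.
SearchStage = Callable[[str], Optional[str]]


def cascade_search(
    spectra: Iterable[str],
    stages: List[Tuple[str, SearchStage]],
) -> Dict[str, Tuple[str, str]]:
    """Run stages in order; only spectra left unidentified move to the next stage."""
    results: Dict[str, Tuple[str, str]] = {}
    remaining = list(spectra)
    for stage_name, search in stages:
        unassigned = []
        for spectrum_id in remaining:
            peptide = search(spectrum_id)
            if peptide is None:
                unassigned.append(spectrum_id)
            else:
                results[spectrum_id] = (stage_name, peptide)
        remaining = unassigned
    return results


# Toy stand-ins for the three stages (hypothetical lookups, not real engine calls).
library_hits = {"s1": "peptide_A"}
database_hits = {"s1": "peptide_X", "s2": "peptide_B"}
modification_hits = {"s3": "peptide_C"}

stages = [
    ("SLS", library_hits.get),
    ("MS-GF+", database_hits.get),
    ("MODa/MODi", modification_hits.get),
]
results = cascade_search(["s1", "s2", "s3", "s4"], stages)
print(results)

# Consistency check on the averages quoted in the text.
sls_mean = (49432 + 51194 + 48996) / 3   # mean of the three mPE-MMR SLS counts
three_stage_mean = sls_mean + 3778 + 2897
print(three_stage_mean)                  # 56549.0
```

In this toy run, spectrum `s1` is assigned at the first stage and never reaches the later stages, and `s4` remains unidentified after all three.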
For replicate II, 57,834 peptides were identified, corresponding to 498,475 SSMs/PSMs, and most (ca. 70.7%) DIA MS/MS spectra resulted in more than two peptide identifications (Figure 3B). In addition, the mass measurement accuracy showed a normal mass error distribution (offset of 0.29 ppm and standard deviation of 1.28 ppm; Figure 3C). On average, 45 peptides per experiment were shown to contain mutation information. A representative example of a germline-mutated peptide (H→Q), YTAIPIVGQVFQSQCK, identified through the second-stage search is shown in Figure 3D. The fragment ion corresponding to the mutated amino acid, glutamine, is clearly annotated in the MS/MS spectrum. In support of this identification, traces for three abundant fragment ions (y12, y9, and y10) were detectable and are shown along with their retention times. Other peptides with various post-translational modifications were also identified in the third-stage MODa/MODi search. As shown in Figure 3E, the five most frequently identified modifications over the triplicate experiments were acetylation, dehydration, methylation, formylation, and phosphorylation. In addition, Figure 3F presents a representative annotated MS/MS spectrum for an N-terminal acetylated peptide with a single amino acid variant (V→I). This identification was supported by the traces detected for three abundant fragment ions (b3, y6, and y7).
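The offset and standard deviation reported for the mass error distribution are summary statistics of per-identification ppm errors. The sketch below shows one way such values could be computed. It assumes a list of (observed mass, theoretical mass) pairs, and the function names and toy numbers are illustrative, not taken from the study data.

```python
import statistics


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def mma_summary(pairs):
    """Return (offset, standard deviation) in ppm for (observed, theoretical) pairs."""
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return statistics.mean(errors), statistics.stdev(errors)


# Toy data: a theoretical mass of 1000 Da observed with errors of -1, 0, 1, and 2 ppm.
theoretical = 1000.0
pairs = [(theoretical * (1 + e * 1e-6), theoretical) for e in (-1.0, 0.0, 1.0, 2.0)]

offset, spread = mma_summary(pairs)
print(f"offset = {offset:.2f} ppm, standard deviation = {spread:.2f} ppm")
```

The sample standard deviation (`statistics.stdev`) is used here. Whether the reported 1.28 ppm is a sample or population standard deviation is not stated in the text.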

CONCLUSIONS With this study, we demonstrate how accurate precursor mass assignment to DIA data can improve peptide identification. Our results show that SLS performed with accurately determined precursor masses resulted in more sensitive, accurate, and robust peptide identifications than conventional SLS approaches, in which precursor mass information is not exploited (Figure 2). In addition, direct application of conventional spectrum-centric database search engines such as MS-GF+ and MODa/MODi was facilitated when precursor masses were assigned (Figure 1). With the use of a sample-specific protein database, mutated peptides that were absent from the initial spectral library could be identified from DIA data (Figure 3D). Furthermore, significant numbers of post-translationally modified peptides were successfully identified during the MODa/MODi search stage (Figure 3E and Figure 3F). Further studies will be needed to interpret precursor mass-assigned DIA data for which a specific spectral library is not available. Feasible approaches for acquiring lower-complexity MS/MS spectra are to introduce additional separation space through online or offline fractionation or to use an even narrower isolation window. Also, because our method extracts precursor mass information solely from MS data within the instrument's MS dynamic range, adopting methods that provide high-dynamic-range MS scans, such as the BoxCar acquisition method (ref 34), should improve the identification of extremely low-abundance peptides.

ASSOCIATED CONTENT Supporting Information The Supporting Information is available free of charge on the ACS Publications website at DOI : Figures showing the scheme of spectral library generation; the procedure for assessing similarity between two MS/MS spectra; complexity evaluation of multiplexed MS/MS spectra of DIA data; and target-decoy similarity score distributions of spectral library searching. Tables listing the DDA experiments used for spectral library generation; the 59 PTMs considered during the MODa/MODi search; and a summary of the analysis of DIA data through spectral library search. (PDF) Lists of peptides identified through the three-stage search. (xlsx)

AUTHOR INFORMATION Corresponding Author * E-mail: [email protected]; Fax: 82-3290-3121

Present Addresses †D-G.M.: Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55902, USA.

Author Contributions ǂD-G.M. and D.N. contributed equally.

Notes The authors declare no competing financial interest.

ACKNOWLEDGMENT This work was supported in part by a grant from the Genome Program for Fostering New Post-Genome Industry (NRF-2017M3C9A5031597) and a grant (NRF-2018R1A4A1025985) awarded through the National Research Foundation, which is funded by the Korean Ministry of Science and ICT (MSIT). This work was also supported by a research grant (1711260) from the National Cancer Center, Korea.

REFERENCES
1. Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R., 3rd, Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013, 113 (4), 2343-94.
2. Aebersold, R.; Mann, M., Mass-spectrometric exploration of proteome structure and function. Nature 2016, 537 (7620), 347-55.
3. Ting, Y. S.; Egertson, J. D.; Payne, S. H.; Kim, S.; MacLean, B.; Kall, L.; Aebersold, R.; Smith, R. D.; Noble, W. S.; MacCoss, M. J., Peptide-Centric Proteome Analysis: An Alternative Strategy for the Analysis of Tandem Mass Spectrometry Data. Mol Cell Proteomics 2015, 14 (9), 2301-7.
4. Eng, J. K.; Mccormack, A. L.; Yates, J. R., An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr 1994, 5 (11), 976-989.
5. Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S., Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551-3567.
6. Kim, S.; Pevzner, P. A., MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 2014, 5, 5277.
7. Michalski, A.; Damoc, E.; Hauschild, J. P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S., Mass Spectrometry-based Proteomics Using Q Exactive, a High-performance Benchtop Quadrupole Orbitrap Mass Spectrometer. Mol Cell Proteomics 2011, 10 (9).
8. Kelstrup, C. D.; Bekker-Jensen, D. B.; Arrey, T. N.; Hogrebe, A.; Harder, A.; Olsen, J. V., Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics. Journal of Proteome Research 2018, 17 (1), 727-738.
9. Tabb, D. L.; Vega-Montoto, L.; Rudnick, P. A.; Variyath, A. M.; Ham, A. J.; Bunk, D. M.; Kilpatrick, L. E.; Billheimer, D. D.; Blackman, R. K.; Cardasis, H. L.; Carr, S. A.; Clauser, K. R.; Jaffe, J. D.; Kowalski, K. A.; Neubert, T. A.; Regnier, F. E.; Schilling, B.; Tegeler, T. J.; Wang, M.; Wang, P.; Whiteaker, J. R.; Zimmerman, L. J.; Fisher, S. J.; Gibson, B. W.; Kinsinger, C. R.; Mesri, M.; Rodriguez, H.; Stein, S. E.; Tempst, P.; Paulovich, A. G.; Liebler, D. C.; Spiegelman, C., Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J Proteome Res 2010, 9 (2), 761-76.
10. Bruderer, R.; Bernhardt, O. M.; Gandhi, T.; Miladinovic, S. M.; Cheng, L. Y.; Messner, S.; Ehrenberger, T.; Zanotelli, V.; Butscheid, Y.; Escher, C.; Vitek, O.; Rinner, O.; Reiter, L., Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol Cell Proteomics 2015, 14 (5), 1400-10.
11. Hu, A.; Noble, W. S.; Wolf-Yadlin, A., Technical advances in proteomics: new developments in data-independent acquisition. F1000Res 2016, 5.
12. Gillet, L. C.; Navarro, P.; Tate, S.; Rost, H.; Selevsek, N.; Reiter, L.; Bonner, R.; Aebersold, R., Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 2012, 11 (6), O111.016717.
13. MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26 (7), 966-968.
14. Wang, J.; Tucholska, M.; Knight, J. D.; Lambert, J. P.; Tate, S.; Larsen, B.; Gingras, A. C.; Bandeira, N., MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nat Methods 2015, 12 (12), 1106-8.
15. Weisbrod, C. R.; Eng, J. K.; Hoopmann, M. R.; Baker, T.; Bruce, J. E., Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J Proteome Res 2012, 11 (3), 1621-32.
16. Barkovits, K.; Linden, A.; Galozzi, S.; Schilde, L.; Pacharra, S.; Mollenhauer, B.; Stoepel, N.; Steinbach, S.; May, C.; Uszkoreit, J.; Eisenacher, M.; Marcus, K., Characterization of Cerebrospinal Fluid via Data-Independent Acquisition Mass Spectrometry. J Proteome Res 2018, 17 (10), 3418-3430.
17. Ting, Y. S.; Egertson, J. D.; Bollinger, J. G.; Searle, B. C.; Payne, S. H.; Noble, W. S.; MacCoss, M. J., PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat Methods 2017, 14 (9), 903-908.
18. Tsou, C. C.; Avtonomov, D.; Larsen, B.; Tucholska, M.; Choi, H.; Gingras, A. C.; Nesvizhskii, A. I., DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods 2015, 12 (3), 258-64, 7 p following 264.
19. He, L.; Diedrich, J.; Chu, Y. Y.; Yates, J. R., 3rd, Extracting Accurate Precursor Information for Tandem Mass Spectra by RawConverter. Anal Chem 2015, 87 (22), 11361-7.
20. Madar, I. H.; Ko, S. I.; Kim, H.; Mun, D. G.; Kim, S.; Smith, R. D.; Lee, S. W., Multiplexed Post-Experimental Monoisotopic Mass Refinement (mPE-MMR) to Increase Sensitivity and Accuracy in Peptide Identifications from Tandem Mass Spectra of Cofragmentation. Anal Chem 2017, 89 (2), 1244-1253.
21. Na, S.; Jeong, J.; Park, H.; Lee, K. J.; Paek, E., Unrestrictive identification of multiple post-translational modifications from tandem mass spectrometry using an error-tolerant algorithm based on an extended sequence tag approach. Mol Cell Proteomics 2008, 7 (12), 2452-63.
22. Na, S.; Bandeira, N.; Paek, E., Fast Multi-blind Modification Search through Tandem Mass Spectrometry. Mol Cell Proteomics 2012, 11 (4).
23. Madar, I. H.; Lee, W.; Wang, X. J.; Ko, S. I.; Kim, H.; Mun, D. G.; Zhang, B.; Paek, E.; Lee, S. W., Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search. Int J Mass Spectrom 2018, 427, 11-19.
24. Park, H.; Bae, J.; Kim, H.; Kim, S.; Kim, H.; Mun, D. G.; Joh, Y.; Lee, W.; Chae, S.; Lee, S.; Kim, H. K.; Hwang, D.; Lee, S. W.; Paek, E., Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses. Proteomics 2014, 14 (23-24), 2742-9.
25. Park, J. M.; Park, J. H.; Mun, D. G.; Bae, J.; Jung, J. H.; Back, S.; Lee, H.; Kim, H.; Jung, H. J.; Kim, H. K.; Lee, H.; Kim, K. P.; Hwang, D.; Lee, S. W., Integrated analysis of global proteome, phosphoproteome, and glycoproteome enables complementary interpretation of disease-related protein networks. Sci Rep 2015, 5, 18189.
26. Lee, H.; Mun, D. G.; So, J. E.; Bae, J.; Kim, H.; Masselon, C.; Lee, S. W., Efficient Exploitation of Separation Space in Two-Dimensional Liquid Chromatography System for Comprehensive and Efficient Proteomic Analyses. Anal Chem 2016, 88 (23), 11734-11741.
27. Escher, C.; Reiter, L.; MacLean, B.; Ossola, R.; Herzog, F.; Chilton, J.; MacCoss, M. J.; Rinner, O., Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 2012, 12 (8), 1111-21.
28. Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y. F.; Zhao, R.; Smith, R. D., Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal Chem 2003, 75 (5), 1039-1048.
29. Lam, H.; Deutsch, E. W.; Aebersold, R., Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics. J Proteome Res 2010, 9 (1), 605-10.
30. Creasy, D. M.; Cottrell, J. S., Unimod: Protein modifications for mass spectrometry. Proteomics 2004, 4 (6), 1534-6.
31. Perez-Riverol, Y.; Csordas, A.; Bai, J.; Bernal-Llinares, M.; Hewapathirana, S.; Kundu, D. J.; Inuganti, A.; Griss, J.; Mayer, G.; Eisenacher, M.; Perez, E.; Uszkoreit, J.; Pfeuffer, J.; Sachsenberg, T.; Yilmaz, S.; Tiwary, S.; Cox, J.; Audain, E.; Walzer, M.; Jarnuczak, A. F.; Ternent, T.; Brazma, A.; Vizcaino, J. A., The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 2019, 47 (D1), D442-D450.
32. Li, H.; Hwang, K. B.; Mun, D. G.; Kim, H.; Lee, H.; Lee, S. W.; Paek, E., Estimating influence of cofragmentation on peptide quantification and identification in iTRAQ experiments by simulating multiplexed spectra. J Proteome Res 2014, 13 (7), 3488-97.
33. Masselon, C.; Pasa-Tolic, L.; Lee, S. W.; Li, L.; Anderson, G. A.; Harkewicz, R.; Smith, R. D., Identification of tryptic peptides from large databases using multiplexed tandem mass spectrometry: simulations and experimental results. Proteomics 2003, 3 (7), 1279-86.
34. Meier, F.; Geyer, P. E.; Virreira Winter, S.; Cox, J.; Mann, M., BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes. Nat Methods 2018, 15 (6), 440-448.


Figure 1. Overall workflow of DIA data acquisition and analysis.


Figure 2. Comparison of conventional SLS with mPE-MMR SLS.
