

Article

microDIA (µDIA): data-independent acquisition for high-throughput proteomics and sensitive peptide mass spectrum identification Michael R Heaven, Archie L Cobbs, Yuan-Wei Nei, Danielle B. Gutierrez, Anthony W Herren, Harsha P Gunawardena, Richard M. Caprioli, and Jeremy L. Norris Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b01026 • Publication Date (Web): 09 Jul 2018 Downloaded from http://pubs.acs.org on July 11, 2018


Analytical Chemistry

microDIA (µDIA): data-independent acquisition for high-throughput proteomics and sensitive peptide mass spectrum identification

Michael R. Heaven,1 Archie L. Cobbs,1 Yuan-Wei Nei,†2 Danielle B. Gutierrez,2 Anthony W. Herren,3 Harsha P. Gunawardena,4 Richard M. Caprioli,2 and Jeremy L. Norris2*

1 Vulcan Analytical, Birmingham, AL 35203

2 Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37240

3 University of California at Davis Proteomics Core, Davis, CA 95616

4 Janssen Research and Development, The Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA 19002

ABSTRACT: State-of-the-art strategies for proteomics are not able to rapidly interrogate complex peptide mixtures in an untargeted manner with sensitive peptide and protein identification rates. We describe a data-independent acquisition (DIA) approach, microDIA (µDIA), that applies a novel MS/MS mass spectral deconvolution method to increase the specificity of tandem mass spectra acquired during proteomics experiments. Using the µDIA approach with a 10-min liquid chromatography gradient allowed detection of 3.1-fold more HeLa proteins than the results obtained from data-dependent acquisition (DDA) of the same samples. Additionally, we found the µDIA MS/MS deconvolution procedure is critical for resolving modified peptides with relatively small precursor mass shifts that cause the same peptide sequence in modified and unmodified form to theoretically co-fragment in the same raw MS/MS spectra. The µDIA workflow is implemented in the PROTALIZER software tool which fully-automates tandem mass spectral deconvolution, queries every peptide with a library-free search algorithm against a user-defined protein database, and confidently identifies multiple peptides in a single tandem mass spectrum. We also benchmarked µDIA against DDA using a 90-min gradient analysis of HeLa and E. coli peptides that were mixed in predefined quantitative ratios, and our results showed µDIA provided 24% more true positives at the same false positive rate.

In recent years, quantitative proteomics with data-independent acquisition (DIA) has gained momentum due to advances in scan speed, mass measurement accuracy, and software application tools.1 In DIA experiments, a mass filter transmits a preselected isolation window range for untargeted fragmentation, typically 10-800 mass-to-charge (m/z) units wide, followed by a full-scan analysis of the resulting fragment ions (see S-1 for an overview of common DIA methods). To enable detection of a broad range of peptide masses, larger isolation windows are applied during DIA sample analysis than in traditional data-dependent acquisition (DDA) methods.2 As a result, DIA generates highly complex mass spectra, documented to contain as many as ten peptides per tandem mass spectrum, and with lower signal-to-noise than DIA analyses with smaller isolation window lengths.3-5 Therefore, investigators are forced to employ lengthy liquid chromatography (LC) separations to interrogate whole cell lysate protein digestions by DIA, reducing throughput and the number of samples that can be analyzed with a finite number of instruments.1,3-5 Another important consequence of the complex characteristics of DIA tandem spectra is that specialized data processing is required to accurately extract peptide identifications. To date, most DIA analyses use spectral libraries for peptide identification. These libraries can be produced from DDA analysis of the same samples (i.e., local libraries),6 or derived from generic libraries such as the Human Peptide Atlas.7 The shortcomings of library-based approaches are that data interpretation is restricted to peptides in the library searched, and local libraries incur additional experimental costs during the setup of the database. Alternatively, several library-free software tools have been developed, each with various deficiencies. Of these, DIA-Umpire cannot detect peptides with unobservable precursor signals, which account for up to 31% of the total detectable peptides in a sample.8,9 Nor can PECAN, a tool for peptide-centric library-free searches, confidently identify modified peptides with small precursor mass shifts from the same corresponding peptide in unmodified form.10,11

We report microDIA (µDIA), a platform for DIA-based proteomics, which contains a deconvolution strategy to resolve multiple peptide candidates (including modifications) that may theoretically match to a tandem MS scan. Unlike multiplexed scanning DIA,12 µDIA can be used to analyze data from any high-resolution tandem mass spectrometer and applies fragment ion intensities to correctly assign peptides to tandem mass spectra with fewer peaks and narrower isolation windows. To query peptides in the resulting µDIA tandem mass spectra, we created a library-free search tool, Protein Farmer, which has peptide-centric searching capabilities to allow detecting multiple peptides per tandem MS scan and peptides without observable precursor signals. Notably, Protein Farmer also has quality-control identification features similar to the spectral-library identification tool MSPLIT-DIA,4 which reassigns fragment ions erroneously matched to multiple peptides in the same MS/MS scans solely to the best match based on a peptide-spectrum-match (PSM) expectation scoring algorithm. In this study, we describe the experimental setup for µDIA and evaluate this novel DIA approach for the rapid acquisition of proteomics data as well as the determination of differentially abundant proteins in label-free samples.

EXPERIMENTAL SECTION

ACS Paragon Plus Environment


Cell protein extracts and digestion. HeLa S3 cell lysate tryptic digests from Pierce (Rockford, IL) were reconstituted in 0.1 % formic acid. E. coli strain K12 ER2420/pACYC184 from New England Biolabs (Ipswich, MA) was grown to mid-log phase, pelleted, resuspended in 20 mM tris/100 mM NaCl/DNAse, and centrifuged at 20,000 x g for 15 min. Soluble proteins were BCA assayed (Pierce) and digested using a trypsin/Lys-C mixture from Promega (Madison, WI) according to the manufacturer’s protocol. The resulting E. coli peptides were desalted using an Agilent AssayMAP BRAVO instrument with C18 cartridges (Santa Clara, CA) and reconstituted in 0.1 % formic acid.

LC-MS/MS. Data were acquired in positive mode using a Bruker Impact II UHR Q-TOF mass spectrometer equipped with a CaptiveSpray source (Billerica, MA), a pulled tip emitter column (360 µm O.D. x 100 µm I.D. x 35 cm L, packed with BEH C18, 1.7 µm, 130 Å resin), and a Waters nanoAcquity UPLC (Milford, MA). A flow rate of 400 nL/min was used with 10-min loading and washing in buffer A (0.1 % formic acid in water) and 10, 30, 60, or 90-min linear gradients from 5-30 % solvent B (99.9 % acetonitrile in 0.1 % formic acid) followed by 30-95 % solvent B for 10 min. Both MS and MS/MS scans in µDIA mode were acquired with a 37 Hz scan rate from 140-1200 m/z and centroided. The MS/MS inclusion list stepped +6 m/z per scan and had a 9 m/z wide isolation window. A single MS scan and 120 MS/MS scans were acquired per 3.4-sec duty cycle across 400-1115 precursor m/z for all samples, except the 10-min gradient analyses, which used 66 MS/MS scans per 1.8-sec duty cycle spanning 440-830 precursor m/z. Collision induced dissociation (CID) was used with fragmentation energies corresponding to a +2 precursor charge state.

DDA analysis. DDA files were collected as described previously,13 with a top 17 precursor method and dynamic exclusion for 24 seconds. DDA data were searched with MaxQuant14 (v1.5.7.4) and processed using the label-free quantification (LFQ) algorithm and Perseus (v1.6.0.2) (S-2).

µDIA analysis. Samples acquired in µDIA mode were analyzed by PROTALIZER (v1.1.3.2) from Vulcan Analytical (Birmingham, AL). Raw µDIA files were processed with a peak picking algorithm after conversion to mzML format by dividing peaks in each MS or MS/MS scan into continuous 50 m/z ranges (200-250, 250-300, etc.). Next, the peaks in each 50 m/z span were sorted by intensity, and the most intense peaks in each scan were kept until ten or five consecutive peaks for MS and MS/MS scans, respectively, had a sum intensity difference < 5% (see S-3 for details). Peak picked files were subjected to µDIA deconvolution as described below (Figure 1), unless noted as being a non-deconvoluted analysis. µDIA spectra were queried by the Protein Farmer search tool against all peptides with +2 and +3 precursor charge states within each isolation window. All peptides between 6-40 amino acids with up to one missed cleavage were searched in the human Swiss-Prot reference database containing 20,231 sequences (2018-01 release). For the spike-in analyses, a concatenated database of the human and E. coli strain K12 Swiss-Prot reference databases, containing 24,516 sequences, was searched (2018-01 releases). All theoretical b/y fragment ions, except for b1 and y1 (due to having low specificity), were searched in the +1 charge state, as well as the +2 charge state for fragment ions containing basic Lys, His, or Arg residues. Potential modifications queried were oxidation of Met, N-terminal protein acetylation, N-terminal protein Met cleavage and acetylation, and pyro-glu of N-terminal Gln and Glu. Carbamidomethylation of Cys residues was searched as a fixed modification. Matches with ≥ 6 fragment ions, within ± 50 ppm, and in the top 200 peaks per MS/MS spectrum were retained as candidate PSMs.
To ensure co-elution shape agreement,15 initial PSMs were checked by identifying fragment ions observed in consecutive MS/MS scans of the same isolation window within a ± 0.015 m/z tolerance and eliminating those lacking at least 6 fragment ions with a relative intensity of > 70 % from the apex intensity of each fragment ion. To adjust the match probability of PSMs that have detectable precursors, the algorithm determined if the theoretical precursor monoisotopic peak, as well as the first isotopic peak, were observed in the top 500 peaks in the MS scan with the most similar retention time to the MS/MS scan the peptide was found in. Uncalibrated PSMs were then assigned expectation scores (E), with smaller scores representing more significant matches:

$$E = \frac{1}{\sum\left(\dfrac{f}{c+e+i}\right) + \left(\dfrac{M}{e+i}\right)}$$

where f = the number of amino acids in each fragment ion; c = the number of consecutive MS/MS spectra in which a fragment ion was detected within a ± 0.015 m/z tolerance in the same isolation window, where c = 1.3 for a fragment ion detected in only 1 spectrum, c = 1.2 for 2 spectra, and c = 1 for ≥ 3 consecutive spectra; e = m/z error, whereby the total m/z tolerance was divided into five smaller equally sized sections, and if the peak was in the 1st section with the least amount of error e = 1, 2nd section e = 1.5, 3rd section e = 2, 4th section e = 2.5, 5th section e = 3; i = intensity rank of each ion in a spectrum, where the total number of peaks considered for matching was divided into five equally sized sections with the most intense section i = 1, 2nd most intense section i = 1.3, 3rd most intense section i = 1.6, 4th most intense section i = 2, and 5th most intense section i = 3; M = 15 if a precursor monoisotopic and 1st isotopic peak were detected in an MS scan with the most similar retention time to the MS/MS spectrum being scored, and M = 0 if this condition is not met.
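The expectation score defined above can be illustrated with a minimal Python sketch. It is not the Protein Farmer implementation: the class and function names are hypothetical, the c = 1 weight is assumed to apply to three or more consecutive spectra, and the e and i values in the precursor term are assumed to be those of the precursor peak, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Sequence

# Section weights quoted in the text (section 1 = smallest error / most intense).
ERROR_WEIGHT = (1.0, 1.5, 2.0, 2.5, 3.0)      # e
INTENSITY_WEIGHT = (1.0, 1.3, 1.6, 2.0, 3.0)  # i


@dataclass
class FragmentMatch:
    """One matched fragment ion (hypothetical container for this sketch)."""
    n_residues: int           # f: amino acids in the fragment ion
    n_consecutive_scans: int  # consecutive MS/MS spectra containing the ion
    error_section: int        # 1..5, m/z error section
    intensity_section: int    # 1..5, intensity-rank section


def consecutive_weight(n_scans: int) -> float:
    """c: 1.3 for one spectrum, 1.2 for two, 1.0 for three or more (assumed)."""
    if n_scans <= 1:
        return 1.3
    if n_scans == 2:
        return 1.2
    return 1.0


def expectation_score(fragments: Sequence[FragmentMatch],
                      precursor_seen: bool = False,
                      precursor_error_section: int = 1,
                      precursor_intensity_section: int = 1) -> float:
    """Return E = 1 / (sum(f / (c + e + i)) + M / (e + i)); smaller is better."""
    total = 0.0
    for frag in fragments:
        e = ERROR_WEIGHT[frag.error_section - 1]
        i = INTENSITY_WEIGHT[frag.intensity_section - 1]
        total += frag.n_residues / (consecutive_weight(frag.n_consecutive_scans) + e + i)
    if precursor_seen:
        # M = 15 when the monoisotopic and first isotopic precursor peaks are found.
        # Assumption: e and i here describe the precursor peak itself.
        total += 15.0 / (ERROR_WEIGHT[precursor_error_section - 1]
                         + INTENSITY_WEIGHT[precursor_intensity_section - 1])
    if total == 0.0:
        raise ValueError("no matched ions to score")
    return 1.0 / total
```

Under these assumptions, a longer fragment seen in several consecutive scans with low m/z error and high intensity contributes a larger term to the denominator and therefore lowers E, which matches the statement that smaller scores are more significant.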
To remove low confidence identifications, different peptides (including modified and associated non-modified forms with the same peptide sequence), that share fragment ions in the same MS/MS spectrum, and had similar E scores < 1.5-fold, were not reported. For PSMs in the same MS/MS spectrum matched to at least one of the same fragment ions that had E score differences > 1.5-fold, only the peptide with the most significant score was reported. Redundant PSMs in multiple MS/MS spectra were condensed to the most significant PSM for each distinct peptide ion, and modifications lacking 100 % site localization confidence were discarded. During m/z error calibration all the peaks in each sample were adjusted by the median m/z error for all the fragment ions and precursors in each 25 m/z range (e.g., 200-225). Peptides detected by the uncalibrated search were re-queried to eliminate precursor and fragment ions not within a maximum ± 13 ppm m/z tolerance and calibrated E scores were computed. To assign peptides to surrogate proteins in the Swiss-Prot database, the peptides unique to a single entry were assigned to that protein. Razor peptides were assigned to the protein with the greatest percent sequence coverage. If a razor peptide was assigned to multiple proteins having the most and equal sequence coverage, the peptide was assigned to only the alpha or isoform 1 protein entry. Final E scores were determined from the quotient of dividing the calibrated E score by the corresponding protein percent sequence coverage. The target-decoy strategy was used to calculate false discovery rates (FDRs).16 All searches were filtered to a 0 % or 1 % protein FDR based on applying an E score cutoff that maximized the total proteins detected and resulted in that fraction of decoy protein identifications out of the total proteins detected. To test the target-decoy strategy implemented, we queried HeLa cell human peptides against an E. 
coli strain K12 Swiss-Prot reference database, with the anticipation that none or very few true positives should be detected, and indeed, this was the case (S-4). The peptides identified were quantified across samples using normalized retention time and intensity values in MS/MS chromatograms with a minimum of four fragment ions (S-5).
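The E score cutoff selection used for target-decoy filtering can be sketched as follows. This is a simplified illustration rather than the PROTALIZER code: the function name and input layout are hypothetical, and the FDR is assumed to be the number of decoy proteins divided by all proteins (targets plus decoys) passing the cutoff.

```python
from typing import Iterable, Optional, Tuple


def e_score_cutoff(proteins: Iterable[Tuple[float, bool]],
                   max_fdr: float = 0.01) -> Tuple[Optional[float], int]:
    """Pick the largest E score cutoff whose decoy fraction stays within max_fdr.

    proteins: (final_e_score, is_decoy) pairs; smaller scores are more significant.
    Returns (cutoff, number of proteins accepted), or (None, 0) if none qualify.
    """
    ranked = sorted(proteins, key=lambda p: p[0])
    best_cutoff, best_count = None, 0
    decoys = 0
    for n, (score, is_decoy) in enumerate(ranked, start=1):
        decoys += int(is_decoy)
        # Evaluate a cutoff only after the last protein of a tied score group.
        if n < len(ranked) and ranked[n][0] == score:
            continue
        if decoys / n <= max_fdr:
            best_cutoff, best_count = score, n
    return best_cutoff, best_count
```

Because the loop keeps the last qualifying position in the ranked list, the returned cutoff is the one that maximizes the total number of proteins detected at the requested FDR, as described in the text.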

RESULTS AND DISCUSSION

µDIA acquisition and deconvolution

Tandem mass spectra were collected with partially overlapping 9 m/z isolation windows that advance by 6 m/z with every tandem MS scan in a duty cycle. This acquisition method generates three distinct precursor isolation window sections for each raw MS/MS scan: two overlapping sections at the flanking ends of each isolation window and a center region with no precursor overlap with any other scan. For µDIA deconvolution, the algorithm first identifies peptide fragments within ± 0.015 m/z with similar intensities that are detected in two sequential MS/MS scans and are derived from precursor ions in the overlapping region of the MS/MS scans. These fragment ions are assigned to a reconstructed MS/MS spectrum with a 3-fold narrower precursor window range than the acquisition window width (see Figure 1 examples LACDVDQVTR and STLVCPECAK, as well as S-6 for intensity-based optimization).
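The window geometry described here follows from simple arithmetic: a 9 m/z window stepped by 6 m/z overlaps each neighbour by 3 m/z, leaving a 3 m/z center section. The Python sketch below illustrates this; the function names are hypothetical, and treating the inclusion-list values as window centers is an assumption that is not stated in the text.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z)


def isolation_windows(first_center: float, n_scans: int,
                      step: float = 6.0, width: float = 9.0) -> List[Window]:
    """Raw isolation windows for one duty cycle (centers assumed, see lead-in)."""
    half = width / 2.0
    return [(first_center + k * step - half, first_center + k * step + half)
            for k in range(n_scans)]


def split_window(window: Window, step: float = 6.0) -> List[Window]:
    """Split a raw window into low-flank, center, and high-flank sections."""
    lo, hi = window
    overlap = (hi - lo) - step  # 3 m/z for a 9 m/z window stepped by 6 m/z
    return [(lo, lo + overlap), (lo + overlap, hi - overlap), (hi - overlap, hi)]


# Example: the 10-min gradient method (66 MS/MS scans, 440-830 precursor m/z).
windows = isolation_windows(440.0, 66)
sections = split_window(windows[0])
```

Note that the first and last windows of a duty cycle each have one flanking section without an overlapping neighbour, so only the interior flanks can be cross-referenced against an adjacent scan.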

Figure 1. µDIA acquisition and parsing of fragment ions into narrower isolation windows. (A) In each duty cycle, 120 MS/MS scans with partially overlapping isolation windows were collected across the 400-1115 precursor m/z range. (B) Raw MS/MS example scan N with a 9 m/z isolation window containing multiple peptides. (C) Raw MS/MS scan N deconvoluted into three synthetic scans with 3 m/z isolation windows. The intensities of fragment ions in two overlapping scans are summed together in the resulting deconvoluted scans. Data shown are PSMs from a 30-min LC-µDIA analysis of HeLa tryptic peptides. All Cys residues were carbamidomethylated.

Next, each fragment ion in the raw scan undergoing deconvolution is cross-referenced against the preceding and following overlapping MS/MS scans, which have the most similar precursor window ranges. Fragment ions that exhibit heightened intensity in the raw scan, or that are present only in the raw scan, are assigned to the center section deconvoluted MS/MS spectrum. The precursor isolation windows of center section MS/MS spectra were likewise reduced 3-fold to span only the non-overlapping precursor window range (see Figure 1 example VVGNPCPICR).
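A simplified sketch of this assignment logic is shown below. It is an illustration under stated assumptions, not the PROTALIZER algorithm: the function names are hypothetical, "similar intensity" is modelled as an intensity ratio within a hypothetical `max_ratio` (the optimized intensity parameters are in S-6 and are not reproduced here), and peaks that are much weaker than their match in a neighbouring scan are simply left unassigned.

```python
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity), intensity assumed > 0


def closest_peak(mz: float, peaks: Sequence[Peak], tol: float) -> Optional[Peak]:
    """Return the peak nearest to mz within +/- tol, or None."""
    candidates = [p for p in peaks if abs(p[0] - mz) <= tol]
    return min(candidates, key=lambda p: abs(p[0] - mz)) if candidates else None


def deconvolute_scan(prev_peaks: Sequence[Peak],
                     cur_peaks: Sequence[Peak],
                     next_peaks: Sequence[Peak],
                     tol: float = 0.015,
                     max_ratio: float = 2.0
                     ) -> Tuple[List[Peak], List[Peak], List[Peak]]:
    """Split one raw scan into (low flank, center, high flank) synthetic scans."""
    low, center, high = [], [], []
    for mz, inten in cur_peaks:
        shared = False      # matched with similar intensity in a neighbour
        heightened = True   # not dominated by a neighbouring scan
        for neighbour, flank in ((prev_peaks, low), (next_peaks, high)):
            match = closest_peak(mz, neighbour, tol)
            if match is None:
                continue
            ratio = inten / match[1]
            if 1.0 / max_ratio <= ratio <= max_ratio:
                # Overlap-region fragment: intensities of the two scans are summed.
                flank.append((mz, inten + match[1]))
                shared = True
            elif ratio < 1.0 / max_ratio:
                heightened = False
        if not shared and heightened:
            # Unique to this scan, or clearly more intense here: center section.
            center.append((mz, inten))
    return low, center, high
```

The summation in the flanking sections mirrors the Figure 1 caption, which states that the intensities of fragment ions in two overlapping scans are summed in the deconvoluted scans.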

Evaluation of identification rates with varying LC gradients

We propose µDIA as a practical strategy for the analysis of large numbers of samples via rapid reversed-phase HPLC. To systematically demonstrate this, we separated HeLa cell tryptic digestions with 10, 30, 60, and 90-min LC gradients and performed µDIA analysis along with benchmarking to a widely adopted DDA-MaxQuant13,14 method.

Figure 2. µDIA increases peptide and protein identification rates. (A) Average number of non-redundant peptide ions and proteins identified from 1 µg of HeLa tryptic peptides at 1% protein FDR by µDIA-PROTALIZER benchmarked to DDA-MaxQuant. Error bars show the standard deviation of the mean. (B) Overlap of proteins identified with 90-min gradients by each method in any of four technical replicates. (C) Average number of peptides matched to MS/MS spectra in the 10-min time-point with and without deconvolution. Data shown are the percentages of MS/MS scans where the most significant Protein Farmer E score for each peptide was detected. MS/MS spectra with 5 and 6 peptide matches were rare and are shown in S-8. (A-C) All time-points were analyzed in quadruplicate.

Database searches with the human Swiss-Prot reference proteome resulted in µDIA-PROTALIZER detecting more peptides and proteins than DDA-MaxQuant at a 1% protein FDR (Figure 2A). The most pronounced increase in detections for µDIA was in the rapid 10-min gradient, where an average of 12,574 (4.4-fold) more peptides and 2,062 (3.1-fold) additional proteins were found. Notably, even with a typical 90-min separation, µDIA was able to detect an average of 17,667 (1.8-fold) more peptides and 950 (1.3-fold) more proteins than DDA (Figure 2A). As noted previously with other DIA search tools,8,10 we also found high overlap between the proteins identified by µDIA-PROTALIZER and DDA-MaxQuant in the 90-min gradient results (Figure 2B). In this comparison, µDIA identified 3,674 of the 4,226 (87%) proteins detected by DDA. Additionally, we observed the greatest sensitivity increase comparing identification rates with and without µDIA deconvolution at the 10-min time-point, where µDIA deconvolution provided 3,403 (1.3-fold) more peptides and 634 (1.3-fold) more proteins than the results without deconvolution (S-7).

In addition to the inherent ability of DIA to be less restricted by MS/MS sampling rates than DDA approaches, both the µDIA MS/MS deconvolution procedure and several other capabilities in the Protein Farmer search engine enable it to robustly detect peptides in rapid gradients, including the ability to identify (i) multiple peptides in a single MS/MS spectrum, and (ii) peptides below the limit of detection (LOD) in MS survey scans. For example, as many as 6 peptides were matched to a single MS/MS spectrum in 10-min separations (Figure 2C, see S-8 for other gradients). We also found that a majority, 60%, of the total peptides detected in the 10-min gradient µDIA analysis lacked an observable precursor signal, which is noteworthy since this entire population of peptides would be missed by precursor-dependent approaches such as MSE17 or DIA-Umpire.8 Overall, our 90-min gradient data closely agreed with a prior gas-phase fractionation (GPF) study,9 with 31% of the peptides found lacking detectable precursor signals, and as a general trend we found that more rapid gradients increased the number of peptide matches per MS/MS scan as well as the percentage of peptides without observable precursor signals (see S-9 for the percentage of peptides without observable precursors across all gradients).

Querying modified peptides in µDIA MS/MS spectra

Legitimate concerns have been raised whether modified peptides can be reliably detected with peptide-centric searching tools that do not require precursor signal detection.10 Therefore, in lieu of using peptide standards to validate a relatively small number of modified peptides, we verified the ability of µDIA-PROTALIZER to detect modifications by determining if a substantial proportion of the modifications found putatively represent false identifications, based on the premise that modifications introduce retention time shifts from their unmodified peptide counterparts.18 The common peptide modifications searched for in these analyses included oxidation of Met, N-terminal protein acetylation, N-terminal protein Met cleavage and acetylation, as well as pyro-glu of N-terminal Glu and Gln residues. We found that only 2 (0.4%) of 482 modified peptides in the 10-min gradient results also had detectable amounts of the same unmodified peptide sequence with elution shifts less than the estimated peptide elution width of 10 sec (Figure 3). Overall, this suggests these peptides were indeed modified, and that µDIA-PROTALIZER has the ability to identify modifications without substantial artifacts from unmodified peptides. Additionally, µDIA-PROTALIZER MS/MS searches were able to detect oxidized peptides with multiple Met residue acceptor sites in the same peptide sequence with distinct elution times from each other and the corresponding unmodified peptide (S-10). Last, we found the µDIA deconvolution is critical for resolving three of the modifications analyzed with relatively small precursor mass shifts that cause the same peptide sequence in modified and unmodified form to theoretically co-fragment in the same raw MS/MS spectra (Met oxidation, N-terminal pyro-glu of Gln and Glu). Specifically, of the modifications observed in the search results without using isolation window deconvolution that were not found in the deconvoluted MS/MS spectra, 6 of 52 (11.5%) were low confidence based on having elution shifts less than 10 sec between the unmodified and modified version of the same peptide sequence (S-11).

Having established that PROTALIZER searches of deconvoluted µDIA MS/MS spectra provide confident detection of modified peptides, we compared the number of each modification detected to DDA-MaxQuant analyses (see S-12 for the number of modifications with each gradient). Even compared to the 90-min gradient results from DDA-MaxQuant, µDIA-PROTALIZER MS/MS searches found 419 (3.1-fold) more oxidized Met peptides, 347 (1.8-fold) more N-terminal protein acetylated peptides, 235 (3.3-fold) more pyro-glu of N-terminal Glu peptides, and 832 (2.7-fold) additional pyro-glu of N-terminal Gln modified peptides.
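The retention time shift check used in this section can be sketched in a few lines of Python. The function name, input layout, and example sequences are hypothetical; only the 10-sec elution width threshold and the sign convention (modified minus unmodified retention time) come from the text.

```python
from typing import Dict, List, Tuple


def flag_small_shifts(modified_rt: Dict[str, float],
                      unmodified_rt: Dict[str, float],
                      min_shift_sec: float = 10.0
                      ) -> Tuple[Dict[str, float], List[str]]:
    """Compute elution shifts and flag modified peptides that nearly co-elute.

    modified_rt / unmodified_rt map a peptide sequence to its apex retention
    time in seconds. Shift = modified RT minus unmodified RT. Sequences whose
    absolute shift is below min_shift_sec are returned as low-confidence.
    """
    shifts = {seq: modified_rt[seq] - unmodified_rt[seq]
              for seq in modified_rt if seq in unmodified_rt}
    low_confidence = sorted(seq for seq, delta in shifts.items()
                            if abs(delta) < min_shift_sec)
    return shifts, low_confidence


# Hypothetical example values (not data from this study).
shifts, low_confidence = flag_small_shifts(
    {"AAMK": 290.0, "QLLK": 405.0, "EVVR": 120.0},
    {"AAMK": 310.0, "QLLK": 400.0},
)
```

Modified peptides whose unmodified counterpart was not detected (such as the third entry in the example) cannot be evaluated this way and are simply excluded from the shift table.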

Figure 3. Retention time shift analysis of common modifications. Histograms of the elution shift introduced by the addition of each modification in a 10-min µDIA-PROTALIZER analysis. (∆) differences in elution were calculated by subtracting the unmodified peptide retention time from that of the corresponding modified peptide with the same amino acid sequence. Overall, the elution shifts observed indicated Met oxidized peptides eluted earlier, while both N-terminal acetylated peptides as well as pyro-glu of N-terminal Glu and Gln residues were observed later in the gradient than the same peptides without modifications. Data for N-terminal protein acetylation at the 1st protein residue had 3 peptides where the unmodified form was also detected with elution shifts ranging from 1.2-2.1 (min) and are not shown. 7 outliers with elution differences exceeding three minutes are shown as either -3 or +3 (min).

Validation of µDIA quantification


To the best of our knowledge, no reports have shown a large-scale quantitative comparison between a DDA and a DIA workflow across thousands of proteins in known ratios, despite a substantial body of DIA literature.1,3-6,8-12,15,17 Notably, there are also limitations in prior DIA and DDA quantification assessment studies, due either to using insufficient numbers of differentially abundant peptides,6,10,12 or to applying differences in spiked-in peptides that exceed the distribution of fold-changes typically observed across different phenotypes compared in many experiments.13,19 To address these deficiencies, we created two samples: 1) one µg of HeLa tryptic peptides spiked with 500 ng of E. coli tryptic peptides, and 2) one µg of HeLa tryptic peptides spiked with 250 ng of E. coli tryptic peptides. Both samples were analyzed with a typical 90-min LC gradient in technical quadruplicate by µDIA-PROTALIZER with and without isolation window deconvolution versus the most widely adopted untargeted proteomic strategy, DDA-MaxQuant.13,14,19

Initially, to validate that the spike-in datasets represented high-quality assays, we determined the coefficient of variation (CV) separately for replicates with differing E. coli amounts. In total, the combined average CV was less than 15% for all the methods (µDIA with deconvolution: 14% CV across 5,073 proteins quantified; without µDIA deconvolution: 12% CV for 4,934 proteins quantified; and DDA: 11% CV for 4,458 proteins quantified). The lower CV values in the DDA results are possibly due to using a slower 5 Hz MS scan for quantification instead of a 37 Hz MS/MS scan rate for µDIA quantification.

Figure 4. µDIA provides more accurate quantification. Sample A (1 µg of HeLa tryptic peptides and 500 ng of E. coli tryptic peptides) was compared to sample B (1 µg of HeLa tryptic peptides and 250 ng of E. coli tryptic peptides). Both samples were analyzed in technical quadruplicate with a 90-min gradient. ROC curves were generated using P-values as the classifier. (TP) true positives are E. coli proteins with average fold-change increases in sample A:B and P-values that correspond to each false positive rate; (TN) true negatives are HeLa-derived human proteins with P-values greater than the cutoff used to meet each false positive rate shown; (FP) false positives are human proteins with P-values less than the cutoff value for each false positive rate shown, or E. coli proteins with an erroneous average increase in sample B:A with P-values less than the cutoff value for a particular false positive rate; (FN) false negatives are E. coli proteins with P-values greater than the cutoff for each false positive rate. A total of 3,708, 3,726, and 3,256 proteins were quantified by ≥ 2 peptides for µDIA with deconvolution, without deconvolution, and DDA, respectively, whereas a total of 1,365, 1,208, and 1,202 proteins were quantified by a single peptide for µDIA with deconvolution, without deconvolution, and DDA, respectively.

Due to spiking a two-fold E. coli difference into a constant amount of HeLa tryptic peptides, we were able to determine the exact rates of true and false assay results with each approach. To optimize the performance of the MaxQuant DDA results, we analyzed the dataset using quantitative values derived from raw intensity, LFQ, LFQ normalized to total peptide mass per sample, and with as well as without Perseus statistical processing. We found that LFQ values normalized to the total peptide mass per sample provided the best quantitative performance, and Perseus processing compared to using the same t-test statistical analysis that was applied on the µDIA dataset provided equivocal results (S-13). Therefore, in all the DDA analyses below we used LFQ values normalized to the total peptide mass per sample and the same t-test analysis that was used on the µDIA results. Statistical analysis of the resulting fold-change values across sample replicates indicated µDIA with isolation window deconvolution provided superior true positive rates in receiver operating characteristic (ROC) curves (Figure 4, S-14). With a t-test statistical analysis at a 5% false positive rate, µDIA with isolation window deconvolution provided a true positive rate of 94% for proteins quantified by ≥ 2 peptides and 76% for proteins quantified by a single peptide.
Without µDIA deconvolution resulted in a very similar 93% true positive rate for proteins quantified by > 2 peptides, but a drastically smaller 56% true positive rate for proteins assayed by only a single peptide. The corresponding results from DDA yielded the worst quantification at a 5% false positive rate with a 60% and 39% true positive rate for proteins assayed by > 2 peptides and a single peptide, respectively. The improved quantification performance of µDIA is perhaps a product of employing more specific MS/MS spectra for quantification coupled with an independent fragment ion measurement procedure that quantifies each fragment ion assigned to a peptide independently across all the samples compared and uses the median fold-change obtained from the entire set of fragment ions as the final peptide fold-change value per sample (see S-5 for a description). Next, we assessed if foldchange was a better discriminator of truly differential versus unchanged proteins, and indeed every approach exhibited improvement (S-15). However, using fold-change alone is generally not as meaningful as statistical evaluation, since protein differences occur across a diverse fold-change range in biologically-relevant samples that are not known a priori. Because most of the false positives from each method were associated with protein changes < 1.5-fold, we reanalyzed the datasets using a positive result criterion of both > 1.5 average foldchange and P-value < 0.05 as described previously .20 Each of the methods had a 1-2% false positive rate with these cutoffs, but µDIA with isolation window deconvolution provided 915 true positive proteins compared to 863 without µDIA deconvolution, and 740 by DDA. The corresponding true positive rates were 92%, 88%, and 85% for µDIA with deconvolution, µDIA without deconvolution, and DDA, respectively. 
Upon expressing these proteome-wide results as fold-change versus P-value volcano plots, µDIA with and without isolation window deconvolution, as well as the DDA workflow, had very accurate quantification based on the finding of a protein distribution centered at zero fold-change for HeLa true negative proteins and a two-fold difference for E. coli true positives (see S-16 for volcano plots with each method).
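The ROC summary used in this section (true positive rate at a fixed false positive rate, with P-values as the classifier) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' analysis code: the function name and inputs are hypothetical, spiked (E. coli) proteins are the positives and background (HeLa) proteins the negatives, and E. coli proteins changing in the wrong direction, which the Figure 4 caption counts as false positives, are ignored here.

```python
def tpr_at_fpr(p_values, is_spiked, target_fpr=0.05):
    """Return (p_cutoff, true_positive_rate) at a fixed false positive rate.

    p_values  : per-protein P-values from the statistical test
    is_spiked : True for spiked-in proteins (expected to change),
                False for constant background proteins
    A protein is called positive when its P-value is strictly below the
    cutoff; the cutoff is chosen so that at most target_fpr of the
    background proteins are called positive.
    """
    negatives = sorted(p for p, s in zip(p_values, is_spiked) if not s)
    positives = [p for p, s in zip(p_values, is_spiked) if s]
    if not negatives or not positives:
        raise ValueError("need at least one spiked and one background protein")

    # Number of background proteins allowed below the cutoff.
    allowed = int(target_fpr * len(negatives))
    # Using the (allowed + 1)-th smallest background P-value as a strict
    # upper bound guarantees that no more than `allowed` fall below it.
    cutoff = negatives[allowed] if allowed < len(negatives) else float("inf")

    true_positives = sum(p < cutoff for p in positives)
    return cutoff, true_positives / len(positives)


# Toy data: four background and four spiked proteins.
p = [0.01, 0.3, 0.5, 0.9, 0.001, 0.02, 0.2, 0.4]
spiked = [False, False, False, False, True, True, True, True]
cutoff, tpr = tpr_at_fpr(p, spiked, target_fpr=0.25)
```

Sweeping `target_fpr` from 0 to 1 and plotting the returned true positive rate against it traces out a ROC curve of the kind summarized in Figure 4.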

CONCLUSION
We have presented a complete technology, µDIA, to enable rapid quantitative proteomic analyses suitable for interrogating large numbers of samples by mass spectrometry. Through reducing precursor mass uncertainty and the number of fragment ions in MS/MS spectra, the µDIA deconvolution strategy creates synthetic MS/MS scans with specificities that resemble GPF assays,9 but can be applied in a single LC-µDIA analysis per sample. We also demonstrated that µDIA isolation window deconvolution is important for correctly detecting modified peptides and minimizing false quantification results for proteins measured by a single peptide. Additionally, we showed µDIA provides favorable performance compared to a leading DDA workflow, especially when acquiring data under compressed chromatographic timescales (< 60 min).

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website.
S-1 Review of various DIA techniques (SI.pdf)
S-2 MaxQuant parameters (SI.pdf)
S-3 Effects of PROTALIZER peak picking (SI.pdf)
S-4 Target-decoy evaluation (SI.pdf)
S-5 PROTALIZER MS/MS chromatogram description (SI.pdf)
S-6 Intensity parameters for µDIA deconvolution (SI.pdf)
S-7 Number of peptide and protein identifications with and without µDIA isolation window deconvolution (SI.pdf)
S-8 Number of peptides detected per MS/MS µDIA spectrum (SI.pdf)
S-9 Percentage of peptides lacking detectable precursor signals (SI.pdf)
S-10 Met oxidized peptides with multiple acceptor sites (SI.pdf)
S-11 Isolation window deconvolution results in fewer low confidence modifications (SI.pdf)
S-12 Number of modified peptides detected by µDIA-PROTALIZER versus DDA-MaxQuant (SI.pdf)
S-13 DDA-MaxQuant quantification results from raw intensity, LFQ, LFQ normalized to total peptide mass per sample, and with as well as without Perseus statistical analysis (SI.pdf)
S-14 Per protein quantification results using only statistical analysis (S-14.xls)
S-15 ROC plots generated using fold-change as the only discriminating factor (SI.pdf)
S-16 Volcano plots from the spike-in experiment (SI.pdf)

AUTHOR INFORMATION
Corresponding Author
*Email [email protected].

Present Addresses
†Quest Diagnostics Nichols Institute, 14225 Newbrook Drive, Chantilly, VA 20151.

Author Contributions
M.R.H., Y.W.N., D.B.G., and J.L.N. designed the gradient timecourse and spike-in experiments. M.R.H. and A.L.C. developed the algorithms with input from Y.W.N., D.B.G., A.W.H., H.P.G., and J.L.N. A.L.C. wrote all of the software described in the manuscript. Y.W.N. performed the experimental analyses. A.W.H. analyzed the DDA data and M.R.H. analyzed the µDIA results. The manuscript was written by M.R.H. with input from A.L.C., D.B.G., A.W.H., H.P.G., R.M.C., and J.L.N.

Notes
M.R.H., A.L.C., and J.L.N. own shares in Vulcan Analytical, which sells the commercial µDIA software platform PROTALIZER.

ACKNOWLEDGMENT
This work was supported in part by the U.S. Army Research Office and the Defense Advanced Research Projects Agency grant W911NF-14-2-0022. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office, DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

REFERENCES
(1) Venable, J.D.; Dong, M.Q.; Wohlschlegel, J.; Dillin, A.; Yates, J.R. Nat Methods 2004, 1, 39-45.
(2) Stahl, D.C.; Swiderek, K.M.; Davis, M.T.; Lee, T.D. J. Am. Soc. Mass Spectrom. 1996, 7, 532-540.
(3) Navarro, P.; Kuharev, J.; Gillet, L.C.; Bernhardt, O.M.; MacLean, B.; Röst, H.L.; Tate, S.A.; Tsou, C.C.; Reiter, L.; Distler, U.; Rosenberger, G.; Perez-Riverol, Y.; Nesvizhskii, A.I.; Aebersold, R.; Tenzer, S. Nat Biotechnol. 2016, 34, 1130-1136.
(4) Wang, J.; Tucholska, M.; Knight, J.D.; Lambert, J.P.; Tate, S.; Larsen, B.; Gingras, A.C.; Bandeira, N. Nat Methods 2015, 12, 1106-1108.
(5) Heaven, M.R.; Funk, A.J.; Cobbs, A.L.; Haffey, W.D.; Norris, J.L.; McCullumsmith, R.E.; Greis, K.D. J Mass Spectrom. 2015, 51, 1-11.
(6) Röst, H.L.; Rosenberger, G.; Navarro, P.; Gillet, L.; Miladinović, S.M.; Schubert, O.T.; Wolski, W.; Collins, B.C.; Malmström, J.; Malmström, L.; Aebersold, R. Nat Biotechnol. 2014, 32, 219-223.
(7) Desiere, F.; Deutsch, E.W.; King, N.L.; Nesvizhskii, A.I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S.N.; Aebersold, R. Nucleic Acids Res. 2006, 34, 655-658.
(8) Tsou, C.C.; Avtonomov, D.; Larsen, B.; Tucholska, M.; Choi, H.; Gingras, A.C.; Nesvizhskii, A.I. Nat Methods 2015, 12, 258-264.
(9) Panchaud, A.; Scherl, A.; Shaffer, S.; von Haller, D.; Kulasekara, D.; Miller, S.; Goodlett, D. Anal. Chem. 2009, 81, 6481-6488.
(10) Ting, Y.S.; Egertson, J.D.; Bollinger, J.G.; Searle, B.C.; Payne, S.H.; Noble, W.S.; MacCoss, M.J. Nat Methods 2017, 14, 903-908.
(11) Ting, Y.S.; Egertson, J.D.; Payne, S.H.; Kim, S.; MacLean, B.; Käll, L.; Aebersold, R.; Smith, R.D.; Noble, W.S.; MacCoss, M.J. Mol Cell Proteomics 2015, 14, 2301-2307.
(12) Egertson, J.D.; Kuehn, A.; Merrihew, G.E.; Bateman, N.W.; MacLean, B.X.; Ting, Y.S.; Canterbury, J.D.; Marsh, D.M.; Kellmann, M.; Zabrouskov, V.; Wu, C.C.; MacCoss, M.J. Nat Methods 2013, 10, 744-746.
(13) Beck, S.; Michalski, A.; Raether, O.; Lubeck, M.; Kaspar, S.; Goedecke, N.; Baessmann, C.; Hornburg, D.; Meier, F.; Paron, I.; Kulak, N.A.; Cox, J.; Mann, M. Mol Cell Proteomics 2015, 14, 2014-2029.
(14) Cox, J.; Mann, M. Nat Biotechnol. 2008, 26, 1477-1485.
(15) Bern, M.; Finney, G.; Hoopmann, M.; Merrihew, G.; Toth, M.; MacCoss, M.J. Anal. Chem. 2010, 82, 833-841.
(16) Elias, J.E.; Gygi, S.P. Nat Methods 2007, 4, 207-214.
(17) Plumb, R.; Johnson, K.; Rainville, P.; Smith, B.; Wilson, I.; Castro-Perez, J.; Nicholson, J. Rapid Commun Mass Spectrom. 2006, 20, 1989-1994.
(18) Marx, H.; Lemeer, S.; Schliep, J.E.; Matheron, L.; Mohammed, S.; Cox, J.; Mann, M.; Heck, A.J.R.; Kuster, B. Nat Biotechnol. 2013, 31, 557-564.
(19) Cox, J.; Hein, M.Y.; Luber, C.A.; Paron, I.; Nagaraj, N.; Mann, M. Mol Cell Proteomics 2014, 13, 2513-2526.
(20) Wu, J.X.; Song, X.; Pascovici, D.; Zaw, T.; Care, N.; Krisp, C.; Molloy, M.P. Mol Cell Proteomics 2016, 15, 2501-2514.


Figure 4. µDIA provides more accurate quantification. Sample A (1 µg of HeLa tryptic peptides and 500 ng of E. coli tryptic peptides) was compared to sample B (1 µg of HeLa tryptic peptides and 250 ng of E. coli tryptic peptides). Both samples were analyzed in technical quadruplicate with a 90-min gradient. ROC curves were generated using P-values as the classifier. (TP) true positives are E. coli proteins with average fold-change increases in sample A:B and P-values that correspond to each false positive rate; (TN) true negatives are HeLa-derived human proteins with P-values greater than the cutoff used to meet each false positive rate shown; (FP) false positives are human proteins with P-values less than the cutoff value for each false positive rate shown, or E. coli proteins with an erroneous average increase in sample B:A with P-values less than the cutoff value for a particular false positive rate; (FN) false negatives are E. coli proteins with P-values greater than the cutoff for each false positive rate. A total of 3,708, 3,726, and 3,256 proteins were quantified by > 2 peptides for µDIA with deconvolution, without deconvolution, and DDA, respectively, whereas a total of 1,365, 1,208, and 1,202 proteins were quantified by a single peptide for µDIA with deconvolution, without deconvolution, and DDA, respectively.
