
Article pubs.acs.org/jpr

Hybrid Data Acquisition and Processing Strategies with Increased Throughput and Selectivity: pSMART Analysis for Global Qualitative and Quantitative Analysis

Amol Prakash, Scott Peterman,* Shadab Ahmad, David Sarracino, Barbara Frewen, Maryann Vogelsang, Gregory Byram, Bryan Krastins, Gouri Vadali, and Mary Lopez

Thermo Fisher Scientific, 790 Memorial Drive, Suite 202, Cambridge, Massachusetts 02139, United States

Supporting Information

ABSTRACT: Data-dependent acquisition (DDA) and data-independent acquisition (DIA) strategies have both resulted in improved understanding of proteomics samples. Both strategies have well-documented advantages and disadvantages: DDA is typically applied for deep discovery, and DIA may be used to create sample records. In this paper, we present a hybrid data acquisition and processing strategy (pSMART) that combines the strengths of both techniques and provides significant benefits for qualitative and quantitative peptide analysis. The performance of pSMART is compared to published DIA strategies in an experiment that allows the objective assessment of DIA performance with respect to interrogation of previously acquired MS data. The results of this experiment demonstrate that pSMART creates fewer decoy hits than a standard DIA strategy. Moreover, we show that pSMART is more selective, sensitive, and reproducible than either standard DIA or DDA strategies alone.

KEYWORDS: peptide, mass spectrometry, Fourier transform, data-independent analysis, decoy searching, qualitative, quantitative



INTRODUCTION

Recently, due to improvements in instrumentation and the increased interest in protein differential expression as it relates to biology, proteomic discovery experiments have evolved to provide quantitative analysis of all observed peptides. Successful quantitation requires that qualitative attributes first be determined so that qualitative confidence can be associated with the measured peptide expression levels for all technical and biological replicates.1,2 The stochastic and irreproducible precursor selection associated with DDA has reduced its effectiveness for robust qualitative and quantitative determination across all sample analyses.3 In addition, DDA has been shown to undersample low-level peptides, thereby increasing the chance of missing putative biomarkers.4 The creation of inclusion lists based on previous discovery experiments can increase the number of peptides routinely measured, improve reproducibility, and extend the dynamic range.5−7 However, even with very large target lists, DDA cannot sample all possible precursors in complex biological samples. Attempts to reproducibly sample all peptides across the retention time window while facilitating highly multiplexed peptide quantification have prompted an alternative approach. Data-independent analysis (DIA) is based on decoupling product ion data acquisition from MS1 analyses and instead systematically fragmenting all precursors across the LC gradient.8 In order to sample all precursors throughout the entire gradient, wider precursor isolation windows than those for DDA analysis (1−3 Da) are typically used. Previously published DIA windows range from 2.5 to 1600 Da,9−14 with the most commonly reported precursor windows of 20−25 Da.12 DIA focuses almost exclusively on product ion data acquisition by stepping a quadrupole mass filter at a specified precursor mass window (e.g., 10 or 25 Da windows) across the mass range of interest. The filtered precursor ions are simultaneously dissociated and analyzed, resulting in a highly complex full-scan product ion spectrum potentially containing fragment ions from all filtered precursors. The cycle is repeated continuously across the entire LC gradient, collecting product ion density maps per precursor mass window. This method is best utilized with high-resolution/accurate mass tandem mass spectra to increase the robustness and confidence in postacquisition data analysis. DIA is a straightforward experimental method that can be applied to most Q-TOF, linear ion trap, and Orbitrap mass spectrometers with little optimization. The collection of product ion maps resulting from DIA acquisition across the gradient requires an alternate (to DDA) data processing strategy. In this approach, detailed precursor ion information is unavailable for deconvoluting very complex

Received: March 31, 2014
Published: September 22, 2014
© 2014 American Chemical Society

dx.doi.org/10.1021/pr5003017 | J. Proteome Res. 2014, 13, 5415−5430

Journal of Proteome Research

Article

more comprehensive determination of abundant peptides per targeted proteins/peptides and corresponding product ion rank order, which can be further used as a qualitative tool to score product ion distribution overlap for DIA data. Also, retention times from DDA experiments can be correlated to the current experiment to narrow data extraction windows and further reduce false positive hits. The approach we have taken is to collect HR/AM MS data for quantitation and use narrow, asymmetric window DIA acquisition for sequence confirmation. HR/AM MS data acquisition in the Orbitrap mass spectrometer facilitates high charge density, resolution, and mass accuracy to increase intra- and interscan dynamic range and selectivity.19 The instrument method we employed set the HR/AM MS data acquisition rate proportional to the average peak width to ensure at least seven data points across the elution profile. Looped, narrow asymmetric window DIA data were acquired throughout the gradient, providing at least one product ion spectral data point per precursor peak width. The precursor mass range filtered per DIA event was set on the basis of precursor density, with the smallest ranges for the precursor range of m/z 400−800 and wider isolation windows for higher mass ranges. By decoupling the data used for quantitation and sequencing, the pSMART strategy enables appropriate quantitative and qualitative cycle times to be established across the average chromatographic peak profile with fewer compromises. In addition, the full scan HR/AM MS provides global quantitation of all precursors associated with a particular peptide per unit time.20−25 In common with other DIA strategies, peptide sequence confirmation is achieved by postacquisition processing. The pSMART scheme provides significant advantages compared to previous DIA strategies that use wider precursor isolation windows.
The narrow precursor isolation window in pSMART results in DIA spectra containing fewer product ions, enabling different searching strategies to be employed. An unbiased database search strategy similar to that used for DDA processing could be performed,11 or as we demonstrate here, real-time spectral matching based on spectral libraries could be implemented.6 As in DDA analysis and database searching, only one matched product ion spectrum is needed to confidently sequence the targeted peptide, reducing the constraints of acquiring multiple product ion spectral data points across the peak. This strategy provides a 2-fold benefit. First, each narrow window precursor isolation/activation window can be acquired with longer product ion fill times and measured using higher resolution, which increases selectivity.26 Second, the decoupling of qualitative and quantitative data acquisition dramatically increases the number of precursors sampled across the gradient in a reproducible manner. DIA strategies are promoted for building unbiased sample records that may be repeatedly analyzed over time to uncover additional peptides of interest.12 We wanted to explore the possibility of revisiting “old” DIA data and analyze it based on “new” peptide information. However, for our experiment it was logistically more feasible to analyze recently acquired DIA data using an older library generated from several years’ worth of previously acquired data. In this way, we simulated differences in retention time and fragmentation patterns that may arise over time due to changes in sample preparation, instrument calibration, etc. To evaluate the performance of pSMART, we performed comparative qualitative and quantitative analyses of human

product ion spectra resulting from the wide isolation windows, making typical database searching strategies untenable. The complexity of each product ion spectrum is overcome by incorporating information from several spectra collected over a short period of time. A user-defined list of peptides based on a biological hypothesis is created to determine the desired precursor/product ion groups per peptide.12,14 Once defined, the theoretical product ion m/z values and user-defined mass tolerance values are used to produce individual extracted ion chromatograms (XICs) that are then grouped and overlaid for covariance determination and additional verification scoring based on chromatographic and MS attributes.15 Since one can select any peptide and corresponding m/z values to search the data, using spectral libraries to guide targeted peptide selection and subsequent data processing can increase robustness and confidence.16−18 Spectral libraries created from DDA experiments are typically queried to determine optimal target m/z lists. While DDA experiments are not reliable for routine quantitation, the unbiased (or nontargeted) data acquisition, small-window isolation/fragmentation, and subsequent database searching provide high-confidence sequence identification within specific biological samples. In addition, the relationship of the corresponding precursor and product ions generates peptide-specific content for LC and MS attributes. Thus, standard protocols for DIA data processing require spectral libraries for the determination of previously observed peptides per targeted protein, identification of the best corresponding precursor and product ions, and knowledge of the expected relative intensities of those product ions.18 An additional benefit is that some spectral libraries may provide expected chromatographic retention times.
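The XIC generation step described above (theoretical product ion m/z plus a user-defined mass tolerance) can be illustrated with a minimal Python sketch. The scan layout, the function names `ppm_window` and `extract_xic`, and the toy peak values are assumptions made for illustration; they are not taken from the processing software used in the paper.

```python
from typing import List, Tuple

# Assumed toy data model: one scan = (retention time in min, [(m/z, intensity), ...]).
Scan = Tuple[float, List[Tuple[float, float]]]


def ppm_window(target_mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = target_mz * tol_ppm / 1e6
    return target_mz - delta, target_mz + delta


def extract_xic(scans: List[Scan], target_mz: float, tol_ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Sum the intensity inside the tolerance window for every scan."""
    low, high = ppm_window(target_mz, tol_ppm)
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if low <= mz <= high)
        xic.append((rt, intensity))
    return xic


# Hypothetical scans: only peaks within 10 ppm of m/z 650.3300 contribute.
scans = [
    (30.0, [(500.0000, 10.0), (650.3300, 100.0)]),
    (30.1, [(650.3310, 400.0), (650.4000, 50.0)]),
    (30.2, [(650.3295, 150.0)]),
]
xic = extract_xic(scans, 650.3300, tol_ppm=10.0)
```

At m/z 650.33 a ±10 ppm tolerance corresponds to roughly ±0.0065 in m/z, so the peak at 650.4000 is excluded while the three peaks near 650.33 are retained.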
The primary limitation associated with DIA methods is the reliance on chromatographic peak shape for both qualitative and quantitative analyses.13 Due to the lack of discriminating precursor ion information, SRM-like verification schemes have been incorporated into supporting processing software. Automated postacquisition targeted peptide detection, verification, and quantification are achieved by grouping the peptide-specific product ion XICs to determine retention time covariance and summed area-under-curve (AUC) values. To increase the robustness in qualitative and quantitative data processing, a sufficient number of data points (7−10) per precursor isolation window must be acquired across the elution profile. Therefore, a balance has to be established between sensitivity and selectivity of data acquisition, governed by the complexity of each DIA product ion spectrum. A wider precursor isolation window reduces the selectivity and intrascan dynamic range, since more precursors are passed through the filtering quadrupole and subsequently dissociated. The product ion distribution is directly proportional to relative precursor abundance. Conversely, using narrow windows does increase the sensitivity but requires many more DIA scan events to cover the same precursor mass range. The latter approach can decrease the ion collection/fill times and resolution settings when using an Orbitrap mass spectrometer to decrease the overall cycle time. Most commonly, the precursor mass range is reduced to cover the greatest density of precursors,11 and additional sample injections may be required to satisfy all experimental requirements.9 In addition to peak shape attributes, spectral library attributes are incorporated in peptide list creation and data processing.7,12,16 The consolidation of spectral libraries provides a


Figure 1. pSMART data acquisition approach with HR/AM MS and 5 Da DIA acquisitions acquired using independent cycles. The narrow DIA scan event cycle is interrupted by the HR/AM MS scan event at a fixed user-defined time.

The samples were eluted from the column with a linear gradient from 5 to 45% B in 180 min prior to ramping to 90% B for column regeneration. The column was re-equilibrated to 5% B for 15 min prior to the next injection. Three technical replicates were acquired for each method and used to evaluate reproducibility, efficiency, and quantitation. For the standard DDA analysis experiments, the Q Exactive was operated with one full-scan MS acquired at 70 000 resolution (at m/z 200) with an automatic gain control (AGC) setting of 3 × 10^6 ions and a corresponding maximum fill time of 150 ms. MS/MS spectra were acquired with HCD fragmentation, a normalized collision energy of 28, a precursor isolation width of 2.5 Da, resolving power of 17 500 (at m/z 200), an AGC target value of 1 × 10^5 ions, and a maximum injection time of 100 ms. A top-10 method was used for data-dependent acquisition with a 45 s dynamic exclusion list setting and a precursor intensity threshold of 5 × 10^4 counts. The DIA acquisition-based experiment was operated in looped product ion mode. The quadrupole mass filter stepped across a precursor mass range of m/z 400−1200 at 26 Da/window with a 1 Da overlap, for a total of 32 windows. The DIA spectrum for each window was acquired with a resolving power of 35 000 (at m/z 200), an AGC target value of 3 × 10^6, and a maximum fill time of 150 ms. Each DIA window was acquired with normalized collision energy of 28 with the default charge state of +2. There was no additional tuning or optimization needed for maximum transmission between DDA and DIA experiments. The total cycle time for the DIA experiments was set to 6 s, enabling collection of 6−10 data points per precursor peak. The pSMART data acquisition strategy acquired both MS and DIA data. A prototype instrument control script was used to manage this combination of independent data acquisition strategies.
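The isolation window schemes in this section can be checked with a short Python sketch. It enumerates the standard DIA windows (26 Da wide with a 1 Da overlap over m/z 400−1200) and the asymmetric narrow scheme described for pSMART (6, 11, and 21 Da windows, each with a 1 Da overlap). The function name `isolation_windows` and the choice to place the overlap at the upper edge of each window are assumptions; the instrument control script itself is not described in the text.

```python
from typing import List, Tuple


def isolation_windows(start: float, stop: float, width: float, overlap: float = 1.0) -> List[Tuple[float, float]]:
    """Enumerate (low, high) isolation windows that advance by width - overlap.

    Assumption: the overlap extends the upper edge of each window.
    """
    step = width - overlap
    windows = []
    low = start
    while low < stop:
        windows.append((low, low + width))
        low += step
    return windows


# Standard DIA: 26 Da windows, 1 Da overlap, m/z 400-1200.
standard = isolation_windows(400.0, 1200.0, 26.0)

# Narrow asymmetric scheme: 6 Da (m/z 400-800), 11 Da (800-1000), 21 Da (1000-1200).
narrow = (
    isolation_windows(400.0, 800.0, 6.0)
    + isolation_windows(800.0, 1000.0, 11.0)
    + isolation_windows(1000.0, 1200.0, 21.0)
)

print(len(standard), len(narrow), narrow[0])
```

Under these assumptions the standard scheme yields the 32 windows stated above, and the narrow scheme yields 110 windows starting at m/z 400−406. With 20 DIA events between HR/AM MS scans taken every 5 s, 110 windows span 5.5 such blocks, or roughly 27.5 s, which is of the same order as the 26 s cycle reported for pSMART.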
An HR/AM MS scan was acquired at a resolving power of 140 000 (at m/z 200), an AGC target value of 5 × 10^6 ions, and a maximum fill time of 150 ms. Each narrow-window DIA spectrum was acquired using a normalized collision energy

plasma digests using three approaches. The same sample was analyzed in triplicate using DDA, standard DIA (25 Da windows), and pSMART. Real-time data analysis based on spectral library matching for the standard DIA and pSMART was performed. The spectral library was compiled from DDA experiments acquired on the LTQ Orbitrap Velos and Q Exactive mass spectrometers in our laboratory over the course of 2 years. The benefits of creating spectral libraries from many different DDA files include identification of routinely sampled peptides per protein, consolidated product ion spectra, and relative retention time determination. The consolidated product ion spectra provide not only the most abundant fragment ions (b-, y-, and/or neutral loss) but also the predicted relative abundance that can be used to qualitatively determine targeted peptide hits. We show that pSMART coupled with such spectral libraries facilitates real-time data acquisition and increased peptide identification without sacrificing selectivity. Additionally, the decoupling of qualitative and quantitative data acquisition enables optimal data acquisition for quantitation. In addition, pSMART has the prospects of being a generic acquisition method transferrable across different Q Exactive mass spectrometers for proteomics or any “omics” studies and other HR/AM mass spectrometers.



MATERIAL AND METHODS

LC−MS Data Acquisition

A commercial Q Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) was used for all experiments. The instrument was coupled to an EASY-nLCII LC system (Thermo Fisher Scientific) for chromatographic separations using a binary solvent system with solvent A composed of 0.2% formic acid in water and solvent B being 0.2% formic acid in acetonitrile. Samples were loaded onto a 120 × 0.15 mm trapping column packed with 5 μm PS-dvb particles (Polymer Laboratories), and the analytical separation was performed using a 500 × 0.1 mm column packed with C18 Aq (Bischoff).


Fisher Scientific) to help assess data quality and enable normalization of retention times in varying gradients.30−32 The product ion spectra for the PRTC peptides were first compared to the consolidated product ion spectra to determine whether the instrument calibration routine was sufficient. DDA experimental data were collected from fractionations, with and without depletion, antibody extractions, and different LC gradients, as well as other sample preparation conditions, and all resulting DDA files were searched with Proteome Discoverer 1.4 (Thermo Fisher Scientific) using SEQUEST.13 All searches were performed using the Uniprot complete human proteome database (release 2013_08) with a full tryptic cleavage assumption allowing for two possible missed cleavage sites, minimum and maximum peptide lengths of 6 and 144, respectively, and a search tolerance of ±10 ppm for the precursor and 0.02 Da for product ions. The list of differential modifications included methylation, oxidation, phosphorylation, acetylation, and deamidation, with one static modification for Cys alkylation using carboxymethylation. The resulting Proteome Discoverer output files were entered into the Crystal library for human plasma. Each peptide had a spectral count of matched experimental spectra from each individual RAW file as well as a relative retention time based on the spiked internal standards. Crystal then created a consolidated record for each peptide consisting of the relative retention times and variability, as well as all mass spectrometry specific information (e.g., precursor/product ion charge state, m/z values, and product ion distribution). Similar to previously published methods,29 algorithms were used to consolidate all the data to normalize retention time and various fragmentation observations for identified peptides.
The human plasma spectral library contained over 16 000 unique peptides with and without modifications and/or missed cleavage sites, of which 10 228 peptide entries were used to search the data. The relative retention time correlation is presented in Figure S1 in the Supporting Information. The final list of peptides was selected from those with a consolidated product ion spectrum created from at least six independent DDA spectra and containing at least four product ions. These requirements allow for more robust spectral matching and decoy hit assessment.
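The selection rule stated above (a consolidated spectrum built from at least six independent DDA spectra and containing at least four product ions) can be sketched in Python as a simple filter, combined with the use of the four most abundant product ions that the Data Analysis section describes. The `LibraryEntry` class, its field names, and the toy peptides and intensities are hypothetical; the Crystal library format is not specified in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LibraryEntry:
    """Hypothetical consolidated library record for one peptide."""
    sequence: str
    n_spectra: int                   # number of DDA spectra consolidated
    product_ions: Dict[str, float]   # ion label -> averaged relative intensity


def select_targets(
    library: List[LibraryEntry],
    min_spectra: int = 6,
    min_ions: int = 4,
    n_top: int = 4,
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep entries that pass both thresholds and return their n_top most abundant ions."""
    targets = {}
    for entry in library:
        if entry.n_spectra >= min_spectra and len(entry.product_ions) >= min_ions:
            ranked = sorted(entry.product_ions.items(), key=lambda kv: kv[1], reverse=True)
            targets[entry.sequence] = ranked[:n_top]
    return targets


# Toy entries with made-up sequences and intensities, for illustration only.
library = [
    LibraryEntry("PEPTIDEK", 8, {"y5": 60.0, "y6": 80.0, "y7": 100.0, "b2": 40.0, "y3": 10.0}),
    LibraryEntry("SAMPLER", 3, {"y4": 100.0, "y5": 70.0, "y6": 50.0, "b3": 20.0}),
    LibraryEntry("ELVISK", 10, {"y3": 100.0, "y4": 55.0, "b2": 30.0}),
]
targets = select_targets(library)
```

In this toy example only the first entry survives: the second has too few contributing spectra and the third has too few product ions.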

of 28. The DIA strategy utilized asymmetric precursor isolation windows with 6 Da (with 1 Da overlap) covering m/z 400−800 and then increased the precursor isolation window to 11 Da, covering m/z 800−1000, and 21 Da, covering m/z 1000−1200. All further references to the effective precursor mass range will be “narrow” DIA. All DIA spectra were acquired at a resolving power of 35 000 (at m/z 200), an AGC target value of 3 × 10^6 ions, and a maximum fill time of 150 ms. Figure 1 shows the acquisition scheme used for pSMART data analysis. The HR/AM MS spectrum was acquired every 5 s and interspersed with 20 DIA events. The narrow mass range DIA event cycle was completed prior to looping back to the m/z 400−406 window. The entire data acquisition cycle (MS and narrow DIA acquisition windows) took 26 s, enabling at least one DIA event per chromatographic peak. For further peptide verification, a last data acquisition experiment was created using parallel reaction monitoring (PRM).27 Confident peptide identifications from each initial experiment (standard DIA, pSMART, and DDA) were used to create an inclusion list containing peptide sequence, precursor m/z values, and retention times. The inclusion list of precursor m/z values as a function of retention time was used to direct repetitive DDA acquisition using the same gradient as all other experiments. The precursor and product ion data were acquired using the same settings as those for DDA. The only difference was that the dynamic exclusion was reduced to 1 s to ensure that multiple product ion spectra could be acquired per precursor m/z value.

Sample Preparation

All experiments were performed using a donor sample of human plasma collected under IRB approved protocols and stored in an EDTA-stabilized tube (Becton Dickinson, Franklin Lakes, NJ). A stock solution of human plasma was prepared without depletion using standard trypsin digestion protocols following reduction and alkylation. The concentration of the final stock solution was estimated to be 4 mg/μL, divided into aliquots of 100 μL of 100 ng/μL, and frozen until used. Before MS analysis, the sample was spiked with peptide retention time calibration (PRTC) peptides (Thermo Fisher Scientific, Rockford, IL) to a final concentration of 20 fmol on the column. A total of 1 μL was injected on the column per experiment.

Data Analysis

The initial DDA results were searched against a comprehensive human protein database.33 Each of the three DDA technical replicates was searched with similar parameters to those used to create the original libraries. The retention times measured for the DDA experiments were used to further confirm the retention time correlations between the relative retention times in the spectral library and the measured retention times identified in the two different DIA experiments (SI2, Supporting Information). In addition to the unbiased database search, the DDA product ion spectra were searched using spectral matching in Proteome Discoverer 2.0. The same plasma-serum library that was used to process the DIA data was also used for the spectral matching search. A CS score criterion of 0.7 or more was used to match product ions extracted using 0.02 Da. The 10 288 peptides selected from the spectral library were evaluated using DDA and both DIA methods. The peptide-specific information extracted from the library was used to create the look-up table for real-time analysis and contained relative retention time windows (±15 min), precursor, and product ion m/z values as well as product ion distributions. The

Spectral Library

A common spectral library was created in Crystal28 and used to enable spectral library searching. Briefly, the human plasma spectral library was compiled from DDA experiments acquired on the LTQ Orbitrap Velos and Q Exactive mass spectrometers (Thermo Fisher Scientific) over the course of 2 years. The code was predicated on previously published work by Lam et al.;29 specifically, individually matched product ion spectra were combined to form consolidated spectra per peptide following product ion distribution overlap scoring. For each individual spectrum, the top 20 matched fragment ions and measured intensities were extracted. A minimum of five individual spectra with a minimum CS score of 0.7 was required to build the consolidated product ion spectrum (per peptide), which contained those fragment ions routinely observed in each spectrum as well as the averaged relative intensity. Additional quality control routines were incorporated for acceptance/rejection of each new DDA spectrum in a similar manner to that previously described.29 Each of these samples was spiked with a standard peptide mixture (PRTC peptides from Thermo


Figure 2. Data acquisition and processing approach for the hybrid data acquisition strategy based on the spectral library information. (A) The HR/AM MS data is separated and used to determine the retention time based (B) precursor isotopic covariance. The AUC values from the precursor isotopic XICs are used for qualitative and quantitative peptide analysis and (C) a cosine similarity is calculated on the basis of theoretical isotopic distribution. (D) DIA data is extracted from corresponding product ion scan headers, and the four product ion intensity values are (E) extracted and (F) normalized for sequence confirmation.

mass tolerance to ±20 ppm was collected to better simulate the data extraction process as defined by previous publications.12 Integration of the product ion XICs was performed in Pinpoint, and coefficient of variance values were determined on the basis of summed AUC values for ±10 ppm product ion XIC generation as well as for ±20 ppm mass tolerance (Figure S3, Supporting Information). In pSMART experiments, identification and confirmation were based on both HR/AM MS and DIA data. The same spectral library list of peptides and corresponding look-up tables used for automated standard DIA processing were used for real-time pSMART processing. In addition, the pSMART look-up tables also contained the theoretical precursor m/z values and isotopic distribution for the four most abundant isotopes. The m/z values and relative isotopic distribution were calculated from the peptide sequence and resulting chemical formula. The method for data processing is shown in Figure 2; the HR/AM MS data were analyzed separately from the narrow DIA data. The specified retention time window triggered the real-time data processing for both MS and DIA data as the file was written to disc. Figure 2B demonstrates the extraction of the specified precursor isotopic XICs from the HR/AM MS data, and each data point was scored on the basis of isotopic distribution overlap with the theoretical values with a mass tolerance of ±10 ppm. Using this information, the isotopic area under the curve (AUC) values were calculated and normalized, a final CS score was determined along with the summed AUC value, and an acceptance criterion of 0.95 was used for high level

four most abundant product ions were selected and used for targeted peptide sequence verification. For the standard DIA method, peptide identification was based on continual spectral matching across the scheduled retention time window. A wide retention time window was used to ensure adequate correlation between the experimental and spectral library times, since the relative retention time values were determined from discovery data collected over 2 years. However, the wide windows do not impede highly multiplexed data processing. Automated data processing first assesses the presence/absence of each diagnostic fragment ion and measured intensity with a ±10 ppm mass tolerance based on the theoretical product ion m/z value. The resulting product ion intensities were then used to determine relative abundance distribution profiles, and the cosine similarity (CS) score34 was determined on the basis of overlap with the corresponding spectral library entry. A minimum CS score of 0.7 was used as the threshold for a positive match, since only four product ions were used for the evaluation (SI3, Supporting Information). All of the data were compiled following completion of the experiment and analysis using predefined scripts. A targeted peptide was considered “identified” on the basis of consecutive extracted MS/MS spectral matches with CS scores at or above the threshold defined above. Two different settings were used in the final analysis: medium confidence for all peptides that showed at least four consecutive spectra with CS of 0.7 or higher and high confidence for those peptides with more than six consecutive spectra. A second round of data that altered the product ion


Figure 3. Comparative targeted peptide analysis using standard DIA and pSMART methods for the plasma peptide ISASAEELR. The inset table lists the fragment ions and spectral library intensity values used for the data extraction and scoring. Panel A shows the product ion XIC results from the standard DIA data set compared to panel B, which shows the precursor ion XIC results, and panel C, which shows the resulting normalized precursor isotopic overlap to the theoretical distribution. The dashed line indicates the retention time for the matched narrow DIA spectrum, and panel D shows the comparative product ion distribution between the spectral library entry, narrow DIA spectrum, and the normalized product ion AUC values from standard DIA.

was generated by taking the first decoy library and shuffling the intensities of the fragment peaks. Both methods generated libraries with the same number of entries as the target library and the same distribution of precursor and product ion m/z values and retention times. In the first method, the decoy spectra were as similar to each other as were the spectra in the target library, in order to model homology among library entries. The second method modified the similarity slightly by keeping the same m/z values of the product ion peaks but perturbing the intensities assigned to them. Data processing with the two spectral libraries was performed using identical methods as described above. The acceptance criteria were also held constant.
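The two decoy strategies described in this section can be sketched as follows. The first decoy keeps a target spectrum and borrows the precursor m/z of another library peptide that differs by more than 50 Da; the second shuffles the fragment intensities of the first decoy while leaving the product ion m/z values unchanged. The `Entry` class, the function names, and the reading of the retention time rule as "donor within 10 min of the target" are assumptions, since the text does not give implementation details.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical library record."""
    name: str
    precursor_mz: float
    rel_rt: float                                # relative retention time, min
    fragments: Tuple[Tuple[float, float], ...]   # (product m/z, relative intensity)


def make_decoy(target: Entry, library: List[Entry], rng: random.Random,
               min_dmz: float = 50.0, max_drt: float = 10.0) -> Optional[Entry]:
    """Decoy 1: keep the target spectrum, borrow the precursor m/z of a donor peptide."""
    donors = [
        d for d in library
        if abs(d.precursor_mz - target.precursor_mz) > min_dmz   # m/z differs by > 50
        and abs(d.rel_rt - target.rel_rt) <= max_drt             # assumed RT constraint
    ]
    if not donors:
        return None
    donor = rng.choice(donors)
    return replace(target, name="DECOY_" + target.name, precursor_mz=donor.precursor_mz)


def shuffle_intensities(decoy: Entry, rng: random.Random) -> Entry:
    """Decoy 2: same product ion m/z values, intensities permuted."""
    mzs = [mz for mz, _ in decoy.fragments]
    intensities = [i for _, i in decoy.fragments]
    rng.shuffle(intensities)
    return replace(decoy, fragments=tuple(zip(mzs, intensities)))


# Toy library with made-up values.
frags = ((300.1, 100.0), (401.2, 80.0), (502.3, 60.0), (603.4, 10.0))
lib = [
    Entry("A", 500.0, 30.0, frags),
    Entry("B", 560.0, 35.0, frags),   # qualifies as donor for A
    Entry("C", 520.0, 31.0, frags),   # precursor m/z too close to A
    Entry("D", 700.0, 60.0, frags),   # retention time too far from A
]
rng = random.Random(0)
decoy1 = make_decoy(lib[0], lib, rng)
decoy2 = shuffle_intensities(decoy1, rng)
```

Because each decoy reuses an existing spectrum and an existing precursor m/z, a decoy library built this way has the same number of entries and the same m/z and retention time distributions as the target library, which matches the property stated above.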

acceptance and 0.9 for medium level acceptance (Figure 2C). If the precursor isotopic distribution overlap between the experimental and theoretical values surpassed a CS score of 0.9, the narrow DIA data were processed by matching the specific precursor m/z value with the appropriate DIA scan header as they were acquired in the specific retention time window where the resulting narrow DIA window contains a chimeric spectrum. As described above for standard DIA, the target m/z values were used to perform real-time analysis on the basis of a product ion presence/absence determination using a 10 ppm product ion mass tolerance (Figure 2E), normalizing the measured product ion intensities and calculating the CS score with the spectral library entry as well as at least one product ion spectral match with CS scores in excess of 0.7 (Figure 2F) (see SI4 of the Supporting Information for an example output). The CS score distribution as a function of precursor m/z value is presented in Figure S4 of the Supporting Information.
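The two-stage pSMART acceptance logic described above can be sketched in Python. A cosine similarity (CS) score is computed between measured and reference intensity vectors, first for the precursor isotopic distribution (thresholds 0.9 and 0.95) and then for the four product ions in each narrow DIA spectrum (threshold 0.7, at least one match). The function names, the returned labels, and the handling of values exactly at a threshold are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(observed: Sequence[float], reference: Sequence[float]) -> float:
    """Cosine of the angle between two intensity vectors (0.0 if either is all zero)."""
    dot = sum(o * r for o, r in zip(observed, reference))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(r * r for r in reference))
    return dot / norm if norm > 0 else 0.0


def psmart_call(
    iso_obs: Sequence[float],
    iso_theo: Sequence[float],
    dia_spectra: List[Sequence[float]],
    lib_ions: Sequence[float],
    iso_medium: float = 0.9,
    iso_high: float = 0.95,
    product_min: float = 0.7,
) -> str:
    """Return a hypothetical label: 'rejected', 'unconfirmed', 'medium', or 'high'."""
    iso_cs = cosine_similarity(iso_obs, iso_theo)
    if iso_cs < iso_medium:
        return "rejected"        # precursor isotope pattern does not match
    if not any(cosine_similarity(s, lib_ions) > product_min for s in dia_spectra):
        return "unconfirmed"     # no narrow DIA spectrum confirms the sequence
    return "high" if iso_cs >= iso_high else "medium"


# Toy intensities (made up): four isotopes and four product ions.
iso_theo = [100.0, 60.0, 25.0, 8.0]
lib_ions = [100.0, 80.0, 60.0, 10.0]
dia_spectra = [[0.0, 0.0, 0.0, 50.0], [90.0, 85.0, 55.0, 12.0]]

good = psmart_call([200.0, 120.0, 50.0, 16.0], iso_theo, dia_spectra, lib_ions)
bad = psmart_call([10.0, 100.0, 5.0, 80.0], iso_theo, dia_spectra, lib_ions)
```

Because the CS score depends only on the shape of the intensity distribution, the first call scores the isotope pattern as a match even though its absolute intensities are twice the theoretical values.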



RESULTS

We describe a new approach to maximize comprehensive qualitative and quantitative data acquisition with processing amenable to a greater range of biological samples. The hybrid data acquisition scheme leverages the strengths of the Orbitrap mass spectrometer for global detection and quantitation without sacrificing qualitative target assessment. The full scan MS acquisition settings used for global data acquisition were set for optimal performance on an LC time scale by increasing the resolution, trapped ion charge density, and mass accuracy compared to previously published methods.12,35 Since the HR/AM MS data were used to define chromatographic features, the product ion data collection was used for sequence confirmation and additional verification. As such, the need to acquire at least

Decoy Library

We generated a library of decoy spectra to be searched in parallel with the target library in order to estimate false positive identifications. Decoys were generated by assigning a new precursor m/z to an existing target spectrum in the library. The precursor m/z value was that of an existing peptide in the library and the delta precursor m/z difference was greater than 50 Da. In addition, the second peptide from which the precursor m/z value was derived could not have a relative retention time value in excess of 10 min. A second decoy library

dx.doi.org/10.1021/pr5003017 | J. Proteome Res. 2014, 13, 5415−5430

Journal of Proteome Research

Article

Table 1. List of Processed Results for Peptide Identification Using Standard DIA and pSMART Strategies on the Q Exactive Mass Spectrometer^a

data acquisition strategy   standard DIA       standard DIA       standard DIA       standard DIA       pSMART
data processing strategy    20 ppm and six     20 ppm and four    10 ppm and six     10 ppm and four    10 ppm
                            data points        data points        data points        data points
library hits                2159               3034               1645               2244               2525
decoy hits 1                983                1790               503                897                287
decoy hits 2                515                996                182                363                149
decoy % hit rate 1          46                 59                 31                 40                 11
decoy % hit rate 2          24                 33                 11                 16                 6

^a Two different data sets were acquired and processed based on the same spectral library with data from the standard DIA experiment processed in two approaches.
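The decoy hit-rate rows of Table 1 appear to be the decoy hits expressed as a percentage of the library hits. The short sketch below, which assumes that definition (it is not stated explicitly in the text), recomputes the rates from the tabulated counts; the dictionary keys are illustrative labels only.

```python
def decoy_hit_rate(decoy_hits, library_hits):
    """Decoy hits as a percentage of target library hits (assumed definition)."""
    return 100.0 * decoy_hits / library_hits

# (library hits, decoy hits 1, decoy hits 2, reported rate 1, reported rate 2)
table1 = {
    "standard DIA, 20 ppm, six points":  (2159, 983, 515, 46, 24),
    "standard DIA, 20 ppm, four points": (3034, 1790, 996, 59, 33),
    "standard DIA, 10 ppm, six points":  (1645, 503, 182, 31, 11),
    "standard DIA, 10 ppm, four points": (2244, 897, 363, 40, 16),
    "pSMART, 10 ppm":                    (2525, 287, 149, 11, 6),
}

for label, (lib, d1, d2, r1, r2) in table1.items():
    rate1 = decoy_hit_rate(d1, lib)
    rate2 = decoy_hit_rate(d2, lib)
    print(f"{label}: {rate1:.1f}% (reported {r1}), {rate2:.1f}% (reported {r2})")
```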

one matched product ion under the MS elution profile facilitated increased sensitivity and selectivity for each narrow DIA spectrum compared to DIA events using wider precursor acquisition schemes.

approach as standard DIA. Figure 3D shows the comparative product ion distribution plot compared to the spectral library entry. Note that the relative abundance of the diagnostic product ions using standard and narrow DIA processing showed similar distribution overlap with the spectral library (refer to SI4, Supporting Information). The data acquisition strategy for all methods incorporated high AGC target values to take greater advantage of ion accumulation in the Orbitrap mass spectrometer. One potential effect of high charge density in an Orbitrap mass spectrometer is space charging. Mass measurement errors were evaluated at each level (MS, MS/MS from DDA, standard DIA, and narrow DIA) on the basis of the theoretical m/z values for precursors and product ions, and there was no discernible difference between methods. Measured mass errors were predominantly less than 5 ppm. In addition, mass tolerance values used for data extraction were consistent, resulting in the overlaid XIC plots used to determine presence/absence as well as qualitative scoring.
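The mass error evaluation and the ppm-based extraction tolerance described above reduce to simple arithmetic. The following is a minimal sketch with hypothetical function names; the 10 ppm default mirrors the tolerance quoted in the text.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(measured_mz, theoretical_mz, tol_ppm=10.0):
    """True if the measured m/z lies inside the +/- tol_ppm window."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm

def mz_window(theoretical_mz, tol_ppm=10.0):
    """Lower and upper m/z bounds used to extract an ion chromatogram (XIC)."""
    delta = theoretical_mz * tol_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta
```

At m/z 1000, for instance, a 10 ppm tolerance corresponds to a window of roughly ±0.01 in m/z, which is why tightening the tolerance from 20 to 10 ppm halves the extraction width.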

Comparative Data Processing Strategies

To evaluate the effectiveness of pSMART, comparative data acquisition performance was evaluated against standard DIA using a common plasma digest sample. Figure 3 shows an example of the comparative data processing strategies to determine qualitative and quantitative success for each targeted peptide. The postacquisition strategies for standard DIA have been published and are based on targeted product ion XIC analysis from the aligned precursor DIA window. Figure 3A shows that the overlaid XIC plot facilitates targeted peptide detection. The qualitative analysis is linked to the mass accuracy used for data extraction, the relative abundance for the group of product ions as a function of relative AUC values, and XIC covariance to determine the presence of background interference in the form of a nonuniform peak profile. The custom acquisition script used for this study performed real-time data processing for standard DIA acquisition. The script utilizes the spectral library look-up table for the 10 288 peptides described above to perform spectral matching following completion of each specific mass spectrum. The presence/absence for the four diagnostic product ions is determined by mass error analysis, extracted and normalized product ion intensity, CS score calculated on the basis of spectral library distribution, and retention time correlation with the predicted retention time window, which are all used for determining a positive hit. The average CS score for the 18 data points across the elution profile was 0.987, indicating little background interference. Note that seven consecutive data points were used in place of defining the actual peak shape given that the CS scores were satisfactory. The pSMART data analysis performed qualitative and quantitative analyses using both sets of data. 
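The requirement for a minimum number of consecutive data points with acceptable CS scores (four for medium confidence and six for high confidence in standard DIA) can be sketched as a run-length check. This is an illustrative sketch only; the function names are hypothetical and the real-time script may apply additional criteria such as retention time correlation.

```python
def longest_passing_run(cs_scores, threshold=0.7):
    """Length of the longest run of consecutive scans with CS >= threshold.

    A score of None marks a scan where the product ions were not detected.
    """
    best = current = 0
    for score in cs_scores:
        if score is not None and score >= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

def accept_peptide(cs_scores, min_consecutive, threshold=0.7):
    """Accept a peptide when enough consecutive scans pass the CS threshold."""
    return longest_passing_run(cs_scores, threshold) >= min_consecutive

# Hypothetical per-scan CS scores across one elution profile.
scores = [0.5, 0.8, 0.9, 0.95, 0.85, 0.6, 0.9]
medium_confidence = accept_peptide(scores, min_consecutive=4)
high_confidence = accept_peptide(scores, min_consecutive=6)
```

In this example the peptide passes the four-point criterion but fails the six-point criterion, illustrating the trade-off between the number of matches and confidence.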
HR/AM MS data were acquired using higher resolution and low mass tolerance for isotopic XIC extraction to determine peak shape (Figure 3B), potential background interference, and retention time overlap with predicted values. The relative AUC values were used to calculate the CS score as compared with the theoretical isotopic distribution (Figure 3C). For precursors, a much more stringent threshold was set for acceptable CS scores (in comparison to DIA data analysis) due to the well-defined isotopic distribution profile compared to product ion distribution, which is dependent on a number of instrumental factors. The narrow DIA spectrum overlapping precursor m/z values were directly processed and scored using the same

Specificity Analysis

One of the stated requirements for quantitation is to successfully detect and sequence the peptide prior to peak integration. Repetitive detection for a wide range of targeted peptides was evaluated across three technical replicates per DIA method. In addition, the DIA data were processed against the two different decoy libraries. Table 1 lists the average values for each method for the original library and decoy library search matches. The same standard DIA data were searched using four different sets of processing and acceptance criteria based on previously published methods and optimal data extraction for an Orbitrap mass spectrometer.12 For each acquisition/processing method, we also recorded the number of hits for each of the two decoy databases as an estimate of how many false positive matches we might have incurred. There appeared to be a significant trade-off between data extraction parameters and confidence. The medium-confidence scoring for standard DIA (±20 ppm and four data points) resulted in the greatest number of library matches but also showed the highest number of decoy hits using the two decoy search strategies. Conversely, automated pSMART data processing resulted in the greatest number of peptides confidently identified with the lowest number of decoy hits. Decreasing the peak tolerance to ±10 ppm to leverage the mass accuracy of the Orbitrap and increasing the consecutive number of scans from four to six provided greater confidence in the matched results as well as a lower decoy hit rate. pSMART data processing decreased the number of positive hits as well as the decoy hits. Additional selectivity data processing parameters were changed by increasing the consecutive number of scans


from four to six, and this provided higher confidence for fewer peptides. This approach and the requirement of at least six consecutive spectra produced the highest confidence results for standard DIA. The significantly increased number of decoy hits at 20 ppm suggested that automated chromatographic peak shape verification was not sufficiently rigorous to eliminate false positive matches, since the CS score was not 0.7 or greater. This result might be overcome by significant manual inspection of each peptide at the substantial cost of increasing the data processing time and decreasing objectivity. Further examples of specific peptides that may require manual inspection and determination of whether to keep or omit from the final list of identified targets or which specific product ion XIC is omitted to satisfy the researcher's acceptance criteria are presented in Figures S1 and S2 (Supporting Information). For comparison, DDA data were processed using spectral matching. The three technical replicates were processed using the same forward and scrambled spectral libraries as those used to evaluate the DIA data. A total of only 1124 peptides were matched across all three replicates, fewer than those identified using the database search strategy. Of the 1124, only 742 peptides showed spectral matching across all three technical replicates. The results from the shuffled library searching showed only 27 hits across all three data files and 12 were matched across all three files, both decoy hit rates equating to ca. 2%. The goal of DDA experiments is to acquire as many product ion spectra as possible per cycle. To achieve this, the maximum ion fill times and product ion resolution are reduced to 100 ms and a resolving power of 17 500 (at m/z 200). Also, the acceptance criteria are much higher, incorporating CS scores rather than simply the presence/absence of specific m/z values. 
Comparative CS score analysis was performed on the 2500 peptides identified by both standard DIA (high confidence) and pSMART. Figure 4 shows the CS score distribution for each data acquisition strategy. The CS score ranges were selected to demonstrate the distribution profiles fitting each method, DIA and HR/AM MS. The results were compiled from the three technical replicates per method. Figure 4A shows the CS score distribution from the standard DIA experiments. All 2500 peptides were evaluated on the basis of previously published methods that first integrated the AUC values per product ion and normalized and performed the CS score calculation with the spectral library product ion distribution. The CS score tolerance was 0.7, and ca. 80% of the peptides had acceptable CS scores, but a majority of these hits were in the 0.7−0.9 CS score range, which, while acceptable, indicates marginal agreement with the spectral library. Around 20% of the targeted peptides did not have measured signal in enough product ion XICs to be recorded and scored for consideration, and this is in agreement with the decoy hit rates for different postacquisition data parameters. The pSMART data analysis facilitated two different methods of verification based on precursors and product ion distribution overlap. The precursor CS score evaluation was determined using normalized AUC values and compared against the theoretical isotopic distribution. Figure 4B shows the precursor CS score distribution based on the pSMART data. Almost 63% of the peptides showed CS scores of 0.98−1.0, indicating an almost perfect match. The acceptance threshold for precursor isotopic matching was set to 0.9, which resulted in confident identification of almost 80% of all peptides. Less than 10% of the targeted peptides showed no measurable signal in the

Figure 4. Comparative histograms of cosine similarity score distributions for the large set of targeted plasma peptides across three technical replicates for (A) product ion standard DIA data, (B) precursor isotopic distribution overlap, and (C) product ion intensity values from the 5 Da DIA spectra.

precursor ion XICs. Evaluation of the precursor isotopic distribution overlap provided an estimate of putative background interference in a manner similar to that typically used for SRM verification. The AUC values and CS scores for each peptide per technical replicate are listed in the Supporting Information (SI5).


Figure 5. Comparative Venn diagrams demonstrating experimental reproducibility for each data acquisition method used in the study. The overlap analysis was based on specific peptide identification in each technical replicate and scored for (A) DDA using Proteome Discoverer 1.4 search results. The DIA data were scored on the basis of spectral matching against a common library, and only one data set was acquired for the standard DIA experiment but processed using two different acceptance criteria, (B) high and (C) medium, as defined by the mass tolerance used for data extraction and number of consecutive spectral matches, while (D) the pSMART data were processed using similar mass tolerance values to part C.

Figure 6. (A) Venn diagram showing overlapping targeted peptide identification and verification using different data acquisition strategies relative to the spectral library and (B) the overlap of identified peptides among the three methods.

Reproducibility Analysis

matches, the overlap in specific peptide identification across the three technical replicates was analyzed for each data acquisition method. The Venn diagrams presented in Figure 5 show the degree of reproducibility for each data acquisition method. It is important to keep in mind that the resulting DDA data were searched using SEQUEST and not against the spectral library (which was done for the DIA and pSMART data). The pSMART analysis generated not only the greatest forward hits but also demonstrated the most consistent detection of targeted peptides. Approximately 94% of the identified peptides were

Reproducible target peptide detection and verification are the primary requirements for robust quantitation of multiplexed targeted peptide assays. The total number of peptides identified across the three technical replicates was broadly similar among the DIA acquisition methods, with medium confidence DIA giving a total of 2779 peptides (summed across all three technical replicates), compared to 2444 and 2001 peptides for pSMART and high confidence DIA, respectively. In addition to determining the average forward and shuffled spectral library


Figure 7. (A) Results of targeted data analysis for a unique peptide as identified using pSMART showing the original pSMART overlaid precursor XIC and product ion distribution overlap (shown as insets). (B) The overlaid XIC analysis from the follow up PRM experiment using the same product ions as that used in the initial set of experiments and (C) the product ion XIC analysis from the standard DIA data.

identified in all three replicates and 97% identified in two of the three replicates. Regardless of the acceptance criteria, standard DIA identified ca. 80% of the peptides in all three technical replicates, and 90% were identified in two of three replicates. It should be noted that while the total number of peptides identified using DDA was less than standard DIA by 20%, the reproducibility of successfully identifying the peptides was greater. This played a far greater role in postacquisition data analysis and how much manual interpretation was needed to fully process the data.
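The replicate overlap summarized in the Venn diagrams can be computed with basic set operations. The sketch below is illustrative: the function name and peptide labels are hypothetical, and "two of three replicates" is interpreted here as "at least two", which is an assumption about how the percentages were tallied.

```python
def replicate_overlap(rep1, rep2, rep3):
    """Summarize how reproducibly peptides are identified across three runs."""
    reps = [set(rep1), set(rep2), set(rep3)]
    union = set().union(*reps)
    # Number of replicates in which each peptide was identified.
    counts = {pep: sum(pep in rep for rep in reps) for pep in union}
    in_all = sum(1 for c in counts.values() if c == 3)
    in_two_plus = sum(1 for c in counts.values() if c >= 2)
    total = len(union)
    return {
        "total": total,
        "all_three": in_all,
        "at_least_two": in_two_plus,
        "pct_all_three": 100.0 * in_all / total if total else 0.0,
        "pct_at_least_two": 100.0 * in_two_plus / total if total else 0.0,
    }

# Hypothetical peptide identifiers for three technical replicates.
summary = replicate_overlap({"A", "B", "C", "D"}, {"A", "B", "C"}, {"A", "B", "E"})
```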

performed with the human plasma spectral library. Figure 6 shows the overlap for standard DIA (processed with medium and high confidence criteria) and the pSMART method with the spectral library. A total of 3597 different peptides were identified from the technical replicate injections using the two methods totaling approximately 33.6% of the peptides in the spectral library. A total of 1613 peptides were identified. The results from the targeted MS/MS analysis for each peptide list showed confirmation of 73% uniquely identified from the pSMART list, a 15% success rate for the standard DIA list, and a 97% success rate for the commonly identified 1613 peptides. This empirical estimation of false positives suggests that both decoy libraries underestimate the number of false positive hits but more so for DIA than pSMART. Scheduling the targeted data acquisition resulted in the highest confidence levels in determining true positive hits based on our LC−MS protocols and leveraging the most sensitive means of detection.27 Figure 6B shows the overlap of reproducibly measured targeted peptides among the three acquisition methods. The identified peptide list from standard DIA experiments was determined on the basis of the highest confidence scoring criteria. Interestingly, almost 15% of peptides were uniquely identified by each method. If we were to relax our peptide selection criteria and also include peptides with fewer observations in the spectral library, it is quite likely that we would have seen a bigger overlap with DDA identifications. A second round of data acquisition was performed for verification of targeted peptides. The same human plasma digest was re-evaluated using PRM analysis as described above (see the experimental section). Three sets of target lists were

Sensitivity Analysis

It is of primary interest to determine the overlap of identified peptides listed in Table 1 based on the standard DIA and pSMART acquisition and processing schemes. Figure 6 shows the Venn diagram for the identified peptides from all technical replicates per method. The acceptance criteria used for each standard DIA method were a product ion XIC tolerance of ±20 ppm and four consecutive acceptable CS scores for the medium confidence and six consecutive CS scores for high confidence. The peptide acceptance criteria for pSMART are the same as listed in Table 1. A total of 526 peptides were uniquely identified from standard DIA methods using medium confidence (ca. 17% of the total peptides using this processing strategy) and another 546 peptides were uniquely identified in both standard DIA methods. A total of 546 peptides were uniquely identified from the pSMART method. Further data analysis was performed to evaluate the overlap in identified peptides across the three data acquisition methods to determine the peptides that were uniquely identified with each method. The same sample was injected for each technical replicate per acquisition method, and data processing was


Figure 8. (A) Results of targeted data analysis for a unique peptide as identified using standard DIA showing the original overlaid product ion XIC and distribution overlap (shown as insets). (B) The overlaid XIC analysis from the follow up PRM experiment using the same product ions as that used in the initial set of experiments and (C) the product ion XIC analysis from the pSMART data.

a similar m/z (within 20 ppm). The second most abundant fragment ion intensity was not measured and therefore not counted as a positive hit. Figure 8 shows the comparative results for the peptide ELEDLIIEAVYTDIIQGK, uniquely identified in DIA. Figure 8A shows the overlapping coeluting product ions with the right rank order (shown in the inset with a cosine similarity of 0.8). When the same peptide was put on a target list for the PRM experiment, no signal was observed for the product ions (Figure 8B), thus suggesting that this was a false positive identification by DIA, and the measured product ion signal was attributed to background matrix ions coisolated in the 25 Da precursor window. Figure 8C also plots the extracted chromatograms for the precursor isotopes for this peptide from the pSMART run, which also showed no evidence for this peptide. The total results from the PRM experiments were used to determine further evidence of the presence/absence for each peptide on the targeted list. The incorporation of the PRM experiment has been demonstrated to provide the greatest degree of sensitivity and selectivity, since data acquisition is driven by the targeted precursor m/z list. Over 95% of the 724 targeted peptides found in all three methods were further confirmed by the PRM experiments. Similarly, the targeted inclusion list of unique peptides identified from pSMART was targeted using a second PRM inclusion list. Data analysis showed that ca. 86% of the unique peptides met the same acceptance criteria based on retention time, precursor isotopic identification, and product ion distribution as the spectral library. In contrast, only 15% of the unique peptides identified from the standard DIA experiment were confirmed using PRM.

created, one covering the 536 unique peptide sequences found by all three replicates of standard DIA (high confidence) but not identified by any other method, 656 peptides uniquely identified by pSMART, and the final experiment used a list of 724 peptides identified by all three methods. The targeted peptide lists were used to direct the Q Exactive mass spectrometer to acquire multiple true tandem MS data for the specific precursors across the specified retention time windows. The resulting data were processed with the same method used for the DIA data analysis against the spectral library.36 Figure 7 shows the comparative results for the peptide SLAELGGHLDQQVEEFR uniquely and reproducibly identified using pSMART and verified by PRM. The overlaid isotopic XIC graph is presented showing excellent S/N for all isotopes (Figure 7A). The relative isotopic AUC values were compared to the theoretical isotopic distribution and showed a cosine similarity score of 0.99. The extracted product ion distribution is compared against the spectral library distribution and has a cosine similarity score of 0.83, which is sufficient for sequence confirmation. Figure 7B shows the PRM plot based on the same four product ions across the expected retention time window. The product ion distribution shows a cosine similarity score of 0.95, and the retention time lines up (within experimental error) with the pSMART response. Figure 7C shows the overlaid product ion XIC graph for the same peptide and product ions using the standard DIA acquisition scheme. The method does show detection of the most abundant product ions for three data points as well as the third and fourth ranked product ions, but the rank order was shuffled, possibly because of coeluting peptides that produced product ions with


Quantitative Analysis

LC gradient in order to facilitate postacquisition data reprocessing. As with most qualitative and quantitative LC−MS-based experiments, the primary limiting factor in multiplexing is the chromatographic peak width. Data acquisition strategies must be adjusted to facilitate appropriate data points for elution peak definition to allow robust quantitation. As such, sacrifices are factored into the data acquisition schemes to balance sensitivity and selectivity with broad-band precursor analysis. The difference between standard DIA and pSMART strategies is how peptides are detected and quantified in the resulting wideband spectrum. The standard DIA strategy isolates narrow precursor mass windows and collects the swaths of wide-band product ion spectra.12 This approach is repeated across the total precursor mass range through the chromatographic gradient. The resulting product ion XICs are used to define the targeted peptide elution profile, from which qualitative aspects can be extracted, governing quantitation. The effects of the isolation window on sensitivity and selectivity have been well-published,5,13 where the precursor isolation window is inversely proportional to qualitative and quantitative performance. The most commonly reported settings utilize a 25 Da precursor isolation window to facilitate ca. eight data points for all precursor mass windows covering the range. The eight data points enable postacquisition data processing to define elution profiles for qualitative and quantitative assessment. The limitation is that the greater the density in the eluting precursor isolation window, the greater the reduction in measurement for low-level peptides. 
Three options are available to help alleviate this limitation: (1) reduce the specific time spent per window via resolution settings or product ion fill times,11 (2) incorporate asymmetrical precursor windows to match the precursor elution density profile, and/or (3) reduce the precursor mass range continually sampled across the LC gradient. All of these would enable more scan events to be acquired in the same unit time but also introduce compromises. The most plausible option with the lowest degree of compromise would be to incorporate asymmetrical precursor isolation windows, but this would require substantial knowledge of the particular sample and would have to be retrained for each new sample. The approach we have taken is to leverage the HR/AM MS data for wide-band peptide quantitation. The Orbitrap mass spectrometer has unique advantages for simultaneously trapping and detecting extremely high charge densities without compromising resolution or mass accuracy.19 The intrascan dynamic range of 4.5 orders of magnitude, coupled with the high-resolution (>60 000 for most precursors) and low mass errors, provides comprehensive precursor selectivity and sensitivity on an acceptable LC time scale. Initial peptide identification strategies organize peptides into retention time bins (±15 min) as well as the precursor information, including charge state, top four isotopic m/z values, and theoretical isotopic distribution from which to compare experimental data. Real-time MS analysis identifies when peptides are eluting within the predicted window. In addition, the qualitative target peptide MS/MS data are independently acquired without being triggered from the real-time MS data interrogation. Since the qualitative and quantitative aspects are decoupled, the acquisition cycles for each data acquisition method can be independently defined by the user on the basis of the average chromatographic peak widths, meeting both sets of criteria. 
The user-defined DIA cycle time is set to acquire at least one data

An important requirement for accurate quantitation is having a sufficient number of points along the elution profile of the ion(s) of interest. Our selected cycle time of 4 s on a gradient with average peak widths of 35−60 s should acquire around eight points per peptide. The same 1613 peptides evaluated for CS scores were also integrated for coefficient of variance analysis between DIA and pSMART methods. Variance analysis for the pSMART data was based on summed AUC values from the four isotopes compared to the summed AUC values for the four product ions using DIA. Figure 9 shows the distribution of

Figure 9. Comparative distribution of measured coefficient of variance between the precursor isotopic analysis for pSMART and summed AUC values for standard DIA.

measured peptide variance. It demonstrates that the number of peptides having CVs below 20% was higher when measured by HR/AM MS (80%) compared to measurement by product ion XICs generated from standard DIA experiments (66%). The measured peptide response in both MS and DIA data was successfully quantified over 4 orders of magnitude, and the distribution of AUC values was similar.
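The coefficient of variance comparison can be reproduced from per-replicate summed AUC values. The following is a minimal sketch under the assumption that the CV is the sample standard deviation divided by the mean across the three technical replicates; the function names and numbers are hypothetical.

```python
import statistics

def summed_auc(component_aucs):
    """Sum the AUC values of the four isotopes (or four product ions)."""
    return sum(component_aucs)

def percent_cv(values):
    """Coefficient of variance (%) across technical replicates."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean response is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean

def fraction_below(cvs, cutoff=20.0):
    """Fraction of peptides whose CV falls below the cutoff."""
    return sum(cv < cutoff for cv in cvs) / len(cvs)

# Hypothetical isotope AUCs for one peptide in three replicates.
replicates = [
    [40.0, 30.0, 15.0, 5.0],
    [44.0, 33.0, 17.0, 6.0],
    [48.0, 36.0, 19.0, 7.0],
]
peptide_cv = percent_cv([summed_auc(r) for r in replicates])
```

Applying `fraction_below` to the list of per-peptide CVs gives the kind of summary quoted in the text (the fraction of peptides with CVs below 20%).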



DISCUSSION Data independent acquisition strategies have demonstrated unique capabilities for performing simultaneous qualitative and quantitative analysis on complex biological samples. By cycling through the precursor mass range using narrow isolation windows, complex but manageable chimeric spectra may be acquired across the LC gradient to generate product ion maps that can be used to reprocess the data using any combination of user-defined target peptide lists, without having to reacquire the data. In addition, software has been created to handle the complex product ion spectra on the basis of experimental overlap with spectral libraries. The method we have developed extends the DIA concept of decoupling tandem mass spectral data acquisition from HR/AM MS detection but differs from previously published DIA methods in which data set is used for quantitation. The pSMART acquisition scheme uses the DIA information as additional qualitative information for targeted peptide sequencing, which augments the quantitation performed with HR/AM MS data. This combination affords the same experimental advantages as standard DIA, such as acquiring precursor and product ion maps across the entire


Data-independent analysis searching strategies generally rely on spectral library matching, which is almost exclusively created on the basis of DDA experiments. Confidence in DIA search results are only as effective as the library used for evaluation. The library implemented here for DIA verification was comprised of many different DDA experiments, and each file was searched independently. The resulting library contains all peptides sequenced against a protein database in an unbiased manner. In addition, Crystal maintains a spectral count for each peptide to increase attributes for likelihood of observing a particular peptide in the plasma digest as well as increasing the confidence in product ion distribution compared to using only one or two DDA results to build the spectral library. Each peptide entry contains up to the top 20 repeatedly identified fragment ions and corresponding relative abundance values. We chose the top four fragment ions, since the fifth and lower fragment ions have relative abundance values that are less than 10%, which for low-level peptides would most likely not measure ion intensity values. Also, this is a similar strategy to SRM analysis performed on triple quadrupole mass spectrometers.15 Previous studies by Prakash et al.15 evaluated the CS scores as a function of total SRM transitions monitored per peptide. In this study, it was determined that the number of SRM transitions established the acceptance criteria, and the fewer the number of product ions used for scoring, the higher the CS score required for confirmation. For low-resolution mass spectrometers, six SRM transitions required a minimum CS score of 0.6 compared to our studies using an Orbitrap mass spectrometer with high resolution/accurate mass, where four product ions require a CS score of 0.7 or higher. Spectral matching of DIA product ion distribution against spectral library entries comprised of DDA experiments makes certain assumptions. 
Although both methods utilized HCD activation on an Orbitrap-based instrument, the specifics of the normalized collision energy are slightly different between the two experiments. For example, DDA utilizes a normalized collision energy (NCE) setting (e.g., 27) and reads in the precursor m/z value and charge state to adjust the final voltage offset used for collisional activation. Previous experiments performed to test spectral library building concepts have shown that sufficiently different NCE settings can alter the resulting product ion distribution profiles (20−30 V), but from 25 to 30 V, changes do not significantly alter the product ion distribution (data not presented). In DIA, there is only a centered m/z value and default charge state (+2 for our situation) used with a NCE setting of 27 to match the spectral library information. A majority of the targeted peptides are ionized in the +2 precursor charge state, matching the default DIA settings. Comparison of CS score distribution as a function of precursor charge state was performed for DDA experiments as well as standard DIA and pSMART (see Figures S2−S4 and experimental output listed in SI2 and SI3 in the Supporting Information). In the analysis, the product ion distribution overlap between experimental and library values was consistent across all precursor m/z values per experimental. Additional analyses of CS score distribution as a function of proximal precursor m/z values to the DIA isolation boundary were performed (data not presented). The distribution profiles for “edge” precursors vs “centered” precursor m/z values were again similar. For the scope of this work, we believe that the isolation windows and NCE values affect the resulting DIA spectra similarly between pSMART and standard DIA schemes.

point per average peak width and maximized for optimal sensitivity and selectivity by increasing the ion fill time and resolution, as compared to previously published methods, severely limiting the product ion fill times. Peptide sequence determination based on one matched product ion spectrum has been well-established in DDA experiments and evaluated here.4 The precursor isolation windows used for our narrow DIA method are similar to those of DDA and reduce complexity in the resulting full scan product ion spectrum. In addition, the peptide sequence determination is based on spectral library matching, which not only evaluates the presence/absence of specific fragment ion m/z values based on low mass errors but also the relative abundance. Last, data extraction is further confirmed on the basis of measured retention time correlation to precursor isotopic elution profiles and the relative retention times extracted from the Crystal plasma library. Of primary interest is the quantitative capability of HR/AM MS data to reliably discriminate true signal from background components. All quantitation experiments must combine some means of qualitatively evaluating the measured ion signal used for quantitation. In our method, we incorporate HR/AM MS signal for global quantitation, since it provides the greatest depth of detection per unit time. To increase the discriminating capabilities of data extraction, we employ several features of AMT concepts as previously published, such as the accurate mass values and library retention times, to reduce false positives.20,26 That is, each peptide in the spectral library has a relative retention time that can be correlated to the current LC parameters, thus limiting which specific peptides can elute in any window.26,37 Addition of known retention time information as a further fingerprint for targeted peptide identification significantly limits the choice of peptides for identification. 
In our data acquisition strategy, we utilize a conservative retention time window (±15 min) because we are not using precursor intensities to trigger subsequent MS/MS events and incorporate narrow DIA data for further sequencing/confirmation. Figure S1 (Supporting Information) shows the retention time correlation between all DDA experiments used to build the plasma spectral library and relative retention time values contained in the library. In addition, the strategy utilizes at least three isotopes per precursor to perform XIC analysis during the expected elution window. Each precursor isotopic XIC is extracted using a 10 ppm mass tolerance from MS data acquired using a resolution setting of 140 000 (at m/z 200), compared to a resolution setting of only 35 000 (at m/z 200),35 to significantly reduce background interference, as illustrated in Figure S9 (Supporting Information). Incorporation of much higher resolution facilitated baseline resolution between the native and oxidized forms of the target peptide against background ions separated by only 0.02 Da. The resulting isotopic XICs are overlaid to determine covariance and establish the retention time from which the specific DIA spectrum containing the correct precursor window is processed for sequence confirmation. The combination of retention time windows and high resolution/mass accuracy resulted in ca. 80% of all targeted peptides having acceptable isotopic distribution profiles covering a dynamic range of ca. 4 orders of magnitude (see Supporting Information, SI5) and a low probability of background interference contributing to the final AUC values. A better test of discriminating power was also performed by spiking known proteins into complex mixtures at different levels (data not presented).
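The isotopic XIC extraction described above can be sketched as follows. This is a simplified illustration rather than the processing used in the study: the function names and the toy scan data are hypothetical, and the isotope spacing uses the 13C−12C mass difference (about 1.00335 Da) divided by the charge state.

```python
C13_C12_MASS_DIFF = 1.00335  # Da, approximate spacing between isotope peaks


def ppm_window(mz, tol_ppm=10.0):
    """Lower and upper m/z bounds for a +/- tol_ppm extraction window."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def isotope_mzs(mono_mz, charge, n_isotopes=3):
    """m/z values of the first n isotope peaks of a precursor."""
    spacing = C13_C12_MASS_DIFF / charge
    return [mono_mz + i * spacing for i in range(n_isotopes)]


def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Sum the intensity inside the ppm window for every MS scan.

    scans is a list of (retention_time_min, [(mz, intensity), ...]).
    """
    lo, hi = ppm_window(target_mz, tol_ppm)
    return [(rt, sum(i for mz, i in peaks if lo <= mz <= hi))
            for rt, peaks in scans]


# Toy MS1 scans: a +2 precursor at m/z 600.0 with a background ion 0.02 Da away.
scans = [
    (30.0, [(600.001, 100.0), (600.020, 500.0)]),
    (30.1, [(599.999, 250.0), (600.502, 120.0)]),
]

targets = isotope_mzs(600.0, charge=2)
xics = [extract_xic(scans, mz) for mz in targets]
```

At m/z 600 a 10 ppm tolerance corresponds to ±0.006 Da, so the background ion 0.02 Da away falls outside the extraction window. That exclusion is only meaningful when the instrument resolution is high enough to record the two ions as separate peaks.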

dx.doi.org/10.1021/pr5003017 | J. Proteome Res. 2014, 13, 5415−5430

Journal of Proteome Research

Article

with confidence which variant has been observed. In some cases, different retention times can be used to distinguish, but if the differences are slight, there still may not be sufficient confidence. While a majority of the unmodified/modified peptide forms are separated using long gradients, the chromatographic separation is reduced/eliminated with faster gradients used for higher throughput. Figure S8 (Supporting Information) shows overlaid precursor XICs for coeluting oxidized and nonoxidized peptides, which are easily distinguished using the precursor information. In addition, high resolution further enables selective precursor data extraction. Incorporation of the high resolution (140 000 at m/z 200) significantly increases the selectivity of MS data analysis. Targeted precursor intensity can be separated from background interference (Figure S9, Supporting Information), resulting in accurate isotopic distribution profiles, which we use to also evaluate putative background interference prior to quantification. It is possible that some false positives can be removed by manual interrogation of data and a better definition of what a peak is, but that requires significant time when large targeted peptide lists and/or many different biological and technical replicates are processed. The sort of ambiguity described above is likely the cause of the standard DIA false positive illustrated in Figure 8. Measured product ion intensity can be extracted within the ±10 ppm mass tolerance for three of the four product ions as well as showing covariance, and the CS score is greater than 0.7. But analysis of the pSMART data showed no response when extracting the same product ion XICs for the associated precursor isolation window as well as in HR/AM MS data. In addition, the targeted MS/MS data acquisition and PRM data analysis using narrow precursor isolation supported the assumption that the peptide could not be detected within the sample. 
The decoy search results matched with the verification data acquired using targeted MS/MS strategies. The decoy hit rate was substantial for standard DIA data. Targeting those peptides uniquely identified using standard DIA also demonstrated a low verification rate when narrow precursor isolation was performed prior to dissociation. The results for targeted MS/MS analysis of pSMART data showed similar product ion detection, retention time correlation, and CS scoring for 73% of the unique peptides. As a proof of principle, almost all of the 1261 peptides confidently identified using both pSMART and narrow DIA were also confirmed using targeted MS/MS acquisition. This secondary set of experiments also illustrated the effective code used to link the discovery quantitation experiments with automated method refinement for more robust quantitation. Smaller precursor isolation windows provide significant advantages for qualitative and quantitative assessment. In our experiments, the average peak widths were 30−50 s, providing us with extended cycle times, which we leveraged with higher AGC values and ion fill times for both MS1 and DIA events. In addition, acquisition parameters for each DIA event were set to maximize sensitivity and selectivity using 150 ms product ion fill times and 35 000 (at m/z 200) resolution settings. To an extent, we have introduced asymmetric precursor isolation windows for the resulting DIA method, using 5 Da windows from 400 to 800 Da and then increasing to 10 Da from 800 to 1000 Da and finally increasing to 20 Da windows from 1000 to 1200 Da. Hybrid (pSMART) data acquisition cycles could be adjusted to enable faster DIA cycle times, as previously

Verification strategies based on spectral library matching have not been extensively tested for false hits. Our results suggest a large number of false positive identifications with the standard DIA method. In all of our false positive estimates (decoy library or verification by PRM), we see substantially more evidence of errors in standard DIA experiments. While our model did not result in the generally accepted level of 5% decoy hit rate, we feel our method presents a good first approximation to the selectivity per DIA method. In creating the shuffled library, the precursor m/z values of the partners used to swap product ions differed by more than the 25 Da windows used for DIA acquisition. The retention time values for shuffled partners were also closely matched to provide realistic decoy targets. Additionally, shuffling the relative abundance values prior to re-searching the DIA data reduced the decoy hit rates by ca. 50%. Currently, we are evaluating additional aspects that may increase the selectivity for determining true positives from decoy hits for DIA data relative to the DDA analysis. In manually reviewing the data, other advantages were observed for the pSMART method compared to the standard DIA strategy. The primary difference observed between the standard and narrow DIA data was the complexity of the resulting product ion spectra. Isolating 25 Da precursor swaths generally resulted in a much more complex product ion spectrum compared to the narrow DIA windows used for pSMART, owing to more precursors being dissociated prior to product ion detection. The probability of larger dynamic ranges in precursor intensities is directly proportional to the relative intensities of product ions affecting detection. The greater complexity of the product ion spectra for standard DIA relative to the narrow DIA also increases the probability of an overlap in fragment ions from a nontargeted peptide, further skewing the predicted relative intensities needed for target peptide verification.
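The construction of the shuffled decoy library can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: each target keeps its own precursor m/z and retention time but receives the product ions of a partner whose precursor m/z differs by more than the 25 Da DIA window and whose retention time is close. The function name, the dictionary layout, the 2 min retention time limit, and the example entries are all hypothetical.

```python
def build_decoys(library, min_mz_gap=25.0, max_rt_gap=2.0):
    """Create decoy entries by swapping in the product ions of a distant partner.

    min_mz_gap: partners must differ in precursor m/z by more than this (Da).
    max_rt_gap: hypothetical limit on the retention time difference (min).
    Targets without an eligible partner are skipped.
    """
    decoys = []
    for entry in library:
        partners = [
            p for p in library
            if abs(p["precursor_mz"] - entry["precursor_mz"]) > min_mz_gap
            and abs(p["rt"] - entry["rt"]) <= max_rt_gap
        ]
        if not partners:
            continue
        # Prefer the partner with the closest retention time.
        partner = min(partners, key=lambda p: abs(p["rt"] - entry["rt"]))
        decoys.append({
            "id": "DECOY_" + entry["id"],
            "precursor_mz": entry["precursor_mz"],
            "rt": entry["rt"],
            "product_ions": dict(partner["product_ions"]),
        })
    return decoys


# Hypothetical library entries: product ion m/z mapped to relative abundance.
library = [
    {"id": "PEP_A", "precursor_mz": 450.25, "rt": 30.0,
     "product_ions": {500.3: 100.0, 613.4: 40.0}},
    {"id": "PEP_B", "precursor_mz": 455.30, "rt": 30.5,
     "product_ions": {420.2: 100.0, 533.3: 70.0}},
    {"id": "PEP_C", "precursor_mz": 520.80, "rt": 31.0,
     "product_ions": {710.4: 100.0, 823.5: 55.0}},
]

decoys = build_decoys(library)
```

Requiring the partner to lie outside the isolation window means the borrowed product ions should not be generated from the same isolated swath as the target, so any match to a decoy reflects chance overlap in a complex spectrum.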
An example is presented in Figure S5 (Supporting Information). The detection method used for standard DIA data required the specific product ions to be detected within the specified mass tolerance (10 and 20 ppm) as well as have predictable ion distribution profiles. The additional product ions at greater intensity reduce detection capabilities for the specific fragment ions needed to verify and ultimately quantify the targeted peptide. Those peptides that have product ion distribution profiles where the second, third, and/or fourth most intense product ions are 20% relative abundance of the most intense product ion are more difficult to detect, and the resulting distribution profiles may be skewed, requiring manual inspection to confirm and integrate (Figure S6, Supporting Information). Furthermore, if the fourth most abundant fragment is low intensity and shows interference, then a lower probability exists for detection of other fragments that may not have interference. Incorporation of precursor data provided additional levels of characterization, even when the precursor isotopic intensity was insufficient for reproducible peak shape determination. At least three isotope intensities were measured within the mass tolerance values used. Moreover, because MS1 spectra are not utilized, standard DIA is unlikely to accurately discriminate between multiple peptides that have similar product ions and distribution profiles.35 Our Crystal plasma library may contain many variants of a partial peptide sequence (e.g., by missed cleavages or PTMs) (Figure S7, Supporting Information). More than one variant may easily fall into a 25 Da window of isolation. If there is not sufficient evidence resulting from differentiating product ions, it may not be possible to know


suggested. The first change would be to reduce the resolution setting per DIA spectrum from 35 000 to 17 500 but maintain the HR/AM MS settings, since this is the source of quantitation. Also, the narrow precursor isolation windows would be maintained for the m/z 400−800 region due to precursor ion density across the gradient. Wider precursor isolation windows have been shown to increase decoy hit rates (data not shown), which could be attributed to small mass shift modifications that do not alter the diagnostic product ion signatures. Previously published approaches to increase the selectivity and improve characterization of related peptides involve incorporation of unique fragment ions that can differentiate one form from the others. The determining factor for success is attributed to the relative abundance of the unique fragment ion to all others. If the targeted peptide abundance is low, selecting poorly responding fragment ions that are diagnostic reduces the probability of successful quantitation.12 Although the absolute numbers of the three methods disagree, they all assign the same relative ranking of specificity to each of the methods and suggest that pSMART is more sensitive and selective than standard DIA, even with strict selection criteria. It is our belief that the standard DIA results could be improved with significant manual interrogation, changing product ions from those showing background interference/omission to alternative product ions that may be more selective. The purpose of the study, however, was to identify possible global profiling schemes that reduced/eliminated the need for manual interrogation to expedite data analysis, biological interpretations, and transition into more focused quantitation for greater sensitivity or higher throughput. The results of this initial study can be used automatically to create such methods on the Q Exactive mass spectrometer or other instruments.
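The asymmetric isolation scheme described in this discussion (5 Da windows from 400 to 800, 10 Da windows from 800 to 1000, and 20 Da windows from 1000 to 1200) can be enumerated with a short sketch. The function name and the tuple layout are hypothetical, and details of an actual instrument method, such as window overlap or how windows are distributed across acquisition cycles, are not modeled.

```python
def build_isolation_windows(segments):
    """Enumerate contiguous precursor isolation windows.

    segments is a list of (start_mz, stop_mz, width) tuples.
    Returns a list of (lower_mz, upper_mz, center_mz) tuples.
    """
    windows = []
    for start, stop, width in segments:
        lower = start
        while lower < stop:
            upper = min(lower + width, stop)
            windows.append((lower, upper, (lower + upper) / 2))
            lower = upper
    return windows


# Window widths per precursor m/z region, as given in the text.
SEGMENTS = [
    (400.0, 800.0, 5.0),
    (800.0, 1000.0, 10.0),
    (1000.0, 1200.0, 20.0),
]

windows = build_isolation_windows(SEGMENTS)
print(len(windows))  # 80 + 20 + 10 = 110 windows
```

Listing the windows this way makes the trade-off explicit: 80 of the 110 windows cover the m/z 400−800 region, where precursor ion density is highest, so that region accounts for most of the cycle time.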

real-time spectral processing results for standard DIA (SI3) and pSMART methods (SI4). Finally, an Excel file listing the AUC values determined for the selected 2500 peptides for standard DIA and pSMART per technical replicate (SI5). The AUC values were processed in the Pinpoint software. This material is available free of charge via the Internet at http://pubs.acs.org.



*Tel: 515-802-7596. Fax: 617-225-0935. E-mail: scott.peterman@thermofisher.com.

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. A.P., S.P., S.A., B.F., and M.L. contributed equally.

Notes

The authors declare no competing financial interest.



ABBREVIATIONS USED

DDA, data dependent acquisition; DIA, data independent acquisition; pSMART, peptide-based staggered mass spectral acquisition across retention time; XIC, extracted ion chromatogram; SRM, selective reaction monitoring; AUC, area under the curve; AGC, automatic gain control; HCD, high energy collision dissociation; ppm, parts per million; CS, cosine similarity; HR/AM, high resolution/accurate mass; PRM, parallel reaction monitoring.





REFERENCES

(1) Picotti, P.; Bodenmiller, B.; Mueller, L. N.; Domon, B.; Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 2009, 138 (4), 795−806.
(2) Gallien, S.; Duriez, E.; Demeure, K.; Domon, B. Selectivity of LC−MS/MS analysis: Implication for proteomics experiments. J. Proteomics 2013, 81 (9), 148−158.
(3) Liu, H.; Sadygov, R. G.; Yates, J. R., III A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004, 76 (14), 4193−4201.
(4) Michalski, A.; Cox, J.; Mann, M. More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC−MS/MS. J. Proteome Res. 2011, 10 (4), 1785−1793.
(5) Hoopmann, M. R.; Merrihew, G. E.; von Haller, P. D.; MacCoss, M. J. Post analysis data acquisition for the iterative MS/MS sampling of proteomics mixtures. J. Proteome Res. 2009, 8 (4), 1870−1875.
(6) Bailey, D. J.; Rose, C. M.; McAlister, G. C.; Brumbaugh, J.; Yu, P.; Wenger, C. D.; Westphall, M. S.; Thomson, J. A.; Coon, J. J. Instant spectral assignment for advanced decision tree-driven mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (22), 8411−8416.
(7) Schilling, B.; Rardin, M. J.; MacLean, B. X.; Zawadzka, A. M.; Frewen, B. E.; Cusack, M. P.; Sorensen, D. J.; Bereman, M. S.; Jing, E.; Wu, C. C.; Verdin, E.; Kahn, C. R.; MacCoss, M. J.; Gibson, B. W. Platform-independent and label-free quantitation of proteomic data using MS1 extracted ion chromatograms in Skyline. Mol. Cell. Proteomics 2012, 11, 202−214.
(8) Venable, J. D.; Dong, M. Q.; Wohlschlegel, J.; Dillin, A.; Yates, J. R., III Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 2004, 1, 39−45.
(9) Panchaud, A.; Scherl, A.; Shaffer, S. A.; von Haller, P. D.; Kulasekara, H. D.; Miller, S. I.; Goodlett, D. R. PAcIFIC: How to dive deeper into the proteomics ocean. Anal. Chem. 2009, 81 (15), 6481−6488.

CONCLUSION

We have presented a hybrid data acquisition method, pSMART, that merges the benefits of DDA and DIA acquisition for global analyte detection, identification, and quantification. Comparative analysis of a tryptic digest of human plasma shows that pSMART confidently identifies more peptides than either standard DIA or DDA while maintaining the same reproducibility in measurement. In addition, the results using pSMART showed significantly lower decoy hit rates and higher reproducibility when compared to other common DIA strategies. Finally, pSMART has been designed to make data processing easier; in fact, a simple real-time data processing scheme is sufficient for pSMART to filter over 10 000 peptides during the course of a real-time experiment, and the results can be directly imported into the Pinpoint software for quantitative analysis.



AUTHOR INFORMATION

Corresponding Author

ASSOCIATED CONTENT

Supporting Information

Descriptive figures (Figures S1−S9) for spectral library retention time correlation to experimental data, breakdown of CS scores as a function of precursor m/z values, and comparative data acquisition and processing for targeted peptides. The comparative analysis illustrates the advantages of the pSMART method. In addition, an Excel file containing output CSV files for the selected 2500 peptides processed in Pinpoint listing the AUC values and coefficient of variation across the three technical replicates (SI2). Excel files listing the


(10) Bern, M.; Finney, G.; Hoopmann, M. R.; Merrihew, G.; Toth, M. J.; MacCoss, M. J. Deconvolution of mixture spectra from ion-trap data-independent-acquisition tandem mass spectrometry. Anal. Chem. 2010, 82, 833−841.
(11) Egertson, J. D.; Kuehn, A.; Merrihew, G. E.; Bateman, N. W.; MacLean, B. X.; Ting, Y. S.; Canterbury, J. D.; Marsh, D. M.; Kellmann, M.; Zabrouskov, V.; Wu, C. C.; MacCoss, M. J. Multiplexed MS/MS for improved data-independent acquisition. Nat. Methods 2013, 10 (8), 744−748.
(12) Gillet, L. C.; Navarro, P.; Tate, S.; Rost, H.; Selevsek, N.; Reiter, L.; Bonner, R.; Aebersold, R. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 2012, 11, 1−17.
(13) Weisbrod, C. R.; Eng, J. K.; Hoopmann, M. R.; Baker, T.; Bruce, J. E. Accurate peptide fragment mass analysis: Multiplexed peptide identification and quantification. J. Proteome Res. 2012, 11, 1621−1632.
(14) Plumb, R. S.; Johnson, K. A.; Rainville, P.; Smith, B. W.; Wilson, I. D.; Castro-Perez, J. M.; Nicholson, J. K. UPLC/MS(E): A new approach for generating molecular fragment information for biomarker structure elucidation. Rapid Commun. Mass Spectrom. 2006, 1, 1989−1994.
(15) Prakash, A.; Tomazela, D. M.; Frewen, B.; MacLean, B.; Merrihew, G.; Peterman, S. M.; MacCoss, M. J. Expediting the development of targeted SRM assays: Using data from shotgun proteomics to automate method development. J. Proteome Res. 2008, 8 (6), 2733−2739.
(16) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007, 7, 655−667.
(17) Deutsch, E.; Lam, H.; Aebersold, R. PeptideAtlas: A resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008, 9 (5), 429−434.
(18) Bailey, D. J.; McDevitt, M. T.; Westphall, M. S.; Pagliarini, D. J.; Coon, J. J. Intelligent data acquisition blends targeted and discovery methods. J. Proteome Res. 2014, 13 (4), 2152−2161.
(19) Michalski, A.; D, E.; Hauschild, J.-P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop, quadrupole orbitrap, mass spectrometer. Mol. Cell. Proteomics 2011, 10 (9), M111.011015.
(20) Smith, R. D.; Anderson, G. A.; Lipton, M. S.; Pasa-Tolic, L.; Sheng, Y.; Conrads, T. P.; Veenstra, T. D.; Udseth, H. R. An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2002, 2, 513−523.
(21) Strittmatter, E. F.; Ferguson, P. L.; Tang, K.; Smith, R. D. Proteome analyses using accurate mass and elution time peptide tags with capillary LC time-of-flight mass spectrometry. J. Am. Soc. Mass Spectrom. 2003, 14 (9), 980−991.
(22) Cox, J.; M, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26 (12), 393−403.
(23) Higgs, R. E.; Knierman, M. D.; Gelfanova, V.; Butler, J. P.; Hale, J. E. Comprehensive label-free method for the relative quantification of proteins from biological samples. J. Proteome Res. 2005, 4 (4), 1442−1450.
(24) Higgs, R. E.; Butler, J. P.; Han, B.; Knierman, M. D. Quantitative proteomics via high resolution MS quantification: Capabilities and limitations. Int. J. Proteomics 2013, 2013, 674282.
(25) Monroe, M. E.; Shaw, J. L.; Daly, D. S.; Adkins, J. N.; Smith, R. D. MASIC: A software program for fast quantitation and flexible visualization of chromatographic profiles from detected LC−MS(/MS) features. Comput. Biol. Chem. 2008, 32 (3), 215−217.
(26) Conrads, T. P.; Anderson, G. A.; Veenstra, T. D.; Pasa-Tolic, L.; Smith, R. D. Utility of accurate mass tags for proteome-wide protein identification. Anal. Chem. 2000, 72 (14), 3349−3354.

(27) Gallien, S.; Duriez, E.; Crone, C.; Kellmann, M.; Moehring, T.; Domon, B. Targeted proteomic quantification on a quadrupole-Orbitrap mass spectrometer. Mol. Cell. Proteomics 2012, 11, 1709−1723.
(28) Frewen, B.; Peterman, S.; Ciccimaro, E.; Mallick, P.; Gallien, S.; Domon, B.; Jain, M.; Lin, T.; Hood, B.; Conrads, T.; Smith, C.; Batruch, I.; Drabovich, A.; Kulasingam, V.; Diamandis, E. P.; Gunawardena, H.; Chen, X.; Lorang, C.; Foster, L.; Chen, V.; Vogelsang, M.; Garces, A.; Athanas, M.; Lorang, C.; Krastins, B.; Sarracino, D.; Lopez, M.; Prakash, A. Improved spectral libraries with data quality requirements, internal standards and normalized results. https://portal.thermo-brims.com/facebook/documents/ASMS2012/ASMS2012_Improved_Spectral_BFrewen.pdf.
(29) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; Stein, S. E.; Aebersold, R. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 2008, 5 (10), 873−875.
(30) Gallien, S.; Peterman, S.; Kiyonami, R.; Souady, J.; Duriez, E.; Schoen, A.; Domon, B. Highly multiplexed targeted proteomics using precise control of peptide retention time. Proteomics 2012, 12 (8), 1122−1133.
(31) Kiyonami, R.; Schoen, A.; Prakash, A.; Peterman, S.; Zabrouskov, V.; Picotti, P.; Aebersold, R.; Huhmer, A.; Domon, B. Increased selectivity, analytical precision, and throughput in targeted proteomics. Mol. Cell. Proteomics 2011, 10, M110.002931.
(32) Krokhin, O. V.; Spicer, V. Peptide retention standards and hydrophobicity indexes in reversed-phase high-performance liquid chromatography of peptides. Anal. Chem. 2009, 81 (22), 9522−9530.
(33) Pruitt, K. D.; Tatusova, T.; Brown, G. R.; Maglott, D. R. NCBI reference sequences (RefSeq): Current status, new features, and genome annotation policy. Nucleic Acids Res. 2012, 40, 130−135.
(34) Huson, L. W. Performance of some correlation coefficients when applied to zero-clustered data. J. Appl. Statistical Methods 2007, 6 (2), 530−536.
(35) Egertson, J. D.; Kuehn, A.; Merrihew, G. E.; Bateman, N. W.; MacLean, B. X.; Ting, Y. S.; Canterbury, J. D.; Marsh, D. M.; Kellmann, M.; Zabrouskov, V.; Wu, C. C.; MacCoss, M. J. Multiplexed MS/MS for improved data-independent acquisition. Nat. Methods 2013, 10 (8), 744−748.
(36) Peterson, A. C.; Russell, J. D.; Bailey, D. J.; Westphall, M. S.; Coon, J. J. Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol. Cell. Proteomics 2012, 11, 1475−1488.
(37) Vu, H.; Spicer, V.; Gotfrid, A.; Krokhin, O. V. A model for predicting slopes S in the basic equation for the linear-solvent-strength theory of peptide separation by reversed-phase high-performance liquid chromatography. J. Chromatogr., A 2010, 1217 (4), 489−497.
