
Technical Note pubs.acs.org/ac

Expansion of the Ion Library for Mining SWATH-MS Data through Fractionation Proteomics

Jin Zi,† Shenyan Zhang,† Ruo Zhou,† Baojin Zhou,† Shaohang Xu,† Guixue Hou,† Fengji Tan,† Bo Wen,† Quanhui Wang,†,‡ Liang Lin,*,† and Siqi Liu*,†,‡

†Proteomics Division, BGI-Shenzhen, Beishan Industrial Zone, Yantian, Shenzhen, Guangdong 518083, China
‡CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No. 1, Beichen West Rd., Chaoyang District, Beijing 100101, China



Supporting Information

ABSTRACT: The strategy of sequential window acquisition of all theoretical fragment ion spectra (SWATH) is emerging in the field of label-free proteomics. A critical consideration for the processing of SWATH data is the quality of the ion library (or mass spectrometric reference map). Because few open spectral libraries are available for processing SWATH data, most users currently build their libraries in-house. Herein, we propose an approach to construct an expanded ion library using the data-dependent acquisition (DDA) data generated by fractionation proteomics. We identified three critical elements for achieving a satisfactory ion library during the iterative process of our ion library expansion: correction of the retention times (RTs) obtained from fractionation proteomics, appropriate integration of the fractionation proteomics data into an ion library, and assessment of the impact of the expanded ion libraries on data mining in SWATH. Using a bacterial lysate as an evaluation material, we employed sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) to fractionate the lysate proteins and constructed the expanded ion library from the fractionation proteomics data. Compared with the ion library built from unfractionated proteomics data, the expanded ion library extracted approximately 20% more peptides. Moreover, the extracted peptides were suitable for further quantitative analysis.

Data-independent acquisition (DIA) has emerged as an important technique for proteomic analysis because it does not require the selection of precursor ions from MS spectra for peptide fragmentation; instead, all ionized peptides within a sample are fragmented.1 This approach benefits from substantially increased MS/MS signals, which are advantageous for both qualitative and quantitative protein analysis. The strategy of sequential window acquisition of all theoretical fragment ion spectra (SWATH) is a novel DIA method that aims to complement traditional mass spectrometry-based proteomics techniques such as shotgun proteomics and multiple-reaction monitoring (MRM).2 Fundamentally, SWATH acquisition mode (SWATH-MS) records the MS/MS ions derived from the fragmentation of all peptide precursor ions within a predetermined mass-to-charge window at a given time.3 The composite MS/MS spectra generated by SWATH-MS are then assigned to the corresponding peptides or proteins by extraction against an ion library, with peak picking based on the mass-to-charge ratios of the precursor and fragment ions, the relative intensities of the MS signals, chromatographic co-elution, and other information.2 The ion library is therefore a critical component of SWATH. Two predominant approaches are used to construct an ion library: shotgun sequencing of either a suitable sample4 or chemically synthesized peptides.2 In most laboratories, building an ion library by coalescing replicate spectra of the same peptide into a consensus spectrum is a feasible and economical approach.4,5 An ideal ion library covers as many of the peptides in a sample as possible, so generating such a peptide-maximized library is a primary task for SWATH users. It is well accepted that fractionation coupled with LC-MS is an effective way to improve peptide identification.7 In SWATH, however, data-dependent acquisition (DDA) data from shotgun analysis of fractionated proteins or peptides have generally not been used to construct ion libraries, possibly because of technical barriers to integrating DDA data across different fractions.

Herein, we propose a method for the global correction of peptide retention times (RTs) using SWATH-MS data to improve the quality of ion libraries. We used a bacterium, T. tengcongensis (TTE), as an evaluation sample, fractionated the lysate proteins by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), and performed DDA analysis on each fraction. Using this data set, we corrected the RTs based on a single LC-MS run acquired in SWATH-MS mode and optimized the data processing for RT correction. With ion libraries constructed in this way, we obtained extended peptide coverage directly from a single SWATH-MS analysis. Using the expanded ion library, we performed relative quantification of two bacterial states by SWATH-MS and obtained results comparable to those of an iTRAQ analysis of the same two states.

Received: January 10, 2014 Accepted: June 26, 2014
© XXXX American Chemical Society

dx.doi.org/10.1021/ac501828a | Anal. Chem. XXXX, XXX, XXX−XXX

Analytical Chemistry

MATERIALS AND METHODS

Sample Preparation. TTE strain MB4T was cultured in complex MB medium at two temperatures, 55 or 75 °C, as described by Wang et al.8 The bacterial proteins were extracted from the TTE pellets using Chen’s method9 and were termed TTE55 and TTE75, respectively. The extracted proteins were either digested directly with trypsin after reduction and alkylation10 or separated by SDS-PAGE. The gel was stained with Coomassie blue, and each lane was sliced into 15 bands.11 Each slice was subjected to tryptic digestion as described in our previous report.12

LC-MS Analysis of Peptides. An Ekspert nanoLC 415 single gradient system (Eksigent, Redwood City, CA, USA) coupled with a C18 column (50 cm × 75 μm, Dionex, San Francisco, USA) was used to separate the peptides chromatographically. Solvent A was 0.1% formic acid in 5% acetonitrile/water, and solvent B was 0.1% formic acid in 80% acetonitrile. A linear gradient of 5% to 30% solvent B over 360 min was used for all chromatographic separations. A TripleTOF 5600 System (AB SCIEX, Concord, ON, Canada) fitted with a Nanospray III source (AB SCIEX, Concord, ON, Canada) and a pulled quartz emitter (New Objectives, Woburn, MA) was operated in information-dependent acquisition (IDA) or SWATH-MS mode. In IDA mode, 30 dependent scans, each with an accumulation time of 50 ms, were performed on precursors chosen from a 250 ms survey scan (TOF-MS). Ions in the MS scan with intensities >120 counts/s and charge states >1+ were selected for MS/MS. In SWATH-MS mode, a 100 ms survey scan (TOF-MS) was acquired first, followed by 32 SWATH scans with a 25 Da quadrupole isolation window, as described by Gillet et al.2

Database Search and Peak Extraction from SWATH-MS Data. ProteinPilot (V4.5.1) was used to identify peptides and proteins in the DDA samples. Searches were run against the NCBI database (NCBI reference sequence: NC_003869.1) with the Rapid ID search effort and Cys alkylation with iodoacetamide. PeakView (V 1.2.0.4) was used to integrate the group files from ProteinPilot with the wiff files from SWATH-MS using the following parameter settings: (1) only unique peptides from ProteinPilot identification were considered for quantification, (2) a 75 ppm m/z tolerance was set for the targeted transitions, (3) the transitions were picked within a 20 min RT window around the RT of a peptide in the ion library, (4) the confidence for peptide identification was set to 99%, and (5) peak group detection for export was set to a SWATH FDR of 1%.6

Expansion of Ion Library Based on RT Alignment of Different DDA Data. The construction of an expanded ion library from fractionation proteomics data was divided into four steps: (1) obtaining peptide RTs from SWATH-MS data (SWATH RTs), (2) obtaining peptide RTs from fractionated DDA data (DDA RTs), (3) correcting the DDA RTs against the SWATH RTs, and (4) generating an expanded ion library. Details are given in the Supporting Methods and Figure S1, Supporting Information.

Quantitative Analysis of Different Samples via SWATH. The SWATH-MS data of TTE55 and TTE75 were acquired on the TripleTOF 5600 in three technical replicates each. The six resulting SWATH-MS data files (wiff) were then annotated using the expanded ion library described above. The peak areas of each protein/peptide were extracted and quantified using QPROT (http://www.nesvilab.org/software.html). Proteins whose chromatographic peak areas differed by more than 1.5-fold between the two samples were defined as differentially expressed.

Statistical Analysis. To evaluate the improvements in the ion library with or without RT correction or library expansion, a one-way t test was applied to all the data sets (Tables S1, S2, and S3, Supporting Information). To evaluate the SWATH FDR of the peptides extracted from SWATH-MS data, a decoy ion library was constructed following a principle similar to that of ref 13 and then used for the SWATH FDR evaluation. To evaluate the variability of peptides and proteins, coefficients of variation (CVs) were calculated as CV = SD/mean × 100%.
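The CV formula and the 1.5-fold differential cutoff described above can be illustrated with a short Python sketch. It is not the authors' QPROT/PeakView workflow: the helper names are hypothetical, SD is assumed to be the sample standard deviation (n − 1 denominator), and the fold change is assumed to be computed on mean peak areas over the technical replicates and applied in either direction, since the text does not specify these details.

```python
from statistics import mean, stdev
from typing import Sequence


def cv_percent(areas: Sequence[float]) -> float:
    """CV = SD / mean x 100%.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(areas) / mean(areas)


def is_differential(areas_a: Sequence[float], areas_b: Sequence[float],
                    threshold: float = 1.5) -> bool:
    """Hypothetical helper: True if the mean peak areas of two samples
    differ by more than `threshold`-fold, up or down."""
    ratio = mean(areas_a) / mean(areas_b)
    return ratio > threshold or ratio < 1.0 / threshold


if __name__ == "__main__":
    # Made-up peak areas for one protein, three technical replicates per sample.
    tte55 = [1.0e6, 1.1e6, 0.9e6]
    tte75 = [2.0e6, 2.2e6, 1.8e6]
    print(cv_percent(tte55))              # -> 10.0
    print(is_differential(tte55, tte75))  # -> True (2-fold change)
```

With a 1.5-fold cutoff applied in both directions, a protein whose mean area doubles or halves is flagged, whereas a 1.4-fold change is not.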



RESULTS AND DISCUSSION

Construction of Ion Library Based on DDA Analysis of Fractionation Proteomics. RT correction is generally considered an important step for peptide extraction in SWATH, but it is not often performed during the construction of ion libraries. When an ion library is built from different sources of DDA data, however, the RT is assumed to be a reasonable and feasible link between those sources. In this study, we set up a pipeline to construct an expanded ion library based on fractionation proteomics data. We identified three principal factors that were important for RT correction and library use in SWATH analysis: first, how to combine the data from multiple DDA analyses; second, the optimal number of SWATH peptides to use for aligning the DDA data; and finally, the optimal size of the ion library for extracting peak groups from SWATH-MS data. The integration of the RT information was investigated using two strategies. In strategy 1, as presented in Figure S2, Supporting Information, we combined the 15 raw DDA search results into a single library and then performed RT correction against the SWATH-MS data; the resulting corrected library was termed “all_corrected_lib”. In strategy 2, the DDA result of each individual fraction was aligned against the SWATH-MS data to generate 15 corrected libraries, which were then merged into a combined ion library termed “individual_corrected_lib”. Both libraries were used to extract data from SWATH experiments. Using “individual_corrected_lib” instead of “all_corrected_lib”, we extracted over 20% more peptides: 11 988 versus 9740.
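The difference between the two integration strategies can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' program (the Methods defer those details to the Supporting Information): RTs are held in plain dictionaries keyed by peptide sequence, the DDA-to-SWATH mapping is assumed to be an ordinary least-squares line fitted only on peptides present in both the DDA data and the FDR-filtered SWATH extraction, and a peptide found in several fractions keeps the RT from the first fraction that lists it. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

RTMap = Dict[str, float]  # peptide sequence -> retention time (min)


def fit_rt(dda: RTMap, swath: RTMap) -> Tuple[float, float]:
    """Least-squares fit of SWATH RT (y) on DDA RT (x) over peptides found in both."""
    shared = [p for p in dda if p in swath]
    if len(shared) < 2:
        raise ValueError("need at least two shared peptides for a linear fit")
    xs = [dda[p] for p in shared]
    ys = [swath[p] for p in shared]
    n = len(shared)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("DDA RTs of the shared peptides are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct(dda: RTMap, slope: float, intercept: float) -> RTMap:
    """Map every DDA RT onto the SWATH RT scale: y = slope * x + intercept."""
    return {p: slope * rt + intercept for p, rt in dda.items()}


def merge_first(libs: List[RTMap]) -> RTMap:
    """Union of libraries; a peptide keeps the RT of the first library listing it (assumption)."""
    merged: RTMap = {}
    for lib in libs:
        for p, rt in lib.items():
            merged.setdefault(p, rt)
    return merged


def all_corrected_lib(fractions: List[RTMap], swath: RTMap) -> RTMap:
    """Strategy 1: pool the raw DDA RTs of all fractions, then fit one regression."""
    pooled = merge_first(fractions)
    return correct(pooled, *fit_rt(pooled, swath))


def individual_corrected_lib(fractions: List[RTMap], swath: RTMap) -> RTMap:
    """Strategy 2: fit and correct each fraction separately, then merge."""
    return merge_first([correct(f, *fit_rt(f, swath)) for f in fractions])


if __name__ == "__main__":
    # Toy data (made up): SWATH RTs of peptides passing the SWATH FDR filter.
    swath = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}
    # Each fraction was run separately, so each has its own RT offset.
    frac1 = {"A": 15.0, "B": 25.0, "E": 45.0}  # DDA RT = SWATH RT + 5
    frac2 = {"C": 27.0, "D": 37.0, "F": 47.0}  # DDA RT = SWATH RT - 3
    print(individual_corrected_lib([frac1, frac2], swath))
    # -> {'A': 10.0, 'B': 20.0, 'E': 40.0, 'C': 30.0, 'D': 40.0, 'F': 50.0}
    print(all_corrected_lib([frac1, frac2], swath))
    # Linear correction reported for the example peptide in Figure S3.
    print(round(correct({"pep": 217.14}, 0.89, 12.27)["pep"], 2))  # -> 205.52
```

In the toy data each fraction carries its own RT offset, so the per-fraction fits map every SWATH-matched peptide back onto its SWATH RT exactly, whereas the single pooled fit must compromise between the two offsets and leaves residuals of several minutes. This is one plausible reason the per-fraction strategy extracted more peptides. The last line applies the regression y = 0.89x + 12.27 reported for the Figure S3 peptide.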
This increase in identified peptides translated into a 5% increase in identified proteins: 1547 proteins from “individual_corrected_lib” compared with 1474 from “all_corrected_lib”. Figure S3, Supporting Information, shows an example of how RT correction can increase the number of detected peptides. In that example, the DDA RT of the peptide in the raw ion library is 217.14 min, whereas its SWATH RT is 207.46 min; without RT correction, the SWATH FDR for the peak extraction is therefore 0.013. After its DDA RT is corrected against the SWATH RTs using the linear regression y = 0.89x + 12.27, the RT becomes 205.52 min (0.89 × 217.14 + 12.27 ≈ 205.52), much closer to the SWATH RT. The SWATH FDR for the peak extraction thus improves to 0.010, an acceptable FDR value for peptide identification. We used the recommended 1% SWATH FDR cutoff for SWATH peak group detection when aligning the individual libraries above. However, the more peptides available to align the RTs, the higher the confidence in the regression analysis. We therefore investigated the use of different SWATH FDR cutoff values for RT correction. For fraction 2, two sets of extracted peptides were exported from SWATH-MS: Set 1 with a SWATH FDR threshold of 100% and Set 2 with a SWATH FDR threshold of 1%. As shown in Figure 1A, the RT correlation coefficient between the DDA […]

results demonstrated that the ion library built on an RT correction with better regression benefited peptide extraction from SWATH-MS data. It is well-known that the size of the FASTA file can affect FDR results in database searching; however, it is currently unknown whether the size of an ion library affects the true positive rate in SWATH extraction. To assess this effect, we generated ion libraries filtered at different peptide identification confidences: “conf99_lib”, “conf80_lib”, “conf30_lib”, and “noconf_lib”, corresponding to confidences of 99%, 80%, and 30% and to all peptide matches, respectively. After extraction with each library, the numbers of peptides and proteins extracted at 1% SWATH FDR were determined; they are plotted in Figure 1C,D. The dotted lines show that the extraction efficiency (extracted peptides/total peptides in the ion library) decreased as the size of the ion library increased. The bars in Figure 1C,D show that the ion library with 99% identification confidence extracted the greatest numbers of peptides and proteins. Altogether, the 99% confidence library not only extracted more proteins and peptides but also had a better extraction efficiency: 25% of its peptides were utilized, whereas only 5% of the peptides were utilized without a confidence cutoff.

Extracting peptides from SWATH-MS data, which contain both MS and chromatographic information, depends on a satisfactory ion library. The RT correction that links SWATH RTs and DDA RTs is therefore a key step in generating a high-quality ion library. Samples are often spiked with exogenous peptides to perform RT correction. For instance, the iRT kit developed by Escher et al.14 has previously been applied in a SWATH study.3 RT correction with exogenous peptides, however, is not always practical, because the MS signals of these peptides can either affect or be affected by those of endogenous peptides. RT correction with endogenous peptides avoids such bias, and the PeakView software (AB SCIEX) is commonly used for endogenous RT correction. In practice, however, especially when constructing ion libraries for fractionation proteomics, PeakView has two technical bottlenecks: the peptides used for RT correction must be picked manually, which is not efficient enough for large-scale data processing, and it has difficulty handling the large size of an expanded ion library. Our approach provides a feasible solution to both obstacles: RT regression with global peptides filtered by SWATH FDR can be implemented in an automatic program, and restricting the ion library to confident identifications keeps its size manageable and can improve the running efficiency of PeakView.

Evaluation of the Expanded Ion Library Achieved by Fractionation Proteomics. The TTE55 lysate proteins were divided into two parts. One part was digested directly with trypsin, and its peptides were identified by DDA or SWATH-MS; the other was separated by SDS-PAGE, and the gel lane was sliced into 15 bands, followed by in-gel tryptic digestion and peptide identification in DDA mode. Two ion libraries were constructed following the approach established above: an unexpanded ion library from the unfractionated TTE lysate, “unfrac_lib”, and an expanded ion library from the SDS-PAGE fractionation proteomics data, “frac_lib”. A total of 11 988 peptides and 1389 proteins were extracted using “frac_lib”, while 10 122 peptides

Figure 1. (A) Scatter plot of ion library RTs and SWATH RTs. (B) Scatter plot of ion DDA RTs and SWATH RTs with SWATH FDR