Data Processing Has Major Impact on the Outcome ... - ACS Publications

Nov 19, 2014 - However, both techniques involve extensive data processing, which can be performed by a multitude of different software solutions. Anal...
1 downloads 9 Views 2MB Size
Article pubs.acs.org/jpr

Data Processing Has Major Impact on the Outcome of Quantitative Label-Free LC-MS Analysis Aakash Chawade,∥,† Marianne Sandin,∥,† Johan Teleman,†,‡ Johan Malmström,‡ and Fredrik Levander*,†,§ †

Department of Immunotechnology, Medicon Village, Lund University, Scheelevägen 2, S-223 63 Lund, Sweden Department of Clinical Sciences, Faculty of Medicine, Lund University, SE-221 84 Lund, Sweden § Bioinformatics Infrastructure for Life Sciences (BILS), Lund University, P.O. Box 117, 221 00 Lund, Sweden ‡

S Supporting Information *

ABSTRACT: High-throughput multiplexed protein quantification using mass spectrometry is steadily increasing in popularity, with the two major techniques being datadependent acquisition (DDA) and targeted acquisition using selected reaction monitoring (SRM). However, both techniques involve extensive data processing, which can be performed by a multitude of different software solutions. Analysis of quantitative LC-MS/MS data is mainly performed in three major steps: processing of raw data, normalization, and statistical analysis. To evaluate the impact of data processing steps, we developed two new benchmark data sets, one each for DDA and SRM, with samples consisting of a long-range dilution series of synthetic peptides spiked in a total cell protein digest. The generated data were processed by eight different software workflows and three postprocessing steps. The results show that the choice of the raw data processing software and the postprocessing steps play an important role in the final outcome. Also, the linear dynamic range of the DDA data could be extended by an order of magnitude through feature alignment and a charge state merging algorithm proposed here. Furthermore, the benchmark data sets are made publicly available for further benchmarking and software developments. KEYWORDS: label-free, quantification, proteomics, SRM, shotgun, targeted



INTRODUCTION The usage of quantitative label-free proteomics workflows has accelerated in recent years partly due to the improvements in mass spectrometry techniques. The major platforms can be considered to be LC-MS/MS shotgun proteomics and targeted LC-SRM, with upcoming variants using components from both, e.g. directed proteomics and data-independent acquisition.1 From a user perspective, shotgun proteomics usually requires less upfront method development, while targeted proteomics requires the design of SRM assays for the peptides to be measured.2 On the other hand, data processing is challenging for label-free shotgun proteomics and involves several processing steps for precursor-based quantification, including feature detection, feature identification, and retention time alignment.3,4 Processing of targeted proteomics data, on the other hand, is limited to detecting the correct peaks, and possibly removal of interference. Over the past decade, several software workflows aimed toward label-free analysis have been developed.3−5 The list of software includes commercial, freely available, modular, and nonmodular variants. The increasing number of software has necessitated the development of tools aimed at comparative evaluation of their output. Hoekman et al.6 developed msCompare to assess workflows based on freely available © 2014 American Chemical Society

modular software and showed that modular workflows based on different software outperform a consensus pipeline. Zhang et al.7 performed quantitative comparison of the commercial software Elucidator and Progenesis and proposed a set of measurement metrics useful for such comparison. Comparative evaluations have also been done for several peak picking algorithms,8,9 retention-time alignment algorithms,9−11 normalization methods,12−15 and statistical methods used for identification of biomarkers.16 It is thus evident that the selection of an optimal data processing workflow is important for retrieving relevant information from the data. From an experimentalist perspective, the relative severity of false negatives versus false positives is dependent on the nature of the investigation, and should guide the selection of a suitable workflow for the current question. While proteome coverage may be an important factor, it may be equally important to select a workflow that has a good chance of detecting a 2-fold quantitative difference in a low-abundant protein as statistically significant. But to allow for such a workflow selection, data processing procedures need to be characterized at the level of Received: June 29, 2014 Published: November 19, 2014 676

dx.doi.org/10.1021/pr500665j | J. Proteome Res. 2015, 14, 676−687

Journal of Proteome Research

Article

Table 1. Spike-in Peptides in the Dilution Seriesa Sample

Replicates

Potato Relative conc

Human Relative conc

Comparison

Potato FC

Human FC

1 2 3 4 5 6 7 8 9 10 11 12

4 7 7 7 4 4 4 4 4 4 4 4

1 0.33 0.12 0.1 0.033 0.01 0.0033 0.001 0.00033 0.0001 0.000033 0.00001

0.00001 0.000033 0.000067 0.0001 0.00033 0.001 0.0033 0.01 0.033 0.1 0.33 1

2 vs 1 3 vs 2 4 vs 3 5 vs 4 6 vs 5 7 vs 6 8 vs 7 9 vs 8 10 vs 9 11 vs 10 12 vs 11

3.33 2.75 1.20 3.03 3.30 3.03 3.30 3.03 3.30 3.03 3.30

3.30 2.03 1.49 3.30 3.03 3.30 3.03 3.30 3.03 3.30 3.03

a

Sample: Sample number in the dilution series; replicates, total number of technical injections; relative concentration, the concentration relative to that in sample 1 (Potato) or sample 12 (Human); FC, relative expected fold change in the two consecutive samples. The highest concentration in sample 1 or 12 corresponds to approx. 20 pmol/μL.

differential expression data evaluation, and to our knowledge, such characterization is missing. In the present work, two new benchmark data sets were generated from samples containing a stable background with 273 spiked-in synthetic peptides. The two data sets were acquired by analyzing the samples using both shotgun and targeted proteomics. The shotgun and the SRM data were then analyzed quantitatively by six and two different software workflows, respectively. The quantitative evaluation was performed on the output from each software solution but was also done after each of the three postprocessing steps, including normalization, charge state merging, and curation. The importance of the retention time alignment step in the shotgun data processing was also investigated. Finally, the two new benchmark data sets were made publicly available for further evaluation and comparisons.



MATERIALS AND METHODS

Preparation of the Benchmark Data

Synthetic crude tryptic peptides (SpikeTides proteotypic peptides) from potato (115 peptides) and human (158 peptides) protein sequences were obtained from JPT Peptide Technologies GmbH (Supporting Information File 1). The lyophilized peptides (∼50 nmol/well) were dissolved by adding 5 μL concentrated formic acid (JT Baker) to the wells. Thereafter, 5 μL LiChrosolv water (Merc Millipore) was added, followed by addition of 300 μL 30% acetonitrile and incubation for 5 min in a sonication bath. Peptides were pooled into one human and one potato pool by transferring 5 μL of 160 pmol/ μL of each peptide to an Eppendorf tube, lyophilized, and redissolved to approximately 20 pmol/μL in 0.1% formic acid. A background of stable and nonvariable peptides was obtained from Streptococcus pyogenes strain SF370.17 Synthetic peptides were spiked into the background at 12 different concentration points, ranging from approximately 20pmol/μL to 200 attomol/μL, as summarized in Table 1 and Figure 1. To reduce the pipetting related errors, the first four samples with the highest concentrations were spiked in individually, and the rest of the samples were spiked in three different subsets wherein each subset was prepared from the dilution of the earlier subset.

Figure 1. Overview of the acquired data sets and the different data processing platforms employed in the study. (A) Samples and replicate analyses; (b) Data processing workflows; (C) Processing steps.

LC-MS Data Acquisition

For both shotgun and targeted proteomics, LC separation was conducted on an Eksigent nanoLC system (Eksigent Technologies, Dublin CA). The following mobile phases were used: Solvent A 0.1% formic acid and Solvent B acetonitrile with 0.1% formic acid. For each run, 2 μL of the sample was injected at a constant flow of 300 nL/min with a gradient of 97% of solvent A from 0 to 5 min, 65% A at 95 min, and 10% A from 98 to 108 min. Blank samples with water injections were run in between each sample. Technical replicates were obtained by multiple sample injections in randomized order. The replicate injections are denominated 677

dx.doi.org/10.1021/pr500665j | J. Proteome Res. 2015, 14, 676−687

Journal of Proteome Research

Article

for the database searches in Andromeda,22 with default Orbitrap (ITMS) settings (first search precursor tolerance 20 ppm and main search tolerance 7 ppm). Other parameters were maxCharge”:7″, protease “Trypsin/P”, variableModifications: “Oxidation” and “Acetyl (Protein N-term)”, fixedModifications: “Carbamidomethyl”, Fragment ion tolerance: “0.5 Da” and deisotoping: “false”. Thus, the data output from MaxQuant was not normalized but was charge state deconvoluted in MaxQuant. During writing of this manuscript, newer versions of MaxQuant have become available. We reprocessed the raw data using MaxQuant 1.5, but since this produced very similar results, it was deemed unnecessary to redo the entire analysis. Proteios. For Proteios23 data processing, the raw data files were converted to mzML and MGF using ProteoWizard,24 following a previously described workflow.25 MS/MS identification was performed by combining the peptide-spectrum level search results from X!Tandem and Mascot. Search settings were 7 ppm precursor tolerance and 0.5 Da fragment tolerance, with fixed carbamidomethylation of cysteins and variable oxidation of methionines. Feature detection was performed using msInspect26 without any feature filtering and the features were matched to identifications from the MS/MS searches, that passed a false discovery rate of 0.01 at the peptide-spectrummatch level, with a tolerance of 0.005 Da and 0.2 min. A feature report was exported before further processing (MSIP_BA), and after alignment and propagation of identifications between runs (MSIP). Additionally, features were imported from the MaxQuant runs, and subjected to the same workflow. These were denoted MQP_BA and MQP, when exported before and after alignment and propagation, respectively. Thus, the data from MQP, MSIP, MSIP_BA and MQP_BA was not normalized and was not charge state deconvoluted in Proteios. Skyline. Skyline version 2.527 was used. For SRM method setup, a spectral library was created in Proteios and imported into Skyline, where the Pick Peptide Matching parameter was set to “Library and Filter”. In peptide settings, filter parameters were min length “8” and max length “25”. In transition settings, the parameters were: precursor charges: “2″, ion charges: “1″, ion types: “y,b” and “always add N-terminal to Proline”. For quantitation, the raw data files were imported into Skyline, and quantitation was performed automatically. Processed data from Skyline was exported without any manual curation. TotalArea calculated by Skyline was used for quantitative evaluation. The data from Skyline was not normalized in Skyline. Anubis. For Anubis,17 the raw data files were converted to mzML using ProteoWizard.24 The mzML files were processed with Anubis version 1.2. The default settings were q-value cutoff: “0.01”, null distribution size: “1000” and transition limit: “6”. The reference file for Anubis was created from unscheduled SRM runs of a sample containing all synthetic spiked-in peptides in the S. pyogenes background. The data from Anubis was not normalized in Anubis.

X_Y in the remainder of the manuscript, where X is the sample number and Y is the injection number. Shotgun LC-MS/MS analysis was performed online with an electrospray voltage of 1.9 kV using an LTQ Orbitrap XL ETD mass spectrometer (Thermo) with collision-induced fragmentation in the linear ion trap using top four data-dependent acquisition (DDA). MS1 spectra were acquired in the mass range 400−2000 at a resolution of 60,000 at m/z 400 in the orbitrap with a target value of 5.0 × 105 ions and maximum fill time of 500 ms. MS2 spectra were collected in the linear ion trap with a target ion value of 104 and maximum 100 ms fill time. Dynamic exclusion settings were: repeat count 2, repeat duration 20 s, exclusion list size 500, exclusion duration 120 s. Targeted acquisition by SRM was performed on a TSQ Quantum Vantage (Thermo-Fisher Scientific, Waltham MA) triple quadrupole instrument. TSQ cycle time was 2 s, Q1 and Q2 isolation window width was 0.7 Da, and the spray voltage was 1.75 kV. The gradient settings were as mentioned for the Orbitrap above. Up to four transitions were measured for each peptide, and the transition list targeted 78 human, 47 potato, and 29 S. pyogenes peptides in scheduled mode (Supporting Information File 1). The raw data files from the shotgun experiments and a fasta file with the spike-in peptides have been deposited to ProteomeXchange Consortium,18 via the PRIDE partner repository with the data set identifier PXD001091 and DOI 10.6019/PXD001091. The SRM data set and transition list have been deposited in PASSEL19 at http://www. peptideatlas.org/PASS/PASS00522. Software Workflows

The shotgun LC-MS/MS data was processed using three different software packages. For MS/MS identification a database consisting of the Streptococcus pyogenes MGAS1 proteome downloaded from UniProt and the spike-in synthetic peptides was used, totaling 2584 proteins. It was extended with an equal size decoy part (reversed proteins) for Mascot (http://www.matrixscience.com, version 2.3.01) and X!Tandem20 (version TORNADO 2008.12.01.1) searches. Progenesis. Progenesis version 4.1.4804.41555 (NonLinear Dynamics, UK) was used. Removal of areas was not done before alignment, and an intermediate dilution level was used as a reference. After manual inspection, sample 7_01 was selected as reference. Automatic alignment performed well for most files; however, four files, 2_04, 4_03, 10_03, and 9_04, were not automatically aligned. Manual landmarks were subsequently added (up to 20 per file) until alignment of the bulk of the peptides was considered satisfactory and followed by automatic alignment. Another four files, 4_06, 8_04, 12_03, and 12_04, showed automatic alignment scores below 50%. These alignments were deleted and the files processed as above, thereby increasing the alignment scores. However, the scores were still low for the files 12_03 (38.6%) and 12_04 (42.0%) despite manual vectors. Scores for the rest of the files ranged between 51.2% and 96.3%. Peak picking was performed with default values, and neither features nor peptides were subjected to any filtering. Data normalization was done within Progenesis. Mascot was used for peptide identification with the same settings as described for Proteios below. Thus, the data output from Progenesis was normalized in Progenesis but was not charge state deconvoluted. MaxQuant (MQ). MaxQuant21 version 1.3.0.5 was run with label-free quantification settings and the match-between-runs setting enabled. The custom database described above was used

Data Processing

Data processing prior to statistical analysis was performed in MATLAB version 8.2.0.701 (R2013b) (www.mathworks.org). Normalization was performed in Normalyzer.14 All subsequent analyses were performed on log-transformed data (natural logarithm). Charge State Merging. To facilitate comparison and evaluation of the processed shotgun data, an intensity-ratiobased charge state merging was implemented, as some software treat peptides at different charge states separately, and others 678

dx.doi.org/10.1021/pr500665j | J. Proteome Res. 2015, 14, 676−687

Journal of Proteome Research

Article

Figure 2. Three charge states of one peptide. For deconvolution purposes, the charge state with least missing values is selected as the state to which intensities are mapped and is referred to as the base peptide. It should be noted that the base peptide could be represented by different charge states for different peptides. For the first dilution level in the figure, there are two possibilities: the missing intensity could be estimated either from charge state 1 or charge state 2. The method is described in detail in the Supporting Information.

LOESS28 normalization for undirected data and local LOESS normalization for SRM data, where non-normalized data had the worst score for undirected data but average for the SRM data. In the Progenesis workflow, built-in normalization from Progenesis was used. Limit of Linearity Estimation. To determine the potential linear dynamic range of the LC-MS platforms, a data curation strategy was applied based on outlier removal and R 2 (coefficient of determination) curation as follows: a. Outlier Removal. As seen in Supporting Information Figure S2, outliers can be detrimental to the linearity estimation. To prevent such effects, an iterative outlier removal procedure similar to the one utilized for the CVs in the charge state merging step was implemented. It is well-known that one of the inherent properties of the log-transformation is to stabilize the variance across the intensity range, this however leads to inflated CVs for the low intensity peptides; therefore, an empirically derived standard deviation cutoff for each peptide dilution level was selected. By manually studying the standard deviation distribution for all levels for each software workflow, the limit was set to 1.5. Any dilution level with a standard deviation above this was subjected to a leave-one-out filtering process, where the removed data point which gave the largest improvement in R2 was labeled an outlier. This was repeated until the standard deviation fell within limits. b. R2 Curation and Limit of Linearity. To determine the linear dynamic range of the respective platforms, low intensity dilution levels were iteratively removed until the R2 value reached a certain threshold. If the selected threshold was not reached with 20% of the data remaining for a peptide, the best of the iteratively computed R2 values was selected. The coefficient of determination was not freely optimized, as this often led to peptides with only the topmost dilution levels remaining. The threshold was selected based on manual inspection of which R2 value would give the best overall linear fit to the data from a software workflow. It should be noted that the R2 threshold is a means to assess the most accurate linear range, and so its absolute value is less vital than how well the resulting linear fit represents the true dynamic range of intensities. We refer to the lowest dilution level in this linear range as the limit of linearity and data which has been processed in this manner as curated data.

deconvolute to one charge state. If a peptide was found in several charge states, the state with the least missing values was used as the base to which intensities from other charge states were mapped (see Figure 2 and Supporting Information for a detailed description of the procedure). First, a base charge state was established as described above and then the ratios between other charge states and the base charge state were computed individually for each peptide and replicate (Figure 2). The mean ratio between replicate samples and its CV (coefficient of variation) was then computed for each charge state combination. The mean ratio was subsequently used to impute missing values for the base charge state. To avoid erroneous imputations, a CV cutoff of 10% was used, so that ratios with a CV higher than the cutoff were considered outliers, and ratios that deviated most from the mean were removed until the CV fell within the limit or a single ratio value remained. If several ratios were available to impute a single intensity value, then the one with the lowest CV was used. Singleton ratios, i.e. without a CV, were not considered in this case, as they could be outliers. Finally, the missing base charge state intensity was computed by dividing the corresponding feature intensity of the selected charge state by the selected mean ratio. This procedure was performed on all Proteios and Progenesis data. Examples of merging can be seen in Supporting Information Figure S1. Normalization Evaluation. An optimal normalization method was chosen for each data set based on the report generated by Normalyzer. Further verification of the suitability of the normalization method was performed by utilizing the background peptides and evaluated after charge state merging. For each such peptide, the standard deviations between replicate injections for all associated dilution samples were calculated. The deviations were summed up, rather than using means or medians, as the sum allowed for better differentiation between the peptides, as well as taking the normalization results for all levels into account. The sum can however not be used for comparison between peptides, as they may be present in varying numbers of samples. Linear regression was subsequently performed for every peptide and the slope of the line determined, resulting in a representation of each peptide based on two values: the sum of the level-wise standard deviations and the absolute value of the slope. The product of the means for the distributions of these two values over the peptide sets was used as a normalization selection metric, where the normalization strategy with the minimum value was selected, resulting in surprisingly uniform results: global

Statistical Analysis

The typical quantitative proteomics investigation aims at finding molecular differences, and this is performed by applying statistical tests between samples from different biological 679

dx.doi.org/10.1021/pr500665j | J. Proteome Res. 2015, 14, 676−687

Journal of Proteome Research

Article

fold-change (FC) greater than 1.5-fold were considered to be statistically significant in pairwise comparisons. For comparison of samples 4 versus 3, statistical significance was based on the FDR threshold of