Technical Note pubs.acs.org/jpr
Integral Quantification Accuracy Estimation for Reporter Ion-based Quantitative Proteomics (iQuARI)
Marc Vaudel,†,‡ Julia M. Burkhart,†,‡ Sonja Radau,† René P. Zahedi,† Lennart Martens,*,§,∥ and Albert Sickmann†,⊥
† Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
§ Department of Medical Protein Research, VIB, Ghent, Belgium
∥ Department of Biochemistry, Ghent University, Ghent, Belgium
⊥ Medizinisches Proteom-Center (MPC), Ruhr-Universität, Bochum, Germany
S Supporting Information *
ABSTRACT: With the increasing popularity of comparative studies of complex proteomes, reporter ion-based quantification methods such as iTRAQ and TMT have become commonplace in biological studies. Their appeal derives from simple multiplexing and quantification of several samples at reasonable cost. This advantage, however, comes with a known shortcoming: precursors of different species can interfere, thus reducing the quantification accuracy. Recently, two methods were brought to the community that reduce the amount of interference via novel experimental design. Before considering setting up a new workflow, tuning the system, optimizing identification and quantification rates, etc., one legitimately asks: is it really worth the effort, time and money? The question is actually not easy to answer since the interference is heavily sample and system dependent. Moreover, there was to date no method allowing the inline estimation of error rates for reporter quantification. We therefore introduce a method called iQuARI to compute false discovery rates for reporter ion based quantification experiments as easily as Target/Decoy FDR for identification. With it, the scientist can accurately estimate the amount of interference in his sample on his system and eventually consider removing shadows subsequently, a task for which reporter ion quantification might not be the solution of choice.
KEYWORDS: false discovery rate, quantification, iTRAQ, TMT
■
This issue has already been discussed in the literature4−12 and recently two methods6,10 were proposed alleviating the coisolation of fragment ions: (A) perform an additional round of fragmentation yielding MS3 reporter ions or (B) perform a gas phase fractionation by charge reduction. It is important to note that (A) is not compatible with every protease and especially not with trypsin which is the most used protease in proteomics. Moreover, both methods require advanced experimental design and data processing which is in strong contrast with the simplicity of reporter ion methods. It is thus legitimate for the scientist to verify that the gain on his sample on his system is worth the effort, time and money. In the present work, we therefore describe iQuARI, the first method that allows integral quantification accuracy estimation within each experiment. It is based on the use of a suitable decoy sample in one of the multiplexed channels of the experiment in analogy with target/decoy database search strategies for peptide and protein identification error estimation.13 Briefly, one of the isobaric tags is dedicated to the labeling of a decoy sample as illustrated in Figure 1. Similar to a decoy database for
INTRODUCTION
Reporter ion-based quantification methods gained popularity in the past decade because of their simple setup and reasonable cost. The two most commonly encountered examples are iTRAQ,1 allowing for multiplexing of up to eight samples, and TMT,2 allowing for multiplexing of up to six samples. In these quantification methods, peptides from different samples are chemically labeled with different isobaric tags. Each flavor of the attached labels presents similar physicochemical properties and is isobaric with the others by balancing light and heavy isotopes, making differently labeled peptides indistinguishable along the workflow as long as they stay intact. Differentiation of the labeled peptide forms occurs upon fragmentation, when the labels fragment and release so-called reporter ions at defined and flavor-specific mass-to-charge ratios (m/z). The multiplexed samples thus follow the same experimental workflow from labeling to final data processing, where the relative amounts of each sample are deduced from the reporter ion peaks. However, this quantification method suffers from ion interference, occurring when the precursor isolation width of the mass spectrometer allows coisolation of unrelated ions within the respective m/z-range coeluting with the peptide of interest.3 The obtained reporter ion signals are then derived from both the peptide of interest and coselected contaminants.
© 2012 American Chemical Society
Received: March 13, 2012 Published: August 8, 2012
dx.doi.org/10.1021/pr300247u | J. Proteome Res. 2012, 11, 5072−5080
Journal of Proteome Research
Technical Note
In the present work, we demonstrate the suitability of the Pyrococcus furiosus proteome for the error rate control of human samples and by extension Eukaryota samples.14 With a proof of principle experiment, we demonstrate the efficiency of the method, revealing substantial interference between the target and decoy samples. However, we also demonstrate that the error becomes marginal after a simple fractionation of the samples, thus questioning whether the interference reduction methods actually reach the threshold of profitability.
■
MATERIALS AND METHODS
Material
Ammonium bicarbonate (NH4HCO3), iodoacetamide (IAA), urea, trypsin, guanidinium hydrochloride (Gu-HCl), trifluoroethanol (TFE), and the ampholyte (pH 3−10) were purchased from Sigma-Aldrich, Steinheim, Germany. Trichloroacetic acid (TCA) was obtained from Roth, Karlsruhe, Germany. Sodium dihydrogen phosphate (NaH2PO4), calcium chloride (CaCl2), and magnesium chloride (MgCl2) were purchased from Merck KGaA, Darmstadt, Germany, and Benzonase from Merck Millipore, Germany. Tris base was acquired from AppliChem, Darmstadt, Germany. DTT was bought from Roche Diagnostics, Mannheim, Germany. The bicinchoninic acid (BCA) assay was acquired from Pierce Thermo Fisher Scientific, Schwerte, Germany, and Spec C18AR tips as well as the Complex Proteomics Standard (representing the proteome of Pyrococcus furiosus, pfu) and the DryStrip Cover Fluid for the IEF from Agilent Technologies, Darmstadt, Germany. The Immobiline DryStrips (pH 3−10) were bought from GE Healthcare, Munich, Germany. Nanosep centrifugal devices were acquired from PALL, Washington. The 4-plex iTRAQ reagent kit was obtained from Applied Biosystems, Foster City, California. All chemicals for ultrapure HPLC solvents such as formic acid (FA), trifluoroacetic acid (TFA), and acetonitrile (ACN) were obtained from Biosolve, Valkenswaard, The Netherlands.
Figure 1. Quantification False Discovery Rate estimation using iQuARI. One isobaric tag is used to label a decoy sample (here m/z 117) which is multiplexed with equal amounts of target samples (here m/z 114, 115 and 116). Since the target and decoy samples do not share any peptide, it is possible to assign identified MS/MS spectra either to the target or the decoy samples. Whereas target spectra contain the quantitative information, decoy spectra contain a measure of the interference which is used to model the prevalence of contaminated ratios in target spectra. The interference of target reporter ions in decoy spectra (derived from normalized intensities of reporter ions m/z 114, 115 and 116) thus provides a straightforward estimation of the quantification False Discovery Rate.
identification, this decoy sample has to fulfill two necessary and sufficient conditions:13
1. Orthogonality: the decoy sample must not contain any shared peptides with the target samples.
2. Similarity: the decoy sample must present the same quantification properties (intensity range, complexity, LC−MS behavior, etc.) as the target samples.
Condition 1 allows distinguishing MS/MS spectra obtained from decoy sample peptides (decoy spectra) during the identification process. It is vital here to exclude all shared peptides. For example, approximately 4% of the yeast observable tryptic peptides are shared with human, yielding a setup in which every 25th peptide would be unsuited for error rate estimation. Condition 2 then ensures that the target reporter ions observed in decoy spectra will not be biased toward species: the interference found in the spectra will be the same for both target and decoy samples. Since the decoy spectra should only contain the decoy sample specific reporter ions, the simultaneous occurrence of target sample reporter ions is thus a straightforward measure of ion interference from target peptides. Indeed, the target channels will contain signal plus noise in target spectra but should in principle contain only noise in decoy spectra. Here again, similar LC−MS behavior and symmetrical experimental design between target and decoy samples was not achieved by any of the above-mentioned studies.
Experimental Procedure
Three experiments were conducted as listed in Table 1: (1) a symmetric proof of principle experiment where two iTRAQ channels (114 and 115) are used to label the pfu standard and the two others (116 and 117) to label HeLa cells; (2) a symmetric proof of principle experiment where two iTRAQ channels (116 and 117) are used to label the pfu standard and the two others (114 and 115) to label human platelets; (3) an illustrative realistic experiment where three IEF fractions of human platelets are multiplexed with the pfu standard and analyzed in duplicates under different experimental conditions: gradient length of 95 or 185 min and precursor isolation width of 1 or 2 m/z.
Pyrococcus furiosus Standard
Pyrococcus furiosus (Pfu) was treated according to the manufacturer’s instructions. Briefly, an aliquot of 100 μg dissolved in 2 M GuHCl 50 mM Na2HPO4 was precipitated with TCA and subsequently dissolved in 50 mM NH4HCO3, 4 mM DTT and 50% TFE. Disulfide bonds were reduced for 60 min at 56 °C and afterward free sulfhydryl groups were carbamidomethylated using 15 mM IAA for 60 min at room temperature in the dark. For digestion, TFE was reduced to a final concentration of 5% with 50 mM NH4HCO3 and trypsin was added in a protease to protein ratio of 1:30 and incubated at 37 °C overnight.
a (1) A proof of principle experiment was conducted using two iTRAQ channels for the Pyrococcus furiosus (pfu) standard and the two others for a HeLa cell lysate. The symmetry of this experiment allows us to demonstrate the resemblance of quantitative results obtained on both proteomes. (2) A similar experiment was conducted, this time with a human platelet sample. (3) Multiplexing three IEF fractions together with the pfu standard, we could show how to estimate quantitative error on any given experiment. Three experimental settings were benchmarked here: (a) gradient length of 95 min and precursor isolation width of 2 m/z; (b) gradient length of 185 min and precursor isolation width of 2 m/z; (c) gradient length of 185 min and precursor isolation width of 1 m/z.
HeLa
HeLa cells were lysed in 100 μL of lysis buffer consisting of 150 mM NaCl, 50 mM Tris, 1% SDS, pH 8.5, and protein concentration was determined via BCA assay. MgCl2 was added to a final concentration of 2 mM and the solution was further incubated with 50 units of Benzonase at 37 °C for 25 min. Disulfide bonds were reduced using 10 mM DTT for 30 min at 56 °C and afterward free sulfhydryl groups were carbamidomethylated using 30 mM IAA for 30 min at room temperature in the dark. Digestion was performed using a modified FASP protocol:15 samples were transferred to ultrafiltration units with a nominal molecular weight cutoff of 30000 (PALL) and rinsed thrice via centrifugation using 100 μL of 50 mM NH4HCO3. For trypsin digestion, 100 μL of 50 mM NH4HCO3 was added, and ACN and CaCl2 were further added to final concentrations of 5% and 1 mM, respectively. Trypsin was added in a protease to protein ratio of 1:30 and incubated at 37 °C for 12 h.
Platelets
Human platelets from apheresis concentrates were isolated and purified according to Moebius et al.16 Platelets were resuspended and lysed in 400 μL of 8 M urea, 50 mM Tris, pH 8.5. Protein concentration was determined using the BCA assay. Disulfide bonds were reduced with 10 mM DTT for 30 min at 56 °C and afterward free sulfhydryl groups were carbamidomethylated using 30 mM IAA for 30 min at room temperature in the dark. For digestion, urea was diluted to a concentration of 1 M using 50 mM NH4HCO3. ACN and CaCl2 were added to final concentrations of 5% and 1 mM, respectively. Trypsin was added in a protease to protein ratio of 1:30 and incubated at 37 °C for 12 h.
Digest Performance
Digests were controlled using monolithic column separation (PepSwift monolithic PS-DVB PL-CAP200 p.m., Dionex) on an inert Ultimate 3000 HPLC (Dionex, Germering, Germany) as described previously.17 Although the efficiency of tryptic digest of Pyrococcus furiosus is controversial,18 previous work14 on the same standard demonstrated a large number of identified peptides and proteins, a finding further supported by the results presented here (see Figure 2A and online data sets).
Desalting
Samples were desalted by solid phase extraction using Spec C18AR tips (Agilent, Germany) according to the manufacturer’s instructions.
IEF
Isoelectric focusing (IEF) was accomplished using an Agilent 3100 OFFGEL Fractionator (Agilent, Germany). For separation, Immobiline DryStrips (GE Healthcare), pH 3−10, of 24 cm length were used as described previously.19 Briefly, IPG strips were treated with rehydration solution containing 12% glycerol and ampholyte, pH 3−10, for 15 min. Peptides were diluted in rehydration solution. Each well was loaded with 150 μL of the peptide solution. Subsequently, the Agilent OFFGEL Fractionator was prepared with Immobiline DryStrip Cover Fluid (GE Healthcare) and operated to focus 24 fractions with a maximum current of 50 μA and power of 200 mW. The potential rose over 12 h from 300 V to 8000 V and was subsequently held at 8000 V until 64 kVh was reached. Finally, fractions were desalted, evaporated, and prepared for LC−MS/MS analysis. As necessary for the design of experiment 3, all IEF fractions were measured in individual runs to test the
Table 1. Summary of the Experiments Conducted^a

experiment | description | labeling strategy
1 | Proof of principle HeLa | (114) pfu; (115) pfu; (116) HeLa; (117) HeLa
2 | Proof of principle platelets | (114) pfu; (115) pfu; (116) human platelet; (117) human platelet
3 | Platelet multiplexed fractions: (a) 95 min, 2 m/z; (b) 185 min, 2 m/z; (c) 185 min, 1 m/z | (114) human platelet IEF fraction 3; (115) human platelet IEF fraction 9; (116) human platelet IEF fraction 20; (117) Pfu
Figure 2. (A) When using the Pyrococcus furiosus (pfu) proteome as decoy sample for Homo sapiens target samples, decoy spectra are measured at a high sampling rate on the whole m/z range over the entire LC gradient as shown here by a mapping of the precursor of MS/MS spectra identified at 1% FDR for both species. (B) We labeled the pfu standard with labels 114 and 115 and a total HeLa cell lysate with labels 116 and 117. The expected ideal ratios are thus 1:1:0:0 in pfu spectra and 0:0:1:1 in HeLa spectra. (C) Due to interference during precursor ion isolation, in reality intensities from pfu labels will be measured in human (target) spectra and vice versa. The human (target) ratios measured in a pfu (decoy) spectrum can then be used for estimating the quantification error if the interference is symmetrical between target and decoy species. (D) Distribution of interferences of pfu intensities in human spectra and human intensities in pfu spectra are hardly distinguishable. Species thus present similar quantification properties, and interference in target spectra can be estimated from interference in decoy spectra. (E) Interference level was monitored for HeLa cells (experiment 1), entire human platelet lysate (experiment 2), and three different IEF fractions of a human platelet sample (experiment 3b). As can be intuitively expected, the interference level decreased with sample complexity. While interference posed a serious threat for accurate quantification in very complex samples, it became negligible after a single additional fractionation step. (F) In the HeLa spectra obtained in experiment 1, the interference level increases as the human deisotoped reporter intensity level increases, as illustrated by the share of spectra with less than 20% interference (green), 20% to 80% interference (orange) and >80% interference (red). It is thus possible to filter the result at a desired level of quality by intensity level(see text). 
(G) Interference levels were monitored for three different IEF fractions of a human platelet sample under various experimental conditions. The effect of the experimental conditions is thus quantitatively benchmarked in line with the quantified sample. As expected from the literature, increasing the LC−MS gradient length from 95 min (experiment 3a, dark blue) to 185 min (experiment 3b, blue) reduced the interference level by reduction of coelution. The same effect was observed for the reduction of the precursor isolation width from 2 m/z to 1 m/z (experiment 3c, light blue), yet at a high loss in identification rate (see text). Using the iQuARI method, it is thus possible to estimate unbiased error rates for every experiment and to optimize the experimental workflow for optimal quantification results.
fragment ion mass tolerance of 0.02 Da. Carbamidomethylation of Cys and itraq114 on the N-terminus and Lys were set as fixed modifications, and itraq114 on Tyr as well as oxidation of Met as variable modifications. All other settings were kept at the default values of SearchGUI. All spectra were searched against concatenated Forward/Reverse Target/Decoy databases13 generated using SearchGUI based on the following target databases: (1) the human complement of the UniProtKB/Swiss-Prot database26 (downloaded on the fourth of November 2010, containing 20260 target sequences), (2) a Pyrococcus furiosus database, obtained from UniProt using the corresponding taxonomy (downloaded on the 11th of February 2011, containing 2087 target sequences). Generally, spectra matching peptide sequences that were shared between target and decoy databases were omitted, as well as peptides with less than 6 or more than 30 amino acids.
identification rate of every fraction (see Supplementary Figure 1, Supporting Information).
iTRAQ Labeling
Three experiments required iTRAQ labeling, two proof of principle experiments and one illustrative realistic experiment. In experiment 1, two iTRAQ channels (114 and 115) are used to label replicates of the pfu standard and the two others (116 and 117) are used to label duplicates of a HeLa sample. Upon desalting, twice 20 μg of Hela peptides and twice 20 μg of pfu digest were lyophilized and resuspended in 0.5 M TEAB before treatment with the iTRAQ label (Applied Biosystems, Germany). In experiment 2, two iTRAQ channels (114 and 115) are used to label a human platelet sample (5 μg each) and the two others (116 and 117) are used to label replicates of the pfu standard (5 μg each) using the same procedure. For the realization of an everyday life experiment (experiment 3), we needed three different proteomes presenting qualitative and quantitative differences. We thus performed an IEF fractionation of human platelets used for the proof of principle experiment (experiment 2). As illustrated in Supplementary Figure 1, fractions 2−22 presented more than 1000 peptide to spectrum matches at 1% False Discovery Rate (FDR) when analyzed separately on an LTQ-Orbitrap XL using a 50 min gradient. Thus, we retained IEF fractions 3, 9, and 20 as they all provide a high amount of identifications while presenting differences in peptide composition. Approximately 5 μg of pfu as well as 5 μg of the platelet IEF fractions 3, 9, and 20 were iTRAQ labeled. iTRAQ labeling was here again performed according to the manufacturer’s instructions. As detailed in Table 1, labels 114, 115, and 116 were used for the platelet fractions whereas 117 was dedicated to the Pfu sample.
MS Analysis
Measurements were performed on an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) coupled to an Ultimate 3000 Rapid Separation Liquid Chromatography (RSLC) system (Dionex, Germering, Germany). Briefly, peptides were preconcentrated on a 100 μm ID reversed-phase (RP) trapping column (Acclaim PepMap RSLC 100 μm × 2 cm, 3 μm particle size, 100 Å pore size, Dionex) in 0.1% TFA followed by separation on a 75 μm ID RP column (Acclaim PepMap RSLC 75 μm × 25 cm, 2 μm particle size, 100 Å pore size, Dionex) using a binary gradient (solvent A, 0.1% FA and solvent B, 0.1% FA 84% ACN) ranging from 5 to 50% of solvent B at a flow rate of 300 nL/min in 90 min (experiment 3a) or 180 min (experiment 1, 2, and 3b and c). MS survey scans were acquired in the range of 300 to 2000 m/z at a resolution of 30,000 using the polysiloxane m/z 371.101236 as lock mass.20 The five most intense signals were subjected to HCD-MS/MS taking into account a dynamic exclusion of 10 s. Orbitrap AGC target values were set to 10^6 for MS and 2 × 10^5 for MSn. HCD spectra were acquired with a normalized CE of 45%, a default charge state of 7 and an activation time of 0.1 ms with a resolution of 7,500.
First Ranking Peptide
OMSSA often provides two possible best ranking hits. Usually the difference in sequence is a switch between Leucine and Isoleucine. In order not to bias our results and to ensure reproducibility, we selected the peptide belonging to the protein presenting the highest amount of identified spectra without thresholding. When considering Peptide to Spectrum Matches (PSMs) we will thus only refer to this best ranking hit.
FDR estimation
OMSSA PSMs were filtered at 1% False Discovery Rate (FDR), meaning that an estimated amount of 1% random hits are included in the presented results, as estimated via the use of concatenated Target/Decoy databases.13 Post-Translational Modification (PTM) location errors as well as close but wrong hits are here not considered as false positives since this cannot be assessed by target/decoy searches.27,28 The term false positive thus strictly refers to random hits. PSMs were sorted using the OMSSA e-value. The number of target false positives with an e-value smaller than α, NFP(α), is estimated by counting the number of decoy hits with an e-value smaller than α, ND(α). With NT(α) the amount of target hits with an e-value smaller than α, the FDR at e-value α can thus be estimated as follows:29

$$\mathrm{FDR}(\alpha) = \frac{N_{FP}(\alpha)}{N_{T}(\alpha)} \tag{1}$$

$$\widehat{\mathrm{FDR}}(\alpha) = \frac{\hat{N}_{FP}(\alpha)}{N_{T}(\alpha)} = \frac{N_{D}(\alpha)}{N_{T}(\alpha)} \tag{2}$$
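The target/decoy FDR estimate can be illustrated with a minimal Python sketch. It assumes that each PSM is available as a hypothetical (e-value, is_decoy) pair; the function names and the way the e-value cut-off is scanned are illustrative assumptions and are not taken from OMSSA or SearchGUI.

```python
def estimate_fdr(psms, alpha):
    """Estimate FDR(alpha) = N_D(alpha) / N_T(alpha).

    psms: iterable of (e_value, is_decoy) pairs (hypothetical input format).
    Returns None when no target hit has an e-value smaller than alpha.
    """
    n_target = sum(1 for e, is_decoy in psms if e < alpha and not is_decoy)
    n_decoy = sum(1 for e, is_decoy in psms if e < alpha and is_decoy)
    if n_target == 0:
        return None
    return n_decoy / n_target


def evalue_threshold(psms, max_fdr=0.01):
    """Largest e-value that can be accepted while the estimated FDR stays
    at or below max_fdr. Here the PSM at the returned e-value is itself
    accepted, and ties between e-values are not treated specially
    (both are simplifying assumptions of this sketch)."""
    n_target = n_decoy = 0
    best = None
    for e_value, is_decoy in sorted(psms, key=lambda p: p[0]):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target and n_decoy / n_target <= max_fdr:
            best = e_value
    return best


if __name__ == "__main__":
    toy = [(1e-9, False), (1e-8, False), (1e-7, True), (1e-6, False)]
    print(estimate_fdr(toy, 1e-5))
    print(evalue_threshold(toy, 0.01))
```

Scanning the sorted list once keeps the threshold search linear after sorting, which is convenient when the same PSM list is filtered at several FDR levels.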
Data Availability
Data sets of experiments 1, 2, and 3 were uploaded to ProteomeXchange (http://www.proteomexchange.org, accession PXD000013) and can be inspected using PRIDE Inspector.30 PRIDE31 accession numbers of the various data sets are listed in Supplementary Table 1, Supporting Information.
iTRAQ Quantification
Reporter ion peaks were searched in spectra with a mass tolerance of 0.01 Da. When two or more candidate peaks were found, the one closest to the theoretical mass was chosen. Once extracted, the intensities were deisotoped by applying the inverse of the isotope matrix.32
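The two steps described above (closest-peak selection within 0.01 Da, then deisotoping) can be sketched in Python as follows. The reporter m/z values are approximate and the isotope matrix is entirely hypothetical; in practice the correction factors come with the reagent batch. Solving the linear system directly is used here as an equivalent of applying the matrix inverse, and all function names are illustrative.

```python
# Approximate iTRAQ 4-plex reporter m/z values (assumed for illustration).
REPORTER_MZ = {114: 114.1112, 115: 115.1083, 116: 116.1116, 117: 117.1150}

# Hypothetical isotope matrix: entry [i][j] is the share of the signal of
# label j that is observed at reporter i. Real values are batch specific.
ISOTOPE_MATRIX = [
    [0.93, 0.02, 0.00, 0.00],
    [0.06, 0.92, 0.03, 0.00],
    [0.00, 0.05, 0.92, 0.04],
    [0.00, 0.00, 0.05, 0.92],
]


def pick_reporter(peaks, target_mz, tol=0.01):
    """Return the intensity of the peak closest to target_mz within tol (Da).

    peaks: list of (mz, intensity) pairs; 0.0 is returned if none matches.
    """
    candidates = [(abs(mz - target_mz), intensity)
                  for mz, intensity in peaks if abs(mz - target_mz) <= tol]
    return min(candidates)[1] if candidates else 0.0


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def deisotope(observed, matrix=ISOTOPE_MATRIX):
    """Recover label-specific intensities from observed reporter intensities."""
    return solve(matrix, observed)


if __name__ == "__main__":
    spectrum = [(114.105, 10.0), (114.112, 50.0), (115.108, 40.0),
                (116.111, 5.0), (117.115, 30.0)]
    observed = [pick_reporter(spectrum, REPORTER_MZ[c])
                for c in (114, 115, 116, 117)]
    print(deisotope(observed))
```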
Spectrum Identification
Raw data were converted into mzML21 files using msconvert as part of the ProteoWizard 2.2 package.22 They were further converted into mgf files using OpenMS 1.9.23 Database searches with OMSSA24 (version 2.1.9) were conducted with the help of SearchGUI25 (version 1.8.9). Search settings were: a maximum of two missed cleavages, peptide charges 2−4+, peptide mass tolerance of 10 ppm,
Ratio Calculation
In experiments 1 and 2, intensities from channels 114 and 115 (named i114 and i115 here, respectively) present two replicates of the pfu standard for experiment 1 and of human platelets for experiment 2. Intensities from channels 116 and 117 (i116 and i117, respectively), on the other hand, present two replicates of the HeLa quantification for experiment 1 and of the pfu standard quantification for experiment 2. It is possible to differentiate human and pfu spectra after identification in the respective database. In experiment 1 pfu spectra, two ratios of the pfu duplicates can thus be estimated:

$$\mathrm{pfu}_{\min} = \frac{\min(i_{114}, i_{115})}{(i_{114} + i_{115})/2} \tag{1}$$

$$\mathrm{pfu}_{\max} = \frac{\max(i_{114}, i_{115})}{(i_{114} + i_{115})/2} \tag{2}$$

Similarly, human ratios can be estimated in human spectra:

$$\mathrm{human}_{\min} = \frac{\min(i_{116}, i_{117})}{(i_{116} + i_{117})/2} \tag{3}$$

$$\mathrm{human}_{\max} = \frac{\max(i_{116}, i_{117})}{(i_{116} + i_{117})/2} \tag{4}$$

In human spectra, pfu intensities (channels 114 and 115) can be detected due to ion interferences at the MS1 level. In order to fairly compare pfu reporter intensities with human reporter intensities, it is mandatory to normalize pfu intensities in human spectra:

$$i'_{114} = i_{114}\,\frac{\overline{\mathrm{human}}}{\overline{\mathrm{pfu}}} \tag{5}$$

$$i'_{115} = i_{115}\,\frac{\overline{\mathrm{human}}}{\overline{\mathrm{pfu}}} \tag{6}$$

where $\overline{\mathrm{human}}$ and $\overline{\mathrm{pfu}}$ represent human and pfu deisotoped reporter intensity means, respectively. As displayed in Supplementary Figure 2A (Supporting Information), deisotoped reporter human intensities in human spectra present the same distribution as the pfu intensities in pfu spectra, thus allowing their comparison. A counter example, likely due to a pipetting issue, is given in Supplementary Figure 2B; the estimation of interference rate will suffer from high errors in such a scenario, indicating that such simple yet efficient quality controls are useful and necessary parts of an iQuARI workflow. Similarly, human intensities in pfu spectra are normalized:

$$i'_{116} = i_{116}\,\frac{\overline{\mathrm{pfu}}}{\overline{\mathrm{human}}} \tag{7}$$

$$i'_{117} = i_{117}\,\frac{\overline{\mathrm{pfu}}}{\overline{\mathrm{human}}} \tag{8}$$

Moreover, pfu interference ratios in human spectra can thus be straightforwardly estimated:

$$\mathrm{pfu\ in\ human}_{114} = \frac{i'_{114}}{(i_{116} + i_{117})/2} \tag{9}$$

$$\mathrm{pfu\ in\ human}_{115} = \frac{i'_{115}}{(i_{116} + i_{117})/2} \tag{10}$$

Symmetrically, human interference ratios in pfu spectra are given by:

$$\mathrm{human\ in\ pfu}_{116} = \frac{i'_{116}}{(i_{114} + i_{115})/2} \tag{11}$$

$$\mathrm{human\ in\ pfu}_{117} = \frac{i'_{117}}{(i_{114} + i_{115})/2} \tag{12}$$

Identical processing was applied for the multiplexing of platelets and pfu (experiment 2). In experiment 3, channels 114, 115, and 116 are used to label IEF fractions of human platelets, whereas channel 117 is dedicated to the pfu standard. Unlike many experiments where regulation is not expected in all samples and where intensity median can be used for normalization, peptide relative concentrations are here highly variable due to the IEF separation. The maximal intensity of the three human channels was thus used to represent the human reference intensity. Pfu intensities were normalized as previously:

$$i'_{117} = i_{117}\,\frac{\overline{\mathrm{human}}}{\overline{\mathrm{pfu}}} \tag{13}$$

It is thus possible to estimate three human interference ratios in pfu spectra, image of the “false positive” ratios due to interference in human spectra:

$$\mathrm{human\ in\ pfu}_{1} = \frac{i_{114}}{i'_{117}} \tag{14}$$

$$\mathrm{human\ in\ pfu}_{2} = \frac{i_{115}}{i'_{117}} \tag{15}$$

$$\mathrm{human\ in\ pfu}_{3} = \frac{i_{116}}{i'_{117}} \tag{16}$$

Since all three human samples are interchangeable, they are considered together for the experimental error investigation.
■
RESULTS
Similar to the well-known target/decoy database approach for peptide and protein identification,13 the iQuARI method allows the estimation of quantitative error rates for reporter ion based quantification using target and decoy samples. Here, we present results from two proof of principle experiments (experiments 1 and 2) and from an illustrative experiment (experiment 3), demonstrating how iQuARI enables the inline control of quantitative error rates for virtually any reporter ion based quantitative experiment.
Choice of the Decoy Sample
The choice of the decoy sample is crucial for the success of the iQuARI approach. In order to satisfy condition 2, the decoy sample has to resemble the target sample in terms of complexity, dynamic range and LC−MS response. Consequently, the use of a simple spiked-in mixture4,5,7−9 is insufficient. However, by using a species with sufficient evolutionary distance to the target sample, it is possible to use an entire proteome as the decoy sample while avoiding shared peptides, thus satisfying condition 1 of the Introduction. Pyrococcus furiosus (pfu) is such a species with a complex proteome (2087 target protein sequences) that is evolutionarily very distant from all Eukaryota and is readily commercially available (Agilent). As demonstrated in a previous study,14 its proteome shares only a minimal number of observable fully tryptic peptides with UniProt sequences from Eukaryota, leading to fully orthogonal identification results.
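The orthogonality requirement (condition 1) lends itself to a quick in silico check before committing to a decoy species. The following Python sketch is an illustration under stated assumptions, not the procedure used in the cited study: it digests protein sequences with a common trypsin rule (cleavage after K or R, not before P), allows no missed cleavages, keeps peptides of 6 to 30 residues, and treats Leucine and Isoleucine as indistinguishable. All function names are hypothetical.

```python
import re


def tryptic_peptides(sequence, min_len=6, max_len=30):
    """Fully tryptic peptides of one protein, without missed cleavages.

    Cleavage after K or R unless followed by P (assumed rule). I is mapped
    to L because the two residues are isobaric.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p.replace("I", "L") for p in pieces
            if min_len <= len(p) <= max_len}


def shared_fraction(decoy_proteins, target_proteins):
    """Share of decoy peptides that also occur in the target proteome."""
    decoy = set().union(*(tryptic_peptides(s) for s in decoy_proteins))
    target = set().union(*(tryptic_peptides(s) for s in target_proteins))
    if not decoy:
        return 0.0
    return len(decoy & target) / len(decoy)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    decoy = ["AAAAAAKLLLLLLRGGGGGGK"]
    target = ["IIIIIIRAAAWWWWK"]
    print(shared_fraction(decoy, target))
```

A shared fraction close to zero would support the use of the candidate proteome as decoy sample, whereas a value such as the roughly 4% quoted for yeast versus human would argue against it.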
Proof of Principle
Although LC−MS systems are tuned toward the detection of certain peptides regarding mass, charge and chemical properties
quantification experiment. In order to demonstrate the use of iQuARI on a realistic experiment, we applied the strategy to a complex biological analysis with three iTRAQ labels (114, 115 and 116) dedicated to human platelet, and with the last label (117) dedicated to the pfu standard. This time however, the human platelet sample was fractionated by isoelectric focusing (IEF), a standard procedure in proteomics reducing sample complexity. As can be expected, the less complex IEF fractions presented lower interference when compared to the full proteomes used in the above-mentioned proof of principle experiment: with a 95 min gradient 92 ± 2% of all confidently identified MS/MS spectra presented a quantification error below 20%, 7 ± 2% between 20 and 80% and 2 ± 1% higher than 80%, see Figure 2G. As also illustrated Figure 2, increasing LC gradient length from 95 to 185 min (identical to the gradient used in experiment 1 and 2) reduced the level of interference errors only marginally: 93 ± 1% of all confidently identified MS/MS spectra presented a quantification error below 20%, 6 ± 1% between 20 and 80% and 1.3 ± 0.4% higher than 80%. Similar to, and as expected from the literature,7,12 reducing the precursor isolation width from 2 m/z to 1 m/z slightly reduced the error rate: 93 ± 1% of all confidently identified MS/MS spectra presented a quantification error below 20%, 5 ± 0.8% between 20 and 80% and 1.6 ± 0.4% higher than 80%, but at a substantial cost in identification rate, amounting to 18 ± 2% loss in the amount of identified human spectra at 1% FDR.
and thus implicitly favor a particular subset of peptides in any sample, this does not ensure that the analysis of a pfu standard will present identification and quantification results that are comparable with a human sample (condition 2 of the Introduction). We therefore designed a perfectly symmetrical iTRAQ 4-plex experiment (experiment 1) multiplexing the HeLa (target) and pfu (decoy) proteomes. Labels 114 and 115 were used for the pfu standard while labels 116 and 117 where used for HeLa. After measurement on an LTQ-Orbitrap Velos LC−MS system, peptide identification was conducted at 1% FDR against the respective databases allowing the separation of (target) human spectra and (decoy) pfu spectra. As depicted in Figure 2A, Hela and pfu are barely distinguishable at the MS1 level. This ensures that the decoy spectra will provide a sufficient sampling rate of interfering intensities from human peptides. Moreover, decoy spectra are recorded over the whole gradient, m/z range and intensity range, ensuring that error rates will be accurately estimated for low abundant species as well. Theoretically, multiplexing equal amounts should generate normalized 114:115:116:117 ratios of 1:1:0:0 for pfu spectra, and 0:0:1:1 for human spectra (Figure 2B). However, due to coisolation, pfu interference will appear in human spectra at m/z 114 and 115, distorting these ratios. Symmetrically, human interference will appear in pfu spectra at channels 116 and 117. In addition, besides this interference-derived deviation, human ratios in human spectra (and pfu ratios in pfu spectra) bulge away from the theoretical value of one with a standard deviation of 11% (10% for pfu), in close agreement with the literature5 (Figure 2C). The success of the iQuARI method is demonstrated by the correspondence between human interference in pfu spectra, and pfu interference in human spectra as presented in Figure 2D. 
Altogether, these four interference measures show that only 69 ± 7% of all confidently identified MS/MS spectra had a quantification error below 20%. Importantly, 25 ± 5% of all MS/MS spectra had an error ranging between 20 and 80%. Worse yet, 6 ± 2% of all MS/MS spectra presented an error higher than 80%. In the latter case, quantification is essentially impossible as the quantitative information is effectively masked by interference. We conducted the same study with human platelets (experiment 2), a less complex sample with approximately 3000 proteins per platelet33 against approximately 10000 per HeLa cell.34 Similar results were obtained with a decreased level of interference: 75 ± 7% of all confidently identified MS/MS spectra presented a quantification error below 20%, 22 ± 6% between 20% and 80% and 3 ± 1% higher than 80%. In accordance with the literature,11 further reducing the complexity using IEF fractionation (experiment 3) dramatically reduced the interference level, as displayed in Figure 2E. Moreover, as displayed in Figure 2F, the error is considerably biased toward low reporter ion intensities, indicating that low-abundance or poorly fragmented peptides are more prone to erroneous quantification, as would be expected. Our iQuARI method thus enables the straightforward filtering of reporter ion based quantification results at a desired False Discovery Rate (FDR): in experiment 1, it is possible to retain only 5% (respectively 1%) of target spectra presenting an interference over 80% by considering only the spectra with the 79% (respectively 14%) most intense target reporter ions.
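The intensity-based filtering described above can be sketched as a simple ranking procedure. This is an illustration under stated assumptions rather than the published algorithm: each spectrum is represented as a hypothetical `(intensity, error)` pair, where `error` is an interference estimate such as a foreign-to-own reporter ratio, and the function name `intensity_cutoff` is invented for this example. Spectra are ranked by decreasing reporter intensity and the largest top-ranked set is kept in which the share of spectra with an error above 80% does not exceed the requested rate. Ties in intensity are not treated specially.

```python
def intensity_cutoff(spectra, max_rate=0.05, error_limit=0.8):
    """Return (number of spectra kept, lowest retained intensity).

    spectra: iterable of (reporter_intensity, estimated_error) pairs.
    max_rate: tolerated share of retained spectra with error > error_limit.
    Returns (0, None) when no intensity-ranked prefix satisfies max_rate.
    """
    ranked = sorted(spectra, key=lambda s: s[0], reverse=True)
    bad = 0   # retained spectra whose error exceeds error_limit
    best = 0  # size of the largest acceptable intensity-ranked prefix
    for kept, (_, error) in enumerate(ranked, start=1):
        if error > error_limit:
            bad += 1
        if bad / kept <= max_rate:
            best = kept
    cutoff = ranked[best - 1][0] if best else None
    return best, cutoff


# Toy data: the two heavily distorted spectra sit at lower intensities.
toy = [(1000, 0.05), (900, 0.1), (800, 0.1), (700, 0.9), (600, 0.1), (500, 0.95)]
kept, cutoff = intensity_cutoff(toy, max_rate=0.25)
print(kept, cutoff)
```

Dividing the number of kept spectra by the total gives the retained share, comparable in spirit to the 79% and 14% figures quoted for experiment 1.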
■
DISCUSSION
In summary, it is clear from the literature that quantification errors in reporter ion based methods are heavily sample and system dependent. Using iQuARI, we demonstrated how to estimate the interference level for any given sample on any given system at very low cost. iQuARI thus meets a previously unmet need for a quantitative metric of the interference level in quantitative experiments and allows the scientist to monitor it inline. The cost of the method is an increase in MS1 complexity and the recording of decoy spectra, leaving less measurement time for the target spectra. This slight drawback is, however, easily compensated for by extending the gradient time: going from a 95 min to a 185 min gradient increased the identification rate of experiment 3 by 83 ± 10%. The number of decoy peptides might also be reduced by fractionation of the pfu proteome, but this may lead to a lower accuracy of the method. Applying this method to two human samples of differing complexity (see Figure 2E), HeLa cells and platelets, we established that 25 ± 5% and 22 ± 6%, respectively, of the human MS/MS spectra in these samples presented interference ranging from 20 to 80%. Furthermore, 6 ± 2% and 3 ± 1% of them presented interference levels that were more intense than the quantitative signal itself. This level of interference is clearly not suitable for accurate quantitative studies. Our iQuARI method therefore allows the filtering of the results to any desired quality level based on intensity. The price for this increased specificity, however, is the exclusion of a share of quantified spectra. Importantly, we also demonstrated that the actual benefit derived from LC−MS workflow optimization (as discussed in the literature11,12) becomes directly quantifiable using iQuARI, for every sample and system.
We thus found that the optimal interference reduction was observed after a simple fractionation of the sample that reduced the amount of inaccurately quantified spectra to 6 ± 1% and the amount of nonsense
Illustrative Example
The previous results clearly demonstrate that inline control of quantification accuracy is necessary for every multiplexed
dx.doi.org/10.1021/pr300247u | J. Proteome Res. 2012, 11, 5072−5080
Journal of Proteome Research
Technical Note
quantifications down to 1.1 ± 0.4%, close to the 1% FDR that we allowed for the identification. In the latter case, 93 ± 1% of spectra will present less than 20% interference. In light of the excellent results obtained by such simple methods, the usefulness of systematically applying highly complex experimental procedures aimed at similarly alleviating peptide interference can be legitimately questioned. Indeed, reporter ion methods are chosen for their simplicity rather than for their accuracy or dynamic range, typically for discovery studies. For in-depth and accurate relative quantification, label-free as well as Multiple Reaction Monitoring approaches are better suited. By design, these methods do not present any interference issue; they do, however, require strong reproducibility of the LC−MS system. Choosing the quantification method thus poses a cost/benefit challenge to the scientist when setting up the strategy ahead of the experiment. iQuARI, with its simple design and affordable cost, allows balancing the complexity of the workflow against the quality of the results.
■
(4) Ow, S. Y.; Salim, M.; Noirel, J.; Evans, C.; Rehman, I.; Wright, P. C. iTRAQ underestimation in simple and complex mixtures: “the good, the bad and the ugly”. J. Proteome Res. 2009, 8, 5347.
(5) Burkhart, J. M.; Vaudel, M.; Zahedi, R. P.; Martens, L.; Sickmann, A. iTRAQ protein quantification: a quality-controlled workflow. Proteomics 2011, 11, 1125.
(6) Ting, L.; Rad, R.; Gygi, S. P.; Haas, W. MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 2011, 8, 937.
(7) Bantscheff, M.; Boesche, M.; Eberhard, D.; Matthieson, T.; Sweetman, G.; Kuster, B. Robust and sensitive iTRAQ quantification on an LTQ Orbitrap mass spectrometer. Mol. Cell. Proteomics 2008, 7, 1702.
(8) Karp, N. A.; Huber, W.; Sadowski, P. G.; Charles, P. D.; Hester, S. V.; Lilley, K. S. Addressing accuracy and precision issues in iTRAQ quantitation. Mol. Cell. Proteomics 2010, 9, 1885.
(9) Shirran, S. L.; Botting, C. H. A comparison of the accuracy of iTRAQ quantification by nLC-ESI MSMS and nLC-MALDI MSMS methods. J. Proteomics 2010, 73, 1391.
(10) Wenger, C. D.; Lee, M. V.; Hebert, A. S.; McAlister, G. C.; Phanstiel, D. H.; Westphall, M. S.; Coon, J. J. Gas-phase purification enables accurate, multiplexed proteome quantification with isobaric tagging. Nat. Methods 2011, 8, 933.
(11) Ow, S. Y.; Salim, M.; Noirel, J.; Evans, C.; Wright, P. C. Minimising iTRAQ ratio compression through understanding LC-MS elution dependence and high-resolution HILIC fractionation. Proteomics 2011, 11, 2341.
(12) Savitski, M. M.; Sweetman, G.; Askenazi, M.; Marto, J. A.; Lang, M.; Zinn, N.; Bantscheff, M. Delayed fragmentation and optimized isolation width settings for improvement of protein identification and accuracy of isobaric mass tag quantification on Orbitrap-type mass spectrometers. Anal. Chem. 2011, 83, 8959.
(13) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207.
(14) Vaudel, M.; Burkhart, J. M.; Breiter, D.; Zahedi, R. P.; Sickmann, A.; Martens, L. A complex standard for protein identification, designed by evolution. J. Proteome Res. 2012, DOI: 10.1021/pr300055q.
(15) Wisniewski, J. R.; Zougman, A.; Mann, M. Combination of FASP and StageTip-based fractionation allows in-depth analysis of the hippocampal membrane proteome. J. Proteome Res. 2009, 8, 5674.
(16) Moebius, J.; Zahedi, R. P.; Lewandrowski, U.; Berger, C.; Walter, U.; Sickmann, A. The human platelet membrane proteome reveals several new potential membrane proteins. Mol. Cell. Proteomics 2005, 4, 1754.
(17) Burkhart, J. M.; Schumbrutzki, C.; Wortelkamp, S.; Sickmann, A.; Zahedi, R. P. Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. J. Proteomics 2012, 75, 1454.
(18) Lee, A. M.; Sevinsky, J. R.; Bundy, J. L.; Grunden, A. M.; Stephenson, J. L., Jr. Proteomics of Pyrococcus furiosus, a hyperthermophilic archaeon refractory to traditional methods. J. Proteome Res. 2009, 8, 3844.
(19) Keidel, E. M.; Dosch, D.; Brunner, A.; Kellermann, J.; Lottspeich, F. Evaluation of protein loading techniques and improved separation in OFFGEL isoelectric focusing. Electrophoresis 2011, 32, 1659.
(20) Olsen, J. V.; de Godoy, L. M.; Li, G.; Macek, B.; Mortensen, P.; Pesch, R.; Makarov, A.; Lange, O.; Horning, S.; Mann, M. Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap. Mol. Cell. Proteomics 2005, 4, 2010.
(21) Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W. H.; Rompp, A.; Neumann, S.; Pizarro, A. D.; Montecchi-Palazzi, L.; Tasman, N.; Coleman, M.; Reisinger, F.; Souda, P.; Hermjakob, H.; Binz, P. A.; Deutsch, E. W. mzML–a community standard for mass spectrometry data. Mol. Cell. Proteomics 2011, 10, R110.000133.
ASSOCIATED CONTENT
Supporting Information
Supporting Information as mentioned in text. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
*Tel: +32 9 264 93 58. Fax: +32 9 264 94 84. E-mail: lennart.[email protected].
Author Contributions
‡These authors contributed equally to this work.
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
The financial support by the Ministerium für Innovation, Wissenschaft und Forschung des Landes Nordrhein-Westfalen and by the Bundesministerium für Bildung und Forschung (SARA, DYNAMO) is gratefully acknowledged. L.M. acknowledges the financial support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”) and the PRIME-XS project funded by the European Union 7th Framework Program under grant agreement number 262067.
■
REFERENCES
(1) Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. J. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 2004, 3, 1154.
(2) Thompson, A.; Schafer, J.; Kuhn, K.; Kienle, S.; Schwarz, J.; Schmidt, G.; Neumann, T.; Johnstone, R.; Mohammed, A. K.; Hamon, C. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 2003, 75, 1895.
(3) Michalski, A.; Cox, J.; Mann, M. More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC−MS/MS. J. Proteome Res. 2011, 10, 1785.
(22) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534.
(23) Bertsch, A.; Gropl, C.; Reinert, K.; Kohlbacher, O. OpenMS and TOPP: open source software for LC-MS data analysis. Methods Mol. Biol. 2011, 696, 353.
(24) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass spectrometry search algorithm. J. Proteome Res. 2004, 3, 958.
(25) Vaudel, M.; Barsnes, H.; Berven, F. S.; Sickmann, A.; Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011, 11, 996.
(26) Apweiler, R.; Bairoch, A.; Wu, C. H.; Barker, W. C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M. J.; Natale, D. A.; O’Donovan, C.; Redaschi, N.; Yeh, L. S. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32, D115.
(27) Creasy, D. M.; Cottrell, J. S. Error tolerant searching of uninterpreted tandem mass spectrometry data. Proteomics 2002, 2, 1426.
(28) Colaert, N.; Degroeve, S.; Helsens, K.; Martens, L. Analysis of the resolution limitations of peptide identification algorithms. J. Proteome Res. 2011, 10 (12), 5555.
(29) Storey, J. D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 9440.
(30) Wang, R.; Fabregat, A.; Rios, D.; Ovelleiro, D.; Foster, J. M.; Cote, R. G.; Griss, J.; Csordas, A.; Perez-Riverol, Y.; Reisinger, F.; Hermjakob, H.; Martens, L.; Vizcaino, J. A. PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat. Biotechnol. 2012, 30, 135.
(31) Martens, L.; Hermjakob, H.; Jones, P.; Adamski, M.; Taylor, C.; States, D.; Gevaert, K.; Vandekerckhove, J.; Apweiler, R. PRIDE: the proteomics identifications database. Proteomics 2005, 5, 3537.
(32) Vaudel, M.; Sickmann, A.; Martens, L. Peptide and protein quantification: a map of the minefield. Proteomics 2010, 10, 650.
(33) Dittrich, M.; Birschmann, I.; Mietner, S.; Sickmann, A.; Walter, U.; Dandekar, T. Platelet protein interactions: map, signaling components, and phosphorylation groundstate. Arterioscler. Thromb. Vasc. Biol. 2008, 28, 1326.
(34) Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Paabo, S.; Mann, M. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 2011, 7, 548.