
iTRAQ Underestimation in Simple and Complex Mixtures: “The Good, the Bad and the Ugly”

Saw Yen Ow,† Malinda Salim,† Josselin Noirel,† Caroline Evans,†,‡ Ishtiaq Rehman,‡ and Phillip C. Wright*,†

ChELSI Institute, Chemical and Process Engineering, University of Sheffield, Mappin Street, S1 3JD Sheffield, United Kingdom, and Mellanby Centre for Bone Research, University of Sheffield, Medical School, Sheffield, S10 2RX, United Kingdom

Received July 18, 2009

The increasing popularity of iTRAQ for quantitative proteomics applications makes it necessary to evaluate its relevance, accuracy, and precision for biological interpretation. Here, we have assessed (a) the accuracy and precision of iTRAQ quantification in a controlled experimental setup, using low- and high-complexity protein mixtures; and (b) the potential pitfalls that hamper the applicability and attainable dynamic range of iTRAQ: isotopic contamination, background interference, and signal-to-noise ratio. Our data suggest greater dynamic crosstalk between interfering factors affecting underestimations, and that these interferences were largely scenario-specific, dependent on sample complexity. The good is the potential for iTRAQ to provide accurate quantification spanning 2 orders of magnitude. This potential is, however, limited by two factors. (1) The bad: the existence of isotopic impurities, which can be corrected for provided accurate isotopic factors are at one’s disposal. (2) The ugly: we demonstrate here the interference of mixed MS/MS contribution occurring during precursor selection, an issue that is currently very difficult to minimize. In light of our results, we propose a list of advice for iTRAQ data analysis that could routinely ameliorate quantitative interpretation of proteomic data sets.

Keywords: iTRAQ underestimations • iTRAQ • quantitative proteomics • high-throughput proteomics

* To whom correspondence should be addressed: Prof. Phillip C. Wright, Chemical Engineering at the Life Science Interface (ChELSI), Department of Chemical and Process Engineering, University of Sheffield, Mappin St., Sheffield S1 3JD, U.K. Tel: +44(0)114 2227577. Fax: +44(0)114 2227501. E-mail: [email protected].
† ChELSI Institute, Chemical and Process Engineering, University of Sheffield.
‡ Mellanby Centre for Bone Research, University of Sheffield.

Journal of Proteome Research 2009, 8, 5347–5355. Published on Web 09/16/2009. 10.1021/pr900634c CCC: $40.75 © 2009 American Chemical Society

Introduction

Ever since the quantitative potential of proteomics was demonstrated, efforts have been made to develop and to improve quantitative methods (e.g., ICAT,1 iTRAQ,2 SILAC,3 and label-free methods4). These endeavors have focused primarily on providing high-throughput quantitative information about the abundance of proteins at the biological system’s scale (proteome). A number of key methodologies based on silent isotope (isobaric mass) incorporation for multiple differential mass measurements, such as iTRAQ (isobaric tags for relative and absolute quantification),2 have risen in popularity. Isobaric mass techniques can, in principle, provide increased precision in the abundance measurements, given that quantifications are performed at the level of the fragment mass spectra (MS/MS scans). Such a strategy provides reduced interference from backgrounds and coeluting masses in isotope pairs in intact peptide scans (MS scans). There have been discussions previously on the relatively high precision of iTRAQ in certain situations.5,6 In this paper, ‘precision’ will refer to the reproducibility of the measurements, whereas ‘accuracy’ will refer to the closeness to the true value. From a statistical perspective, the former is concerned with the standard deviation, and the latter with the mean.

Despite some good results, the current widespread adoption of the iTRAQ technique has raised concerns as to the accuracy of the estimations of differential expression, particularly in highly complex mixtures.7-11 Strikingly, microarrays can measure differences in expression spanning over 3 orders of magnitude, up to over 4.7,12 while iTRAQ experiments typically report fold changes of less than 2 orders of magnitude. From a purely technical point of view, this may be perceived as a limitation of iTRAQ for quantitative proteomics.9,13

With respect to iTRAQ for high-throughput quantitative applications, there are a number of key features which must be critically assessed. In this report, we have systematically evaluated the quantitative dynamic range achievable using iTRAQ in both low- and high-complexity systems. Estimations were achieved by analyzing a defined, low-complexity sample composed of a master mix of four protein standards, treated either individually or spiked into a high-complexity experimental background (bacterial cell lysate). Quantitative range permutations of ratios 1:1, 1:5, 1:10, and 1:100 were provided to generate differential levels of reporter ion intensities. We took advantage of the differences in distinctive masses of iTRAQ 4-plex reporters and 8-plex ion reporters to use a 4-plex-rich complex background for the evaluation of the interferences between 4-plex affected regions (114-117 m/z) and those that are minimally affected (113, 118, 119, 121 m/z). This study and its findings provide both the analysis of the initial data and the underestimation effects, data that should benefit future improvement of iTRAQ-based quantification.

Figure 1. iTRAQ master mix analysis and spiking experimental design. The analysis can be structured into 6 separate stages. (A) 8-plex four protein master-mix protein preparation and digestion; (B) sample aliquot and labeling; (C) label mixing, dilution design and technical replicates; (D) LC-MS/MS analysis using QSTAR; (E) complex 4-plex sample preparation; (F) 8-plex/4-plex mixing (spiking) dilution design and analysis. Remarks on 4-plex design: NP I, ammonium condition control; NP II, ammonium condition replicate; NP III, N2 fixing stress condition; NP IV, N2 fixing stress replicate.

Materials and Methods

Protein Standards Preparations. Four protein standards (cytochrome C, bovine serum albumin, myoglobin, and hen egg lysozyme), obtained commercially (Sigma-Aldrich), were prepared in bulk using 500 mM, pH 8.0, triethylammonium bicarbonate buffer at a concentration of 3 mg/mL. Each standard protein stock (160 µg) was aliquoted and mixed, producing a master mix containing 640 µg of total proteins in a 1:1:1:1 mass ratio.

Protein Standards Digestion and 8-plex iTRAQ Labeling. The protein standard master mix was digested in a single procedure to allow equally distributed contributions across all labels. Digestion was carried out using a 1:50 ratio of sequencing grade trypsin (Promega) incubated at 37 °C for 16 h. Digested samples were then aliquoted into eight identical fractions for 8-plex labeling. Labeling was performed according to the manufacturer’s protocol (Applied Biosystems). Individually labeled master mix samples were subsequently combined into 3 mixing replicates with a ratio of 100:100:20:20:10:10:1:1 across the labels, which corresponded to the reporter region of 113-119 and 121 m/z (Figure 1). Samples were then vacuum concentrated and prepared for SCX-HPLC cleanup to remove excess iTRAQ reagents.

Complex Background Generation Using 4-plex Proteome Mixture. Non-mammalian samples from the cyanobacterium Nostoc punctiforme ATCC 29133 (8.2 Mnt) were used to generate the complex sample background. Replicate proteome samples extracted from whole filaments cultivated in N2-fixing and ammonia-supplemented conditions, as previously described by Ow et al., were used to generate four phenotypes of 100 µg each. Samples were digested, then labeled as follows: ammonium I (114), ammonium II (115), N2 I (116), and N2 II (117), as detailed previously.14

Simple Protein Mix and Complex Sample Strong Cation Exchange (SCX) Chromatography. The replicate sample containing the master mixture standards was SCX cleaned using a 200 mm PolySULFOETHYL-A (5 µm, 2.1 mm i.d., PolyLC) stainless steel column. Equilibration buffers using 10 mM KH2PO4, pH 3.0, provided a 25 min wash-out of nonbinding contaminants, followed by a 20 min isocratic elution of bound peptides using 10 mM KH2PO4 and 250 mM KCl, pH 3.0, operated on a Dionex BioLC HPLC (Dionex, Surrey, U.K.) quaternary pump control. Throughout the elution, samples were collected and vacuum concentrated. SCX-based chromatography was performed at a constant flow rate of 200 µL/min, and elution was monitored in situ via ultraviolet absorbance at 214 nm using a UVD170/U unit (Dionex, Surrey, U.K.). An identical chromatography setup and isocratic elution program was used for the complex mixtures comprising the bacterial 4-plex background. However, in that case, separations were eluted with a higher salt concentration (up to 350 mM KCl).

Sample Complexity and Spiking Strategy. Ion-exchange purified 4-plex fractions were resolubilized in 0.1% trifluoroacetic acid (TFA) and 3% acetonitrile (ACN) and combined to give a complex sample stock. A total of 300 ng of 4-plex peptides was injected in a 400 fmol aliquot of 8-plex controlled standards. Two further dilutions of spikes were conducted: 80 fmol (5-fold dilution) and 40 fmol (10-fold dilution) of 8-plex standards (Figure 1). The 8-plex spiked 4-plex samples were then submitted for LC-MS/MS analysis.
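As a worked illustration of the design described above, the following minimal Python sketch derives the expected reporter ratios (relative to reporter 113) from the 100:100:20:20:10:10:1:1 mixing design, and the spike amounts from the 400 fmol starting aliquot. The function and variable names are illustrative assumptions and are not part of the original analysis, which was carried out in Mathematica.

```python
import math

# 8-plex reporter m/z values and the mixing design stated in the text.
REPORTERS = [113, 114, 115, 116, 117, 118, 119, 121]
MIX = [100, 100, 20, 20, 10, 10, 1, 1]


def expected_log10_ratios(amounts, reference_index=0):
    """Expected log10 ratio of each label relative to the reference label."""
    ref = amounts[reference_index]
    return [math.log10(a / ref) for a in amounts]


def spike_amounts(start_fmol, dilutions):
    """Amount of 8-plex standard (fmol) at each fold dilution."""
    return [start_fmol / d for d in dilutions]


if __name__ == "__main__":
    for mz, ratio in zip(REPORTERS, expected_log10_ratios(MIX)):
        print(f"reporter {mz}: expected log10 ratio {ratio:+.2f}")
    print("spike amounts (fmol):", spike_amounts(400, [1, 5, 10]))
```

The resulting values (0, about -0.7, -1, and -2 on a log10 scale) are the expected-value lines against which the measured ratios are compared in Figure 3.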

Nanospray Quadrupole Time-of-Flight Tandem Mass Spectrometry. Tandem mass spectrometry of iTRAQ labeled samples was carried out on a QSTAR XL quadrupole time-of-flight system (Applied Biosystems, MDS-Sciex) coupled with an Ultimate 3000 nanoflow HPLC (Dionex, Surrey, U.K.). All samples were first desalted online using a 5 cm, 300 µm i.d. LC-Packings C18 PepMap trap cartridge under 0.1% TFA and 3% ACN for 15 min, and eluted to a 15 cm, 75 µm i.d. LC-Packings C18 PepMap analytical column in 0.1% formic acid (FA) with an ACN gradient extending from 3% to 95% ACN. The elution profile was extended in 2 different gradient programs, a 30-min fast ramp (3-35% ACN) and a 90-min extended ramp (3-35% ACN), to give a measure between short and long gradients. Replicate samples (triplicates) of master mix standards were analyzed on separate LC-MS/MS measurements. MS acquisition parameters for the analysis of mixed iTRAQ samples were deliberately left as designed for large-scale nontargeted analyses to reproduce laboratory routine. Two precursors of charge +2 and +3 (intensity binning) from each TOF-MS scan (350-1200 m/z) were dynamically selected and isolated for MS/MS fragment ion scans (65-1600 m/z). An isolation window width of 4 m/z was set together with an equal Q2 transmission split across each entire scan. Selected precursors on a 1 amu window were then actively excluded for 60 s from further analysis. An optimized iTRAQ collision energy parameter was set with a +2 and +3 ion energy ramp of 0.625 eV × m/z and an additional 7 eV of excitation energy during collision, rolling up to a cutoff energy of 80 eV. MS and MS/MS accumulation were set at 1 and 0.33 Hz, respectively.

Data and Quantitative Analysis. Tandem MS data generated from the QSTAR XL were first converted to generic MGF peaklists via the mascot.dll embedded script (version 1.6 release no. 25) in Analyst QS v. 1.1 (Applied Biosystems, Sciex; Matrix Science). Conversion parameters removed the option of averaging the MS/MS spectra. The spectra charge deconvolution was disabled around the iTRAQ reporter region (113-119 and 121 m/z). Centroided data were interrogated for identifications (not quantifications) using an in-house Phenyx algorithm cluster (binary version 2.6; GeneBio, Geneva) at the ChELSI Institute, University of Sheffield. The 8-plex data were interrogated using the Swiss-Prot vertebrate database (accessed May 2009, 116 653 entries). Modifications were set to include 8-plex iTRAQ mass shifts (+304 Da, K and N-term), methylthiol (+46 Da, C), and oxidation of methionine (+16 Da, M). Mass tolerances for identification were set to 0.4 Da MS and 0.4 Da MS/MS. Peptide level filters were set to a z-score of 5.0 and a p-value significance of 0.0001. Phenyx protein scores were set using a total z-score of 20. The identified peptides (MS/MS) were then used as the basis for calculations. iTRAQ reporter intensities for these peptides were referenced directly to the centroided data provided in the MGF peaklists to obtain relative quantifications. The analyses of iTRAQ ratios, statistical tests, and their distributive estimations were all performed using Mathematica v7.0 (Wolfram Research, Oxfordshire, U.K.). Routine corrections of impurities in iTRAQ labeling isotopes were also assessed and applied; in this case, correction values were adapted from the manufacturer’s recommendations (Supporting Information).15 Corrections at M - 1, M - 2, M + 1, and M + 2 m/z shifts in label impurities were made to all measurements. No other means of bias corrections were applied to the data.
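The impurity correction at M - 2, M - 1, M + 1, and M + 2 described above can be viewed as solving a small linear system: each observed reporter intensity is a mixture of the true intensity of its own label and the fractions leaking in from neighboring labels. The Python sketch below illustrates that idea only. The impurity fractions are hypothetical placeholders (the real factors come from the manufacturer's certificate, which is not reproduced here), the shifts are treated as integer m/z offsets for simplicity, and the original analysis was done in Mathematica, so this is not the authors' implementation.

```python
import numpy as np

REPORTERS = [113, 114, 115, 116, 117, 118, 119, 121]  # 8-plex reporter m/z

# HYPOTHETICAL impurity fractions per label at each m/z shift.
# Placeholder values for illustration only.
IMPURITY = {mz: {-2: 0.00, -1: 0.02, +1: 0.05, +2: 0.00} for mz in REPORTERS}


def build_mixing_matrix(reporters, impurity):
    """A[i, j] = fraction of label j's signal observed at reporter mass i."""
    index = {mz: i for i, mz in enumerate(reporters)}
    n = len(reporters)
    A = np.zeros((n, n))
    for j, mz in enumerate(reporters):
        shifts = impurity[mz]
        A[j, j] = 1.0 - sum(shifts.values())  # signal left at the nominal mass
        for shift, fraction in shifts.items():
            target = mz + shift
            if target in index:  # signal landing on an unmeasured m/z is dropped
                A[index[target], j] = fraction
    return A


def correct(observed, reporters=REPORTERS, impurity=IMPURITY):
    """Recover impurity-corrected intensities by solving A @ true = observed."""
    A = build_mixing_matrix(reporters, impurity)
    return np.linalg.solve(A, np.asarray(observed, dtype=float))


if __name__ == "__main__":
    true = np.array([100, 100, 20, 20, 10, 10, 1, 1], dtype=float)
    observed = build_mixing_matrix(REPORTERS, IMPURITY) @ true
    print("observed :", np.round(observed, 2))
    print("corrected:", np.round(correct(observed), 2))
```

With these placeholder factors, the simulated uncorrected 115 channel is inflated relative to 116 by leakage from the high-abundance 114 label, and solving the system restores the designed intensities.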

Results and Discussion

Consistency of Estimations via Technical Mixing Replicates. To validate the consistency of these measurements, separate replicates of the four protein master mix were analyzed. Student’s t-tests were run to assess differences in iTRAQ quantifications between the three replicates. For each label 114-119, 121, we computed the p-value of the hypothesis that there was no difference between any pair of replicates across all proteins (replicate 1 against replicate 2, replicate 1 against replicate 3, and replicate 2 against replicate 3). We obtained 3 × 7 = 21 p-values, the smallest of which was 0.02, which is greater than 0.0024 (5% adjusted using the Bonferroni correction to account for the multiple-test procedure). No significant difference between the replicates could therefore be detected.

Importance of Purity Corrections and Nonrandom Neighboring Cross-Contributions. Cross-label isotopic impurity is one of the factors liable to affect the dynamic range achievable by iTRAQ. Label cross-contribution may arise from two sources: manufacturing-level impurities and experimental error.15,16 With care, effects arising from the latter should be minimal. Hence, for most practitioners, impurity of the labels will be the major factor. Our design of the experimental setup provides clear evidence of such co-contributing effects. In this design, we expect to see a bias in certain reporter ions’ intensities within replicates. (a) The reporter ion 114’s intensity is expected to be equal to that of reporter ion 113; first, because they have the greatest absolute values, second, because one can assume that contamination of one reporter ion by the other is roughly counterbalanced by an equal reverse contribution. (b) The reporter ion 115 intensity is expected to be greater than the 116 intensity, because of the contamination from the high-abundance reporter ion 114. (c) For similar reasons, the 117 intensity is expected to be greater than the 118 intensity and (d) the 119 intensity is expected to be greater than the 121 intensity (but this case deserves special consideration because of the occurrence of the 120 m/z phenylalanine immonium ion between the pair of labels, see later sections). Estimations were made to assess the equality of the means within each pair of replicates using a t-test combined with Welch’s approximation. For case (a), t-tests revealed no significant difference between adjacent labels 114 and 113 (p-value 0.23). Significant differences were, however, observed between the intensities of (b) reporter ions 115 and 116 (by 13%; p-value 2.3 × 10^-15), and (c) reporter ions 117 and 118 (by 6.3%; p-value 1.0 × 10^-3). The contributions are significant for adjacent labels’ intensities but not for the other reporter ions; this is in agreement with data provided by the manufacturer, which indicates that contributions from M + 1 and M - 1 are predominant.15,16 Because of such confounding effects, uncorrected data will appear ‘suppressed’ and changes in expression will be underestimated; carrying out isotopic correction helps one achieve accurate quantitation (Figure 2).

Figure 2. Box plot representing the logarithmic departure from the expected value of the abundance of reporters 114, 115, 116, 117, 118, 119, and 121 relative to that of reporter 113 considering all proteins (top, uncorrected; bottom, corrected for isotopic overlap). The boxes extend from the first to the third quartile, while the whiskers represent two standard deviations below and above the mean. Means and standard deviations are given in Supporting Information. The scale is set differently for reporters 114-118 from that of reporters 119 and 121, as the noise becomes significantly higher for the low-abundance reporters 119 and 121. For the reporters 114-118, the solid bar represents the expected values and dotted lines represent 1.1-, 1.2-, and 1.3-fold deviations from it. For the reporters 119 and 121, the solid bar represents the expected values and dotted lines represent 2- and 3-fold deviations from it.

iTRAQ Quantitative Stability at Low Sample Complexity. The analysis of the 8-plex labeled protein master mix at various relative concentrations addressed the following three points: (1) precision of measurements at varying reporter ion signal-to-noise ratios (S/N), (2) accuracy of low complexity iTRAQ quantifications at 1-fold, 5-fold, 10-fold, and 100-fold change, and (3) consistency of quantifications across the four standard proteins.

The stability of measurements at varying S/N was assessed by determining the deviation of 1:1 quantifications evaluated between labels 113:114 (high S/N), 115:116 (medium S/N), 117:118 (low S/N), and 119:121 (very low S/N). The quantification of both 113:114 and 115:116 ratios remained very stable (log2 standard deviations 0.13 and 0.11, respectively). Increasing noise was found within the 117:118 pair of replicates (log2 standard deviation 0.21). The pair of replicates 119:121 has the lowest S/N, and the precision is accordingly lower (log2 standard deviation 0.74). Nevertheless, despite the low precision (within a 60%-170% interval around the expected value), both the direction and the order of magnitude (up to 2 orders of magnitude) of the changes can be correctly evaluated. This analysis relies on the many peptides available to accurately perform quantification. For the 119:121 pair of replicates, hundreds of peptides are necessary to ensure a ±5% precision interval. One may assume that the noise is additive; for instance, the deviation observed of the 115:113 ratios is the sum of two contributions: ε113 + ε115. A least-squares estimation of the noise contributions was performed: ε113 = 0.04, ε114 = 0.05, ε115 = 0.07, ε116 = 0.04, ε117 = 0.08, ε118 = 0.04, ε119 = 0.24, and ε121 = 0.34. This shows that precision is maintained over an order of magnitude but drops dramatically when changes span 2 orders of magnitude.

The different relative abundance of master mix proteins also enabled us to infer the accuracy of low complexity iTRAQ quantifications: 1-fold, 2-fold, 5-fold, 10-fold, and 100-fold. By considering the same pairs of replicates as above, the accuracy of 1:1 ratios remained high (less than 1.6% deviation from the true value), save for the 119:121 pair, for which the deviation attains 40% around the expected value. Estimations of the 2:1 ratio (labels 115:116 against 117:118) were also accurate (2.7% deviation from the true value). For 5:1 and 10:1 ratios, estimations were accurate within 3.4% and 4.8%. Quantification of the 100:5 and 100:1 ratios required the use of low intensity reporter ions (labels 119 and 121). This led to consistently lower accuracy as a result of a detrimental loss of precision (Figure 3).

Consistent results were obtained across the four standard proteins in the simple protein master mix up to 10-fold ratios. Nevertheless, the consistency of the relative quantifications for 100:5 and 100:1 ratios was compromised. On one hand, accuracy was undermined by the use of low-abundance reporter ions; on the other hand, only a fraction of the peptides were used to quantify a given protein (20 peptides for myoglobin, for instance).

Figure 3. Corrected log10-ratios relative to the abundance of reporter 113 for all proteins: BSA (●), CYT (○), LYZ (■), MYO (◇). Expected values are represented by horizontal lines (y = 0 [reporters 113 and 114], -0.7 [reporters 115 and 116], -1 [reporters 117 and 118], and -2 [reporters 119 and 121]).

Assessment of Cross-Contamination of 8-plex Standards in 4-plex Complex Background. The mixture of 4-plex and 8-plex reporter ions offers the possibility to estimate the effects of contributing backgrounds on only a selected mass range. Of the eight reporter ions from the 8-plex mixture, four were unique to the 8-plex standard (113, 118, 119, and 121 m/z), whereas four were shared with the 4-plex reagents (114-117 m/z). As a result, it could be assumed that contributions of a 4-plex background affect only the 114-117 m/z region, if one ignores isotopic impurity overlaps.
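The precision figures quoted earlier in this section for the 119:121 pair (a log2 standard deviation of 0.74, a 60%-170% interval, and hundreds of peptides for a ±5% interval) can be related to one another with a short calculation. The Python sketch below is an illustration under stated assumptions: log-ratios are taken to be independent and roughly normally distributed, the mean of n peptides has standard error SD/sqrt(n), and the confidence multiplier `z` is a free parameter, since the text does not state which criterion the authors used. Function names are hypothetical.

```python
import math


def fold_interval(log2_sd, n_sd=1.0):
    """Fold-change interval spanned by +/- n_sd standard deviations in log2 space."""
    half_width = n_sd * log2_sd
    return 2.0 ** (-half_width), 2.0 ** half_width


def peptides_needed(log2_sd, target_fold=1.05, z=1.96):
    """Peptides required so that z standard errors of the mean log2 ratio
    stay within the target fold deviation (assumes independent measurements)."""
    target_log2 = math.log2(target_fold)
    return math.ceil((z * log2_sd / target_log2) ** 2)


if __name__ == "__main__":
    low, high = fold_interval(0.74)
    print(f"1 SD interval: {low:.0%} to {high:.0%} of the expected ratio")
    print("peptides for a 1.05-fold interval (z = 1.96):", peptides_needed(0.74))
    print("peptides for a 1.05-fold interval (z = 1.00):", peptides_needed(0.74, z=1.0))
```

Under these assumptions, one standard deviation of 0.74 in log2 space spans roughly 60% to 170% of the expected ratio, and the required peptide count falls in the hundreds for either choice of `z`, in line with the statements in the text.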


Figure 4. Suppression arising as a result of the introduction of a 4-plex background alongside the 8-plex protein mixture for the 4-plex reporters 114-117; the mean departure from the expected value, relative to the reporter ion 113’s intensity, and the corresponding standard deviation are represented without 4-plex background (left, ●) and with 4-plex background (right, increasing background 1/1 [■], 1/5 [◆], 1/10 [▲]) on a base-10 logarithmic scale.

In the 114-117 m/z region, the 117 reporter ion from the 8-plex standards is most likely to be affected by the background, as its intensity is lower. (In this paragraph and the following, inverse fold changes are shown for clarity.) As expected, reporter ion 117’s intensities for the individual peptides, relative to the (8-plex-specific) reporter ion 113, decrease to 8.65, 8.60, and 8.17 (for increasing background contributions) instead of 10, thus tending toward unity as the 4-plex background increases. We carried out peptide-level t-tests to show that these results are significantly different from those obtained without 4-plex background (p-values: 0.45. However, a significant difference can be observed for reporter ion 118: even though the suppression is less pronounced (9.72, 9.72, and 8.52 instead of 10 for increasing background contributions), the p-values computed were significant:
