Quantitative Analysis of Proteome Coverage and Recovery Rates for Upstream Fractionation Methods in Proteomics

Yuan Fang, Dale P. Robinson, and Leonard J. Foster*

Centre for High-Throughput Biology, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada

Received November 19, 2009

The proteome of any cell, or even any subcellular fraction, remains too complex for complete analysis by one dimension of liquid chromatography-tandem mass spectrometry (LC-MS/MS). Hence, to achieve greater depth of coverage for a proteome of interest, most groups routinely subfractionate the sample prior to LC-MS/MS so that the material entering LC-MS/MS is less complex than the original sample. Protein and/or peptide fractionation methods that biochemists have used for decades, such as strong cation exchange chromatography (SCX), isoelectric focusing (IEF) and SDS-PAGE, are currently the most common prefractionation methods. There has, as yet, been no comprehensive, controlled evaluation of the relative merits of the various methods, although some binary comparisons have been made. Here, we compare the most popular methods for fractionating samples at both the protein and peptide level. We replicated all analyses to provide estimates of their variability, controlled precisely for the instrument time dedicated to each analysis, and directly measured the recovery of protein or peptide from each fractionation procedure. For maximal proteome coverage, SDS-PAGE is very clearly the most effective method tested, accounting for more than 90% of the entire data set. When considering the amount of material recovered after each fractionation procedure, solution-based IEF and SCX performed similarly, with approximately 80% of the input being recovered.

Keywords: SDS-PAGE • efficiency • separation • honey bee

Introduction

The addition of the ‘omics’ suffix to genes or proteins, that is, genomics or proteomics, refers to the study of all genes or proteins in an organism. In the case of genomics, there is a known amount of DNA in any one cell and the technical hurdles toward fully decoding this information have effectively been surmounted; even genomes of metazoans can now be sequenced to completion.1 The remaining distance to the finish line for proteomics is much less clear, although it is certain that we are not yet close to completely identifying all the gene products in any system, let alone all the post-translational modifications that might be occurring.2

Two fundamental problems currently limit our ability to fully characterize the proteome of any organism using mass spectrometry: sample complexity and the wide range of concentrations of analytes, that is, dynamic range.3 Both issues can be addressed, at least in part, by fractionating the proteome of interest prior to introducing it into a mass spectrometer. Virtually all electrospray ionization-based proteomics studies use reversed phase C18 as the primary fractionation procedure,4 electrospraying the eluant directly into the mass spectrometer, but a single dimension of fractionation is not sufficient for complete proteome coverage with current mass spectrometer technology. A wide variety of second and third dimensions of fractionation are employed upstream of reversed phase chromatography.5-9 Each of these methods has its advocates, with several groups reporting some performance characteristics when one method is compared with another.10-15 Typically, only the number of peptides and/or proteins identified is used as the comparison metric, often with little or no replication reported. Furthermore, what is not reported at all in these studies is the recovery each fractionation method allows; that is, how much of the starting material loaded onto a phase is extracted afterward for LC-MS/MS analysis? Such information is at least as important a consideration as the number of proteins/peptides identified in those analyses where the starting material is limited.

By our estimation, three upstream fractionation methods predominate in the most current literature, employed in up to 95% of LC-MS/MS-based proteomics studies from the past 2 years where fractionation is used: (1) ‘GeLC-MS’, fractionation based on molecular weight through the use of SDS-PAGE; (2) SCX, strong cation exchange chromatography, either in-line as with the classic ‘MudPIT’6 approach or off-line, in various guises;7,8 (3) IEF, isoelectric focusing with or without an immobilized pH gradient.9 Here, we report a comprehensive, quantitative, replicated evaluation of the proteome coverage and recovery rates from the most popular fractionation procedures in proteomics.

* To whom correspondence should be addressed. E-mail: foster@chibi.ubc.ca.

Journal of Proteome Research 2010, 9, 1902–1912. Published on Web 01/16/2010. 10.1021/pr901063t. © 2010 American Chemical Society.


Materials and Methods

Protein Sample Preparation and Tryptic Digestion. Unless otherwise stated, all standard chemicals and solvents were of ACS grade (chemicals) or HPLC grade (solvents) and purchased from ThermoFisher Scientific (Nepean, ON, Canada) or Sigma-Aldrich (Oakville, ON, Canada). Adult honey bee head tissue lysates were used as a representative complexity-limited protein sample for all experiments. The tissues were homogenized in a bead mill using a tungsten bead in each 2 mL tube at 30 Hz for 5 min in 50 µL of phosphate buffered saline with protease inhibitor cocktail (Roche, Mississauga, ON, Canada) at the recommended concentration. Lysis buffer (recipe) was added to 1% NP-40 and the lysate was mechanically disrupted further by passage through a 25 G needle 10 times. The lysate was then centrifuged for 10 min at 16 100 relative centrifugal force (rcf) at 4 °C and the pelleted debris was discarded. The Coomassie Plus Protein Assay (Pierce, Rockford, IL) was used to determine the protein concentration before 50 µg aliquots of the lysate were stored at -20 °C until use. When a sample was needed for digestion, the protein in an aliquot was first precipitated by 4 vol of 100% ethanol with 50 mM sodium acetate (pH 5) for 90 min.16 The protein pellets were resuspended in 50 mM ammonium bicarbonate/1% sodium deoxycholate and immediately heated to 99 °C for 10 min. After cooling to room temperature, 50 µg of total protein was aliquoted and subjected to reduction (1 µg of dithiothreitol (DTT) for 30 min at 37 °C), alkylation (5 µg of iodoacetamide for 30 min at 37 °C) and trypsinization (1 µg of trypsin overnight at 37 °C) as described.17 After digestion, the sample was diluted 3-fold with a solution of 1% trifluoroacetic acid, 3% (v/v) acetonitrile, 0.5% (v/v) acetic acid and the resulting deoxycholic acid precipitate was pelleted by centrifugation at 16 100 rcf for 10 min.
The supernatant containing the peptides was then desalted, concentrated and filtered on C18 STop And Go Extraction tips.8

Peptide Chemical Labeling and Purification. Reductive dimethylation using formaldehyde isotopologues was performed to differentially label peptides from different experimental conditions. Light formaldehyde (CH2O), medium formaldehyde (CD2O) and heavy formaldehyde (13CD2O) (Cambridge Isotope Laboratories, Andover, MA) were combined with light cyanoborohydride (NaBH3CN) or heavy cyanoborohydride (NaBD3CN) (Sigma-Aldrich) to give at least 4 Da difference for labeled peptides.18 Peptides were incubated in 100 mM formaldehyde and 47 mM sodium cyanoborohydride for at least 60 min in the dark and, after adjusting the pH to ∼7.5, for another 60 min. After labeling, NH4Cl was added to a final concentration of 125 mM and incubated for 10 min to react with excess formaldehyde, following which 3 vol of sample buffer (3% acetonitrile, 1% trifluoroacetic acid, 0.5% acetic acid) was added to degrade sodium cyanoborohydride. For each comparison, equal amounts of peptides were mixed and desalted on C18 STop And Go Extraction tips.8

Protein Fractionation. Protein fractionation by IEF was performed in two commercially available systems: the 3100 OFFGEL Fractionator (Agilent Technologies, Santa Clara, CA) and the MicroRotofor Liquid-Phase IEF Cell (Bio-Rad Laboratories, Hercules, CA). Hereafter, the OFFGEL is referred to as the immobilized pH gradient (IPG)-based IEF method and the MicroRotofor is referred to as the solution-based IEF method. In the IPG-based separation, pH 3-10 immobilized pH gradient strips (GE Healthcare) and 12-well frames were used to separate 50 µg of proteins in 2.4 mL of 5% glycerol buffer with 0.5% pH 3-10 ampholytes, thiourea and DTT (concentrations as suggested by manufacturer) for 16 h. In the solution-based IEF separation, the same mass of protein was separated with the same reagents for 4 h with a constant power of 1 W. Each fraction was harvested, precipitated with 100% ethanol and tryptically digested. For in-gel separations, 50 µg of protein was resolved by SDS-PAGE in a 10% gel followed by in-gel reduction (10 mM DTT for 45 min at 56 °C), alkylation (55 mM iodoacetamide for 30 min at 37 °C) and trypsinization (12.5 ng/µL trypsin overnight at 37 °C) exactly as we have described previously.17

Peptide Fractionation. Peptide IEF fractionation was also performed in the Agilent 3100 OFFGEL Fractionator Kit and the Bio-Rad MicroRotofor Liquid-Phase IEF Cell. In each case, 50 µg of peptides in 2.4 mL of 5% glycerol buffer with 0.5% pH 3-10 ampholytes was separated by the OFFGEL system into 12 fractions in 16 h or by the MicroRotofor system into 10 fractions in 4 h at 1 W. For SCX fractionation of peptides, 50 µg of sample was fractionated on C18-SCX-C18 STAGE tips using a 10-step ammonium acetate elution gradient from 0 to 500 mM.8

Mass Spectrometry Analysis. For all samples, each fraction of desalted peptides was analyzed on a linear-trapping quadrupole-Fourier transform-ion cyclotron resonance mass spectrometer (LTQ-FT; ThermoFisher Scientific, Bremen, Germany) coupled online to an 1100 Series nanoflow high performance liquid chromatography system (HPLC; Agilent Technologies) using a nanospray ionization source (Proxeon, Denmark) holding columns packed in house into 15-cm-long, 75-µm-inner diameter fused silica emitters (8-µm-diameter opening, pulled on a P-2000 laser puller from Sutter Instruments) using 3-µm-diameter Reprosil-Pur C-18-AQ beads (Dr. Maisch, www.Dr-Maisch.com, Germany).
Gradients of 60 and 105 min were run from 4.8% to 64% acetonitrile using mobile phase supplemented with 0.5% acetic acid. The HPLC system included an Agilent 1100 series degasser, nanoflow pump, autosampler and thermostat. The thermostat temperature was set at 6 °C. The LTQ-FT was set to acquire a full-range scan at a resolution of 25 000 from 350 to 1500 Th in the FT-ICR and to simultaneously fragment the top three peptide ions in each cycle in the LTQ (minimum intensity 200 counts). Selected ion monitoring (SIM) scans (±12 Th at resolution 50 000) were utilized in accurate parent mass determination. Parent ions were then excluded from MS/MS for the next 180 s.

Figure 1. Scheme used to compare the proteome coverage of six fractionation methods. Fifty micrograms of protein was used for each method and resolved into 10 fractions. The samples were fractionated either at the protein level prior to tryptic digestion or at the peptide level after proteolysis. The peptides from each fraction were purified by a C18 STAGE-TIP and analyzed by an LTQ-FT. (A) Protein fractionation; (B) peptide fractionation. (I) 50 µg protein; (II) tryptic digestion (37 °C, 16 h); (III) peptide purification by C18 STAGE-TIP; (IV) protein fractionation via SDS-PAGE, IPG-based Isoelectric Focusing (IEF), solution-based IEF, or aliquoting; (V) peptide fractionation via Strong Cation Exchange (SCX), IPG-based IEF, solution-based IEF or aliquoting; (VI) nLC-MS/MS analysis.

Data Processing. Peak lists of fragment ions were generated by Extract_MSN (v3.2, ThermoFisher) using the default parameters. Monoisotopic peak and charge state assignments were corrected with DTA Supercharge.19 Fragment spectra were searched against the Honeybee Official Gene Set 1 (18 542 sequences; 9 474 300 residues)20 protein database using Mascot (v2.2, Matrix Science) with the following parameters: trypsin specificity allowing up to one missed cleavage; cysteine carbamidomethylation as a fixed modification; dimethyl (K), dimethyl (N-term), dimethyl: 2H(4) (K), dimethyl: 2H(4) (N-term), dimethyl: 2H(6)13C(2) (K) and dimethyl: 2H(6)13C(2) (N-term) as variable modifications in the standard UniMod nomenclature; ESI-trap fragmentation characteristics; 10-ppm mass tolerance for precursor ion masses; and 0.8 Da tolerance for fragment ion masses. After iterative mass recalibration with MSQuant (v1.4.3),19 acceptance criteria for protein identifications were as follows: proteins were considered identified when at least two unique peptides of seven or more amino acids with Mascot IonsScores >25 and measured with mass accuracies better than 3 ppm were observed. On the basis of reversed database searching, these criteria result in an estimated false discovery rate of less than 0.5% at the protein level. Quantitative ratios were extracted from the raw data using MSQuant, which calculates an intensity-weighted average of within-spectra ratios from all spectra across the chromatographic peak of each peptide ion. Protein and peptide isoelectric points were calculated by Sequence Manipulation Suite (http://www.bioinformatics.org). Numbers of peptides identified in the experiments described here are redundant or sequence-level nonredundant counts of peptides meeting the above criteria for identification. Student’s t test and least-squares linear regression were used to evaluate statistical significance of data where indicated.
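The acceptance criteria stated under Data Processing can be sketched in a few lines of Python. This is an illustration only, not the authors' MSQuant workflow: the `PeptideHit` record, its field names, the `REV_` decoy prefix and the decoys-over-targets estimate are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    """One peptide-spectrum match (hypothetical record layout)."""
    protein: str       # protein accession; decoys assumed to start with "REV_"
    sequence: str      # peptide amino acid sequence
    ions_score: float  # Mascot IonsScore
    ppm_error: float   # signed precursor mass error in ppm


def accepted_proteins(hits, min_len=7, min_score=25.0, max_ppm=3.0, min_unique=2):
    """Return proteins with >= min_unique distinct peptides passing all cutoffs."""
    peptides = defaultdict(set)
    for h in hits:
        passes = (
            len(h.sequence) >= min_len
            and h.ions_score > min_score
            and abs(h.ppm_error) < max_ppm
        )
        if passes:
            peptides[h.protein].add(h.sequence)
    return {p for p, seqs in peptides.items() if len(seqs) >= min_unique}


def decoy_fdr(accepted, decoy_prefix="REV_"):
    """Estimate protein-level FDR as accepted decoys / accepted targets (assumed formula)."""
    decoys = sum(1 for p in accepted if p.startswith(decoy_prefix))
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0
```

The thresholds are exposed as keyword arguments so that the effect of loosening or tightening a single criterion on the decoy-based estimate can be inspected.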

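The statement that the formaldehyde and cyanoborohydride isotopologues give at least 4 Da difference between labeled peptides can be checked with simple arithmetic. The sketch below assumes the usual net elemental changes for dimethylation of one primary amine (matching the dimethyl, dimethyl: 2H(4) and dimethyl: 2H(6)13C(2) modifications listed above), assumes that every lysine and the peptide N-terminus is labeled, and uses standard monoisotopic masses; the function names are hypothetical.

```python
# Monoisotopic masses in Da.
MASS = {"H": 1.00782503207, "D": 2.0141017778, "C": 12.0, "13C": 13.0033548378}

# Net elemental change per labeled primary amine (two methyl groups added,
# two amine hydrogens replaced).
LABELS = {
    "light": {"C": 2, "H": 4},              # CH2O + NaBH3CN
    "medium": {"C": 2, "D": 4},             # CD2O + NaBH3CN
    "heavy": {"13C": 2, "D": 6, "H": -2},   # 13CD2O + NaBD3CN
}


def label_shift(label):
    """Mass added to a single labeled amine, in Da."""
    return sum(MASS[element] * count for element, count in LABELS[label].items())


def peptide_shift(sequence, label):
    """Total mass added to a peptide: N-terminus plus one site per lysine."""
    sites = 1 + sequence.count("K")
    return sites * label_shift(label)


if __name__ == "__main__":
    for name in LABELS:
        print(f"{name}: {label_shift(name):.4f} Da per amine")
```

Adjacent labels differ by slightly more than 4 Da per labeled amine, so a tryptic peptide ending in lysine (two sites) is separated by about 8 Da between channels.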
Results

Effective Resolution of Fractionation Methods. Several approaches to sample fractionation upstream of reversed phase LC-MS/MS have been described and each group uses those methods that they feel are best, although this feeling may be based on anecdotal evidence or poorly controlled/biased comparisons. Thus, in order to make as fair a comparison between fractionation methods as possible, we tried to control all variables except the fractionation itself, including instrument time and sample cleanup. We started with 50 µg aliquots of proteins extracted from pulverized honey bee heads and either fractionated these directly at the protein level or digested the proteins with trypsin first and then fractionated at the peptide level. For protein-level fractionation, we used either standard 10% SDS-PAGE, slicing the whole lane into 10 pieces, or one of two commercially available IEF apparatus, also generating 10 fractions each (Figure 1A, see Materials and Methods); each fraction from these separations was then analyzed by LC-MS/MS, with everything, including controls, performed in triplicate. The reversed phase gradient used for the online LC was kept consistent across all analyses. The control for this experiment was a 50 µg aliquot of protein subdivided into 10 equal 5 µg fractions, with each fraction digested individually followed by separate injection into the LC-MS/MS system. For peptide-level fractionation, we used SCX and the same two IEF apparatus as above to generate 10 fractions from each method (Figure 1B), again with everything done in triplicate; the control for the peptide fractionation experiment was to digest a 50 µg aliquot in a single tube, which was then divided into 10 equal peptide fractions prior to LC-MS/MS analysis. SDS-PAGE is not suitable for resolving peptides and so was not used for that purpose here; also, while full-length proteins can be resolved by SCX, this practice is very rare in proteomics and so we did not include it here.

As expected, SDS-PAGE was very effective at resolving proteins by molecular weight (Figure 2A), although there was considerable overlap in the range of molecular weights of identified proteins between slices. Not surprisingly, isoelectric focusing was very effective at resolving proteins and peptides by pI. Interestingly, while the curves for average pI of all proteins identified in each fraction (Figure 2B) generally increase quite smoothly, the curves for average pI of peptides in each fraction (Figure 2C) plateau multiple times, perhaps reflecting the uneven distribution of naturally occurring, detected peptides across the pI range seen previously.11 The IPG-based IEF instrument was more effective at fully utilizing the pH range, showing a steady increase in average pI across all fractions, than the solution-based IEF instrument, which did not effectively separate proteins by pI after fraction 7 (Figure 2B).

Figure 2. Effective resolution of fractionation methods. Fifty micrograms of protein, separated into 10 fractions by each method, was analyzed by nLC-MS/MS and the average predicted molecular weights or isoelectric points of identified proteins and peptides were calculated. (A) The average (±SD) molecular weights of proteins in SDS-PAGE fractions; (B) the average (±SD) isoelectric points of proteins in IPG-based and solution-based protein fractionation; (C) the average (±SD) isoelectric points of peptides in IPG-based IEF and solution-based IEF peptide fractionation.

Quantitative Differences in the Proteome Coverage of Fractionation Methods. The ultimate goal of most proteomics experiments is to probe the proteome of interest as deeply as possible and so, in this regard, improved resolving power of a fractionation procedure is expected to result in more protein/peptide IDs, but ultimately, it is coverage that is most important, regardless of resolution. To this end, protein and peptide coverages for all the control and fractionation procedures described above were quantified in three ways: nonredundant protein identifications, unique peptide sequence identifications and the total peptide sequencing events (spectral counts). As seen in Figure 3A,B, all the fractionation methods (at both protein and peptide levels) improved the identification of proteins and peptides relative to an unfractionated control given equal LC-MS/MS time, and most of the differences were statistically significant at the p < 0.05 level. Conversely, not fractionating yielded the most successful MS/MS spectra, that is, those good enough to identify a peptide, indicating a very high level of redundancy when not fractionating. Among the fractionation procedures, standard 1D SDS-PAGE gave the highest proteome coverage at both the protein level (842 IDs on average, 1046 IDs in total) and the peptide level (p < 0.05, 7166 unique peptides on average, 10261 in total). However, sample extraction from SDS-PAGE is time-consuming, so many groups have started to use gel-free methods such as IEF or SCX. Between the two IEF apparatus tested here, the IPG-based method outperformed the solution-based method for both protein-level and peptide-level fractionation: for protein-level fractionation, the IPG-based method yielded 161 more proteins and 1188 more unique peptides (p < 0.01) (Figure 3A,B); for peptide-level fractionation, the IPG-based method yielded 219 more proteins and 920 more unique peptides (p < 0.05) (Figure 3A,B). The SCX performance was intermediate between the two IEF methods in both protein and unique peptide yields (Figure 3A,B) despite the fact that it had higher spectral counts than the IEF samples (Figure 3C). This is likely because stepwise SCX fractionation has a much lower resolution than IPG-based IEF: only 40% of peptides were identified in one SCX fraction, in contrast to 86% of peptides being identified in a single fraction from IPG-based IEF.

Figure 3. Quantitative comparison of proteome coverage for four fractionation methods. Total numbers of proteins and peptides identified with each method starting with 50 µg of protein. (A) Total nonredundant proteins; (B) unique peptide sequences; (C) total peptide sequencing events (i.e., spectral counts). In the tables on the right are displayed the p-values from Student’s t tests (n = 3) comparing the relevant values.

Is protein or peptide-level fractionation better? Between the two IEF methods where both levels of fractionation were tested there were more nonredundant proteins identified after peptide fractionation (Figure 3A) but more unique peptide sequences identified after protein fractionation (Figure 3B). In the case of the IPG-based IEF apparatus, the difference in the unique peptide sequences identified was statistically significant (p < 0.05).

Do the different methods yield complementary results? Vendors hoping to sell equipment to groups who are already using a competing apparatus often invoke complementarity as a rationale for why the laboratory should invest in more infrastructure. By examining Venn diagrams of the overlap between the four methods used for each of peptide and protein level fractionation, it is apparent that, for peptide-level fractionation, using two or more approaches results in a significant improvement in the yield of peptides (Figure 4B). Strikingly, however, for protein fractionation, no other method is remotely close to SDS-PAGE; in fact, all other methods combined add only 10% to the overall tally of protein identifications (Figure 4A). Within a particular method, the overlap among all three replicates ranged from 58% to 73% for protein fractionations, and 41% to 64% in peptide fractionations (Figure 4C-J).

Figure 4. Overlap of unique proteins and peptides identified in the four fractionation methods (A and B), and between the three replicates of each method (C-J). Each oval (A and B) represents one fractionation method. The percentages in parentheses are the proportions of proteins or peptides identified by each procedure out of the total nonredundant protein or peptide IDs in this study. Each circle (C-J) represents one replicate and the area of overlap is proportional to the real overlap. (A) Overall protein fractionation; (B) overall peptide fractionation. For protein level fractionation (C-F): (C) control; (D) SDS-PAGE; (E) IPG-based IEF; (F) solution-based IEF. For peptide level fractionation (G-J): (G) control; (H) SCX; (I) IPG-based IEF; (J) solution-based IEF.

Quantitative Comparison of Fractionation Recovery Rates. The yield of protein and/or peptide IDs obtained from a given fractionation procedure is normally the most important consideration when sample availability is not limited. However, when sample is limited due to availability or cost, the choice of fractionation procedure may be dictated by how much material can be recovered from the separation phase. To address this issue, the recovery rates of all the fractionation methods were quantified (Figure 5) using dimethylation of primary amines with formaldehyde isotopologues18 (see Materials and Methods). To control for the additional sample handling involved in fractionation procedures, for example, cleanup/desalting steps, control experiments were done with the same additional handling processes associated with protein or peptide fractionation and all the recovery rates were normalized to the control values.

Figure 5. Scheme for quantifying the recovery rates for six fractionation methods. As in Figure 1, 50 µg of protein was used for each replicate of each fractionation method. After separation, all fractions were recombined into a single pool and differentially labeled, and labeled peptides from different methods were mixed prior to LC-MS/MS. A manually aliquoted and then immediately recombined sample served as the control in all cases. (A) Protein fractionation; (B) peptide fractionation. (I) 50 µg of protein; (II) aliquoting into three samples; (III) protein fractionation via SDS-PAGE, IPG-based IEF, solution-based IEF or aliquoting; (IV) peptide fractionation via SCX, IPG-based IEF, solution-based IEF or aliquoting; (V) tryptic digestion (37 °C, 16 h); (VI) triplexed peptide chemical labeling; (VII) mixing differentially labeled peptides; (VIII) peptide purification by C18 STAGE-TIP; (IX) nLC-LTQ-FT analysis.

For protein fractionation, identical protein samples were separated into 10 fractions by SDS-PAGE, IPG-based IEF, solution-based IEF, or by manual aliquoting. After fractionation, proteins retained in the SDS-PAGE matrix were digested in-gel while fractions from other separation methods were digested in solution. In each case, the quantitative comparisons were made relative to manually aliquoted, in-solution digestions, that is, no fractionation, as we assumed that the minimal sample handling in this procedure should result in the highest recovery. As is clear from Figure 6, the IPG-based IEF had the best recovery rate (defined as the proportion of the starting material that is collected after the fractionation procedure) in protein level fractionation (92%), followed by SDS-PAGE (81%). Protein recovery from solution-based IEF (56%) was significantly lower (p < 0.05). For peptide-level fractionation, protein samples were digested in solution and then aliquoted into three peptide samples that were then subjected to chemical labeling and fractionation by SCX, IPG-based IEF, solution-based IEF, or manually aliquoted into 10 fractions. SCX and solution-based IEF recoveries were essentially identical (81 vs 82%), whereas IPG-based IEF was significantly lower (62%, p < 0.05 vs SCX). Within a given IEF method, the recoveries for protein-level versus peptide-level fractionation showed opposing trends, with both methods showing statistically significant differences (Figure 6C,D).

Having measured the recoveries for specific peptides and proteins, we then used these data to look for biases in recoveries of certain classes of polypeptides. As expected, the recoveries between the two IEF methods correlated well with one another (not shown). Intriguingly, however, we observed apparent biases in two fractionation procedures. Recovery efficiency from SDS-PAGE displayed a small but statistically significant dependence on the molecular weight of the protein (Figure 7A): higher molecular weight proteins were slightly more efficiently recovered than lower molecular weight proteins (p < 0.05, R² = 0.02, linear regression trend analysis). This trend was not observed in IPG-based IEF or solution-based IEF methods. Likewise, IPG-based IEF fractionation was strongly biased against proteins and peptides with a pI greater than 10 (Figure 7B), while no correlation was observed between pI distribution and recovery in the other methods.
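The recovery rate defined above, the proportion of starting material collected after fractionation relative to the manually aliquoted control, can be illustrated with a short calculation on label-pair intensities. This is a sketch under stated assumptions rather than the MSQuant procedure used in the study: the input layout (one pair of intensities per peptide) and the use of the median as the summary statistic are choices made for the example.

```python
from statistics import median


def recovery_rate(pairs):
    """Summarize recovery as the median fractionated/control intensity ratio.

    pairs: iterable of (fractionated_intensity, control_intensity), one per
    peptide quantified in both label channels. Peptides with a zero control
    intensity are skipped because their ratio is undefined.
    """
    ratios = [frac / ctrl for frac, ctrl in pairs if ctrl > 0]
    if not ratios:
        raise ValueError("no peptide pairs with a nonzero control intensity")
    return median(ratios)


if __name__ == "__main__":
    # Hypothetical intensities for three peptides (arbitrary units).
    example = [(81.0, 100.0), (40.0, 50.0), (164.0, 200.0)]
    print(f"recovery: {recovery_rate(example):.0%}")
```

A median is less sensitive than a mean to a few peptides with extreme ratios, which is why it was chosen for this illustration.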

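The complementarity question raised in the Results (what share of the combined data set each method finds, and how much the remaining methods add beyond the best one) reduces to set arithmetic. The following sketch uses a hypothetical function name and toy identifiers; it assumes at least one method and at least one identification.

```python
def coverage_summary(ids_by_method):
    """Return (share per method, best method, fraction added by all other methods).

    ids_by_method: dict mapping a method name to the set of protein (or
    peptide) identifiers it produced. Shares are relative to the union of
    identifications over all methods.
    """
    union = set().union(*ids_by_method.values())
    share = {m: len(set(ids)) / len(union) for m, ids in ids_by_method.items()}
    best = max(share, key=share.get)
    added_by_others = len(union - set(ids_by_method[best])) / len(union)
    return share, best, added_by_others


if __name__ == "__main__":
    toy = {
        "SDS-PAGE": set("abcdefghi"),   # 9 of 10 identifications
        "IPG-based IEF": set("abj"),    # contributes one identification not seen by SDS-PAGE
    }
    share, best, extra = coverage_summary(toy)
    print(best, share, extra)
```

In the toy data the best method covers 90% of the union and the other method adds the remaining 10%, mirroring the form of the comparison reported for protein-level fractionation.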
Discussion

It is abundantly clear that, for proteomic samples where complexity is the limiting factor, the more MS/MS spectra that are acquired, the deeper one can probe the proteome of interest.21


Current mass spectrometers used for discovery experiments can only fragment ions at the rate of 3-10 Hz so, in order to probe more deeply, a researcher must dedicate more instrument time to the analysis. Conventional data-dependent MS/MS experiments start off by sequencing the ions with the most intense signals, so simply reanalyzing the same sample repeatedly should severely limit the depth of analysis.22 One way to bypass this is through the use of exclusion lists,23 and while all vendors that we are aware of provide this function in their mass spectrometer control software, our experience is that the algorithms are not able to handle complex data sets effectively: upper limits on the number of ions that can be excluded are too low and ions that should be excluded are still selected for fragmentation with disappointing frequency. The solution, as many groups have described, is to fractionate the sample prior to LC-MS/MS, thereby reducing the complexity and dynamic range of analyte introduced into the mass spectrometer.

As different technologies have become available, several groups have reported binary comparisons between various separation methods. Yates et al. were the first to champion an automated multidimensional LC approach using SCX as the first dimension6 and others have followed with evaluations of various methods.11-15 Gilar et al. compared peptide separations with four 2D LC systems and identified the highest peak capacity in a high pH RP-low pH RP system, although there was no direct comparison of proteome coverage.24 Li et al. have also reported, to no surprise, that three dimensions of separation are better than two,10 although in their two-dimensional analysis they used a slower, less sensitive, older mass spectrometer than they used for the three-dimensional analysis. Barnea et al. reported that GeLC-MS/MS yielded more peptide and protein identifications than SCX or solution-based IEF, and found that SCX fractionation at the protein level in combination with MudPIT outperformed the three prefractionations coupled with normal LC-MS/MS analysis, although the amount of instrument time used to evaluate the different fractionation methods varied considerably.25 Very recently, Elschenbroich et al. reported similar proteome coverage between two peptide fractionation systems, IPG-IEF and MudPIT.26 When analyzing the yeast proteome, de Godoy et al. found more protein IDs from IPG-based IEF than with GeLC-MS/MS, although the instrument time dedicated to the GeLC-MS/MS fractions was only 63% of what was dedicated to IPG-based IEF fractions.27

Through rigorous control of instrument time and replication of all analyses, we have demonstrated here that GeLC-MS/MS very clearly yields the most protein identifications versus all other fractionation procedures, whether they be at the protein or peptide level (p < 0.05, t test). It is usually desirable to maximize the number of protein identifications in a proteomics experiment so that more potential components of the system can be monitored. There are also occasions when maximizing peptide identifications is preferred, for example, quantitative stable isotope dilution experiments or phosphoproteomics, and in this case, GeLC-MS/MS was even more dominant, at least when considering unique peptides (p < 0.05, t test). Even when sample is limiting and recovery from the fractionation procedure becomes an issue, GeLC-MS/MS performed very well, with 81% of the material loaded onto the gel being recovered. This was not quite as high as protein-level fractionation with IPG-based IEF but nonetheless higher than we expected. When considering redundant peptides, such as one might want to do with the semiquantitative spectral counting approach,28 it is actually not worth doing any fractionation at all since repeated injections of the same sample provided slightly more ‘spectral counts’ than any other method and at the same time require less work to prepare, albeit at the expense of reduced proteome coverage. The added advantage of using such an approach for spectral counting experiments would actually be that fewer proteins would be identified overall, yielding far more spectra per protein and thereby mitigating the temptation

Figure 6. Quantitative comparison of fractionation recovery rates. Labeled peptides from different fractionation methods were mixed and analyzed by nLC-MS/MS to quantitatively compare the recovery rates. (A) Recovery rates of three protein fractionation methods (SDS-PAGE, IPG- and solution-based IEF); (B) recovery rates of three peptide fractionation methods (SCX, IPG- and solution-based IEF); (C and D) the comparison of protein and peptide fractionation with the same IEF methods (IPG- or solution-based IEF). Student’s t test (n = 3): *p < 0.05; **p < 0.01.
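The significance markers used for the recovery comparisons (Student’s t test, n = 3 per group, * for p < 0.05 and ** for p < 0.01) can be reproduced without a statistics package by comparing the t statistic with tabulated critical values. The sketch below assumes an equal-variance, two-tailed test with 4 degrees of freedom; the function names and the example replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values of Student's t for 4 degrees of freedom
# (two groups of three replicates), taken from standard t tables.
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604}


def student_t(a, b):
    """Equal-variance (pooled) two-sample t statistic.

    Raises ZeroDivisionError if both groups have zero variance.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def significance_stars(a, b):
    """Return '**', '*' or '' for two groups of exactly three replicates."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("critical values here are only valid for n = 3 per group")
    t = abs(student_t(a, b))
    if t > T_CRIT_DF4[0.01]:
        return "**"
    if t > T_CRIT_DF4[0.05]:
        return "*"
    return ""
```

With only three replicates per group the critical values are large, so a difference needs to be several pooled standard errors wide before it earns a marker.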

research articles

Fang et al. Also, GeLC-MS/MS is incompatible with stable isotope labeling methods that rely on peptide-level derivatization since it is currently impossible to slice gels such that the precise molecular weight range of a slice is consistent from lane to lane. For metabolic labeling approaches such as SILAC,32 this limitation is avoided by mixing the samples prior to SDS-PAGE. IPG-based IEF separations also take a significant amount of time, typically 18 h in our hands, but this requires no intervention on the part of the user. The costs for IPG-based and solution-based IEF apparatus, their accessories (e.g., IPG strips, rotofor focusing chambers and membranes) and reagents are higher than the GeLC-MS/MS and SCX columns. We have taken pains to replicate all our analyses here, something that is often overlooked in large-scale proteomics projects. While the standard deviations reported here are quite tight, averaging less than 5% in Figure 3, we found that we had to do all the LC-MS/MS within a short time window as opposed to spreading the analyses out over time. This required large uninterrupted blocks of instrument time that certainly inconvenienced other users, but we found that the week-to-week and month-to-month variability in the performance of the mass spectrometer was almost equal to the variability between fractionation methods. This seemed to be largely due to deteriorating instrument performance between preventative maintenance actions such as orifice or ion optics cleaning, tuning, and so forth, but there was also substantial nonsystematic drift in the performance. Regardless, this highlights the need for rigor in ensuring the replicability of comparisons when evaluating different fractionation procedures; if the period of instrument time used to evaluate one procedure is separated temporally from that of another evaluation, the results may be meaningless.

Figure 7. Biases in SDS-PAGE and IPG-based IEF fractionation. (A) The correlation between protein molecular weights and protein recovery rates from SDS-PAGE fractionation, linear regression trend analysis: F value ) 2.5%, R2 ) 0.02; (B) proteins and peptides with pI > 10 had low recoveries from IPG-base IEF fractionation. Student’s t test (n ) 3): *p < 0.01.

to draw meaningless conclusions about differential expression of proteins based on very small numbers of spectra.29 GeLC-MS/MS is not the panacea for sample complexity, however. The method shows a very slight bias against smaller proteins; the trend line was significant at the p < 0.05 level but the goodness of fit (r2) was quite low. If the bias is indeed real, we suspect it is because the smaller proteins can be partially extracted from the gel during the washing, reduction and alkylation steps30 so, for example, it might be inappropriate for urine proteomics where most proteins are quite small.31 This bias against small proteins can be overcome by using gradient gels where such proteins are retained in a higherdensity matrix, thus, being less likely to diffuse out. Beyond potential biases, from an overall cost/benefit point of view it is not clear that GeLC-MS/MS is the best approach. Gels must be cast, run, stained and sliced up, which not only takes additional time, but also provides more opportunity for keratin contamination. The digestion and extraction steps after slicing can be automated but not without some difficulty, so overall there is generally much more effort involved in GeLC-MS/MS. 1910


The analysis described here has been carried out on a specific mass spectrometry system (the LTQ-FT), using a specific protein extract (honey bee heads) and specific permutations of various fractionation techniques (e.g., 10% SDS-PAGE, modular STAGE tips for SCX, etc.), yet we feel strongly that the findings are generally applicable to most situations. Certainly the relative numbers of proteins and peptides identified from each method are independent of the mass spectrometer; a mass spectrometer capable of sequencing at a higher rate would identify more peptides and proteins in the absolute sense, but it is really the resolving capacities of the fractionation methods that determine whether one is relatively better than another. Likewise, while we used a protein extract from honey bee heads as we had easy access to large amounts of material, the distribution of protein types and classes in the honey bee is not significantly different from other organisms,33 so the proteome coverage and recovery rates reported here should be extendable to other organisms. It is possible too that the conclusions drawn here would be different if we had used human plasma as a protein source, or another fluid where one or a very small number of proteins constitute the bulk of the protein mass; in such a case, we would predict that protein-level fractionation procedures would perform even better than peptide-level separation simply because the highly abundant proteins would be more effectively isolated from lower-abundance proteins. Finally, while we had to select specific permutations of the various fractionation procedures out of practical necessity, the principles of separation, for example, on a C18-SCX-C18 modular STAGE tip,8 are the same as in conventional MudPIT,6 to the point that they are both stepwise, discontinuous elutions. Off-line SCX fractionation with an HPLC delivering a continuous linear gradient would no doubt have higher resolution than either STAGE tips or MudPIT and so might be expected to perform better. For the IPG or solution-based IEF procedures, we followed the manufacturers’ instructions, as most others have reported, and with the exception of varying the ampholyte mixture and/or the immobilized pH gradient strip, there is little else that can be altered with these methods.

The measurements of recovery from each phase made here are likewise independent of the mass spectrometer and protein source. Because there are simply too many permutations to test in a reasonable time, we have measured the recovery from each phase at a single loading level. Thus, it is possible that the relative recovery levels are not constant with varying loads. For instance, some phases may adsorb a fixed absolute amount of protein, say 3 µg, and nothing beyond that, so if one loaded 4 µg, the recovery would only be 25% (1 µg out of 4 after 3 were adsorbed), but if one loaded 50 µg, the recovery would be 94% (47 out of 50 µg).
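The load dependence in the hypothetical fixed-adsorption example above can be sketched in a few lines of Python. This is an illustration only, not part of the measurements reported here: the function name percent_recovery and the assumption that a phase retains exactly a fixed absolute amount (3 µg, as in the example) are ours.

```python
def percent_recovery(load_ug: float, adsorbed_ug: float) -> float:
    """Percent of loaded material recovered, assuming (hypothetically) that
    the phase irreversibly retains a fixed absolute amount, adsorbed_ug,
    regardless of how much is loaded."""
    if load_ug <= 0:
        raise ValueError("load must be positive")
    # Nothing is recovered if the load does not exceed the adsorbed amount.
    recovered_ug = max(load_ug - adsorbed_ug, 0.0)
    return 100.0 * recovered_ug / load_ug


if __name__ == "__main__":
    # 3 µg fixed loss, as in the example in the text.
    for load in (4, 10, 50, 100):
        print(f"{load:>4} µg loaded: {percent_recovery(load, 3.0):.0f}% recovered")
```

Under this assumption, recovery climbs toward 100% as the load increases, which is why a recovery measured at a single loading level may not carry over to much smaller or much larger loads.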

Conclusions

This study represents a significant advance over previous studies of a similar nature in several respects: (1) we have replicated all analyses and provided estimates of the variability in the analyses to confirm which differences are significant, (2) we have controlled precisely for instrument time dedicated to each analysis, and (3) we have measured the recovery of protein or peptide from each fractionation procedure. On the basis of the data presented, GeLC-MS/MS is by far the best fractionation procedure overall.

Acknowledgment. The authors thank the other members of our group for fruitful discussions and advice. In particular, we thank Nikolay Stoynov for technical assistance and Queenie Chan for supplying the honey bee tissues. Operating funds for this work came from a Canadian Institutes of Health Research Operating Grant (MOP-77688) to L.J.F. The apiary at UBC is supported in part by the Boone-Hodgson-Wilkinson Fund. Infrastructure used in this project was supported by the Canada Foundation for Innovation, the British Columbia (BC) Knowledge Development Fund and the Michael Smith Foundation through the BC Proteomics Network (BCPN). L.J.F. is the Canada Research Chair in Quantitative Proteomics and a Michael Smith Foundation Scholar.

References

(1) C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998, 282 (5396), 2012–8.
(2) Anderson, N. L.; Anderson, N. G.; Pearson, T. W.; Borchers, C. H.; Paulovich, A. G.; Patterson, S. D.; Gillette, M.; Aebersold, R.; Carr, S. A. A human proteome detection and quantitation project. Mol. Cell. Proteomics 2009, 8 (5), 883–6.
(3) Cox, J.; Mann, M. Is proteomics the new genomics? Cell 2007, 130 (3), 395–8.
(4) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422 (6928), 198–207.
(5) Lasonder, E.; Ishihama, Y.; Andersen, J. S.; Vermunt, A. M.; Pain, A.; Sauerwein, R. W.; Eling, W. M.; Hall, N.; Waters, A. P.; Stunnenberg, H. G.; Mann, M. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 2002, 419 (6906), 537–42.
(6) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001, 19 (3), 242–7.


(7) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003, 2 (1), 43–50.
(8) Ishihama, Y.; Rappsilber, J.; Mann, M. Modular stop and go extraction tips with stacked disks for parallel and multidimensional peptide fractionation in proteomics. J. Proteome Res. 2006, 5 (4), 988–94.
(9) Thorsell, A.; Portelius, E.; Blennow, K.; Westman-Brinkmalm, A. Evaluation of sample fractionation using micro-scale liquid-phase isoelectric focusing on mass spectrometric identification and quantitation of proteins in a SILAC experiment. Rapid Commun. Mass Spectrom. 2007, 21 (5), 771–8.
(10) Li, Y.; Yu, J.; Wang, Y.; Griffin, N. M.; Long, F.; Shore, S.; Oh, P.; Schnitzer, J. E. Enhancing identifications of lipid-embedded proteins by mass spectrometry for improved mapping of endothelial plasma membranes in vivo. Mol. Cell. Proteomics 2009.
(11) Hubner, N. C.; Ren, S.; Mann, M. Peptide separation with immobilized pI strips is an attractive alternative to in-gel protein digestion for proteome analysis. Proteomics 2008, 8 (23-24), 4862–72.
(12) Fraterman, S.; Zeiger, U.; Khurana, T. S.; Rubinstein, N. A.; Wilm, M. Combination of peptide OFFGEL fractionation and label-free quantitation facilitated proteomics profiling of extraocular muscle. Proteomics 2007, 7 (18), 3404–16.
(13) Waller, L. N.; Shores, K.; Knapp, D. R. Shotgun proteomic analysis of cerebrospinal fluid using off-gel electrophoresis as the first-dimension separation. J. Proteome Res. 2008, 7 (10), 4577–84.
(14) Slebos, R. J.; Brock, J. W.; Winters, N. F.; Stuart, S. R.; Martinez, M. A.; Li, M.; Chambers, M. C.; Zimmerman, L. J.; Ham, A. J.; Tabb, D. L.; Liebler, D. C. Evaluation of strong cation exchange versus isoelectric focusing of peptides for multidimensional liquid chromatography-tandem mass spectrometry. J. Proteome Res. 2008, 7 (12), 5286–94.
(15) Heller, M.; Michel, P. E.; Morier, P.; Crettaz, D.; Wenz, C.; Tissot, J. D.; Reymond, F.; Rossier, J. S. Two-stage Off-Gel isoelectric focusing: protein followed by peptide fractionation and application to proteome analysis of human plasma. Electrophoresis 2005, 26 (6), 1174–88.
(16) Foster, L. J.; de Hoog, C. L.; Mann, M. Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc. Natl. Acad. Sci. U.S.A. 2003, 100 (10), 5813–8.
(17) Chan, Q. W.; Howes, C. G.; Foster, L. J. Quantitative comparison of caste differences in honeybee hemolymph. Mol. Cell. Proteomics 2006, 5 (12), 2252–62.
(18) Boersema, P. J.; Aye, T. T.; van Veen, T. A.; Heck, A. J.; Mohammed, S. Triplex protein quantification based on stable isotope labeling by peptide dimethylation applied to cell and tissue lysates. Proteomics 2008, 8 (22), 4624–32.
(19) Mortensen, P.; Gouw, J. W.; Olsen, J. V.; Ong, S. E.; Rigbolt, K. T.; Bunkenborg, J.; Cox, J.; Foster, L.; Heck, A. J.; Blagoev, B.; Andersen, J. S.; Mann, M. MSQuant, an open source platform for mass spectrometry-based quantitative proteomics. J. Proteome Res. 2010, 9 (1), 393–403.
(20) Elsik, C. G.; Mackey, A. J.; Reese, J. T.; Milshina, N. V.; Roos, D. S.; Weinstock, G. M. Creating a honey bee consensus gene set. Genome Biol. 2007, 8 (1), R13.
(21) Yates, J.; Ruse, C. I.; Nakorchevsky, A. Proteomics by mass spectrometry: approaches, advances, and applications. Annu. Rev. Biomed. Eng. 2009, 11, 49–79.
(22) Schirle, M.; Heurtier, M. A.; Kuster, B. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2003, 2 (12), 1297–305.
(23) Bendall, S. C.; Hughes, C.; Campbell, J. L.; Stewart, M. H.; Pittock, P.; Liu, S.; Bonneil, E.; Thibault, P.; Bhatia, M.; Lajoie, G. A. An enhanced mass spectrometry approach reveals human embryonic stem cell growth factors in culture. Mol. Cell. Proteomics 2009, 8 (3), 421–32.
(24) Gilar, M.; Olivova, P.; Daly, A. E.; Gebler, J. C. Orthogonality of separation in two-dimensional liquid chromatography. Anal. Chem. 2005, 77 (19), 6426–34.
(25) Barnea, E.; Sorkin, R.; Ziv, T.; Beer, I.; Admon, A. Evaluation of prefractionation methods as a preparatory step for multidimensional based chromatography of serum proteins. Proteomics 2005, 5 (13), 3367–75.
(26) Elschenbroich, S.; Ignatchenko, V.; Sharma, P.; Schmitt-Ulms, G.; Gramolini, A. O.; Kislinger, T. Peptide separations by on-line MudPIT compared to isoelectric focusing in an off-gel format: application to a membrane-enriched fraction from C2C12 mouse skeletal muscle cells. J. Proteome Res. 2009, 8 (10), 4860–9.


(27) de Godoy, L. M.; Olsen, J. V.; Cox, J.; Nielsen, M. L.; Hubner, N. C.; Frohlich, F.; Walther, T. C.; Mann, M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 2008, 455 (7217), 1251–4.
(28) Lu, P.; Vogel, C.; Wang, R.; Yao, X.; Marcotte, E. M. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat. Biotechnol. 2007, 25 (1), 117–24.
(29) Foster, L. J. Large-scale subcellular localization of proteins by protein correlation profiling. In Comprehensive Analytical Chemistry; Barceló, D., Ed.; Elsevier: Amsterdam, 2009; pp 468–478.
(30) Cabecinha, A.; Petrotchenko, E.; Borchers, C. H. Out-gel digest procedure for protein crosslinking applications. J. Am. Soc. Mass Spectrom. 2009, 20 (5S1), S105.


Journal of Proteome Research • Vol. 9, No. 4, 2010

(31) Adachi, J.; Kumar, C.; Zhang, Y.; Olsen, J. V.; Mann, M. The human urinary proteome contains more than 1500 proteins, including a large proportion of membrane proteins. Genome Biol. 2006, 7 (9), R80.
(32) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 2002, 1 (5), 376–86.
(33) Honeybee Genome Sequencing Consortium. Insights into social insects from the genome of the honeybee Apis mellifera. Nature 2006, 443 (7114), 931–49.
