Article pubs.acs.org/jpr
Systematic Comparison of Fractionation Methods for In-depth Analysis of Plasma Proteomes
Zhijun Cao, Hsin-Yao Tang, Huan Wang, Qin Liu, and David W. Speicher*
Center for Systems and Computational Biology and Molecular and Cellular Oncogenesis Program, The Wistar Institute, Philadelphia, Pennsylvania, United States
ABSTRACT: Discovery and validation of plasma biomarkers are quite challenging because of the high complexity and wide dynamic range of the plasma proteome. Current plasma protein profiling strategies usually use major protein immunodepletion and nanoLC−MS/MS as the first and final analytical steps, respectively, but additional fractionation is needed to detect and quantify low-abundance disease biomarkers. In this study, the performances of 1-D SDS-PAGE, peptide isoelectrofocusing, and peptide high pH reverse-phase chromatography for fractionation of immunodepleted human plasma were systematically compared by evaluating protein coverage, peptide resolution, and capacity to detect known low-abundance proteins. Trade-offs between increasing the number of fractions to improve proteome coverage and resulting decreases in throughput also were assessed. High pH reverse-phase HPLC exhibited the highest peptide resolution and yielded the best depth of analysis with detection of the largest number of known low-abundance proteins for a given level of fractionation. Another advantage of using high pH reverse-phase fractionation rather than 1-D SDS gels is that all fractionation steps except for abundant protein depletion occur at the peptide level, making this strategy more compatible with quantitative biomarker validation methods such as stable isotope dilution multiple reaction monitoring.
KEYWORDS: plasma proteome, proteome fractionation, protein profiling, biomarkers, human plasma, biomarker discovery, biomarker validation
■
INTRODUCTION
Protein biomarkers are highly desired for early detection, accurate diagnosis, and prognosis of human diseases such as cancer, as well as for monitoring clinical interventions.1 Human plasma (or serum) is a particularly desirable biological fluid for disease biomarker discovery because blood is routinely collected in the clinic, collection is minimally invasive, and established clinical assays are relatively inexpensive. Proteins and metabolites in the blood are thought to be shed by most cells in the body, and changes in the levels of these proteins and metabolites have been hypothesized to potentially reflect most physical conditions.2,3 Thus, human plasma is a potential treasure-trove of candidate biomarkers that might indicate the onset and progression of most disease states.
However, mass spectrometry (MS)-based proteomic analysis of plasma for disease-associated biomarker discovery and validation is extremely challenging because of plasma’s great complexity and the wide dynamic range of plasma protein concentrations, which span more than 10 orders of magnitude.2 Specifically, the plasma proteome is dominated by a handful of proteins in the mg/mL range, and the 20 most-abundant plasma proteins constitute 99% of the total protein mass in plasma.2 However, most disease biomarkers are predicted to be present at low-abundance levels, particularly proteins such as cancer biomarkers that are relatively specifically associated with the tumor. For example, prostate-specific antigen (PSA), carcinoembryonic antigen (CEA), CA125, and other relatively specific known cancer biomarkers are typically present in serum and plasma in the low ng/mL to pg/mL range.2−4 Hence, low-abundance disease biomarkers are often either masked by the abundant proteins or are below detection limits of MS instruments because the abundant proteins limit the volume of plasma that can be injected and analyzed. Therefore, detection of low-abundance proteins requires fractionation strategies that reduce sample complexity and increase the volume of original plasma that analyzed fractions represent. Of course, protein recoveries must remain high and relatively reproducible. The strategies most commonly employed are to immunodeplete the major plasma proteins and subject the remaining proteome to additional protein- and/or peptide-level fractionation steps prior to nanoLC−MS/MS.5−8
Sequential separation steps should exploit orthogonal physicochemical properties of proteins or peptides. SDS-PAGE and strong cation exchange (SCX) are highly orthogonal to immunodepletion and reverse-phase nanoLC−MS/MS and have been widely used for intermediate protein and peptide fractionation, respectively. Recently, peptide OFFGEL electrophoresis and high pH RP-HPLC (hpRP-HPLC) have gained attention and shown good performance in terms of separation efficiency and protein and peptide identifications.9−17 Binary or higher dimensional comparisons of different fractionation approaches prior to LC−MS/MS have been reported by several research groups using samples with different complexities. Peptide OFFGEL electrophoresis has been found to be comparable to online SCX separations using low- or medium-complexity samples18,19 and appears to outperform offline SCX20 and SDS-PAGE21 fractionation methods for complex samples. Recently, two independent systematic fractionation comparison studies showed SDS-PAGE was superior to OFFGEL electrophoresis or offline SCX in terms of protein and peptide identifications using honey bee lysates or lung cancer secretomes.22,23 hpRP-HPLC exploits the same peptide property (hydrophobicity) as low pH RP-HPLC; thus, it seems less orthogonal to low pH RP-HPLC compared with SDS-PAGE, OFFGEL electrophoresis, and SCX. However, it is worth noting that hpRP-HPLC outperformed OFFGEL electrophoresis,19 SCX,19,24−26 and SDS-PAGE25 on the basis of the total number of proteins identified using low- or medium-complexity samples. Taken together, the above studies indicated that 1-D SDS-PAGE, OFFGEL electrophoresis, and hpRP-HPLC are among the highest performance proteome fractionation methods, as at least several studies showed each of these methods yielded the best depth of analysis in specific studies using low- or medium-complexity samples. However, to our knowledge, a side-by-side comparison of these three fractionation methods using a highly complex sample such as human plasma has not been reported.
We previously used a 3-D plasma/serum fractionation strategy for ectopic pregnancy biomarker discovery and verification that combined immunodepletion of 20 abundant proteins, SDS-PAGE, and LC−MS/MS with label-free peptide quantitation.27−29 SDS-PAGE as the second plasma fractionation step provides reasonably reproducible separations and, importantly, can distinguish molecular weight changes in a given protein that may be clinically important for some biomarkers.28 This same method was used for initial small-scale validation of ectopic pregnancy and ovarian cancer biomarkers using multiple reaction monitoring (MRM) with label-free quantitation.27−29 However, this 3-D strategy with SDS-PAGE as the middle step is not very compatible with stable isotope dilution MRM quantitation or with other peptide-level, stable-isotope-label-based quantitative strategies. Another limitation of the 3-D MRM analysis using 1-D SDS gels as the middle step is that proteins to be quantitated are usually spread over at least three to four fractions, and slight gel-to-gel variations in protein migration further increase the number of gel slices that must be analyzed in order to ensure that the proteins of interest are fully quantitated. This spread of peptides to be quantitated among multiple fractions reduces peptide signal intensity, making the peptide harder to detect and quantify, and reduces sample throughput.
In this study, we systematically compared 1-D SDS-PAGE, OFFGEL electrophoresis, and hpRP-HPLC as the middle step in 3-D plasma proteome profiling. One goal was to identify a peptide-based method that could be better integrated with stable isotope dilution MRM assays and would have at least a similar depth of analysis to 1-D SDS-PAGE. In addition, a peptide-based fractionation method that might prove to be superior to 1-D SDS-PAGE for plasma proteome profiling would provide an alternative 3-D strategy for initial plasma biomarker discovery. We selected peptide OFFGEL electrophoresis and peptide hpRP-HPLC as the best peptide fractionation methods for comparison to 1-D SDS gels on the basis of their high performance on less complex samples, as summarized above. Surprisingly, the results show that hpRP-HPLC of depleted plasma tryptic peptides is more efficient at in-depth analysis than either 1-D SDS gels or peptide OFFGEL electrophoresis.
Received: October 27, 2011. Published: April 26, 2012. © 2012 American Chemical Society. Journal of Proteome Research, dx.doi.org/10.1021/pr201068b | J. Proteome Res. 2012, 11, 3090−3100
■
MATERIALS AND METHODS
Reagents
LC−MS grade formic acid, 200 proof molecular biology grade ethanol, ammonium bicarbonate (ABC), and N,N-dimethylacrylamide (DMA) were purchased from Sigma-Aldrich (St. Louis, MO). Sodium dodecyl sulfate (SDS) and Tris were purchased from Bio-Rad (Hercules, CA). Dithiothreitol (DTT) was purchased from GE Healthcare (Piscataway, NJ). HPLC grade acetonitrile was obtained from Thomas Scientific (Swedesboro, NJ). Sequencing grade modified trypsin was purchased from Promega (Madison, WI).
ProteoPrep20 Depletion
The most abundant 20 proteins were depleted from human plasma using a ProteoPrep20 Immunodepletion Column (Sigma-Aldrich) on an AKTA fast performance liquid chromatography system (FPLC; GE Healthcare). Briefly, 100 μL of plasma was diluted to 500 μL with PBS and filtered through a 0.22 μm microcentrifuge filter, injected onto the column, and depleted using the manufacturer’s recommended protocols and buffers. For systematic comparisons of different conditions with an identical sample, the flow-through fractions containing unbound proteins from 800 μL of plasma were pooled and divided into eight aliquots, and then each aliquot was precipitated with nine volumes of prechilled 200 proof ethanol (−20 °C). Ethanol supernatants were carefully removed, protein pellets were dried to remove residual solvent, and pellets were frozen and stored at −20 °C until further use. A representative gel of the “Top 20” protein depletion is shown in Supporting Information Figure S1.
SDS-PAGE/In-Gel Trypsin Digestion
SDS-PAGE and in-gel trypsin digestion were carried out as described previously with minor modifications.8 Briefly, frozen protein pellets from ethanol precipitation of depleted plasma were thawed and resuspended in 50 mM Tris-HCl, 1% SDS, pH 8.5. Samples were reduced with 20 mM DTT for 1 h at 37 °C and alkylated with 60 mM DMA in 50 mM Tris-HCl, pH 8.5 for 1 h at 37 °C. Alkylation was quenched with 50 mM DTT for 15 min at 37 °C. Following in-solution reduction and alkylation, samples were prepared for PAGE by the addition of SDS sample buffer. Sample equivalent to 10 μL of original plasma was loaded per lane using 10-well, 12% NuPAGE minigels (Invitrogen, Carlsbad, CA) and MES running buffer. Gels were electrophoresed until the tracking dye had migrated 1.0, 2.0, or 4.0 cm and stained with Colloidal Blue (Invitrogen), and a 4-mm-wide strip from the center of each lane was subsequently sliced into 10, 20, or 40 uniform 1-mm slices using a custom razor-blade array. Corresponding slices from three replicate lanes were combined in single wells of a 96-well pierced plate (Biomachines, Inc., Carrboro, NC). Gel slices were digested overnight using 0.02 μg/μL of modified trypsin. Following digestion, samples were frozen and stored at −20 °C.
In-Solution Trypsin Digestion
Frozen protein pellets from ethanol precipitation of depleted plasma were thawed briefly and resuspended in 100 mM ammonium bicarbonate, 8 M urea buffer, pH 8.5, reduced with 5.7 mM TCEP for 1 h at 37 °C, and alkylated with 25 mM DMA for 1 h at 37 °C. Alkylation was quenched with 30 mM cysteine for 15 min at 37 °C. A two-step proteolytic digestion was performed. First, sample was diluted with 25 mM ammonium bicarbonate to 4 M urea, digested with trypsin (enzyme/protein: 1/100) for 4 h at 37 °C, and then diluted with 25 mM ammonium bicarbonate to 2 M urea and digested with trypsin (enzyme/protein: 1/50) overnight at 37 °C. Proteolysis was stopped by adding 10% formic acid to a final pH ∼3, and the sample was desalted using a Sep-Pak C18 cartridge (Waters Inc., Milford, MA).
Peptide Fractionation Methods
OFFGEL Separation. Thirty microliters of depleted plasma tryptic digests were separated using an Agilent 3100 OFFGEL Fractionator (Agilent, Santa Clara, CA). Both the low- and high-resolution OFFGEL kits, pH 3−10 (Agilent), were used. The low-resolution 12-well separations were focused for 20 kV h, and high-resolution 24-well separations were focused for 50 kV h, with a maximum current of 50 μA and power of 200 mW. Fractions were acidified by adding 10% formic acid to a final pH ∼3 and purified by solid phase exchange with UltraMicroSpin columns (The Nest Group, Inc., Southborough, MA). Fractions were dried in a SpeedVac, followed by resuspension of each fraction in 160 μL (for 12-well separations) or 80 μL (for 24-well separations) of 0.1% formic acid.
hpRP-HPLC. Tryptic digests of depleted plasma were injected into a 2.1 × 250 mm XBridge BEH300 C18 column (Waters) with a 2.1 × 10 mm XBridge C18 guard column (Waters) connected to an Agilent 1100 HPLC system. Solvent A was 20 mM ammonia, pH 10.7, and solvent B was 20 mM ammonia in 80% acetonitrile; a flow rate of 200 μL/min was used throughout the separation. Sample loading was performed using 3% B for 8 min followed by a linear gradient from 3 to 88% B over 63 min and a 15 min hold at 88% B prior to reequilibration at 3% B. The peptide elution profile was monitored using UV absorbance at 215 nm, and fractions were collected every minute, resulting in 83 initial fractions. These 83 fractions were then pooled into 12, 20, and 40 fractions, as described in Supporting Information Table S1, such that most fractions had similar UV absorbance. Pooled fractions were dried in a SpeedVac, followed by resuspension of each fraction in 80 μL (for 15 μL of tryptic digests separated into 12 fractions) or 40 μL (for 30 μL of tryptic digests separated into 40 fractions) of 0.1% formic acid.
LC−MS/MS
Tryptic digests were loaded onto a UPLC Symmetry trap column (180 μm i.d. × 2 cm packed with 5 μm C18 resin; Waters) with solvent A, which was Milli-Q (Millipore, Billerica, MA) water containing 0.1% formic acid, and separated by nanoRP-HPLC on a BEH C18 nanocapillary analytical column (75 μm i.d. × 25 cm, 3 μm particle size; Waters) interfaced with a LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Waltham, MA). Solvent B was 0.1% formic acid in ACN. Peptides were eluted at 200 nL/min using the following gradient conditions: 5 to 28% B over 42 min, 28 to 50% B over 25.5 min, 50 to 80% B over 5 min, and hold at 80% B for 5 min prior to re-equilibration at 5% B. To minimize carryover, a 30-min blank with a 2 μL injection of buffer A was run between samples. The mass spectrometer was set to scan m/z from 400 to 2000. The full MS scan was collected at 60 000 resolution in the Orbitrap in profile mode followed by data-dependent MS/MS scans in the linear trap on the six most abundant ions exceeding a minimum threshold of 1000. Monoisotopic precursor selection was enabled, and charge-state screening was enabled to reject z = 1 ions. Ions subjected to MS/MS were excluded from repeated analysis for 45 s. The volumes of fractions injected were adjusted to consistently inject approximately 0.5 μg or less of tryptic peptides, where peptide concentrations were estimated by assuming quantitative recovery and equal distribution of protein or peptides among all fractions.
Data Processing
MS/MS spectra were extracted and searched using the SEQUEST algorithm in BioWorks (version 3.3, Thermo Fisher Scientific) against the human UniRef 100 protein database (November 2007, the Protein Information Resource at Georgetown University, Washington, DC) combined with a reverse database and a list of common contaminants (trypsin, keratins, etc.). The reverse database was generated by reversing the protein amino acid sequence for each database entry, and the entire reversed database was appended in front of the original forward sequences. Database search and results filtering strategies that we previously optimized for complex proteomes such as human tumor secretomes and human serum were used in this study.30 Specifically, MS/MS spectra were searched using partial trypsin specificity with up to two missed cleavages, a 100 ppm precursor mass tolerance, 1 amu fragment ion mass tolerance, static modification of Cys (DMA derivative, +99.06840), and variable modifications for methionine oxidation (+15.9949) and asparagine deamidation (+0.9840). Consensus protein lists were generated by DTASelect (version 2.0, licensed from Scripps Research Institute, La Jolla, CA) using the following data filter: full tryptic boundaries, 10 ppm, ΔCn ≥ 0.05. For each proteome, the FDR was estimated from the ratio of the decoy database peptide or protein counts to forward database peptide or protein counts, expressed as a percentage. Peptide counts for FDR calculation were taken directly from the DTASelect results, which counted different charge states and variable modifications as separate peptides. Unique peptide or protein counts were obtained using custom software, which collapsed different charge states and variable modifications of methionine oxidation and asparagine deamidation of a unique sequence into a single peptide count.
The software also limited assignment of each unique peptide sequence to a single protein in the final assembled protein list as previously described.30 As previously shown, this data analysis strategy was superior to older data filtering methods that typically utilized Xcorr values because higher numbers of unique peptides and proteins could be identified while maintaining low FDR.30 As shown in Supporting Information Table S2, FDR for all peptides was less than 3% for all data sets using nonredundant peptide counts, and the FDR for proteins identified by two or more peptides was less than 1%. Because FDR for proteins identified by one or more peptides was much higher (Supporting Information Table S2), proteins identified by a single peptide were separately represented in data summaries and were not emphasized when comparing methods. To identify common and unique proteins found by different fractionation methods, protein and peptide data were placed in a relational database (MySQL) and matched using custom software.
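As an illustration of the decoy-based FDR estimate and the nonredundant peptide counting described under Data Processing, the following is a minimal Python sketch. It is not the authors' custom software: the `PSM` record layout, the function names, and the toy peptide entries are all hypothetical and serve only to show the arithmetic (decoy count divided by forward count, expressed as a percentage) and the collapsing of charge states and variable modifications onto a single sequence.

```python
from collections import namedtuple

# Hypothetical record for one peptide identification (not the DTASelect format).
PSM = namedtuple("PSM", ["sequence", "charge", "mods", "is_decoy"])


def fdr_percent(n_decoy, n_forward):
    """Ratio of decoy to forward identifications, expressed as a percentage."""
    if n_forward == 0:
        raise ValueError("no forward identifications")
    return 100.0 * n_decoy / n_forward


def unique_sequences(psms, decoy):
    """Collapse charge states and variable modifications onto bare sequences."""
    return {p.sequence for p in psms if p.is_decoy == decoy}


# Toy data: two forward sequences seen in several forms, plus one decoy hit.
psms = [
    PSM("LVNEVTEFAK", 2, (), False),
    PSM("LVNEVTEFAK", 3, (), False),          # same sequence, different charge
    PSM("AMDNLTK", 2, ("M2:oxidation",), False),
    PSM("AMDNLTK", 2, (), False),             # same sequence, unmodified
    PSM("KAFETVENVL", 2, (), True),           # reversed-database (decoy) hit
]

forward = unique_sequences(psms, decoy=False)
decoys = unique_sequences(psms, decoy=True)
print(len(forward), len(decoys), fdr_percent(len(decoys), len(forward)))
```

Counting redundant entries (every charge state and modification form) instead of collapsed sequences changes both the numerator and the denominator, which is why the text specifies which counting scheme each reported FDR refers to.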
■
RESULTS AND DISCUSSION
Prior to directly comparing the different fractionation methods, a series of experiments were performed to optimize separation parameters for the peptide OFFGEL electrophoresis and hpRP-HPLC separations using immunodepleted human plasma (data not shown). The optimized separation strategies developed from these pilot experiments are described in the Materials and Methods and were used for all subsequent experiments, as described below. The scheme used to systematically compare 1-D SDS-PAGE, peptide hpRP-HPLC, and peptide OFFGEL electrophoresis is shown in Figure 1. Replicate aliquots of a single pool of depleted plasma were used for these experiments. In parallel with comparing the methods with each other, the effects of using different numbers of fractions for each method also were assessed, as summarized below.
Figure 1. Strategy used to systematically compare SDS-PAGE, OFFGEL electrophoresis, and hpRP-HPLC as alternatives for the middle step in a 3-D plasma proteome workflow.
Effects of Gel Separation Distance and Number of Fractions on Depth of Analysis
Our prior studies showed that the optimal loading amount in 1 mm thick mini-gels was the depleted fraction from approximately 10 μL of plasma, because heavier protein loads sometimes caused visible band distortion with resulting decreased resolution. When depleted plasma from normal donors was reconstituted to the original plasma volume, the protein concentration was typically approximately 3 μg/μL, which corresponded to a total protein load per gel lane of approximately 30 μg. In order to independently evaluate the effects of both separation length and number of fractions, replicate depleted plasma samples were electrophoresed for 1.0, 2.0, and 4.0 cm resulting in 10, 20, and 40 1-mm gel slices, respectively. These gel slices were individually digested, and then fractions were analyzed by LC−MS/MS, either individually or after pooling adjacent fractions as follows: from the 1-cm gel, five fractions were prepared by pooling digests 1−2, 3−4, etc.; from the 2-cm gel, the 20 fractions were analyzed individually, and five fractions were prepared using one-third of each digest with pooling of digests 1−4, 5−8, etc.; and from the 4-cm gel, the 40 fractions were analyzed individually, and 20 fractions were prepared using one-third of each digest with pooling of digests 1−2, 3−4, etc. Loading of tryptic digests onto the LC−MS/MS system was standardized by assuming the 30 μg of total protein per gel lane was evenly divided into the total number of fractions in each experiment, and 0.5 μg of tryptic peptides were injected for each fraction or pool of fractions.
The numbers of unique peptides and proteins from each data set are summarized in Figure 2. As expected, both unique peptide and unique protein numbers increased as the number of fractions analyzed per sample increased. Nearly twice as many proteins were identified in both 20-fraction proteomes compared with the five-fraction proteomes. Increasing further to 40 fractions resulted in identification of 24% more unique peptides, 35% more unique proteins identified with at least one unique peptide, and 28% more unique proteins identified with at least two unique peptides, compared with the average from the 20-fraction proteomes. While this increase is substantial, it doubles the required mass spectrometer time, and this moderate increase in depth of analysis may not justify the decreased throughput in some workflows, particularly those where analysis of large numbers of samples is desired.
Figure 2. Comparisons of SDS gel length and number of fractions on depth of plasma proteome analysis. (A) Number of unique peptides identified for depleted plasma separated for differing distances (1, 2, or 4 cm) on 1-D mini-gels. In each case, the entire gel lane to the dye front was cut into 1-mm slices, and each slice was separately digested with trypsin. For the 1-cm gel, two adjacent digests were pooled resulting in a total of five fractions. For the 2-cm gel, five fractions were created by pooling four adjacent digests, or 20 individual digests were analyzed. For the 4-cm gel, 20 fractions were created by pooling two adjacent fractions, or 40 individual digests were analyzed. (B) The number of unique proteins identified in each proteome. (C) Overlap of proteins identified by at least two peptides in the two five-fraction (left panel) and two 20-fraction (right panel) proteomes. Overlap was assessed on the basis of exact matches of database entries and was, therefore, an underestimate of actual overlap of identified proteins.
Figure 3. Comparisons of OFFGEL electrophoresis runs using low- and high-resolution separation kits. (A) The number of unique peptides identified in duplicate 12-fraction, low-resolution separations and a 24-fraction, high-resolution run using pH 3−10 IPG strips. (B) The number of unique proteins identified in each proteome. (C) Overlap of proteins identified by at least two peptides in the three separations. Overlap was assessed on the basis of exact matches of database entries.
Interestingly, the two five-fraction proteomes yielded similar numbers of peptides and proteins to each other, as did the two 20-fraction proteomes. Furthermore, proteomes with identical numbers of fractions showed a high degree of overlap in proteins identified by two or more peptides (Figure 2C). This indicates that the gel separation distance did not affect the depth of analysis. However, peptide resolution, which was defined as the number of fractions where a peptide was identified, was affected by gel separation distance. As shown in Supporting Information Figure S2, for the five-fraction proteomes, the percentage of peptides identified in a single fraction was only 46% for the 1-cm separation but increased to 64% for the 2-cm separation that was analyzed as five fractions. Similarly, for the 20-fraction data sets, the percentage of peptides identified in a single fraction was only 37% for the 2-cm separation and was 45% for the 4-cm separation. Taken together, these results suggest that for discovery studies, the gel separation distance is relatively unimportant, whereas the more important factor is the total number of fractions analyzed by LC−MS/MS. In contrast, for MRM assays the gel separation distance should be based upon the desired number of fractions to be analyzed such that each digest represents a single 1-mm-high gel slice. This will maximize the number of peptides detectable in a single fraction, which is more important for MRM assays because having peptides targeted for quantitation in single fractions will maximize peptide signal intensities and may improve throughput.
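The peptide-resolution measure used here (the number of fractions in which each peptide is identified, summarized as the percentage of peptides found in a single fraction) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the toy peptide labels are hypothetical, and the input is assumed to be one collection of identified peptide sequences per fraction.

```python
from collections import Counter


def fraction_spread(fractions):
    """Map each peptide to the number of fractions in which it was identified."""
    spread = Counter()
    for peptides in fractions:
        for peptide in set(peptides):  # count a peptide once per fraction
            spread[peptide] += 1
    return spread


def percent_in_single_fraction(fractions):
    """Percentage of all identified peptides that occur in exactly one fraction."""
    spread = fraction_spread(fractions)
    if not spread:
        return 0.0
    single = sum(1 for n in spread.values() if n == 1)
    return 100.0 * single / len(spread)


# Toy example with three fractions; "PEPB" is split across two of them.
toy_fractions = [{"PEPA", "PEPB"}, {"PEPB", "PEPC"}, {"PEPD"}]
print(percent_in_single_fraction(toy_fractions))
```

A higher percentage means fewer fractions must be monitored to capture the full signal of a targeted peptide, which is the property the text identifies as important for MRM assays.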
Peptide OFFGEL Fractionation Using Low- and High-Resolution Kits
OFFGEL fractionations of replicate aliquots of the large depleted plasma pool tryptic digests were performed using the manufacturer’s instructions, except without using glycerol in the rehydration buffer or sample buffer. Specifically, 30 μL of depleted plasma digest was separated using either a low-resolution kit (12 wells, 12-cm IPG strip) or a high-resolution kit (24 wells, 24-cm IPG strip) in duplicate. In order to reduce mass spectrometer analysis time, duplicate fractionations using the 12-well format and one set of 24-well fractions were analyzed by LC−MS/MS. As shown in Figure 3, the 24-fraction experiment identified 5410 unique peptides, compared to an average of 3552 unique peptides from the duplicate 12-fraction experiments, for an increase of 52% more peptides identified for twice the MS analysis time. At the protein level, the 24-fraction experiment identified 47% more total proteins, i.e., 1080 proteins compared to an average of 737 proteins from the duplicate 12-fraction experiments. A similar trend was observed when only proteins identified by two or more unique peptides were considered, with 43% more proteins identified in the 24-fraction experiment (596 proteins compared with an average of 417 proteins). Interestingly, Hubner et al. reported that only about 19% more proteins were identified in yeast lysates using the 24-well format compared with the 12-well format OFFGEL system.21 This difference of 19% for yeast lysates compared with our 43% for depleted plasma is probably due to the far wider dynamic range of protein concentrations in plasma compared with yeast lysates. Combining the duplicate 12-fraction proteomes into a single data set resulted in 479 unique proteins identified by at least two peptides, with 74% of these proteins common to both replicates. This combined data set was still 20% smaller than the corresponding data from the single 24-fraction analysis, despite representing the same number of total LC−MS/MS runs. Interestingly, 89% of the proteins from the combined 12-fraction proteomes were also identified in the 24-fraction proteome, illustrating good reproducibility across experiments with similar depth of analysis (Figure 3C). These results show that the 24-well fractionation is clearly advantageous for analysis of depleted plasma proteomes. Unfortunately, one of the limitations of OFFGEL electrophoresis is the lack of flexibility in adjusting the maximum number of fractions that can be achieved. The options are to analyze 12 or 24 fractions, or to reduce the number of fractions by pooling selected fractions after separation. Indeed, strategic pooling of selected fractions probably could moderately increase throughput without reducing depth of coverage, because the complexity of different fractions varies greatly as indicated by the distribution of unique peptides among fractions. The 12- and 24-fraction separations show similar trimodal distributions of peptide complexity (Supporting Information Figure S3A,B) with two regions of low-complexity fractions. For example, with the more extensive 24-fraction separation, the simplest fractions contain only about 120 identifiable peptides, while other fractions have about six times as many identifiable peptides.
Despite the limitations described above, an advantage of this method is that peptide resolution is very high, with about 75 and 62% of all peptides identified in a single fraction for the 12- and 24-well separations, respectively, as shown in Supporting Information Figure S3C.
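The percentage gains quoted for the 24-fraction versus 12-fraction OFFGEL experiments follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the function name and dictionary labels are illustrative choices, while the counts are those stated in the text (the 24-fraction value and the average of the duplicate 12-fraction values).

```python
def percent_increase(new, baseline):
    """Relative gain of `new` over `baseline`, as a percentage."""
    return 100.0 * (new - baseline) / baseline


# (24-fraction count, mean of duplicate 12-fraction counts) as reported in the text.
comparisons = {
    "unique peptides": (5410, 3552),
    "proteins, one or more peptides": (1080, 737),
    "proteins, two or more peptides": (596, 417),
}

for label, (n24, n12) in comparisons.items():
    gain = percent_increase(n24, n12)
    print(f"{label}: {gain:.0f}% more with 24 fractions")
```

Because the 24-fraction experiment requires twice as many LC−MS/MS runs (a 100% increase in instrument time), each of these gains represents less than a proportional return, which is the throughput trade-off weighed in this section.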
Effects of Number of hpRP-HPLC Fractions on Depth of Plasma Proteome Analysis
For the systematic comparison experiment, tryptic digests of depleted plasma were separated on a narrowbore XBridge BEM column using a gradient that was optimized in pilot experiments (Supporting Information Figure S4). Neighboring fractions were pooled on the basis of the absorbance elution 3094
dx.doi.org/10.1021/pr201068b | J. Proteome Res. 2012, 11, 3090−3100
Journal of Proteome Research
Article
Figure 4. Comparisons of hpRP-HPLC separations after pooling peptides into differing numbers of fractions. (A) The number of unique peptides identified after pooling separated peptides into 12, 20, or 40 fractions. (B) The number of unique proteins identified in each proteome. (C) Overlap of proteins identified by at least two peptides in the duplicate 12-fraction analyses. Overlap was assessed on the basis of exact matches of database entries.
yielding the highest numbers of unique peptides and unique proteins, particularly as the total number of fractions per experiment increased. That is, there were relatively small differences between methods when 12 fractions were compared, whereas the curves diverged with increasing degree of fractionation (Figure 5). Furthermore, general linear regression models (GLM) were used to test if the different fractionation methods and the number of fractions would affect the number of identified unique peptides and unique proteins. The potential interaction effects between fractionation methods and the number of fractions were examined using a likelihoodratio test. On the basis of a regression model with interaction terms, we estimated the difference in the average number of identified unique peptides or unique proteins between any two of the three fractionation methods at several given numbers of fractions (Table 1). The results show that hpRP-HPLC yielded a significantly higher number of unique peptides and unique proteins than SDS at any given number of fractions. Compared with OFFGEL, hpRP-HPLC identified significantly more unique peptides but did not identify significantly more unique proteins. However, one limitation of OFFGEL electrophoresis is that it is currently limited to a maximum of 24 fractions, while larger numbers of fractions are readily feasible with hpRPHPLC. OFFGEL, compared with SDS, identified significantly more unique proteins at any tested number of fractions but did not significantly identify more unique peptides if fraction number was less than 24. Interestingly, 1-D SDS-PAGE, which was determined to be the most effective proteome fractionation method in two prior studies using cell or tissue lysates,22,23 yielded the lowest depth of analysis for fractionation of plasma in this study. 
For plasma proteome analysis, hpRP-HPLC yielded the greatest depth of analysis, and the advantage of hpRP-HPLC relative to 1-D SDS-PAGE was the greatest when larger numbers of fractions were used (Figure 5). A more detailed analysis of the three methods at similar throughput levels can be achieved by comparing the protein data sets identified by two or more peptides for the 20−24 fraction experiments. When 20 or 24 fractions (depending upon method) per proteome were used, OFFGEL and hpRP-HPLC yielded 28 and 27% more protein identifications than 1-D SDS-PAGE, respectively. The total number of unique proteins identified by at least one of the three methods at the 20−24 fraction level is 802, and the number of proteins
profiles to yield 12, 20, or 40 fractions, where total absorbance per fraction within each sample set was as similar as possible (Supporting Information Table S1A,B). Duplicates of the 12-fraction set and single sets of the 20- and 40-fraction experiments were analyzed by LC−MS/MS. The numbers of unique peptides and proteins identified are shown in Figure 4A and B, respectively. Increasing the fraction number from 12 to 20 and from 20 to 40 yielded 30 and 38% more peptides, respectively, and 40 and 45% more proteins identified by two or more peptides, indicating that increasing the fraction number to at least 40 has a substantial impact on the depth of plasma proteome analysis. The proteins identified by at least two unique peptides in the 40-fraction hpRP-HPLC experiment, which is the largest data set obtained in these studies, are shown in Supporting Information Table S3. Similar to the other fractionation methods, separations and the resulting identified proteomes were reproducible, with about 89% of proteins identified by two or more peptides common to both replicate 12-fraction data sets (Figure 4C). Using UV absorbance at 215 nm to guide fraction pooling worked effectively to achieve similar complexity, as most fractions within an experiment had similar numbers of unique identifiable peptides (Supporting Information Figures S5A−C). Interestingly, similar peptide resolution was achieved at all fractionation levels, as 73, 75, and 70% of the peptides were identified in a single fraction for the 12-, 20-, and 40-fraction data sets, respectively (Supporting Information Figure S5D). Increasing the number of fractions had little effect on the proportion of peptides confined to a single fraction, demonstrating the high resolution of hpRP-HPLC peptide separation under our current experimental conditions.
Comparison of 1-D SDS-PAGE, Peptide OFFGEL, and Peptide hpRP-HPLC Fractionation Methods at Different Levels of Fractionation
As discussed above, increasing the number of fractions increased the depth of analysis for each fractionation method with the trade-off of decreased throughput. The critical factor is to find the optimal trade-off between throughput and depth of analysis. Hence, we compared the depth of analysis for the three methods over overlapping ranges of fraction numbers (Figure 5). Regardless of the criteria used to measure depth of analysis, the trends were similar, with hpRP-HPLC consistently
dx.doi.org/10.1021/pr201068b | J. Proteome Res. 2012, 11, 3090−3100
Journal of Proteome Research
Article
multiple fractionation methods either in tandem or sequentially. Identification of Known Low-Abundance Plasma Proteins. Another method of assessing depth of plasma proteome analysis is to determine the number of known low-abundance plasma proteins identified in the different data sets. Hence, these data sets were compared to a list of low-abundance plasma proteins that included 154 proteins with reported concentrations of 100 ng/mL or less.33,34 Consistent with the overall protein and unique peptide counts, hpRP-HPLC identified more low-abundance proteins than SDS-PAGE and the OFFGEL method when 20−24 fractions were used (Figure 7), and the largest number of low-abundance proteins among all data sets was identified in the 40-fraction hpRP-HPLC data set. When only highly confident assignments based on identification by two or more peptides were considered, the 40-fraction hpRP-HPLC data set identified nearly twice as many low-abundance proteins as the other data sets. To further compare identification and peptide coverage of low-abundance proteins across methods, all proteins with reported abundances of 50 ng/mL or less33,34 that were identified by two or more peptides in at least one of the experiments involving at least 20 fractions are listed in Supporting Information Table S4. The 40-fraction hpRP-HPLC data set identified 18 of these 21 low-abundance proteins, while the other methods identified 12 or fewer. In addition, most proteins were identified by the largest number of peptides in the 40-fraction hpRP-HPLC data set. For example, the lowest-abundance protein detected, interleukin 18 (59 pg/mL), was detected by two peptides in the 40-fraction hpRP-HPLC data set, by a single peptide in the OFFGEL_24F data set, and was not detected in any other data set.
The second-lowest-abundance protein detected, gamma enolase (80 ng/mL), was detected by six and four peptides in the 40- and 20-fraction hpRP-HPLC data sets, respectively, compared with three or fewer peptides in all other data sets. Not surprisingly, specific proteins were preferentially detected by either the hpRP-HPLC method or the 1-D SDS-PAGE method, but it is interesting that OFFGEL never detected a low-abundance protein that was not detected by one of the alternate methods when similar numbers of fractions were compared. Overall, the 40-fraction hpRP-HPLC method provided the highest sensitivity for identification of low-abundance plasma proteins with the greatest sequence coverage. Comparison of Separation Efficiency Indicated by Peptide Resolution. For discovery experiments, the most important factors are the numbers of proteins identified, the sequence coverage (1-peptide-hit proteins are more tentative even at stringent overall false discovery rates), and capacity to identify low-abundance proteins, as summarized above. However, for MRM assays, an additional important parameter is the total number of fractions among which a peptide is distributed. When a peptide is spread among multiple fractions, the peptide signal strength will decrease, and the total number of fractions that need to be analyzed will probably increase, thereby reducing assay throughput. At the 20- or 24-fraction level, the proportion of peptides identified in a single fraction was 75% for hpRP-HPLC, 62% for OFFGEL, and only 45% for 1-D SDS-PAGE (Figure 8). At the 40-fraction level, the proportion of peptides in a single fraction was 70% for hpRP-HPLC and only 34% for 1-D SDS-PAGE. Therefore, hpRP-HPLC is clearly the method of choice for MRM assays. Sample Preparation Time and Costs. Additional factors to consider when evaluating alternative fractionation methods
Figure 5. Analysis depth of alternative middle step fractionation methods. (A) Relationship between number of unique peptides identified and number of fractions per proteome for the three methods. (B) Number of unique proteins identified by a single peptide. (C) Number of unique proteins identified by at least two peptides. Data points for each fractionation method were connected using the smooth curve option in Excel.
identified by all three methods is 335. Only 8.5% (68) of the proteins were unique to SDS-PAGE, whereas 13.3% (107) and 13.7% (110) were unique to OFFGEL and hpRP-HPLC, respectively, as shown in Figure 6A. Interestingly, when the sample resolution was increased to 40 fractions per proteome, hpRP-HPLC identified 46% more proteins than the 1-D SDS-PAGE method (858 vs 587), as shown in Figure 6B. The total number of proteins identified by these two methods is 974, with 471 common to both data sets, 12% (116) unique to SDS-PAGE, and 40% (387) unique to hpRP-HPLC. It is not surprising that these diverse fractionation methods would identify somewhat complementary data sets. However, the degree of complementarity is too low to consider using
Table 1. General Linear Regression Model Indicating Estimated Differences, 95% Confidence Intervals, and P-Values
Estimated difference (95% confidence interval) in the number of identified unique peptides
Rows: number of fractions (12, 20, 24, 40)
Columns: OFFGEL vs SDS-PAGE, p-value(a), hpRP-HPLC vs SDS-PAGE, p-value, hpRP-HPLC vs OFFGEL
−114.6 (−561.8, 332.7) 0.616 967.5 (550.0, 1385.1)