Analysis of RP-HPLC Loading Conditions for Maximizing Peptide Identifications in Shotgun Proteomics
Amelia Peterson,† Laura Hohmann,‡ Li Huang,‡ Bong Kim,‡ Jimmy K. Eng,‡,§ and Daniel B. Martin*,‡
Department of Chemistry, University of Wisconsin, Madison, Wisconsin, Institute for Systems Biology, Seattle, Washington, and UW Medicine at South Lake Union, 815 Mercer Street, Seattle, Washington 98109
Received February 14, 2009
Abstract: Substantial energy and resources have been invested in improving mass spectrometry (MS) instrumentation, upstream sample preparation protocols, and database search strategies to maximize peptide and protein identifications. The role of HPLC sample loading methods in maximizing MS identifications has been largely overlooked, and there exists an immense heterogeneity in the methods employed in the proteomics literature. We sought to optimize loading methods by testing multiple loading conditions (buffer composition, resin, initial gradient) using tryptic digests of an 18 protein mixture and whole yeast lysate. The loading buffer acetonitrile (ACN) concentration greatly affected peptide identifications: up to a 26% increase in peptide identifications was observed by decreasing the ACN concentration from 5 to 2% during sample loading. Hydrophilic peptides were the main contributors to the increase in peptide identifications and, at higher ACN concentrations, were washed from the precolumn during desalting. Sampling of the hydrophilic peptides was enhanced by using a shallow initial ACN gradient. The results were found to be resin-specific and not generalizable. Our investigation demonstrates the often unappreciated importance of optimizing sample loading conditions to reflect the aims of the research and the characteristics of the LC configurations employed. Keywords: HPLC • sample loading • proteomics • mass spectrometry • hydrophobicity
* To whom correspondence should be addressed. Email: dmartin@systemsbiology.org. Phone: 206-732-1365. Fax: 206-732-1299. † University of Wisconsin. ‡ Institute for Systems Biology. § UW Medicine at South Lake Union. DOI: 10.1021/pr9001417. © 2009 American Chemical Society.
Introduction
In recent years, the field of proteomics has devoted substantial energy and resources to improving sample preparation, MS instrumentation and quantification methods,1 and informatics approaches2 to facilitate the high-throughput study of whole proteomes. For example, new and expensive instrument platforms such as the Thermo Scientific LTQ Orbitrap-ETD and the Waters SYNAPT HDMS system both boast high mass
accuracy and additional features, such as electron transfer dissociation in the former and ion mobility spectrometry in the latter, that seek to increase peptide identifications. While new instrumentation has enticed proteomics researchers with promises of increased proteome coverage, and research on improving sample preparation and database search strategies to extract peptide identifications has likewise seen much advancement, less attention has been given to optimizing the reversed phase (RP)-HPLC parameters used during sample loading, a crucial step in the “typical” LC-MS experiment practiced in most proteomics laboratories. The “typical” LC-MS proteomics workflow entails the digestion of a biological sample with a protease, usually trypsin, an off-line first dimension of separation (e.g., SCX,3 IEF4) or subproteome enrichment (e.g., IMAC, immuno-depletion and/or -enrichment5), and then an online RP-HPLC separation coupled via an electrospray ionization source to a mass spectrometer.6 While most of the components of this workflow have been the subject of technological advancement and extensive optimization, especially LC system and column technologies,7 few systematic investigations of the optimum conditions for sample loading during the second dimension RP-HPLC are reported in the literature. Issaq et al.’s8 study on the effect of various experimental parameters on the separation of peptides by HPLC, for example, determined optimum gradient conditions and ion pairing reagents for the separation of a single protein digest. Issaq and colleagues did not investigate loading buffer compositions other than 5% ACN in acidified water despite noticing decreased peak resolution at the start of the elution profile. More recently, Lee et al.9 described the optimization of RP-HPLC conditions to achieve high resolution separations for peptide quantification by ICAT (isotope-coded affinity tags).
An optimum ACN gradient range and slope to maximize peptide identifications and minimize peak broadening were determined for a complex biological digest. Again, however, only loading buffers containing 5% ACN were employed. One study outside of the proteomics field, using HPLC-solid phase extraction (SPE)-NMR, did address the influence of buffer composition during sample loading on the retention and downstream NMR detection of natural product compounds. Clarkson and colleagues10 investigated the retention of 25 model compounds on various column resins, including RP-C18 resin, after sample loading in 0, 3, 6, 9, 12, and 15% ACN-containing buffers. Their study stressed that successful experimental outcomes depended strongly on appropriate loading buffer choice.
Journal of Proteome Research 2009, 8, 4161–4168. Published on Web 06/26/2009.
A look at research articles published over the last two years in major proteomics journals demonstrates an immense heterogeneity in sample loading methods. For example, out of nearly 200 papers employing RP-HPLC-MS published in the Journal of Proteome Research (August, October, November 2007, and ASAP articles available online on January 9, 2009), Molecular and Cellular Proteomics (August-November 2007, April-May 2008, and In Press articles available online on January 9, 2009) and Proteomics (August 2008), approximately 40% use buffers containing greater than or equal to 5% ACN (n = 79) during sample loading, 38% use buffers containing less than 5% ACN (n = 75), and nearly 21% (n = 41) of the articles do not report the parameters used for their HPLC separations. Loading ACN concentrations were found to range from completely aqueous (0% ACN) to 15% ACN, with 5% ACN being the most common choice. No relationship was discernible between the loading conditions chosen and column resins employed. Only one paper mentioned optimizing its chromatographic conditions to maximize peptide identifications.11 On the basis of our perusal of the literature, we could identify no definitive standard method for sample loading. Our proteomics facility has, until recently, routinely loaded samples in buffers containing 5% ACN as part of our standard operating protocol. This standard protocol was born in our lab, as in other laboratories, out of a desire to maximize peptide identifications and throughput while maintaining column integrity. Limited investigation went into its establishment. However, after noticing a dramatic increase in peptide identifications after a one-time deviation from our standard operating protocol for analysis of our facility’s standard protein digest, we were spurred to investigate our RP-HPLC loading conditions more closely.
Here, we detail those investigations and provide guidelines for optimization of loading conditions in proteomics studies.
Experimental Section
Materials. Unless otherwise noted, all reagents and ISB standard 18 protein mixture components were purchased from Sigma Aldrich (St. Louis, MO) and used without modification. HPLC grade ACN was purchased from EMD (San Diego, CA), HPLC grade water-0.1% formic acid (FA) from J. T. Baker (Phillipsburg, NJ), tris(2-carboxyethyl)phosphine (TCEP) from Pierce (Rockford, IL), and sequencing grade trypsin from Promega (Madison, WI).
Standard 18 Protein Mixture (18 mix) Preparation. The ISB standard 18 protein mixture was prepared as described previously.12 Briefly, each protein component was dissolved at a final concentration of 1 µM in ammonium bicarbonate containing 0.05% SDS for a total protein concentration of 970 µg/mL. The sample was reduced with TCEP, alkylated with iodoacetamide, and digested overnight with sequencing grade trypsin. Samples were dried and desalted on an Oasis MCX column (Waters, Milford, MA). The dried eluate was resuspended in 1% ACN in acidified water for mass spectrometry analysis. Four microliters, ∼250 ng total protein, was used for each analysis.
Yeast Digest Preparation. A lysate of strain JRY193, grown to an OD600 of 1.0 in YPD media, was prepared by snap freezing in liquid nitrogen followed by mechanical disruption using a Retsch (Newtown, PA) PM 100 planetary ball mill. The yeast lysate was denatured with EDTA-containing urea in Tris buffer. Cysteines were reduced with dithiothreitol (DTT) and alkylated with iodoacetamide, and the alkylation was quenched with additional DTT. After dilution with Tris buffer, the sample was digested for 3 h with LysC before an overnight incubation with sequencing grade trypsin. The digested sample was desalted on a Grace (Columbia, MD) Vydac Bioselect SPE C18 cartridge. The dried eluate was reconstituted at 1 mg/mL in 1% ACN in acidified water for preliminary yeast analyses using the Magic C18Aq RP resin. The digest was diluted to 250 µg/mL for subsequent yeast analyses.
Liquid Chromatography-MS/MS Analyses. LC-MS/MS was performed on a Thermo Scientific (Waltham, MA) LTQ linear ion trap coupled to an Agilent (Santa Clara, CA) HP 1100 series LC system through an electrospray source. Confirmatory analyses utilized the Thermo Scientific LTQ XL Orbitrap coupled to an Agilent 1100 Nano-LC. Peptides were trapped on a fused silica fritted capillary precolumn packed with Magic C18Aq RP spherical silica (2 cm × 75 µm ID, 5 µm, 200 Å; Michrom Bioresources, Auburn, CA) and separated over a 10 cm Magic C18Aq RP analytical column (75 µm ID, 5 µm, 100 Å). For analyses using an alternative column packing, the same-dimension precolumn and analytical column were packed with Waters Atlantis dC18 resin mined from a guard column (5 µm, 100 Å). For preliminary analyses on the LTQ, a binary solvent system consisting of Buffer A (0.1% FA) and Buffer B (50% ACN, 0.1% FA) was employed to load samples over sets of consecutive runs in 5%, 4%, 3%, and 2% ACN. Acetonitrile loading concentrations less than 2% were considered in preliminary experiments and found not to net any benefit over the 2% ACN concentration and thus were not included in this final study. After a 10 min loading program, a 30 min “discontinuous” gradient from 10% to 35% ACN was employed, delivering eluate to the mass spectrometer at a tip flow rate of 200 nL/min.
Following elution, each run within a set was re-equilibrated for 15 min at the ACN concentration required for loading the next sample in the set, which was one percentage point lower than that of the preceding run. For the 18 mix analyses, the fourth run in the set (utilizing a 2% ACN loading concentration) was re-equilibrated back to 5% ACN for the next set. For the yeast lysate analyses, the fourth run in the set (2% ACN loading concentration) was re-equilibrated to 2% ACN and was followed by a fifth run from 2% ACN using a “continuous” gradient that was re-equilibrated finally to 5% ACN for the next set. The “continuous” gradient program was created by extending the slope of the “discontinuous” gradient between 15 and 40 min (0.6% ACN/min) back to 10 min, lengthening the total run time by 8 min. Each set of four 18 mix runs and five yeast runs was replicated 10 and 4 times, respectively. Supplementary Figure 1 (Supporting Information) graphically depicts the loading programs used. Supplementary Table 1 (Supporting Information) presents both the discontinuous and continuous gradients in tabular format appropriate for programming an HPLC system.
Confirmatory analyses on the LTQ Orbitrap utilized a nano-LC system, in which 18 mix samples were loaded onto the precolumn with an isocratic pump over 10 min in 1% ACN, 0.1% FA. The column was then washed for 10 min using the nano-LC system in 2% or 5% ACN during MS acquisition to mimic the loading conditions utilized in the preliminary analyses. Peptides were eluted using the “discontinuous” elution gradient. The column was re-equilibrated to either 5% or 2% ACN for the subsequent analysis. Each set of 2 runs was replicated 2 times.
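The timing of the “continuous” gradient can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: it takes the 2% ACN loading concentration and the 10% ACN gradient start from the text, uses the 0.6% ACN/min slope quoted above, and assumes the 5 min figure given later in the Results for the 2% to 10% ACN step of the discontinuous program. The function and constant names are hypothetical; the authoritative programs are those in Supplementary Table 1, which is not reproduced here.

```python
# Worked arithmetic for the early part of the two elution programs.
# All names are illustrative; values are taken from the text as noted.
LOADING_ACN = 2.0             # % ACN during loading (lowest condition tested)
GRADIENT_START_ACN = 10.0     # % ACN at which the main gradient begins
MAIN_SLOPE = 0.6              # % ACN per minute between 15 and 40 min
DISCONTINUOUS_STEP_MIN = 5.0  # min spent going from 2% to 10% ACN
                              # in the discontinuous program (from Results)


def ramp_time(start_acn, end_acn, slope):
    """Minutes needed to go from start_acn to end_acn at a constant slope."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return (end_acn - start_acn) / slope


# Continuous program: the 0.6 %/min slope is extended back to the loading
# concentration instead of stepping quickly up to 10% ACN.
continuous_ramp_min = ramp_time(LOADING_ACN, GRADIENT_START_ACN, MAIN_SLOPE)
extra_run_time_min = continuous_ramp_min - DISCONTINUOUS_STEP_MIN

print(f"2% -> 10% ACN at 0.6 %/min: {continuous_ramp_min:.1f} min")
print(f"added run time vs discontinuous: {extra_run_time_min:.1f} min")
```

Under these assumptions the shallow ramp takes about 13 min, roughly 8 min longer than the discontinuous step, which agrees with the run-time extension stated above.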
Figure 1. (A) Base peak elution profile for 4 successive 18 mix analyses loaded in (from top) 2, 3, 4, and 5% ACN. Boxed portion from 15 to 25 min is shown in detail in panel B. (B) Detail view of the start of each elution profile, denoted by arrows. (C) Elution profiles for 10 peptides common to each analysis, indicating little run-to-run retention time variation (m/z 532.40, 547.70, 589.90, 618.60, 685.90, 733.90, 804.80, 813.90, 818.00, 971.30).
All MS analyses were performed in positive ion mode. Data were collected in data-dependent mode with 5 data-dependent MS/MS scans per full MS scan (m/z 250-2000) in centroid mode. Data-dependent MS/MS scans were collected at 35% normalized collision energy with dynamic exclusion enabled. The dynamic exclusion parameters were as follows: mass width, m/z 3; repeat count, 1; repeat duration, 30 s; exclusion list size, 50; and exclusion duration, 180 s.
Data Processing and Analysis. Eighteen mix data were searched using SEQUEST13 against a custom Hemophilus influenzae database containing the 18 mix proteins as described previously.12 The yeast data were searched against the yeast.nci.20060720 database. Both data sets were searched with carbamidomethylated cysteines as a static modification. Peptide identification numbers were obtained by analysis with PeptideProphet and Trans-Proteomic Pipeline software14,15 employing a minimum PeptideProphet probability of 0.9 (FDR ≤ 1%). Peptide relative hydrophobicity was calculated via the Sequence Specific Retention Calculator version 3.0 (SSRCalc 3.0) for 100 Å sorbents16 (available online at http://hs2.proteome.ca/SSRCalc/SSRCalc.html). Variance was analyzed by one-way ANOVA for correlated samples with Tukey HSD test performed on significant F-values (available online at http://faculty.vassar.edu/lowry/VassarStats.html).
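The counting step described above (a probability cutoff followed by a tally of unique peptides per run) can be sketched in a few lines. This is a minimal illustration, not the Trans-Proteomic Pipeline itself: the `PSM` record, the run names, the peptide sequences, and the probabilities are all invented for the example, and only the 0.9 cutoff comes from the text.

```python
from collections import namedtuple

# Hypothetical, simplified peptide-spectrum match record. Real PeptideProphet
# output carries many more fields; these three are enough for the tally.
PSM = namedtuple("PSM", ["run", "peptide", "probability"])

MIN_PROBABILITY = 0.9  # cutoff stated in the text


def unique_peptides_per_run(psms, min_probability=MIN_PROBABILITY):
    """Return {run: set of peptide sequences} for PSMs passing the cutoff."""
    passing = {}
    for psm in psms:
        if psm.probability >= min_probability:
            passing.setdefault(psm.run, set()).add(psm.peptide)
    return passing


# Invented example data, for illustration only.
example_psms = [
    PSM("run01", "LVNELTEFAK", 0.99),
    PSM("run01", "LVNELTEFAK", 0.95),  # same peptide again: counted once
    PSM("run01", "AEFVEVTK", 0.42),    # below the cutoff: dropped
    PSM("run02", "AEFVEVTK", 0.93),
]

counts = {
    run: len(peptides)
    for run, peptides in unique_peptides_per_run(example_psms).items()
}
print(counts)
```

Using a set per run means repeated identifications of the same sequence are counted once, which is what "unique peptide identifications" requires.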
Results
Chromatographic Trends. Peptide elution began earlier as the ACN concentration during sample loading decreased. We compared the base peak elution profiles for ten sets of successive LC-MS/MS analyses in which the standard 18 protein mixture (18 mix) was loaded on a Magic C18Aq RP precolumn in 5%, 4%, 3%, and 2% ACN. The base peak elution profiles show that elution begins earlier with each drop in ACN concentration (Figure 1A, 1B). This trend was observed over all ten sets of runs (40 individual analyses) and was not a result of a general retention time shift. A plot of ten peptides observed in all analyses shows very little run-to-run variation (Figure 1C). The start of the peptide elution differed by as much as 3 min (18 mix) and 6 min (for yeast; data not shown) between loading ACN concentrations, whereas the variation in retention time for the ten monitored peptides was no greater than 40 s for either sample and not correlated to loading ACN concentration.
Figure 2. Average unique peptide identifications over ten 18 mix analyses increase with decreasing ACN loading concentration. All averages are statistically significant (*p < 0.01, ^p < 0.05; n = 10); error bars represent 95% confidence intervals. Brackets indicate the percent increase in peptide identifications between loading conditions.
Peptide Identification and Hydrophobicity Trends. Total unique peptide identifications increased and average peptide
hydrophobicity decreased as the loading ACN concentration decreased. The number of unique peptides identified in each 18 mix analysis was determined by searching the data with SEQUEST and applying a PeptideProphet probability cutoff ≥0.9 (FDR ≤ 1%). The number of unique peptides in each analysis was averaged across all ten runs for each loading condition. An increase in average unique peptide identifications was observed as the sample was loaded in successively decreasing ACN concentrations from 5% to 2% ACN, a trend analogous to that observed in base peak elution profiles. These data are represented in Figure 2 for 18 mix; numerical data are included in Supplementary Table 2 (Supporting Information). Each decrease in loading ACN concentration resulted in a significant increase in the average number of peptides identified from the 18 mix sample. Overall, a 26% increase in average peptide identifications resulted from loading in 2% ACN versus 5% ACN (p < 0.01, n = 10). On the same instrument, four replicates at each loading condition using the yeast sample gave similar results. The number of unique peptide identifications increased with each successive decrease in loading ACN concentration (from 5% ACN to 2% ACN: 1359 ± 88.92, 1382 ± 29.10, 1426 ± 36.39, 1477 ± 23.27; Supplementary Table 2, Supporting Information). As expected, the magnitude and significance of this trend were lower for the yeast sample owing to its greater complexity, which places a greater bulk of peptides midgradient relative to those eluting at the start of the gradient. For each consistently identified peptide (identified in at least six of ten 18 mix or three of four yeast analyses), we assigned a relative hydrophobicity (HP) score using the Sequence Specific Retention Calculator version 3.0 (SSRCalc 3.0) established by Krokhin et al.17-20 for 100 Å sorbents.
This commonly used model, constructed based on the retention times of 2700 tryptic peptides, calculates the effective HP/retention time of a peptide on a chromatographic column by summation of each amino acid’s retention coefficient and the inclusion of various correction factors. Amino acid retention coefficients in this model range from 11.0 (tryptophan), highly hydrophobic, to -1.9 (lysine), highly hydrophilic; as such, higher HP scores indicate greater overall hydrophobicity. The correction factors take into consideration such characteristics as protein length, the proximity of certain amino acids to the N- and C-termini, the amino acid distribution uniformity, pI, and missed cleavages, among others.17 The average HP score calculated at each loading condition for all consistently identified 18 mix peptides was found to increase approximately 0.78 (HP score) per percent ACN (R² = 0.98). The peptides eluting prior to 2000 s were found to be the primary contributors to the difference in average HP between each loading condition (increasing 0.73 per percent ACN; R² = 0.99), whereas there was little change in the average HP score of peptides eluting after 2000 s across loading conditions (0.09; R² = 0.78) (Figure 3).
Figure 3. Average HP score of all peptides (◆), peptides eluting before 2000 s (▲), and peptides eluting after 2000 s (■) identified in at least six of ten 18 mix runs at each loading condition. Best fit lines and slope (m) are indicated for each data set.
Similar results were obtained in the yeast analyses. The average HP score of all consistently identified peptides at each loading ACN concentration increased 0.41 per percent ACN (R² = 0.81). Again, peptides eluting prior to 2500 s were found to be the primary contributors to this trend, with HP scores increasing 0.39 per percent ACN (R² = 0.94). The average HP score of peptides eluting after 2500 s, on the other hand, changed less (-0.12; R² = 0.19). Furthermore, each decrease in loading ACN concentration corresponded to identification of new, highly hydrophilic peptides clustered at the start of the elution profile. We compared the peptides consistently identified between each successive loading condition (5% vs 4%, 4% vs 3%, 3% vs 2%, and 2% vs 5%). Figure 4 plots each 18 mix peptide as a function of retention time and HP score in the 2% versus 5% ACN comparison (for plots of each individual comparison, see Supplementary Figure 2, Supporting Information). In all comparisons, 18 mix peptides identified only at the lower loading ACN condition were found to be situated at the start of the elution profile with earlier retention times and more hydrophilic HP scores than the peptides shared between both analyses.
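The slopes and R² values quoted above describe best-fit lines of average HP score against loading ACN concentration. The sketch below shows how such a fit can be computed by ordinary least squares, on the assumption that this is the kind of fit behind the reported best-fit lines. The `mean_hp` values are placeholders chosen for illustration and are not the paper's data, and `linear_fit` is a hypothetical helper name.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Loading conditions from the study; HP values are PLACEHOLDERS, not real data.
acn_percent = [2.0, 3.0, 4.0, 5.0]
mean_hp = [24.0, 25.0, 25.5, 26.5]

slope, intercept, r_squared = linear_fit(acn_percent, mean_hp)
print(f"slope = {slope:.2f} HP units per % ACN, R^2 = {r_squared:.2f}")
```

A positive slope here corresponds to the trend reported in the text: the lower the loading ACN concentration, the lower (more hydrophilic) the average HP score of the identified peptides.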
Peptides identified only at the higher loading ACN condition were distributed throughout the elution profile and had average HP scores and retention times more similar to the peptides shared between both analyses, especially at the 5% and 4% loading conditions. Table 1 displays the average HP scores and retention times for peptides shared in each comparison and those identified only at the lower ACN concentration for all comparisons.
Figure 4. Plot of 18 mix peptides identified in at least 6 of 10 analyses for the 5% and 2% ACN loading conditions. Peptides new at 2% ACN (■, pink) are clustered at the beginning of the gradient compared to peptides shared in both analyses (●, black) or only present at 5% ACN (▲, blue).
Table 1. Average HP Scores and Retention Times for Consistently Identified Peptides from the Lower ACN Concentration and from Both Analyses for Each Comparison of 18 Mix and Yeast Loading Conditions
              average HP score                        average RT (s)
comparison    new peptide IDs   shared peptide IDs    new peptide IDs   shared peptide IDs
              (lower [ACN])                           (lower [ACN])
18 mix
5% vs 4%      22.45             28.93                 1516.2            1873.2
4% vs 3%      20.24             27.76                 1455.6            1829.5
3% vs 2%      18.62             27.07                 1386.2            1795.8
5% vs 2%      19.56             28.92                 1401.1            1877.2
Yeast
5% vs 4%      30.19             33.14                 1991.2            2202.1
4% vs 3%      28.62             33.03                 1911.9            2178.1
3% vs 2%      26.99             32.83                 1854.8            2175.2
5% vs 2%      27.46             33.46                 1863.4            2233.9
The same analysis was carried out for the consistently identified peptides in the yeast runs at each loading condition
(not including the continuous gradient analysis at 2% ACN). This analysis further corroborated the results reported thus far (Table 1). Plots of each comparison are available in the supplementary data (Supplementary Figure 3, Supporting Information). The average length and amino acid content (classified as basic, acidic, polar, or nonpolar residues) of the peptides consistently identified only at the higher loading ACN condition were compared to those identified by both analyses and the lower ACN condition combined for each analysis set. For both data sets, neither peptide length nor amino acid content was significantly different for the peptides not identified in each comparison by loading at the lower ACN concentration (data not shown). On average, peptides in both data sets (yeast, 18 mix) and all categories (5% vs 4%, etc.) were 15 amino acids long and contained ∼51% nonpolar, ∼23% polar, ∼17% acidic, and ∼9% basic residues. This suggests that the peptides missed when loading at lower ACN concentrations are not of a specific class and that lower ACN loading concentrations do not lead to experimental bias against any particular peptide class.
Atlantis Resin Analyses. We employed the Waters Atlantis dC18 resin to determine the generalizability of our observations. The Atlantis resin was chosen because of the apparent dependence of the observed trends on hydrophilic peptide retention, and the manufacturer’s claim of enhanced polar analyte retention over other C18 resins (http://www.waters.com/waters/nav.htm?&cid=513211). Using the same HPLC and MS setup, we repeated the yeast experiment described above for each loading condition using a 1:3 dilution of the same yeast digest. Note that due to sample dilution, this experiment and the previously described experiment using the Magic resin are not comparable on absolute terms; however, a relative comparison of the trend is possible. None of the trends observed in elution profile start time, unique peptide identifications, or hydrophobicity were present using the Atlantis resin. However, the trends remained intact for the Magic C18Aq resin: two sets of additional analyses immediately following the Atlantis experiment under the same instrumental conditions again demonstrated the loss of hydrophilic peptides during loading at 5% ACN compared to 2% ACN (Figure 5). The performance differences between the two resins are likely attributable to the unique physical characteristics of each stationary phase and substantiate the manufacturer’s claim regarding the retention of polar analytes.
Figure 5. Average unique peptide identifications over four yeast digest analyses using Waters Atlantis dC18 resin (dark gray) show no statistically significant change with decreasing ACN loading concentration. Error bars represent 95% confidence intervals. Two analyses of the same yeast mixture performed on Magic resin immediately following the Atlantis experiment indicated an intact trend (light gray).
Continuous Gradient Analyses. The use of a continuous LC gradient increased MS sampling and hydrophilic peptide identifications using both resins. For both resins, the yeast digest was also analyzed using a method with a continuous elution gradient and 2% ACN loading buffer. As mentioned previously, the continuous elution gradient lengthens the time for peptide elution between 2% and 10% ACN to 13 min (compared to 5 min in the discontinuous gradient method). We hypothesized that the shallower initial ACN gradient would allow for better separation and reduce MS under-sampling of the hydrophilic peptides eluting at the start of the ACN gradient, the crucial portion of the gradient for maximizing peptide identifications when using the Magic resin.
The continuous gradient method netted a 5% (Magic resin) and 4% (Atlantis resin) increase in the number of total peptide identifications compared to the 2% ACN loading condition with the discontinuous gradient (see Supplementary Table 2, Supporting Information). Concordantly, the average HP of all peptides identified decreased and hydrophilic peptides (HP scores