Anal. Chem. 2010, 82, 9678–9685

Predicting Retention Time Shifts Associated with Variation of the Gradient Slope in Peptide RP-HPLC

Vic Spicer,†,‡ Marine Grigoryan,† Alexander Gotfrid,† Kenneth G. Standing,†,‡ and Oleg V. Krokhin*,‡,§

Department of Physics and Astronomy, University of Manitoba, Winnipeg, R3T 2N2, Canada, and Manitoba Centre for Proteomics and Systems Biology and Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, R3E 3P4, Canada

We have developed a sequence-specific model for predicting the slopes (S) in the fundamental equation of linear-solvent-strength theory for the reversed-phase HPLC separation of tryptic peptides detected in a typical bottom-up proteomics experiment. These slopes control the variation in separation selectivity that is observed when physical parameters of the chromatographic separation are altered, such as gradient slope, flow rate and column size. For example, take an arbitrarily chosen set of tryptic peptides separated in two experiments whose gradient slopes differ 4-fold. The R² of the correlation between the observed retention times of identical species decreases to ∼0.993–0.996, compared with a theoretical value of ∼1.00. The approach described here predicts a priori the retention time shifts associated with such variations of the gradient slope. The proposed model builds on our findings for a set of synthetic peptides (Vu, H.; Spicer, V.; Gotfrid, A.; Krokhin, O. V. J. Chromatogr., A 2010, 1217, 489–497). Those findings suggest that slopes S can be predicted by taking peptide length, charge and hydrophobicity into account simultaneously. Here we extend this approach to an extensive set of real tryptic peptides. We developed a procedure to measure S-values accurately in nano-RP-HPLC-MS experiments. We also introduced sequence-specific corrections that improve the prediction of the slopes S.
The predicted and experimental S-values correlated with R² ∼ 0.95. Predicting S-values allows the expected retention time shifts to be calculated when physical separation parameters, such as the gradient slope, are altered. This will:
- facilitate more accurate application of peptide retention prediction protocols;
- aid the transfer of scheduled MRM (SRM) procedures between LC systems;
- increase the efficiency of interlaboratory data collection and comparison.

* Corresponding author. Phone: (204) 789 3283. Fax: (204) 480 1362.
† Department of Physics and Astronomy, University of Manitoba.
‡ Manitoba Centre for Proteomics and Systems Biology.
§ Department of Internal Medicine, University of Manitoba.

Chromatography has become an integral part of modern proteomic applications because analyzed protein samples are growing more complex. This is particularly true for bottom-up approaches, where thousands (if not millions) of peptides must be separated.1 Reversed-phase (RP)2 and strong cation-exchange (SCX)3 HPLC are widely used to fractionate peptides before mass-spectrometric analysis.

Both HPLC and mass spectrometry can be considered separation techniques. Each exploits different properties of the analytes to provide qualitative and quantitative information on the samples. The output of each can be viewed as a spectrum, that is, a signal as a function of either elapsed time or m/z. However, modern mass spectrometry clearly exceeds LC in both resolving power and accuracy for peptide analysis. Mass spectrometry is a well-established technique in which the expected physical parameter (m/z) can be calculated very precisely and compared with experimental data. Chromatographic retention times, on the other hand, reflect the affinity of peptides for the RP or SCX stationary phase, and calculating these affinities precisely has proven very complicated.

So far, such attempts have been limited mostly to RP-HPLC, where retention time is assumed to correlate with peptide hydrophobicity. In the early 1980s it was postulated that peptide hydrophobicity could be calculated as the sum of the hydrophobicities of the constituent amino acid residues.4 Several similar models were developed,4–6 some of which introduced correction factors for peptide length. These simple additive approaches remained state-of-the-art until around 2004, despite compelling evidence that peptide retention in RP-HPLC also has sequence-dependent features.7 Since then, several research groups have used data from proteomic measurements to develop peptide retention prediction models.8–13 The typical additive models reached correlations of ∼0.90 between experimental and predicted retention times, while the best sequence-specific algorithms reached ∼0.97–0.98.8,9 Following earlier studies,14 we realized that the model should be specific to the "chemical" features of the separation system, namely the ion-pairing modifier and the type of stationary phase. We therefore developed different versions of our sequence-specific retention calculator (SSRCalc) for various eluent/sorbent combinations.15,16

(1) Lambert, J. P.; Ethier, M.; Smith, J. C.; Figeys, D. Anal. Chem. 2005, 77, 3771–3787.
(2) Sandra, K.; Moshir, M.; D'Hondt, F.; Verleysen, K.; Kas, K.; Sandra, P. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2008, 866, 48–63.
(3) Washburn, M. P.; Walters, D.; Yates, J. R. Nat. Biotechnol. 2001, 19, 242–247.
(4) Meek, J. L. Proc. Natl. Acad. Sci. U.S.A. 1980, 77, 1632–1636.
(5) Guo, D.; Mant, C. T.; Taneja, A. K.; Parker, J. M. R.; Hodges, R. S. J. Chromatogr. 1986, 359, 499–517.
(6) Mant, C. T.; Burke, T. W. L.; Black, J. A.; Hodges, R. S. J. Chromatogr. 1988, 458, 193–205.
(7) Houghten, R. A.; DeGraw, S. T. J. Chromatogr. 1987, 386, 223–228.
(8) Krokhin, O. V. Anal. Chem. 2006, 78, 7785–7795.
(9) Petritis, K.; Kangas, L. J.; Yan, B.; Monroe, M. E.; Strittmatter, E. F.; Qian, W. J.; Adkins, J. N.; Moore, R. J.; Xu, Y.; Lipton, M. S.; Camp, D. G., 2nd; Smith, R. D. Anal. Chem. 2006, 78, 5026–5039.

Analytical Chemistry, Vol. 82, No. 23, December 1, 2010. DOI: 10.1021/ac102228a. © 2010 American Chemical Society. Published on Web 11/04/2010.

Despite these successes in modeling peptide retention in RP-HPLC, some fundamental questions remain unanswered. All retention prediction algorithms created to date were optimized for a specific set of chromatographic conditions: sorbent type, ion-pairing modifier, column size, flow rate and gradient slope. The model obviously has to be modified when either of the first two ("chemical") parameters changes. Setting that aside, one can ask what happens to RP-HPLC separation selectivity when the last three ("physical") parameters are altered.

A theoretical answer was given more than 25 years ago, when the linear-solvent-strength (LSS) theory of RP-HPLC separation of peptides and proteins was developed.17 Snyder and co-workers showed that separation selectivity depends on the value of the slope S in the basic LSS equation:

$$\log k = \log k_0 - S\varphi \qquad (1)$$

Here $k$ is the retention factor at an organic-solvent volume fraction $\varphi$ ($\varphi$ = ACN%/100), and $k_0$ is the retention factor at $\varphi = 0$. Later, the same group showed examples in which peptide separation selectivity (retention order) varied with the slope of the acetonitrile gradient.18 They originally postulated that the slope depends on the molecular weight of the analyte, $S = 0.48\,(\mathrm{MW})^{0.44}$, for a set of polypeptides ranging from 600 to 14 000 Da.17 In later studies, Hearn and co-workers tested this rule on sets of peptides with a much narrower MW range and tried to establish how peptide hydrophobicity affects S-values.19–21 They concluded that there is no direct correlation between slope and molecular weight. Instead, S is determined by the "magnitude of hydrophobic contact area and the number of interaction sites".21

To date, therefore, no quantitative model has been developed for predicting S for peptidic compounds. From the point of view of LSS theory they are usually classed as "irregular compounds",22 that is, analytes whose S, and hence separation selectivity, varies significantly and "nonpredictably". We have a solid qualitative understanding of how gradient slope or flow rate alters peptide separation selectivity.22,23 What is still missing is a quantitative description of the parameters that affect the slopes S.

(10) Shinoda, K.; Sugimoto, M.; Yachie, N.; Sugiyama, N.; Masuda, T.; Robert, M.; Soga, T.; Tomita, M. J. Proteome Res. 2006, 5, 3312–3317.
(11) Gorshkov, A. V.; Tarasova, I. A.; Evreinov, V. V.; Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A.; Gorshkov, M. V. Anal. Chem. 2006, 78, 7770–7777.
(12) Klammer, A. A.; Yi, X.; Maccoss, M. J.; Noble, W. S. Anal. Chem. 2007, 79, 6111–6118.
(13) Gilar, M.; Jaworski, A.; Olivova, P.; Gebler, J. C. Rapid Commun. Mass Spectrom. 2007, 21, 2813–2821.
(14) Guo, D. C.; Mant, C. T.; Hodges, R. S. J. Chromatogr. 1987, 386, 205–222.
(15) Spicer, V.; Yamchuk, A.; Cortens, J.; Sousa, S.; Ens, W.; Standing, K. G.; Wilkins, J. A.; Krokhin, O. V. Anal. Chem. 2007, 79, 8762–8768.
(16) Dwivedi, R. C.; Spicer, V.; Harder, M.; Antonovici, M.; Ens, W.; Standing, K. G.; Wilkins, J. A.; Krokhin, O. V. Anal. Chem. 2008, 80, 7036–7042.
(17) Stadalius, M. A.; Gold, H. S.; Snyder, L. R. J. Chromatogr. 1984, 296, 31–59.
(18) Glajch, J. L.; Quarry, M. A.; Vasta, J. F.; Snyder, L. R. Anal. Chem. 1986, 58, 280.
(19) Aguilar, M. I.; Hodder, A. N.; Hearn, M. T. W. J. Chromatogr. 1985, 327, 115–138.
(20) Hearn, M. T. W.; Aguilar, M. I. J. Chromatogr. 1986, 359, 31.
(21) Hearn, M. T. W.; Aguilar, M. I. J. Chromatogr. 1987, 392, 33.

We recently found that one missing piece in understanding how the slopes S vary is the charge of the peptide.24 We assumed that S is controlled simultaneously by peptide length, hydrophobicity and charge. On that basis we designed and synthesized a set of 37 peptides and precisely measured their S-values on 100 Å C18 sorbent with 0.1% trifluoroacetic acid as the ion-pairing modifier. The compositional design of the sequences let us monitor the effect of one parameter at a time while keeping the other two constant. The results unequivocally showed that S increases with peptide charge and length; the influence of hydrophobicity is more complex. After these measurements, we optimized a simple model that predicts S from only three variables.24 Measured and predicted slopes correlated with R² ∼ 0.97, supporting our hypothesis. This result was obtained for a set of closely related synthetic peptides designed to represent the typical tryptic species observed in bottom-up proteomics experiments. Undoubtedly, any "real-life" set of tryptic peptides will vary more widely in physical properties and sequence-derived features.
We therefore expected that the sequence-specific factors affecting the slopes S would resemble those affecting overall peptide hydrophobicity in our SSRCalc models.8 This paper describes:
- an approach for measuring S-values for a diverse set of tryptic species in a typical nano-RP-HPLC-MS proteomic setup;
- the further development of a sequence-specific slope calculator (SSSCalc) model.

We demonstrate its use for fine readjustment of retention times in LC-MS/MS analyses run with different slopes of the water/acetonitrile gradient.

MATERIALS AND METHODS

The experimental procedures are detailed in the Supporting Information. Three different tryptic protein digests were used:
- the "test peptide mixture", a digest of human proteins;
- the "model peptide mixture", a digest of bovine proteins;
- peptides generated from a whole-cell lysate of Clostridium thermocellum.

Before nano-LC-MS/MS analysis, the mixtures were diluted with buffer A (0.1% formic acid in water). The test mixture was spiked with the six standard peptides P1–P6 (ref 25). The model mixture was spiked with a set of 11 model peptides (described elsewhere24). All dilutions were chosen so that ∼100 fmol of each component was injected into the nano-RP-HPLC-MS system. The 11 model peptides used to determine slopes S in microflow (150 µL/min) isocratic elution mode were custom synthesized by BioSynthesis Inc. (Lewisville, TX). Table 1 lists these peptides with their core properties: molecular weight, charge, length, SSRCalc hydrophobicity and measured S-values.

(22) Snyder, L. R.; Dolan, J. W. High-Performance Gradient Elution: The Practical Application of the Linear-Solvent-Strength Model; Wiley: New York, 2006.
(23) Gilar, M.; Xie, H.; Jaworski, A. Anal. Chem. 2010, 82, 265–275.
(24) Vu, H.; Spicer, V.; Gotfrid, A.; Krokhin, O. V. J. Chromatogr., A 2010, 1217, 489–497.
(25) Krokhin, O. V.; Spicer, V. Anal. Chem. 2009, 81, 9522–9530.


Table 1. Synthetic "S-Calibrating" Peptides

Index      Sequence (charge, length)     MW (Da)    HIᵇ      Slope Sᶜ
1 (P2ᵃ)    LGGGGGGDFR (+2, 10)           891.42     6.03     28.2
2 (P3ᵃ)    LLGGGGDFR (+2, 9)             890.46     8.81     24.76
3 (P4ᵃ)    LLLGGDFR (+2, 8)              889.50     13.33    21.46
4 (P5ᵃ)    LLLLDFR (+2, 7)               888.54     19.46    21.78
5 (P6ᵃ)    LLLLLDFR (+2, 8)              1001.63    22.44    22.76
6          LASAADFR (+2, 8)              849.46     6.47     27.07
7          LASAAHFR (+2, 8)              871.47     4.02     35.59
8          LLSLADFG (+1, 8)              834.45     16.67    19.2
9          LAGGGSASSSADAAAFR (+2, 17)    1494.71    8.71     34.8
10         LLGGSLSSLHAAFR (+3, 14)       1427.79    15.11    33.81
11         LAGGGSASSSAHAAAFR (+3, 17)    1516.74    5.08     44.66

ᵃ Members of the P1–P6 standard peptide mixture for the "hydrophobicity calibration" of RP-HPLC systems.24
ᵇ Calculated hydrophobicity index, HI = 0.4954 × H − 2.6687, where H is the peptide hydrophobicity calculated using the 100A-FA version of SSRCalc (http://hs2.proteome.ca/SSRCalc/SSRCalc33B.html).25
ᶜ Slope values were determined with 0.1% formic acid as the ion-pairing modifier; trifluoroacetic acid gives different results.24

Figure 1. Variation in peptide separation selectivity with alteration of the gradient slope. (a, b) Total ion chromatograms of a test peptide mixture (tryptic digest of human proteins) using two different gradients: 0.75 and 0.1875% acetonitrile per minute. The retention times of NECFLQHKDDNPNLPR and VATVSLPR are shown. (c) Schematic representation of the retention behavior of two peptides with different S-values at isocratic and gradient conditions.

Both chromatographic systems used the same combination of mobile and stationary phases: nanoflow for the ESI-MS/MS experiments and microflow for measuring the S-values of the model peptides. The mobile phase was water/acetonitrile with 0.1% formic acid as the ion-pairing modifier. The stationary phase was the 100 Å C18 sorbent Luna C18(2) (Phenomenex, Torrance, CA).

RESULTS AND DISCUSSION

Variations in Separation Selectivity Caused by the Slope of the Acetonitrile Gradient. Figure 1a,b shows two total-ion-count chromatograms of the same test peptide mixture at two gradient slopes, 0.75 and 0.1875% acetonitrile per minute. A total of 252 tryptic peptides from the human proteins of the test mixture were confidently identified in these two runs, which is typical of a nano-RP-LC-MS run of moderate complexity. Figure 1 highlights one example of retention-order reversal. With the steeper gradient (Figure 1a), NECFLQHKDDNPNLPR (human albumin) elutes before VATVSLPR (porcine trypsin); with the shallower gradient the order is reversed (Figure 1b).

It seems paradoxical that a peptide's affinity for the RP phase changes with the gradient,18,23 but LSS theory explains it. As shown schematically in Figure 1c, the larger peptide NECFLQHKDDNPNLPR has a larger slope S in the basic LSS equation than the shorter VATVSLPR.
- In isocratic elution with an acetonitrile concentration below the intersection point $\varphi_I$, the peptide with the lower S-value elutes first. Above $\varphi_I$ the retention order is reversed.
- Under gradient conditions, a shallower gradient means that most of the separation of the two species occurs at $\varphi < \varphi_I$. This favors relatively low retention of the peptide with the smaller S (VATVSLPR).
- A steeper gradient reverses the situation. Most of the separation happens at $\varphi > \varphi_I$, so the peptide with the higher S (NECFLQHKDDNPNLPR) is retained less.

Note that the original assumption of Snyder and co-workers correctly predicts the relative change in retention of these two species: the heavier NECFLQHKDDNPNLPR should indeed have a higher S. In general, however, this rule does not hold. As we show later, a model based solely on the MW of the separated species cannot predict S accurately.

Figure 1 shows a reversal in separation selectivity. For this to happen, two peptides must have similar hydrophobicities but substantially different S-values. In many cases, however, the elution order stays the same and only the relative retention changes. The test mixture gives 252 peptides, or 251 near-neighbor pairs (Supporting Information). Changing the gradient slope from 0.1875% to 0.375% per minute reversed the retention order of 91 of these pairs, and changing it to 0.75% per minute reversed 110.
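Eq 1 makes the isocratic part of this argument concrete. The sketch below is a minimal illustration, not the authors' procedure. It computes the intersection point φ_I of two log k vs φ lines and reports which peptide is retained less at a given φ. The log k0 and S values are hypothetical: they were chosen only so that the lines cross, and are not measured parameters for NECFLQHKDDNPNLPR or VATVSLPR.

```python
def log_k(log_k0, s, phi):
    """Eq 1 of LSS theory: log k = log k0 - S * phi, with phi = ACN% / 100."""
    return log_k0 - s * phi


def crossing_point(log_k0_a, s_a, log_k0_b, s_b):
    """Volume fraction phi_I at which the two log k vs phi lines intersect."""
    if s_a == s_b:
        raise ValueError("equal slopes: the lines are parallel and never cross")
    return (log_k0_b - log_k0_a) / (s_b - s_a)


def first_to_elute(pep_a, pep_b, phi):
    """Name of the less retained peptide (smaller k) at a fixed phi."""
    name_a, log_k0_a, s_a = pep_a
    name_b, log_k0_b, s_b = pep_b
    if log_k(log_k0_a, s_a, phi) < log_k(log_k0_b, s_b, phi):
        return name_a
    return name_b


# Hypothetical (name, log k0, S) for a low-S and a high-S peptide.
low_s = ("low-S peptide", 6.0, 20.0)
high_s = ("high-S peptide", 9.0, 35.0)

phi_i = crossing_point(low_s[1], low_s[2], high_s[1], high_s[2])
print(f"phi_I = {phi_i:.3f}")                       # phi_I = 0.200
print(first_to_elute(low_s, high_s, phi_i - 0.05))  # low-S peptide
print(first_to_elute(low_s, high_s, phi_i + 0.05))  # high-S peptide
```

Below φ_I the steeper (high-S) line lies above the low-S line, so the low-S peptide elutes first. Above φ_I the order reverses, as stated in the text.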
Retention times recorded at different gradient slopes would be expected to correlate almost perfectly (R² above 0.999), but this variation in separation selectivity causes deviations. A prediction model developed for one gradient slope is still useful for data acquired with other gradients, but its error grows as the difference in gradient slope grows. In our case, 2-fold and 4-fold increases in the gradient slope give R² values of 0.998 and 0.993 for the tR vs tR correlations of the 252 observed peptides (Figure 2; Supporting Information).

If such variations cannot be controlled or corrected for, proteomic procedures that use retention time in data acquisition or analysis become less efficient. Examples include developing scheduled MRM (SRM) protocols and transferring them between RP-HPLC systems, and filtering false-positive MS/MS identifications with peptide retention prediction.26

The major motivation of this study was to find rules describing how S-values vary for the diverse tryptic species normally observed in bottom-up proteomic experiments. The Figure 1 and 2 data for the "test mixture" were obtained in December 2008. Ten months later, we measured experimental S-values for an independent set of "model peptides" and used them to develop a predictive model for slope values. The ultimate test of this model is whether it improves the correlations for the independent data set in Figure 2.

(26) Strittmatter, E. F.; Kangas, L. J.; Petritis, K.; Mottaz, H. M.; Anderson, G. A.; Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Smith, R. D. J. Proteome Res. 2004, 3, 760–769.

Figure 2. The effect of the gradient slope on the separation selectivity for a large set of peptides: tR vs tR correlations where the gradient slopes differ by 2-times and 4-times.
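The tR vs tR plots are summarized by R² values. The sketch below shows how such a value can be computed from paired retention times. It assumes (our assumption) that the R² quoted here is the coefficient of determination of a straight-line fit, which for a single predictor equals the squared Pearson correlation. The retention times in the example are made-up numbers, not data from Figure 2.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation of paired values (= R^2 of a linear fit)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up retention times (min) for five peptides at a shallow and a steep gradient.
t_shallow = [20.1, 35.4, 48.0, 61.2, 80.5]
t_steep = [12.3, 16.0, 19.5, 22.8, 27.9]
print(round(r_squared(t_shallow, t_steep), 4))
```

Because R² does not depend on the scale or offset of either axis, it isolates changes in relative retention (selectivity) from the overall stretching of the time axis that a different gradient slope produces.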

Effect of S-Values on Peptide Retention (Theoretical Considerations): Measuring S-Values for Peptides. The widely accepted theoretical description of the retention behavior of peptidic compounds is based on LSS theory as detailed by Stadalius et al.17 The retention time of a peptide under gradient elution conditions is given as

$$t_g = \frac{t_G}{S\,\Delta\varphi}\,\log\!\left(2.3\,k_0 t_0\,\frac{S\,\Delta\varphi}{t_G} + 1\right) + t_0 + t_D \qquad (2)$$

Here $t_0$ is the column dead time, $t_D$ is the dwell time of the gradient system, and $t_G$ is the gradient time for a gradient spanning $\Delta\varphi$. Eq 2 is important for describing peptide behavior in RP-HPLC theoretically and for determining S and $k_0$ from gradient data.17,19–22 Its practical use for calculating peptide retention times, however, is limited. It requires precise measurements of the system parameters $t_0$ and $t_D$, and knowledge of S and $k_0$ for each peptide, which is very rarely available.

Eq 2 has not been used until now to calculate peptide separation selectivity in proteomics, for two reasons:
- there are no accurate models for predicting S and $k_0$ for peptides;
- measuring $t_0$ and $t_D$ for nanoflow systems is very complicated and suffers from the low reproducibility of nanoflow gradients.

These difficulties do not, however, prevent excellent tR vs tR correlations when identical peptide mixtures are separated under similar conditions. Instead, eq 2 is often used for the reverse task: estimating S or $k_0$ from experimental retention times measured under different gradient conditions. Snyder's and Hearn's groups used this approach to determine S and $k_0$ for a number of proteins and peptides,17,19–21 and more recently Shinoda et al. applied it in proteomic experiments.27 Solving this equation for several LC conditions (gradients, flow rates), however, usually requires numerical multiparameter fitting and can introduce significant errors. Ford and Ko investigated the uncertainty of these measurements in detail.28

In our recent work24 on the S-values of a set of synthetic peptides we therefore used isocratic elution, even though it is extremely labor-intensive. For each peptide, retention times are determined at several constant acetonitrile concentrations and plotted according to eq 1. Our plots of log k vs $\varphi$ showed very high correlations (0.995–0.999), leaving little ambiguity in the slopes S.24 In the present work we made the same measurements, with formic acid as the ion-pairing modifier, for the synthetic peptides in Table 1.

(27) Shinoda, K.; Tomita, M.; Ishihama, Y. Bioinformatics 2008, 24, 1590–1595.
(28) Ford, J. C.; Ko, J. J. Chromatogr., A 1996, 727, 1–11.

Measuring S-Values for Tryptic Digests in Nano-RP-HPLC Systems. We combined the isocratic and gradient methods of measuring S to obtain a highly accurate but rapid way of determining S for extensive sets of peptides. The procedure has six steps.

First, S is measured precisely under isocratic microflow conditions with UV detection for the set of synthetic "S-calibrating" peptides. The P1–P6 peptide mixture described previously was designed to cover a wide range of hydrophobicities;25 in the same way, these peptides were chosen to cover a wide range of S-values (Table 1).

Second, a tryptic digest of the bovine protein mixture (model mixture) is spiked with the "S-calibrating" peptides and run in nanoflow RP-HPLC-MS at two different gradient slopes (0.75 and 0.1875% acetonitrile per minute in our case).

Third, retention times are assigned for all identified species.

Fourth, the retention time shifts relative to P3 are determined in acetonitrile-percentage (ACN%) units as

$$\Delta = \left(t_R^{0.75} - t_{R,\mathrm{P3}}^{0.75}\right)\times 0.75 - \left(t_R^{0.1875} - t_{R,\mathrm{P3}}^{0.1875}\right)\times 0.1875 \qquad (3)$$

Here $t_R^{0.75}$ and $t_{R,\mathrm{P3}}^{0.75}$ are the retention times of a given peptide and of the reference peptide P3 with the 0.75% per minute gradient. $t_R^{0.1875}$ and $t_{R,\mathrm{P3}}^{0.1875}$ are the corresponding retention times with the 0.1875% per minute gradient.

Fifth, an experimental Δ vs S curve is plotted for the S-calibrating peptides (Figure 3).

Sixth, S-values are extracted from this dependence for all peptides observed in both nano-RP-HPLC-MS runs by extrapolating their experimental Δ on this plot.

Figure 3. Calibrating the nano-RP-HPLC system on the S-scale. ●, experimental Δ vs S dependence for the 11 "S-calibrating" peptides (approximated with a logarithmic function); ○, best-fit reciprocal function Δ = 60.21/S − 2.43 (peptides are labeled according to Table 1).

The procedure assumes that S-values are identical in micro- and nanoflow systems that use the same mobile and stationary phases. Δ expresses how relative retention, in acetonitrile percentage, shifts on transfer from a shallow (0.1875% per minute) to a steep (0.75% per minute) gradient. As shown earlier, a steeper gradient shifts relative retention negatively for peptides with larger S and positively (positive Δ) for peptides with lower S. The data in Figure 3 were normalized to peptide P3 (S = 24.76). Negative Δ values are therefore characteristic of peptides with S > 24.76 (see eq 3), and positive Δ values of peptides with S < 24.76, as Figure 3 shows. S and Δ are related by a reciprocal function derived from eq 2 (detailed in Appendix 3 of the Supporting Information):

$$\Delta = \frac{100\,\log(G_0/G_1)}{S} + A \qquad (4)$$


Here $G_0$ and $G_1$ are the gradient slopes ($G_0/G_1 = 4$ in our case). A is a constant that depends on the system parameters $t_0$ and $t_D$ and on the observed retention times of the reference peptide (here P3) at gradient slopes $G_0$ and $G_1$. For our 4-fold gradient-slope ratio, the numerator 100 log(4) reduces to the constant 60.21.

We fitted eq 4 to the observed Δ and measured S-values of the 11 S-calibrating peptides. The optimum A = −2.43 gave R² ∼ 0.98, but the function deviated significantly from the data for S-values > 35 and gave a residual sum of squares (RSS) of 0.32. A natural-log fit to the same data gave a slightly lower R² ∼ 0.97 but fitted all data points much more closely (RSS of 0.15). We therefore used the logarithmic form to determine slope values: $\Delta = -2.68\ln S + 8.72$, with $S = 25.85\,\exp(-0.3619\,\Delta)$ used to convert Δ to S.

With this procedure and the S-calibrating peptides, we determined experimental S-values for all 298 species detected in the model peptide mixture (provided in the Supporting Information). These values range from 18.2 to 54. The S-values of the calibrating peptides in Table 1 range from 19.2 to 44.66, so they cover the S-scale well. We used a 4-fold difference in gradient slope to obtain S-values, but other ratios can be used, provided the same S-calibrating peptides are used.

Parameters Affecting Slope Values for Peptidic Compounds: Sequence-Specific Slope Calculator (SSSCalc) Model. The peptides with the lowest S-values are all short, relatively hydrophobic species. Each carries two charged groups, the lowest number possible for a tryptic peptide: DLLFK (18.2), DLLFR (18.4), FCLFK (21.7), DSALGFLR (21.8) and EDLIWK (21.9).
This agrees with our previous finding that, for short peptides, S increases with peptide length (N) and charge (Z) and decreases with hydrophobicity index (HI).24 The highest S-values occur for two groups: long peptides carrying multiple positively charged groups at acidic pH, and hydrophilic species. The five analytes with the highest S-values include two of the former, DGTRKPVTDAENCHLAR (50.4) and KPVTEAQSCHLAVAPNHAVVSR (49.5). The other three belong to the latter group: GEGENQCACSSR (54.1), ARPATATVGQK (51.7) and VTGENDKYR (49.0).
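The S-values quoted above were obtained from measured Δ through the calibration described earlier. The sketch below combines the three pieces:
- Δ from eq 3;
- the reciprocal form of eq 4 with the fitted constants 60.21 and −2.43;
- the logarithmic conversion S = 25.85 exp(−0.3619Δ).

The function names and the retention times in the example are hypothetical; only the constants come from the text. We add one derivation of our own, which is not the authors' Appendix 3 derivation. In the large-k0 limit of eq 2, the t0 and tD terms cancel in eq 3 and A reduces to −100 log(G0/G1)/S_P3. With S_P3 = 24.76 this gives about −2.43, close to the fitted value.

```python
import math

G_STEEP, G_SHALLOW = 0.75, 0.1875  # gradient slopes used in the text (% ACN per min)


def delta_shift(t_pep_steep, t_ref_steep, t_pep_shallow, t_ref_shallow,
                g_steep=G_STEEP, g_shallow=G_SHALLOW):
    """Eq 3: retention shift (ACN%) of a peptide relative to reference P3."""
    return ((t_pep_steep - t_ref_steep) * g_steep
            - (t_pep_shallow - t_ref_shallow) * g_shallow)


def delta_reciprocal(s, numerator=60.21, a=-2.43):
    """Eq 4 with the fitted constants quoted in the text: Delta = 60.21/S - 2.43."""
    return numerator / s + a


def slope_from_delta(delta):
    """Logarithmic calibration quoted in the text: S = 25.85 * exp(-0.3619 * Delta)."""
    return 25.85 * math.exp(-0.3619 * delta)


def idealized_a(g0, g1, s_ref):
    """Our assumption (large-k0 limit of eq 2): A = -100 * log10(G0/G1) / S_ref."""
    return -100.0 * math.log10(g0 / g1) / s_ref


# Hypothetical retention times (min): peptide vs P3 in the steep and shallow runs.
delta = delta_shift(30.0, 31.0, 70.0, 72.0)
print(delta)                                              # -0.375
print(round(slope_from_delta(delta), 2))                  # negative Delta -> S above 25.85
print(round(idealized_a(G_STEEP, G_SHALLOW, 24.76), 2))   # -2.43
```

Because only differences from P3 enter eq 3, the hard-to-measure t0 and tD of the nanoflow system drop out. This is what makes the gradient-based step practical.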

Figure 4. Predicting S-values for the model peptide mixture using various models: (a) the approach of Stadalius et al.;17 (b) our earlier model based on peptide charge, length, and hydrophobicity;24 (c) the sequence-specific model; (d) corrected tR vs tR correlations for the test peptide mixture at gradient slopes differing 2- and 4-fold (compare with Figure 2).

We first tested the original assumption of Stadalius et al. that S-values can be described as a function of molecular weight, $S = a(\mathrm{MW})^b$.17 Figure 4a shows the best-fit correlation, $S = 3.9(\mathrm{MW})^{0.3}$. Its R² of 0.268 clearly shows that this approach does not apply here. The direct relationship between S and MW was originally found for a limited group of molecules spanning a very wide mass range (600–14 000 Da). For a random set of peptides, molecule length, the number of positively charged groups and the number of hydrophobic contact sites can be expected to increase with molecular weight. Typical tryptic peptides, however, form extended sets of molecules within a rather narrow molecular weight range. For such sets, an increase in molecular weight need not coincide with an increase in N or Z. In other words, adding one residue to a short peptide chain can change the properties of the molecule, including its S-value, much more profoundly and often unexpectedly.

We then applied our previously described NZHI model, in which S is a function of Z, N and HI with a range of power, reciprocal and cross-term coefficients. These coefficients were optimized against the 298 observed peptide slope values using the random walk through parameter space described elsewhere.24 The best fit gives R² = 0.874 for the following equation (Figure 4b):

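$$S_{\mathrm{NZHI}} = -66.8\,Z^{-3.7906} + 19.5332\,N^{0.354} - 36.0981\,\mathrm{HI}^{0.2269} + \frac{8.9598}{Z} + \frac{0.3041}{N} - \frac{0.0838}{\mathrm{HI}} - 0.9632\,ZN + 0.2277\,Z\,\mathrm{HI} + 0.0111\,N\,\mathrm{HI} + 1.1761\,ZN\,\mathrm{HI}^{-0.1196} + 41.832 \qquad (5)$$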
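The sketch below puts the two models side by side. The first function is a log-log least-squares fit of the Stadalius-type power law S = a(MW)^b. The second evaluates eq 5. We assume the last cross term of eq 5 is 1.1761·Z·N·HI^−0.1196; this reading gives plausible values for the Table 1 peptides. The power-law fit here uses only the 11 peptides of Table 1. Its a and b will therefore not match the 298-peptide fit S = 3.9(MW)^0.3 quoted above.

```python
import math


def fit_power_law(mw, s):
    """Least-squares fit of S = a * MW**b on log-log axes (Stadalius-type model)."""
    xs = [math.log(m) for m in mw]
    ys = [math.log(v) for v in s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b


def s_nzhi(z, n, hi):
    """Eq 5 (NZHI model); last cross term read as 1.1761 * Z * N * HI**-0.1196."""
    return (-66.8 * z ** -3.7906 + 19.5332 * n ** 0.354 - 36.0981 * hi ** 0.2269
            + 8.9598 / z + 0.3041 / n - 0.0838 / hi
            - 0.9632 * z * n + 0.2277 * z * hi + 0.0111 * n * hi
            + 1.1761 * z * n * hi ** -0.1196 + 41.832)


# Table 1: molecular weight (Da) and measured S for the 11 S-calibrating peptides.
mw = [891.42, 890.46, 889.50, 888.54, 1001.63, 849.46,
      871.47, 834.45, 1494.71, 1427.79, 1516.74]
s_meas = [28.2, 24.76, 21.46, 21.78, 22.76, 27.07,
          35.59, 19.2, 34.8, 33.81, 44.66]
a, b = fit_power_law(mw, s_meas)
print(f"S = {a:.3g} * MW^{b:.3f}")

# LASAADFR from Table 1: Z = +2, N = 8, HI = 6.47 (measured S = 27.07).
print(round(s_nzhi(2, 8, 6.47), 1))
```

MW enters the power law only as a single scalar. Eq 5, by contrast, separates the charge, length and hydrophobicity contributions that, according to the text, MW fails to track for tryptic peptides.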

Compared with the R² of 0.97 for the set of synthetic peptides,24 this is a significant loss of model accuracy. We attribute it to two factors: the current data set includes hydrophilic peptides, and the molecular composition of the detected species is essentially random. The 37 model peptides studied before all had HI > 10 and related structures. They were built from a small set of amino acids: Leu, Ala, Val, His, Ser, Asp, Gly, Phe and Arg. Real proteomic samples contain a much more diverse set of peptides that includes all naturally occurring residues. We also expected that prediction of the slopes S, like prediction of peptide hydrophobicity, would need to be composition- and sequence-specific.

As in the optimization of the SSRCalc hydrophobicity algorithm, we established the composition- and sequence-specific features semiempirically. We first manually analyzed the peptides with the largest positive and negative errors in the S predicted by the NZHI model. These observations suggested possible corrections, each of which was introduced and kept only if the correlation improved.

For example, consider the 20 peptides with the largest positive deviations from the S predicted by eq 5. Only one of them contained Gly, and only a single Gly residue. In contrast, the 20 peptides with the largest negative deviations contained 27 Gly residues altogether. This behavior is consistent with the unique properties of glycine, the amino acid with the smallest side chain. Gly makes the peptides containing it more flexible, which decreases the contact area of the molecule in the random-coil conformation and hence the S-value. This observation clearly shows that corrections for peptide composition are needed as a first approximation. We therefore introduced composition-dependent terms, as in additive retention prediction models, by assigning an additional coefficient ($S_i$) to each constituent amino acid: $S = S_{\mathrm{NZHI}} + \sum S_i$. The optimized $S_i$ values are given in Table 2S of the Supporting Information. As expected, Gly had the most negative contribution of all amino acids (−1.28).

Once the composition effects had been optimized, a few sequence-specific features became visible. First, peptides with hydrophobic residues distributed uniformly along the chain mostly showed positive deviations of S from their calculated values. Conversely, peptides whose most hydrophobic residues are clustered together have lower slope values. LLGSLSLDAFR is a typical example of the former. It contains five extremely hydrophobic Leu and Phe residues, distributed uniformly between the N-terminus and the penultimate position. NYELLCGDNTRK is the opposite case: its hydrophobic stretch YELL lies close to the N-terminus, which causes a negative deviation from the predicted S-value.
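The composition term S = S_NZHI + ΣS_i is a simple additive correction over the residues of the sequence. The sketch below assumes that each occurrence of a residue contributes its S_i once. The text gives only the Gly coefficient (−1.28); the other S_i values are in Table 2S and are not reproduced. The coefficient table here is therefore a hypothetical placeholder in which every other residue contributes 0.0.

```python
from collections import Counter

# Hypothetical coefficient table: only Gly (-1.28) is quoted in the text.
# The remaining S_i values live in Table 2S and default to 0.0 in this sketch.
S_I = {"G": -1.28}


def composition_correction(sequence, coeffs=S_I):
    """Sum of per-residue S_i contributions over all residues in the sequence."""
    counts = Counter(sequence.upper())
    return sum(coeffs.get(residue, 0.0) * k for residue, k in counts.items())


def corrected_s(s_nzhi_value, sequence, coeffs=S_I):
    """Composition-corrected slope: S = S_NZHI + sum(S_i)."""
    return s_nzhi_value + composition_correction(sequence, coeffs)


# LAGGGSASSSADAAAFR (Table 1) has three Gly residues: 3 * -1.28.
print(round(composition_correction("LAGGGSASSSADAAAFR"), 2))  # -3.84
```

The sequence-specific corrections described in the text cannot be captured by residue counts alone; they depend on residue positions and neighbors.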
Negative deviations from calculated S-values were also observed for species containing neighboring acidic (D, E) and basic (K, R, H) residues. For example, the VHKECCHGDLLECADDR fragment from bovine albumin contains two such combinations, [KE] and [DR]. It is 17 residues long, has moderate hydrophobicity, and should carry five positively charged groups at acidic pH. Yet it has S ∼ 42, compared with S ∼ 49-50 for the similarly sized and charged peptides DGTRKPVTDAENCHLAR and KPVTEAQSCHLAVAPNHAVVSR described above. We attribute this effect to possible interactions between the neighboring residues, which reduce the effective positive charge of Lys and Arg and so decrease the S-value. Positively charged groups at both termini, involved in ion-pairing interactions, are a characteristic feature of tryptic peptides. They increase the effective contact area and allow the whole peptide chain to interact with the stationary phase. Removing (or reducing) the effective charge at one terminus could significantly change the orientation of the peptide chain as it interacts with the stationary phase, with consequent deviations in the S-value, as seen for VHKECCHGDLLECADDR. Sequence-specific corrections reflecting these effects were introduced into the model. The algorithm is driven by a number of conditionally performed string and numeric operations and cannot easily be reduced to an explicit mathematical equation. An operational summary of the empirically deduced corrections is provided in the experimental section of the Supporting Information. When composition and sequence-specific effects were taken into account, the correlation for the sequence-specific slope calculator model improved to ∼0.95 (Figure 4c). Note that this correlation was obtained by iterative optimization of both the NZHI and the sequence-specific portions of the model. To test the model's applicability to an independent data set, we calculated S-values and the respective retention time corrections for the 252 tryptic peptides from human proteins shown in Figure 2. After retention time correction, the correlation improved from 0.9983 to 0.9997 for a 2-fold difference in gradient slope and from 0.993 to 0.9984 for a 4-fold difference (Figure 4d). Similar results were obtained when corrections were applied to the retention times of tryptic peptides from a whole-cell digest of Clostridium thermocellum. First, we established that tR vs tR correlations between runs under identical conditions give R² ∼ 0.9999. The corresponding plots for 2-fold and 4-fold differences in gradient slope show the correlation decreasing to 0.999 and 0.996, respectively. After S calculation and retention time correction for the 506 common species detected in these runs (provided in the Supporting Information), the correlations improved to 0.9997 and 0.9992, respectively.

Understanding the Retention Mechanism and Future Development. Building a comprehensive model of peptide behavior in RP HPLC systems is equivalent to precisely predicting the coefficients k0 and S in eq 1. Once determined, these coefficients can be used to predict peptide retention in both the isocratic (eq 1) and gradient (eq 2) separation modes.
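To illustrate why S governs the change in selectivity when the gradient slope varies, the standard LSS relations can be evaluated directly. The sketch below assumes the textbook forms of Snyder's LSS theory:

- isocratic elution: log k = log kw − Sφ;
- linear gradient: tR = (t0/b) log(2.3 k0 b + 1) + t0 + tD, with steepness b = t0ΔφS/tG.

Eqs 1 and 2 of this paper may be written in a different but equivalent notation. The water-extrapolated retention factor is written kw here, to avoid a clash with the start-of-gradient k0 of the gradient equation; assuming eq 1 has the usual LSS form, kw plays the role of its coefficient k0. All function names and peptide parameters are hypothetical.

```python
import math


def isocratic_log_k(log_kw: float, s: float, phi: float) -> float:
    """LSS isocratic relation: log10 k = log10 kw - S * phi.

    phi is the organic-solvent volume fraction (0-1); kw is the retention
    factor extrapolated to phi = 0.
    """
    return log_kw - s * phi


def gradient_retention_time(log_kw: float, s: float, phi0: float, delta_phi: float,
                            t_gradient: float, t0: float, t_delay: float = 0.0) -> float:
    """Snyder's LSS equation for a linear gradient from phi0 to phi0 + delta_phi.

    b  = t0 * delta_phi * S / t_gradient      (gradient steepness)
    k0 = retention factor at the starting composition phi0
    tR = (t0 / b) * log10(2.3 * k0 * b + 1) + t0 + t_delay
    """
    b = t0 * delta_phi * s / t_gradient
    k0 = 10.0 ** isocratic_log_k(log_kw, s, phi0)
    return (t0 / b) * math.log10(2.3 * k0 * b + 1.0) + t0 + t_delay


def elution_composition(log_kw: float, s: float, phi0: float, delta_phi: float,
                        t_gradient: float, t0: float, t_delay: float = 0.0) -> float:
    """Approximate organic fraction at which the peptide elutes."""
    t_r = gradient_retention_time(log_kw, s, phi0, delta_phi, t_gradient, t0, t_delay)
    return phi0 + delta_phi * (t_r - t0 - t_delay) / t_gradient


if __name__ == "__main__":
    # Hypothetical peptides: both elute near 20% ACN on a 0-40% ACN, 120 min gradient.
    for s, log_kw in [(25.0, 5.717), (50.0, 10.416)]:
        shallow = elution_composition(log_kw, s, 0.0, 0.4, 120.0, 1.0)
        steep = elution_composition(log_kw, s, 0.0, 0.4, 30.0, 1.0)  # 4x steeper
        print(f"S = {s:.0f}: shift {100 * (steep - shallow):.2f}% ACN, "
              f"log10(4)/S estimate {100 * math.log10(4) / s:.2f}% ACN")
```

Under these assumptions the elution composition is φ0 + log10(2.3 k0 b + 1)/S. When 2.3 k0 b is much greater than 1, a 4-fold steeper gradient therefore raises it by close to log10(4)/S. A peptide with S = 50 consequently moves about half as far on the acetonitrile scale as one with S = 25. That differential shift is what reorders peptides and lowers the tR vs tR correlation between gradients.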
In practice, however, most retention prediction algorithms have been developed for the gradient separation mode, in which experimental peptide retention correlates essentially with the concentration of organic solvent φ. We envision future use of the acetonitrile percentage scale to express peptide hydrophobicity, and have proposed using a series of peptides with precisely measured HI to calibrate retention under various RP HPLC conditions.25 When a change in gradient slope has to be taken into account, the resulting shifts in relative peptide retention can also be expressed in acetonitrile percentage (Figure 3). These shifts can be applied to the calculated HI values to optimize the performance of retention prediction models. Calculating S-values and the corresponding retention time corrections does not require the chromatogram to be calibrated on the hydrophobicity scale. Therefore, a retention prediction algorithm developed for a model set of peptides under one gradient can be applied to another gradient, using correction factors for the minor retention changes caused by the change in slope. Developing the model for predicting slopes S highlights the importance of accounting for peptide charge when the retention mechanism is considered. Our results demonstrate that the number of charged groups and their distribution along the peptide chain are critically important for correct estimation of S. In this regard, there is a strong parallel with retention prediction. The major breakthrough in the development of sequence-specific retention models was achieved only after the influence of ion-pair formation on the apparent hydrophobicity of N-terminal residues was understood.29 A second parallel is that the parameters affecting S were clarified only when extensive proteomics-derived data were considered. Researchers in the 1980s and 1990s worked with rather limited sets of measured S-values, which precluded the development of an accurate predictive model.17,19-21

Note that we treat the problem of peptide selectivity variation on the assumption that LSS theory adequately describes the behavior of polymeric compounds. This may not hold for sufficiently large peptides, which could fold and exhibit intramolecular interactions.30,31 While the majority of tryptic species used in our experiment seem to behave like small molecules within the framework of LSS theory, some larger analytes may show unpredictable variation in S. As with models for predicting peptide hydrophobicity, S-value prediction will benefit from a larger data set of measured S-values. The algorithm has so far been optimized for ∼300 peptides in the model mixture. Plotting correlations similar to those shown in Figure 3 for other samples should provide additional information and help improve model accuracy. Another challenging question is the behavior of nontryptic species and peptides with blocked N-termini. These solutes are expected to follow different rules from tryptic peptides because they lack the fixed contact sites at terminal positions. Changing the eluent pH and the nature of the ion-pairing modifier also alters the retention behavior of peptidic compounds. These effects should be studied in detail to complete our understanding of the RP HPLC separation mechanism.
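The charge features emphasized above can be extracted from a sequence with a simple scan. These are the number of positively charged groups and the presence of neighboring acidic and basic residues, such as the [KE] and [DR] pairs in VHKECCHGDLLECADDR. This is a hypothetical helper, not the authors' slope calculator, whose corrections are summarized in the Supporting Information. It assumes an unblocked N-terminus and counts Lys, Arg, and His as protonated at acidic pH, consistent with the five positive groups quoted for that peptide.

```python
ACIDIC = set("DE")
BASIC = set("KRH")


def positive_charge_count(sequence: str) -> int:
    """Groups expected to be protonated at acidic pH: the free N-terminal
    amine (assumed unblocked) plus every Lys, Arg, and His side chain."""
    seq = sequence.upper()
    return 1 + sum(aa in BASIC for aa in seq)


def acid_base_neighbors(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first residue, dipeptide) for each
    adjacent acidic/basic pair, in either order (e.g. 'KE' or 'DR')."""
    seq = sequence.upper()
    pairs = []
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
            pairs.append((i + 1, a + b))
    return pairs


if __name__ == "__main__":
    for pep in ("VHKECCHGDLLECADDR", "DGTRKPVTDAENCHLAR", "KPVTEAQSCHLAVAPNHAVVSR"):
        print(pep, positive_charge_count(pep), acid_base_neighbors(pep))
```

For the three peptides compared earlier, the scan reports five positive groups each. Only VHKECCHGDLLECADDR has acidic/basic neighbors, which is the feature associated with its lower S. How much such a pair should lower S is not specified here and would come from the empirical corrections.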

CONCLUSIONS We have developed the first sequence-specific model for predicting slopes S in the fundamental equation of linear solvent strength theory of peptide separation in RP HPLC. Knowing these values allows prediction of the variations in peptide separation selectivity when “physical” parameters of the chromatographic system, such as gradient slope, flow rate, or column size, are altered. Solving this fundamental chromatographic problem brings a better understanding of peptide RP separation mechanisms and will be a determining factor in building a comprehensive model for peptide retention prediction. Our predictive approach is based on the novel assumption that peptide charge, together with peptide length and hydrophobicity, is a critical parameter determining S. We have developed a procedure for experimentally measuring these values for large sets of analytes observed in proteomic experiments and applied our novel S-modeling approach to this data set. Predicted S-values were used successfully to calculate retention time shifts for an independent set of peptides separated under different acetonitrile gradients, validating the approach. These corrections will be instrumental to the success of all proteomic procedures that use peptide retention time information: scheduled MRM (SRM) analysis, peptide retention prediction for filtering MS/MS identifications, and interlaboratory data collection and comparison.

SUPPORTING INFORMATION AVAILABLE Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

(29) Krokhin, O. V.; Craig, R.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. Mol. Cell. Proteomics 2004, 3, 908–919.
(30) Boehm, R. E.; Martire, D. E.; Armstrong, D. W.; Khanh, H.; Bui, K. H. Macromolecules 1984, 17, 400–407.
(31) Bui, K. H.; Armstrong, D. W.; Boehm, R. E. J. Chromatogr. 1984, 288, 15–24.

Received for review June 27, 2010. Accepted October 19, 2010.

ACKNOWLEDGMENT This work was supported in part by grants from the Technology Transfer Office at the University of Manitoba, Genome Canada (O.V.K.), and the Natural Sciences and Engineering Research Council of Canada (O.V.K. and K.G.S.).

AC102228A
