Article pubs.acs.org/ac
Application of Statistical Thermodynamics To Predict the Adsorption Properties of Polypeptides in Reversed-Phase HPLC
Irina A. Tarasova,*,†,§ Anton A. Goloborodko,†,§ Tatyana Y. Perlova,†,§ Marina L. Pridatchenko,† Alexander V. Gorshkov,‡ Victor V. Evreinov,‡ Alexander R. Ivanov,∥ and Mikhail V. Gorshkov*,†,#
† Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
‡ N. N. Semenov’s Institute of Chemical Physics, Russian Academy of Sciences, 119991 Moscow, Russia
∥ Barnett Institute of Chemical and Biological Analysis, Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
# Moscow Institute of Physics and Technology (State University), 141707 Dolgoprudny, Moscow Region, Russia
Supporting Information
ABSTRACT: The theory of critical chromatography for biomacromolecules (BioLCCC) describes polypeptide retention in reversed-phase HPLC using the basic principles of statistical thermodynamics. However, whether this theory correctly depicts the variety of empirical observations and laws introduced for peptide chromatography over the last decades remains to be determined. In this study, by comparing theoretical results with experimental data, we demonstrate that BioLCCC (1) fits the empirical dependence of the polypeptide retention on the amino acid sequence length with R² > 0.99 and allows in silico determination of the linear regression coefficients of the log-length correction in the additive model for arbitrary sequences and lengths and (2) predicts the distribution coefficients of polypeptides with R² from 0.98 to 0.99. The latter enables direct calculation of the retention factors for given solvent compositions and modeling of the migration dynamics of polypeptides separated under isocratic or gradient conditions. The obtained results demonstrate that the suggested theory correctly relates the main aspects of polypeptide separation in reversed-phase HPLC.

Received: January 21, 2015 Accepted: May 29, 2015
DOI: 10.1021/acs.analchem.5b00595 Anal. Chem. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society

Reversed-phase high performance liquid chromatography (RP-HPLC) is one of the most widely used methods for peptide separation in biomedical studies. This technique simplifies complex biomacromolecule mixtures prior to the mass spectrometry (MS) detection, thus increasing the dynamic range of the analysis. The first attempts to theoretically describe the effect of the primary structure on the retention times (RT) of peptides in RP-HPLC were conducted in the 1980s.1−10 In recent years, a number of multiparameter RT prediction models were developed to improve protein identification in shotgun proteomics.11−22 However, these models do not provide a general description and modeling of polypeptide separations under isocratic and gradient conditions as a function of the eluting solvent composition. The only known theory that is applicable to peptides and aims to describe the dependence of the separation and adsorption of solute molecules on the eluting solvent composition is the Linear Solvent Strength theory (LSS),23,24 developed by Snyder and co-workers. Originally introduced to describe the retention of small molecules, LSS assumes a linear relationship between the logarithm of the solute retention factor log k and the volume fraction of an organic solvent φ. On the basis of this assumption, the retention time can be determined.24 At the initial stage of LSS development, the researchers debated the validity of the assumed linear relationship between the logarithm of the retention factor and the volume fraction of the organic solvent.25,26 Special attention has been paid to the applicability of the LSS to the separation of large molecules.27 Even for peptides, log k depended nonlinearly on φ; thus, LSS was corrected to consider the size exclusion effects.28 Currently, LSS is a widely accepted theory for describing the separation processes.29−31 However, to predict the adsorption properties of a peptide with this theory, the retention factor of a solute in pure water k0, the proportionality coefficient S (S-slope) between log k and φ, and the retention time in the size exclusion mode tsec must be known. Although there are a number of models predicting the S-slope,24,30,31 the parameters k0 and tsec can only be determined experimentally. It is impossible to measure these parameters for all of the existing
peptides; thus, the LSS cannot de novo predict the retention for an arbitrary peptide. The simplest model predicting the retention time for an arbitrary polypeptide and accounting for the effect of the sequence length assumes that the retention time RT depends on the additive contribution of each amino acid Σi RCi and the natural logarithm of the peptide length ln N, as follows:8

$$RT = \sum_i RC_i + t_{\mathrm{corr}} - \Big(m \sum_i RC_i \times \ln N + b\Big) \tag{1}$$

where RCi is the retention coefficient of the i-th amino acid in the peptide sequence, tcorr is the time correction for dead volumes of the column and HPLC setup, and m and b are the slope and the intercept, respectively, of the linear fit between the residual of the linear model (ΣRCi − RT) and the scaled logarithm of peptide length (ΣRCi × ln N). Equation 1 has been proposed to predict peptide and protein retention times with reasonably high accuracy.8−10 However, this correlation model cannot provide any knowledge on how the peptide retention depends on the experimental conditions, e.g., solvent composition, gradient profile, etc.

In this study, we consider the model of critical chromatography for biomacromolecules (BioLCCC)32−34 as a tool capable of not only comprehensively describing and relating the chromatographic phenomena to the experimental conditions, similarly to LSS, but also calculating the retention for an arbitrary peptide sequence. This theory relies on the principles of the polymer statistics35−37 and the eluting solvent strength theory23,38,39 to calculate the distribution coefficients under isocratic conditions and uses the general equation of gradient chromatography23 to describe the gradient elution. In this work, we apply BioLCCC to predict the chromatographic distribution coefficients of peptides. Knowledge of the distribution coefficient for a given composition of the eluting solvent enables not only the prediction of the chromatographic retention but also the modeling of the migration dynamics of the peptide molecules through the column under isocratic and gradient conditions, including the inversion of the elution order. Because the distribution coefficient is uniquely related to the retention factor, the latter can be directly calculated by the theory as well.

■ THEORY
BioLCCC. The theory of critical chromatography for biomacromolecules (BioLCCC)32−34 is based on the principles of the statistical physics of polymers35−37 and rooted in the concept of Liquid Chromatography under Critical Conditions (LCCC),40−49 which was previously developed for the prediction of chromatographic retention of synthetic polymers under isocratic conditions. An approach similar to BioLCCC has been previously applied to describe the gradient separation of macromolecules,50 including homopolymers51 and random copolymers.52,53 The BioLCCC theory describes the microscopic and macroscopic levels of peptide separation. At the microscopic level, it explicitly models polypeptide molecules in the adsorbent pores and derives a relationship between the distribution of biomolecules in the pores and in the interstitial space, using the statistical definition of the distribution coefficient Kd:35−37,40

$$K_d = \frac{Z_p}{Z_0} \tag{2}$$

where Zp and Z0 are partition functions, i.e., the sums of Boltzmann coefficients $\exp(-E_i^{\mathrm{eff}}/kT)$ over all of the configurations of a molecule in a unit volume inside of the pore or in the interstitial space, respectively. The effective energy of the molecule’s interaction with the adsorbing surface $E_i^{\mathrm{eff}}$ is determined using the correlation theory of the eluting solvent strength, which was originally derived for normal phase separations by Snyder.23,38,39 To a first approximation, it assumes that $E_i^{\mathrm{eff}}$ depends on the adsorption energy of an amino acid $X_i^{\mathrm{ads}}$ and the average adsorption energy of components A and B of the binary solvent $E_{AB}$ as follows:39

$$E_i^{\mathrm{eff}} = \alpha\,\big(X_i^{\mathrm{ads}} - E_{AB}\big) \tag{3}$$

where α is a correction factor for the adsorbent activity. The partition functions Zp and Z0 in eq 2 are calculated by the methods of polymer statistics; we use the flexible chain54 and rod55,56 models. These models allow derivation of the exact analytical solutions for Kd and require determination of minimal sets of parameters characterizing an interaction with an adsorbing surface; therefore, we considered them as the first approximation.

In the flexible chain54 model, a polymer is approximated by a random coil, and the intermolecular interactions are neglected. When applied to heteropolymers, i.e., peptides and proteins, this model becomes sequence specific.33 The expression for Kd contains a product of (N − 1) transition matrices. Each matrix $W(E_i^{\mathrm{eff}})$ describes the probability of the transition of an amino acid residue to a particular location inside of the pore. Because the matrix multiplication is noncommutative, Kd depends not only on the composition but also on the amino acid sequence. To a first approximation, we assume further that the flexible chain model can be applied to describe peptides and proteins for the following conditions: chemical reduction and deactivation of the S−S bonds at the sample preparation stage followed by chromatographic separation under strongly acidic conditions (pH 2.0−3.0) and elevated temperatures (40−60 °C). We believe these conditions correspond to denatured states of the proteins, thus minimizing the spatial structure effects.

In the rod model,55,56 the individual amino acids are represented by point particles spaced regularly along the rigid rod. These particles are adsorbed onto the pore wall if they approach it within the adsorption radius. The distribution coefficient Kd for the rod model is determined by the sum over all of the angular orientations and the locations of the center of mass, $K_d \propto \sum_{\mathrm{locations}} \sum_{\mathrm{angles}} \exp(-E^{\mathrm{eff}}/kT)$. Each of the residues interacting with the surface contributes the energy $E_i^{\mathrm{eff}}$ to the total energy $E^{\mathrm{eff}}$, where $E_i^{\mathrm{eff}}$ is determined by eq 3. In the rod model, both the distribution coefficient and the retention time of a peptide are strongly affected by the terminal residues. This effect is especially pronounced if the adsorption radius is comparable with the size of an amino acid. In this case, either all residues of the sequence or one of the terminal residues will be adsorbed. A more detailed description of the chain and rod models is given in the Supporting Information.

At the macroscopic level, the distribution coefficient Kd, calculated using the chain or rod model, is further substituted into the general equation of isocratic elution, allowing the prediction of the polypeptide retention volumes under the isocratic conditions:35−37,40
$$K_d = \frac{V_r - V_0}{V_p} \tag{4}$$

where Vr, V0, and Vp are the retention, interstitial, and pore volumes, respectively. Similarly, it enables prediction of the polypeptide retention volumes under the gradient conditions:

$$\int_0^{V_r - (V_0 + V_p)} \frac{dV}{V_p \times (K_d - 1)} = 1.0 \tag{5}$$

Eq 5 follows directly from the general equation of gradient chromatography, describing the relationship between the retention volume Vr and the retention factor k.23 The relationship between k and Kd is derived in the Supporting Information. The described framework, combining the previous developments of the polymer statistics and the chromatography of small molecules, provides a general description of the separation processes and quantitatively predicts adsorption properties. More details on the BioLCCC theory can be found in the Supporting Information (SI-Appendixes A−C).

Figure 1. Theoretical and experimental dependencies ΣRCi − RTtheor/exp vs ΣRCi × ln N and the correlations between the experimental and predicted (BioLCCC chain model) retention times for data set #1: (A) 20 synthetic peptides used by Mant et al.;8 (B) E. coli 30S ribosomal proteins S1−S21 used by Champney.10 A sum of the retention coefficients ΣRCi was calculated using retention coefficients published by Guo et al.6 The experimental data for polypeptides and proteins were extracted from the literature.8,10 Sequences and experimental conditions are specified in the Supporting Information, SI-Appendix D. The adsorbent activity α in the BioLCCC theory was adjusted to achieve the best fit between RTexp and RTtheor.

■ EXPERIMENTAL SECTION
Materials. The HPLC-grade chromatographic solvents were purchased from Sigma-Aldrich (St. Louis, MO) and Merck (Darmstadt, Germany). The isocratic experiments were performed using a set of 30 synthetic peptides ordered from Peptide 2.0 (Chantilly, VA, USA). The sequences and the corresponding experimental data for these synthetic peptides are shown in the Supporting Information (SI-Table 1). This set of peptides has been previously used to determine the phenomenological parameters (adsorption energies of the amino acid residues) in the BioLCCC rod model.57 The remaining peptides used in this study were purchased from Anaspec (Fremont, CA, USA).

HPLC. The polypeptide retention times were obtained in two different laboratories using isocratic and gradient elution conditions and either a stand-alone HPLC system equipped with a UV detector or a nanoflow HPLC system coupled to a high performance LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, CA) as described elsewhere.57 The parameters of the HPLC systems, the separation conditions, and the information about the samples used for the analysis are summarized in the Supporting Information (SI-Tables 2 and 3).

Software and Data Processing. The data were processed using the Python programming language. The BioLCCC theory is implemented as the pyteomics.biolccc module in the Pyteomics open access software package (https://pypi.python.org/pypi/pyteomics.biolccc/).58 This module calculates the polypeptide physicochemical constants, sequence-specific retention times for defined chromatographic conditions, and coefficients of distribution between the interstitial and pore volumes.

Choice between the Rod and the Chain Models in the BioLCCC Theory. The chain model was originally calibrated for a 300 Å C18 reversed phase using TFA as an ion-pairing agent (pH 2.0).32−34 The rod model was calibrated using the isocratic data for the calibration kit (Supporting Information, SI-Table 1) separated using System II (Supporting Information, SI-Tables 2 and 3).57 Note that the Magic AQ C18 phase used in rod model calibration is a polar-embedded phase, which is different from the “classical” C18-alkyl based phases. This type of phase enhances separation specificity for hydrophilic substances, while the separation of hydrophobic compounds is similar to the “classical” C18 phase. The sets of BioLCCC adsorption energies currently used for the rod and chain models are shown in the Supporting Information (SI-Table 4). The chain and rod models were used to predict the adsorption properties and retention for experimental System I and System II (Supporting Information, SI-Tables 2 and 3), respectively. The choice of whether to use the rod or chain model in calculations was determined by the experimental conditions used for data acquisition.
Figure 2. Theoretical dependence of the residual of the linear model on the scaled log-length of a peptide, (ΣRCi − RTtheor) vs ΣRCi × ln N. The BioLCCC chain model was used to calculate the retention times for the in silico data set of 260 polypeptides and proteins. ΣRCi was calculated using the retention coefficients obtained by Guo et al.6 A is the scaling factor to account for the change in the experimental conditions. The scaling factor A was recalculated for each range of lengths.
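The linear fit behind Figures 1 and 2 can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' processing code: `linear_fit` and `additive_rt` are hypothetical helper names, and the synthetic "peptides" use an assumed average retention coefficient of 2.0 per residue rather than the Guo et al. values. It first recovers m and b of eq 1 from synthetic data and then checks the point made in the Results and Discussion that N vs N × ln N is itself almost perfectly linear, so a high R² for the log-length correction says little about retention time accuracy. The text reports R² of 0.9995 for N from 1 to 10⁶; the exact figure depends on how N is sampled, so the sketch only verifies that R² exceeds 0.999.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy * sxy / (sxx * syy)


def additive_rt(sum_rc, length, m, b, t_corr=0.0):
    """Eq 1: RT = sum(RC) + t_corr - (m * sum(RC) * ln N + b)."""
    return sum_rc + t_corr - (m * sum_rc * math.log(length) + b)


# Synthetic data: hypothetical lengths and an assumed 2.0 retention units per residue.
lengths = [5, 10, 20, 40, 80]
sum_rcs = [2.0 * n for n in lengths]
rts = [additive_rt(s, n, m=0.2, b=1.0) for s, n in zip(sum_rcs, lengths)]

# Fit the residual (sum RC - RT) against the scaled log-length (sum RC * ln N).
scaled_log_length = [s * math.log(n) for s, n in zip(sum_rcs, lengths)]
residuals = [s - rt for s, rt in zip(sum_rcs, rts)]
m_fit, b_fit, _ = linear_fit(scaled_log_length, residuals)
print(f"recovered m = {m_fit:.3f}, b = {b_fit:.3f}")

# N vs N * ln N for N = 1 ... 10^6, sampled every 100 residues to keep the run short.
sampled = list(range(1, 10**6 + 1, 100))
_, _, r2_length = linear_fit(sampled, [n * math.log(n) for n in sampled])
print(f"R^2 of N vs N ln N: {r2_length:.4f}")
```

Because the fit is done on the residual, any constant offset such as tcorr is absorbed into the intercept; the sketch sets tcorr to zero so that the intercept equals b.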
■ RESULTS AND DISCUSSION
Dependence of Retention on the Peptide Length: Comparison of the BioLCCC Theory with Empirical Relationships. First, we tested whether the retention times predicted by the BioLCCC theory fit the dependence of the retention time on the natural logarithm of peptide length (eq 1) empirically derived in the earlier works.8,9 The experimental data used in this evaluation were obtained from the original studies on the sequence length effect. They contain peptides with sequence lengths ranging from 5 to 50 amino acids8 and proteins with lengths ranging from 70 to 560 residues10 (called data set #1 in the present study). Figure 1 (both left and middle panels) shows the experimental and theoretical dependencies ΣRCi − RT vs ΣRCi × ln N obtained for this data set. It demonstrates that the difference between the sum of the retention coefficients ΣRCi and the retention time predicted by BioLCCC correlates well with the scaled logarithm of the peptide length (Figure 1, left panels). The slopes m and the coefficients R² of this correlation are the same for the experimental and predicted data (compare the middle and left panels). Our analysis also shows that even a correlation with R² of 0.99 between the difference ΣRCi − RT and the scaled log-length ΣRCi × ln N does not imply accurate retention time prediction; thus, it can be misleading. The middle panels in Figure 1 show that these values are equally correlated for both of the studied data sets: (A) the relatively short synthetic peptides with lengths from 5 to 50 amino acids and (B) the 30S ribosomal proteins with significantly longer sequences from 70 to 560 residues. At the same time, the retention times predicted using BioLCCC are noticeably less accurate for the protein data set (Figure 1, lower right panel). This inconsistency is explained by the fact that ΣRCi ≫ RT for longer sequences. Thus, the dependence (ΣRCi − RT) vs (ΣRCi × ln N) effectively becomes (ΣRCi) vs (ΣRCi × ln N). The retention time therefore makes a negligible contribution to this relationship, which can be further approximated by the N vs N × ln N dependence. The latter dependence is linearly correlated with R² of 0.9995 for N varying from 1 to 10⁶. This result suggests that the original studies8−10 have overestimated the effect of the log-length correction on the accurate prediction of retention times of proteins.

Next, we generated in silico the data set #2 to test whether the slope m of the log-length correction in eq 1 depends on the peptide length or the experimental conditions. This data set consisted of 260 arbitrary sequences simulating tryptic peptides, block polypeptides with a varying number of segments, and intact proteins (SI-Appendix E in the Supporting Information). The sequence lengths and the molecular weights ranged from 5 to 366 amino acids and from 430 Da to 41 kDa, respectively. The retention coefficients determined previously by Guo et al.6
Figure 3. Comparison of the measured and predicted adsorption properties of peptides separated using System I (Supporting Information, SI-Tables 2 and 3). (A) The predicted (dashed and solid lines) and experimental (circles) dependencies of the retention factor on the volume fraction of acetonitrile φ = % ACN/100.0. The theoretical retention factor was predicted as $k^{\mathrm{theor}} = V_p \times (K_d^{\mathrm{theor}} - 1)/(V_0 + V_p)$ (see derivation in the Supporting Information, SI-Appendix A, eq 3). The experimental retention factor was estimated as $k^{\mathrm{exp}} = (RT - t_0)/t_0$, where t0 is the elution time of the nonretained compounds. (B) The correlation between the predicted and experimental distribution coefficients, Kd. All of the theoretical calculations were performed using the BioLCCC chain model for the adsorbent activity parameter of 1.6 and a Vp/V0 ratio of 0.82.
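The conversions between retention volume, distribution coefficient, and retention factor used in eq 4 and in the Figure 3 caption can be written out as a minimal Python sketch. The function names are hypothetical and are not part of pyteomics.biolccc; the absolute volumes in the example are assumed, with only the ratio Vp/V0 = 0.82 taken from the caption. The last function is simply the caption's relation solved for Kd.

```python
def kd_from_retention_volume(v_r, v_0, v_p):
    """Eq 4: Kd = (Vr - V0) / Vp."""
    return (v_r - v_0) / v_p


def retention_factor_from_kd(k_d, v_0, v_p):
    """Relation quoted in the Figure 3 caption: k = Vp * (Kd - 1) / (V0 + Vp)."""
    return v_p * (k_d - 1.0) / (v_0 + v_p)


def kd_from_retention_factor(k, v_0, v_p):
    """The same relation solved for Kd: Kd = 1 + k * (V0 + Vp) / Vp."""
    return 1.0 + k * (v_0 + v_p) / v_p


def experimental_retention_factor(rt, t_0):
    """k_exp = (RT - t0) / t0, with t0 the elution time of nonretained compounds."""
    return (rt - t_0) / t_0


# Assumed volumes (arbitrary units); only the ratio Vp/V0 = 0.82 comes from the caption.
V0, VP = 1.0, 0.82
kd = 3.0
vr = V0 + kd * VP                      # eq 4 rearranged for Vr
k = retention_factor_from_kd(kd, V0, VP)
print(f"Vr = {vr:.3f}, k = {k:.4f}")
print(f"Kd recovered from Vr: {kd_from_retention_volume(vr, V0, VP):.3f}")
print(f"Kd recovered from k:  {kd_from_retention_factor(k, V0, VP):.3f}")
```

The two routes agree because the caption's relation is what one obtains by taking V0 + Vp as the volume at which a nonretained compound (Kd = 1) elutes: k = (Vr − (V0 + Vp))/(V0 + Vp) with Vr = V0 + Kd × Vp.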
Table 1. Coefficients of Correlation between the Experimental and Predicted Distribution Coefficients for 30 Synthetic Peptides Separated under the Isocratic Conditions Using System II (Supporting Information, SI-Tables 2 and 3)a

| N | peptide | corr. coeff., R² (Kd theor vs Kd exp) | equation y = slope·x + b (Kd theor vs Kd exp) |
|---|---|---|---|
| 1 | Ac-GVGKGGVGVK-NH2 | 0.997 | 0.76x + 6.90 |
| 2 | Ac-VVKGGVGKVGV-NH2 | 0.996 | 0.94x + 6.74 |
| 3 | Ac-KGVGKVGGVK-NH2 | 0.988 | 1.01x + 6.64 |
| 4 | Ac-VVGVKGGVGK-NH2 | not detected | |
| 5 | Ac-KGVGGKVGVV-NH2 | 0.996 | 0.70x + 6.59 |
| 6 | Ac-GVGGVK-NH2 | 0.987 | 1.08x + 6.37 |
| 7 | Ac-KGVVKGVGVKGGVKG-NH2 | 0.994 | 0.61x + 7.21 |
| 8 | Ac-HGVGHVGGVK-NH2 | 0.990 | 1.34x + 6.40 |
| 9 | Ac-RGVGRVGGVR-NH2 | 0.991 | 1.75x + 5.96 |
| 10 | Ac-DGVGDVGGVH-NH2 | 0.999 | 1.02x + 6.71 |
| 11 | Ac-EGVGEVGGVH-NH2 | 0.999 | 1.17x + 6.59 |
| 12 | Ac-NVGKGNVGVK-NH2 | 0.999 | 1.09x + 6.33 |
| 13 | Ac-SVGKGSVGVK-NH2 | 0.999 | 0.79x + 6.61 |
| 14 | Ac-QVGKGQVGVK-NH2 | 0.999 | 0.84x + 6.55 |
| 15 | Ac-CVGKGCVGVK-NH2 | 0.9995 | 1.31x + 6.37 |
| 16 | Ac-TVGKGTVGVK-NH2 | 0.998 | 0.80x + 6.55 |
| 17 | Ac-AVGKGAVGVK-NH2 | 0.997 | 0.58x + 6.95 |
| 18 | Ac-YVGKGYVGVK-NH2 | 0.999 | 1.52x + 5.97 |
| 19 | Ac-MVGKGMVGVK-NH2 | 0.9998 | 0.95x + 6.53 |
| 20 | Ac-IVGKGIVGVK-NH2 | 0.998 | 0.91x + 6.37 |
| 21 | Ac-LVGKGLVGVK-NH2 | 0.998 | 0.74x + 6.86 |
| 22 | Ac-FVGKGFVGVK-NH2 | 0.999 | 0.89x + 6.22 |
| 23 | Ac-WVGKGWVGVK-NH2 | 0.998 | 1.19x + 4.98 |
| 24 | Ac-pSVGKGpSVGVK-NH2 | 0.998 | 0.74x + 6.54 |
| 25 | Ac-pTVNKGpTVGVK-NH2 | 0.997 | 0.77x + 6.37 |
| 26 | Ac-pYVGKGpYVGVK-NH2 | 0.999 | 1.46x + 5.88 |
| 27 | H-GVGKGGVGVK-NH2 | not detected | |
| 28 | Ac-GVGKGGVGVK-COOH | 0.999 | 0.71x + 6.75 |
| 29 | H-VVKGGVGKVGV-NH2 | 0.998 | 0.74x + 6.52 |
| 30 | Ac-VVKGGVGKVGV-COOH | 0.999 | 1.35x + 5.90 |

a Predictions were performed using the BioLCCC rod model, the adsorbent activity parameter of 1.1, and Vp/V0 of 2.0. The average slope for the correlation equations equals 0.991 with the variance of 0.09.
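The summary statistics in the footnote can be reproduced from the slopes listed in Table 1. The short Python sketch below is a reader's check rather than part of the original analysis; it assumes that the two undetected peptides (4 and 27) were simply left out of the average, and it reports both the sample and the population variance because the footnote does not say which was used.

```python
import statistics

# Slopes of the Kd(theor) vs Kd(exp) regressions transcribed from Table 1.
# Peptides 4 and 27 were not detected and are omitted (assumption stated above).
slopes = [
    0.76, 0.94, 1.01,                                # peptides 1-3
    0.70, 1.08, 0.61, 1.34, 1.75, 1.02, 1.17, 1.09,  # peptides 5-12
    0.79, 0.84, 1.31,                                # peptides 13-15
    0.80, 0.58, 1.52, 0.95, 0.91, 0.74, 0.89, 1.19,  # peptides 16-23
    0.74, 0.77, 1.46,                                # peptides 24-26
    0.71, 0.74, 1.35,                                # peptides 28-30
]

mean_slope = statistics.mean(slopes)
sample_variance = statistics.variance(slopes)        # n - 1 in the denominator
population_variance = statistics.pvariance(slopes)   # n in the denominator

print(f"n = {len(slopes)}")
print(f"mean slope = {mean_slope:.3f}")
print(f"sample variance = {sample_variance:.2f}")
print(f"population variance = {population_variance:.2f}")
```

Both variance conventions round to the same two-decimal value, so the footnote is consistent with the table under either reading.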
were rescaled to reflect the difference in the separation conditions: $\sum RC_i^{\mathrm{rescaled}} = A \times \sum RC_i^{\mathrm{Guo}}$. The scaling factor A was estimated using a relative change in the retention time scale under the new conditions predicted by BioLCCC, as follows:

$$A = \frac{RT^{0} - RT^{\mathrm{end}}}{RT^{0}_{\mathrm{new}} - RT^{\mathrm{end}}_{\mathrm{new}}} \tag{6}$$

The results of this analysis are shown in Figure 2. The data set was divided into four groups, according to the peptide sequence length: 5−15 (short peptides), 20−45 (long peptides), 45−366 (proteins), and 5−366 amino acids (the entire range of lengths). This analysis shows that the correlation coefficients R² and slopes m predicted by the BioLCCC theory are independent of the experimental conditions. Moreover, the slope m of 0.2 predicted for the sequences with lengths typical for tryptic peptides is close to the one observed in the previous experimental studies.13,59 Further, we found that the slope m depends on the sequence length range; e.g., it becomes 0.16 for proteins with lengths from 45 to 366 residues. Note that retention coefficients in the additive model are specific for the experimental conditions and must be rescaled each time the gradient conditions are changed. Figure 2 shows that the scaling factor A remains the same (A ≈ 1.0) if the column dimensions, pore sizes, and flow rates change (Figure 2, upper panel and left lower panel). However, a 3-fold change in the gradient slope results in a corresponding 3-fold increase in the scaling factor A (Figure 2, right lower panel). Note here that proper rescaling of retention coefficients is crucial for measuring the slope m of the (ΣRCi − RTtheor) vs (ΣRCi × ln N) dependence. Without rescaling, this dependence becomes “V”-shaped with wide data scattering for sequence
Figure 4. Change in the elution order of the peptides HSTVFDNLPNPEDRK (blue) and ADEYLIPQQ (yellow) under isocratic (A, B) and gradient (C, D) conditions. Dashed lines: theoretical curves obtained using the BioLCCC chain model. Triangles: experimental results. The circled pairs of triangles in panels C and D correspond to the values of RT and Kd at different gradient slopes (indicated near the circles, in % ACN/min). For the predictions, the adsorbent activity parameter α and the ratio Vp/V0 were 1.0 and 2.0, respectively. The theoretical values for RT and Kd were scaled to the experimental values using the linear regression coefficients obtained for the (RTexp vs RTtheor) and ($K_d^{\mathrm{exp}}$ vs $K_d^{\mathrm{theor}}$) dependencies, respectively.
lengths from 5 to 45 residues (Supporting Information, SI-Figure 1A, lower right panel), and the accuracy of the additive model drops. Importantly, the BioLCCC theory correctly rescales the retention coefficients using eq 6.

To further support that the BioLCCC theory can be used to predict the slope m in eq 1 for short peptides (having 6 to 35 residues in the sequence), we used publicly available proteomic data (called here the data set #3). The data set consisted of the peptide identifications for Clostridium thermocellum (504 tryptic peptides)30 and the HeLa cell line, digested using a variety of proteases60 (6869 tryptic peptides; 4159 LysC and 1756 GluC derived peptides). The results of the analysis are shown in SI-Figure 1B−F in the Supporting Information. The correlation coefficients for the (ΣRCi − RT) vs (ΣRCi × ln N) dependences and the slopes m obtained using BioLCCC theory were close or identical to the experimental results (Supporting Information, SI-Figure 1B,D−F, left and middle panels). Finally, the results shown in the Supporting Information, SI-Figure 1C, for Clostridium thermocellum demonstrate further the need for recalibration of the retention coefficients in the additive model when the experimental conditions change.

Prediction of Peptide Adsorption Properties. Isocratic separations of peptides were performed using Systems I and II (Supporting Information, SI-Tables 2 and 3) to derive the experimental dependencies of the retention factors on the volume fraction of acetonitrile and estimate the ranges of log k and φ in which the LSS theory assumption on the linear relationship between these values is valid. The dependencies kexp(φ) are shown in Figure 3A and SI-Figure 2A (Supporting Information) for Systems I and II, respectively. As expected, the measured dependencies were nonlinear in the considered range of φ. An accurate linear fit with the correlation coefficient of R² ≈ 0.99 was obtained for the log kexp vs φ dependence over a narrow range of 13% to 23% ACN concentrations (0.1 < k < 12.0), as shown in the inset in Figure 3A. Similar results were obtained for the peptides separated using System II (Supporting Information, SI-Figure 2B). Over the range of concentrations from 2.5% to 15.0% ACN (0.0005 < k < 2.0), the correlation coefficient R² ranged from 0.986 to 0.999 for 70% of the peptides. Note that linear dependence with R² ≈ 0.99 was also observed for k beyond the range of 1.0 < k < 10.0, which is typically considered in the LSS theory.25,26

Next, we used the data from these isocratic experiments to evaluate the accuracy of the Kd prediction by the BioLCCC theory. The results of the evaluation are shown in Figure 3B and Table 1 for Systems I and II, respectively. Linear correlations between experimental and predicted distribution coefficients with R² from 0.98 to 0.99 were obtained. Further validation was performed using literature data.5,61 The coefficient R² of the $K_d^{\mathrm{exp}}$ vs $K_d^{\mathrm{theor}}$ correlation ranged from 0.952 to 0.998 for most of the peptides in these data. The results of this analysis are shown in the Supporting Information (SI-Figure 3). These results further support the idea that the statistical thermodynamic principles (eq 2) and Snyder’s solvent strength theory (eq 3) can be effectively used to predict the change in peptide adsorption properties with the change in the organic solvent concentration. Note also that, while the solvent strength theory was originally introduced for the normal phase separations,23,38,39 our results support its applicability for reversed-phase separations as well.

Inversion of the Peptide Elution Order under Gradient and Isocratic Conditions. Using the BioLCCC theory, one can calculate the trajectory of the peptide displacement along the separation column. The differences in these trajectories among the peptides are ultimately responsible for the inversion of the peptide elution order.62 For example, consider the elution order inversion for two peptides, ADEYLIPQQ and HSTVFDNLPNPEDRK. Figure 4 shows the experimental and theoretical dependencies of the peptide retention time on the volume fraction of acetonitrile. Figure 4A,B shows results for the isocratic conditions. These experiments demonstrate that the longer peptide HSTVFDNLPNPEDRK elutes later and has a larger Kd compared with the peptide ADEYLIPQQ only if the acetonitrile volume fraction is below 0.155. Above this concentration, the peptide HSTVFDNLPNPEDRK starts eluting first. This difference in the retention times and the distribution coefficients further increases with the acetonitrile concentration. Similar dependencies of retention times and distribution coefficients were obtained theoretically using the BioLCCC theory, as shown in Figure 4 (the dashed lines). The separation results obtained for the same peptide pair using the different gradient slopes are shown in Figure 4C,D. Similar to the case of isocratic separations, the peptide pair elution order changed. However, in this case, the trigger for the inversion was the change in the gradient slope. Figure 4 shows a good agreement between the predicted and experimental Kd and its dependence on φ for the isocratic and gradient modes of separation over the same ranges of acetonitrile concentrations.

BioLCCC theory further enables modeling of the migration of the peptides through the column. This modeling provides a general understanding of the separation mechanisms and allows optimization of the separation conditions. Specifically, it is possible to simulate the trajectories of the peptides x(t). The first derivative of this dependence, dx/dt, is the average velocity of the peptide displacement along the column. Under isocratic conditions, i.e., at a constant organic solvent concentration, all peptides travel at constant velocity. However, under gradient separation, when the organic solvent concentration changes with time, the peptides move with acceleration. This acceleration is sequence specific and depends on the peptide adsorption properties. For example, SI-Figure 4 in the Supporting Information shows the displacements and instantaneous velocities as functions of time at different gradient slopes for the peptide pair considered in the previous example. The differences in accelerations of these two peptides result in the inversion of elution with the change in the gradient slope.
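How a change in gradient slope alone can invert the elution order may be illustrated by integrating eq 5 numerically. The Python sketch below is a toy model under loudly stated assumptions and does not reproduce the BioLCCC calculation: the dependence Kd(φ) is a hypothetical exponential stand-in (Kd − 1 = k0 × exp(−s × φ)) instead of the chain or rod model, the two parameter sets are invented and do not correspond to the peptides in Figure 4, the gradient is linear in pumped volume with no delay volume, and all volumes are in arbitrary units.

```python
import math


def kd_placeholder(phi, k0, s):
    """Hypothetical Kd(phi): Kd - 1 = k0 * exp(-s * phi). NOT the BioLCCC chain/rod model."""
    return 1.0 + k0 * math.exp(-s * phi)


def retention_volume(kd_of_phi, phi_start, dphi_dv, v_0, v_p, dv=1e-3, max_steps=10**7):
    """Solve eq 5 for Vr: accumulate dV / (Vp * (Kd - 1)) until the integral reaches 1."""
    accumulated = 0.0
    for step in range(max_steps):
        v = step * dv
        # Solvent composition at the midpoint of the step, capped at pure organic solvent.
        phi = min(phi_start + dphi_dv * (v + 0.5 * dv), 1.0)
        increment = dv / (v_p * (kd_of_phi(phi) - 1.0))
        if accumulated + increment >= 1.0:
            # Linear interpolation inside the final step.
            v += dv * (1.0 - accumulated) / increment
            return v + v_0 + v_p  # upper limit of eq 5 is Vr - (V0 + Vp)
        accumulated += increment
    raise ValueError("solute did not elute within max_steps * dv")


V0, VP = 1.0, 1.0  # assumed interstitial and pore volumes

# Two invented solutes: strong retention with steep phi dependence vs the opposite.
solutes = {
    "steep": lambda phi: kd_placeholder(phi, k0=1000.0, s=60.0),
    "flat": lambda phi: kd_placeholder(phi, k0=100.0, s=30.0),
}

vr = {}
for slope in (0.001, 0.1):  # gradient slope, change in phi per unit volume
    for name, model in solutes.items():
        vr[(name, slope)] = retention_volume(model, 0.0, slope, V0, VP)
    first = min(solutes, key=lambda name: vr[(name, slope)])
    print(f"slope {slope}: Vr(steep) = {vr[('steep', slope)]:.2f}, "
          f"Vr(flat) = {vr[('flat', slope)]:.2f}, elutes first: {first}")
```

With a zero gradient slope the integral reduces to Vr = V0 + Kd × Vp, i.e., eq 4, which is a convenient sanity check for the integration. For this exponential stand-in the integral also has a closed form, Vr − (V0 + Vp) = ln(1 + Vp × k0 × s × g)/(s × g) for a gradient of slope g starting at φ = 0, which can be used to verify the numerical result.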
■ CONCLUSIONS
In this study, we showed that BioLCCC theory, based on the earlier theoretical developments in polymer statistics54−56 and chromatography of small molecules,23,38,39 provides a general description of the peptide separation under isocratic and gradient conditions, allows simulation of peptide migration through the column, and calculates the distribution coefficients, retention factors, and retention times for arbitrary amino acid sequences. The conclusions of the theory agree well with the experimental observations obtained for peptides and proteins separated under isocratic and/or gradient RP-HPLC. Specifically, it predicts the peptide distribution coefficient for different isocratic experiments with R² of up to 0.99. It was also shown that BioLCCC enables in silico rescaling of the amino acids’ retention coefficients and direct calculation of the linear regression parameters for the additive model used in proteomic applications for rapid peptide retention time estimations.13,14,59 We believe that BioLCCC can be useful in peptide chromatography and its analytical applications for the in silico optimization of the separation conditions and gradient profiles, as well as a general understanding and modeling of the separation process.

■ ASSOCIATED CONTENT
Supporting Information
Detailed description of the theory (SI-Appendixes A, B, and C); experimental setup (SI-Tables 1 to 4 and Appendixes D and E); supplemental results (SI-Figures 1 to 4). The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.5b00595.

■ AUTHOR INFORMATION
Corresponding Authors
*Phone: +7(499)1378257. Fax: +7(499)1378258. E-mail: [email protected].
*Phone: +7(499)1378257. Fax: +7(499)1378258. E-mail: [email protected].
Author Contributions
§ I.A.T., A.A.G., and T.Y.P. contributed equally to the manuscript.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
This work was supported by the Russian Science Foundation (project #14-14-00971).

■ REFERENCES
(1) Meek, J. L. Proc. Natl. Acad. Sci. U.S.A. 1980, 77 (3), 1632−1636.
(2) Su, S. J.; Grego, B.; Niven, B.; Hearn, M. T. W. J. Liq. Chromatogr. 1981, 4 (10), 1745−1764.
(3) Browne, C. A.; Bennett, H. P. J.; Solomon, S. Anal. Biochem. 1982, 124 (1), 201−208.
(4) Sasagawa, T.; Okuyama, T.; Teller, D. C. J. Chromatogr. 1982, 240 (2), 329−340.
(5) Sakamoto, Y.; Kawakami, N.; Sasagawa, T. J. Chromatogr. 1988, 442, 69−79.
(6) Guo, D.; Mant, C. T.; Taneja, A. K.; Parker, J. M. R.; Hodges, R. S. J. Chromatogr. 1986, 359, 499−518.
(7) Hearn, M. T. W.; Aguilar, M. J.; Mant, C. T.; Hodges, R. S. J. Chromatogr. 1988, 438, 197−210.
(8) Mant, C. T.; Burke, T. W. L.; Black, J. A.; Hodges, R. S. J. Chromatogr. 1988, 458, 193−205.
(9) Mant, C. T.; Zhou, N. E.; Hodges, R. S. J. Chromatogr. 1989, 476, 363−375.
(10) Champney, W. S. J. Chromatogr. 1990, 522, 163−170.
(11) Krokhin, O. V. Anal. Chem. 2006, 78 (22), 7785−7795.
(12) Krokhin, O. V.; Ying, S.; Cortens, J. P.; Ghosh, D.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. Anal. Chem. 2006, 78, 6265−6269.
(13) Gilar, M.; Jaworski, A.; Olivova, P.; Gebler, J. C. Rapid Commun. Mass Spectrom. 2007, 21, 2813−2821.
(14) Gilar, M.; Xie, H.; Jaworski, A. Anal. Chem. 2010, 82 (1), 265−275.
(15) Azarova, I. N.; Baram, G. I.; Gol’dberg, E. L. Russ. J. Bioorg. Chem. 2006, 32 (1), 50−56.
(16) Baczek, T.; Wiczling, P.; Marszall, M.; Heyden, Y. V.; Kaliszan, R. J. Proteome Res. 2005, 4 (2), 555−563.
(17) Petritis, K.; Kangas, L. J.; Yan, B.; Monroe, M. E.; Strittmatter, E. F.; Qian, W.; Adkins, J. N.; Moore, R. J.; Xu, Y.; Lipton, M. S.; Camp, D. G.; Smith, R. D. Anal. Chem. 2006, 78, 5026−5039.
(18) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Yu.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039−1048.
(19) Klammer, A. A.; Yi, X.; MacCoss, M. J.; Noble, W. S. Anal. Chem. 2007, 79 (16), 6111−6118.
(20) Moruz, L.; Tomazela, D.; Käll, L. J. Proteome Res. 2010, 9 (10), 5209−5216.
DOI: 10.1021/acs.analchem.5b00595 Anal. Chem. XXXX, XXX, XXX−XXX
Article
Analytical Chemistry (21) Shinoda, K.; Sugimoto, M.; Yachie, N.; Sugiyama, N.; Masuda, T.; Robert, M.; Soga, T.; Tomita, M. J. Proteome Res. 2006, 5 (12), 3312−3317. (22) Shinoda, K.; Tomita, M.; Ishihama, Y. Bioinformatics 2008, 24 (14), 1590−1595. (23) Snyder, L. R. In Principles of Adsorption Chromatography; Marcel Dekker: New York, 1968. (24) Stadalius, M. A.; Gold, H. S.; Snyder, L. R. J. Chromatogr. 1984, 296, 31−59. (25) Schoenmakers, P. J.; Billiet, H. A. H.; de Galan, L. J. Chromatogr. 1979, 185, 179. (26) Schoenmakers, P. J.; Billiet, H. A. H.; de Galan, L. J. Chromatogr. 1981, 218, 261. (27) Stadalius, M. A.; Quarry, M. A.; Mourey, T. H.; Snyder, L. R. J. Chromatogr. 1986, 358, 17−37. (28) Larmann, J. P.; DeStefano, J. J.; Goldberg, A. P.; Stout, R. W.; Snyder, L. R.; Stadalius, M. A. J. Chromatogr. A 1983, 255, 163−189. (29) Beyaz, A.; Fan, W.; Carr, P. W.; Schellinger, A. P. J. Chromatogr. A 2014, 1371, 90−105. (30) Spicer, V.; Grigoryan, M.; Gotfrid, A.; Standing, K. G.; Krokhin, O. V. Anal. Chem. 2010, 82, 9678−9685. (31) Vu, H.; Spicer, V.; Gotfrid, A.; Krokhin, O. V. J. Chromatogr. A 2010, 1217, 489−497. (32) Gorshkov, A. V.; Tarasova, I. A.; Evreinov, V. V.; Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A.; Gorshkov, M. V. Anal. Chem. 2006, 78, 7770−7777. (33) Gorshkov, A. V.; Evreinov, V. V.; Tarasova, I. A.; Gorshkov, M. V. Polym. Sci. Series B 2007, 49, 93−107. (34) Tarasova, I. A.; Gorshkov, A. V.; Evreinov, V. V.; Adams, C. M.; Zubarev, R. A.; Gorshkov, M. V. Polym. Sci. Series A 2008, 50, 309− 321. (35) Pasch, H.; Trathnigg, B. In HPLC of Polymers; Springer-Verlag: Berlin, 1998. (36) Kawakatsu, T. Statistical Physics of Polymers. An Introduction; Springer, Berlin, 2004. (37) Grosberg, A. Y.; Khokhlov, A. R. Statistical Physics of Polymers; AIP Series in Polymers and Complex Materials; AIP Press: New York, 1994. (38) Snyder, L. R.; Poppe, H. J. Chromatogr. 1980, 184, 363−413. (39) Snyder, L. R.; Glajch, J. L. J. Chromatogr. 1981, 214, 1−19. 
(40) Entelis, S. G.; Evreinov, V. V.; Gorshkov, A. V. Adv. Polym. Sci. 1986, 76, 129−175. (41) DiMarzio, E. A.; Guttman, C. M.; Mah, A. Macromolecules 1995, 28 (8), 2930−2937. (42) Guttman, C. M.; DiMarzio, E. A.; Douglas, J. F. Macromolecules 1996, 29 (17), 5723−5733. (43) Gorbunov, A. A.; Solovyova, L. Y.; Skvortsov, A. M. Polymer 1998, 39 (3), 697−702. (44) Brun, Y.; Alden, P. J. Chromatogr. A 2002, 966 (1), 25−40. (45) Gorbunov, A.; Trathnigg, B. J. Chromatogr. A 2002, 955, 9−17. (46) Skvortsov, A. M.; Fleer, G. J. Macromolecules 2002, 35, 8609− 8620. (47) Philipsen, H. J. A. J. Chromatogr. A 2004, 1037, 329−350. (48) Gorbunov, A. A.; Vakhrushev, A. V. Polymer 2004, 45, 6761− 6770. (49) Bashir, M. A.; Radke, W. J. Chromatogr. A 2006, 1131, 130−141. (50) Boehm, R. E.; Martire, D. E.; Armstrong, D. W.; Bui, K. H. Macromolecules 1983, 16 (3), 466−476. (51) Armstrong, D. W.; Boehm, R. E. J. Chromatogr. Sci. 1984, 22 (9), 378−385. (52) Brun, Y. J. Liq. Chromatogr. Relat. Technol. 1999, 22 (20), 3027− 3065. (53) Brun, Y. J. Liq. Chromatogr. Relat. Technol. 1999, 22 (20), 3067− 3090. (54) DiMarzio, E. A.; Rubin, R. J. J. Chem. Phys. 1971, 55, 4318− 4336. (55) Casassa, E. F. J. Polym. Sci., Part A-2 1972, 10, 381−384. (56) Zhulina, Y. B.; Gorbunov, A. A.; Skvortsov, A. M. Polym. Sci. USSR 1984, 26 (5), 1020−1029.
(57) Perlova, T. Y.; Goloborodko, A. A.; Margolin, Y.; Pridatchenko, M. L.; Tarasova, I. A.; Gorshkov, A. V.; Moskovets, E.; Ivanov, A. R.; Gorshkov, M. V. Proteomics 2010, 10 (19), 3458−3468. (58) Goloborodko, A. A.; Levitsky, L. I.; Ivanov, M. V.; Gorshkov, M. V. J. Am. Soc. Mass Spectrom. 2013, 24 (2), 301−304. (59) Gilar, M.; Jaworski, A. J. Chromatogr. A 2011, 1218, 8890−8896. (60) Guo, X.; Trudgian, D. C.; Lemoff, A.; Yadavalli, S.; Mirzaei, H. Mol. Cell Proteomics 2014, 13, 1573−1584. (61) Mant, C. T.; Lorne Burke, T. W.; Hodges, R. S. Chromatographia 1987, 24, 565−572. (62) Tarasova, I. A.; Perlova, T. Y.; Pridatchenko, M. L.; Goloborodko, A. A.; Levitsky, L. I.; Evreinov, V. V.; Guryca, V.; Masselon, C. D.; Gorshkov, A. V.; Gorshkov, M. V. J. Anal. Chem. 2012, 67 (13), 1014−1025.