Liquid Chromatography at Critical Conditions: Comprehensive

Anal. Chem. 2006, 78, 7770-7777

Liquid Chromatography at Critical Conditions: Comprehensive Approach to Sequence-Dependent Retention Time Prediction

Alexander V. Gorshkov,† Irina A. Tarasova,‡ Victor V. Evreinov,† Mikhail M. Savitski,§ Michael L. Nielsen,§ Roman A. Zubarev,§ and Mikhail V. Gorshkov*,‡

Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Leninsky Prosp. 38, bld.2, Moscow 119334, Russian Federation, N.N. Semenov’s Institute of Chemical Physics, Russian Academy of Sciences, Kosygina 4, Moscow 119991, Russian Federation, and Biological and Medical Center, Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, Box 583, S-75 123 Uppsala, Sweden

An approach to sequence-dependent retention time prediction of peptides based on the concept of liquid chromatography at critical conditions (LCCC) is presented. Within the LCCC approach applied to biopolymers (BioLCCC), a specific retention time corresponds to a particular sequence. In combination with mass spectrometry, this approach provides an efficient tool for problems in which protein sequencing is essential. In this work, we present the theoretical background of the BioLCCC concept and demonstrate experimentally its feasibility for sequence-dependent LC retention time prediction for peptides. The BioLCCC model is based on three notions: (a) a random walk model for a macromolecule chain; (b) entropy-energy compensation for macromolecules within the adsorbent pore; and (c) a set of phenomenological parameters for the effective energies of interaction between the amino acid residues and the adsorbent surface. In this work, the phenomenological parameters have been obtained for C18 reversed-phase HPLC. Note that, contrary to alternative additive models for retention time prediction based on summation of the so-called “retention coefficients”, the BioLCCC approach takes into account the location of amino acids within the primary structure of a peptide and, thus, allows the identification of peptides having the same amino acid composition but differing in their arrangement. As a result, this new approach allows prediction of retention time for any possible amino acid sequence in a particular HPLC experiment. In addition, the BioLCCC model lacks the main drawbacks of additive approaches, which predict retention times only for sequences of limited chain length and provide information about amino acid composition only. The proposed BioLCCC approach was characterized experimentally using LTQ FT LC-MS and LC-MS/MS data obtained earlier for Escherichia coli. The HPLC system calibration was performed using peptide retention standards. The results show a linear correlation between predicted and experimental retention times, with a correlation coefficient, R2, of 0.97 for a peptide standard mixture and 0.9 for E. coli data, with the standard error below 1 min. The work presents the first description of the BioLCCC approach for high-throughput peptide characterization and preliminary results of its feasibility tests.

* Corresponding author. Phone: +7(495) 1371007. Fax: +7(495) 1378258. E-mail: [email protected]. † N.N. Semenov’s Institute of Chemical Physics. ‡ Institute for Energy Problems of Chemical Physics. § Uppsala University.

10.1021/ac060913x CCC: $33.50 © 2006 American Chemical Society. Published on Web 10/21/2006. Analytical Chemistry, Vol. 78, No. 22, November 15, 2006.

A method of liquid chromatography at critical conditions (LCCC) applied to synthetic oligomers was developed in the early 1980s.1-4 A basic feature of the LCCC mode is that macromolecules are separated at the so-called critical LC parameters, which correspond to an adsorption phase transition5,6 that can be described by renormalization group theory.7 At critical conditions, the molar mass distribution of synthetic polymers “disappears” and separation takes place according to other types of chain heterogeneity, e.g., the number and types of functional or modified groups, their positions in the chain, and the chain topology. The existence of the LCCC mode follows from treating polymer adsorption as a second-order phase transition in a system of monomers connected into a flexible chain. The chain is characterized by the entropy losses, ∆S, due to restrictions imposed on the monomer degrees of freedom near the pore wall, and by the free energy changes due to monomer interactions with the pore surface. The interaction between a macromolecule and a surface is characterized by an interaction energy, ∆Eads, which depends, at fixed temperature T, on the binary solvent composition, NB, i.e., ∆Eads = ∆Eads(NB). At a certain critical solvent composition, Nc, the entropy losses are exactly compensated by the energy gains, so that the total change in free energy, ∆G, of the system becomes zero, i.e., ∆G = ∆Eads(Nc) − kT∆S = 0. These conditions correspond to a chain transition from a solution to a localized state for the macromolecule inside the pore.

Following this consideration, the LCCC theory describes the chain as having three states: (i) a nonadsorbed 3d coil in solution at kT∆S ≫ ∆Eads; (ii) a completely adsorbed 2d coil at the surface (when each of the monomers binds to the surface) at kT∆S ≪ ∆Eads; and (iii) a coil that undergoes the phase transition between the nonadsorbed and the adsorbed states. These thermodynamic states correspond to the size exclusion chromatography (SEC), liquid adsorption chromatography (LAC), and LCCC modes of chromatographic separation, respectively. For a given chain length, N, and interaction energy, ∆Eads, a particular mode of separation is realized. For coils in the 3d (nonadsorbed) and 2d (fully adsorbed) states, the retention time does not depend on the macromolecule sequence, and elution is sensitive to either the size of the chain (SEC) or the chain composition (LAC). Near the critical conditions, the adsorption properties become sensitive to smaller details, such as the chain sequence or the presence of certain chemical groups. Using the LCCC theory, it is possible to calculate the interaction energies between the monomers and the surface for a known chemical structure, or sequence, and, thus, to predict LC retention times.

(1) Skvortsov, A. M.; Gorbunov, A. A. Vysokomol. Soed. 1980, 22 (12), 2641-2647.
(2) Gorshkov, A. V.; Evreinov, V. V.; Entelis, S. G. Dokl. Akad. Nauk USSR 1983, 272, 632-635.
(3) Entelis, S. G.; Evreinov, V. V.; Gorshkov, A. V. Adv. Polym. Sci. 1986, 76, 129-175.
(4) Entelis, S. G.; Evreinov, V. V.; Kuzaev, A. I. Reactive Oligomers; VSP: Utrecht, 1989.
(5) Lifshits, I. M. Sov. Phys.-JETP 1969, 28, 1280-1294.
(6) Di Marzio, E. A.; Rubin, R. J. J. Chem. Phys. 1971, 55 (9), 4318-4336.
(7) Eisenriegler, E.; Kremer, K.; Binder, K. J. Chem. Phys. 1982, 77, 6296-6320.

Earlier, this approach was successfully applied to characterize synthetic oligomers8-12 and copolymers.13,14 Recently, the LCCC approach to chromatographic separation of complex polymer mixtures has become more broadly accepted,15,16 with new developments including the combination of LCCC with mass spectrometry.17-19 The BioLCCC approach in combination with tandem mass spectrometry (MS/MS) for the characterization of biopolymers has been proposed recently,20,21 and its feasibility for predicting LC retention time from the peptide amino acid sequence and, in combination with MS/MS, for facilitating de novo sequencing has been demonstrated. Note that the development of LC retention time prediction models for peptide separation has gained renewed interest in recent years as a way to improve the efficiency of protein identification. Indeed, LC data are complementary to the MS and MS/MS data with respect to the peptide structure and, thus, bring additional information about the structure that can be missed or overlooked during routine MS experiments. Among recently developed approaches to LC peptide retention time prediction are the artificial neural network (ANN) approach,22 the model based on sequence-specific correction factors,23 and the model based on quantitative structure-retention relationships.24 In this work, we present the BioLCCC concept and the first experimental results of its application to sequence-dependent LC retention time prediction.

(8) Gorshkov, A. V.; Evreinov, V. V.; Entelis, S. G. Vysokomol. Soed. A 1982, 24 (3), 524-535.
(9) Gorshkov, A. V.; Evreinov, V. V.; Entelis, S. G. Zh. Phys. Khim. 1983, 57 (11), 2665-2673.
(10) Evreinov, V. V.; Gorshkov, A. V.; Prudskova, T. N.; Gur’yanova, V. V.; Pavlov, A. V.; Malkin, A. Ya.; Entelis, S. G. Polym. Bull. 1985, 14, 131-136.
(11) Gorshkov, A. V.; van Aalten, H.; Overeem, T.; Evreinov, V. V. Polym. Bull. 1987, 18, 513-516.
(12) Gorshkov, A. V.; Verenich, S. S.; Evreinov, V. V.; Entelis, S. G. Chromatographia 1988, 26, 338-342.
(13) Gorshkov, A. V.; Much, H.; Becker, H.; Pasch, H.; Evreinov, V. V.; Entelis, S. G. J. Chromatogr. 1990, 523, 91-102.
(14) Pasch, H.; Kruger, H.-P.; Much, H.; Just, U. J. Chromatogr. 1992, 589, 295-306.
(15) Pasch, H. Adv. Polym. Sci. 2000, 150, 1-66.
(16) Pasch, H.; Trathnigg, B. HPLC of Polymers; Springer: Berlin, 1999.
(17) Olesik, S. V. Anal. Bioanal. Chem. 2004, 378, 43-45.
(18) Philips, S. L.; Ding, L.; Ding, S. L.; Stegemiller, M.; Olesik, S. V. Anal. Chem. 2003, 75, 5539-5543.
(19) Philips, S. L.; Olesik, S. V. Anal. Chem. 2003, 75, 5544-5553.

THEORY

Separation in chromatography is defined by the distribution coefficient, Kd, associated with the change in free energy, ∆G, when the macromolecule passes from the mobile phase into the pores of the stationary phase. The general expression for the chromatographic distribution coefficient is

$$K_d = \exp\left(-\frac{\Delta G}{kT}\right) \tag{1}$$

in which ∆G = ∆Eads − kT∆S is the free energy change, ∆S is the change in entropy, and ∆Eads is the adsorption energy that characterizes the interaction of a macromolecule with the surface. The mode of separation is SEC when ∆G > 0 (Kd < 1) and LAC when ∆G < 0 (Kd > 1). Finally, at critical conditions (LCCC), ∆G = 0, the distribution coefficient equals 1, and it is independent of the molecular size or the degree of polymerization. The retention volume is defined as

$$V_R = V_0 + K_d V_p \tag{2}$$

in which V0 and Vp are the interstitial and total pore volumes, respectively. Under gradient elution, the binary solvent composition changes with the volume, V = V(t), that has passed through the column, i.e., NB = NB(V), where NB is the molar fraction of solvent component B (e.g., acetonitrile) in component A (e.g., water). Accordingly, the distribution coefficient changes during the run, Kd = Kd(V). The corresponding equation for the retention volume in gradient elution is25

$$\int_0^{V_R - V_0} \frac{dV}{V_p K_d(V)} = 1 \tag{3}$$

Equation 3 is a general equation that gives the retention volume in a gradient mode of separation for a particular macromolecule, specific gradient conditions, NB = NB(V), and column parameters.
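As a quick consistency check (a worked step added here for illustration, not part of the original derivation), eq 3 reduces to eq 2 in the isocratic case, where the solvent composition and therefore Kd do not change with V:

$$\int_0^{V_R - V_0} \frac{dV}{V_p K_d} = \frac{V_R - V_0}{V_p K_d} = 1 \quad\Longrightarrow\quad V_R = V_0 + K_d V_p$$

In gradient elution Kd(V) typically falls as the solvent strength rises, so the integrand grows along the run and the integral generally has to be evaluated numerically.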

(20) Gorshkov, A. V.; Evreinov, V. V.; Gorshkov, M. V. 52nd ASMS Conference, Nashville, TN, 2004; MPX447.
(21) Tarasova, I. A.; Gorshkov, A. V.; Evreinov, V. V.; Kharybin, O. N.; Gorshkov, M. V. 53rd ASMS Conference, San Antonio, TX, 2005; THP263.
(22) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Paša-Tolić, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039-1048.
(23) Krokhin, O. V.; Craig, R. V.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. Mol. Cell. Proteomics 2004, 3.9, 908-919.
(24) Kaliszan, R.; Baczek, T.; Cimochowska, A.; Juszczyk, P.; Wisniewska, K.; Grzonka, Z. Proteomics 2005, 5, 409-415.
(25) Snyder, L, Saunders, J. Chromatogr. Sci. 1969, 7 (4), 195-199.


At the beginning of an LC gradient, the macromolecules are localized at the surface and stay in a strong adsorption mode. As the solvent strength increases, the macromolecules start desorbing at a certain solvent composition and then move through the column with the solvent. This transition from the localized state into the solvent takes place in a narrow range of solvent composition near the corresponding critical point, Nc.4 In gradient elution, different macromolecules are thus distributed along the gradient profile according to their critical points and elute at different times depending on the chemical properties of their chain sequences. The model presented below takes into consideration both the entropy and the energy factors and predicts the existence of sequence-dependent LC critical conditions.

Random Walk Model To Determine a Distribution Coefficient. Adsorption of a macromolecule can be described quantitatively using a simple random walk model introduced by DiMarzio and Rubin.6 According to this model, shown schematically in Figure 1, a macromolecule is considered as a random walk of monomers inside a slitlike pore of size d, where the pore is replaced with a cubic lattice of D layers (usually D = d/a, where a ∼ 10 Å is the size of a monomer). The random walk in this lattice is described by a transition matrix W(εi), which depends on the kT-normalized interaction energy εi between the ith monomer of the macromolecule chain and the pore surface:

$$W(\varepsilon_i) = \begin{pmatrix}
\tfrac{2}{3}e^{\varepsilon_i} & \tfrac{1}{6}e^{\varepsilon_i} & 0 & 0 & \cdots & 0 \\
\tfrac{1}{6} & \tfrac{2}{3} & \tfrac{1}{6} & 0 & \cdots & 0 \\
0 & \tfrac{1}{6} & \tfrac{2}{3} & \tfrac{1}{6} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \tfrac{1}{6} & \tfrac{2}{3} & \tfrac{1}{6} \\
0 & 0 & \cdots & 0 & \tfrac{1}{6}e^{\varepsilon_i} & \tfrac{2}{3}e^{\varepsilon_i}
\end{pmatrix}_{D \times D} \tag{4}$$

Figure 1. Schematic presentation of a free walk model for a monomer chain in a cubic lattice.

Each m,k element of the matrix is the conditional probability of finding the ith monomer in layer m if the (i − 1)th monomer is in layer k. For a simple cubic lattice, the probability that the ith monomer stays in layer k equals 2/3 (the k,k diagonal elements of the matrix). The probability of moving into layer k − 1 or k + 1 equals 1/6 (the k − 1,k and k + 1,k elements of the matrix, respectively). It is assumed here that in layers 1 and D (at the surfaces of the slit pore) the ith monomer is attracted to the surface with the attraction energy εi. The attraction increases the statistical weight of a particular configuration by the factor exp(εi) in the first and the last rows of the transition matrix. Therefore, for a macromolecule chain of length N (number of monomers), the distribution coefficient Kd is determined by the following expression:

$$K_d = \frac{1}{D}\, U \cdot \left[\prod_{i=2}^{N} W(\varepsilon_i)\right] \cdot P(\varepsilon_1) \tag{5}$$

in which U is the unit row vector (which sums over the possible chain configurations) and

$$P(\varepsilon_1) = \begin{pmatrix} e^{\varepsilon_1} \\ 1 \\ \vdots \\ 1 \\ e^{\varepsilon_1} \end{pmatrix}_{D}$$

is the starting vector corresponding to the probability of finding the first monomer of the macromolecule chain in the different lattice layers between the walls of the pore (the first and the last elements of this vector reflect the interaction of this first monomer with the pore surface). It is important to note that the matrices in eq 5 do not commute, W(εi)W(εj) ≠ W(εj)W(εi) for j ≠ i, and therefore the distribution coefficient, Kd, depends on the macromolecule sequence. Note also that this model can be applied to both low and high molar mass chains and has no internal restrictions on chain length. In the limiting case of low molar mass peptides and strong interaction energies, the model coincides with the additive models. The energy of interaction, εi, between the ith monomer and the surface is specific for each monomer and can be determined from the following equation:26

$$\varepsilon_i = X_i^0 - \varepsilon_{AB} \tag{6}$$

in which Xi0 is the surface-monomer interaction energy and εAB is the solvent-surface interaction energy (the so-called “solvent strength”). Note that, following Snyder’s assumption,26 eq 6 is obtained in the approximation that the solvent-monomer interaction energies are negligible compared with the solvent-surface and monomer-surface interaction energies. Equation 6 corresponds to the isocratic mode of separation. In gradient elution, the molar fraction of solvent B, NB, in the binary solvent changes continuously, so that the interaction energy εi becomes a function of NB:

$$\varepsilon_i(N_B) = X_i^0 - \varepsilon_{AB}(N_B) \tag{7}$$

(26) Snyder, L. R. Principles of Adsorption Chromatography; M. Dekker: New York, 1968.

To determine εAB, we now introduce the transition matrix W(εAB) describing interactions in a binary solvent. Here, we again consider the random walk model for the macromolecule. When a monomer is adsorbed on the surface, it replaces a solvent molecule, either A or B. Let PB be the probability that a molecule B is desorbed when the monomer is adsorbed at the surface. The probability that a molecule A is desorbed at the same place is then (1 − PB). The probability PB is equal to the fraction of the surface occupied by molecules B. Thus, for the solvent-related transition matrix W(εAB), we can write the following expression:

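$$W(\varepsilon_{AB}) = (1 - P_B)\,W(\varepsilon_A) + P_B\,W(\varepsilon_B) \tag{8}$$

in which W(εA) and W(εB) are the transition matrices corresponding to desorption of the solvent molecules A or B in the pure single-component solvents. These matrices have the same form as the matrix W(εi) in eq 4. Comparing the surface rows on both sides of eq 8 gives exp(εAB) = (1 − PB) exp(εA) + PB exp(εB), so that for the effective interaction energy, εAB, we obtain

$$\varepsilon_{AB} = \varepsilon_A + \ln\left(1 - P_B + P_B e^{\Delta}\right) \tag{9}$$

in which ∆ is the kT-normalized difference between the interaction energies of the pure solvent molecules B and A with the surface, that is, ∆ = εB − εA. To determine the probability PB, we use the Langmuir isotherm approach, in which the equilibrium constant, K(P,T), is defined as

$$K(P,T) = \frac{P_B\,(1 - N_B)}{(1 - P_B)\,N_B} = e^{\Delta} \tag{10}$$

and from eqs 9 and 10, we finally obtain

$$\varepsilon_{AB}(V) = \varepsilon_A + \ln\left[1 - N_B(V) + N_B(V)\,e^{(\varepsilon_B - \varepsilon_A)}\right] \tag{11}$$

Equation 11 is close to Snyder’s equation for the solvent strength of a binary mixture.26 Equations 3, 5, 7, and 11 form the complete system of equations for determining the distribution coefficient in a given gradient elution mode of separation:

$$
\begin{cases}
\varepsilon_{AB}(V) = \varepsilon_A + \ln\left[1 - N_B(V) + N_B(V)\,e^{(\varepsilon_B - \varepsilon_A)}\right] \\[4pt]
\varepsilon_i(V) = X_i^0 - \varepsilon_{AB}(V) \\[4pt]
K_d(V) = \dfrac{1}{D}\, U \cdot \left[\displaystyle\prod_{i=2}^{N} W(\varepsilon_i)\right] \cdot P(\varepsilon_1) \\[4pt]
\displaystyle\int_0^{V_R - V_0} \dfrac{dV}{V_p K_d(V)} = 1
\end{cases} \tag{12}
$$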
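A minimal Python sketch of how system (12) might be evaluated numerically follows. It is an illustration under stated assumptions, not the authors' implementation: the function names, the midpoint integration with step `dv`, the cut-off `v_max`, and the order in which the matrices act on the starting vector (W(ε2) first, W(εN) last) are all assumptions. The defaults εA = 0.0 and εB = 2.40 are the water and acetonitrile values the paper reports.

```python
import math

def solvent_energy(n_b, eps_a=0.0, eps_b=2.40):
    """Eq 11: effective solvent-surface energy at molar fraction n_b of B."""
    return eps_a + math.log(1.0 - n_b + n_b * math.exp(eps_b - eps_a))

def start_vector(eps, d):
    """P(eps_1) of eq 5: weight exp(eps) in the two surface layers (d >= 2)."""
    w = math.exp(eps)
    return [w] + [1.0] * (d - 2) + [w]

def apply_transition(eps, state):
    """Multiply the eq 4 matrix W(eps) by a layer vector without storing W."""
    d = len(state)
    out = []
    for m in range(d):
        below = state[m - 1] if m > 0 else 0.0
        above = state[m + 1] if m < d - 1 else 0.0
        out.append(2.0 / 3.0 * state[m] + (below + above) / 6.0)
    w = math.exp(eps)
    out[0] *= w    # first and last rows carry the adsorption weight
    out[-1] *= w
    return out

def distribution_coefficient(x0, n_b, d, eps_a=0.0, eps_b=2.40):
    """Eqs 5 and 7: K_d for monomer energies x0, listed from monomer 1 to N."""
    eps_ab = solvent_energy(n_b, eps_a, eps_b)
    eps = [x - eps_ab for x in x0]
    state = start_vector(eps[0], d)
    for e in eps[1:]:   # assumed order: W(eps_N) ... W(eps_2) P(eps_1)
        state = apply_transition(e, state)
    return sum(state) / d   # U sums over layers, then 1/D normalisation

def retention_volume(x0, gradient, v0, vp, d, dv=1e-3, v_max=100.0):
    """Eq 3: step through the eluted volume until the integral reaches 1.

    gradient(v) returns N_B after volume v has passed (hypothetical callback).
    """
    total, v = 0.0, 0.0
    while v < v_max:
        kd = distribution_coefficient(x0, gradient(v + dv / 2.0), d)
        step = dv / (vp * kd)
        if total + step >= 1.0:
            # Interpolate inside the last step to land on integral == 1.
            return v0 + v + dv * (1.0 - total) / step
        total += step
        v += dv
    raise ValueError("no elution before v_max")
```

Applying the matrix row by row avoids building a D × D array for every monomer, which keeps the sketch dependency-free; a production implementation would more likely use a linear algebra library.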



From this system of equations one can find the distribution coefficient, taking into account the experimentally determined interaction energies of the monomers and the solvent molecules with the adsorbent surface, and predict the retention volume (and/or retention time) for given gradient LC conditions and any particular macromolecule sequence.

Determination of the Phenomenological Parameters. To complete the description of the macromolecule separation, it is necessary to determine three phenomenological parameters in the system of equations (12): the monomer adsorption energy, Xi0, and the solvent-surface interaction energies εA and εB. Here, we call this set of energies the LCCC vector of parameters, {Xi0, εA, εB}; it completes the description of our BioLCCC model of sequence-dependent LC retention time/volume prediction. Note that determining the vector of parameters is simpler in the case of a homopolymer, which consists of identical monomers, as has been shown earlier.4 Using the BioLCCC approach described above, we have generated a table of interaction energies for the 20 most commonly occurring amino acids. The energies were calculated from C18 reversed-phase gradient HPLC experimental data available in the literature.27-29 These data cover synthetic peptides with sequences such as Ac-G-xx-LLLKK-amide, where x corresponds to different amino acid residues,27 which allows determination of the phenomenological parameter Xi0 for each amino acid residue. Note that the table of interaction energies, Xi0, for all amino acid residues was obtained for the following experimental LC parameters: pH = 2.0, C18 phase, water/acetonitrile solvent mixture. Also, the energy calculations were performed in the approximation that the interstitial volume is equal to the pore volume, that is, V0 = Vp = 1.5 mL, with D = 30 (corresponding to a pore size of 300 Å), a gradient slope of 1% B/min, and a flow rate of 1.0 mL/min.
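The lookup side of the parameter vector can be sketched in a few lines of Python. The residue values are those of Table 1 and the end-group values those of its footnote; the dictionary and function names are hypothetical, and the one-letter residue keys are a convenience introduced here. The terminal residues receive the sum of the residue and end-group energies, as the text assumes.

```python
# Xi0 values from Table 1 (pH 2.0, C18 phase, water/acetonitrile).
X0 = {
    "K": -0.140, "H": -0.020, "R": 0.110, "N": 0.208, "G": 0.250,
    "S": 0.292, "Q": 0.340, "D": 0.375, "T": 0.470, "E": 0.578,
    "A": 0.737, "P": 0.737, "C": 0.890, "Y": 1.280, "V": 1.345,
    "M": 1.416, "I": 1.750, "L": 1.892, "F": 1.913, "W": 2.030,
}
# End-group energies from the Table 1 footnote. The Results text quotes
# -0.03 for the free acid; the footnote value is used here as an assumption.
N_TERM = {"H": -1.69, "Ac": 0.0}
C_TERM = {"COOH": -0.3, "amide": 0.0}

def residue_energies(sequence, n_term="H", c_term="COOH"):
    """Return Xi0 per residue, N- to C-terminus, with end groups added."""
    energies = [X0[aa] for aa in sequence]
    energies[0] += N_TERM[n_term]    # first residue plus N-terminal group
    energies[-1] += C_TERM[c_term]   # last residue plus C-terminal group
    return energies

# Example: a peptide of the Ac-G-xx-LLLKK-amide family with x = Ala.
example = residue_energies("GAALLLKK", n_term="Ac", c_term="amide")
```

The list is returned in the usual N- to C-terminal writing order. Because the paper states that monomer 1 of eq 5 corresponds to the C-terminus, a caller feeding these energies into eq 5 would presumably reverse the list first.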
The solvent-surface interaction energies, εA and εB (A, water; B, acetonitrile), were found empirically to be εA = 0.0 and εB = 2.40 from the available data for retention times of peptides.27-29 The interaction energies Xi0 were found in the following way: by varying the interaction energies for the lysine (K), glycine (G), and leucine (L) residues incorporated in the set of synthesized peptides Ac-GGGLLLKK-amide, Ac-GLLLLLKK-amide, and Ac-GKKLLLKK-amide from the available experimental data,27 we obtained the interaction energies for these residues, X0K, X0G, and X0L, corresponding to the best fit between the observed retention volumes and those predicted by the model. Then, with X0K, X0G, and X0L known, the interaction energies for all other residues were obtained from the best fits between observed and predicted retention volumes for peptides of the type Ac-GxxLLLKK-amide. The resulting interaction energies for all amino acid residues are presented in Table 1. Importantly, these interaction energies are specific for pH = 2.0, the C18 phase, and the water/acetonitrile mixture. It is further assumed that they do not depend on other LC parameters, such as gradient profile, pore size, column dimensions, and flow rate. Note that a peptide may have different types of C- and N-terminal groups, which have to be taken into account by measuring their interaction energies. In this work, we obtained interaction energies for the end groups corresponding to N-hydrogen, N-acetyl, C-amide, and C-free acid groups. It is assumed that the effective interaction energy of the first and the last monomer is the sum of the energies of the corresponding residue and the corresponding end group. The end group interaction energies were also found from the literature data.27 Note also that eq 5 applies to sequences in which the initial vector, P(ε1), and the transition matrix, W(εN), correspond to the C- and N-terminals, respectively.

(27) Guo, D.; Mant, C. T.; Taneja, A. H.; Parker, R. J.; Hodges, R. S. J. Chromatogr. 1986, 359, 499-517.
(28) Guo, D.; Mant, C. T.; Taneja, A. K.; Hodges, R. S. J. Chromatogr. 1986, 359, 519-532.
(29) Mant, C. T.; Burke, T. W.; Black, J. A.; Hodges, R. S. J. Chromatogr. 1988, 458, 193-205.

Table 1. Interaction Energies Xi0 for 20 Common Amino Acid Residues(a)

AA residue                 Lys     His     Arg     Asn     Gly     Ser     Gln     Asp     Thr     Glu
interaction energy, Xi0    -0.140  -0.020  0.110   0.208   0.250   0.292   0.340   0.375   0.470   0.578

AA residue                 Ala     Pro     Cys     Tyr     Val     Met     Ile     Leu     Phe     Trp
interaction energy, Xi0    0.737   0.737   0.890   1.280   1.345   1.416   1.750   1.892   1.913   2.030

(a) These interaction energies are specific for pH = 2.0, the C18 phase, and water-acetonitrile solvent compositions.27-29 In addition, the following interaction energies for the N- and C-terminal groups were obtained:27 COOH = -0.3, amide = 0.0, Ac = 0.0, and hydrogen = -1.69.

EXPERIMENTAL SECTION

High-performance liquid chromatography was performed on an Agilent 1100 nanoflow system (Agilent Technologies) using a 15-cm fused-silica emitter (75-µm inner diameter, 375-µm outer diameter; Proxeon Biosystems) as the analytical column, packed in-house with Reprosil-Pur C18-AQ 3-µm silica. The HPLC system had a thermostated microautosampler; the analytical column had no temperature control. Mobile phases consisted of component A (0.5% acetic acid and 99.5% water) and component B (0.5% acetic acid, 10% water, and 89.5% acetonitrile). All samples were separated under the same conditions: 20-min isocratic elution in 4% B at a flow rate of 500 nL/min, followed by a 90-min gradient elution from 4 to 45% B at a constant flow rate of 200 nL/min. A hybrid linear ion trap (LTQ) 7-T FTICR mass spectrometer (Thermo Electron, Bremen, Germany) coupled with a nanoelectrospray ion source (Proxeon Biosystems, Odense, Denmark) was used for mass analyses.

The BioLCCC model was first characterized using peptide retention standard S1-S5 (Pierce Biotechnology, Inc.). This standard consists of five peptides: four peptides, denoted S2-S5, have the generic formula Ac-RG-xx-GLGLGK-amide, where xx corresponds to Gly-Gly, Ala-Gly, Val-Gly, and Val-Val; the fifth peptide, S1, contains Ala-Gly as xx and has a free Nα-amino group. Experimental data for a tryptic digest of Escherichia coli were obtained earlier30 using a 7-T hybrid linear ion trap FTICR mass spectrometer, LTQ FT (Thermo Electron), under the LC conditions described above. The BioLCCC model was implemented as an in-house-developed software package using Dev-C++ (Free Software Foundation, Inc., Cambridge, MA). Two LC data sets for two E. coli samples were used to study the predictive capabilities of our model. These samples were randomly selected out of 20 E.
coli samples collected earlier.30 The first data set (sample A) contained LC data for 46 identified peptides, and the second (sample B) contained LC data for 77 identified peptides. The peptides in the samples were identified using two complementary MS/MS techniques, electron capture dissociation (ECD) and collision-activated dissociation (CAD). ECD and CAD spectra of E. coli were compared, and only complementary pairs of peaks were chosen for further analysis in the Mascot search. The details of peptide characterization using complementary information from ECD and CAD were published previously.30 Based on the results of the Mascot search, peptides with a score higher than the Mascot threshold of 34 were selected to test the BioLCCC model. Mass accuracies for MS and MS/MS spectra were 5 and 20 ppm, respectively. Retention time data for the peptides from the digest were obtained from the total ion current (TIC) and documented in the sample A and B data sets. For each peptide, the experimental retention time was determined at the maximum of the TIC peak.

(30) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Mol. Cell. Proteomics 2005, 4.6, 835-845.

RESULTS AND DISCUSSION

“Theoretical Chromatograph”. For practical implementation of the BioLCCC approach, we have developed a software package called the Theoretical Chromatograph. Figure 2 shows a block diagram of Theoretical Chromatograph. The software calculates the retention time/volume for a given peptide sequence. The input parameters are as follows: (1) column parameters, V0, Vp, and D; (2) gradient conditions; (3) peptide sequence; and (4) type of end groups. The program uses a tabulated set of interaction energies {Xi0, εA, εB} corresponding to the 20 most common amino acid residues (Table 1), C- and N-terminal end groups, and water-acetonitrile as the binary solvent. The end groups taken into account in this work were free acid and amide as C-terminal groups and hydrogen and acetyl as N-terminal groups. During data processing, the software calculates the gradient function for the interaction energy εAB(NB), which is then used to obtain Kd = Kd(V), and returns, as the final result, the retention time for a given amino acid sequence.

Retention Time Prediction for Peptide Retention Standard S1-S5. The first test of the BioLCCC approach was performed using peptides from retention standard S1-S5 (Pierce Biotechnology, Inc.) separated under the following LC conditions: stationary-phase volume (pore volume) equal to mobile-phase volume, i.e., V0 = Vp = 316 nL; D = 12; flow rate 200 nL/min.
The following interaction energies for the end groups were used: N-acetyl = 0, C-amide = 0. The correlation between experimental and predicted retention times for the Nα-acetylated peptides S2, S3, S4, and S5 from the standard is shown in Figure 3a. Each point in Figure 3a corresponds to the average of three LC runs. Peptide S1 was not used in the test because of its different N-terminal group (hydrogen). The current version of Theoretical Chromatograph cannot account for different types of end groups for peptides in the same mixture; this feature will be added in future work. In Figure 3b, we show the correlation between experimental and predicted retention times for the same S2-S5 peptides obtained using the additive approach based on summation of retention coefficients. The BioLCCC model gives a correlation coefficient of R2 = 0.974 between experimental and predicted retention times for the standard, with a standard deviation below 0.59 min. This is comparable with the additive approach, which is well characterized for this mixture and gives R2 = 0.986 and a standard deviation of 0.41 min. The additive approach used to generate the predictions in Figure 3b and in the corresponding figures below was based on the table of retention coefficients obtained in earlier works.27-29

Figure 2. Block diagram of Theoretical Chromatograph used in sequence-dependent LC retention time prediction based on the BioLCCC approach.

Figure 3. Correlation between the experimental and theoretically predicted retention times for acetylated peptides from peptide standard S2-S5: (a) BioLCCC model; (b) additive approach based on summation of retention coefficients.27-29 The experimental retention times are the average values from three LC runs.

Retention Time Prediction for E. coli Peptide Sequences. The feasibility of the BioLCCC approach for predicting retention time was then demonstrated using two sets of LC data from samples A and B, described above and selected from the bulk of LC data obtained earlier for an E. coli tryptic digest.30 For the calculations of retention times in Theoretical Chromatograph, the flow rate, column, and interaction energy parameters were the same as for the S1-S5 peptide mixture used in the previous experiments. The interaction energies for the end groups were the following: free acid = -0.03 for the C-terminal group and hydrogen = -1.69 for the N-terminal group. Figures 4 and 5 show the correlation between experimental and predicted retention times. For both samples, the BioLCCC model gives a linear correlation coefficient of about R2 = 0.90, with standard deviations below 0.76 and 0.59 min for samples A and B, respectively (Figures 4a and 5a).

Figure 4. Correlation between the experimental and theoretically predicted retention times for E. coli tryptic digest. Sample A consisted of 46 tryptic peptides. (a) BioLCCC model; (b) additive approach based on summation of retention coefficients.27-29

Figure 5. Correlation between the experimental and theoretically predicted retention times for E. coli tryptic digest. Sample B consisted of 77 tryptic peptides. (a) BioLCCC model; (b) additive approach based on summation of retention coefficients.27-29

Note that the data collected earlier for this digest lack information on the age of the column, its temperature, and the exact pH, which may lower the accuracy of retention time prediction compared with the calibration mixture experiments. In addition, the interaction energies in Table 1 were obtained for pH = 2.0. An increase in pH can completely reverse the order of the interaction energies for some residues (e.g., Arg, Thr, His27); therefore, these energies have to be determined and tabulated over the whole range of pH values up to pH = 7.0. Also, the BioLCCC model is based on the assumption that the monomer-solvent and monomer-monomer interaction energies are negligible compared with the monomer-surface and solvent-surface interaction energies. This assumption may not hold for nanoLC columns under peptide overload conditions, which may also decrease the accuracy of retention time prediction. Another assumption used in the calculations was the equality of the interstitial and total pore volumes, which may not hold for the nanoLC columns used in different LC runs. Finally, the peptide sequence assignments were based solely on the Mascot results of the E. coli database search using CAD and ECD spectra, so the accuracy of the sequence identifications is itself subject to some uncertainty, especially given the possible unidentified isomeric forms of some of the peptides in the samples. Nevertheless, when the additive approach based on summation of retention coefficients27-29 was applied to the same data, the correlation between experimental and predicted retention times was much worse, with R2 below 0.76 and standard deviations of 1.32 and 0.93 min for samples A and B, respectively (Figures 4b and 5b). In both cases, we applied the same delay volume estimated for the nanoLC system used.
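The figures of merit quoted above can be reproduced from paired retention times with a short calculation. The Python sketch below is an assumption about how such statistics might be computed, since the paper does not spell out its definitions: it fits a least-squares line of observed against predicted retention times and reports R2 together with the residual standard deviation (n − 2 degrees of freedom). The function name is hypothetical.

```python
import math

def linear_fit_stats(predicted, observed):
    """Least-squares line observed = slope * predicted + intercept.

    Returns (slope, intercept, r_squared, residual_std). The residual
    standard deviation uses n - 2 degrees of freedom (an assumption; the
    paper does not state which definition it uses).
    """
    n = len(predicted)
    if n != len(observed) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(predicted, observed))
    r_squared = 1.0 - ss_res / syy
    residual_std = math.sqrt(ss_res / (n - 2))
    return slope, intercept, r_squared, residual_std
```

Called with the predicted and experimental retention times (in minutes) of one sample, the last two return values correspond to the kind of R2 and standard deviation reported for Figures 3-5, under the definitions assumed here.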


CONCLUSIONS

An approach to sequence-dependent LC retention time prediction based on liquid chromatography at critical conditions (BioLCCC) has been developed and demonstrated for a variety of peptide mixtures. The predictive capabilities of this approach were tested experimentally using the peptide standard mixture S1-S5, and the interaction energies of amino acids obtained in these experiments under reversed-phase C18 gradient nanoLC conditions were then applied to a body of previously collected nanoLC-MS/MS data for E. coli. These results have demonstrated the capability of the BioLCCC approach to predict retention times for peptide mixtures to within 1 min, which is comparable with alternative approaches such as ANN and different additive models. The major feature of the BioLCCC approach is that it takes into account the rearrangements of amino acid residues in the sequence. This feature, in combination with mass spectrometry, is especially useful for de novo sequencing or for distinguishing peptides having the same molecular mass and amino acid composition. In general, the BioLCCC approach offers a set of data regarding the peptide sequence that is complementary to mass spectrometry data and, thus, can increase the throughput of LC-MS analysis in a broad range of proteomics research. Further efforts will be focused on the extension of the BioLCCC approach to include sequences containing residues with

posttranslational modifications such as phosphorylation and glycosylation, as well as peptides containing specific functional and end groups. Modifications change the interaction energies of the amino acid residues and, therefore, for each type of modification, the corresponding table of interaction energies has to be generated experimentally. The simplicity of the BioLCCC approach stems from neglecting the energies describing the monomer-solvent interaction. It is possible to apply more advanced correlation theories to describe the effective interaction energies and their dependence on solvent composition. One can also take into consideration the dependence of the effective energies on nearest neighbors. Finally, it is possible to introduce a more realistic model for a peptide chain than the random walk model used in the presented BioLCCC approach. In this case, the model may be much more complex and require more sophisticated computational tools. This work is currently in progress. Another simplification of the presented BioLCCC approach is the assumption that the interaction forces acting between the monomers in a chain of a macromolecule, and between the monomers and the adsorbent surface, are short range. This follows directly from the form of the transition matrix in eq 4. This assumption may not be correct for amino acids that have ionized groups at pH = 2.0. In particular, the amine group at the N-terminus can be in the ionized form, NH3+. The interaction energy for this end group may not be short range. Also this charged

N-terminal group affects the interaction energies of the nearest amino acids connected to that end owing to the surrounding counterions.23 Thus, this type of end group needs to be accounted for more carefully; that will be a subject of future efforts. Also, the free walk model used to predict peptide separation is less applicable for very short peptides. The larger the size of a peptide, the closer its conformation is to the coil-like one. On the basis of preliminary experimental data, we assume here, quite boldly, that our model is applicable even to short peptides. Finally, based on the preliminary experimental results, we can conclude that the presented BioLCCC model correctly describes the connectivity of amino acids into a chain and the general mechanism of peptide separation.

ACKNOWLEDGMENT

This work was supported by International Association (INTAS) (grants 04-83-2643 and Genomics-05-1000004-7759), Russian Foundation of Basic Research (grants 06-04-49632 and 06-08-08085), and Civilian Research and Development Foundation (grant RUE1000588-MO-05). The authors thank Marina Zubareva and Christopher M. Adams from Uppsala University for help with peptide synthesis and LTQ FT experiments.

Received for review May 18, 2006. Accepted September 12, 2006.

AC060913X
