Article pubs.acs.org/ac

Peptide Retention Time Prediction in Hydrophilic Interaction Liquid Chromatography: Data Collection Methods and Features of Additive and Sequence-Specific Models

Oleg V. Krokhin,*,†,‡ Peyman Ezzati,† and Vic Spicer†

† Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, Manitoba R3E 3P4, Canada
‡ Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, Manitoba R3E 3P4, Canada

ABSTRACT: The development of a peptide retention prediction model for hydrophilic interaction liquid chromatography (XBridge Amide column) is described for a collection of ∼40 000 tryptic peptides. Off-line 2D LC-MS/MS analysis (HILIC-RPLC) of S. cerevisiae whole cell lysate has been used to acquire retention information for a HILIC separation. The large size of the optimization data set (more than 2 orders of magnitude compared to previous reports) permits the accurate assignment of hydrophilic retention coefficients of individual amino acids, establishing both the effects of amino acid position relative to peptide termini and the influence of peptide secondary structure in HILIC. The accuracy of a simple additive model with peptide length correction (R2 value of ∼0.96) was found to be much higher compared to an algorithm of similar complexity applied to RPLC (∼0.91). This indicates significantly smaller influence of peptide secondary structure and interactions with counterions in HILIC. Nevertheless, sequence-specific features were found. Helical peptides that tend to retain stronger than predicted in RPLC exhibit negative prediction errors using an additive HILIC model. N-cap helix stabilizing motifs, which increase retention of amphipathic helical peptides in RP, reduce peptide retention in HILIC independently of peptide amphipathicity. Peptides carrying multiple Pro and Gly (residues with lowest helical propensity) retain stronger than predicted. We conclude that involvement of the peptide backbone’s carbonyl and amide groups in hydrogen-bond stabilization of helical structures is a major factor, which determines sequence-dependent behavior of peptides in HILIC. The incorporation of observed sequence dependent features into our Sequence-Specific Retention Calculator HILIC model resulted in 0.98 R2 value correlation, significantly exceeding the predictive performance of all RP and HILIC models developed for complex mixtures of tryptic peptides. 
Rapid developments in the field of proteomics have fueled recent progress in separation science of biomolecules, especially peptides. The vast majority of proteomic analyses nowadays are performed using the bottom-up approach: proteins of interest are digested with specific enzymes; resulting peptides are separated by high performance liquid chromatography (HPLC) or capillary zone electrophoresis (CZE) and analyzed by mass spectrometry.1 This has brought more attention to accelerating the development of peptide separation techniques. These developments can be roughly divided into technological improvements of separation hardware, hyphenation techniques (with MS), and studies targeting better understanding of separation mechanisms. The former developments include ongoing improvements in peptide separation efficiency and selectivity, with the wide adoption of UPLC, core−shell sorbents, and introduction of stationary phases with unique selectivity being best examples. The latter targets the development of peptide retention prediction algorithms,2−4

© 2017 American Chemical Society

which have found wide applications in improving the confidence of LC-MS identification,5 and guiding method development for quantitative LC-MS analyses6 and multidimensional LC-MS protocols.7 Our laboratory has a long-standing interest in studying the mechanism of peptide reversed-phase chromatography. Applying retention modeling to large retention data sets acquired using proteomic techniques has provided a significant impact. In our early work, we used a data set of 346 tryptic peptides to establish the effect of ion-pairing formation on apparent hydrophobicity of N-terminal residues.8 One of the latest discoveries (the description of N-cap stabilization of amphipathic helical peptides on the C18 surface) required a

Received: February 13, 2017 Accepted: April 21, 2017 Published: April 21, 2017

DOI: 10.1021/acs.analchem.7b00537 Anal. Chem. 2017, 89, 5526−5533


Analytical Chemistry

found that the correlation of 0.96 “is on the higher end of previous RP and HILIC peptide retention prediction models.” This review of the literature on peptide retention prediction in HILIC shows that data sets of ∼150 peptides or fewer were used in all of them. According to our25 (and others’4) observations, this count is very close to the lower limit that provides sufficient information to accurately derive retention coefficients of individual amino acids in additive (20 parameter) models without being subject to model overfitting. The accuracy of resulting HILIC algorithms varied from 0.92 to 0.97. This also could be a consequence of the small size of the data sets and significant differences in the length of peptides. The superior correlation demonstrated by Le Maux et al. (0.992 R2 value)23 serves as the best example of that. A very small size of peptides (2−4 residues) and subsequent absence of any secondary structure helped achieve this high prediction accuracy. Taken together, these studies led us to the conclusion that a significant increase in the size and complexity of the optimization data set for HILIC separation may help in solidifying the assignment of retention coefficients of individual amino acids and finding novel sequence-specific features of HILIC separation. All previously reported HILIC models used retention times accurately determined using ESI MS21−24 or UV20 detection. In our laboratory we often apply a simplified approach based on fraction collection and subsequent analysis of these fractions by either MALDI MS8 or LC-ESI MS (as a second dimension separation).26 Our first version of the reversed-phase SSRCalc model was based on RP-HPLC−MALDI MS analysis of a 17 protein digest.8 We used only 40 fractions, with most of the 346 peptides found within a 25 fraction wide window. Nevertheless, it was sufficient to establish a sequence-specific effect of ion-pairing at the peptide N-terminus.
Since this study, the throughput of mass analyzers has improved significantly. Application of 2D LC-MS with 30−40 fractions in the first dimension and 60−90 min runs in the second dimension can generate tens of thousands of unique peptide identifications, hundreds of times more than used in any of the previous reports on modeling HILIC separation of peptides. The goal of this study was to develop an approach for high-content retention data collection in HILIC (Waters XBridge Amide) mode using 2D LC-MS (HILIC-RP, ∼40 fractions in the first dimension) of a complex tryptic digest. The resulting retention data would provide a solid background for confident assignment of retention (hydrophilicity) coefficients of individual amino acids through the development of an additive retention prediction model and help in establishing sequence-specific features of HILIC separation. So far, these features have not been sufficiently described due to the small number of peptides used in previous studies. We anticipate that the fine details of separation mechanisms discovered in this work will be applicable to other HILIC separation conditions (columns) we plan to study.

collection of ∼280 000 peptides with accurately measured retention properties.9 The incorporation of features describing peptide amphipathic helicity into our Sequence-Specific Retention Calculator (SSRCalc) is still ongoing: the current SSRCalc database of ∼1.5 million tryptic peptides separated using C18−formic acid conditions provides solid support for it. Recently, we applied the SSRCalc approach to CZE data using a large collection of peptides (∼4400) identified using CZE-MS/MS; all previously described models used less than 130 peptides. This 30-fold increase in data set size resulted in the discovery of novel sequence specific features that affect peptide electrophoretic mobility and yielded a significant improvement in model accuracy (∼0.995 R2 value).10 We believe that this trend of using larger MS-acquired data sets is applicable to any peptide separation technique. Similar to RPLC and CZE, the increase in the size of the data for any peptide separation mode will shed a new light on details of separation mechanisms and lead to significantly more accurate retention modeling. Hydrophilic interaction liquid chromatography (HILIC) is one of the most popular peptide separation techniques in proteomics.11,12 HILIC separation carries characteristics of other major HPLC techniques: normal-phase, reversed-phase, and ion-exchange.13 Its history started back in the 1970s,14 and the HILIC acronym was introduced by Alpert,15 who also studied this separation technique in great detail. 
HILIC provides separation efficiency comparable to RP-HPLC along with unique separation selectivity, allowing the separation of hydrophilic analytes that are not retained in RP systems.16 Not surprisingly, it drew a lot of attention from peptide separation specialists upon the arrival of proteomic bottom-up techniques.11 Significant efforts have been applied to establish an optimal combination of stationary/mobile phases to improve both efficiency of separation and sensitivity of ESI MS detection.17,18 Additionally, HILIC exhibits separation selectivity sufficiently orthogonal to RP to prompt its use in multidimensional separation schemes.12,19 Wider application of the HILIC separation mode in proteomics called for the development of HILIC peptide retention prediction models similar to RP-HPLC in the early 2000s. Yoshida developed the first additive model to predict peptide retention on a TSK gel Amide-80 column using a 121 synthetic peptide data set (2−54 residues long, 9.3 residue length on average) with ∼0.94 R2-value prediction accuracy.20 Gilar et al. used retention data sets of ∼150 tryptic peptides to study the contribution of individual residues in peptide retention on three different HILIC columns at different pH values by optimizing additive prediction models with a logarithmic length correction.21 The accuracies of resulting models varied between a 0.92 and 0.97 R2 value. Harscoat-Schiavo et al.22 separated 58 synthetic peptides (2−11 residues, 4.6 on average) on a TSK Gel Amide 80 column and used retention times to build their model based on amino acid composition (0.97 correlation). Le Maux et al.23 used a data set of 153 peptides (2−4 amino acids long) with a resulting 0.992 R2-value correlation; to achieve this accuracy, the authors introduced separate sets for retention coefficients for N- and C-terminal and internal residues (i.e., the model had 60 parameters). 
Badgett et al.24 used HILIC−ESI MS of a tryptic digest of eight purified proteins to generate a data set of 118 peptides for model optimization; they selected peptides shorter than 15 residues and applied an additive model with correction for N-terminal positions of six residues. They



EXPERIMENTAL SECTION

Materials and Digest Preparation. Deionized water and HPLC-grade acetonitrile were used for preparation of the eluents. All chemicals were sourced from Sigma Chemicals (St. Louis, MO) unless otherwise noted. Sequencing grade modified trypsin (Promega, Madison, WI) and 15 mL Amicon centrifugal filter units (Merck Millipore, Ireland) were used for the digestion. Siliconized 1.5 mL tubes (BioPlas, San Rafael, CA) were used to handle the fraction collection. The custom


Figure 1. Selection of the gradient slope for HILIC separation and fraction collection for 2D-LC MS/MS analysis of yeast digest. (A and B) HILIC separation of the BSA digest using 1 and 0.7% water per minute gradients (10 mM ammonium formate, pH 4.5), respectively. (C) HILIC separation of S. cerevisiae whole cell digest using a 0.7% gradient.

designed standard peptides P1−P6 (ref 27) were synthesized by BioSynthesis Inc. (Lewisville, TX). A tryptic digest of S. cerevisiae was prepared using the FASP protocol scaled up for 15 mL centrifugal filter units.28 The digest (∼1 mg of peptides) was acidified with trifluoroacetic acid, purified by reversed-phase C18 chromatography, aliquoted into vials with approximately 100 μg of digest in each (as measured by NanoDrop 2000, ThermoFisher Scientific), and finally lyophilized. A tryptic digest of bovine serum albumin was used for the initial selection of separation conditions (determining the optimal gradient slope). This digest was prepared using a standard in-solution digestion of a 2 mg/mL solution of BSA: reduction/alkylation with iodoacetamide (with quenching), followed by trypsin digestion. The resulting digest was purified by RPLC, aliquoted (50 μg each vial), and lyophilized.

First Dimension Separation Conditions. An Agilent 1100 series HPLC system with a UV detector (214 nm) and 50 μL injection loop was used for HILIC and RPLC separations. A 3 mm × 50 mm XBridge Amide 3.5 μm column (Waters, Milford, MA) was used with a 300 μL/min flow rate for HILIC separations. Both eluents, A (water) and B (9:1 acetonitrile/water), contained 10 mM ammonium formate (pH 4.5). These were prepared by a 1:10 dilution of 100 mM ammonium formate at pH 4.5 with water and acetonitrile, respectively. Optimized separation conditions to fit a ∼40 min separation window used a 0.7% per minute increase of water content (10 to 60%). The gradient program consisted of the following steps. Starting conditions: 100% eluent B (10% water). A linear decrease of B from 100 to 44.4% (60% water) occurred in 71.43 min. The linear portion of the gradient was followed by a 5 min wash with 90% eluent A and an equilibration step with 100% B. One-minute fractions were collected within the expected interval of peptide elution (8−46 min), lyophilized, and dissolved in 30 μL of buffer A for the second dimension.
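The gradient arithmetic in this paragraph can be checked with a short calculation. The sketch below assumes ideal volumetric mixing of eluent A (100% water) and eluent B (9:1 acetonitrile/water, that is, 10% water); the function names are illustrative and are not part of any instrument software.

```python
WATER_IN_A = 100.0  # eluent A is water (% v/v water)
WATER_IN_B = 10.0   # eluent B is 9:1 acetonitrile/water (% v/v water)


def water_percent(b_percent):
    """Water content (% v/v) of a mixture containing b_percent of eluent B."""
    a_percent = 100.0 - b_percent
    return (a_percent * WATER_IN_A + b_percent * WATER_IN_B) / 100.0


def b_percent_for_water(target_water):
    """Percentage of eluent B needed to reach a given water content."""
    return 100.0 * (WATER_IN_A - target_water) / (WATER_IN_A - WATER_IN_B)


def gradient_minutes(start_water, end_water, slope):
    """Duration of a linear gradient; slope is in % water per minute."""
    return (end_water - start_water) / slope


if __name__ == "__main__":
    # End point of the linear gradient (60% water) expressed as % eluent B.
    print(round(b_percent_for_water(60.0), 1))
    # Time needed to go from 10 to 60% water at 0.7% per minute.
    print(round(gradient_minutes(10.0, 60.0, 0.7), 2))
    # Water content during the wash step (90% eluent A, 10% eluent B).
    print(round(water_percent(10.0), 1))
```

Under these assumptions the calculation reproduces the 44.4% eluent B end point and the 71.43 min gradient duration stated in the text.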
Second Dimension LC-MS/MS. The 2D LC Ultra system (Eksigent, Dublin, CA) delivered buffers A and B through a 100 μm × 200 mm analytical column packed with 3 μm Luna C18(2) (Phenomenex, Torrance, CA) at a 500 nL/min flow rate. Approximately 1/3 of each collected fraction (10 μL) was spiked with standard P1−P6 peptides (∼200 fmol per injection) and was loaded on a 300 μm × 5 mm PepMap 100 trap-column (ThermoFisher). The gradient program consisted of the following steps: a linear increase from 0.4 to 31% buffer B (acetonitrile) in 77 min, 5 min at 80% B, and then
8 min at 0.4% B for column equilibration (90 min total analysis time). Both eluents A (water) and B (acetonitrile) contained 0.1% formic acid. Data-dependent acquisition using a TripleTOF5600 mass spectrometer (Sciex, Concord, ON) in standard MS/MS mode was used. The following settings were applied: 250 ms survey MS spectra (m/z 370−1500) followed by up to 20 MS/MS measurements on the most intense parent ions (300 counts/s threshold, +2 through +5 charge state, m/z 100−1500 mass range for MS/MS, 100 ms each). Previously targeted parent ions were excluded for 12 s from repetitive MS/MS acquisition. Data Analysis and Retention Time Assignment. Raw spectra files were converted to Mascot Generic Format files for protein/peptide identification by the X!Tandem algorithm. The following search parameters were applied: 20 and 50 ppm mass tolerance for parent and daughter ions, respectively; constant modification of Cys with iodoacetamide; a list of potential modifications including oxidation of Met and Trp; N-terminal cyclization at Gln and Cys; N-terminal acetylation; and deamidation (Asn, Gln). All nonmodified tryptic peptides with a log(e) < −1 confidence score were considered for modeling. All identifications with low confidence values −3 < log(e) < −1 were additionally filtered using retention time prediction. The existing SSRCalc retention prediction model for formic acid conditions, along with stored retention values (Hydrophobicity Index, HI) for yeast peptides from the SSRCalc database, and a first rough approximation of the HILIC model were used for peptide retention filtering in both dimensions. Retention times of peptides in the second dimension were assigned as the time of acquisition of the most intense MS/MS spectra and converted into HI (% acetonitrile) units using the established retention values of the standard peptides.27 Retention times in the first (HILIC) dimension were assigned as being equal to the fraction number in which this peptide was found. 
When the peptide signal was distributed between two or more fractions, an intensity weighted average fraction number was used.
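The confidence tiers and the first-dimension retention assignment described above can be sketched as follows. This is a minimal illustration, not the authors' software: the function names and the dictionary input (fraction number mapped to peptide signal intensity) are assumptions made for the example.

```python
def is_candidate(log_e):
    """Peptides with log(e) < -1 were considered for modeling."""
    return log_e < -1.0


def needs_retention_filter(log_e):
    """Low-confidence hits (-3 < log(e) < -1) were additionally filtered
    using retention time prediction."""
    return -3.0 < log_e < -1.0


def weighted_fraction_number(intensities):
    """Intensity-weighted average fraction number.

    intensities maps a fraction number to the peptide signal intensity
    observed in that fraction (hypothetical input layout).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(n * i for n, i in intensities.items()) / total


if __name__ == "__main__":
    # A peptide found mostly in fraction 12 with some carry-over into 13.
    print(weighted_fraction_number({12: 3.0e5, 13: 1.0e5}))
    print(is_candidate(-2.0), needs_retention_filter(-2.0))
```

A peptide detected in a single fraction simply receives that fraction number, since the weighted average of one value is the value itself.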



RESULTS AND DISCUSSION

Selection of Chromatographic Conditions (Gradient Slope) in HILIC Mode. A review of the literature showed that linear acetonitrile/water (10−50% water) gradients are typically used for HILIC separation of peptides,20−24 while the pH of the


Table 1. Identification Output of 2D (HILIC-RP) LC-MS/MS and 2D (RP-RP) LC-MS/MS for the Analysis of Whole Cell Yeast Tryptic Digest

separation mode   number of fractions   total LC-MS time (h)   amount injected (μg)   # of MS/MS   # of identified peptides   # of nonredundant peptide IDs   # of protein IDs
HILIC-RP          38                    57                     ∼30                    389917       207357                     44489                           4218
RP-RP(a)          20                    30                     ∼30                    226386       126705                     34621                           4093

(a) A standard 2D LC-MS/MS (high pH − low pH) with fraction concatenation applied in our lab.26

Figure 2. Workflow for optimization of the SSRCalc HILIC model.

prediction models. Application of synthetic peptides or digests of purified proteins with a known sequence is the preferable option, but it is time/cost prohibitive when larger data sets are required. Our experience shows that 2D LC-MS/MS analysis of complex digests with retention time prediction filtering in both dimensions represents a compromise between quality and size of the retention data set.26 The steps we apply for dealing with separations with novel selectivity in one of the separation dimensions are shown in Figure S-2. A total of 0.7% (297 peptides) of all identifications were excluded. The remaining population of 40 290 species was used for model optimization. Tryptic peptides in this data set were 6−51 residues long (14.1 on average).

Optimization and Major Features of Additive HILIC Model. Due to the specific chemical heteropolymeric nature of peptides, all peptide retention prediction models have major components based on the summation of retention coefficients (RC) of individual amino acids.29,30 These values are usually determined through linear multiple regression analysis with the goal of maximizing the correlation between predicted and observed retention values across all peptides. A 20-parameter model should be supported by a sufficiently large retention data set to avoid overfitting. We have found empirically that ∼100 peptides are needed to ensure a reliable assignment of retention coefficients in additive models, i.e., a ∼5:1 ratio of data points to parameters.25 As seen from our overview of the literature, most of the prior models barely meet this requirement. Some of the models introduce separate retention coefficients for terminal positions of the residues; this further increases the number of required data points. Additionally, all residues should be well represented in the training set sequences for confident assignment of RC; this can be achieved easily using synthetic peptides.
Using real protein digests predetermines the representation of each residue according to the natural abundance of amino acids. Therefore, specific care should be taken to ensure that a sufficient number of “rare” amino acids (Trp, Cys) are present.
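The additive approach described above, summing per-residue retention coefficients and fitting them by multiple linear regression, can be sketched in a few lines. This is a minimal illustration on synthetic data and not the SSRCalc HILIC implementation: it omits the peptide length correction and all position-dependent terms, the coefficient values are random, and every function name is an assumption made for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count of each of the 20 residues in a peptide sequence."""
    return [seq.count(aa) for aa in AMINO_ACIDS]


def predict(seq, rc):
    """Additive model: sum the retention coefficients of all residues."""
    return sum(rc[aa] for aa in seq)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_retention_coefficients(peptides, observed):
    """Least-squares fit through the normal equations (X^T X) rc = X^T y."""
    X = [composition(p) for p in peptides]
    k = len(AMINO_ACIDS)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, observed)) for i in range(k)]
    return dict(zip(AMINO_ACIDS, solve(xtx, xty)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "true" coefficients and peptides; no experimental data here.
    true_rc = {aa: rng.uniform(-2.0, 5.0) for aa in AMINO_ACIDS}
    peptides = ["".join(rng.choice(AMINO_ACIDS)
                        for _ in range(rng.randint(6, 20)))
                for _ in range(200)]
    observed = [predict(p, true_rc) for p in peptides]
    fitted = fit_retention_coefficients(peptides, observed)
    worst = max(abs(fitted[aa] - true_rc[aa]) for aa in AMINO_ACIDS)
    print("largest coefficient error:", worst)
```

With 200 noise-free synthetic peptides the fit should recover the generating coefficients almost exactly; real retention data carry measurement error and sequence-specific effects, which is where the data-set size requirements discussed in the text come in.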

eluent varies. Our preliminary experiments with separation of synthetic peptides (not shown here) with formic acid-based eluents showed significant peak tailing; these observations were supported by a number of previous publications.17,23 Subsequently, we selected 10 mM ammonium formate with a pH of 4.5 as an eluent additive. It showed good peak shape and reproducible separation over time. Another important parameter to consider is the gradient slope: it has to provide sufficient peptide separation to fit the expected ∼40 min (fractions) elution window. Figure 1 shows the optimization of the experimental gradient slope for HILIC separation using a BSA tryptic digest, chosen to represent a typical collection of tryptic peptides. HILIC separation with a 1% per minute increase of water starting with 10% showed that a majority of tryptic peptides from BSA elute between 10 and 30 min (Figure 1A). Thus, the gradient slope was adjusted to 0.7% per minute, which gave the desired length of the separation window (Figure 1B). Separation of the ∼100 μg of yeast tryptic digest is shown in Figure 1C. As expected, we did not observe well-separated high intensity peaks, as this mixture contains thousands of peptide species of greatly varying abundance.

LC-MS/MS Analysis in the Second Dimension: Identification Output. One third of each fraction (4 residues, 1 allowed missed cleavage) showed that all 441 664 peptides are amenable to separation in CZE,10 while 14.2% of peptides with HI < 0% acetonitrile will not be retained in the RPLC system with formic acid as an ion pairing modifier (Figure 4C). A similar calculation for HILIC showed that only 0.27% of tryptic peptides have HII < 10% water and will not retain under the separation conditions used.

conformation and makes backbone polar groups accessible for interaction with the hydrophilic stationary phase. Clusters of hydrophobic residues lead to higher retention, too; this effect is opposite to what is observed in RPLC. We also observed that peptides with positive prediction errors often contained a large number of basic amino acids. This can be explained by their additional electrostatic interaction with the residual silanol groups of the modified silica surface.

Introducing Helical Features into SSRCalc Modeling. Introducing helical features into SSRCalc modeling was approached in a simple format. The precise incorporation of helicity into prediction models is extremely challenging and has yet to be completely addressed in RPLC mode, which has been studied in much greater detail compared to HILIC. We applied simple corrections using an additive approach, optimizing weighting variables with the goal of increasing the predicted versus observed correlation across the training data set. For example, the corrective algorithm counted the number of NP, GP, SP, TP, and DP motifs in the sequence and added their respective weighting values to the overall peptide hydrophilicity (a negative contribution in this case). Similarly, corrections were introduced for clusters of hydrophobic residues, peptide charge, and multiple Pro and Gly instances. The resulting R2 correlation of the model improved slightly from 0.976 to ∼0.980 (Figures 2 and 4A).

Expression of Peptide Hydrophilicity in HILIC Separations. All steps of model optimization were performed using unitless hydrophilicity retention values (Figure 2). Practical implementation of retention prediction in HILIC would benefit from expression in a more “user-friendly” format.
Similar to RP-HPLC, which often uses acetonitrile percentage as an indication of peptide hydrophobicity,27,30 we propose to use the water percentage at which a peptide elutes from the HILIC column as a measure of peptide hydrophilicity: the Hydrophilicity Index (HII). Retention times (fraction #) were converted into water % using the known delay time of this LC system (3.5 min at 300 μL/min) and the experimental gradient slope. The unitless output of the predictive model was converted into HII units by introducing mapping slope and intercept values into the final reporting calculation (Figure 2).

Peptide Retention Prediction Filtering in 2D LC (HILIC-RP) Systems. Figure 4 shows application of peptide retention filtering in both dimensions to the whole data set of
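The motif-counting correction and the conversion to HII units described in the two preceding subsections can be sketched as follows. This is an illustration under stated assumptions, not the SSRCalc code: the motif weights are placeholders (the text gives only their sign), the water percentage is assumed to follow the programmed gradient shifted by the system delay time, and the mapping slope and intercept are left as inputs because their values are not given here.

```python
CORRECTION_MOTIFS = ("NP", "GP", "SP", "TP", "DP")

# Hypothetical weights: the text states only that the contribution is
# negative; the optimized values are not reproduced here.
ASSUMED_MOTIF_WEIGHTS = {motif: -0.5 for motif in CORRECTION_MOTIFS}


def count_motifs(sequence):
    """Number of occurrences of each two-residue motif in the sequence."""
    return {motif: sequence.count(motif) for motif in CORRECTION_MOTIFS}


def motif_correction(sequence, weights=ASSUMED_MOTIF_WEIGHTS):
    """Additive correction to the unitless peptide hydrophilicity."""
    counts = count_motifs(sequence)
    return sum(weights[motif] * counts[motif] for motif in CORRECTION_MOTIFS)


def water_percent_at_elution(retention_min, delay_min=3.5,
                             slope=0.7, start_water=10.0):
    """Water % at elution, assuming the programmed gradient (start_water,
    slope in % water per minute) reaches the peptide delay_min later."""
    return start_water + slope * (retention_min - delay_min)


def to_hii(unitless_score, mapping_slope, mapping_intercept):
    """Linear mapping of the unitless model output into HII (% water)."""
    return mapping_slope * unitless_score + mapping_intercept


if __name__ == "__main__":
    print(count_motifs("ANPGSPK"))
    print(motif_correction("ANPGSPK"))
    print(water_percent_at_elution(13.5))
```

Because each motif consists of two different residues, occurrences cannot overlap and a plain substring count is sufficient.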



CONCLUSIONS

High-content proteomic analysis has allowed us to access the retention characteristics of ∼40 000 tryptic peptides in HILIC using 58 h of instrument LC-MS time, to build a peptide retention prediction model (SSRCalc HILIC), and to characterize the major additive and sequence-specific features driving the separation mechanism. Limiting the size of collected fractions (38 1-min fractions) led to some uncertainty in assigning retention times (≤30 s). However, this still was sufficient for correct delineation of major retention trends: the average absolute retention prediction error of the final model was 0.57 min. The optimization data set contained longer peptides (14.1 residues on average) and was more than 2 orders of magnitude larger than that used for all previously reported HILIC models. We applied our proven semiempirical approach to model optimization, using previously acquired knowledge about peptide secondary structure effects in RPLC. This led to the development of the most accurate prediction model for separation of tryptic peptides reported to date (among both RPLC and HILIC). Higher predictive accuracy in HILIC was demonstrated previously only on a collection of peptides two to four amino acids long.23


(2) Krokhin, O. V. Anal. Chem. 2006, 78, 7785−7795. (3) Petritis, K.; Kangas, L. J.; Yan, B.; Monroe, M. E.; Strittmatter, E. F.; Qian, W. J.; Adkins, J. N.; Moore, R. J.; Xu, Y.; Lipton, M. S.; Camp, D. G., 2nd; Smith, R. D. Anal. Chem. 2006, 78, 5026−5039. (4) Moruz, L.; Tomazela, D.; Kall, L. J. Proteome Res. 2010, 9, 5209− 5216. (5) Strittmatter, E. F.; Kangas, L. J.; Petritis, K.; Mottaz, H. M.; Anderson, G. A.; Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Smith, R. D. J. Proteome Res. 2004, 3, 760−769. (6) Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Mol. Syst. Biol. 2008, 4, 222. (7) Spicer, V.; Ezzati, P.; Neustaeter, H.; Beavis, R.; Wilkins, J. A.; Krokhin, O. V. Anal. Chem. 2016, 88, 2847−2855. (8) Krokhin, O. V.; Craig, R.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. Mol. Cell. Proteomics 2004, 3, 908−919. (9) Spicer, V.; Lao, Y. W.; Shamshurin, D.; Ezzati, P.; Wilkins, J. A.; Krokhin, O. V. Anal. Chem. 2014, 86, 11498−11502. (10) Krokhin, O. V.; Anderson, G.; Spicer, V.; Sun, L.; Dovichi, N. J. Anal. Chem. 2017, 89, 2000−2008. (11) Boersema, P. J.; Mohammed, S.; Heck, A. J. Anal. Bioanal. Chem. 2008, 391, 151−159. (12) Di Palma, S.; Hennrich, M. L.; Heck, A. J.; Mohammed, S. J. Proteomics 2012, 75, 3791−3813. (13) Buszewski, B.; Noga, S. Anal. Bioanal. Chem. 2012, 402, 231− 247. (14) Linden, J. C.; Lawhead, C. L. J. Chromatogr. 1975, 105, 125− 133. (15) Alpert, A. J. J. Chromatogr. 1990, 499, 177−196. (16) Jandera, P. Anal. Chim. Acta 2011, 692, 1−25. (17) Yang, Y.; Boysen, R. I.; Hearn, M. T. W. J. Chromatogr. A 2009, 1216, 5518−5524. (18) Simon, R.; Enjalbert, Q.; Biarc, J.; Lemoine, J.; Salvador, A. J. Chromatogr. A 2012, 1264, 31−39. (19) Gilar, M.; Olivova, P.; Daly, A. E.; Gebler, J. C. Anal. Chem. 2005, 77, 6426−6434. (20) Yoshida, T. J. Chromatogr. A 1998, 808, 105−112. (21) Gilar, M.; Jaworski, A. J. Chromatogr. A 2011, 1218, 8890−8896. 
(22) Harscoat-Schiavo, C.; Nioi, C.; Ronat-Heit, E.; Paris, C.; Vanderesse, R.; Fournier, F.; Marc, I. Anal. Bioanal. Chem. 2012, 403, 1939−1949. (23) Le Maux, S.; Nongonierma, A. B.; FitzGerald, R. J. Food Chem. 2015, 173, 847−854. (24) Badgett, M. J.; Boyes, B.; Orlando, R. Chromatogr. Today 2015, 39−42. (25) Shamshurin, D.; Spicer, V.; Krokhin, O. V. J. Chromatogr. A 2011, 1218, 6348−6355. (26) Dwivedi, R. C.; Spicer, V.; Harder, M.; Antonovici, M.; Ens, W.; Standing, K. G.; Wilkins, J. A.; Krokhin, O. V. Anal. Chem. 2008, 80, 7036−7042. (27) Krokhin, O. V.; Spicer, V. Anal. Chem. 2009, 81, 9522−9530. (28) Wisniewski, J. R.; Zougman, A.; Nagaraj, N.; Mann, M. Nat. Methods 2009, 6, 359−362. (29) Meek, J. L. Proc. Natl. Acad. Sci. U. S. A. 1980, 77, 1632−1636. (30) Guo, D.; Mant, C. T.; Taneja, A. K.; Parker, J. M. R.; Hodges, R. S. J. Chromatogr. 1986, 359, 499−518. (31) Alpert, A. J.; Petritis, K.; Kangas, L.; Smith, R. D.; Mechtler, K.; Mitulovic, G.; Mohammed, S.; Heck, A. J. Anal. Chem. 2010, 82, 5253−5259. (32) Lacroix, E.; Viguera, A. R.; Serrano, L. J. Mol. Biol. 1998, 284, 173−191. (33) Richardson, J. S.; Richardson, D. C. Science 1988, 240, 1648− 1652. (34) Mant, C. T.; Litowski, J. R.; Hodges, R. S. J. Chromatogr. A 1998, 816, 65−78.

The accuracy of current SSRCalc models for complex mixtures of tryptic peptides decreases in the order: CZE (0.995 R2 value) > HILIC (0.98) > RPLC (∼0.965). We attribute this to the significantly more complicated mechanisms in sorbent-based separation techniques compared to CZE. We found that the contributions of peptide helicity and interactions with counterions are smaller in HILIC when compared to RPLC. This resulted in higher prediction accuracy for SSRCalc HILIC, seen even in a first attempt to optimize the HILIC model. RPLC systems are the most studied; however, the profound effects of amphipathic helicity on peptide retention still hamper efforts of separation scientists to build highly accurate predictive models. Overall, HILIC demonstrates opposite retention trends compared to RPLC. This is true for the retention coefficients of individual amino acids, peptide helicity, presence of hydrophobic clusters within the peptide sequence, etc. At the same time, our observations indicate that the mechanisms of interaction of helical peptides with hydrophobic (RP) and hydrophilic (HILIC) chromatographic supports are fundamentally different. Peptide amphipathic helicity and hydrophobic interactions of amino acid side chains with C18 sorbent dominate in RPLC. In HILIC, accessibility of hydrophilic amide and carbonyl groups on the peptide backbone plays an important role. Helical structures are stabilized by CO···HN hydrogen bonds, thus excluding them from possible interactions with HILIC sorbent in helical peptides, reducing peptide retention. The drastic difference in the polarity of the mobile phase is another possible factor driving the effects of peptide helicity in RPLC and HILIC, and it still has to be explored in greater detail.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.7b00537. (Figure S-1) Distribution of number of detectable features across fractions for 2D LC-MS acquisitions; (Figure S-2) workflow for retention data filtering and optimization of the model; (Table S-1) position dependent retention coefficients; (Table S-2) axial helical projections for peptides in Table 2 (PDF)



AUTHOR INFORMATION

Corresponding Author

*Fax: (204) 480 1362. E-mail: [email protected].

ORCID

Oleg V. Krokhin: 0000-0002-9989-6593

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-201605963; O.V.K.). The authors thank Dr. M. Gilar for providing HILIC columns. The authors also thank Dr. D. Court and S. Shuvo for providing S. cerevisiae samples.



REFERENCES

(1) Bensimon, A.; Heck, A. J.; Aebersold, R. Annu. Rev. Biochem. 2012, 81, 379−405.