Anal. Chem. 1993, 65, 3701-3707
CORRESPONDENCE
Dependence of Retention and Separation Quality in Planar Chromatography on Properties of the Mobile Phase
David Nurok,† Robert M. Kleyle,‡ Kenneth B. Lipkowitz,† Steven S. Myers,† and Michelle L. Kearns†
School of Science, Indiana University-Purdue University at Indianapolis, 402 North Blackford Street, Indianapolis, Indiana 46202
INTRODUCTION
There is a large body of literature relating retention, in both gas and liquid chromatography, to the properties of the solutes being separated, and a rather smaller number of references relating retention to the properties of the stationary phase.¹ With the exception of a few studies in which retention is related to the properties of the mobile phase using theoretical considerations,²⁻⁴ most studies involving the mobile phase have been limited to the variation of retention with the percentage composition of a mobile phase consisting of two or more preselected components⁵ or to the variation of retention with mobile-phase strength, defined by use of a chromatographic strength parameter such as ε° or P′ or a solvatochromic parameter such as E_T(30).⁷ Retention is usually expressed as the capacity factor, the retention index, or, in planar chromatography, as Rf. The relationship between retention and solute properties is well reviewed in the book by Kaliszan,¹ which contains a table listing 44 references that use solute descriptors such as dipole moment, molar refractivity, length-to-breadth ratio, quantum chemical indexes, or topological indexes. Some reports consider only a very limited number of descriptors, whereas in others the number of potential descriptors is very large, as in the report by Stanton and Jurs⁸ in which the gas chromatographic retention index for 107 substituted pyrazines is related by a suitable regression equation to six molecular structure descriptors selected on statistical criteria from 85 candidate descriptors. There are also a number of reports that describe retention in either gas⁹ or liquid¹⁰ chromatography via a regression model in which the descriptors are selected on the basis of linear solvation energy relationships.
Very high values of the multiple correlation coefficient are listed in many of these reports, including those in refs 8-10.

† Department of Chemistry.
‡ Department of Mathematical Sciences.
(1) Kaliszan, R. Quantitative Structure-Chromatographic Retention Relationships; John Wiley and Sons: New York, 1987.
(2) Scott, R. P. W. J. Chromatogr. 1976, 122, 35.
(3) Horvath, C.; Melander, W.; Molnar, J. J. Chromatogr. 1976, 125, 129.
(4) Snyder, L. R. Principles of Adsorption Chromatography; Marcel Dekker: New York, 1968.
(5) Schoenmakers, P. J. Optimization of Chromatographic Selectivity; Elsevier: Amsterdam, 1986.
(6) Snyder, L. R. J. Chromatogr. Sci. 1978, 16, 223-234.
(7) Johnson, B. P.; Khaledi, M. G.; Dorsey, J. G. Anal. Chem. 1986, 58, 2354-2365.
(8) Stanton, T. S.; Jurs, P. C. Anal. Chem. 1989, 61, 1328-1332.
(9) Poole, C. F.; Kollie, T. O.; Poole, S. K. Chromatographia 1992, 34, 216-302.
(10) Sadek, P. C.; Carr, P. W.; Doherty, R. M.; Kamlet, M. J.; Taft, R. W.; Abraham, M. H. Anal. Chem. 1985, 57, 2971-2978.

As noted earlier in this report, most studies relating retention to the properties of the mobile phase in liquid chromatography are restricted to descriptors such as the percentage composition of the mobile phase or its strength as defined by a parameter such as ε°. There do not appear to be any reports of a regression equation relating retention to molecular descriptors such as dipole moment and surface area of a mobile-phase component. In the discussion below it is shown that retention in planar chromatography, as defined by average Rf, is related to the density, dipole moment, molar volume, and surface area of the variable solvent in a series of binary mobile phases in which the common component is a strong solvent of fixed concentration. Such a series, with a variable weak solvent, spans a large range of mobile-phase strength.¹¹

There are several established approaches in both gas and liquid chromatography (see ref 5 for a general review of the topic and ref 12 for a review with respect to planar chromatography) to optimizing separation quality using a suitable separation metric. An example would be the use of the COF¹³ as the dependent variable and the percentage composition of a binary or ternary phase as the independent variable. In its simplest form the COF is
$$\mathrm{COF} = \sum_{i=1}^{n} \ln \frac{R_i}{R_d} \qquad (1)$$

where R_i is the resolution between an adjacent pair of peaks, R_d is the desired resolution, and n is the number of peak pairs of interest. Any pair of peaks having a lesser resolution than that desired contributes a negative increment to the COF. The metric has a value of zero when all peak pairs have the desired resolution. There are, however, no published reports of a separation metric being related to the physical properties of the mobile phase. Such a relationship is described in this communication.
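As an illustration of eq 1, the following minimal Python sketch evaluates the simplest-form COF for a list of resolutions. The function name `cof` and its arguments are hypothetical and are not taken from the original work; the sketch assumes all resolutions are positive.

```python
import math

def cof(resolutions, desired_resolution):
    """Simplest-form COF (eq 1): sum of ln(R_i / R_d) over adjacent peak pairs."""
    if desired_resolution <= 0 or any(r <= 0 for r in resolutions):
        raise ValueError("resolutions must be positive")
    # Each pair below the desired resolution contributes a negative increment.
    return sum(math.log(r / desired_resolution) for r in resolutions)
```

With every pair at the desired resolution the sum is zero, and any pair below it pulls the value negative, which matches the behavior described above.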
(11) Nurok, D.; Julian, L. A.; Uhegbu, C. E. Anal. Chem. 1991, 63, 1524-1529.
(12) Nurok, D. Chem. Rev. 1989, 89, 363-375.
(13) Glajch, J. L.; Kirkland, J. J.; Squire, K. M.; Minor, J. M. J. Chromatogr. 1980, 199, 57-79.
(14) Knapp, D. R.; Krueger, S. Anal. Lett. 1975, 8, 603-610.
© 1993 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 65, NO. 24, DECEMBER 15, 1993

EXPERIMENTAL SECTION
The following experimental work was performed to obtain data for the computer-simulated separations that are used in the regression studies. The steroids and dansyl amino acids were purchased from the Sigma Chemical Co. (St. Louis, MO), and the solvents were purchased from the Aldrich Chemical Co. (Milwaukee, WI). The reagents for preparing the esters of dansyl amino acids were a gift from Pierce (Rockford, IL), and the esters were prepared according to the procedure in the 1989 Pierce Handbook, which is based on ref 14. A single spot was obtained for all derivatives except those of serine and threonine, each of which yielded two spots. The spot with the lower Rf was used for these two derivatives. K5 silica gel TLC plates, Catalog No. 4850-820, were a gift from Whatman Inc. (Clifton, NJ). The plates were heated at 90 °C for 30 min and then maintained at a relative humidity of 60% until immediately before use. Chromatography was performed in a twin trough chamber (Camag Scientific Inc., Wilmington, NC) using plates cut in 10 × 10 cm sections, with a 15-min conditioning period after adding solvent. The solvent path length used was 7.3 cm. Visualization was in the fluorescence mode for both classes of solute, after removal of mobile phase from the TLC plate. In the case of the steroids, the plate was first sprayed with 10% sulfuric acid and then heated at 110 °C for 10 min before visualization. An alternative procedure was used for visualizing steroids in the separations with 2-butanone as constant strong solvent: the plate was dipped into a 17% aqueous sulfuric acid solution for about 2 s and then heated at 110 °C for 10 min. The mobile phases used were binary mixtures of a strong and a weak solvent. For each solute in each mobile phase, Rf values were measured at five mole fractions of the strong solvent in order to determine the regression constants in eq 2 below. It was possible to perform simulated separations once these constants were determined.
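The determination of the regression constants can be sketched as an ordinary least-squares fit of log k against log X_s (the linear form of eq 2), with each measured Rf first converted to a capacity factor by rearranging eq 3 to k = (1 - Rf)/Rf. The function name `fit_log_k` is hypothetical, and base-10 logarithms are an assumption; the original does not state how the fit was computed.

```python
import math

def fit_log_k(mole_fractions, rf_values):
    """Least-squares fit of log10(k) = a*log10(X_s) + b; returns (a, b)."""
    xs = [math.log10(x) for x in mole_fractions]
    # Rearranged eq 3: k = (1 - Rf) / Rf
    ys = [math.log10((1.0 - rf) / rf) for rf in rf_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b
```

In use, the five mole fractions and the five measured Rf values for one solute in one mobile phase would be passed in, and the returned pair (a, b) would then drive the simulated separations.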
CHROMATOGRAPHIC RELATIONSHIPS
Soczewinski has reported,¹⁵ in a somewhat different format, the following linear relationship:

$$\log k_i = a_i \log X_s + b_i \qquad (2)$$

where k_i is the capacity factor of solute i, X_s is the mole fraction of the strong solvent in a binary mixture of a strong and a weak solvent, and a_i and b_i are experimentally determined constants for each solute. The relationship between Rf and capacity factor is given by

$$R_f = \frac{1}{1 + k} \qquad (3)$$

The difference between two Rf values, ΔRf, is

$$\Delta R_f = R_{f,1} - R_{f,2} \qquad (4)$$

and may be defined in terms of the corresponding capacity factors k_1 and k_2:

$$\Delta R_f = \frac{1}{1 + k_1} - \frac{1}{1 + k_2} = \frac{k_2 - k_1}{(1 + k_1)(1 + k_2)} \qquad (5)$$

S_D, the distance between two spot centers, is

$$S_D = \Delta R_f \, L \qquad (6)$$
where L is the mobile-phase path length. The value of L was set at 7.3 cm in this paper to correspond to the path length used for the experimental determination of the regression constants in eq 2.

(15) Soczewinski, E. Anal. Chem. 1969, 41, 179-182.
(16) Habibi-Goudarzi, S.; Ruterbories, K. J.; Steinbrunner, J. E.; Nurok, D. J. Planar Chromatogr. 1988, 1, 161-167.
(17) Nurok, D.; Habibi-Goudarzi, S.; Kleyle, R. M. Anal. Chem. 1987, 59, 2424-2428.
(18) Nurok, D.; Knotts, K. D.; Kearns, M. L.; Ruterbories, K. J.; Uhegbu, C. E.; Alberti, P. C. J. Planar Chromatogr. 1992, 5, 350-358.
(19) Gonnord, M.-F.; Levi, F. J.; Guiochon, G. J. Chromatogr. 1983, 264, 1-6.

The IDF was arbitrarily selected as the metric for separation quality in this study. It has been used in various optimization studies in planar chromatography¹⁶⁻¹⁸ and is similar in form to a metric introduced by Gonnord and co-authors¹⁹ for a similar purpose. The metric is defined as

$$\mathrm{IDF} = \sum_{i=1}^{n} \frac{1}{S_{D_i}} \qquad (7)$$

where n is the number of neighboring solute pairs and S_D is the distance between neighboring spot centers. Spots that are separated by less than 1.0 mm are assigned an S_D of 1.0 mm in order to prevent the IDF from being overly weighted by the most poorly separated spots.
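A simulated separation along these lines can be sketched in a few lines of Python. The function names below are hypothetical. Expressing S_D in millimeters is an assumption (the text gives the 1.0 mm floor and the 7.3 cm path length, but not the unit used inside the sum), as is the use of base-10 logarithms in eq 2.

```python
import math

def predicted_rf(a, b, x_s):
    """Rf of one solute at strong-solvent mole fraction x_s (eqs 2 and 3)."""
    k = 10.0 ** (a * math.log10(x_s) + b)
    return 1.0 / (1.0 + k)

def average_rf(rf_values):
    """Mean Rf over all solutes in the mixture."""
    return sum(rf_values) / len(rf_values)

def idf(rf_values, path_length_mm=73.0, floor_mm=1.0):
    """Sum of 1/S_D over neighboring spot pairs (eqs 6 and 7), S_D in mm."""
    positions = sorted(rf * path_length_mm for rf in rf_values)
    gaps = [hi - lo for lo, hi in zip(positions, positions[1:])]
    # Gaps below the floor are set to the floor so that the most poorly
    # separated pairs do not dominate the sum.
    return sum(1.0 / max(gap, floor_mm) for gap in gaps)
```

Given fitted constants (a_i, b_i) for each solute, `predicted_rf` yields the Rf values at a chosen mole fraction, from which `average_rf` and `idf` give the two dependent variables used in the regressions.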
COMPUTATION OF SOLVENT AREAS
The surface area of each solvent was obtained by a computational approach, because there is no simple experimental method for determining this parameter. A variety of different types of molecular surfaces have been defined,²⁰ especially for quantitative structure-activity relationships (QSAR),²¹ and the solvent-accessible surface area was used in the current study. This area has been used in a variety of reports including the determination of protein folding and the prediction of solubility of drug molecules.²² The solvent-accessible area was first defined by Lee and Richards as the locus of the center of a solvent "sphere" which is rolled over the van der Waals surface of the solute.²³,²⁴ For the current study, solvent-accessible surface areas were calculated using the method and original parameters of Lee and Richards,²³ with the implementation of a grid spacing of 0.1 Å. Cartesian coordinates of solvent molecules needed for these calculations were determined by the MM2 force field²⁵ or, if parameters were not available, with MOPAC²⁶ using the AM1 Hamiltonian. It is common practice to partition the surface area into polar and nonpolar categories in a manner that is somewhat arbitrary and subjective. The nonpolar category is further divided into the accessible surface area contributed by saturated atoms and that contributed by unsaturated atoms such as alkene, alkyne, and aromatic carbons. Carbon, hydrogen, and also halogen atoms are treated as nonpolar. Monovalent atoms attached to the unsaturated atoms of benzene are classified as saturated in the algorithm used.²³ Hence, as an example, benzene has a component of saturated (hydrogen) and unsaturated (carbon) atoms contributing to the total nonpolar surface area.
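The idea of a solvent-accessible area can be illustrated with a short numerical sketch. Note that this is not the Lee and Richards slicing algorithm used in the study: it is a point-sampling estimate in the style of Shrake and Rupley, swapped in here because it is compact. The probe radius of 1.4 Å, the atomic radii, and all function names are assumptions made for illustration only.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points

def accessible_areas(atoms, probe=1.4, n_points=960):
    """Per-atom accessible area; atoms is a list of (x, y, z, vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the sphere traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A sample point is buried if it lies inside any other atom's
            # probe-expanded sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

Saturated and unsaturated contributions would then be obtained by summing the per-atom areas over the atoms assigned to each class, following the classification rules described above.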
(20) Mezey, P. In Reviews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B., Eds.; VCH Inc.: New York, 1990; Chapter 7.
(21) Pearlman, R. S. In Physical Chemical Properties of Drugs; Yalkowsky, S. H., Sinkula, A. A., Valvani, S. C., Eds.; Medicinal Research Series 10; Marcel Dekker: New York, 1980; Chapter 10.
(22) Camilleri, P.; Watts, S. A.; Boraston, J. A. J. Chem. Soc., Perkin Trans. 2 1988, 1699-1707, and references therein.
(23) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379-400.
(24) Richards, F. M. Methods Enzymol. 1986, 115, 440-464.
(25) Allinger, N. L. J. Am. Chem. Soc. 1977, 99, 8127-8134.
(26) Stewart, J. J. P. QCPE Bull. 1989, 9, 98.

MULTIPLE LINEAR REGRESSION
Models were constructed using the stepwise rather than the conventional procedure to ensure that all terms included are statistically significant. This approach, which may be performed in the forward or the backward sense, allows greater confidence in the relationships between the independent variables and the dependent variable. (Vide infra for the reasons for using the backward version in this study.) The backward version of stepwise regression begins by performing a complete regression including all of the independent variables. The regression coefficient for each term is then tested for statistical significance at some specified significance level. The 0.15 level was used in this study because of the
Table I. Steroids

steroid                                        footnotes
17α-acetoxyprogesterone                        c
Δ¹-adrenosterone                               a, b
5α-androstane-3,17-dione                       b
4-androstene-3,17-dione                        ?, c
5-androstene-3β,17β-diol                       ?
17α-methyl-Δ⁵-androstene-3β,17β-diol           b
androsterone                                   c
cholesterol                                    a, b
corticosterone                                 a, b
cortisone                                      a, ?
1-dehydrotestosterone                          a, b
11-deoxycortisol                               a, c
diethylstilbestrol                             b, d
epiandrosterone                                b
β-estradiol                                    a, c
β-estradiol 3-methyl ether                     ?
estrone                                        a, ?
ethisterone                                    b
17α-ethynylestradiol 3-methyl ether            ?
hydrocortisone                                 a, ?
prednisone                                     b
5α-pregnane-3β,20β-diol                        b
pregnenolone                                   a, b
progesterone                                   a, b
spironolactone                                 c
stigmasterol                                   b
testosterone                                   a, c

ᵃ Used with 2-butanone as strong solvent. ᵇ Used with ethyl acetate as strong solvent. ᶜ Used with tetrahydrofuran as strong solvent. ᵈ Nonsteroidal hormone; included inadvertently. A "?" marks a footnote letter that is illegible in the source.

Table II. Dansyl Amino Acidsᵃ

α-amino-n-butyric acid        norvaline
γ-amino-n-butyric acid        phenylalanine
aspartic acid                 sarcosine
glutamic acid                 serine
glycine                       threonine
leucine                       tryptophan
methionine                    valine
norleucine

ᵃ Used as p-nitrobenzyl esters.

Table III. Physical Properties of the Weak Solvents

solvent                          footnotes     density, g/mL   molar vol, mL   dipole moment, D   satd area, Å²   unsatd area, Å²
bromobenzene                     ?             1.495           105.0           1.70               98.0            43.0
carbon tetrachloride             a, ?          1.594           96.5            0.00               111.4           0.0
chlorobenzene                    ?             1.108           101.8           1.69               88.2            40.5
1-chlorobutane                   ?             0.886           104.5           2.05               142.9           0.0
chloroform                       ?             1.483           80.5            1.01               72.1            0.0
2-chlorotoluene                  ?             1.083           116.9           1.56               114.2           39.2
cumene                           ?             0.862           139.5                              137.9           40.4
cyclohexane                      ?             0.779           108.1           0.00               146.9           0.0
cyclopentane                     a             0.746           94.1            0.00               131.1           0.0
decalin                          ?             0.883           154.2           f                  200.8ᵍ          0.0
n-decane                         ?             0.730           194.9           0.00               254.8           0.0
n-dodecane                       b             0.749           227.5           0.00               299.5           0.0
ethylbenzene                     ?             0.867           122.5           0.59               120.4           39.7
fluorobenzene                    a             1.023           94.0            1.60               76.8            46.5
n-hexane                         ?             0.660           130.5           0.00               168.8           0.0
isooctane                        ?             0.692           165.1           f                  208.1           0.0
methylene chloride               ?, b, c, e    1.327           64.0            1.60               87.6            0.0
α-methylstyrene                  ?             0.911           130.1                              109.5           61.2
n-octane                         b, c          0.703           162.6           0.00               213.3           0.0
n-pentane                        ?             0.626           115.2           0.00               148.8           0.0
tetrachloroethylene              a             1.623           102.2           0.00               124.5           8.4
toluene                          a, ?          0.867           106.3           0.36               96.6            41.1
1,2,4-trichlorobenzene           ?             1.454           124.8                              126.7           34.7
1,1,1-trichloroethane            a             1.339           99.6            1.78               132.0           0.0
trichloroethylene                a             1.464           89.7                               106.3           11.8
1,1,2-trichlorotrifluoroethane   ?             1.564           138.0                              145.1           0.0
p-xylene                         a             0.861           123.3           0.00               123.2           39.5
exploratory nature of the analysis and because the sample sizes (i.e., the number of weak solvents) are relatively small. If all terms in the model are significant at the 0.15 level, the full model is selected as in ordinary multiple regression. Otherwise the term with the least significant regression coefficient (i.e., the largest p-value) is deleted, and the regression is recomputed. Terms are deleted from the model in this manner until all the remaining terms are significant at the 0.15 level or until all terms except the intercept have been eliminated (i.e., the model collapses). Only one term is eliminated at each step in the procedure even though more than one term may fail to meet the significance criterion. This is because when the regression is recomputed in the next step, a term which was not significant in the previous step due to its relationship with the deleted term may now be significant. In this study the backward procedure was found superior to ordinary (i.e., forward) stepwise regression, which builds the model one term at a time. With ordinary stepwise regression it was found that, for several of the systems, none of the individual terms correlate highly enough with the dependent variable for the procedure to get started. For the majority of the systems, backward stepwise regression produced viable results with decent, and in some instances very good, R².
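The backward elimination loop described above can be sketched in plain Python as follows. This is only an illustration, not the software used in the study. All function names are hypothetical, and the two-sided p-values use a normal approximation to the t distribution (via `math.erfc`), which is an assumption made to keep the sketch dependency-free; for the small samples in this study an exact t distribution would give somewhat larger p-values.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Fit y = b0 + sum(b_j * x_j); return (coefficients, approximate p-values)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    pvals = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        var_j = sigma2 * solve(xtx, unit)[j]      # j-th diagonal of the covariance
        t = beta[j] / math.sqrt(var_j) if var_j > 0 else math.inf
        pvals.append(math.erfc(abs(t) / math.sqrt(2.0)))  # normal approximation
    return beta, pvals

def backward_stepwise(predictors, y, alpha=0.15):
    """predictors maps a name to a list of values; returns the retained terms."""
    kept = list(predictors)
    while kept:
        beta, pvals = ols([predictors[name] for name in kept], y)
        # Index 0 is the intercept, which is never a candidate for deletion.
        worst = max(range(len(kept)), key=lambda j: pvals[j + 1])
        if pvals[worst + 1] <= alpha:
            return dict(zip(["intercept"] + kept, beta))
        del kept[worst]  # drop only the single least significant term
    return {"intercept": sum(y) / len(y)}  # the model has collapsed
```

Deleting one term per pass, rather than all non-significant terms at once, mirrors the reasoning above: a term that fails the criterion in one pass may become significant after a related term has been removed.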
Footnotes to Table III: ᵃ Used as a binary with ethyl acetate for separating the p-nitrobenzyl esters of dansyl amino acids. ᵇ Used as a binary with ethyl acetate for separating steroids. ᶜ Used as a binary with tetrahydrofuran for separating steroids. ᵈ Used as a binary with 2-butanone for separating p-nitrobenzyl esters of dansyl amino acids. ᵉ Used as a binary with 2-butanone for separating steroids. ᶠ Decalin and isooctane were assigned a dipole moment of 0.0 D in all models containing 15 or fewer weak solvents. ᵍ Average of areas for cis- and trans-decalin.

RESULTS AND DISCUSSION
The mobile phases discussed in this report consist of a fixed concentration of either 2-butanone, ethyl acetate, or tetrahydrofuran in a binary mixture with each of a series of different weak solvents. Each of these sets of mobile phases is referred to as a constant strong solvent series. The values of the average Rf and of the IDF were calculated for the separation on silica gel of mixtures of the steroids listed in Table I or the p-nitrobenzyl esters of the 15 dansyl amino acids listed in Table II. The physical properties of the weak solvents are listed in Table III. The regression analyses reported in this paper are based on simulated separations. Equation 2 was employed for the prediction of capacity factor by interpolation over the range of mole fractions used for determining the regression coefficients. As this was an exploratory investigation, extrapolation beyond this mole fraction range was also used for several of the systems where data were not available at either of the mole fractions considered. The Rf values were determined using eqs 2 and 3, and the IDF values were determined using eqs 2 and 5-7. Most of the chromatographic data for this report were available from previous studies. Additional data were generated by separating the p-nitrobenzyl esters of the dansyl amino acids in a series of 25 constant ethyl acetate mobile phases, 10 of which were the same mobile phases as used in the preliminary study. Dipole moment is an important descriptor (vide infra). Literature values of the dipole moment were available for 18 of these solvents.²⁷ The average Rf and IDF were calculated for each system at strong solvent mole fractions of 0.1 and 0.3. Regression analyses were performed with either average Rf or the IDF as dependent variable and the following properties of the weak solvent as independent variables: density, dipole moment, molar volume, saturated surface area, and unsaturated surface area. These variables were selected because they are readily available, either from published tables or by computation. It is likely that other descriptors would yield equivalent or better regression models. Both linear and second-order models were considered.

(27) Nelson, R. D., Jr.; Lide, D. R., Jr.; Maryott, A. A. Selected Values of Electric Dipole Moments for Molecules in the Gas Phase; NSRDS-NBS 10, 1967 (reprinted by the National Technical Information Service).

The data available were used to construct a linear model, in which all the
Table IV. Linear Regression with Average Rf as Dependent Variable
constant solvents ethyl acetate (first replicate) ethyl acetate (second replicate) ethyl acetate (first replicate) ethyl acetate (second replicate) ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate tetrahydrofuran tetrahydrofuran 2-butanone 2-butanone 2-butanone 2-butanone
General Form of the Linear Regression Equation:ᵃ
$$R_f = A + B\mu + C\rho + DV + EA_s + FA_u$$
Column headings: constant solvent; weak solvents; solutes; X_s; least-squares regression coefficients A, B, C, D, E, F; R²
10
15 NEDAAb
0.1
0.80
0.16
C
C
0.766
10
15 NEDAAb
0.1
0.59
0.16
C
C
0.730
10
15 NEDAAb
0.3
0.75
0.13
0.20
0.01
0.983
10
15 NEDAAb
0.3
0.62
0.15
0.19
0.01
0.980
18 18 25 18 18 25 15 15 12 12 10 10 11 11
15 NEDAAb 15 NEDAAb 15 NEDAAb 15 NEDAAb 15 NEDAAb 15 NEDAAb 15 steroids 15 steroids 11 steroids 11 steroids 15 NEDAAb 15 NEDAAb 15 steroids 15 steroids
0.1 0.1 0.1 0.3 0.3 0.3 0.1 0.3 0.1 0.3 0.1 0.3 0.1 0.3
0.42 0.69 0.67 0.63 0.76 0.83 0.06 0.31 0.48 0.53 0.13 0.49 0.53 0.75
0.15
C C
C
C
(+)d
0.780 0.516 0.675 0.932
e e
0.13 e e 0.07 0.06 0.10 0.07 0.15 0.10 0.07
0.13 0.22 0.18 0.15 0.18
C
0.01 (+Id
0.01 0.01
0.809
0.853
C
0.852
(+Id
0.963 0.867 0.988 0.974 0.976 0.796 0.903
C
C
0.25 0.16 0.30
C
C
C
0.21
(+)d
(+)d
0.01
ᵃ μ is the dipole moment, ρ is the density, V is the molar volume, A_s is the saturated area, and A_u is the unsaturated area. ᵇ p-Nitrobenzyl esters of dansyl amino acids. ᶜ Indicates that the term is not in the regression model generated by the backward stepwise procedure. ᵈ Regression coefficient is significant, is less than 0.005, and has the sign shown in parentheses. ᵉ Dipole moment not considered as a possible variable.
independent variables were considered for each system. Not every potential variable is necessarily included in the final model, since the backward stepwise procedure was used to construct the regression. Second-order models were constructed only for the separation of the amino acid derivatives in the constant ethyl acetate series, which contains sufficient data to prevent overfitting of the model. A problem in any regression study is to decide on the number of observations required to build a meaningful model. The number of terms in a linear model corresponds to the number of descriptors in the model plus the intercept. Inspection of Tables IV and V shows that the linear models contain between three and six terms and the number of weak solvents varies between 10 and 25. There are a maximum of 10 terms in a second-order model with three descriptors, and inspection shows that the models in Table VI, which are for either 18 or 25 weak solvents, contain between 5 and 10 terms. There are a maximum of 15 terms in a second-order model with four descriptors, and these models were considered for 25 weak solvents and were found to contain between 5 and 13 terms. Most models contain fewer than the maximum number of terms because of the stepwise procedure that was used. The residual degrees of freedom (the number of observations minus the number of terms in the model) for the models in the three tables, as well as for the four-descriptor models, vary between 5 and 20. The fewest residual degrees of freedom are in the linear models for the series containing between 10 and 12 weak solvents. These particular models are not strong enough to stand alone but are included both to illustrate that this approach appears to apply to a variety of systems and, for the amino acid derivatives in the constant ethyl acetate systems, to demonstrate reproducibility. There are, however, several models in this report for which there are between three and five times more residual degrees of freedom than there are terms in the model, and these models should be meaningful in predicting either the average Rf or the IDF as a function of the relevant set of descriptors.
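The bookkeeping in this paragraph can be checked with a small sketch. The function names are hypothetical; the counts follow directly from the definitions given above (a full second-order model has an intercept, k linear terms, k squared terms, and k(k-1)/2 cross terms).

```python
def max_terms(n_descriptors, second_order=False):
    """Maximum number of terms in the model, intercept included."""
    k = n_descriptors
    if second_order:
        # intercept + k linear + k squared + k*(k-1)/2 cross terms
        return 1 + 2 * k + k * (k - 1) // 2
    return 1 + k

def residual_dof(n_observations, n_terms):
    """Residual degrees of freedom: observations minus terms in the model."""
    return n_observations - n_terms

def dof_to_terms_ratio(n_observations, n_terms):
    """How many residual degrees of freedom there are per model term."""
    return residual_dof(n_observations, n_terms) / n_terms
```

For example, a hypothetical five-term model fitted to 25 weak solvents leaves 20 residual degrees of freedom, four per term, which falls in the range described above as meaningful.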
Tables IV and V list the linear models for average Rf and IDF, respectively, as dependent variable. In all the systems examined with average Rf as dependent variable, the multiple correlation coefficient (R²) is higher (for 2-butanone the difference is marginal) at a mole fraction of 0.3 than at a mole fraction of 0.1, with the highest value being an R² of 0.988 for the separation of 11 steroids in a constant tetrahydrofuran series. The situation is different with the IDF as dependent variable in that the value of R² is, with one exception, lower at a mole fraction of 0.3 than at a mole fraction of 0.1. The exception is the ethyl acetate system with steroids as solutes, where the value of R² is virtually the same at both concentrations. At a mole fraction of 0.1, the value of R² for the IDF models is greater than 0.93 in each of the systems in which dipole moment is not excluded as a variable. These results show that the properties of the weak(er) solvent are important predictors of separation quality, even in the presence of a strong solvent. With the exception of the first replicate of 10 solvents in the constant ethyl acetate series of mobile phases, the intercept of the average Rf model is larger at the higher concentration of the strong solvent, as would be expected with an increase in mobile-phase strength. The intercept of the IDF model is smaller (i.e., separation is better) at the higher concentration of strong solvent, which shows the dependence of separation quality on mobile-phase strength. With one exception, all the regression models for both average Rf and IDF contain four or fewer of the possible five independent variables. There is a strong collinearity between molar volume and saturated surface area for the solvents used, and the consequence of this is that many of the models contain only one of these two descriptors.
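The collinearity mentioned above can be illustrated with a Pearson correlation between molar volume and saturated surface area. The sketch below uses only four rows of Table III (the aromatic solvents whose entries read cleanly in the table), so it is an illustration of the check rather than a reproduction of the authors' analysis; the function and variable names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (molar volume in mL, saturated area) as listed in Table III
table_iii_subset = {
    "bromobenzene": (105.0, 98.0),
    "chlorobenzene": (101.8, 88.2),
    "2-chlorotoluene": (116.9, 114.2),
    "p-xylene": (123.3, 123.2),
}
volumes = [v for v, _ in table_iii_subset.values()]
sat_areas = [a for _, a in table_iii_subset.values()]
r_volume_area = pearson_r(volumes, sat_areas)
```

A value of `r_volume_area` close to 1 indicates that the two descriptors carry nearly the same information, which is why a stepwise procedure tends to retain only one of them.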
Inspection of the relevant coefficient indicates that, at the lower mole fraction, dipole moment has a larger contribution to Rf and also a larger contribution (i.e., a more negative value of the coefficient) to separation quality as defined by the IDF. This indicates that the polar properties of the weak solvent make a larger contribution to the character of the
Table V. Linear Regression with the IDF as Dependent Variable
General Form of the Linear Regression Equation:ᵃ
$$\mathrm{IDF} = A + B\mu + C\rho + DV + EA_s + FA_u$$
Column headings: constant solvent; weak solvents; solutes; X_s; least-squares regression coefficients A, B, C, D, E, F; R²
constant solvents ethyl acetate (first replicate) ethyl acetate (second replicate) ethyl acetate (first replicate) ethyl acetate (second replicate) ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate ethyl acetate tetrahydrofuran tetrahydrofuran 2-butanone 2-butanone 2- butanone 2-butanone
R2
10
15 NEDAAb
0.1
15.77
-1.54
-2.84
C
C
-0.05
0.948
10
15 NEDAAb
0.1
15.15
-1.58
-1.91
C
C
-0.08
0.959
10
15 NEDAAb
0.3
9.65
C
C
C
C
C
d
10
15 NEDAAb
0.3
9.98
C
C
C
C
C
d
18 18 25 18 25
15 NEDAAb 15 NEDAAb 15 NEDAAb 15 NEDAAb 15 NEDAAb
0.1 0.1 0.1 0.3 0.3
13.28 11.49 10.21 6.78 9.61
-1.75 e e 0.85 e
-0.95 -2.18 -1.67
-0.07
0.06 0.02
C
C
C
-0.03 0.02
C C
C
C
C
C
0.942 0.691 0.722 0.272 d
15 15
15 steroids 15 steroids
0.1 0.3
10.85 6.66
-1.07
-2.93 -0.91
0.03 0.02
C
-0.05 -0.01
0.944
C
12 12
11steroids 11steroids
0.1 0.3
1.45 -0.60
-0.57 0.54
C
C
1.69
0.04 0.02
-0.06 -0.02
0.946 0.676
10 10
15 NEDAAb 15 NEDAAb
0.1 0.3
16.30 4.66
-1.97 0.52
-4.04 2.38
C C
C
0.02
-0.04 0.02
0.931 0.755
11 11
15 steroids 15 steroids
0.1 0.3
12.41 10.95
-1.10
-2.84
0.07
-0.05
-0.09
C
C
-0.05
0.03
C
0.950 0.334
C
-0.06 -0.07 C
C
0.942
ᵃ μ is the dipole moment, ρ is the density, V is the molar volume, A_s is the saturated area, and A_u is the unsaturated area. ᵇ p-Nitrobenzyl esters of dansyl amino acids. ᶜ Indicates that the term is not in the regression model generated by the backward stepwise procedure. ᵈ None of the regression coefficients (except the intercept) are statistically significant at the 15% significance level. The linear model collapses. ᵉ Dipole moment not considered as a possible variable.
Table VI. Separation of p-Nitrobenzyl Esters of Dansyl Amino Acids by a Series of Constant Ethyl Acetate Mobile Phases: Second-Order Regression Models with Three Independent Variables
Column headings: dependent variable (average Rf or IDF); mole fraction; no. of weak solvents; dipole moment as potential descriptor (yes or no); independent variablesᵃ X, Y, Z; terms in the modelᵇ (X, Y, Z, X², Y², Z², XY, XZ, YZ);ᵈ,ᵉ R²ᶜ
ᵃ μ is the dipole moment, ρ is the density, V is the molar volume, A_s is the saturated area, and A_u is the unsaturated area. ᵇ The intercept is an additional term. ᶜ Models listed have an R² greater than 0.95 or are for the model with best regression fit. ᵈ •, term in the regression model. ᵉ -, term not in the regression model. ᶠ There are other models with similar but slightly lower R².
mobile phase when there is a lower concentration of strong solvent. Inspection of either the IDF or average Rf models, in a given strong solvent system, shows that the regression coefficient for dipole moment has a consistently larger absolute value for the amino acid derivatives (which are polar) than for the steroids (which are nonpolar). The results for the amino acid derivatives in the constant ethyl acetate series indicate that exclusion of dipole moment as a descriptor leads to a substantial reduction in the value of the multiple correlation coefficient in either the average Rf or the IDF models. There is little variation in the value of the multiple correlation coefficient for the models without dipole moment when either 18 or 25 weak solvents are considered. The magnitude of the unsaturated surface area coefficient may be interpreted as a measure of the role played by induced dipole in these separations. In the IDF models, the coefficients of dipole moment and unsaturated surface area behave in a broadly similar manner; these are negative at the lower mole fraction and less negative, or positive, at the higher mole fraction. Unsaturated surface area is consistently absent from the average Rf models at the lower mole fraction, but present with a coefficient of very low value at the higher mole fraction. The coefficient is positive for both dipole moment and unsaturated surface area when these descriptors are present in the latter models. Replicate data were available for 10 of the constant ethyl acetate mobile phases for the separation of the amino acid derivatives, and it was thus possible to obtain a preliminary estimate of the stability of the regression models. There is good agreement, with respect to regression coefficients and R² values but not with respect to the intercept, between the respective replicates for the Rf model at a mole fraction of either 0.1 or 0.3. There is very good agreement between replicates, with respect to intercept, regression coefficients, and R² values, for the IDF model at a mole fraction of 0.1. The IDF model for 10 mobile phases collapses at a mole fraction of 0.3, even though there is still good agreement between the replicate values of the intercept. Replicate data were not available for the other systems. The total number of weak solvents considered in this study is rather small, and it is quite possible that the regression model may change if this number is increased. It is nevertheless noted, for the amino acid derivatives in the constant ethyl acetate series, that the two dipole-included models containing either 10 or 18 weak solvents have reasonably similar intercepts, regression coefficients, and R² values in the IDF models at a mole fraction of 0.1.
There is no point in comparing the corresponding models at a mole fraction of 0.3; in one case there is a very poor R² value, and in the other case there is no regression fit. The Rf models, for 10 and 18 solvents, at a mole fraction of 0.1 are reasonably similar with respect to regression coefficients and R² values, but not with respect to the intercept. The corresponding models at a mole fraction of 0.3 are reasonably similar with respect to intercept, regression coefficients, and R². A similar comparison can be made for 18 and 25 weak solvents if dipole moment is excluded as a descriptor. There is reasonably good agreement in intercepts, regression coefficients, and regression fit when the respective average Rf models are compared at either concentration of ethyl acetate, or the IDF model at a mole fraction of 0.1. As discussed above, there is no point in discussing the IDF model at a mole fraction of 0.3, as it is either very weak or nonexistent. The behavior of Rf and the IDF may also be described using second-order models. The value of the multiple correlation coefficient in such a model increases (even though each successive increase may be very small) with an increasing number of variables. There is, however, a limit to the number of variables that can be considered for a given data set, because the number of potential terms in such a regression equation becomes larger as the number of variables increases. With two variables there are 6 potential terms, with three variables there are 10 potential terms, and with four variables there are 15 potential terms. The second-order models were constructed for the amino acid derivatives in the constant ethyl acetate series, as this is the only series where there are sufficient data to include three variables without overfitting the model. Table VI shows the terms present (in addition to the intercept) in the best Rf and IDF models at the two concentrations considered, as well as all models with R² greater than 0.95.
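The potential terms of a second-order model can be enumerated explicitly. The short sketch below lists the non-intercept terms for a chosen set of descriptors, in the order linear, squared, cross; the function name and the descriptor labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def second_order_terms(names):
    """Non-intercept terms of a full second-order model in the given descriptors."""
    terms = list(names)                                       # linear terms
    terms += [f"{name}^2" for name in names]                  # squared terms
    terms += [f"{a}*{b}" for a, b in combinations(names, 2)]  # cross terms
    return terms

# Example: three descriptors give 9 terms, or 10 counting the intercept.
example_terms = second_order_terms(["mu", "As", "Au"])
```

Backward stepwise elimination then starts from this full list and removes terms one at a time, which is why the fitted second-order models contain fewer terms than the maximum.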
Sets of three descriptors were selected either from the five potential descriptors in the series of 18 weak solvents or from the four descriptors (dipole moment excluded) for the series of 26 weak solvents. It is seen that the value of R2, for the best of the second-order models, is higher than in the linear models for the corresponding chromatographic system. The greatest improvement is for the IDF model at a mole fraction of 0.3, where the R2 values are 0.272 and 0.907, respectively, for the linear and the best second-order model. The highest R2 is for the Rf model at a mole fraction of 0.3, where the value is 0.983 with dipole moment, saturated surface area, and unsaturated surface area as independent variables. The existence of more than one satisfactory second-order model suggests that the variables used are surrogates for more fundamental solvent descriptors. The linear models require dipole moment for a good regression fit, whereas there are two second-order models without dipole that have R2 greater than 0.95. Each of these two models has density, molar volume, and unsaturated surface area as variables. There are sufficient degrees of freedom with 26 weak solvents to consider a second-order model with all four descriptors (i.e., with dipole excluded) that are available for the set of weak solvents used for separating the amino acid derivatives. For the IDF model, this results in somewhat similar multiple correlation coefficients: at a mole fraction of 0.1, the R2 for the best three-variable model is 0.859, while with four independent variables R2 is 0.850. At a mole fraction of 0.3, the corresponding values are 0.730 and 0.834. The corresponding values for the Rf model are 0.884 and 0.960 at a mole fraction of 0.1. At a mole fraction of 0.3, the best three-variable model has an R2 of 0.949, as does the potentially four-variable model. In this instance the latter collapses to a three-variable model, with the same descriptors as in the best three-variable model. The actual terms included in the models are, however, somewhat different.
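The number of three-descriptor sets and the degrees-of-freedom argument above can be checked with a short count. The sketch below is illustrative only: the descriptor labels are shorthand for the five descriptors named in the text, the helper names are hypothetical, and the residual degrees of freedom are taken as the number of weak solvents minus the number of potential terms in a full second-order model. That is the least favorable case, since a stepwise procedure may drop terms.

```python
from itertools import combinations

# Shorthand labels for the five solvent descriptors named in the text.
DESCRIPTORS = [
    "dipole moment",
    "density",
    "molar volume",
    "saturated surface area",
    "unsaturated surface area",
]


def n_terms(k):
    """Potential terms (intercept included) in a full second-order model."""
    return (k + 1) * (k + 2) // 2


def residual_dof(n_solvents, k):
    """Observations minus potential terms; assumes one observation per solvent."""
    return n_solvents - n_terms(k)


# Candidate sets of three descriptors, with and without dipole moment.
sets_with_dipole = list(combinations(DESCRIPTORS, 3))
sets_without_dipole = list(combinations(DESCRIPTORS[1:], 3))
print(len(sets_with_dipole), len(sets_without_dipole))  # 10 4

print(residual_dof(18, 3))  # 8: three variables with 18 weak solvents
print(residual_dof(26, 4))  # 11: four variables with 26 weak solvents
print(residual_dof(18, 4))  # 3: four variables would leave little margin with 18
```

On this count, four variables leave few residual degrees of freedom with 18 weak solvents, which is consistent with the restriction to three descriptors for that series.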
For those comparisons above where there is a substantial difference in the multiple correlation coefficient between the models with either three or four descriptors, it is the four-descriptor model that gives the higher value of R2, as would be expected on statistical grounds. The above report indicates that both retention and separation quality can be related to readily available descriptors. As noted in the Introduction, a large number of solute descriptors have been used in regression studies, and it seems likely that the use of some of these as solvent descriptors will improve the regression fit of the linear models. It is also possible that the regression fits can be improved by considering other separation metrics such as the PRF.17 The above approach may also be valid for HPLC in the normal-phase mode, using a series of mobile phases with a constant strong solvent. It is noted in closing that a satisfactory regression model may be constructed with the dipole moment of the weak solvent as the dependent variable. The preliminary survey was restricted to the separation of the amino acid derivatives with 18 weak solvents and ethyl acetate as constant strong solvent at a mole fraction of either 0.1 or 0.3. The potential independent variables for the linear models were density, molar volume, saturated surface area, unsaturated surface area, and either the IDF or the average Rf. The linear models are of poor to moderate quality, with values of R2 in the range 0.361-0.844. It is interesting to note that the IDF and average Rf are not eliminated by the stepwise procedure in the respective linear models, whereas each of the other descriptors is eliminated in at least two of the models. Second-order models were considered at mole fractions of 0.1 and 0.3, using the same set of descriptors as above, but with the restriction that only three descriptors were considered for each model.
Either the IDF or the average Rf was included as one of these three descriptors. The IDF was considered only at a mole fraction of 0.1 and the average Rf only at a mole fraction of
0.3. At a mole fraction of 0.3, the best second-order model has an R2 of 0.947, with average Rf, molar volume, and saturated surface area as independent variables. At a mole fraction of 0.1, the best second-order model has an R2 of 0.965, with IDF, density, and unsaturated surface area as independent variables. It is possible that, with further refinement, this approach may be used as a method for determining dipole moment by planar chromatography. There is a substantial drop in the multiple correlation coefficient if average Rf and the IDF are excluded as potential descriptors for the second-order models. The best such model uses density, molar volume, and unsaturated surface area as descriptors and has an R2 of 0.819.
ACKNOWLEDGMENT This work was supported by grants from the Dow Chemical Co. and the Research Corp. Whatman Inc. is thanked for a gift of the TLC plates and Pierce for a gift of the reagent for preparing the p-nitrobenzyl esters of the dansyl amino acids. Lisa Julian, Kirk Knotts, and Christopher Uhegbu are thanked for data originally used in refs 11 and 18.
RECEIVED for review May 18, 1993. Accepted September 14, 1993.