Anal. Chem. 1981, 53, 1990-1993
Prediction of Gas Chromatographic Retention Indices from Linear Free Energy and Topological Parameters
Lutgarde Buydens and D. L. Massart*
Farmaceutisch Instituut, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussel, Belgium
Linear free energy related (LFER) parameters such as the Hammett and Hansch constants, σ and π, in combination with topological parameters are investigated for their predictive value for gas-liquid chromatography (GLC) retention indices. For this purpose, multiple regression and factor analytical techniques are used. The correlation with retention indices on stationary phases of different polarity is calculated for alkanes, fatty acid methyl esters, and a set of mixed test substances. Topological indices perform best for homologous series, and combinations of a topological index with σ perform best for the mixed data set.
In medicinal chemistry, the quantitative structure-activity relationship (QSAR) approach is often used to describe or predict the activity of drugs. Such an approach can also be useful in some fields of analytical chemistry. In this study we present a first attempt to investigate to what extent this approach can be used to describe gas chromatographic behavior. Several authors have tried to explain gas chromatographic behavior by deriving the important factors in the solute-stationary phase interaction. Rohrschneider (1), for instance, derived a characterization of gas chromatographic stationary phases. He described the behavior of the stationary phases with five probes, which leads to the implicit conclusion that five factors are involved. The number of factors involved has been studied by several authors by means of several techniques. Hartkopf and co-workers (2) proposed the use of four probes. De Clercq and Massart (3) used a nonhierarchical clustering algorithm to group solutes according to their gas chromatographic behavior and came to the conclusion that four factors are involved. These studies are empirical in the sense that they try to derive the factors involved by explaining as much of the variability of the data set as possible with a few substances whose behavior is characteristic of a specific type of solute-stationary phase interaction. The factors involved can also be studied by factor analysis, and several important studies in this context were carried out by Weiner, Howery, Parcher, and co-workers (4-7). Another, very different and more theoretical approach is the extended solubility parameter theory of Karger and co-workers (8). The original Hildebrand solubility parameter is divided into five values, calculated from physical properties of the molecules such as the heat of vaporization and the molar volume. Karger et al. (9) came to the conclusion that at least two principal factors are involved in the description of GLC interactions: dispersion forces, and a factor describing polar interactions, which can be subdivided into several factors describing more specific interactions. In view of the success of quantitative structure-activity relationships in medicinal chemistry, where one tries to explain the extent of the effect of a drug using structure-related parameters, it seemed of interest to use these same parameters to try to explain or predict the interactions of a solute with a stationary phase.
In fact, one of these parameters has already been used for this kind of study, namely χ, the molecular connectivity. The use of χ, a topological parameter, was pioneered in medicinal chemistry by Kier, Hall, and co-workers (10-12). Michotte and Massart (13) and Kier and Hall (14) obtained satisfactory correlations of the retention indices of several classes of compounds (saturated aliphatic hydrocarbons, monounsaturated hydrocarbons, aliphatic alcohols, and ketones) with χ. McGregor (15) and Millership and Woolfson (16) came to the same conclusion for series such as alcohols, ketones, and hydrocarbons, and also for some barbiturates, amphetamines, and phenothiazines. Fewer attempts have been made to relate chromatographic behavior to structural factors such as the linear free energy parameters usual in medicinal chemistry. We wanted to study combinations of several of these parameters with topological parameters and to compare several topological parameters as to their effectiveness in calculations of this type. The approach proposed here is also theoretical in the sense that fundamental parameters related to the structure of the compounds are employed. It is, however, more empirical than the approach of Karger et al., because the relationship between the parameters in question and the retention is empirical.
THEORY
The parameters tested in this study are the molecular connectivity ¹χᵛ (10, 17), the Wiener number W (17, 18), Hosoya's index Z (18), the Hansch constant Σπ, and Hammett's constant σ (19, 20). The necessary definitions and computations can be found in references (10, 18-20). However, since many analytical chemists are probably unfamiliar with most of these concepts, it seems necessary to outline the definitions and give some examples. The molecular connectivity ¹χᵛ is calculated in the following way (10, 17). The molecule is written down in skeletal form, and each atom i is assigned a $\delta_i$ value equal to the difference between its number of valence electrons and the number of hydrogen atoms attached to it. An example (2-methyl-1-butanol) is given in Figure 1. The δ value of a methyl group is 4 - 3 = 1. The other carbon atoms of this molecule are assigned a value of 2 or 3 according to the number of hydrogen atoms attached to them, and the δ value of the oxygen atom of the alcohol group is 6 - 1 = 5. ¹χᵛ is then calculated by means of eq 1:
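$$ {}^{1}\chi^{v} = \sum_{s=1}^{N} \left(\delta_i \delta_j\right)_s^{-1/2} \qquad (1) $$

where N is the total number of bonds in the molecule and $\delta_i$ and $\delta_j$ are the δ values of the two atoms of bond s. In this way the ¹χᵛ value of 2-methyl-1-butanol becomes

$$ {}^{1}\chi^{v} = \frac{1}{\sqrt{2}} + \frac{2}{\sqrt{6}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{10}} \approx 2.42 $$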
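The calculation in eq 1 is easy to mechanize once the molecule is written as a hydrogen-suppressed graph with a δ value on each atom. The Python sketch below reproduces the 2-methyl-1-butanol example. From the same graph it also computes the Wiener number W, taken as the sum of bond-count distances over all atom pairs, and Hosoya's index Z, taken as the number of sets of mutually non-adjacent bonds with the empty set included. Those two definitions are the standard ones and are assumed here, since the text only cites references for them. The atom numbering and function names are illustrative.

```python
from collections import deque
from itertools import combinations

# 2-methyl-1-butanol as a hydrogen-suppressed graph (illustrative numbering):
#   C1 - C2 - C3 - C4 - O5, with a methyl carbon C6 attached to C3.
# delta = number of valence electrons minus number of attached hydrogens.
DELTA = {1: 1, 2: 2, 3: 3, 4: 2, 5: 5, 6: 1}
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]


def valence_connectivity(delta, bonds):
    """First-order valence connectivity: sum over bonds of (delta_i * delta_j) ** -1/2."""
    return sum((delta[i] * delta[j]) ** -0.5 for i, j in bonds)


def adjacency(bonds):
    """Map each atom to the set of atoms bonded to it."""
    adj = {}
    for i, j in bonds:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def wiener_number(bonds):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs."""
    adj = adjacency(bonds)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


def hosoya_index(bonds):
    """Count bond subsets in which no two bonds share an atom (empty subset included)."""
    count = 0
    for k in range(len(bonds) + 1):
        for subset in combinations(bonds, k):
            atoms = [atom for bond in subset for atom in bond]
            if len(atoms) == len(set(atoms)):
                count += 1
    return count


print(round(valence_connectivity(DELTA, BONDS), 2))  # 2.42
print(wiener_number(BONDS), hosoya_index(BONDS))     # 31 12
```

The brute-force enumeration in `hosoya_index` grows exponentially with the number of bonds, which is acceptable for molecules of the size considered here but not for large structures.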
The Wiener number W and Hosoya's index Z are also topological parameters. The way in which they are computed can be found in references (10, 12, 18).

0003-2700/81/0353-1990$01.25/0 © 1981 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 53, NO. 13, NOVEMBER 1981

Figure 1. Calculation of ¹χᵛ for 2-methyl-1-butanol, skeleton C-C-C(-C)-C-O. The respective δ values are given at the atoms: 1, 2, 3, 2, 5 along the main chain, and 1 for the methyl branch on the third carbon.

Table I. Test Substances
nature of test substances | no. of components | stationary phases
1. alkanes: linear and branched chain | 18 | octadecane
2. linear and branched chain methyl esters of fatty acids | 32 | butanediol succinate (BDS)
3. mixed set of test substances consisting of alkanes, alkenes, alcohols, ethers, esters, aldehydes, ketones | 100 | squalane; SE-30; polyethylene glycol (PEG) 300; diethylene glycol succinate (DEGS)

The most successful models in medicinal chemistry are the linear free energy relationships (LFERs), with the Hammett equation as the most prominent example. The Hammett equation takes the form

$$ \log \frac{k_X}{k_H} = \rho\sigma \qquad (2) $$

where $k_X$ and $k_H$ are the rate (or equilibrium) constants of the substituted and unsubstituted compound and ρ is the reaction constant, which depends on the nature of the reaction. By definition ρ = 1 when the standard reaction is the ionization of benzoic acid in water at 25 °C. In the present case, the $\sigma_I$ scale, derived from the hydrolysis of aliphatic acids and proposed by Taft and co-workers (19), is used. The $\sigma_I$ values of the different substituents are summed to yield the σ parameter used here. The $\sigma_I$ value of a primary OH group is 0.25, so the σ parameter of 2-methyl-1-butanol is 0.25, since $\sigma_I(\mathrm{CH_3})$ is zero. The Hansch constant $\Sigma\pi_i$ refers to the hydrophobicity; the hydrophobic substituent constant π is defined as
$$ \pi_X = \log \frac{P_X}{P_H} \qquad (3) $$
where $P_X$ and $P_H$ are the octanol-water partition coefficients of the substituted and parent molecule. $\Sigma\pi_i$ is calculated by summing the relevant π values (19) of each structure element. For example, the π value of a CH3 or CH2 group is 0.50, that of a primary OH group is -1.16, and that of a branch is -0.20. For 2-methyl-1-butanol, π = (5 × 0.50) - 0.20 - 1.16 = 1.14.
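Both LFER parameters are simple fragment sums, and the worked values above can be checked in a few lines. The sketch below adds up the fragment constants quoted in the text for 2-methyl-1-butanol. Following the worked example, all five carbon-containing groups are counted at 0.50; the fragment labels are illustrative names, not entries from a standard table.

```python
# Fragment constants as quoted in the text; the labels are illustrative.
PI_FRAGMENTS = {
    "carbon_group": 0.50,   # CH3 or CH2; the worked example counts all five carbons at 0.50
    "branch": -0.20,
    "primary_OH": -1.16,
}
SIGMA_I_FRAGMENTS = {
    "CH3": 0.0,
    "primary_OH": 0.25,
}


def fragment_sum(counts, constants):
    """Sum (occurrence count * fragment constant) over the structure elements of one molecule."""
    return sum(n * constants[name] for name, n in counts.items())


# 2-methyl-1-butanol: five carbon groups, one branch, one primary OH; one 2-methyl substituent
pi_total = fragment_sum({"carbon_group": 5, "branch": 1, "primary_OH": 1}, PI_FRAGMENTS)
sigma_total = fragment_sum({"CH3": 1, "primary_OH": 1}, SIGMA_I_FRAGMENTS)
print(f"pi = {pi_total:.2f}, sigma = {sigma_total:.2f}")  # pi = 1.14, sigma = 0.25
```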
EXPERIMENTAL SECTION
Data. The three sets of test substances are described in Table I. The retention index (RI) values of the alkane data set and of the mixed data set are taken from the literature (15, 21). The RI values of the linear and branched chain fatty acid methyl esters were provided by Massart-Leen (22). Calculations. The correlation coefficients for the multiple regressions are calculated following a stepwise inclusion method with the SPSS program, subprogram REGRESSION (23). The independent variables are entered in the multiple regression equation only if they meet certain statistical criteria (23). The order of inclusion is determined by the respective contribution of each variable to the explained variance. The variable that explains the greatest amount of variance in the dependent variable enters the equation first. The variable that explains the greatest amount of the variance left unexplained by the variables already in the equation enters the equation in the following step. Factor analysis is carried out with the SPSS program, subprogram FACTOR (23).

Table II. Alkanes, Stationary Phase, Octadecane
correlation | MRᵃ | SR
χ | 0.983 | 0.983
π | 0.993 | 0.886
W | 0.995 | 0.924
ᵃ MR = multiple correlation coefficient. SR = simple correlation coefficient.

Table III. Methyl Esters of Fatty Acids, Stationary Phase, Butanediol Succinate
correlation | MR | SR
W | 0.97 | 0.97
(illegible) | 0.98 | 0.97
(illegible) | 0.98 | 0.96

Table IV. Correlation Coefficients of the Mixed Data Set on Squalane
correlation | MR | SR
χ | 0.90 | 0.90
π | 0.95 | 0.61
σ | 0.95 | -0.06
W | 0.95 | 0.81
Z | 0.95 | 0.72

Table V. Correlation Coefficients of the Mixed Data Set on SE-30
correlation | MR | SR
χ | 0.85 | 0.86
π | 0.92 | 0.53
σ | 0.93 | 0.02
W | 0.93 | 0.77
Z | 0.93 | 0.68

Table VI. Correlation Coefficients of the Mixed Data Set on DEGS
correlation | MR | SR
σ | 0.56 | 0.56
χ | 0.68 | 0.17
π | 0.75 | -0.25
W | 0.75 | 0.13
Z | 0.75 | 0.08
RESULTS AND DISCUSSION
The retention index of a solute i is related to the topological and structural parameters by an equation of the form

$$ (\mathrm{RI})_i = a + b\,x_{1i} + c\,x_{2i} + d\,x_{3i} + \dots + e_i \qquad (4) $$

where $x_{1i}$, $x_{2i}$, etc. are the values of the topological and structural parameters for solute i and $e_i$ is the residual error. The coefficients a, b, c, ... are determined by multiple regression.
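Eq 4, together with the stepwise inclusion used to build Tables II-VII, can be sketched with NumPy as follows. This is an illustration only and does not reproduce the SPSS REGRESSION procedure. Variables are simply entered in order of the largest gain in explained variance, without SPSS's statistical entry criteria, and the toy descriptor values and retention indices are hypothetical. At each step the sketch reports the two columns of the tables: MR, the multiple correlation coefficient of the equation so far, and SR, the simple correlation coefficient of the variable just entered.

```python
import numpy as np


def fit_eq4(X, ri):
    """Least-squares coefficients (a, b, c, ...) of RI = a + b*x1 + c*x2 + ..."""
    design = np.column_stack([np.ones(len(ri)), X])
    coef, *_ = np.linalg.lstsq(design, ri, rcond=None)
    return coef


def r_squared(X, ri):
    """Fraction of the variance of RI explained by the fitted equation."""
    design = np.column_stack([np.ones(len(ri)), X])
    resid = ri - design @ fit_eq4(X, ri)
    return 1.0 - (resid @ resid) / np.sum((ri - ri.mean()) ** 2)


def forward_stepwise(X, ri, names):
    """Enter variables one at a time, each time choosing the one that maximizes R^2.

    Returns (name, MR, SR) per step: MR is the multiple correlation coefficient of
    the equation with all variables entered so far; SR is the simple correlation
    of the newly entered variable with RI.
    """
    remaining = list(range(X.shape[1]))
    chosen, steps = [], []
    while remaining:
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], ri))
        chosen.append(best)
        remaining.remove(best)
        mr = np.sqrt(max(r_squared(X[:, chosen], ri), 0.0))
        sr = np.corrcoef(X[:, best], ri)[0, 1]
        steps.append((names[best], mr, sr))
    return steps


# Hypothetical descriptors for six solutes and retention indices built from them.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
X = np.column_stack([x1, x2])
ri = 100.0 * x1 + 10.0 * x2 + 50.0

for name, mr, sr in forward_stepwise(X, ri, ["x1", "x2"]):
    print(f"{name}: MR = {mr:.3f}, SR = {sr:.3f}")
print(fit_eq4(X, ri))  # approximately [50, 100, 10]
```

As in the tables, MR and SR coincide at the first step, since the multiple correlation of a one-variable equation is the absolute value of that variable's simple correlation.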
Table VII. Correlation Coefficients of the Mixed Data Set on PEG 300
correlation | MR | SR
σ | 0.44 | 0.45
χ | 0.62 | 0.26
π | 0.70 | 0.14
W | 0.71 | 0.20
Z | 0.72 | 0.15

Table VIII. Correlations between the Different Parameters
 | π | χ | W | σ | Z
π | 1.00 | 0.84 | 0.76 | -0.71 | 0.72
χ | 0.84 | 1.00 | 0.89 | -0.33 | 0.81
W | 0.76 | 0.89 | 1.00 | -0.21 | 0.95
σ | -0.71 | -0.33 | -0.21 | 1.00 | -0.25
Z | 0.72 | 0.81 | 0.95 | -0.25 | 1.00
Tables II-VII give SR, the simple correlation coefficient, and MR, the multiple correlation coefficient, for the data sets of Table I. The multiple correlation coefficient is the correlation coefficient of the multiple regression equation with all the variables entered up to the step considered. The correlations are very good for the data sets consisting of alkanes and of methyl esters of fatty acids (Tables II and III). These results agree with those of McGregor (15) and of Millership and Woolfson (16), in the sense that these data sets consist of molecules of the same family, which interact in a similar way with the stationary phases. In both cases, the best result is obtained by using one of the topological indices, χ or W. One can conclude from these results that the topological indices are very useful in describing the interaction between members of one family of molecules and the stationary phases. Parameters such as π, which describe more specific properties of the molecules, do not improve the regression equation to a great degree, because these properties are similar for all these molecules. The parameter σ is not included for these data sets, since $\sigma_I$ is zero for the alkanes and σ has the same value for all members of the same family. As Michotte and Massart (13) stated earlier, the correlation is much worse when one tries to correlate the RI of a mixed data set with χ. Kier and Hall (14) correctly pointed out that, besides a topological parameter, a term describing the electronic differences between different classes of chemical compounds must be included. As shown by different authors (see introductory section), more factors are involved when substances of different families are considered; therefore, multiple regression techniques are used.
The results for the correlation between the parameters and the retention indices for the mixed set of test substances are given in Tables IV-VII. Both polar and apolar stationary phases were considered. The interaction of the substances with the apolar stationary phases is aspecific. It can therefore be expected that a parameter such as χ, which gives an idea of the general size and shape of the molecules, explains a great deal of the variance in this case. This is seen in Tables IV and V, where the first variable entered is χ. With the more polar stationary phases (Tables VI and VII), one must expect the interactions to be more complex: dipole interactions and hydrogen bonds, for example, can occur. These interactions are much more specific, and one can therefore expect the correlation coefficients with the more polar phases to be lower. The results of Tables VI and VII confirm this, and one also notes that the first variable entered is σ, a parameter that describes the electronic properties of the molecules. To explain why the variables are entered in this particular sequence, two analyses are carried out: first the correlations between the different parameters are studied, and then a factor analysis is performed. Factor analytical techniques show whether some underlying pattern of relationships exists, such that the data may be rearranged or reduced to a smaller set of so-called fundamental or latent variables accounting for the observed interactions in the data. After the extraction of the initial factors, which explain the greatest possible amount of variance, rotation to a terminal solution may yield simple and interpretable factors.

Figure 2. Factor analysis of the different parameters and the retention indices on different stationary phases.

Table VIII presents the correlation matrix between the different parameters; the correlation coefficients are calculated on data set 3 of Table I. The topological parameters are very highly intercorrelated, while σ is rather poorly related to them. Thus, when one topological parameter enters the regression equation, the greatest amount of variance in the retention indices due to the general shape of the molecules is explained, and the other topological parameters then do not improve the regression equation to a great degree. As can be seen from Tables II and III, the simple correlation coefficients of the different topological parameters are very similar when one considers only one group of substances, such as the alkanes or the methyl esters of fatty acids. However, there is a clear difference between the correlation coefficient of χ and those of the other topological parameters for the mixed set of test substances: there χ correlates much better than W or Z, on polar and apolar stationary phases alike, as can be seen from Tables IV-VII. One can therefore conclude that, as far as the topological parameters are concerned, χ is the most efficient parameter in this type of calculation. A still better method to gain insight into the variance of the system is factor analysis. The results are shown in Figure 2. The factor analysis extracts two factors, one explaining about 60% and the other about 30% of the total variance, so that together they explain about 90% of the total variance. In Figure 2 one can see that the topological parameters form a cluster. This cluster is in the neighborhood of the cluster of the apolar stationary phases, SE-30 and squalane.
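To show what the extraction of initial factors amounts to, the sketch below extracts unrotated principal components from the parameter correlation matrix of Table VIII. It is a simplified illustration under stated assumptions. The analysis behind Figure 2 also included the retention indices on the stationary phases and a rotation step (SPSS FACTOR), so the explained-variance fractions printed here are not expected to reproduce the 60% and 30% quoted above. The signs of eigenvectors, and hence of loadings, are arbitrary.

```python
import numpy as np

# Correlation matrix of the five structural parameters (Table VIII, mixed data set).
NAMES = ["pi", "chi", "W", "sigma", "Z"]
R = np.array([
    [1.00, 0.84, 0.76, -0.71, 0.72],
    [0.84, 1.00, 0.89, -0.33, 0.81],
    [0.76, 0.89, 1.00, -0.21, 0.95],
    [-0.71, -0.33, -0.21, 1.00, -0.25],
    [0.72, 0.81, 0.95, -0.25, 1.00],
])


def principal_factors(corr, n_factors=2):
    """Unrotated principal-component loadings and explained-variance fractions."""
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = eigvals[:n_factors] / eigvals.sum()  # sum equals the trace, i.e. 5
    return loadings, explained


loadings, explained = principal_factors(R)
print("explained variance fractions:", np.round(explained, 2))
for name, (f1, f2) in zip(NAMES, loadings):
    print(f"{name:>5}: {f1:+.2f} {f2:+.2f}")
```

Plotting the two loading columns against each other gives a factor plot of the kind shown in Figure 2, here restricted to the structural parameters.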
The parameter σ, which describes electronic effects, lies rather far from this cluster and is much more closely related to the polar stationary phases. It is therefore not surprising that χ enters the regression equation first for apolar stationary phases and σ enters first for polar stationary phases. Also, since χ and σ clearly describe different properties, it is logical that a combination of these two parameters should be more powerful than, for instance, a combination of χ and W, which give nearly the same information.

From these results, one can conclude that very good descriptions of the chromatographic behavior of substances with analogous structure can be obtained by using a simple topological parameter. This agrees with the results from the literature (13-16). In this context the topological parameter χ is the most efficient parameter. The description of the behavior of a mixed data set is more complex. Correlations of the retention indices with only one kind of parameter are not good at all, especially for the polar stationary phases. There, parameters describing different properties of the molecules are much more appropriate. To describe the behavior of a mixed data set, besides a topological parameter such as χ, at least one electronic parameter, such as σ, is necessary. One should note that the quality of the RI values can complicate the interpretation. For instance, adsorption effects can worsen this quality, and part of the unexplained variance can be due to such effects. This conclusion parallels that of Karger and Snyder (9), who consider two main interactions, namely $\delta_d$ (for aspecific interactions) and $\delta_p$ (for more polar, specific interactions), whereas we use χ (aspecific) and σ (specific). Improvements can be expected from including other variables, such as higher order connectivity terms to describe residual structural effects, and the dipole moment, which describes the overall polarity of the molecules more specifically. Further research is currently being carried out in our laboratory.
ACKNOWLEDGMENT
The authors thank P. Geerlings, D. Coomans, and Y. Michotte for helpful discussions and FGWO for financial assistance.
LITERATURE CITED
(1) Rohrschneider, L. J. Chromatogr. 1966, 22, 6.
(2) Hartkopf, A. J. Chromatogr. Sci. 1974, 12, 119-123.
(3) De Clercq, H.; Despontin, J.; Kaufman, L.; Massart, D. L. J. Chromatogr. 1977, 122, 535-551.
(4) Weiner, P. H.; Parcher, J. F. Anal. Chem. 1973, 45, 302-307.
(5) Weiner, P. H.; Howery, D. G. Can. J. Chem. 1972, 50, 448-451.
(6) Weiner, P. H.; Parcher, J. F. J. Chromatogr. Sci. 1972, 10, 612-615.
(7) Selzer, B. R.; Howery, D. G. J. Chromatogr. 1975, 115, 139-151.
(8) Karger, B. L.; Snyder, L. R.; Eon, C. J. Chromatogr. 1978, 125, 71-88.
(9) Karger, B. L.; Snyder, L. R.; Eon, C. Anal. Chem. 1978, 50, 2126-2136.
(10) Kier, L. B.; Hall, L. H. "Molecular Connectivity in Chemistry and Drug Research"; Academic Press: New York, NY, 1976.
(11) Hall, L. H.; Kier, L. B. J. Pharm. Sci. 1977, 66, 642-644.
(12) Di Paolo, T.; Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1979, 68, 39-42.
(13) Michotte, Y.; Massart, D. L. J. Pharm. Sci. 1977, 66, 1630-1632.
(14) Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1979, 68, 120-121.
(15) McGregor, T. R. J. Chromatogr. Sci. 1979, 17, 314-316.
(16) Millership, J. S.; Woolfson, A. D. J. Pharm. Pharmacol. 1978, 30, 483-485.
(17) Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1976, 65, 1806-1809.
(18) Amidon, G. L.; Anik, S. T. J. Pharm. Sci. 1976, 65, 801-806.
(19) Hansch, C. In "Drug Design"; Ariëns, E. J., Ed.; Academic Press: New York, 1971; Vol. 1, Chapter 11.
(20) Shorter, J. In "Correlation Analysis: Recent Advances"; Chapman, N. B., Shorter, J., Eds.; Plenum: New York, 1978; Chapter 4.
(21) McReynolds, W. O. "Gas Chromatographic Retention Data"; Preston Technical Abstracts Co.: IL, 1966.
(22) Massart-Leen, A. M.; De Poorter, H.; Decloedt, M.; Schamp, N. Lipids 1981, 16, 286-292.
(23) Nie, N. H.; Hull, C. H.; Jenkins, J. G.; Steinbrenner, K.; Bent, D. H. "SPSS: Statistical Package for the Social Sciences", 2nd ed.; McGraw-Hill: New York, 1975.
RECEIVED for review February 3, 1981. Accepted June 4, 1981.
Anal. Chem. 1981, 53, 1993-1997

Effects of Solvent Composition in the Normal-Phase Liquid Chromatography of Alkylphenols and Naphthols
Robert J. Hurtubise* and Anwar Hussein
Chemistry Department, The University of Wyoming, Laramie, Wyoming 82071
Howard F. Silver
Chemical Engineering Department, The University of Wyoming, Laramie, Wyoming 82071
The normal-phase liquid chromatographic models of Scott, Snyder, and Soczewinski were considered for a μBondapak NH2 stationary phase. n-Heptane:2-propanol and n-heptane:ethyl acetate mobile phases of different compositions were used. Linear relationships were obtained from graphs of log k' vs. log mole fraction of the strong solvent for both n-heptane:2-propanol and n-heptane:ethyl acetate mobile phases. A linear relationship was obtained between the reciprocal of corrected retention volume and % wt/v of 2-propanol, but not between the reciprocal of corrected retention volume and % wt/v of ethyl acetate. The slopes and intercept terms from the Snyder and Soczewinski models were found to describe interactions with μBondapak NH2 approximately. Capacity factors can be predicted for the compounds by using the equations obtained from mobile phase composition variation experiments.
The effects of mobile phase composition in normal-phase high-performance liquid chromatography (HPLC) are important both theoretically and practically. Snyder and Kirkland (1) have considered many aspects of mobile phase composition in liquid chromatography, and Snyder (2, 3) and Snyder and Poppe (4) have discussed the role of the solvent in liquid-solid chromatography. Scott and co-workers (5-10) have studied solute interactions with mobile and stationary phases in liquid-solid chromatography, with emphasis on silica gel. Soczewinski (11) and Soczewinski and Golkiewicz (12, 13) developed a molecular model for adsorption chromatography on silica gel in which adsorption from solution was considered the result of competition between the solute and the solvent for active sites on the stationary phase surface. Snyder (2) and Snyder and Poppe (4) have discussed at considerable length the various aspects of the Snyder, Soczewinski, and Scott models. Recently, Soczewinski (14) commented on the relationships between his model, the Scott and Kucera (5) model, and the Snyder (3) model, and concluded that all three models for silica gel yield equivalent results in routine chromatographic work with silica gel.
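The linear relation between log k' and the log mole fraction $X_B$ of the strong solvent, reported in the abstract, is what makes k' predictable from a few composition-variation runs. Below is a minimal sketch, assuming the straight-line form log k' = c + m log X_B. The calibration points are hypothetical values generated from an assumed line, not data from this study, and the function names are illustrative.

```python
import numpy as np


def fit_log_k_vs_log_x(x_strong, k_prime):
    """Least-squares fit of log k' = intercept + slope * log X_B."""
    slope, intercept = np.polyfit(np.log10(x_strong), np.log10(k_prime), 1)
    return slope, intercept


def predict_k_prime(x_strong, slope, intercept):
    """Capacity factor predicted from the fitted straight line."""
    return 10.0 ** (intercept + slope * np.log10(x_strong))


# Hypothetical calibration runs: mole fraction of the strong solvent and k'.
x_b = np.array([0.02, 0.05, 0.10, 0.20])
k = 10.0 ** (-0.5 - 1.2 * np.log10(x_b))   # generated from an assumed line

slope, intercept = fit_log_k_vs_log_x(x_b, k)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")  # slope = -1.20, intercept = -0.50
print(f"predicted k' at X_B = 0.08: {predict_k_prime(0.08, slope, intercept):.2f}")
```

With real measurements the points scatter about the line, and the quality of the fit indicates how well the log-log model holds for a given solute and mobile phase.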
0003-2700/81/0353-1993$01.25/0 © 1981 American Chemical Society