
Anal. Chem. 1985, 57, 2770-2773


Prediction of Gas Chromatographic Retention Indexes of Selected Olefins

R. H. Rohrbaugh and P. C. Jurs*

Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802

Gas chromatographic retention indexes for olefins can be calculated by using descriptors based on chemical structure. The retention indexes of 86 olefins were regressed against calculated descriptors, and a four-variable equation was found with a multiple correlation coefficient of 0.997. A separately derived equation with two variables and a correlation coefficient of 0.998 was also calculated. In both cases, the descriptors chosen were consistent with theory.

Olefinic hydrocarbons comprise an important class of compounds to be analyzed in the fuel refining industry. Currently, gas chromatography is the method of choice for analyzing samples containing olefins. In gas chromatography, Kovats retention indexes are widely used for identifying compounds. The Kovats retention index is based on the retention of a given compound relative to a standard set of hydrocarbons. The retention index is given by the following equation:

$$I_A = 100N + 100\,\frac{\log t_A - \log t_N}{\log t_{N+1} - \log t_N} \qquad (1)$$

where $I_A$ is the retention index of compound A, $t_A$ is the corrected retention time of compound A, $t_N$ is the corrected retention time of the n-alkane standard with carbon number $N$, and $t_{N+1}$ is the corrected retention time of the n-alkane with carbon number $N + 1$. $N$ is chosen such that it is the carbon number of the standard n-alkane which elutes immediately prior to compound A.

Retention is a phenomenon that is mainly dependent on the solute-stationary phase interactions. Ideally, each solute will exhibit unique retention characteristics based on its chemical, structural, and electronic properties. The prediction of retention based on observable or calculable properties has been widely reported for many chemical compound classes (1-8). Recently, Lubeck and Sutton (9) reported the retention indexes of 86 selected olefins on three column types. This study describes the development of structure-based descriptors for these 86 compounds and the subsequent regression of these descriptors against the retention indexes.

EXPERIMENTAL SECTION
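Because every value in the data set is a Kovats index, a short numerical sketch of eq 1 may help fix ideas before the procedure is described. The function below assumes the isothermal, logarithmic form of the index. The function name and the retention times are hypothetical and are not taken from the study.

```python
import math


def kovats_index(t_a, t_n, t_n1, n):
    """Isothermal Kovats index from corrected retention times.

    t_a  : corrected retention time of compound A
    t_n  : corrected retention time of the n-alkane with carbon number n
    t_n1 : corrected retention time of the n-alkane with carbon number n + 1
    """
    # The alkane with carbon number n must elute at or before A, and the next one after A.
    if not (0 < t_n <= t_a < t_n1):
        raise ValueError("compound must elute between the two bracketing n-alkanes")
    log_span = math.log10(t_n1) - math.log10(t_n)
    return 100 * n + 100 * (math.log10(t_a) - math.log10(t_n)) / log_span


# Hypothetical corrected retention times (arbitrary units), chosen for easy arithmetic.
print(round(kovats_index(t_a=2.0, t_n=1.0, t_n1=4.0, n=5), 1))
```

With these made-up times the compound sits halfway between the C5 and C6 alkanes on the logarithmic scale, so the index falls halfway between 500 and 600.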

The procedure used for studying the structure-retention relationships consists of three stages: (1) entry and storage of structures and associated information, (2) generation of molecular descriptors, and (3) multiple linear regression analysis. All the work was performed on the chemistry department PRIME 750 computer using the ADAPT computer software system (10, 11).

Data Set. In order to perform this type of structure-property study, one must have access to a set of known compounds and the corresponding measured properties. For this study, the data set consisted of 86 selected olefins and their Kovats retention indexes as reported by Lubeck and Sutton (9). The olefins ranged from four- to nine-carbon compounds and included straight-chain, branched, and ring compounds. The compounds are listed in Table I and the associated retention indexes in Table II of ref 9. The retention indexes were determined on three column types using a Hewlett-Packard 5880 gas chromatograph. The column types and specifications are summarized in Table II (as reported in ref 9). Hydrogen carrier gas was used at a linear velocity of 45 cm/s. Column temperature was held constant at 40 °C. The reported retention indexes were calculated according to eq 1. The authors report a reproducibility of 0.1 retention index units.

Structure Entry. The structures of the olefins were entered into the ADAPT system using computer graphics. Each structure was stored as a connection table containing atom types, bond types, and bond connections. Each molecule was then modeled by using molecular mechanics methods.

Table I. Set of Olefins Studied

no. compound                      bp ref
1   2-methylpropene^a             13
2   1-butene^a                    12
3   trans-2-butene^b              12
4   cis-2-butene^b                12
5   3-methyl-1-butene^b           12
6   1-pentene                     12
7   2-methyl-1-butene^b           12
8   trans-2-pentene               12
9   3,3-dimethyl-1-butene         12
10  cis-2-pentene                 12
11  2-methyl-2-butene             12
12  cyclopentene                  12
13  4-methyl-1-pentene            12
14  3-methyl-1-pentene            12
15  2,3-dimethyl-1-butene         12
16  4-methyl-cis-2-pentene        12
17  4-methyl-trans-2-pentene      12
18  2-methyl-1-pentene            12
19  1-hexene                      12
20  2-ethyl-1-butene              12
21  trans-3-hexene                12
22  cis-3-hexene                  12
23  trans-2-hexene                12
24  2-methyl-2-pentene            12
25  3-methylcyclopentene          12
26  3-methyl-cis-2-pentene        12
27  4,4-dimethyl-1-pentene        12
28  cis-2-hexene                  12
29  3-methyl-trans-2-pentene      12
30  4,4-dimethyl-trans-2-pentene  12
31  2,3-dimethyl-2-butene         12
32  3,3-dimethyl-1-pentene        12
33  2,3,3-trimethyl-1-butene      13
34  3,4-dimethyl-1-pentene        13
35  4,4-dimethyl-cis-2-pentene    13
36  2,4-dimethyl-1-pentene        12
37  1-methylcyclopentene          12
38  3-methyl-1-hexene             12
39  2-methyl-cis-3-hexene         13
40  3-ethyl-1-pentene             12
41  2,4-dimethyl-2-pentene        12
42  5-methyl-1-hexene             13
43  2,3-dimethyl-1-pentene        12
44  2-methyl-trans-3-hexene       13
45  4-methyl-1-hexene             12
46  4-methyl-trans-2-hexene       13
47  4-methyl-cis-2-hexene         13
48  2-ethyl-3-methyl-1-butene     12
49  5-methyl-trans-2-hexene       13
50  cyclohexene                   12
51  3,4-dimethyl-cis-2-pentene    12
52  5-methyl-cis-2-hexene         13
53  2-methyl-1-hexene             12
54  3,4-dimethyl-trans-2-pentene  12
55  1-heptene                     12
56  2-ethyl-1-pentene             12
57  3-methyl-cis-3-hexene         13
58  trans-3-heptene               12
59  cis-3-heptene                 12
60  3-methyl-cis-2-hexene         13
61  2-methyl-2-hexene             12
62  3-methyl-trans-3-hexene       13
63  trans-2-heptene               12
64  3-ethyl-2-pentene             12
65  3-methyl-trans-2-hexene       13
66  cis-2-heptene                 12
67  2,3-dimethyl-2-pentene        12
68  3-ethylcyclopentene           12
69  3-methylcyclohexene           12
70  4-methylcyclohexene           12
71  3,4-dimethyl-1-hexene^c       13
72  3,4-dimethyl-1-hexene^c       13
73  2,3-dimethyl-1-hexene         13
74  1-ethylcyclopentene           12
75  2,5-dimethyl-2-hexene         12
76  1-methylcyclohexene           12
77  2-methyl-1-heptene            12
78  1-octene                      12
79  trans-4-octene                12
80  trans-3-octene^b              12
81  cis-4-octene                  12
82  cis-3-octene                  12
83  2-methyl-2-heptene            12
84  trans-2-octene                12
85  cis-2-octene                  12
86  1-nonene                      12

^a No retention index data for DB-1 and DB-5. ^b No retention index data for DB-5. ^c Stereoisomers.

Table II. Experimental Parameters for Measurement of Retention Indexes

source^a             J&W     H-P     J&W
type^b               DB-1    PONA    DB-5
film thickness, μm   0.25    0.5     0.25
column i.d., mm      0.264   0.21    0.252
column length, m     60      50      60

^a Columns obtained from J&W Scientific and Hewlett-Packard. ^b DB-1 is a cross-linked and bonded methyl silicone phase. DB-5 is similar to DB-1 with 5% phenyl substitution. PONA is a cross-linked methyl siloxane phase.

Descriptor Generation. Molecular descriptors are numerical values derived from molecular structure. The purpose of these descriptors is to quantitatively represent a molecule's properties. The descriptors chosen depend on the specific structure-property relation being examined. In this case, the descriptors chosen consisted of physical property, topological, geometrical, and electronic descriptors.

Physical Properties. Two descriptors of this type were used in the study. First, observed boiling points were obtained from the literature for the compounds in the data set (12, 13). In several instances, cis/trans isomerism existed and only one boiling point was reported. In such cases, the same boiling point value was used for both compounds. The bp ref column in Table I lists the references from which each compound's boiling point was obtained. Second, the logarithm of the 1-octanol/water partition coefficient (log P) of each compound was calculated. These values were computed by use of the additivity rules of Hansch and Leo (14). The method involves summing fragment constants associated with certain molecular fragments and bonding types within a molecule.

Topological Properties. The topological descriptors used include fragment and molecular connectivity values. These descriptors were calculated from the stored connection tables. The fragment descriptors are counts of all atoms, atoms of each type, all bonds, bonds of each type, rings, and molecular weight. Molecular connectivity was first described by Randic (15) as a method to describe a molecule's branching. The method is based on simple graph theory where the atoms of the molecule are treated as nodes and the bonds are represented by edges. Path-one connectivity is simply the sum of all paths of length one for the entire molecule. Molecular connectivity indexes have been shown to correlate with several physicochemical properties (16).

Geometric Descriptors. Geometric descriptors are values that are calculated from the modeled three-dimensional coordinates. The only such descriptors used in this study are the principal moments of inertia. These values are calculated from the moment of inertia tensor (17). Six descriptors representing the three moments were calculated: the x, y, and z moments, and the ratios x/y, x/z, and y/z.
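The path-one connectivity described under Topological Properties can be illustrated with a minimal sketch. It assumes the simple Randic form, in which each bond of the hydrogen-suppressed skeleton contributes 1/sqrt(d_i d_j), where d is the number of skeletal neighbors of an atom. Bond order is ignored, so this is not the "corrected" variant, and the function name and edge-list input are illustrative rather than the ADAPT implementation.

```python
from math import sqrt


def path_one_connectivity(bonds):
    """Simple path-one connectivity of a hydrogen-suppressed graph.

    bonds is a list of (i, j) pairs of atom labels; each bond adds
    1 / sqrt(degree_i * degree_j) to the index.
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in bonds)


# Carbon skeletons as edge lists (atom labels are arbitrary integers).
one_butene = [(1, 2), (2, 3), (3, 4)]      # unbranched C4 chain
methylpropene = [(1, 2), (2, 3), (2, 4)]   # branched C4 skeleton

print(round(path_one_connectivity(one_butene), 3))
print(round(path_one_connectivity(methylpropene), 3))
```

The branched skeleton gives the smaller value, which is the sense in which the index encodes branching.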


ANALYTICAL CHEMISTRY, VOL. 57, NO. 14, DECEMBER 1985

Table III. Descriptors Used in the Linear Regression Analysis

total σ charge
charge separation
electron density
molecular weight
x moment of inertia
path one molecular connectivity
path one molecular connectivity (corrected)
log P (octanol/water)
molecular refractivity (MR)
all paths total count
all paths total count/total atoms
boiling points

Table IV. Summary of the Coefficients and Statistics for the Six Predictive Equations

Leaps and bounds equations
          PONA            DB-1            DB-5
bp        3.83 ± 0.06     3.82 ± 0.06     3.94 ± 0.06
1/MR      3500 ± 343      3300 ± 345      3700 ± 378
const     232             239             217
n         86              84              79
r         0.998           0.998           0.997
S         6.67            6.53            6.43
F         9555            8342            6120

Stepwise multiple linear regression equations
          PONA            DB-1            DB-5
bp        3.56 ± 0.10     3.66 ± 0.10     4.00 ± 0.09
log P     -6.73 ± 3.1     -4.5 ± 3.0
x mom.    0.033 ± 0.007   0.028 ± 0.007   0.014 ± 0.007
mol wt    -0.85 ± 0.30    -1.01 ± 0.29    -1.53 ± 0.19
const     457             457             470
n         86              84              79
r         0.997           0.997           0.996
S         7.78            7.40            7.19
F         3509            3233            3266

Electronic Descriptors. Four electronic descriptors were calculated for this study. Three of these values were σ charge based descriptors. These include the most negative σ charge on a single atom in the molecule, the interatomic distance between the most positive and the most negative σ charge in the molecule, and the sum of the absolute values of all atomic σ charges in the structure. The fourth value calculated was the molecular refractivity (MR) or electronic polarizability. This was achieved by summing the appropriate atomic and structural constants for a given molecule (18).
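The three σ charge based descriptors named under Electronic Descriptors reduce to simple operations once per-atom charges and coordinates are available. The sketch below assumes those inputs are already computed; the charges, coordinates, and function name are hypothetical, and the method used to obtain the charges is not reproduced here.

```python
from math import dist


def sigma_charge_descriptors(charges, coords):
    """Return (most negative charge, charge separation, total absolute charge).

    charges : per-atom partial charges
    coords  : per-atom (x, y, z) positions in the same order as charges
    """
    most_negative = min(charges)
    i_neg = charges.index(most_negative)
    i_pos = charges.index(max(charges))
    # Distance between the most positive and the most negative atoms.
    separation = dist(coords[i_pos], coords[i_neg])
    total_abs = sum(abs(q) for q in charges)
    return most_negative, separation, total_abs


# Made-up three-atom example, for illustration only.
charges = [-0.10, 0.04, 0.06]
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(sigma_charge_descriptors(charges, coords))
```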

REGRESSION ANALYSIS

A total of 33 descriptors were generated for each compound in the data set. Many of these descriptors encoded similar information about the molecules of interest (i.e., they were highly correlated). Variables that exhibit high degrees of collinearity can often affect the efficiency of the linear regression. It is therefore desirable to test each descriptor and eliminate those with multicollinear relationships and high correlation coefficients. Each descriptor was regressed against all remaining descriptors. On the basis of these tests, many of the original descriptors were eliminated from further analysis. The 12 descriptors used for the final analysis are listed in Table III. Correlations among these descriptors are larger than desirable, but they were found to best represent the compounds' retention properties. These variables were subjected to multiple linear regression analysis (19) and the leaps and bounds method as described by Furnival and Wilson (20).
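The idea of screening out redundant descriptors can be sketched in a simplified form. Note the substitution: the study regressed each descriptor against all remaining descriptors, whereas the sketch below only checks pairwise Pearson correlations, which is a cruder test. The threshold of 0.95, the descriptor values, and the function names are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_descriptors(table, max_abs_r=0.95):
    """Walk descriptors in order; keep one only if |r| with every kept one is small enough."""
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor values for five compounds.
descriptors = {
    "carbon_count": [4, 5, 6, 7, 8],
    "mol_wt": [56.1, 70.1, 84.2, 98.2, 112.2],  # nearly collinear with carbon_count
    "ring_count": [0, 1, 0, 1, 0],
}
print(screen_descriptors(descriptors))
```

Because the procedure is greedy, the result depends on the order in which descriptors are listed; a fuller treatment would compare alternatives, as the best-subset search cited in the text does.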

RESULTS AND DISCUSSION

The best equation developed by stepwise multiple linear regression for the Hewlett-Packard PONA column is

$$I = (3.56 \pm 0.10)(\mathrm{bp}) - (6.73 \pm 3.0)(\log P) + (0.033 \pm 0.007)(x\ \mathrm{moment}) - (0.85 \pm 0.30)(\mathrm{mol\ wt}) + 457 \qquad (2)$$

n = 86, S = 7.79, r = 0.997, F(4, 86) = 3509

The variables are listed in the order they were selected. Also listed are the 95% confidence limits for the coefficients. The standard error for the estimated values is 7.79 retention units. This represents an error of approximately 1.2% for the observed range of retention index values of 389-883. The multiple correlation coefficient was 0.997 and the overall F value was 3509. As these values indicate, the equation represents an excellent fit of the experimental values. Identical procedures were followed in studying the DB-1 and DB-5 columns. The results of these analyses are given in Table IV.

Figure 1. Predicted Kovats retention indexes vs. experimentally determined retention indexes for the H-P PONA column.

The multiple linear regression analysis related the retention behavior of small olefins to four structure-based properties: boiling point, molecular weight, x moment of inertia, and log P. For all three stationary phases the boiling point was found to be the major contributor to the high correlation. This relationship is expected from theory and has been reported previously (21-24). The carbon number of a compound has also been related to retention (25, 26). Again, the results of the study are consistent with theory since, for a homologous series of hydrocarbons, the molecular weight is directly related to carbon number. The x moment of inertia, a measure of the extent of the molecule along its major axis, when combined with the carbon number is an indicator of the size of the molecule. Molecular size is related to the electronic polarizability. Solute-stationary phase interactions of compounds with small permanent dipoles (e.g., alkenes) are mainly dispersive, and the energy of these interactions is dependent on the electronic polarizability. Therefore, the x moment can be thought of as encoding information about the interactions between the solute and stationary phase. Finally, log P is included in the equation. log P is important since it is a measure of a solute's partitioning abilities.

A recent paper by Bermejo and Guillen (27) reports a predictive equation for alkanes based on boiling point and molar refractivity. A similar equation for olefins was developed by using the leaps and bounds regression

$$I = (3.83 \pm 0.06)(\mathrm{bp}) + (3500 \pm 343)(1/\mathrm{MR}) + 232 \qquad (3)$$

n = 86, S = 6.67, r = 0.998, F(4, 86) = 9555
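To show how eq 2 and eq 3 are applied, the sketch below evaluates both with their central coefficient values for the PONA column. The descriptor values are hypothetical round numbers chosen so the arithmetic is easy to follow, the function names are illustrative, and the units of each descriptor are assumed to match those used when the equations were fitted, which the text does not spell out.

```python
def predict_eq2(bp, log_p, x_moment, mol_wt):
    """Point estimate from eq 2 (stepwise regression, PONA column)."""
    return 3.56 * bp - 6.73 * log_p + 0.033 * x_moment - 0.85 * mol_wt + 457.0


def predict_eq3(bp, mr):
    """Point estimate from eq 3 (leaps and bounds, PONA column)."""
    return 3.83 * bp + 3500.0 / mr + 232.0


# Hypothetical inputs, not data from the study.
i_eq2 = predict_eq2(bp=100.0, log_p=3.0, x_moment=100.0, mol_wt=100.0)
i_eq3 = predict_eq3(bp=100.0, mr=35.0)
print(round(i_eq2, 2), round(i_eq3, 2))
```

For eq 3 the two terms contribute 383 and 100 index units on top of the constant 232, which makes the dominance of the boiling point term easy to see.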

The leaps and bounds method yielded a better equation, both in terms of ability to fit the data and the number of parameters needed. Figure 1 is a plot of the predicted retention indexes vs. the experimental values. The line indicates a theoretical perfect fit. The residuals, the difference between the actual value and the predicted value, were also plotted vs. the experimental values (Figure 2). The plot shows no systematic dependence of the residuals on the retention index values. Also, a visual inspection of the structures of the ten best-predicted and ten worst-predicted olefins showed no dependence of the residuals on structure. Identical procedures were followed for predicting the retention index values on both the DB-1 and DB-5 columns. The results for these analyses are summarized in Table IV. The parameters used in the equation were boiling point and molar refractivity.

Figure 2. Residuals vs. experimentally determined retention indexes for the H-P PONA column.

A question may arise as to whether eq 2 and 3 are consistent with each other. A comparison of the two equations shows that both contain boiling point as the major component. In addition to boiling point, the multiple linear regression equation contains molecular weight, x moment, and log P. The leaps and bounds equation, on the other hand, contains only a molar refractivity term. In both cases, the additional terms are basically bulk property descriptors. The molar refractivity, as is the case with molecular weight and x moment, is related to the electronic polarizability and therefore to the solute-stationary phase interactions (28).

Although prediction is the most desirable form of validation, no prediction set was available. Therefore, another method was used to test the validity of the equations. The data set was randomly divided into two nearly equivalent subsets. Next, regression equations were fit based on each half independently. In each case, the remaining half was used to test the equation. The equations obtained were able to predict the I values with correlation coefficients of 0.997.

CONCLUSIONS

Retention indexes for 86 olefins with four to nine carbon atoms have been fit with model equations based on molecular structure descriptors. Boiling point, molar refractivity, log P, x moment of inertia, and molecular weight were chosen by the regression analysis programs as the most important variables. The equations are statistically strong and fit the data very well.

LITERATURE CITED

(1) Cohen, A. S.; Grushka, E. J. Chromatogr. 1985, 318, 221.
(2) Saura-Calixto, F.; Garcia-Raso, A.; Garcia-Raso, J. J. Chromatogr. 1985, 322, 35.
(3) Peetre, I.; Ellren, O.; Smith, B. E. F. J. Chromatogr. 1985, 318, 41.
(4) Jinno, K.; Kawasaki, K. J. Chromatogr. 1984, 316, 1.
(5) Sabljic, A. J. Chromatogr. 1985, 319, 1.
(6) Buydens, L.; Massart, D.; Geerlings, P. Anal. Chem. 1983, 55, 738.
(7) Hale, M. D.; Hileman, F. D.; Mazer, T.; Shell, T.; Nobel, R.; Brooks, J. Anal. Chem. 1985, 57, 840.
(8) Whalen-Pedersen, E. K.; Jurs, P. C. Anal. Chem. 1981, 53, 2184.
(9) Lubeck, A. J.; Sutton, D. L. HRC CC, J. High Resolut. Chromatogr. Chromatogr. Commun. 1984, 7, 4542.
(10) Jurs, P. C.; Chou, J. T.; Yuan, M. In "Computer-Assisted Drug Design"; Olson, E. C., Christoffersen, R. E., Eds.; American Chemical Society: Washington, DC, 1979; pp 103-129.
(11) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. "Computer Assisted Studies of Chemical Structure and Biological Function"; Wiley-Interscience: New York, 1979.
(12) Weast, R. C., Ed. "CRC Handbook of Chemistry and Physics", 61st ed.; CRC Press: Boca Raton, FL, 1980.
(13) Ferris, S. W. "Handbook of Hydrocarbons"; Academic Press: New York, 1955; pp 143-266.
(14) Hansch, C.; Leo, A. "Substituent Constants for Correlation Analysis in Chemistry and Biology"; Wiley-Interscience: New York, 1979; pp 18-43.
(15) Randic, M. J. Am. Chem. Soc. 1975, 97, 8604.
(16) Kier, L. B.; Hall, L. H. "Molecular Connectivity in Chemistry and Drug Research"; Academic Press: New York, 1976.
(17) Goldstein, H. "Classical Mechanics"; Addison-Wesley: Cambridge, MA, 1950; pp 146-156.
(18) Vogel, A. I. "Elementary Practical Organic Chemistry; Part 2: Qualitative Organic Analysis"; Wiley: New York, 1966; pp 24-25.
(19) Draper, N.; Smith, H. "Applied Regression Analysis", 2nd ed.; Wiley-Interscience: New York, 1981; pp 307-312.
(20) Furnival, G. M.; Wilson, R. W., Jr. Technometrics 1974, 16, 499.
(21) Sojak, L.; Krupcik, J.; Rijks, J. Chromatographia 1974, 7, 26.
(22) Baumann, F.; Straus, A. E.; Johnson, J. F. J. Chromatogr. 1985, 20, 1.
(23) Sojak, L.; Hrivnak, J.; Simkovicova, J.; Janak, J. J. Chromatogr. 1972, 71, 243.
(24) Saura-Calixto, F.; Garcia-Raso, A.; Canellas, J.; Garcia-Raso, J. J. Chromatogr. Sci. 1983, 21, 267.
(25) Pierotti, G. J.; Deal, C. H.; Derr, E. L.; Porter, P. E. J. Am. Chem. Soc. 1956, 76, 2829.
(26) Toth, A.; Zala, E. J. Chromatogr. 1984, 298, 381.
(27) Bermejo, J.; Guillen, M. D. HRC CC, J. High Resolut. Chromatogr. Chromatogr. Commun. 1984, 7, 191.
(28) Bermejo, J.; Canga, J. S.; Gayol, O. M.; Guillen, M. D. J. Chromatogr. Sci. 1984, 22, 252.


RECEIVED for review June 4, 1985. Accepted August 5, 1985. This work was supported by the National Science Foundation under Grant CHE-8202620. The PRIME 750 computer was purchased with partial financial support of the National Science Foundation.