List operations on chemical graphs. 3. Development of vertex and

Mar 1, 1993 - Development of vertex and edge models for fitting retention index data. C. Duvenbeck and P. Zinn. J. Chem. Inf. Comput. Sci. , 1993, 33 ...
J. Chem. Inf. Comput. Sci. 1993, 33, 211-219


List Operations on Chemical Graphs. 3. Development of Vertex and Edge Models for Fitting Retention Index Data

Ch. Duvenbeck and P. Zinn
Lehrstuhl für Analytische Chemie, Ruhr-Universität Bochum, Postfach 102148, D-4630 Bochum 1, Germany

Received June 12, 1992

A general method for fitting gas chromatographic retention index data is presented. The method is based on vertex and edge models developed from graph theory. The resulting multilinear regression models use atomic-level topological indexes, such as the atomic-level molecular connectivity index or the electrotopological index. Fitting a standard data set of 217 acyclic and cyclic alkanes, alkenes, alcohols, esters, ketones, and ethers with the edge models gave mean absolute errors of 7 to 9 retention index units. The models can be used to predict retention index data of compounds that incorporate the same bond types as those of the standard data set.

INTRODUCTION

The estimation of chromatographic retention index data is a widely investigated topic in Quantitative Structure-Property Relationship (QSPR) studies. Gas chromatographic retention indexes measured on nonpolar stationary phases are an especially well-suited object of QSPR studies. In contrast to other molecular properties, interactions between eluents can largely be neglected. Other reasons are the high precision of index measurement and the large amount of data reported in the literature.1-4 Several aspects of the development of valid estimation models and the significance of model parameters are discussed by Kaliszan.5 Among model parameters, topological indexes are of special interest because they can be derived directly from the molecular structure without any experimental effort.

There are two types of topological indexes that can be used as model parameters. Conventional topological indexes characterize a molecule as a whole. These indexes are the most popular in QSPR studies and other applications of graph theory in chemistry.6 The index developed by Wiener7 is based on the distance matrix of a molecule. Hosoya8 counts the combinations of disconnected bonds, and Randić9 describes indexes for molecular connectivity. Indexes characterizing molecular branching are given by Balaban.10 Kier and Hall11 present topological indexes for molecular shape.

Topological indexes are frequently used in developing estimation models for QSPR studies. Jurs reported on the prediction of boiling points12 and surface tensions13 using a wide spectrum of topological indexes, among other descriptors, as model parameters. Trinajstić14 recently compared topological indexes for the estimation of boiling points. Several authors discuss the estimation of retention index data using topological model parameters.15-20 In most cases, the developed estimation models work well only for a limited class of substances.
Therefore, a more general approach seemed to be a useful supplementary contribution to retention index data estimation.

The second type of topological index describes structure information at the atomic level of a molecule. One of the most interesting indexes of this type is the electrotopological state (E-state), developed by Kier and Hall.21 The E-state can be calculated for each atom in a molecule, including heteroatoms. This index quantifies, for each atom, the electronic interactions of the atoms in a molecular field. It is to be expected that E-state-based estimation models can be applied to different substance classes.

To make retention index estimation more general, incremental models are also of interest. A typical example of an incremental model was introduced by Takács.22-24 Takács' approach is an additive model that sums increments corresponding to the atoms and bonds of a molecule. Gautzsch et al.25 showed that the application of this method is limited by the proliferation of bond types. Another, more practical incremental model is described by Herndon26 for the estimation of enthalpic properties of alkanes. Following Herndon, we have developed regression models that consider all atoms of a molecule, the so-called vertex models, and models that consider all bonds, the so-called edge models. The regression parameters used are based on atomic-level topological indexes.

To derive atomic-level-based descriptors and to build the descriptor matrix, we used the list operating system described in ref 27. The calculation of topological indexes is a typical application of list operations on chemical graphs.28 As an interface to the list operating system, the list structured line notation of ref 27 is used. Table I summarizes the compounds of the calibration data set and their corresponding line notations. The output of the list operating system is a matrix containing topological indexes belonging to the atom or bond types of a molecule. The subsequent multilinear regression analysis was carried out with commercial statistical software.

In the first step, we present the models developed for the retention data set of 71 acyclic and cyclic alkanes measured by Rijks1 on squalane at 50 °C. In the second step, we extend the models in order to investigate more complex compounds such as alkenes, alcohols, ethers, ketones, and esters.
The retention indexes of these compounds, measured on OV1 at 60 °C, were taken from the Sadtler catalogue.2 All compounds and the corresponding retention indexes are listed in Table I. In this paper we discuss the fitting of model-calculated retention indexes versus data from the literature. In a following paper we will demonstrate the predictive power of the models for the retention data estimation of terpenes and related compounds.29

© 1993 American Chemical Society

VERTEX MODELS FOR FITTING RETENTION INDEX DATA

Vertex models are multilinear regression models for the estimation of physicochemical molecular properties. As regression parameters, atomic-level topological indexes are

DUVENBECK AND ZINN

J. Chem. Inf. Comput. Sci., Vol. 33, No. 2, 1993

Table I. Compounds of the Calibration Data Set, Corresponding Retention Indexes from the Literature, I[lit.], and Those Resulting from Model Estimation, I[est.]a

no. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74

compound 2,2-dimethylpropane 2-methylbutane 2,2-dimethylbutane 2,3-dimethylbutane 2-methylpentane 3-methylpentane 2,2-dimethylpentane 2,4-dimethylpentane 2,2,3-trimethylbutane 3,3-dimethylpentane 2-methylhexane 2,3-dimethylpentane 3-ethylpentane 2,2,4-trimethylpentane 2,2-dimethylhexane 2,5-dimethylhexane 2,4-dimethylhexane 2,2,3-trimethylpentane 3,3-dimethylhexane 2,3,4-trimethylpentane 2,3,3-trimethylpentane 2-methylheptane 4-methylheptane 3,4-dimethylhexane 3-methylheptane

2,2,4,4-tetramethylpentane 3-methyl-3-ethylpentane 2,2,5-trimethylhexane 2,2,4-trimethylhexane 2,4,4-trimethylhexane 2,3,5-trimethylhexane 2,2-dimethylheptane

2,2,3,4-tetramethylpentane 2,2,3-trimethylhexane

2,2-dimethyl-3-ethylpentane 3,3-dimethylheptane 2,4-dimethyl-3-ethylpentane 2,3,4-trimethylhexane

2,3,3,4-tetramethylpentane 3-methyloctane 3,3-diethylpentane n-propane n-butane n-pentane n-hexane n-heptane n-octane n-nonane n-decane ethylcyclopropane cyclopentane ethylcyclobutane methylcyclopentane cyclohexane

1,1,2-trimethylcyclopentane 1,1,3-trimethylcyclopentane methylcyclohexane ethylcyclopentane 1,1-dimethylcyclohexane isopropylcyclopentane n-propylcyclopentane ethylcyclohexane 1,1,3-trimethylcyclohexane 1-cis-3-dimethylcyclopentane 1-trans-3-dimethylcyclopentane 1-cis-2-dimethylcyclopentane 1-trans-2-dimethylcyclopentane 1-trans-2-dimethylcyclohexane 1-cis-2-dimethylcyclohexane 1-trans-3-dimethylcyclohexane 1-cis-3-dimethylcyclohexane 2-methyl-2-butanol 1-butanol 3-methyl-2-butanol

I[lit.]   I[est.]: M model, ¹ξ model, S model, R model

412.3 475.3 536.8 567.3 569.7 584.2 625.6 629.8 639.7 658.9 666.6 671.7 686.0 689.9 719.4 728.4 731.9 737.1 743.5 752.4 759.4 764.9 767.2 770.6 772.3 772.7 774.0 776.3 789.1 807.7 812.0 815.4 819.6 821.6 822.2 835.8 836.5 849.1 858.0 870.2 877.2 300 400 500 600 700 800 900 1000 510.2 565.7 621.1 627.9 662.7 763.2 723.6 725.8 733.8 787.0 812.1 830.3 834.3 840.4 682.7 686.8 720.9 689.2 801.8 829.3 805.6 785.0 626.2 646.5 666.0

408.9 465.2 524.5 551.6 565.2 567.7 624.5 630.3 624.6 640.0 665.2 654.1 670.2 689.6 724.5 730.3 732.8 727.2 740.0 740.6 740.1 765.2 767.7 756.7 767.7 748.9 755.5 789.6 792.1 805.1 819.3 824.5 813.6 827.2 829.7 840.0 843.1 843.1 840.3 867.7 871.0 300.0 400.0 500.0 600.0 700.0 800.0 900.0 1000.0 521.6 551.4 621.6 619.1 651.4 794.0 759.0 719.1 721.6 791.3 808.1 821.6 821.6 859.0 686.7 686.7 708.1 708.1 808.1 808.1 786.7 786.7 613.0 653.6 656.3

421.3 475.1 531.2 556.7 568.3 574.8 625.3 630.4 629.3 645.0 668.3 657.9 681.9 686.0 725.3 736.5 730.1 731.1 739.1 741.0 742.9 768.3 761.1 759.2 767.9 742.1 762.7 793.5 785.7 799.8 813.3 825.3 814.0 824.3 840.5 839.1 851.3 842.3 839.8 867.9 884.4 300.0 400.0 500.0 600.0 700.0 800.0 900.0 1000.0 528.9 560.6 628.9 621.7 660.6 792.5 747.3 721.7 728.9 793.7 813.6 822.0 828.9 847.3 676.1 676.7 706.1 706.1 806.1 806.1 776.7 776.7 621.8 654.6 665.3

423.8 473.9 533.7 562.0 572.0 573.7 631.1 634.8 631.8 644.3 670.7 662.2 673.4 691.4 729.2 733.5 734.5 732.7 742.2 747.9 743.6 769.6 770.5 762.7 770.9 744.7 755.6 790.4 790.9 802.2 821.2 827.6 814.5 830.3 833.6 840.5 848.6 848.6 838.5 870.0 867.6 308.9 408.2 507.3 606.5 705.9 805.4 905.0 1004.6 520.8 553.3 620.0 619.6 653.1 801.6 756.6 718.6 719.2 791.4 810.6 818.4 818.7 853.5 684.6 684.6 713.0 713.0 811.2 811.2 782.8 782.8 626.3 657.7 656.9

409.7 469.0 529.3 554.2 566.9 573.8 626.7 627.1 627.1 645.0 666.4 658.2 674.7 686.7 726.2 727.3 731.9 731.1 742.5 742.4 742.7 765.8 769.6 762.1 771.1 746.1 757.0 787.1 791.4 802.4 816.3 825.6 815.5 829.1 831.4 841.9 841.8 846.4 840.7 870.6 865.2 304.1 406.5 506.0 605.5 704.9 804.4 903.8 1003.3 523.0 556.2 622.4 620.9 655.7 794.0 755.0 720.4 721.9 790.7 805.5 819.8 821.4 854.4 684.9 684.9 709.8 709.8 809.3 809.3 784.4 784.4 608.6 653.9 647.8

list structured line notation

(C C C C C C C C C) (C C C C C C C C C C) (C C C 1 C C 1) (C 1 C C C C 1) (C C C 1 C C C 1) (C C 1 C C C C 1) (C 1 C C C C C 1) (C C 1 (C) C (C) C C C 1) (C C 1 (C) C C (C) C C 1) (C C 1 C C C C C 1) (C C C 1 C C C C 1)

(C C 1 (C) C C C C C 1) (C C (C) C 1 C C C C 1) (C 1 (C C C) C C C C 1) (C C C 1 C C C C C 1) (C C 1 (C) C C (C) C C C 1) (C C 1 C C (C) C C 1) (C C 1 C C (C) C C 1) (C C 1 C (C) C C C 1) (C C 1 C (C) C C C 1) (C C 1 C (C) C C C C 1) (C C 1 C (C) C C C C 1) (C C 1 C C (C) C C C 1) (C C 1 C C (C) C C C 1)

Table I (Continued)

no. 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149

compound 2-pentanol 2-methyl-2-pentanol 3-methyl-1-butanol 4-methyl-2-pentanol 1-pentanol 2-methyl-3-pentanol 2,4-dimethyl-2-pentanol 3,3-dimethyl-1-butanol 3-hexanol 2-methyl-2-hexanol 2-methyl-1-pentanol 2,4-dimethyl-3-pentanol 4-methyl-1-pentanol 2,3-dimethyl-3-pentanol 2-ethyl-1-butanol 3-methyl-1-pentanol 5-methyl-3-hexanol 3-ethyl-3-pentanol 1-hexanol 4-heptanol

2,2,4-trimethyl-3-pentanol 3,5-dimethyl-3-hexanol 2-methyl-2-heptanol 6-methyl-2-heptanol 4-ethyl-3-hexanol 4-octanol 3-octanol 3,6-dimethyl-3-heptanol 3-methyl-2-butanone 2-pentanone 3-pentanone 3,3-dimethyl-2-butanone 4-methyl-2-pentanone 2-methyl-3-pentanone 4-methyl-3-pentanone 3-methyl-2-pentanone 3-hexanone 2-hexanone

2,4-dimethyl-3-pentanone 5-methyl-3-hexanone 2-methyl-3-hexanone 5-methyl-2-hexanone 4-heptanone 3-heptanone 2-heptanone

2,2,4,4-tetramethyl-3-pentanone 2,6-dimethyl-4-heptanone 2,2-dimethyl-3-heptanone 3-octanone 2-octanone acetic acid, ethyl ester propionic acid, methyl ester isobutyric acid, methyl ester propionic acid, ethyl ester acetic acid, propyl ester butyric acid, methyl ester isobutyric acid, ethyl ester acetic acid, sec-butyl ester acetic acid, isobutyl ester isopentanoic acid, methyl ester butyric acid, ethyl ester propionic acid, propyl ester acetic acid, butyl ester butyric acid, isopropyl ester isopentanoic acid, ethyl ester propionic acid, isobutyl ester butyric acid, propyl ester acetic acid, 1,3-dimethylbutyl ester propionic acid, butyl ester acetic acid, pentyl ester isobutyric acid, isobutyl ester hexanoic acid, methyl ester butyric acid, isobutyl ester acetic acid, 2-ethylbutyl ester butyric acid, butyl ester

I[lit.]   I[est.]: M model, ¹ξ model, S model, R model

682.7 717.6 719.3 744.1 750.4 758.0 775.9 778.8 780.4 817.3 818.4 821.2 821.2 823.7 825.9 828.8 838.2 843.1 853.0 875.4 881.5 883.1 916.4 951.1 953.3 975.5 982.0 986.6 640.9 666.3 676.4 693.1 721.2 733.0 733.0 734.8 764.8 767.9 779.0 816.7 820.0 836.5 853.4 865.8 868.7 900.0 954.7 964.7 966.0 968.8 600.0 615.2 671.0 694.2 696.3 705.6 744.6 743.8 757.7 761.3 784.0 792.6 796.2 827.6 838.4 852.8 881.5 885.1 891.4 896.4 900.0 907.0 940.3 957.0 979.4

669.8 713.0 718.7 735.0 753.6 758.8 778.1 778.0 772.4 813.0 821.2 845.3 818.7 828.7 823.8 821.2 837.5 844.0 853.6 872.4 918.3 893.6 913.0 935.0 963.9 972.4 972.4 993.6 630.9 666.8 659.4 685.6 732.0 723.5 723.5 733.4 759.4 766.8 787.6 824.6 823.5 832.0 859.4 859.4 866.8 897.0 989.7 978.2 959.4 966.8 591.4 612.5 676.6 684.0 691.4 712.5 748.2 726.3 756.6 777.6 784.0 784.0 791.4 816.3 849.2 849.2 884.0 891.4 884.0 891.4 913.3 912.5 949.2 961.7 984.0

680.8 715.8 722.8 743.0 755.6 757.7 776.6 779.9 771.6 815.8 807.9 836.8 822.8 825.4 821.9 822.5 833.8 842.0 854.6 864.7 907.4 884.8 915.8 949.1 967.7 964.7 971.6 992.3 631.0 667.8 662.1 685.5 731.8 724.2 724.2 736.1 759.3 767.8 786.8 823.3 821.5 836.0 856.6 859.3 867.8 897.9 984.5 976.8 959.3 967.8 596.6 612.2 676.4 692.8 687.0 709.4 757.1 732.1 751.0 773.4 790.1 783.3 787.0 823.4 854.0 847.3 880.5 887.4 883.3 887.0 911.5 909.4 944.5 957.9 980.5

682.9 721.2 720.4 743.1 755.1 760.6 779.0 773.0 786.4 817.2 821.0 828.6 817.7 801.3 824.7 820.1 846.3 850.8 853.0 882.2 863.8 891.1 914.0 938.0 961.7 978.9 980.5 989.0

659.1 706.1 714.8 719.3 753.4 766.8 766.0 774.6 778.0 805.5 827.9 852.5 814.3 833.9 830.9 819.6 839.2 867.8 852.8 876.9 922.2 897.5 905.0 918.9 971.0 976.3 977.9 997.9 628.2 661.1 662.9 681.1 722.7 729.5 729.5 734.8 764.0 760.5 795.4 825.6 830.6 821.4 865.1 863.4 860.0 897.0 988.3 981.6 962.9 959.4 579.3 615.1 684.5 685.4 683.1 716.2 754.7 726.4 747.6 777.8 786.5 789.2 782.6 822.6 848.1 853.7 890.3 884.6 888.6 882.0 923.0 915.1 954.8 953.4 989.7

637.8 672.6 671.3 678.8 731.5 729.0 729.0 736.4 764.7 767.5 784.5 823.0 821.8 827.9 858.3 859.7 863.6 857.4 973.1 958.3 955.9 960.6 614.6 622.4 675.9 701.8 703.5 711.4 754.3 739.7 760.0 767.4 790.9 790.8 795.6 819.4 846.3 846.8 880.0 886.8 882.8 889.6 897.9 897.4 935.7 951.8 972.1


list structured line notation (C c (0)c c C) (C c (C) (01c c C)

(c c c ( c c) c (ojc c j (C c c c c (0)c c C) (C c c c c c (0) c C) (c c (C) c c c (C) (0)c C) (C c (C) c (=O) C) (C c (=O) c c C) (C c c (=O) CC) (C c (C) (C) c (=O) C) (C c (=O) c c (C) C) (C c c (=O) c (C) C) (C c C(=O) c (C) C) (C c c (C) c (=O) C) c c C) (C c c (=O) (C c (=O) c c c C) c (C) C) (C c (C) c (=O) (C c c (=O) c c (C) C) (C c (C) c (=O) c c C) (C c (=O) c c c (C) C) (C c c c (=O) c c C) (C c c (=O) c c c C) (c c c c c (-0) c (c) (cj cj (CC c (=O) c c c c C) (CC (=O) c c c c c C) (C c (=O) 0 c C) OC) ( C C c (=O) (C c (C) c (=O) 0 C) (C c C(=O) 0 c C) 0 c c C) (C c (=O) (C c c c (=O) 0 C) (C c (C) c (=O) 0 c C) 0 c (C) c C) (C c (=O) 0 c c (C) C) ( C C (=O) 0 C) (C c (C) c c (=O) ( C C c c (=O) 0 c C) (C c c (=O) 0 c c C) 0 c c c C) (C c (=O) (CC c c (=O) 0 c (C) C) (CC (C) c c (-0) 0 c C) ( C C c (=O) o c c (C) C) ( C C c c (-0) 0 c CC) (C c (-0) 0 c (C) c c (C) C) (CC c (=O) o c c c C) ( C C (=O) 0 c c c CC) ( C C (C) c (=O) o c c (C) C) 0 C) (C c c c c c (=O) (CC c C(=O) 0 c c (C) C) (CC (=O) o c c (C C) C C ) ( C C c c (=O) 0 c c c C)


Table I (Continued)

no. 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217

compound   I[lit.]   I[est.]: M model

hexanoic acid, ethyl ester propionic acid, pentyl ester acetic acid, hexyl ester propyl ether butylethyl ether butyl ether 1-octene 2-methyl-3-ethyl-2-pentene 2,3,4-trimethyl-2-pentene 2,5-dimethyl-2-hexene 2,3-dimethyl-1-hexene 2-methyl-3-ethyl-1-pentene

982.9 990.5 996.5 681.6 688.5 876.3 781.2 778.4 765.9 749.9 739.3 735.0 716.8 692.8 715.4 704.3 703.4 697.2 691.2 690.4 687.5 691.2 684.6 681.8 681.8 678.1 678.3 670.6 659.1 657.9 656.7 654.9 650.4 650.0 646.9 644.7 640.6 637.7 636.9 635.5 614.7 626.1 625.1 604.6 603.6 596.9 612.7 602.8 597.8 592.6 592.1 592.0 582.3 580.1 558.8 561.9 556.2 551.4 549.4 514.3 506.8 500.0 504.9 488.0 481.8 450.3 406.6 416.9

984.0 984.0 991.4 691.4 691.4 891.4

990.1 983.3 987.0 684.8 694.4 884.8

786.0 794.8 766.3 760.6 750.5 753.0 707.3 707.3 717.0 708.3 702.2 696.3 695.4 685.8 685.8 687.9 687.9 686.0 676.5 683.9 667.8 667.8 640.6 653.6 660.5 660.5 650.5 651.1 647.5 645.0 651.9 649.0 631.5 623.0 623.0 623.0 609.6 610.4 601.5 601.5 603.7 603.7 595.4 585.8 585.8 576.5 586.0 583.9 548.0 558.0 558.0 545.0 551.1 511.1 507.5 501.5 501.5 483.9 486.0 442.5 417.2 417.2

783.4 802.2 764.9 770.1 744.8 764.3 702.6 702.6 723.8 700.9 701.6 695.2 700.5 683.7 683.7 695.2 695.2 683.4 678.7 680.3 660.9 660.9 646.7 652.7 662.9 662.9 651.7 651.7 661.4 638.5 655.2 644.3 632.4 622.0 622.0 627.5 606.7 610.8 603.0 603.0 599.4 599.4 597.6 580.7 580.7 581.5 583.4 580.3 546.5 554.4 554.4 545.3 552.0 509.2 507.8 500.1 500.1 483.0 483.4 436.8 419.5 419.5

2,2-dimethyl-cis-3-hexene 2,2-dimethyl-trans-3-hexene 2,4,4-trimethyl-2-pentene 2,4,4-trimethyl-1-pentene 2,3-dimethyl-2-pentene 3-ethyl-2-pentene 2-methyl-2-hexene cis-3-heptene trans-3-heptene 3-methyl-trans-3-hexene 3-methyl-cis-3-hexene 1-heptene 2-ethyl-1-pentene 2-methyl-1-hexene 3,4-dimethyl-trans-2-pentene

3,4-dimethyl-cis-2-pentene 3-methyl-2-ethyl-1-butene 4-methyl-1-hexene 4-methyl-trans-2-hexene 4-methyl-cis-2-hexene 2,3-dimethyl-1-pentene 5-methyl-1-hexene 3-ethyl-1-pentene 3-methyl-1-hexene 2,4-dimethyl-2-pentene 2,4-dimethyl-1-pentene 3,4-dimethyl-1-pentene 4,4-dimethyl-cis-2-pentene 4,4-dimethyl-trans-2-pentene 3,3-dimethyl-1-pentene 2,3-dimethyl-2-butene 4,4-dimethyl-1-pentene cis-2-hexene trans-2-hexene 3-methyl-trans-2-pentene 3-methyl-cis-2-pentene 2-methyl-2-pentene cis-3-hexene trans-3-hexene 2-ethyl-1-butene 1-hexene 2-methyl-1-pentene 2,3-dimethyl-1-butene 4-methyl-trans-2-pentene 4-methyl-cis-2-pentene 3-methyl-1-pentene 4-methyl-1-pentene 2-methyl-2-butene 3,3-dimethyl-1-butene trans-2-pentene cis-2-pentene 2-methyl-1-butene 1-pentene 3-methyl-1-butene trans-2-butene cis-2-butene

I[est.]: ¹ξ model, S model, R model

list structured line notation

977.1 916.9 985.1 689.2 690.0 883.5 782.3 794.5 766.6 756.4 748.7 751.3 708.5 708.5 711.9 701.6 701.9 699.4 692.8 686.3 686.3 687.7 687.7 683.7 676.1 682.2 671.5 671.5 643.0 651.5 657.8 657.8 651.8 649.5 644.4 643.4 649.6 646.9 631.8 623.3 623.3 620.2 608.5 610.1 599.8 599.8 605.4 605.4 595.9 588.9 588.9 579.1 585.5 584.4 552.3 558.8 558.8 547.1 552.7 510.6 512.5 502.4 502.4 487.3 488.2 449.8 415.4 415.4

985.4 988.1 981.5 697.6 693.3 896.5 785.3 778.6 759.4 757.3 750.8 755.6 707.5 707.5 713.7 709.8 705.8 690.2 696.5 686.1 686.1 686.5 686.5 685.8 674.3 687.6 664.7 664.7 630.8 652.5 659.5 659.5 652.8 647.3 647.1 646.0 649.3 649.8 631.3 622.2 622.2 624.7 629.3 607.2 600.7 600.7 607.5 607.5 598.9 588.5 588.5 573.2 586.4 588.1 546.3 556.7 556.7 548.0 547.7 521.2 510.7 503.2 503.2 487.0 486.9 445.1 417.8 417.8

( C C c c c c (=O) 0 c C) (C c c (=O) o c c c c C) (C c ( = O ) 0 c c c c c C) (C c c 0 c c C ) ( C C c c 0 c C) (C c c c o c c c C) (C = c c c c c c C) ( C C (C) = C ( C C) C C ) (C c (C) = c (C) c (C) C) (C c (C) = c c c ( C ) C) (C = c (C) c (C) c c C) (C = c (C) c (C C) c C) (C c (C) ( C ) c = c c C) (C c (C) (C) c = c c C) ( C C (C) = c c (C) (C) C) (C = c (C) c c (C) (C) C) (C c (C) = c (C) c C) (c c = c (C C ) c C) (CC(C) = C C C C ) ( C C c = c c c C)

(cc c = c c c c j (C c c (C) = c c C) (C c c (C) = c c C) (C = c c c c c C) (C = c (C C) c c C) (C = c (C) c c c C)

(CC=C(C)C(C)C) (CC=C(C)C(C)C) (C = c (C C) c (C) C) (C = c c c (C) c C) ( C C = c c (C) c C) (C c = c c (C) c C) (C = c (C) c (C) c C) (C = c c c c (C) C) (C = c c (C C) c C) (C = c c (C) c c C) (C c (C) = c c (C) C) (C = c (C) c c (C) C) (C = c c (C) c (C) C) ( C C = c c ( C ) (C) C) ( C C = c c (C) (C) C) (C = c c (C) (C) c C) (CC(C)=C(C)C) (C = c c c (C) (C) C) ( C C = c c c C) ( C C = c c c C) (C c = c (C) c C) (C c = c (C) c C) (C c (C) = c c C) (C c c = c c C) ( C C c = c c C) (C = c (C C) c C) (C = c c c c C) (C = c (C) C C C ) (C = c (C) c (C) C) (C c = c c (C) C) ( C C = c c (C) C) (C = c c (C) c C) (C = c c c (C) C) ( C c ( C ) = c C) (C = c c (C) (C) C) (C c = c c C) (C c = c c C) (C = c (C) c C) (C = c c c C) (C = c c (C) C) (C c = c C) (C c = c C)

a In addition, the line notations used27 are shown.
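The line notations of Table I can be read mechanically into a chemical graph. The following is a minimal sketch, assuming a SMILES-like reading of the notation: parentheses enclose the molecule and each branch, a digit opens and later closes a ring, `=` marks a double bond, and every atom symbol is a single letter. The actual grammar is defined in ref 27 and may differ; the function names `parse_line_notation` and `vertex_degrees` are hypothetical.

```python
def parse_line_notation(text):
    """Parse a list structured line notation into (atoms, bonds).

    Assumed reading (not taken from ref 27): '(' and ')' enclose the whole
    molecule and each branch, a digit opens or closes a ring, '=' makes the
    next bond a double bond, and atom symbols are single letters (C, O).
    """
    atoms = []       # element symbols in reading order
    bonds = []       # (atom index, atom index, bond order)
    stack = []       # attachment atom remembered at each '('
    ring_open = {}   # ring digit -> atom waiting for ring closure
    previous = None  # atom that the next atom is bonded to
    order = 1        # bond order of the next bond

    for ch in text:
        if ch.isspace():
            continue
        if ch == "(":
            stack.append(previous)
        elif ch == ")":
            previous = stack.pop()
        elif ch == "=":
            order = 2
        elif ch.isdigit():
            if ch in ring_open:
                bonds.append((ring_open.pop(ch), previous, order))
                order = 1
            else:
                ring_open[ch] = previous
        elif ch.isalpha():
            atoms.append(ch.upper())
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, order))
            previous = current
            order = 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return atoms, bonds


def vertex_degrees(atoms, bonds):
    """Number of neighbors of each atom in the hydrogen-suppressed graph."""
    degrees = [0] * len(atoms)
    for a, b, _ in bonds:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# 1,1,3-trimethylcyclohexane, as written in Table I
atoms, bonds = parse_line_notation("(C C 1 (C) C C (C) C C C 1)")
print(sorted(vertex_degrees(atoms, bonds)))
```

Upper-casing each symbol is a deliberate choice here so that notations differing only in letter case parse to the same graph.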

calculated and summed for identical atom types (vertices). The resulting model parameters are based on four different atomic-level indexes: the number of identical vertices N, the atomic-level χ indexes ⁰ξ and ¹ξ, and the electrotopological state S.

$$I^k = A_0^k + \sum_{j=1}^{4} A_j^k \alpha_j^k$$

$$\alpha_j^{N} = N_j, \qquad \alpha_j^{{}^{0}\xi} = \sum_i {}^{0}\xi_i, \qquad \alpha_j^{{}^{1}\xi} = \sum_i {}^{1}\xi_i, \qquad \alpha_j^{S} = \sum_i S_i$$

Each sum runs over the atoms i of atom type j.

Figure 1. Vertex equations I^k modeling retention indexes of alkane compounds. The A_j^k are the regression coefficients; the α_j^k are the regression parameters based on the atomic-level topological indexes ⁰ξ, ¹ξ, and S. The index j ranges from 1 to 4, corresponding to the atom types CH3- to >C<.
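Because the vertex equations are linear in the coefficients, they can be fitted by ordinary least squares. The sketch below is an illustration under stated assumptions and not the authors' procedure (the paper used commercial statistical software). It fits the N model, I = A0 + Σ A_j N_j, to eight alkanes whose I[lit.] values are read from Table I; the vertex counts N_1 to N_4 were worked out by hand from the structures, and all function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: regression parameters are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_vertex_model(parameters, indexes):
    """Least-squares fit of I = A0 + sum_j A_j * alpha_j (normal equations)."""
    design = [[1.0] + list(row) for row in parameters]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, indexes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, alpha):
    """Evaluate the vertex equation for one compound."""
    return coefficients[0] + sum(a * x for a, x in zip(coefficients[1:], alpha))


# (N1, N2, N3, N4) = numbers of CH3-, -CH2-, >CH-, >C< vertices, counted by hand.
COUNTS = [
    (2, 3, 0, 0),  # n-pentane
    (2, 4, 0, 0),  # n-hexane
    (3, 1, 1, 0),  # 2-methylbutane
    (4, 0, 0, 1),  # 2,2-dimethylpropane
    (4, 0, 2, 0),  # 2,3-dimethylbutane
    (0, 5, 0, 0),  # cyclopentane
    (1, 4, 1, 0),  # methylcyclopentane
    (2, 5, 0, 1),  # 1,1-dimethylcyclohexane
]
I_LIT = [500.0, 600.0, 475.3, 412.3, 567.3, 565.7, 627.9, 787.0]  # Table I

coefficients = fit_vertex_model(COUNTS, I_LIT)
for row, observed in zip(COUNTS, I_LIT):
    print(row, observed, round(predict(coefficients, row), 1))
```

Cyclic compounds are included on purpose: for acyclic alkanes alone the vertex counts obey N_1 = N_3 + 2 N_4 + 2, which makes the constant A0 collinear with the other parameters.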

Figure 3. Comparison of fitted retention indexes I[est.] of 71 acyclic and cyclic alkanes using vertex models with data from the literature I[lit.]. EMAE = mean absolute error; ESE = standard error.

Regression equation of the reference model (plot annotation, see Figure 4):

$$I[\mathrm{est.}] = 57.2 + 176.0\,{}^{1}\chi + 30.2\,{}^{3}\chi, \qquad E_{\mathrm{MAE}} = 10.5, \quad E_{\mathrm{SE}} = 13.3$$

[Figure 2 graphic: structure of 1,1,3-trimethylcyclohexane with the atomic-level indexes of each atom. Sums legible for three of the four atom types:]

atom type   N   ⁰ξ      ¹ξ      S
CH3-        3   3.000   1.577   7.160
>CH-        1   0.577   1.394   0.980
>C<         1   0.500   1.707   0.650

Figure 2. Example of the calculation of the regression parameters of the vertex models in Figure 1 for 1,1,3-trimethylcyclohexane. In the first step, the atomic-level indexes ⁰ξ, ¹ξ, and S are calculated for each atom. In the second step, the indexes corresponding to the same atom types are summed to form the regression parameters.
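The two-step calculation of Figure 2 can be sketched as follows. The definitions used here are assumptions inferred from the values printed in the figure and are not quoted from the paper: ⁰ξ_i = δ_i^(-1/2) and ¹ξ_i = Σ (δ_i δ_j)^(-1/2) over the neighbors j of atom i, where δ is the vertex degree. With these assumptions the sketch reproduces the printed sums 3.000, 0.577, 0.500 (⁰ξ) and 1.577, 1.394, 1.707 (¹ξ). The atom numbering and the function name are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hydrogen-suppressed graph of 1,1,3-trimethylcyclohexane.
# Ring atoms: 1, 3, 4, 6, 7, 8; methyl groups: 0 and 2 (on atom 1), 5 (on atom 4).
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 1)]


def vertex_parameters(bonds):
    """Sum N, 0-xi and 1-xi over all atoms of the same vertex degree."""
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    degree = {atom: len(nbrs) for atom, nbrs in adj.items()}

    params = defaultdict(lambda: {"N": 0, "xi0": 0.0, "xi1": 0.0})
    for atom, nbrs in adj.items():
        entry = params[degree[atom]]
        entry["N"] += 1
        # Step 1: atomic-level indexes of this atom (assumed definitions).
        entry["xi0"] += 1.0 / sqrt(degree[atom])
        entry["xi1"] += sum(1.0 / sqrt(degree[atom] * degree[n]) for n in nbrs)
        # Step 2 is the accumulation itself: same degree, same regression parameter.
    return dict(params)


for deg, entry in sorted(vertex_parameters(BONDS).items()):
    print(deg, entry["N"], round(entry["xi0"], 3), round(entry["xi1"], 3))
```

The electrotopological state S is left out of this sketch because it needs all graph distances, not only the neighbors of each atom.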

Figure 4. Comparison of fitted retention indexes I[est.] of 71 acyclic and cyclic alkanes using the reference model with data from the literature I[lit.]. EMAE = mean absolute error; ESE = standard error.

Figure 1 shows the vertex equations modeling the retention indexes of alkane compounds. The coefficients A are obtained by regression analysis. As an example of the calculation of the regression parameters α, the molecular structure of 1,1,3-trimethylcyclohexane in Figure 2 is chosen. In this example, vertices of degree 1 appear three times, vertices of degree 2 four times, and vertices of degrees 3 and 4 once each. The atomic-level indexes are calculated for each atom and are summed according to the vertex degree. The results of fitting acyclic and cyclic alkanes with the four different vertex models are shown in Figure 3. The fitted retention indexes I[est.] are plotted vs literature data I[lit.]. The results are very similar with respect to the mean absolute error EMAE and the standard error of estimation ESE. The resulting coefficients are always highly significant, with the exception of the A0 coefficients, which must be neglected. With


Table II. Bond Types, Corresponding Regression Coefficients, and Regression Parameters Used in Edge Equations Modeling Retention Indexes of Alkanes

bond         j   coeff    parameter   note
CH3-CH3                               ethane only
CH3-CH2-     1   B_1^k    β_1^k
CH3-CH<      2   B_2^k    β_2^k
CH3-C<       3   B_3^k    β_3^k
-CH2-CH2-    4   B_4^k    β_4^k
-CH2-CH<     5   B_5^k    β_5^k
-CH2-C<      6   B_6^k    β_6^k
>CH-CH<      7   B_7^k    β_7^k
>CH-C<       8   B_8^k    β_8^k
>C-C<                                 not present in data set
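The bond types of Table II can be assigned from the vertex degrees of the two bonded atoms. The sketch below assumes that, for alkanes, the atom types CH3-, -CH2-, >CH-, and >C< correspond to vertex degrees 1 to 4 of the hydrogen-suppressed graph. It counts the bonds of each type j for 2,2,3-trimethylhexane, which gives the numbers M_j of identical bonds; the atom numbering and the function name are hypothetical.

```python
from collections import Counter

# Hydrogen-suppressed graph of 2,2,3-trimethylhexane.
# Atoms 0-5 form the hexane chain; 6 and 7 are methyls on atom 1, 8 is the methyl on atom 2.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (1, 7), (2, 8)]

# Bond type j of Table II, keyed by the sorted vertex degrees of the two atoms.
BOND_TYPES = {(1, 2): 1, (1, 3): 2, (1, 4): 3, (2, 2): 4,
              (2, 3): 5, (2, 4): 6, (3, 3): 7, (3, 4): 8}


def bond_type_counts(bonds):
    """Return {j: M_j}, the number of bonds of each Table II bond type."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    counts = Counter()
    for a, b in bonds:
        key = tuple(sorted((degree[a], degree[b])))
        if key not in BOND_TYPES:
            raise ValueError(f"bond between degrees {key} has no type j in Table II")
        counts[BOND_TYPES[key]] += 1
    return dict(counts)


print(sorted(bond_type_counts(BONDS).items()))
# [(1, 1), (2, 1), (3, 3), (4, 1), (5, 1), (8, 1)]
```

Bond types 6 and 7 do not occur in this compound, so they contribute nothing to its edge equation.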

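$$I^k = B_0^k + \sum_{j=1}^{8} B_j^k \beta_j^k$$

$$\beta_j^{M} = M_j \qquad (M\text{-model})$$

$$\beta_j^{{}^{1}\xi} = \sum_i \left({}^{1}\xi^{l} + {}^{1}\xi^{r}\right)_i \qquad ({}^{1}\xi\text{-model})$$

$$\beta_j^{S} = \sum_i \left(S^{l} + S^{r}\right)_i \qquad (S\text{-model})$$

$$\beta_j^{R} = \sum_i \left(R^{l} + R^{r}\right)_i \qquad (R\text{-model})$$

Figure 5. Edge models I^k for the calculation of retention indexes of alkane compounds. The B_j^k are the regression coefficients; the β_j^k are the regression parameters based on the atomic-level indexes ¹ξ, S, and R. The superscripts l and r mean 'left' and 'right' atoms of a bond. Bond types j correspond to Table II.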

[Figure 6 graphic: skeleton of 2,2,3-trimethylhexane with the bond indexes assigned to its edges; legible values include 2.823 and 1.373.]

$$I = B_0 + 3.570\,B_1 + 3.200\,B_2 + 8.469\,B_3 + 2.682\,B_4 + 2.227\,B_5 + 1.373\,B_8$$

Figure 6. Edge equation of 2,2,3-trimethylhexane for the S model. The bond indexes are assigned to the edges of the compound's skeleton. They are built by summing the electrotopological indexes of both atoms of the bond. The regression parameters in the edge equation are the sums of the bond indexes corresponding to the same bond types. The bond types -CH2-C< (j = 6) and >CH-CH< (j = 7) are of no relevance for this compound.
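The regression parameters of the S model in Figure 6 can be recomputed from the graph alone. The sketch below assumes the standard Kier-Hall definition of the electrotopological state for saturated carbon atoms, which is not spelled out in this paper: intrinsic state I_i = (δ_i + 1)/δ_i and S_i = I_i + Σ_j (I_i − I_j)/r_ij², where r_ij is the number of atoms in the shortest path between i and j. Under this assumption the six values printed in Figure 6 (3.570, 3.200, 8.469, 2.682, 2.227, 1.373) are reproduced to within about 0.001. The atom numbering and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hydrogen-suppressed graph of 2,2,3-trimethylhexane.
# Atoms 0-5 form the hexane chain; 6 and 7 are methyls on atom 1, 8 is the methyl on atom 2.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (1, 7), (2, 8)]

# Bond type j of Table II, keyed by the sorted vertex degrees of the two atoms.
BOND_TYPES = {(1, 2): 1, (1, 3): 2, (1, 4): 3, (2, 2): 4,
              (2, 3): 5, (2, 4): 6, (3, 3): 7, (3, 4): 8}


def adjacency(bonds):
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def path_lengths(adj, start):
    """Breadth-first search: number of bonds between start and every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def e_states(adj):
    """E-state of every atom (assumed Kier-Hall form for alkane carbons)."""
    intrinsic = {a: (len(n) + 1) / len(n) for a, n in adj.items()}
    states = {}
    for atom in adj:
        dist = path_lengths(adj, atom)
        # r = bonds + 1 is the number of atoms in the shortest path.
        perturbation = sum((intrinsic[atom] - intrinsic[other]) / (d + 1) ** 2
                           for other, d in dist.items() if other != atom)
        states[atom] = intrinsic[atom] + perturbation
    return states


def s_model_parameters(bonds):
    """Return {j: beta_j^S}: bond indexes S^l + S^r summed per bond type."""
    adj = adjacency(bonds)
    states = e_states(adj)
    beta = defaultdict(float)
    for a, b in bonds:
        degrees = tuple(sorted((len(adj[a]), len(adj[b]))))
        beta[BOND_TYPES[degrees]] += states[a] + states[b]
    return dict(beta)


for j, value in sorted(s_model_parameters(BONDS).items()):
    print(f"beta_{j} = {value:.3f}")
```

Small differences in the last digit are expected if the published values were summed from rounded atomic indexes.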

respect to the reference model in Figure 4, no improvement in fitting the index data can be observed. The reference model is based on the molecular connectivity indexes ¹χ and ³χ and fits the same data set as used for the vertex models. The chosen reference model gave the best results within a group of several models using topological indexes on the molecular level.

EDGE MODELS FOR FITTING RETENTION INDEX DATA

To improve the quality of fitting, the vertex models were extended to form edge models. Edge models are built analogously to vertex models, considering molecular bonds (edges) instead of atoms. For each bond, a bond index is calculated, and the bond indexes are summed to give a regression parameter. Table II gives an overview of the bond types used and the corresponding parameters and coefficients. With respect to the data set used, the number of regression coefficients is twice that of the vertex models. The equations for edge models fitting alkane data and the formulation of the associated regression parameters are summarized in Figure 5. The edge models differ from the vertex models in the types of the regression parameters β_j^k. Because the ⁰ξ and N indexes gave the same results in the case of the vertex models, the ⁰ξ model was discarded and replaced by a model based on the R index. R is a reduced form of the electrotopological index S that considers only bonded contributions, as described by Hall et al.21 An example of an edge model equation based on the S index for 2,2,3-trimethylhexane is given in Figure 6. The topological indexes on the bond level are assigned to the edges of the compound. The regression parameters are derived by summing the bond indexes for the same bond types. The bond types -CH2-C< (j = 6) and >CH-CH< (j = 7) are of no relevance for this compound.

Figure 7. Comparison of fitted retention indexes I[est.] of 71 acyclic and cyclic alkanes using edge models including a regression constant B_0^k with data from the literature I[lit.]. EMAE = mean absolute error; ESE = standard error.

FITTING RETENTION INDEXES OF ACYCLIC AND CYCLIC ALKANES

The described edge models were used to fit the retention indexes of the standard data set of 71 acyclic and cyclic alkanes. The results are shown in Figure 7, where the fitted values are plotted versus the literature data. The mean absolute errors EMAE and the standard errors ESE of the four edge models are smaller than those of the reference and vertex models. The four edge models give very similar results. A statistical analysis of the resulting regression coefficients shows that they are all significantly different from 0, even the B0 coefficient.
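The two error measures used to compare the models can be computed as in the following sketch. The mean absolute error is unambiguous; for the standard error the usual definition of the standard error of estimate, sqrt(SSE / (n − p)) with p fitted coefficients, is assumed here because the paper does not state its formula. The example values are the first five rows of Table I, reading the first I[est.] column as the M model; the function names are hypothetical.

```python
from math import sqrt


def mean_absolute_error(observed, estimated):
    """E_MAE: average of |I[lit.] - I[est.]|."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


def standard_error(observed, estimated, n_coefficients):
    """E_SE as sqrt(SSE / (n - p)); this definition is an assumption."""
    dof = len(observed) - n_coefficients
    if dof <= 0:
        raise ValueError("need more observations than fitted coefficients")
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return sqrt(sse / dof)


# First five compounds of Table I: I[lit.] and I[est.] (read as the M model).
I_LIT = [412.3, 475.3, 536.8, 567.3, 569.7]
I_EST = [408.9, 465.2, 524.5, 551.6, 565.2]

print(round(mean_absolute_error(I_LIT, I_EST), 1))  # 9.2
```

A standard error for such a small excerpt would not be meaningful, since the coefficients were fitted on the full data set; `standard_error` is intended for the complete columns of Table I together with the number of coefficients of the model in question.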

Table III. Coefficients and Corresponding Statistical Parameters from Regression Analysis of Alkane Data Set Using Edge Models of Figure 8

coeff type   coeff   SE   t value   sig level

M

Br I CH3-CHr CH&H< CH,-C< -CHrCHr -CH2-CH< -CHrC< >CH-CH< >CH-C< Br2 CH,-CH?CH&H< CH,-C< -CHz-CH2-CH&H< -CH2-C< >CH-CH< >CH-C< Br3 CH3-CH2CH,-CH< CH3-C< -CH&H2-CH&H< -CHz-C< >CH-CH< >CH-C
CH-CH< >CH-C
C4H2 >C=CH>C=C