

J. Chem. Inf. Comput. Sci. 1999, 39, 788-802

On Topological Indices, Boiling Points, and Cycloalkanes

Gerta Rücker and Christoph Rücker*

Institut für Organische Chemie und Biochemie, Universität Freiburg, Albertstrasse 21, D-79104 Freiburg, FRG

Received March 4, 1999

The experimental boiling points (bp) of saturated hydrocarbons (acyclic through polycyclic) up to decanes are systematically compiled. The bp values are classified into groups of lower or higher reliability according to the accuracy and frequency with which they were reproduced by independent researchers. For each hydrocarbon structure the values of several simple topological indices (TI) of widely varying origin are given, including the values of molecular walk counts of various lengths and their sum. The sensitivity of the TIs to structural changes within comprehensive groups of cyclic saturated hydrocarbons is evaluated, and the total walk count is found to be most sensitive. By multilinear regression, structure-bp correlations are obtained for various comprehensive compound samples. Both the detour index and the walk counts are found to play a major role in the best models. Comparison of the bp models obtained with those from the recent literature reveals significant improvements for both cyclic and acyclic alkanes, which are attributed in part to the higher quality of the experimental data, in part to the use of novel descriptors, and in part to the use of a more diverse pool of descriptors to select from. Despite this, a descriptor combination that accurately models cycloalkane bps has not yet been found.

INTRODUCTION

The boiling point at normal pressure (bp) has become a benchmark property for the evaluation of descriptors in structure-property relationship studies.1 Numerous models for the bp of saturated acyclic hydrocarbons (the alkanes) have been established in terms of more or less simple graph invariants, the so-called topological indices (TIs); a few of these models from the past decade are given in a later section (Table 7).2 For (poly)cyclic saturated hydrocarbons, on the other hand, structure-bp studies, though highly desirable, are scarce. Such studies have suffered from problems both with the data and with the TIs. First, the diversity in the structures of (poly)cyclic saturated hydrocarbons is overwhelming (fused, bridged, and spiro ring systems, bearing open-chain or again (poly)cyclic substituents), which renders the include/exclude decision anything but clear-cut. Second, only a small fraction of all possible (poly)cycloalkanes, even of small size, has been synthesized, and their bps, if obtained at all, were mostly determined during purification (distillation) and as such are of low precision. Third, the problem of stereochemistry, existent but neglected in most studies of acyclic alkanes, has to be dealt with in the case of cycloalkanes. Most substituted or polycyclic cycloalkanes exist as diastereomers differing in their bps, whereas experimental data are often from mixtures of stereoisomers of unknown composition. TIs, on the other hand, by definition do not include stereochemical information. Fourth, it is sometimes not straightforward to extend the definition of a TI, first given for acyclics, to cyclic compounds. This is particularly so for a TI based on paths, since in (poly)cyclics there may exist several paths of equal length, as well as paths of different lengths, between a pair of vertices.3 One result of Hosoya's pioneering study (as late as 1972)4 was that (poly)cycloalkane bps exhibit linear dependence on

log(Z) and possibly on the number of rings, nr. Even much later, others, such as Lukovits5 and Zinn,6,7 included some cycloalkanes in their alkane-centered investigations. Lukovits,8 Trinajstić,9 and ourselves10 studied the detour index, a TI recently introduced especially for cyclic compounds, and its correlation with bps for mixed samples of cyclic and acyclic compounds. Plavšić et al. devoted a study to the monocycloalkanes of up to six carbon atoms, using an index Z* derived from the Z matrix.11 Das et al. found for a sample of monocycloalkanes nonlinear correlations of the bps with the Wiener, Schultz, and Szeged indices and a high linear interrelation among these indices.12 Diudea regressed the bps of some monocycloalkanes against indices derived from the Wiener, Szeged, and Cluj matrices as well as from the corresponding reciprocal matrices.13 Estrada regressed the bps of some mono- and bicyclic alkanes against descriptors derived from the edge adjacency matrix.14 Finally, Lukovits and Linert compared combinations of several Wiener-type indices for the bps of a mixed sample of some monocyclic and all acyclic octanes.3c

In contrast to all the work done in constructing and comparing molecular descriptors for structure-bp correlations, far less attention has been paid to the data, that is, to the compounds included in such studies and to their bps. The samples of cyclic compounds used were ill-defined in most of the above studies: typically a particular compound was included if it happened to appear with its bp in one handbook or another, or it was simply selected without comment, with the result that no two of these studies dealt with the same sample. Moreover, several bp values used in the above studies are more or less incorrect, so that in certain cases the values for a compound included in two studies differed considerably.15 For these reasons the qualities of the above structure-bp relationships for cycloalkanes or

10.1021/ci9900175 CCC: $18.00 © 1999 American Chemical Society Published on Web 07/15/1999


mixed alkane/cycloalkane samples cannot be compared. We therefore undertook to compile a list, as comprehensive as possible, of all saturated hydrocarbons up to a certain molecular size limit with normal bps known as reliably as possible. Earlier we had used for the first time a comprehensive sample of acyclic and cyclic alkanes with known bps at 760 Torr up to size n = 8 (ref 10). In the present work we first extend that compilation up to n = 10 (including bps measured at ≥720 Torr and extracting the data from the most comprehensive source for this kind of data, the Beilstein database) and add the values for a few important TIs, in the hope of providing a list that may be useful to others in future studies. Second, we subject a few simple TIs of potential or demonstrated value for bp studies to a test for structural sensitivity within two comprehensive samples of cyclic alkanes. Third, we propose the best (within our limited pool of descriptors) 1-descriptor, 2-descriptor, etc., multilinear models for the bp within several comprehensive samples of saturated hydrocarbons. We compare our results to those found in the recent literature. Fourth, we demonstrate how improper use of literature data and use of incomplete compound samples can result in structure-property correlations of deceptively high quality.

RESULTS AND DISCUSSION

Experimental Bps of Saturated Hydrocarbons up to n = 10. A data file of all saturated hydrocarbon structures up to n = 10 with known normal bps was compiled by the procedure detailed in the Experimental Section (531 structures). The file contains for each structure a shorthand name (composed in an obvious, semisystematic manner), the size (number of C atoms, n), the number of rings (nr), for monocyclics the size of the ring (sr), and what we think is the best available bp value, based on a literature survey only, that is, before undertaking any structure-bp correlations.

The bps are classified as follows: A bp is regarded as "reproduced" (rel = 1) if at least two independent research groups reported values differing by no more than 4 °C; otherwise it is "not reproduced" (rel = 0). As an exception, a bp value resulting from the careful measurements of Rossini (National Bureau of Standards) is regarded as "reproduced" even if it is the only one available.16 This qualification enables us to exclude completely unchecked data from structure-bp correlations; it had proved useful in the earlier work.10 Since the quality even of the reproduced experimental bps may still be too low (after all, the bp values result from measurements done all over the world and over a period of well over a century), we went a step further in this work. Bps that were reported by at least four independent research groups within a span of no more than 2 °C are classified as "known beyond reasonable doubt" (rel = 2).

Also included in the data file are the values of the following simple TIs: Wiener index (W), Hosoya index (Z; calculated in this work in a novel manner, see Experimental Section), Randić index (χ), detour indices (ω and ω′),10 diameter (longest distance, dia), molecular walk counts of different lengths k (mwc_k), total walk count (twc),17 and path counts of length k including and excluding cyclic paths (p_k and ap_k).
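The reliability classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used to build the data file: the function name `classify_bp` and the `rossini` flag are hypothetical, and it is assumed that each value in the input list stems from a different independent research group.

```python
from itertools import combinations


def classify_bp(values, rossini=False):
    """Return the reliability status rel (0, 1, or 2) of a boiling point.

    values  -- reported bps in deg C, one per independent research group
               (assumption: duplicates from the same group are already removed)
    rossini -- True if a value comes from Rossini's measurements (hypothetical flag)
    """
    vals = sorted(values)
    # rel = 2: at least four reports within a span of no more than 2 deg C.
    # In sorted order it suffices to check windows of four consecutive values.
    for i in range(len(vals) - 3):
        if vals[i + 3] - vals[i] <= 2.0:
            return 2
    # rel = 1: at least two reports differing by no more than 4 deg C.
    if any(abs(a - b) <= 4.0 for a, b in combinations(vals, 2)):
        return 1
    # Exception: a Rossini value counts as reproduced even if it stands alone.
    if rossini and vals:
        return 1
    return 0
```

For example, `classify_bp([100.0, 101.0, 101.5, 102.0])` gives 2, whereas two reports that are 10 °C apart give 0.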
These descriptors are defined for cyclic as well as acyclic graphs without any problems. Descriptors mwc_k and


twc were introduced by us earlier: They are conceptually extremely simple, extremely easily calculated by a Morgan-type summation procedure, and highly discriminant within acyclic alkanes (the first degeneracies of twc occur in three pairs of dodecanes). In (cyclo)alkanes twc is a good measure of structural complexity.17 Walk counts had scarcely been used in structure-bp studies when the present work was begun;18 because of their fundamental nature we included them here for a test. The data file is too voluminous to be printed here; it is available from the authors in SAS-readable electronic form. The 531 structures, along with their respective shorthand names as identifiers, are shown in Figure 1, where they appear in the order of increasing n, nr, sr (monocyclics only), and complexity as measured by twc.17 With this ordering it should be easy to verify the presence (or absence) of a particular structure in Figure 1. Table 1 (an extract from the data file) gives, in the same order, the bp values, their reliability status (rel = 0/1/2), and the twc values from the file. Table 2 gives an overview of the number of structures contained in Table 1/Figure 1 and of all possible (polycyclo)alkane structures, ordered by size n and number of rings nr. In each cell of Table 2, the upper line gives the number of structures of a particular n and nr in Table 1, and the lower line gives the number of all (mathematically) possible cycloalkane structures (neglecting stereoisomerism) of that n and nr, as generated by the program MOLGEN.19 Though MOLGEN for high nr includes many structures that would be considered "impossible" for hydrocarbons by most chemists, at least for low nr it gives the correct number of chemically reasonable structures. Comparison of the upper and lower lines in Table 2 demonstrates how small a fraction of all cycloalkanes is actually known with bps at normal pressure.
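The Morgan-type summation mentioned above for mwc_k and twc can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the function names and the adjacency-list input format are hypothetical, and the conventions used (each walk counted once irrespective of direction, and walks of lengths 1 to n - 1 summed for twc) are assumptions chosen because they reproduce the twc entries of Table 1 for the small structures tested; the authoritative definition is given in ref 17.

```python
def walk_counts(adjacency, max_len):
    """Return [mwc_1, ..., mwc_max_len] for a hydrogen-suppressed molecular graph.

    adjacency -- dict mapping each vertex to the list of its neighbors
    """
    # Zeroth step: exactly one walk of length 0 starts at every vertex.
    awc = {v: 1 for v in adjacency}
    counts = []
    for _ in range(max_len):
        # Morgan-type summation: the number of walks of length k starting at v
        # is the sum of the numbers of walks of length k - 1 starting at its neighbors.
        awc = {v: sum(awc[u] for u in adjacency[v]) for v in adjacency}
        # Assumption: a walk and its reverse are counted once, hence the halving.
        counts.append(sum(awc.values()) // 2)
    return counts


def total_walk_count(adjacency):
    """Sum of mwc_k for k = 1 .. n - 1 (assumed range), n = number of vertices."""
    n = len(adjacency)
    return sum(walk_counts(adjacency, n - 1))


# Hydrogen-suppressed graphs of a few small hydrocarbons from Table 1.
propane = {0: [1], 1: [0, 2], 2: [1]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cyclobutane = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
methylcyclopropane = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
```

With these inputs, `total_walk_count` returns 5, 18, 28, and 32, respectively, in agreement with the twc values listed in Table 1 for n3, 2mn3, c4, and 1mc3.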
Comparison of the upper line entries of Table 2 with the corresponding counts of structures included in refs 3c, 5-9, and 12-14 demonstrates that these studies were far from comprehensive within their respective sets of structures.

Sensitivity of TIs within Comprehensive Cycloalkane Samples. We decided to evaluate the sensitivity of some TIs to structure changes in cycloalkanes in two comprehensive samples thought to be typical. The challenge in the first test sample lies in a large number of monocyclic isomers differing in ring size and thus in the number, size, and structure of side chains. Table 3 (an expansion of one column of Table 2) gives the number of structures of monocycloalkanes from cyclopropane to the monocyclodecanes, ordered by size n and ring size sr, where the meaning of the upper and lower entries is as in Table 2. It is seen from Table 3 that most of the small-ring monocycloalkanes (or at least most of their bps) are unknown. The monocyclic decanes (475 structures) were chosen for the test (sample 1). The second test sample was chosen to challenge the TIs by a large variety of polycyclic systems within a narrow size range. It consists of all mathematically possible saturated hydrocarbon structures, including (poly)cycloalkanes, of n = 4-6, 105 structures according to Table 2 (sample 2). All structures in samples 1 and 2 were generated by MOLGEN, and those not already present in the data file (i.e., 402 monocyclodecanes and 64 polycyclic butanes through hexanes) were additionally included as a new section in the


Figure 1. Saturated hydrocarbon structures (n = 1-10) for which bps are available, with their respective shorthand names. The structures appear in the order of increasing n, nr, sr (monocyclics only), and twc.


Table 1. Bp, Reliability Status of Bp, and twc for the 531 Saturated Hydrocarbons in Figure 1^a

name           bp (°C)   rel     twc
n1             -161.5    2         0
n2              -88.6    2         1
n3              -42.1    2         5
c3              -32.8    1         9
n4               -0.5    2        16
2mn3            -11.7    2        18
1mc3              0.7    1        32
c4               12.6    2        28
bc110b              8    0        51
n5               36.0    2        44
2mn4             27.8    2        53
22mn3             9.5    2        70
1ec3             35.9    2        93
12mc3            32.6    2       107
11mc3            20.6    2       116
1mc4             36.3    2        89
c5               49.3    2        75
bc111p             36    0       147
bc210p             46    2       150
s22p               39    2       166
mbc110b          33.5    0       188
n6               68.7    2       111
2mn5             60.3    2       134
3mn5             63.3    2       142
23mn4            58.0    2       165
22mn4            49.7    2       185
1pc3               69    2       245
1ipc3            58.3    2       282
1e2mc3             63    1       301
1e1mc3             57    1       331
123mc3             63    1       354
112mc3           52.6    2       377
1ec4             70.7    2       240
13mc4              59    1       270
12mc4              62    1       278
11mc4            53.6    1       299
1mc5             71.8    2       225
c6               80.7    2       186
bc211hx            71    1       390
bcpr               76    1       403
bc220hx            83    1       403
bc310hx            81    2       408
s23hx            69.5    1       457
mbc210p          60.5    0       527
13mbcb             55    1       686
n7               98.5    2       268
2mn6               90    2       329
3mn6               92    2       354
3en5             93.5    2       378
24mn5            80.5    2       399
23mn5            89.8    2       436
22mn5            79.2    2       489
33mn5            86.1    2       526
223mn4           80.9    2       588
1bc3               98    2       615
1sbc3            90.3    1       756
1m2pc3             93    0       791
12ec3              90    1       836
1m1pc3           84.9    0       891
1m2ipc3          81.1    0       905
1tbc3            80.5    0       940
11ec3            88.6    1       956
1e23mc3            91    0      1001
1m1ipc3          81.5    1      1022
11m2ec3          79.1    1      1077
12m1ec3          85.2    0      1098
1123mc3            78    1      1275
1122mc3            76    1      1355
1pc4            100.7    0       607
1ipc4            92.7    1       705
1e3mc4           89.5    0       732
1e2mc4             94    0       751
1ec5            103.5    2       590
13mc5            91.3    2       670
12mc5            95.6    2       690
11mc5            87.9    2       766
1mc6              101    2       547
c7              118.4    2       441
dcprm             102    1       987
bc221h          105.5    2       987
bc311h            110    0      1003
bc320h          110.5    1      1037
bc410h            116    2      1063
s33h             96.5    0      1204
s24h             98.5    2      1216
2mbc310hx         100    0      1253
6mbc310hx         103    1      1320
mbc211hx         81.5    0      1327
mbc310hx           92    2      1433
13mbc111p        71.5    0      1767
14mbc210p          74    0      1894
11ms22p            78    0      1913
122mbcb            84    1      2330
tc410024h         105    1      1867
tc311024h         107    0      1868
tc221026h         106    1      1890
tc410027h         110    1      2113
tc410013h       107.5    0      2234
tec320h         108.5    1      3131
tec410h           104    0      3138
n8              125.7    2       627
2mn7            117.6    2       764
3mn7            118.9    2       838
4mn7            117.7    2       856
25mn6           109.1    2       911
3en6            118.5    2       928
24mn6           109.4    2       997
23mn6           115.6    2      1068
34mn6           117.7    2      1136
22mn6           106.8    2      1142
3e2mn5          115.6    2      1152
234mn5          113.5    2      1296
33mn6           112.0    2      1301
224mn5           99.2    2      1317
3e3mn5          118.2    2      1441
223mn5          109.8    2      1536
233mn5          114.8    2      1609
2233mn4         106.5    2      2047
1pec3             128    2      1493
1spec3          117.7    0      1912
b2mc3             124    1      1997
1nepec3           106    0      2116
5msbc3          115.5    0      2172
1e2pc3            108    0      2176
ib2mc3            110    0      2190
11m2pc3         105.9    1      2860
1m12ec3         108.9    1      3100
11m2ipc3         94.4    0      3209
112m2ec3        104.5    1      3967
11223mc3        100.5    0      4654
1ibc4           120.1    0      1646
p3mc4           117.4    1      1829
1sbc4             123    0      1837
12ec4             119    1      2018
1234mc4         114.5    0      2780
1133mc4            86    1      3018
1pc5              131    2      1449
1ipc5           126.4    2      1701
1e3mc5            121    2      1736
1e2mc5          124.7    1      1820
124mc5            115    1      2033
1e1mc5          121.5    2      2071
123mc5            117    1      2100
113mc5          104.9    2      2211
112mc5            114    2      2368
1ec6            131.8    2      1401
14mc6           121.8    2      1572
13mc6           122.3    2      1583
12mc6           126.6    1      1657
11mc6           119.5    2      1829
1mc7              134    2      1279
c8                149    2      1016
bcprm             129    1      2385
bc330o            137    2      2571
bcb               136    1      2571
bc420o            133    1      2593
bc510o            141    1      2692
2mbc221h          125    1      2962
s34o              128    1      3043
7mbc221h          128    0      3068
2mbc320h        130.5    0      3122
s25o              125    1      3123
1mbc221h          117    1      3280
7mbc410h          138    1      3452
1mbc410h          125    2      3772
33mbc310hx        115    1      3864
14mbc211hx         91    0      4414
66mbc310hx      126.1    0      4656
2244mbcb          104    0      7613
1223mbcb          105    1      8886
tc510035o         142    1      4670
tc510024o         149    0      4938
tc3210o           136    2      4943
tc3300o           125    1      5162
3mtc2210h       120.5    0      6006
ds2121o           103    0      6014
1mtc2210h         111    1      6679
ds2022o           115    0      6682
tec330o         137.5    0      8457
n9              150.8    2      1439
2mn8            142.8    2      1763
3mn8              144    2      1942
4mn8            142.4    2      2019
26mn7             134    2      2098
3en7              143    2      2188
4en7            142.1    2      2263
25mn7             136    2      2290
24mn7           133.5    2      2429
23mn7           140.5    1      2522
35mn7             136    1      2543
2m4en6          133.8    1      2610
22mn7           132.7    2      2764
34mn7           140.6    2      2767
2m3en6            138    1      2833
235mn6          131.3    1      2948
3m4en6          140.4    1      3002
225mn6            124    2      3129
33mn7           137.3    2      3196
44mn7           135.2    1      3337
234mn6            139    1      3364
224mn6          126.5    2      3472
24m3en5         136.7    1      3485
3m3en6          140.6    1      3672
244mn6          130.7    1      3712
223mn6          133.6    1      3849
33en5             145    1      4056
233mn6          137.7    1      4089
22m3en5         133.8    1      4165
334mn6          140.5    1      4273
23m3en5           142    1      4495
2244mn5         122.3    2      4524
2234mn5           133    1      4555
2334mn5         141.5    1      4938
2233mn5         140.2    1      5691
1hxc3             149    2      3553
1shxc3            143    0      4696
1m2pec3           153    0      4930
12pc3             142    0      5629
1e1bc3          140.2    1      6524
nepelmc3          125    1      7708
11m2ibc3        125.5    0      7982
tb22mc3           121    1     10513
12m12ec3        130.8    0     11691
112233mc3       124.5    0     17370
3pec4           148.7    0      4833
1bc5            156.6    2      3451
1ibc5             148    2      3903
1m3pc5          148.3    1      4292
1sbc5           154.3    2      4373
13ec5           148.2    0      4518
1m2pc5          149.5    1      4525
12ec5           150.5    1      4814
1m3ipc5           141    1      4961
1m2ipc5           145    1      5263
1m1pc5            145    0      5317
12m4ec5         142.5    1      5335
13m4ec5           138    1      5399
12m3ec5           147    0      5565
13m2ec5           151    0      5629
1tbc5             145    1      5630
11ec5             151    2      5775
11m3ec5           133    1      5887
13m1ec5         135.5    1      6139
1m1ipc5           143    1      6230
11m2ec5           138    1      6413
1234mc5         135.9    1      6424
12m1ec5         142.5    1      6591
1134mc5           127    1      6825
1124mc5         129.5    1      7047
1123mc5           134    1      7228
1133mc5         118.2    2      7409
1223mc5           138    1      7450
1122mc5           135    1      8309
1pc6            156.7    2      3396
1ipc6           154.8    2      4020
1m4ec6          150.8    2      4020
1m3ec6            150    2      4092
1m2ec6          154.3    2      4325
135mc6          139.5    1      4680
124mc6          144.8    1      4779
123mc6          149.4    1      5016
1m1ec6            152    2      5037
114mc6            136    1      5247
113mc6          136.6    2      5396
112mc6          145.1    1      5789
1ec7            163.7    2      3256
14mc7           153.8    2      3652
13mc7             151    1      3725
12mc7             157    1      3894
11mc7             150    1      4416
1mc8            168.2    0      2945
c9                175    1      2295
bc331n            169    1      5817
dc4m            161.8    0      5817
bc421n            163    0      5862
bc430n            163    1      6260
bc520n            164    0      6359
1cpc6           157.8    1      6360
bc610n            168    0      6695
2mbc222o          159    1      7119
2mbc321o          155    1      7159
6mbc321o          154    0      7289
3mbc330o        152.5    0      7443
s44n              157    1      7586
s35n              159    1      7658
2mbc330o          149    1      7716
s26n              153    0      7926
mbc222o           145    0      7940
2ebc221h          152    1      7947
mbc321o         148.2    0      8191
23mbc221h         151    1      9187
ebc221h         146.7    0      9202
22mbc311h       151.5    0      9780
4ms25o            149    0      9799
mbc510o           151    1      9808
22mbc221h         143    2      9854
12mbc221h       138.8    0     10224
1ms25o          146.5    0     10240
17mbc221h         142    0     10550
77mbc221h       148.2    0     10550
66mbc311h         150    1     10895
ebc410h         150.5    0     10944
15ms33h         132.2    0     11862
77mbc410h       154.5    0     12415
266mbc310hx       141    0     14534
155mbc211hx       131    1     14924
1m5ebc310hx       135    0     15302
135mbc310hx     127.2    0     15660
tc331037n       164.6    0     11383
tc421037n       162.5    0     11951
tc421024n         166    1     12060
tc430037n       161.4    0     12493
ds2122n         142.5    0     15357
11cprc3         147.8    0     15500
scprnorb        140.7    0     15500
3mtc3210o         161    0     15959
ds2023n           146    0     18029
etc2210h        137.5    0     20122
33mtc2210h      137.5    1     20325
17mtc2210h        131    0     21804
12mtc2210h        128    0     24329
tec33100n       168.3    0     21609
tec43000n         153    1     23294
n10             174.1    2      3250
2mn9            166.8    2      3948
3mn9            167.8    2      4394
4mn9            165.7    2      4597
27mn8             160    2      4658
5mn9            165.1    2      4665
3en8              168    1      5026
26mn8           158.5    2      5118
4en8              164    1      5285
25mn8             157    2      5393
4pn7            161.8    1      5475
24mn8             153    1      5557
36mn8           159.5    2      5649
23mn8             164    2      5808
2m5en7          159.7    1      5835
35mn8             159    1      6010
22mn8             155    1      6195
2m4en7            160    1      6252
246mn7            145    1      6465
34mn8             166    1      6496
3m5en7            160    1      6514
236mn7          155.7    1      6621
2m3en7            166    1      6671
45mn8           162.4    1      6693
4ipn7           159.5    1      6924
226mn7          148.2    1      6937
4m3en7            167    1      7286
235mn7          159.7    1      7326
3m4en7            167    1      7343
33mn8           161.2    1      7387
245mn7            157    1      7492
225mn7            147    1      7569
25m3en6           157    1      7729
44mn8             160    1      7832
34en6             162    1      7933
224mn7          147.7    1      8073
234mn7            163    1      8150
255mn7          152.8    1      8301
2m3ipn6           163    1      8533
345mn7            164    1      8650
22m4en6           147    1      8667
3m3en7          163.8    1      8794
23m4en6           164    1      8821
24m3en6           164    1      8967
244mn7            152    1      8973
223mn7            158    1      9133
335mn7          155.7    1      9140
4m4en7            166    1      9149
2245mn6         147.9    1      9509
2255mn6           137    1      9624
2345mn6           158    1      9713
233mn7            160    1      9869
24m4en6           158    1     10014
22m3en6           159    1     10215
2235mn6         148.7    1     10249
33en6           166.3    1     10263
24m3ipn5          157    0     10305
334mn7            164    1     10544
344mn7            164    1     10712
2335mn6           153    1     11117
33m4en6           165    1     11351
23m3en6           169    1     11412
2244mn6           153    1     11524
2234mn6           155    1     11603
34m3en6           170    1     11890
224m3en5        155.3    1     12053
2344mn6           162    1     12305
2m33en5           174    1     12678
2334mn6           164    1     13053
23m3ipn5        169.4    1     13889
2233mn6           160    1     14580
22344mn5        159.3    1     14829
3344mn6         170.5    1     15816
223m3en5          168    1     15958
22334mn5          166    0     17193
1ihx1mc3        155.1    0     14800
1ip2ibc3        148.3    0     17674
11m2pec3        157.8    0     18862
12p1mc3         153.5    0     22225
11m2tpec3       146.7    0     29301
112m3tbc3         146    1     35781
12ipc4            158    1     17040
12m34ec4        155.5    0     21097
13e22mc4          154    0     23393
112m3ipc4       145.5    1     23812
1pec5             180    2      8030
1ipec5            172    1      8871
1m3bc5          170.7    0     10232
1spec5          176.5    1     10667
1m2bc5            172    1     10892
1m3ibc5           153    0     11303
3pec5           175.7    0     11390
1nepec5         173.9    0     11883
6mibc5          173.5    0     12294
1e2ipc5           160    0     13903
13e2mc5           172    1     14745
1tpec5          173.9    0     15167
12m3ipc5        160.5    1     15928
11m3ipc5        148.5    0     16422
123m4ec5          172    0     17113
124m3ec5          171    0     17268
11m2ipc5          163    0     18379
12345mc5          164    0     20290
11334mc5        141.5    1     22887
1bc6            180.9    1      7954
1ibc6           171.3    2      8978
m4pc6           173.4    1      9725
m3pc6             169    1      9882
1sbc6           179.3    1     10240
14ec6           175.5    1     10240
13ec6             172    1     10458
m2pc6           174.5    1     10611
1m4ipc6           170    2     11322
12ec6             176    1     11338
1m3ipc6           167    1     11553
1e35mc6         168.5    2     11900
1m2ipc6           171    1     12417
1m1pc6          174.3    2     12554
12m3ec6           175    0     13122
1tbc6           171.5    2     13333
11ec6           179.5    2     13826
14m1ec6           168    1     14126
1245mc6           167    1     14344
13m1ec6         166.6    1     14405
1235mc6         166.5    1     14471
1m1ipc6         177.5    0     15021
1234mc6         172.5    1     15031
1135mc6           153    1     15328
1134mc6         160.3    0     15644
1224mc6           158    0     16562
1144mc6           153    1     16712
1133mc6           155    1     17188
1123mc6           167    0     17256
1122mc6         161.5    0     20476
1pc7            182.8    1      7793
135mc7          164.2    0     10569
125mc7            173    0     10909
124mc7          170.7    0     11080
123mc7          177.5    0     11791
113mc7            161    0     12496
112mc7            174    0     13803
1ec8            191.4    0      7431
14mc8           183.5    0      8323
13mc8           181.5    1      8492
12mc8           185.5    1      8997
11mc8           170.5    0     10175
1mc9            193.6    0      6637
c10               202    1      5110
bc440d          191.5    1     14943
bcpe              190    2     14943
bc530d            193    1     15024
3mbc331n          182    1     16487
6mbc322n          190    0     17001
2mbc331n          187    0     17126
3mbc430n          178    0     17607
8mbc430n        173.5    0     18097
9mbc331n        189.9    0     18197
s45d              185    2     18493
2mbc430n          182    0     18663
s36d              183    0     18761
2ebc222o          183    0     18792
7mbc430n          182    1     18850
1mbc331n          178    1     19598
2ebc330o          174    1     20560
23mbc321o       174.5    0     21174
26mbc222o       172.5    0     21356
37mbc330o         166    0     21403
26mbc321o       164.5    0     21452
ebc222o         179.5    0     21791
23mbc222o       171.5    1     21971
1mbc430n        175.8    1     22230
26mbc330o       160.5    0     23101
24mbc330o         165    0     23393
22mbc321o       172.5    0     23524
1ms44n          186.2    0     23611
22mbc222o       174.5    0     23633
14mbc321o         164    0     23947
13mbc330o       160.5    0     26214
15mbc321o       159.5    0     27046
225mbc221h        162    1     29493
1pbc410h        172.5    0     29677
223mbc221h      165.5    1     31361
123mbc221h      159.5    0     31780
277mbc221h        163    1     32172
133mbc221h        150    2     32785
266mbc311h        168    1     32851
147mbc221h        153    0     36023
377mbc410h      168.5    1     36067
177mbc221h        160    2     37026
11ms25o           152    0     37138
1ip4mbc310hx    157.5    2     39776
1p5mbc310hx     158.5    0     42873
15m3ebc310hx    159.5    0     44140
1ip5mbc310hx      152    0     47509
3cpbc410h       175.5    0     29041
tc422025d         219    0     30968
tc521026d         188    1     31680
6mtc3220n       189.5    0     38037
12cpr1mc3       158.3    0     45624
bc310hxsc5      192.7    0     47196
ds2024d           160    0     48556
38mtc321024o    152.5    0     50296
334mtc2210h       151    1     70170
133mtc2210h     143.5    2     73139
177mtc2210h       153    2     75358
scptc3210o        174    0     62817
tec52100d         155    0     76738
pec530000d        171    0     95615

^a Numbers in italics are duplicate twc values.

data file with their TIs. The results of the sensitivity search are given in Table 4, where the distinction between upper and lower lines is as before (compounds with known bps and all possible structures, respectively). For the TIs W, Z, χ, ω, ω′, (Wω), and twc the number of distinct values and the sensitivity (number of distinct values divided by number of distinct structures) are shown. As seen from the lower lines of Table 4, W, Z, χ, ω, and ω′ are not very discriminant TIs, (Wω) is intermediate, while twc, despite its simplicity, is very sensitive to structure variations in both samples. In fact, only 14 pairs of monocyclodecanes with identical twc were found in sample 1, and four pairs of cyclic hexanes with identical twc in sample 2 (to be compared with 15 doublets and 1 triplet found earlier for a complexity measure in sample 2, ref 20). For comparison, Diudea had constructed a putatively comprehensive sample of 439 monocyclodecanes and found

for W a sensitivity of 0.116, for Szeged and Cluj indices sensitivities between 0.223 and 0.483, and for the more complex corresponding Harary indices values between 0.355 and 1.000 in that sample.13a Interestingly, in the subsamples of compounds with known bps (Table 4, upper lines), all sensitivity values are considerably higher than in the complete samples (lower lines). Obviously, neither the known monocyclodecanes nor the known cyclic n = 4-6 saturated hydrocarbons are representative of the respective complete populations, which is no surprise in the light of Tables 2 and 3. The ranking of TIs by sensitivity agrees for the two full samples; in the subsamples slight differences are seen, but twc ranks first in all cases. As a further comparison, consider the known sensitivity values in the sample of all 75 acyclic decanes: W 0.53, Z 0.53, χ 0.65, Balaban's J 0.99, twc 1.00 (refs 17 and 21). Obviously


complete discrimination between isomers is far more difficult to achieve in cyclic than in acyclic alkanes.

Table 2. Number of Saturated Hydrocarbon Structures, Ordered by Increasing Size n and Number of Rings nr^a

  n  row      nr=0      1      2      3      4      5      6      7      8      9     10     11        Σ
  1  upper       1      -      -      -      -      -      -      -      -      -      -      -        1
     lower       1      -      -      -      -      -      -      -      -      -      -      -        1
  2  upper       1      -      -      -      -      -      -      -      -      -      -      -        1
     lower       1      -      -      -      -      -      -      -      -      -      -      -        1
  3  upper       1      1      -      -      -      -      -      -      -      -      -      -        2
     lower       1      1      -      -      -      -      -      -      -      -      -      -        2
  4  upper       2      2      1      -      -      -      -      -      -      -      -      -        5
     lower       2      2      1      1      -      -      -      -      -      -      -      -        6
  5  upper       3      5      4      -      -      -      -      -      -      -      -      -       12
     lower       3      5      5      4      2      1      1      -      -      -      -      -       21
  6  upper       5     12      7      -      -      -      -      -      -      -      -      -       24
     lower       5     12     17     18     14      8      3      1      -      -      -      -       78
  7  upper       9     24     15      5      2      -      -      -      -      -      -      -       55
     lower       9     29     56     79     79     59     31      9      2      -      -      -      353
  8  upper      18     34     18      8      1      -      -      -      -      -      -      -       79
     lower      18     73    182    326    430    427    298    134     35      6      -      -     1929
  9  upper      35     58     36     13      2      -      -      -      -      -      -      -      144
     lower      35    185    573   1278   2161   2768   2616   1714    707    154     16      -    12207
 10  upper      75     73     46     11      2      1      -      -      -      -      -      -      208
     lower      75    475   1792   4875  10162  16461  20346  18436  11477   4399    845     59    89402
  Σ  upper     150    209    127     37      7      1      -      -      -      -      -      -      531
     lower     150    782   2626   6581  12848  19724  23295  20294  12221   4559    861     59   104000

^a Upper rows, structures with known bps; lower rows, all mathematically possible structures.

Table 3. Number of Monocyclic Alkane Structures, Ordered by Increasing Size n and Increasing Ring Size sr^a

  n  row      sr=3      4      5      6      7      8      9     10      Σ
  3  upper       1      -      -      -      -      -      -      -      1
     lower       1      -      -      -      -      -      -      -      1
  4  upper       1      1      -      -      -      -      -      -      2
     lower       1      1      -      -      -      -      -      -      2
  5  upper       3      1      1      -      -      -      -      -      5
     lower       3      1      1      -      -      -      -      -      5
  6  upper       6      4      1      1      -      -      -      -     12
     lower       6      4      1      1      -      -      -      -     12
  7  upper      14      4      4      1      1      -      -      -     24
     lower      15      8      4      1      1      -      -      -     29
  8  upper      12      6      9      5      1      1      -      -     34
     lower      33     24      9      5      1      1      -      -     73
  9  upper      10      1     28     12      5      1      1      -     58
     lower      83     55     28     12      5      1      1      -    185
 10  upper       6      4     19     30      7      5      1      1     73
     lower     196    147     71     40     13      6      1      1    475
  Σ  upper      53     21     62     49     14      7      2      1    209
     lower     338    240    114     59     20      8      2      1    782

^a The meaning of upper and lower rows is as in Table 2.

Table 4. Test Results for the Structure Sensitivity of TIs within Two Comprehensive Samples of Cycloalkanes^a

                              W       Z       χ       ω      ω′    (Wω)     twc
Sample 1
  distinct values  upper     30      45      43      51      52      71      72
                   lower     50      62     136     102     132     321     461
  sensitivity      upper  0.411   0.616   0.589   0.699   0.712   0.973   0.986
                   lower  0.105   0.131   0.286   0.215   0.278   0.676   0.971
Sample 2
  distinct values  upper     20      18      34      28      36      39      40
                   lower     27      37      79      49      62      83     101
  sensitivity      upper  0.488   0.439   0.829   0.683   0.878   0.951   0.976
                   lower  0.257   0.352   0.752   0.467   0.590   0.790   0.932

^a The meaning of upper and lower rows is as in Table 2.

Some Structure-Bp Correlations. For each sample of compounds of constant size n considered, the following descriptors were included in a pool from which to select descriptors for multilinear regressions: W, Z, log(Z) (abbreviated lZ), χ, ω, mwc2-mwc6, twc, log(twc) (abbreviated ltwc), p3-p5, ap2-ap5, dia, (Wω), and (Wω)^0.125. For samples of variable n, the descriptors n, n^2, and n^0.5 were additionally included; for monocyclics, sr; and for samples with a variable number of rings, nr. For acyclic structures ω is identical to W; (Wω) therefore reduces to W^2, and (Wω)^0.125 reduces to W^0.25, a quantity

found earlier to be of some value.22 This selection of TIs was guided by the criteria of proven usefulness, availability, and simplicity as an indication of fundamentality. Using the SAS procedures rsquare and reg, we obtained the few best one-variable, two-variable, etc., (multi)linear regressions for each sample. The comprehensive samples considered were the acyclic, monocyclic, and bicyclic heptanes, octanes, nonanes, and decanes, the combinations n = 7-10, and the combinations n = 2-10 (in each case all such compounds, those with bp values of rel > 0 only, and those with values of rel = 2 only), as well as the heptanes, octanes, nonanes, decanes, and combinations containing any number of rings, including zero. Structure-bp correlations for such a diverse set of saturated hydrocarbons had never been obtained before. Some of the best correlations are given in Table 5 in the format descriptors, r^2 (squared coefficient of correlation), s (standard deviation). From Table 5 the following conclusions can be drawn: (1) In the samples of high-quality bps, the s values for an increasing number of descriptors level off to a lower plateau than in the corresponding samples containing low-quality or


Table 5. Best Multilinear Structure-Bp Correlations Including 1, 2, 3, etc., Descriptors, in the Format Descriptor(s), r2, s A. Acyclics rel > 0 only

all

rel ) 2 only

Octanes n ) 8 N ) 18 Z W p3 1twc W0.25 p3

0.7868 0.9785 0.9842

2.91 0.96 0.85

Octanes n ) 8 N ) 18 (see left)

Octanes n ) 8 N ) 18 (see left)

Nonanes n ) 9 N ) 35 Z mwc4 mwc6 Z p4 p5 mwc5 twc p3 W0.25 W2 mwc3 mwc4 mwc6 twc

0.5849 0.9414 0.9587 0.9727 0.9810

3.90 1.49 1.27 1.05 0.89

Nonanes n ) 9 N ) 35 (see left)

Nonanes n ) 9 N ) 15 Z mwc4 mwc6 W mwc4 twc

0.9442 0.9734 0.9908

1.98 1.42 0.88

Decanes n ) 10 N ) 75 Z W p3 W2 mwc2 mwc5 mwc3 mwc6 p3 W0.25

0.3360 0.8588 0.8881 0.9047

5.75 2.67 2.39 2.22

Decanes n ) 10 N ) 73 1Z W p3 W2 mwc2 mwc5 W mwc2 mwc3 mwc6

0.3855 0.8583 0.8879 0.9046

5.57 2.70 2.40 2.24

Decanes n ) 10 N ) 10 p2 p3 W0.25

0.8677 0.9621

2.01 1.15

Acyclic Alkanes n ) 7-10 N ) 137 n0.5 0.9189 1Z mwc5 0.9796 mwc2 mwc5 W0.25 0.9919 mwc4 mwc6 p3 W0.25 0.9929

6.57 3.31 2.10 1.97

Acyclic Alkanes n ) 7-10 N ) 135 χ 0.9236 1Z mwc5 0.9803 mwc2 mwc5 W0.25 0.9918 mwc4 mwc6 p3 W0.25 0.9930

6.40 3.26 2.11 1.96

Acyclic Alkanes n ) 7-10 χ 1Z mwc6 mwc2 mwc5 W0.25 mwc2 mwc5 W0.25 p5 mwc5 mwc6 W0.25 p3 n

N ) 52 0.9801 0.9900 0.9967 0.9978 0.9987

3.68 2.64 1.53 1.27 1.00

Acyclic Alkanes n ) 2-10 N ) 149 n0.5 0.9756 p3 W0.25 0.9891 0.25 2 p3 W n 0.9964 mwc5 p2 W0.25 n2 0.9976 mwc2 mwc3 mwc6 W0.25 n2 0.9980

6.67 4.47 2.56 2.12 1.92

Acyclic Alkanes n ) 2-10 N ) 147 n0.5 0.9756 p3 W0.25 0.9891 2 0.25 p3 n W 0.9964 mwc5 p2 n2 W0.25 0.9976 mwc2 mwc3 mwc6 n2 W0.25 0.9980

6.70 4.49 2.58 2.13 1.93

Acyclic Alkanes n ) 2-10 W0.25 p2 n0.5 W0.25 p3 n2 W0.25 p3 n2 dia W0.25 mwc4 mwc5 p3 n W mwc5 mwc6 W0.25 p3 n2

N ) 64 0.9849 0.9944 0.9986 0.9991 0.9994 0.9997

6.44 3.97 2.01 1.57 1.34 1.01

B. Monocyclics rel > 0 only

all Octanes n ) 8 N ) 34 Z p3 1twc 1twc p4 ap3 1twc p4 ap3 ap5

0.8188 0.9272 0.9461 0.9576

5.21 3.35 2.93 2.65

Nonanes n ) 9 N ) 58 Z Zω Z ap5 sr 1Z p4 ap5 sr

0.8149 0.9038 0.9367 0.9512

4.86 3.53 2.89 2.56

Decanes n ) 10 N ) 73 Z Z sr W Z p4 mwc5 mwc6 p4 ap3 mwc5 mwc6 1twc p4 ap3 mwc4 mwc5 1twc p4 ap2 ap3

0.6919 0.8401 0.8619 0.8876 0.9018 0.9113

6.56 4.76 4.45 4.05 3.81 3.65

Monocycloalkanes n ) 7-10 N ) 189 1Z 0.9616 1Z sr 0.9786 1Z dia sr 0.9802 1Z p4 dia sr 0.9834 Z 1Z p4 dia sr 0.9847 0.9864 1Z p4 ap5 (Wω) sr n2 W 1Z twc 1twc p4 ap5 n0.5 0.9873 Monocycloalkanes n ) 3-10 N ) 209 1Z 0.9796 1Z sr 0.9858 Z 1Z sr 0.9893 Z 1Z ap5 sr 0.9908 0.125 W 1Z p4 dia (Wω) 0.9921 1Z p4 ap5 (Wω) n sr 0.9930 W 1Z twc 1twc p4 ap5 n 0.9936

rel = 2 only

Octanes n ) 8 N ) 24 Z p3 1twc mwc4 twc dia 1twc p4 ap3 ap5

0.8751 0.9455 0.9710 0.9821

4.47 3.02 2.26 1.82

Nonanes n ) 9 N ) 46 Z Z sr 1twc p5 ap3 Z p3 p5 ap3 Z p3 p5 ap3 ap4

0.8438 0.9142 0.9470 0.9600 0.9713

4.41 3.31 2.63 2.31 1.98

Decanes n ) 10 N ) 38 Z Z sr 1Z p4 dia W 1Z p4 ap5 χ p4 ap3 ap5 sr

0.8291 0.9219 0.9471 0.9695 0.9769

4.97 3.41 2.85 2.20 1.94

5.72 4.28 4.12 3.79 3.65 3.45 3.33

Monocycloalkanes n ) 7-10 N ) 123 1Z 0.9695 1Z sr 0.9841 χ p4 sr 0.9860 χ p4 dia sr 0.9886 W 1Z mwc2 mwc4 mwc6 0.9904 Z 1Z twc p4 ap5 sr 0.9924 W 1Z twc 1twc p4 ap5 n0.5 0.9941

4.90 3.56 3.35 3.04 2.80 2.50 2.22

5.80 4.84 4.22 3.91 3.64 3.43 3.29

Monocycloalkanes n ) 3-10 N ) 143 1Z 0.9859 Wχ 0.9904 Z 1Z sr 0.9935 Z 1Z p5 sr 0.9948 0.125 2 1Z p4 dia (Wω) n 0.9958 Z 1Z twc p4 ap5 sr 0.9969 0.9974 1Z twc p4 ap5 (Wω) (Wω)0.125 sr

5.16 4.28 3.53 3.15 2.85 2.44 2.26

Octanes n ) 8 N ) 13 Z 0.8852 χ dia 0.9656

3.76 2.16

Nonanes n ) 9 N ) 15 1Z 0.9017 1Z sr 0.9662

3.34 2.09

Decanes n ) 10 N ) 7 Z 0.6958

2.76

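Monocycloalkanes, n = 7-10, N = 42

| descriptors | r² | s |
|---|---|---|
| lZ | 0.9751 | 4.12 |
| lZ sr | 0.9871 | 3.01 |
| lZ p5 sr | 0.9916 | 2.45 |
| lZ p4 dia sr | 0.9939 | 2.12 |
| ltwc ap2 ap4 ap5 sr | 0.9953 | 1.88 |
| Z lZ twc p4 ap5 sr | 0.9968 | 1.57 |

Monocycloalkanes, n = 3-10, N = 54

| descriptors | r² | s |
|---|---|---|
| lZ | 0.9902 | 4.46 |
| W lZ | 0.9955 | 3.04 |
| lZ (Wω) sr | 0.9965 | 2.74 |
| W χ ap4 sr | 0.9975 | 2.31 |
| ltwc p4 ap2 ap5 sr | 0.9982 | 1.99 |
| Z lZ twc p4 ap5 sr | 0.9989 | 1.54 |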


Table 5. (Continued) C. Bicyclics rel > 0 only

all Octanes n ) 8 N ) 18 Z mwc2 1twc mwc3 mwc5 mwc6 Nonanes n ) 9 N ) 36 mwc2 mwc2 (Wω)0.125 ω mwc4 (Wω)0.125 χ mwc4 mwc6 twc ω ap2 ap5 dia (Wω)0.125

0.7925 0.9235 0.9557

6.20 3.89 3.06

0.8011 0.8259 0.8332 0.8540 0.8789

4.53 4.30 4.27 4.06 3.76

Decanes n ) 10 N ) 46 mwc2 Z 1twc mwc2 ap4 dia Z 1twc ap4 dia

0.7718 0.8060 0.8471 0.8582

5.78 5.39 4.85 4.72

Bicycloalkanes n ) 7-10 N ) 115 1Z 0.9517 1Z twc 0.9622 mwc4 mwc5 n0.5 0.9672 mwc4 mwc6 dia n0.5 0.9702 mwc4 mwc6 ap5 dia n0.5 0.9728 mwc6 ap2 ap4 ap5 dia n0.5 0.9753

6.34 5.63 5.27 5.05 4.85 4.64

Bicycloalkanes n ) 4-10 N ) 127 1Z 0.9741 1Z twc 0.9802 mwc4 mwc5 n0.5 0.9828 W mwc4 mwc6 n 0.9838 mwc4 mwc5 p5 ap5 n0.5 0.9852 0.5 mwc6 ap2 ap4 ap5 dia n 0.9871

6.33 5.55 5.20 5.07 4.85 4.56

rel = 2 only

Octanes n ) 8 N ) 13 Z 1twc ap2

0.8370 0.9340

4.35 2.90

Octanes n ) 8 N ) 2

Nonanes n ) 9 N ) 14 mwc3 1twc p5

0.8433 0.8641

3.77 3.66

Decanes n ) 10 N ) 18 mwc2 mwc4 ap4 mwc3 ap3 ap4

0.8962 0.9355 0.9595

4.13 3.36 2.76

Bicycloalkanes n ) 7-10 N ) 53 1Z 0.9728 1Z ap4 0.9783 1Z 1twc (Wω)0.125 0.9811 1Z 1twc p5 (Wω)0.125 0.9834 mwc2 p2 ap3 ap4 (Wω)0.125 0.9847 ω 1twc p4 ap3 (Wω)0.125 n2 0.9857

4.63 4.18 3.94 3.73 3.62 3.54

Bicycloalkanes n ) 4-10 N ) 61 1Z 0.9851 1Z twc 0.9884 1Z (Wω)0.125 n2 0.9903 ap4 1Z (Wω)0.125 n2 0.9912

4.70 4.18 3.85 3.70

Nonanes n ) 9 N ) 1

Decanes n ) 10 N ) 5

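Bicycloalkanes, n = 7-10, N = 12

| descriptors | r² | s |
|---|---|---|
| lZ | 0.9685 | 5.97 |
| lZ mwc6 | 0.9876 | 3.95 |

Bicycloalkanes, n = 4-10, N = 15

| descriptors | r² | s |
|---|---|---|
| lZ | 0.9842 | 5.92 |
| lZ n^2 | 0.9941 | 3.76 |
| mwc3 mwc4 (Wω)^0.125 | 0.9966 | 2.97 |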

D. Saturated Hydrocarbons (Acyclic-Polycyclic) rel > 0 only

all Octanes n ) 8 N ) 79 χ Z p4 Z ω p4 Z ω p4 nr Z ω mwc6 twc ap4

0.6586 0.7655 0.8329 0.8601 0.8829

7.27 6.06 5.15 4.75 4.37

Octanes n ) 8 N ) 59 χ Z p4 Z ω p4 Z ω p4 nr Z ω 1twc p4 ap3

rel ) 2 only 0.7363 0.8584 0.8941 0.9133 0.9330

5.96 4.41 3.84 3.51 3.11

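Nonanes, n = 9, N = 144

| descriptors | r² | s |
|---|---|---|
| χ | 0.6725 | 6.45 |
| Z twc | 0.7891 | 5.19 |
| Z ω ap4 | 0.8280 | 4.71 |
| lZ twc (Wω) (Wω)^0.125 | 0.8592 | 4.27 |
| lZ mwc3 mwc4 (Wω) (Wω)^0.125 | 0.8742 | 4.05 |

Decanes, n = 10, N = 208

| descriptors | r² | s |
|---|---|---|
| χ | 0.5090 | 8.50 |
| Z twc | 0.6622 | 7.07 |
| W Z ap4 | 0.7440 | 6.17 |
| Z p3 p4 ap3 | 0.7933 | 5.56 |
| Z p3 p4 ap3 dia | 0.8144 | 5.28 |

Saturated Hydrocarbons, n = 7-10, N = 486

| descriptors | r² | s |
|---|---|---|
| χ | 0.9245 | 7.50 |
| lZ nr | 0.9320 | 7.12 |
| χ mwc6 dia | 0.9440 | 6.47 |
| χ ω ap4 (Wω) | 0.9496 | 6.14 |
| χ ω ap4 dia (Wω) | 0.9551 | 5.80 |
| lZ ω ltwc ap4 (Wω) n^2 | 0.9573 | 5.67 |

Nonanes, n = 9, N = 98

| descriptors | r² | s |
|---|---|---|
| χ | 0.7191 | 5.72 |
| Z ap4 | 0.8461 | 4.26 |
| W Z ap4 | 0.8911 | 3.60 |
| W Z ltwc ap4 | 0.9157 | 3.18 |
| W Z ltwc ap3 ap4 | 0.9236 | 3.05 |

Decanes, n = 10, N = 133

| descriptors | r² | s |
|---|---|---|
| χ | 0.6119 | 7.00 |
| Z p5 | 0.8200 | 4.78 |
| W Z p4 | 0.8698 | 4.08 |
| W Z p4 ap3 | 0.8841 | 3.87 |
| W lZ p4 (Wω) (Wω)^0.125 | 0.9084 | 3.45 |

Saturated Hydrocarbons, n = 7-10, N = 326

| descriptors | r² | s |
|---|---|---|
| χ | 0.9406 | 6.31 |
| χ dia | 0.9496 | 5.82 |
| lZ p4 n^0.5 | 0.9571 | 5.38 |
| W lZ p4 n | 0.9650 | 4.86 |
| W lZ ltwc ap4 n | 0.9696 | 4.54 |
| W lZ mwc3 ltwc ap4 n | 0.9723 | 4.34 |

Saturated Hydrocarbons, n = 2-10, N = 530

| descriptors | r² | s |
|---|---|---|
| χ | 0.9601 | 7.97 |
| W χ | 0.9645 | 7.53 |
| W χ ap4 | 0.9726 | 6.63 |
| W lZ mwc4 n^0.5 | 0.9761 | 6.19 |
| W lZ ltwc ap4 n | 0.9794 | 5.75 |
| W lZ ltwc ap4 dia n^0.5 | 0.9803 | 5.64 |

Saturated Hydrocarbons, n = 2-10, N = 366

| descriptors | r² | s |
|---|---|---|
| χ | 0.9692 | 7.31 |
| W χ | 0.9765 | 6.39 |
| lZ p4 n^0.5 | 0.9820 | 5.61 |
| W lZ p4 n^0.5 | 0.9860 | 4.95 |
| W lZ ltwc ap4 n | 0.9882 | 4.56 |
| W lZ ltwc ap4 n^2 n^0.5 | 0.9886 | 4.47 |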

unchecked bps. This justifies the quality classification of bps introduced in this work. This moreover is an independent

Octanes n ) 8 N ) 34 1Z Z p4 Z p4 dia W Z 1twc p4

0.7689 0.9308 0.9547 0.9711

5.10 2.83 2.33 1.89

Nonanes n ) 9 N ) 31 Z Z twc W Z mwc5 Z p3 p5 ap3

0.8540 0.9310 0.9519 0.9614

4.36 3.05 2.59 2.37

Decanes n ) 10 N ) 24 χ 0.7284 1Z twc 0.9074 1Z ω ap4 0.9464

5.87 3.51 2.74

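Saturated Hydrocarbons, n = 7-10, N = 109

| descriptors | r² | s |
|---|---|---|
| χ | 0.9570 | 5.57 |
| lZ p5 | 0.9668 | 4.92 |
| χ twc dia | 0.9773 | 4.09 |
| W χ ω p4 | 0.9829 | 3.57 |
| χ ω p4 dia (Wω) | 0.9859 | 3.25 |

Saturated Hydrocarbons, n = 2-10, N = 136

| descriptors | r² | s |
|---|---|---|
| (Wω)^0.125 | 0.9787 | 7.05 |
| W χ | 0.9840 | 6.14 |
| W χ ap4 | 0.9909 | 4.66 |
| W lZ mwc4 n^0.5 | 0.9942 | 3.72 |
| W lZ mwc4 ap4 n^0.5 | 0.9949 | 3.52 |
| χ ω ap4 dia n^2 n^0.5 | 0.9956 | 3.28 |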

justification for the use of TIs in QSPR studies: If the high-quality bps were less well-described by TIs than all bps, then


Table 6. Comparison of Structure-Bp Relationships for (Poly)Cyclic Alkanes from the Recent Literature (left, mostly incomplete samples), from This Work for the Same Samples (middle), and for the Corresponding Complete Samples, as Far as rel = 2 (right)

no. of descr

s

no. of descr

s

no. of descr

2.84 2.48

1 2

Monocyclic Alkanes n ) 3-6 rel ) 2 N ) 12 1Z p2 (Wω)0.125

descriptors

Monocyclic Alkanes n ) 3-6 N ) 20 1 1Z 2 1Z W

a

descriptors

s

1 2

3.69b 2.87b

1

Some Monocyclic Alkanes n ) 4-10 N ) 45 5.93c 1 1Z

5.13

1

Monocyclic Alkanes n ) 4-10 rel ) 2 N ) 54 1Z 4.46

1 2

Some Monocyclic Alkanes n ) 7-10 N ) 30 7.10d 1 1Z 5.30d 2 1Z sr

5.40 2.88

1 2

Monocyclic Alkanes n ) 7-10 rel ) 2 N ) 42 1Z 4.12 1Z sr 3.01

3 4

Acyclic and Some Monocyclic Octanes N ) 32 2 Z p4 2.98e 3 Z p4 dia 2.54e 4 Z p4 ap5 sr

2.29 1.70 1.26

Acyclic and Monocyclic Octanes rel ) 2 N ) 31 2 Z p4 2.86 3 Z p4 dia 2.26 4 Z p4 ω ap5 1.78

2 2

Acyclic and Some Monocyclic Octanes N ) 29 (4.78)f,g 2 Z p5 3.00f 2 Z p4

2.92 2.38

7 8

Some Acyclic and Monocyclic Alkanes n ) 4-9 N ) 69 3.43h 3 1Z n0.5 p5 3.60h 7 W mwc4 mwc6 ap4 ap5 n sr

3.16 1.70

6

Some Mono- and Bicyclic Alkanes n ) 3-9 N ) 80 2 Wχ 4.80i 6 n χ p4 ap3 dia (Wω)0.125

4.23 2.34

7 9 12

Some Acyclic-Bicyclic Alkanes n ) 4-10 N ) 130 5.04j 3 1Z ap5 (Wω)0.125 3.65j 5 mwc5 p4 ap3 ap5 (Wω)0.125 3.03j 7 W 1Z mwc4 mwc6 ap2 ap5 n2

4.89 3.63 2.85

2.71 2.05

Acyclic and Monocyclic Octanes rel ) 2 N ) 31 (see above) Acyclic and Monocyclic Alkanes n ) 4-9 rel ) 2 N ) 99 6 mwc4 mwc6 1twc ap4 ap5 sr 2.46 Mono- and Bicyclic Alkanes n ) 3-9 rel ) 2 N ) 57 1 χ 5.00 4 1Z mwc2 dia (Wω)0.125 2.54 Acyclic-Bicyclic Alkanes n ) 4-10 rel ) 2 N ) 131 7 W mwc3 mwc6 p4 ap3 n nr 3.08

a Complete sample. b Reference 11. c Reference 13b. d Reference 13a. e Reference 13c. f Reference 3c. g s value calculated in this work. h Reference 6. i Reference 14. j Reference 7.

the better correlations would be suspected of being due to mere coincidence.

(2) In going from the octanes to the decanes, the correlations with a given number of descriptors become less and less satisfactory.

(3) In many samples the best single descriptor for bps is Z or log(Z).

(4) The detour index ω or its variants (Wω) or (Wω)^0.125, used earlier for samples of cyclic compounds, is apparently very useful.

(5) The molecular walk counts of length k (mwc_k), their half-sum (twc), and the logarithm thereof are useful descriptors, taking part in most of the more precise models.

(6) We found no descriptor, or combination of a few descriptors, that describes the bps in all or most of the various samples with reasonable precision (say, s < 1 °C). On the contrary, with the possible exception of the acyclic alkanes, we are far from that goal, even in the samples with high-quality bp values.

For this latter disappointing result at least two possible reasons are obvious: First, of course, there is no guarantee that a good descriptor (combination) for bps, if one exists, is included in our limited descriptor pool. Second, there is no reason why the actual dependence of bps on any descriptor should be linear.

In Table 6 we compare our structure-bp correlations for cycloalkanes with results from the recent literature. In each case we first use the compound sample from the respective reference; then we consider the corresponding comprehensive sample with highly reliable bp values (rel = 2), which is assumed to be more significant than the corresponding complete sample including low-quality bps.
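The walk-count descriptors named in observation (5) can be illustrated with a short sketch. It assumes that mwc_k is the number of walks of length k in the hydrogen-suppressed graph (each walk counted from its starting vertex) and that twc is half the sum of mwc_k over k = 1 to n - 1; the summation range is an assumption here, since this section only says "half-sum". The function names are illustrative and are not those of the authors' program MORGAN.

```python
def walk_counts(adj, max_len):
    """Return [mwc_1, ..., mwc_max_len] for a graph given as an adjacency matrix.

    v[i] holds the number of walks of the current length starting at vertex i;
    multiplying by the adjacency matrix extends every walk by one step.
    """
    n = len(adj)
    v = [1] * n
    counts = []
    for _ in range(max_len):
        v = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        counts.append(sum(v))
    return counts


def total_walk_count(adj):
    """Half-sum of the walk counts of lengths 1 .. n-1 (range is an assumption)."""
    n = len(adj)
    return sum(walk_counts(adj, n - 1)) / 2


# Carbon skeletons of propane (a 3-vertex path) and cyclopropane (a triangle).
propane = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
cyclopropane = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(walk_counts(propane, 2), total_walk_count(propane))
print(walk_counts(cyclopropane, 2), total_walk_count(cyclopropane))
```

Iterating a vector rather than forming matrix powers keeps the cost at one matrix-vector product per walk length, which is sufficient for graphs of up to ten vertices.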

Acyclic alkanes are not the focus of the present study. Nevertheless, having compiled the data and having obtained the best correlations within our pool of descriptors for acyclics as well, we compare in Table 7 our models to the best ones (as far as we are aware) from the recent literature. The data may be useful to other workers in the field.

It is seen from Tables 6 and 7 that for most samples considered, our pool of diverse TIs allowed us to construct better regression models for the bp than were available hitherto. Of course, in much of the work referenced in the tables, the primary objective was to test the members of a family of descriptors, rather than to obtain the very best model. In contrast, the better performance of our models built from diverse descriptors emphasizes the usefulness of unrelated descriptors, a few of which may span the relevant topological space more efficiently than highly intercorrelated descriptors from one family.

Deceptively Impressive Regressions Obtained from Incomplete or Biased Data. The best two-variable regression we were able to find for monocyclic octanes includes as independent variables p3 and ltwc (see Table 5B). For all such compounds with known bps:

$$\mathrm{bp} = (4.78 \pm 0.55)\,p_3 - (159.47 \pm 10.48)\,\mathrm{ltwc} + (586.70 \pm 28.72) \tag{1}$$

$N = 34$, $r^2 = 0.9272$, $s = 3.35$, $F = 197.5$

and for the compounds with reproduced bps (rel > 0):


$$\mathrm{bp} = (5.55 \pm 0.71)\,p_3 - (165.95 \pm 11.63)\,\mathrm{ltwc} + (599.85 \pm 30.86) \tag{2}$$

$N = 24$, $r^2 = 0.9455$, $s = 3.02$, $F = 182$

The experimental bps used in these regressions are shown once more in column 2 of Table 8. As a caveat, we now demonstrate the effect of selective perception of data.

First trap: noninclusion of some structures. If 1-ethyl-2-propylcyclopropane (1e2pc3), 1-butyl-2-methylcyclopropane (b2mc3), 1,1-dimethyl-2-isopropylcyclopropane (11m2ipc3), and 1,1-dimethyl-2-propylcyclopropane (11m2pc3) are simply excluded from the regression, most readers will not miss them. In fact these compounds have never before been included in structure-bp correlations (though all their bp data known today have been available since 1973), and there are even reasons for excluding them, in that these are “exotic” or “unimportant” substances whose bps were measured only once or twice and are thus not known with certainty. The result of this exclusion is the following correlation:

$$\mathrm{bp} = (4.84 \pm 0.45)\,p_3 - (159.10 \pm 8.84)\,\mathrm{ltwc} + (584.92 \pm 24.36) \tag{3}$$

$N = 30$, $r^2 = 0.9499$, $s = 2.71$, $F = 256$

which obviously is somewhat better than eqs 1 and 2. Note that the four structures excluded were not the worst cases as measured by the residuals from regressions 1 and 2 (columns 6 and 7 in Table 8).

Second trap: selective perception of bp records. Table 8 also contains the bp range and the number of bp records in Beilstein for the given compounds (columns 4 and 5). We next changed all bp values, shifting them in the direction of better fit, as suggested by the residuals, but not beyond the range of the observations documented in Beilstein. The new fictitious bps (columns 8 and 9 in Table 8) therefore look innocent: they are manipulated but not out of the range of true experimental observations. Using these fictitious “experimental” bp values, we obtained the following regressions for all monocyclic octanes with known bps:

Table 7. Comparison of Structure-Bp Relationships for Samples of Acyclic Alkanes from the Literature (left) and from the Present Work (right)^z

no. of descr 2 2

1.11b-d 0.866d

1 2 3 5 5 6

3.56f,g 2.70h 1.60i 1.69j 0.98k 1.687j

6.59V 5.25V

CONCLUDING REMARK

It is obvious that the problems discussed here in the context of multilinear regression of bp data of cycloalkanes using TIs pertain to other molecular properties, to other sets of

4.96 2.75 1.75 1.40 1.19

Acyclic Alkanes n ) 2-9 N ) 74e,p 1 χ0.25 2 mwc5 χ0.25 3 p3 W0.25 n2 4 1twc p3 W0.25 n 5 mwc2 mwc3 mwc6 W0.25 n2

4.94 2.74 1.73 1.43 1.19

Acyclic Alkanes n ) 6-10 N ) 109u 3 mwc2 mwc5 W0.25 4 mwc4 mwc6 p3 W0.25

1.98 1.90

Acyclic Alkanes n ) 6-10 N ) 142 3 mwc2 mwc5 W0.25 4 mwc2 mwc3 mwc6 W0.25

2.15 2.02

Acyclic Alkanes n ) 6-10 rel ) 2 N ) 57 3 mwc2 mwc5 W0.25 4 mwc2 mwc5 p5 W0.25 5 mwc5 mwc6 p3 W0.25 n

and for the monocyclic octanes with reproduced bps (rel > 0):

These latter regressions (to be compared to eqs 1 and 2) seem impressive and even seem to allow predictions, yet in fact they are the result of severe data manipulation. There is no need to detail what may happen if both erroneous procedures are unintentionally combined.
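The second trap can be made concrete with a small sketch. It assumes that a residual in Table 8 is the observed bp minus the calculated bp, and it reads the "shift toward better fit, but not beyond the Beilstein range" as moving each bp to its fitted value and clipping at the range limits. This reading reproduces the eq 4 entries of Table 8 for the three rows used below; the function name is illustrative only.

```python
def shift_within_range(bp, residual, lo, hi):
    """Move a bp to its fitted value (bp - residual), clipped to [lo, hi]."""
    fitted = bp - residual
    return min(max(fitted, lo), hi)


# name: (bp, residual from eq 1, lower and upper end of the Beilstein bp range),
# all values taken from Table 8.
rows = {
    "c8": (149.0, 3.58, 120.3, 156.0),
    "11mc6": (119.5, 5.23, 115.0, 125.0),
    "1e2pc3": (108.0, -8.57, 99.0, 117.5),
}

for name, (bp, residual, lo, hi) in rows.items():
    print(name, round(shift_within_range(bp, residual, lo, hi), 1))
```

For c8 and 1e2pc3 the fitted value lies inside the documented range, so the fictitious bp removes the residual entirely; for 11mc6 the shift stops at the lower end of the range.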

4.13

Acyclic Alkanes n ) 3-9 N ) 73 1 χ0.25 2 mwc5 χ0.25 3 p3 W0.25 n2 4 1twc p3 W0.25 n 5 mwc2 mwc3 mwc6 W0.25 n2

5.79n

3 6

3.42 1.49 1.27 1.05 0.89

Acyclic Alkanes n ) 2-8 N ) 39 1 χ0.33

1

4.74q 3.83r 3.52q 1.84k 2.21s 1.38k 3.39t 1.64t

Acyclic Nonanes N ) 35e 1 XMT4R 2 mwc4 mwc6 3 Z p4 p5 4 mwc5 twc p3 W0.25 5 W2 mwc3 mwc4 mwc6 twc

2.21 1.38 0.79 0.72

4.92m 3.75m 2.37m 1.75m 0.91m 0.64m

2 2 3 3 5 5 9 12

0.96 0.85

Acyclic Alkanes n ) 3-8 N ) 38 2 χ0.33 mwc5 3 p3 W0.25 n 4 p3 W0.25 dia n2 5 p3 W0.25 W2 dia n2

2 3 4 5 4 5

5.97o 3.63o 2.17o 1.41o

2 10

s

2.26 0.87

2.37l

1 2 3 4

descriptors

Acyclic Octanes N ) 18a 2 W p3 3 W0.25 p3 1twc

Acyclic Alkanes n ) 6-8 N ) 32 2 1Z mwc5 3 W0.25 p3 n

3

$$\mathrm{bp} = (4.79 \pm 0.28)\,p_3 - (158.09 \pm 5.34)\,\mathrm{ltwc} + (582.09 \pm 14.63) \tag{4}$$

$N = 34$, $r^2 = 0.9794$, $s = 1.71$, $F = 737$

$$\mathrm{bp} = (5.73 \pm 0.20)\,p_3 - (167.38 \pm 3.34)\,\mathrm{ltwc} + (602.56 \pm 8.86) \tag{5}$$

$N = 24$, $r^2 = 0.9952$, $s = 0.87$, $F = 2177$
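The statistics quoted with eqs 1-5 can be cross-checked against one another. The sketch below uses the standard relation between the F ratio, r², the number of observations N, and the number of descriptors k for a least-squares model with an intercept, F = (r²/k) / ((1 - r²)/(N - k - 1)). That the authors computed F in exactly this way is an assumption; the function name is illustrative.

```python
def f_statistic(r2, n_obs, n_descr):
    """F ratio of a least-squares fit with intercept, from r^2, N, and k."""
    return (r2 / n_descr) / ((1.0 - r2) / (n_obs - n_descr - 1))


# (equation number, r^2, N, reported F); each model uses k = 2 descriptors.
reported = [
    (1, 0.9272, 34, 197.5),
    (2, 0.9455, 24, 182.0),
    (3, 0.9499, 30, 256.0),
    (4, 0.9794, 34, 737.0),
    (5, 0.9952, 24, 2177.0),
]

for eq, r2, n_obs, f_reported in reported:
    print(eq, round(f_statistic(r2, n_obs, 2), 1), f_reported)
```

The recomputed values agree with the reported F to within rounding of r², which also confirms that the quantity printed after N in each equation is r² rather than r.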

no. of descr

s

9.46w 3.34x,y

Acyclic Alkanes n ) 2 N ) 149 2 χ0.25 mwc5 5 mwc2 mwc3 mwc6 W0.25 n2

1.57 1.31 1.06 3.55 1.92

a See also ref 23. b Descriptors W and p3, bp values different from ours were used. c Reference 24. d Reference 25. e For this sample XMT4R was included as an additional descriptor; see ref 26. f Descriptor XMT4R, bp values different from ours were used. g Reference 26. h Reference 27. i Reference 18b. j Reference 28. k Reference 29. l Reference 23b. m Reference 30. n Reference 31. o Reference 32. p See also refs 26 and 33. q Reference 34. r Reference 35. s Reference 36a. t Reference 36b. u Incomplete sample. V Reference 37. w Reference 38. x Two outliers excluded. y Reference 39. z In the calculations given here χ0.25, χ0.33, and χ0.5 were included as additional descriptors; see ref 22.

compounds, to other types of descriptors, and to other statistical methods as well. We stress the importance of obtaining a comprehensive unbiased data set before undertaking a structure-property relationship study and of not restricting one’s attention to descriptors from one class of related TIs.

Table 8. Bp Data for Monocyclic Octanes in Beilstein

| name | bp (°C) | rel | bp range | no. of bp rec | residual, eq 1 | residual, eq 2 | fictitious bp used, eq 4 | fictitious bp used, eq 5 |
|---|---|---|---|---|---|---|---|---|
| c8 | 149 | 2 | 120.3-156 | 23 | 3.58 | 3.75 | 145.4 | 145.2 |
| 1mc7 | 134 | 2 | 113-136 | 10 | -0.26 | -0.21 | 134.3 | 134.2 |
| 1ec6 | 131.8 | 2 | 129-136 | 28 | -0.93 | -1.39 | 132.7 | 133.2 |
| 1pc5 | 131 | 2 | 127-131.5 | 12 | 0.61 | 0.23 | 130.4 | 130.8 |
| 1pec3 | 128 | 2 | 124.1-129 | 11 | -0.32 | -0.61 | 128.3 | 128.6 |
| 14mc6 | 121.8 | 2 | 119-124.9 | 29 | -2.95 | -3.09 | 124.8 | 124.9 |
| 13mc6 | 122.3 | 2 | 105-125 | 40 | -1.97 | -2.09 | 124.3 | 124.4 |
| 1ibc4 | 120.1 | 0 | 119.9-120.4 | 1 | -1.47 | - | 120.4 | - |
| 12mc6 | 126.6 | 1 | 122-130 | 26 | 0.72 | -0.05 | 125.9 | 126.7 |
| 1ipc5 | 126.4 | 2 | 118-129.3 | 9 | 2.33 | 1.64 | 124.1 | 124.8 |
| 1e3mc5 | 121 | 2 | 117-121.4 | 8 | -1.66 | -2.29 | 121.4 | 121.4 |
| 1e2mc5 | 124.7 | 1 | 121.2-128.4 | 13 | 0.54 | -0.74 | 124.2 | 125.4 |
| p3mc4 | 117.4 | 1 | 116.1-118.7 | 2 | -1.64 | -2.13 | 118.7 | 118.7 |
| 11mc6 | 119.5 | 2 | 115-125 | 19 | 5.23 | 5.52 | 115 | 115 |
| 1sbc4 | 123 | 0 | 123 | 1 | -0.52 | - | 123 | - |
| 1spec3 | 117.7 | 0 | 117.74 | 1 | -3.05 | - | 117.7 | - |
| b2mc3 | 124 | 1 | 121.1-128 | 2 | 6.26 | 5.25 | 121.1 | 121.1 |
| 12ec4 | 119 | 1 | 115.5-123 | 2 | -2.79 | -4.55 | 121.8 | 123 |
| 124mc5 | 115 | 1 | 109-120 | 19 | -1.50 | -2.46 | 116.5 | 117.5 |
| 1e1mc5 | 121.5 | 2 | 98-121.5 | 5 | 6.28 | 5.37 | 115.2 | 116.1 |
| 123mc5 | 117 | 1 | 109-123 | 24 | -2.03 | -3.67 | 119 | 120.7 |
| 1nepec3 | 106 | 0 | 106-106.2 | 1 | 1.83 | - | 106 | - |
| 5msbc3 | 115.5 | 0 | 115.5 | 1 | -1.20 | - | 115.5 | - |
| 1e2pc3 | 108 | 0 | 99-117.5 | 2 | -8.57 | - | 116.6 | - |
| ib2mc3 | 110 | 0 | 109.5-110.5 | 1 | -1.35 | - | 110.5 | - |
| 113mc5 | 104.9 | 2 | 104-116 | 10 | -1.01 | -0.96 | 105.9 | 105.9 |
| 112mc5 | 114 | 2 | 113-114.5 | 9 | 3.29 | 1.98 | 113 | 113 |
| 1234mc4 | 114.5 | 0 | 106.5-122.5 | 2 | 0.56 | - | 113.9 | - |
| 11m2pc3 | 105.9 | 1 | 105.5-106.3 | 2 | 3.48 | 1.94 | 105.5 | 105.5 |
| 1133mc4 | 86 | 1 | 84-87.6 | 3 | -3.14 | -2.99 | 87.6 | 87.6 |
| 1m12ec3 | 108.9 | 1 | 108.3-109.9 | 2 | 2.50 | -0.35 | 108.3 | 109.3 |
| 11m2ipc3 | 94.4 | 0 | 93.9-94.9 | 1 | 4.82 | - | 94.9 | - |
| 112m2ec3 | 104.5 | 1 | 103.5-104.5 | 2 | 5.63 | 1.91 | 103.5 | 103.5 |
| 11223mc3 | 100.5 | 0 | 100-101 | 1 | -1.65 | - | 101 | - |

EXPERIMENTAL SECTION

All saturated hydrocarbons up to n = 10 with known bps were identified in the Beilstein database Crossfire Plus Reactions, as of the third update for 1998 (BS9803PR). Whenever possible, data were checked in the primary literature. Unfortunately, in many cases this was not possible for lack of access to older, in particular Russian, journals. Only those structures were included in the data file for which at least one bp record at a pressure ≥ 720 Torr was present. Bp values measured between 720 and 800 Torr were adjusted to 760 Torr using an increment of 0.05 °C/Torr. Multiple entries of bps in Beilstein, resulting from multiple measurements at the same or different pressures or from stereoisomers or isotopomers, were averaged. All bp and s values in this paper are given in °C.

For a few compounds bp values differing by an unreasonable margin or bps unreasonable on comparison with those of structurally related compounds were reported; such structures were excluded from the data file. For example, for 1,1,4-trimethylcycloheptane bp values as discrepant as 152, 158, and 164 °C are reported. For 1,4-dimethylbicyclo[2.2.1]heptane a bp of 124.5 °C is given, far too low compared to the reliable value of 117 °C for 1-methylbicyclo[2.2.1]heptane. The bp given for 1-tert-butylbicyclo[1.1.1]pentane, 90 °C at 760 Torr, is far too low compared with the 100 °C at 70 Torr for 1-tert-butyl-3-methylbicyclo[1.1.1]pentane and in fact is far lower than the bp of any other nonane. For similar reasons, the bps recorded in Beilstein for 1-methyl-2-isopentylcyclopropane (114 °C), 1-ethyl-2-propylcyclopentane (72.9 and 77.4 °C), 1,3,3,5-tetramethylbicyclo[3.1.0]hexane (137.2 °C), and 3-propyltricyclo[2.2.1.0^{2,6}]heptane (132 °C) are regarded as unreasonable.

While this study was underway, a rather old book was brought to our attention (thanks go to Dr. E. Estrada) in which a few other saturated hydrocarbons with bps are collected, which for reasons unknown to us do not appear in Beilstein.40 These were also included in the data file but regarded as “not reproduced”, rel = 0 (unless reported by Rossini), since the values could not be checked in the primary literature. The bp value of scptc3210o was found in a paper referenced by Beilstein in another context.41 There are probably some more saturated hydrocarbons of n ≤ 10 with known bps buried in the literature, but we do not see a systematic way to uncover them.

Hosoya index (Z) values in this work were obtained in a novel way by our unpublished program ZPATH. Program ZPATH reduces the problem of generating all possible combinations of nonadjacent edges from the graph to a path-tracing problem. A selection of nonadjacent edges is a string of labels (in increasing order) of nonadjacent edges, which can be found by means of the edge-adjacency matrix in the same way as a string of labels of adjacent vertices (a path) is found using the usual vertex-adjacency matrix. The resulting p(G,k) values and their sums, the Z values, were checked against and found identical with those published for acyclic alkanes (up to n = 8),42a mono- and bicyclic alkanes (up to n = 8),42b and (perhydro)fullerenes (up to C42).43 Program ZPATH is thus a means to calculate the matching polynomial of a graph in a very elementary manner.
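The idea of building selections of nonadjacent edges as strings of increasing edge labels can be sketched as follows. This is not the unpublished ZPATH program: it is a minimal illustration that represents each edge by its two end vertices (two edges are adjacent exactly when they share a vertex) instead of using an edge-adjacency matrix, and the function name is hypothetical.

```python
def matching_counts(edges):
    """Return [p(G,0), p(G,1), ...], the numbers of selections of k
    mutually nonadjacent edges; their sum is the Hosoya index Z."""
    counts = {}

    def extend(start, used_vertices, k):
        # Every call corresponds to one valid selection of k edges.
        counts[k] = counts.get(k, 0) + 1
        # Only append edges with a higher label, so each selection is
        # generated exactly once, as an increasing string of labels.
        for i in range(start, len(edges)):
            u, v = edges[i]
            if u not in used_vertices and v not in used_vertices:
                extend(i + 1, used_vertices | {u, v}, k + 1)

    extend(0, frozenset(), 0)
    return [counts[k] for k in sorted(counts)]


# Carbon skeletons of n-butane (a 4-vertex path) and cyclohexane (a 6-ring).
butane = [(0, 1), (1, 2), (2, 3)]
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

for name, graph in (("butane", butane), ("cyclohexane", cyclohexane)):
    p = matching_counts(graph)
    print(name, p, sum(p))
```

The recursion visits each selection once, so its running time grows with Z itself; for hydrocarbons of up to ten carbon atoms that is unproblematic.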
The Wiener index, detour index, and diameter values were obtained using our program DETOUR.10 The walk counts mwc_k and the total walk count twc were obtained using our program MORGAN.17 The path counts p_k (including cyclic paths (circles)) and the acyclic path counts ap_k (without circles) were obtained by our unpublished program PATHS; the ap_k were checked against and found identical to published values.6,44

In the comparisons with published structure-bp relationships we used the experimental bp values as contained in our data file, that is, in Table 1, not the values used in the original work, since we tried to obtain comparable s values and since we think our carefully checked bp values, as a rule, are more accurate. In several cases we employed, as a double-check, additionally the bp values as given in the respective reference, without significant changes in the obtained s values, except for refs 30 (where very uncommon bp values were used), 13c, and 14. Where the original authors treated cis/trans isomeric cycloalkanes as two independent cases (exhibiting identical TIs but different bp values), we (not distinguishing stereoisomers) included the same structure (with its averaged bp value) twice in order to obtain a comparable sample.

Zinn et al.6,7 use a convention for the designation of paths that differs from the one used by most other authors including us. They characterize a path by the number of nodes on it, while we use the number of edges on it, so that their p_k is our p_{k-1} for acyclic structures. Further, for cyclic structures they count acyclic paths only, so that their p_k corresponds to our ap_{k-1}. In ref 7 the descriptor and bp values given for spiro[4.5]decane actually correspond to bicyclo[5.3.0]decane, so that we included bc530d rather than s45d in our comparison sample. For one compound in ref 7, 1-butyl-1-methylcyclopentane, we were not able to find a bp value in the primary literature. This compound therefore is not contained in our data file; it was however included in the sample for comparison with Zinn’s results with the bp value given in ref 7.

REFERENCES AND NOTES

(1) (a) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Normal Boiling Points for Organic Compounds: Correlation and Prediction by a Quantitative Structure-Property Relationship. J. Chem. Inf. Comput. Sci. 1998, 38, 28-41. (b) Ivanciuc, O.; Ivanciuc, T.; Balaban, A. T. Quantitative Structure-Property Relationship Study of Normal Boiling Points for Halogen-/Oxygen-/Sulfur-Containing Organic Compounds Using the CODESSA Program. Tetrahedron 1998, 54, 9129-9142.
(2) Randić, M.; Mihalić, Z.; Nikolić, S.; Trinajstić, N. Graph-Theoretical Correlations - Artifacts or Facts? Croat. Chem. Acta 1993, 66, 411-434.
(3) (a) Diudea, M. V.; Gutman, I. Wiener-Type Topological Indices. Croat. Chem. Acta 1998, 71, 21-51. (b) Gutman, I.; Diudea, M. V. Correcting the Definition of Cluj Matrices. MATCH 1998, 37, 195-201. (c) Lukovits, I.; Linert, W. Polarity-Numbers of Cycle-Containing Structures. J. Chem. Inf. Comput. Sci. 1998, 38, 715-719.
(4) Hosoya, H.; Kawasaki, K.; Mizutani, K. Topological Index and Thermodynamic Properties. I. Empirical Rules on the Boiling Point of Saturated Hydrocarbons. Bull. Chem. Soc. Jpn. 1972, 45, 3415-3421.
(5) Lukovits, I.; Linert, W. A Novel Definition of the Hyper-Wiener Index for Cycles. J. Chem. Inf. Comput. Sci. 1994, 34, 899-902.
(6) Gautzsch, R.; Zinn, P. List Operations on Chemical Graphs. 5.
Implementation of Breadth-First Molecular Path Generation and Application in the Estimation of Retention Index Data and Boiling Points. J. Chem. Inf. Comput. Sci. 1994, 34, 791-800.
(7) Hermann, A.; Zinn, P. List Operations on Chemical Graphs. 6. Comparative Study of Combinatorial Topological Indexes of the Hosoya Type. J. Chem. Inf. Comput. Sci. 1995, 35, 551-560.
(8) Lukovits, I. The Detour Index. Croat. Chem. Acta 1996, 69, 873-882.
(9) Trinajstić, N.; Nikolić, S.; Lucić, B.; Amić, D.; Mihalić, Z. The Detour Matrix in Chemistry. J. Chem. Inf. Comput. Sci. 1997, 37, 631-638.
(10) Rücker, G.; Rücker, C. Symmetry-Aided Computation of the Detour Matrix and the Detour Index. J. Chem. Inf. Comput. Sci. 1998, 38, 710-714.
(11) Plavsić, D.; Soskić, M.; Daković, Z.; Gutman, I.; Graovac, A. Extension of the Z Matrix to Cycle-Containing and Edge-Weighted Molecular Graphs. J. Chem. Inf. Comput. Sci. 1997, 37, 529-534.
(12) Das, A.; Dömötör, G.; Gutman, I.; Joshi, S.; Karmarkar, S.; Khaddar, D.; Khaddar, T.; Khadikar, P. V.; Popović, L.; Sapre, N. S.; Sapre, N.; Shirhatti, A. A Comparative Study of the Wiener, Schultz and Szeged Indices of Cycloalkanes. J. Serb. Chem. Soc. 1997, 62, 235-239.
(13) (a) Diudea, M. V. Indices of Reciprocal Properties or Harary Indices. J. Chem. Inf. Comput. Sci. 1997, 37, 292-299. (b) Diudea, M. V. Cluj Matrix Invariants. J. Chem. Inf. Comput. Sci. 1997, 37, 300-305. (c) Diudea, M.; Katona, G.; Lukovits, I.; Trinajstić, N. Detour and Cluj-Detour Indices. Croat. Chem. Acta 1998, 71, 459-471.
(14) Estrada, E. Spectral Moments of the Edge Adjacency Matrix in Molecular Graphs. 3. Molecules Containing Cycles. J. Chem. Inf. Comput. Sci. 1998, 38, 23-27.
(15) For example: propylcyclobutane, ref 13b used 110, ref 14 100.6 °C; isopropylcyclohexane, ref 13b used 146, ref 7 154.6, refs 8 and 9 171.3 °C; 1,3-dimethylcyclohexane, ref 13a used 120, ref 13b 124.5 °C.
(16) Rossini, F. D.; Pitzer, K. S.; Arnett, R. L.; Braun, R. M.; Pimentel, G. C. Selected Values of Physical and Thermodynamic Properties of Hydrocarbons and Related Compounds, Comprising the Tables of the American Petroleum Institute Research Project 44, Extant as of December 31, 1952; Carnegie Press: Pittsburgh, PA, 1953. We were not able to locate a more recent edition of this compilation (as far as to bp values). However, a more recent reference work, citing as source of data the findings of API Research Project 44, gives somewhat different bp values: Physical Constants of Hydrocarbons C1 to C10; prepared by ASTM Committee D-2 on Petroleum and Lubricants and API Research Project 44 on Hydrocarbons and Related Compounds: Philadelphia, PA, 1971. Many of the latter bp values were used and printed (citing as data source the API Research Project 44, 1947-1991) in the following: Gakh, A. A.; Gakh, E. G.; Sumpter, B. G.; Noid, D. W. Neural Network-Graph Theory Approach to the Prediction of the Physical Properties of Organic Compounds. J. Chem. Inf. Comput. Sci. 1994, 34, 832-839. In the absence of evidence for the latter values being better than the former, we decided to use the original values from Rossini (1953).
(17) Rücker, G.; Rücker, C. Counts of All Walks as Atomic and Molecular Descriptors. J. Chem. Inf. Comput. Sci. 1993, 33, 683-695.
(18) (a) Diudea, M. V.; Minailiuc, O. M.; Katona, G. Novel Connectivity Descriptors Based on Walk Degrees. Croat. Chem. Acta 1996, 69, 857-871. (b) Ivanciuc, O.; Diudea, M.; Khadikar, P. V. New Topological Matrices and Their Polynomials. Indian J. Chem., Sect. A 1998, 37A, 574-585.
(19) Wieland, T.; Kerber, A.; Laue, R. Principles of Generation of Constitutional and Configurational Isomers. J. Chem. Inf. Comput. Sci. 1996, 36, 413-419.
(20) Hendrickson, J. B.; Huang, P.; Toczko, A. G. Molecular Complexity: A Simplified Formula Adapted to Individual Atoms. J. Chem. Inf. Comput. Sci. 1987, 27, 63-67.
(21) Razinger, M.; Chretien, J. R.; Dubois, J. E. Structural Selectivity of Topological Indexes in Alkane Series. J. Chem. Inf. Comput. Sci. 1985, 25, 23-27.
(22) Randić, M.; Hansen, P. J.; Jurs, P. C. Search for Useful Graph Theoretical Invariants of Molecular Structure. J. Chem. Inf. Comput. Sci. 1988, 28, 60-68.
(23) (a) Randić, M. Comparative Regression Analysis: Regressions Based on a Single Descriptor. Croat. Chem. Acta 1993, 66, 289-312. (b) Bhattacharjee, S.; Dasgupta, P. Molecular Property Correlation in Alkanes With Geometric Volume. Comput. Chem. 1994, 18, 61-71. (c) Nikolić, S.; Trinajstić, N.; Mihalić, Z. The Wiener Index: Development and Applications. Croat. Chem. Acta 1995, 68, 105-129.
(24) (a) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20. (b) Kirby, E. C. Sensitivity of Topological Indices to Methyl Group Branching in Octanes and Azulenes, or What Does a Topological Index Index? J. Chem. Inf. Comput. Sci. 1994, 34, 1030-1035.
(25) Randić, M. Linear Combinations of Path Numbers as Molecular Descriptors. New J. Chem. 1997, 21, 945-951.
(26) Medeleanu, M.; Balaban, A. T. Real-Number Vertex Invariants and Schultz-Type Indices Based on Eigenvectors of Adjacency and Distance Matrixes. J. Chem. Inf. Comput. Sci. 1998, 38, 1038-1047.
(27) Plavsić, D.; Soskić, M.; Landeka, I.; Gutman, I.; Graovac, A. On the Relation between the Path Numbers 1Z, 2Z and the Hosoya Z Index. J. Chem. Inf. Comput. Sci. 1996, 36, 1118-1122.
(28) Randić, M.; Trinajstić, N. Isomeric Variations in Alkanes: Boiling Points of Nonanes. New J. Chem. 1994, 18, 179-189.
(29) Lucić, B.; Trinajstić, N. New Developments in QSPR/QSAR Modeling Based on Topological Indices. SAR QSAR Environ. Res. 1997, 7, 45-62.
(30) Bonchev, D. Novel Indices for the Topological Complexity of Molecules. SAR QSAR Environ. Res. 1997, 7, 23-43.
(31) Ren, B. A New Topological Index for QSPR of Alkanes. J. Chem. Inf. Comput. Sci. 1999, 39, 139-143.
(32) Toropov, A.; Toropova, A.; Ismailov, T.; Bonchev, D. 3D Weighting of Molecular Descriptors for QSPR/QSAR by the Method of Ideal Symmetry (MIS). 1. Application to Boiling Points of Alkanes. J. Mol. Struct. (Theochem) 1998, 424, 237-247.
(33) (a) Needham, D. E.; Wei, I.-C.; Seybold, P. G. Molecular Modeling of the Physical Properties of the Alkanes. J. Am. Chem. Soc. 1988, 110, 4186-4194. (b) Basak, S. C.; Niemi, G. J.; Veith, G. D. Predicting Properties of Molecules Using Graph Invariants. J. Math. Chem. 1991, 7, 243-272.
(34) Toropov, A. A.; Toropova, A. P.; Ismailov, T. T.; Voropaeva, N. L.; Ruban, I. N. Extended Molecular Connectivity: Prediction of Boiling Points of Alkanes. J. Struct. Chem. (Engl. Transl.) 1997, 38, 965-969.
(35) Basak, S. C.; Grunwald, G. D.; Niemi, G. J. In From Chemical Topology to Three-Dimensional Geometry; Balaban, A. T., Ed.; Plenum: New York, 1997; Chapter 4, p 78.
(36) (a) Estrada, E.; Guevara, N.; Gutman, I. Extension of Edge Connectivity Index. Relationship to Line Graph Indices and QSPR Applications. J. Chem. Inf. Comput. Sci. 1998, 38, 428-431. (b) Estrada, E. Generalized Spectral Moments of the Iterated Line Graphs Sequence. A Novel Approach to QSPR Studies. J. Chem. Inf. Comput. Sci. 1999, 39, 90-95.
(37) Estrada, E.; Ivanciuc, O.; Gutman, I.; Gutierrez, A.; Rodriguez, L. Extended Wiener Indices. A New Set of Descriptors for Quantitative Structure-Property Studies. New J. Chem. 1998, 819-822.
(38) Yang, Y.-Q.; Xu, L.; Hu, C.-Y. Extended Adjacency Matrix Indices and Their Applications. J. Chem. Inf. Comput. Sci. 1994, 34, 1140-1145.
(39) Liu, S.; Cao, C.; Li, Z. Approach to Estimation and Prediction for Normal Boiling Point (NBP) of Alkanes Based on a Novel Molecular Distance-Edge (MDE) Vector, λ. J. Chem. Inf. Comput. Sci. 1998, 38, 387-394.
(40) Ferris, S. W. Handbook of Hydrocarbons; Academic Press: New York, 1955.
(41) Varushchenko, R. M.; Pashchenko, L. L.; Druzhinina, A. I. Enthalpies of Vaporization of Some Mono- and Polycyclic Hydrocarbons. Russ. J. Phys. Chem. (Engl. Transl.) 1996, 70, 208-211.
(42) (a) Mizutani, K.; Kawasaki, K.; Hosoya, H. Tables of Non-Adjacent Numbers, Characteristic Polynomials and Topological Indices. I. Tree Graphs. Nat. Sci. Rep. Ochanomizu Univ. 1971, 22, 39-58. (b) Kawasaki, K.; Mizutani, K.; Hosoya, H. Tables of Non-Adjacent Numbers, Characteristic Polynomials and Topological Indices. II. Mono- and Bicyclic Graphs. Nat. Sci. Rep. Ochanomizu Univ. 1971, 22, 181-214.
(43) (a) Manoharan, M.; Balakrishnarajan, M. M.; Venuvanalingam, P.; Balasubramanian, K. Topological Resonance Energy Predictions of the Stability of Fullerene Clusters. Chem. Phys. Lett. 1994, 222, 95-100. (b) Balasubramanian, K. Exhaustive Generation and Analytical Expressions of Matching Polynomials of Fullerenes C20-C50. J. Chem. Inf. Comput. Sci. 1994, 34, 421-427.
(44) Randić, M.; Wilkins, C. L. Graph Theoretical Ordering of Structures as a Basis for Systematic Searches for Regularities in Molecular Data. J. Phys. Chem. 1979, 83, 1525-1540.
