
J. Chem. Inf. Comput. Sci. 1995, 35, 129-135

Molecular Topology. 15. 3D Distance Matrices and Related Topological Indices

Mircea V. Diudea,*,† Dragos Horvath,† and Ante Graovac‡

Babes-Bolyai University, 3400 Cluj, Romania, and Ruđer Bošković Institute Zagreb, 41001 Zagreb, Croatia

Received September 13, 1993

3D-Metric distances supplied by MMX calculations are used to build 3D and LM3D matrices1,2 in full analogy with the construction of the distance matrices D from topological distances. The 3D matrices serve as the basis for two types of topological indices, c (centricity) and x (centrocomplexity), and the interconversion of the new indices is studied. Similar c and x indices are derived from the LM3D matrices. It is shown that these indices are useful in QSPR/QSAR studies.

INTRODUCTION

Topological indices (TIs) are unique descriptors that characterize molecular graphs and are suitable for QSAR/QSPR purposes. Over the years their construction has evolved from integer local vertex invariants (LOVIs) to real-number LOVIs.1 Indeed, the matrices used at the LOVI-assignment stage were defined over the real numbers.3 Layer arrays1,2,4-6 were also considered in order to devise real-number LOVIs and TIs. A layer matrix, LM, collects the properties (topological or chemical) of the vertices u located on concentric shells (layers), G(u)_j, at distance j around the vertex i in the graph G. It can be defined1,2,6 as

$$ lm_{ij} = \sum_{u \in G(u)_j} m_u \qquad (1) $$

$$ LM(G) = \{\, lm_{ij};\ i \in [1, n];\ j \in [0, d] \,\} \qquad (2) $$

where m and M are labels for a given property and the corresponding matrix, respectively; n is the number of vertices in the graph; and d is the diameter of the graph (i.e., the largest topological distance in the graph). Particular interest has been devoted to LOVIs capable of "seeing" the whole graph environment of each vertex. This goal can be reached by deriving such local invariants from either quadratic3 or layer matrices.1,2,8 Thus, we proposed1,2,6,8 two classes of LOVIs which give c- and x-type ordering when applied to a layer matrix:

$$ c(LM)_i = \sum_{j=1}^{ecc_i} \cdots \qquad (3) $$

$$ x(LM)_i = \Big[ \sum_{j=0}^{ecc_i} lm_{ij}\, 10^{-zj} + l_i \Big]^{1/t_i} \qquad (4) $$

$$ l_i = f_i \left( \frac{lm_{i0}}{10} + \frac{lm_{i1}}{100} \right) \qquad (5) $$

The symbols in eqs 3-5 are defined as follows:
ecc_i is the eccentricity of vertex i (the maximal topological distance from vertex i to any other vertex in the graph).
d_sp is a specified topological distance, usually larger than the diameter of the graph (here d_sp = 10, unless otherwise specified).
z is the number of digits of the integer part of the maximum lm_ij value in the graph.
l_i is a local parameter for multiple bonds.
f_i is a multigraph factor, with c_iu the conventional bond order (1, 2, 3, and 1.5 for single, double, triple, and aromatic bonds, respectively).
t_i is a weighting factor accounting for heteroatoms (e.g., a Sanderson-type electronegativity9).

Here, the c symbol refers to the centricity of a vertex i (its location relative to the center of the graph). The x symbol refers to the centrocomplexity1,2,8 (its location relative to a vertex "of importance", i.e., a vertex with the highest degree/valence, electronegativity, etc., which we call the "center of complexity").

In this paper we present new c/x-type LOVIs and TIs derived from both quadratic and layer distance matrices, with 3D-metric distances supplied by an original MMX program of molecular geometry. A QSPR/QSAR study shows that the proposed indices have good correlating ability.

THREE-DIMENSIONAL DISTANCE MATRICES

We constructed the (three-dimensional) distance matrices, 3D, in full analogy with the construction of the distance matrices, D, from topological distances. The entries of 3D are the actual 3D distances between the vertices of a graph whose geometry has been optimized (here, by an MMX calculation). The 3D matrix for 2,3-dimethylpentane (23M2C5), G1, is presented in Figure 1.

† Babes-Bolyai University.

‡ Ruđer Bošković Institute Zagreb.
Abstract published in Advance ACS Abstracts, December 1, 1994.
0095-2338/95/1635-0129$09.00/0

       1        2        3        4        5        6        7
1   0.0000   1.5414   2.5709   3.9411   4.5163   2.5178   3.0305
2   1.5414   0.0000   1.5543   2.5634   3.0891   1.5388   2.5821
3   2.5709   1.5543   0.0000   1.5468   2.5852   2.5930   1.5395
4   3.9411   2.5634   1.5468   0.0000   1.5364   3.0398   2.5461
5   4.5163   3.0891   2.5852   1.5364   0.0000   3.6199   3.9326
6   2.5178   1.5388   2.5930   3.0398   3.6199   0.0000   3.2366
7   3.0305   2.5821   1.5395   2.5461   3.9326   3.2366   0.0000

Figure 1. 3D matrix for 2,3-dimethylpentane, G1 (optimized geometry).
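The Figure 1 matrix can be checked numerically with the short Python sketch below. It is our own illustration, not the authors' MMX-based software. From the printed 3D distances it recomputes three things:
- the distance sums 3D_u of eq 7;
- the LM3D layer matrix described in the next section, where entry (i, j) sums 3D_u over the vertices u at topological distance j from i;
- the unweighted global reversing indices, which we read as C(3D) = Σ_i (Σ_j 3D_ij³)^-1 and X(3D) = Σ_i Σ_j 3D_ij^-3.

The carbon skeleton (edge list) is our assumption, inferred from the ~1.54 Å entries. With these readings, the results should agree to about 10^-3 with the LM3D entries printed for G1 and with the first 23M2C5 geometry in Table 2 (C(3D)×10^2 = 5.86807, X(3D) = 4.49663).

```python
from collections import deque

# 3D distance matrix (angstrom) of 2,3-dimethylpentane, G1, as printed in Figure 1.
D3 = [
    [0.0000, 1.5414, 2.5709, 3.9411, 4.5163, 2.5178, 3.0305],
    [1.5414, 0.0000, 1.5543, 2.5634, 3.0891, 1.5388, 2.5821],
    [2.5709, 1.5543, 0.0000, 1.5468, 2.5852, 2.5930, 1.5395],
    [3.9411, 2.5634, 1.5468, 0.0000, 1.5364, 3.0398, 2.5461],
    [4.5163, 3.0891, 2.5852, 1.5364, 0.0000, 3.6199, 3.9326],
    [2.5178, 1.5388, 2.5930, 3.0398, 3.6199, 0.0000, 3.2366],
    [3.0305, 2.5821, 1.5395, 2.5461, 3.9326, 3.2366, 0.0000],
]
# Assumed skeleton (0-based): bonds 1-2, 2-3, 3-4, 4-5, 2-6, 3-7 in the paper's labels.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 6)]


def topological_distances(n, edges):
    """All-pairs topological distances by breadth-first search."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def lm3d(d3, topo):
    """Row sums 3D_u (eq 7) and the layer matrix summing 3D_u over each topological layer."""
    n = len(d3)
    sums = [sum(row) for row in d3]
    diameter = max(max(row) for row in topo)
    lm = [[0.0] * (diameter + 1) for _ in range(n)]
    for i in range(n):
        for u in range(n):
            lm[i][topo[i][u]] += sums[u]
    return sums, lm


def reversing_3d(d3):
    """Unweighted global indices: C = sum_i 1/sum_j d^3 and X = sum_i sum_j d^-3 (j != i)."""
    n = len(d3)
    c_total = sum(1.0 / sum(d3[i][j] ** 3 for j in range(n) if j != i) for i in range(n))
    x_total = sum(d3[i][j] ** -3 for i in range(n) for j in range(n) if j != i)
    return c_total, x_total


topo = topological_distances(len(D3), EDGES)
sums, lm = lm3d(D3, topo)
c3d, x3d = reversing_3d(D3)
print("3D_u:", [round(s, 3) for s in sums])
print("LM3D row of vertex 3:", [round(v, 3) for v in lm[2]])
print("C(3D)*10^2 =", round(100 * c3d, 4), " X(3D) =", round(x3d, 4))
```

Missing LM3D entries come out as 0.0, matching the convention that missing entries are zeros.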

© 1995 American Chemical Society

The matrices differ in how they collect information on molecular structure. The matrices D (or 2D) can be referred to as "through-bond" type, whereas the 3D matrices are of "through-space" type.10 Following eqs 1 and 2 and using the 3D distances, we constructed a new layer matrix, LM3D, with the following specifications:

$$ lm_{ij} = \sum_{u \in G(u)_j} 3D_u $$

$$ 3D_u = \sum_{v \in G} 3D_{uv} \qquad (7) $$

where 3D_u is the 3D distance sum from vertex u to all the other vertices in the graph. Thus, the matrix LM3D is the 3D counterpart of the LMD matrix introduced earlier1,2 (old label R). Its dimensions are n × d. In principle, the LM3D matrix should differ for different conformations, i.e., it could be used to characterize molecular geometry. However, we have no rigorous proof that LM3D is an injective function of molecular geometry. The LM3D matrix is exemplified for 2,3-dimethylpentane, G1 (missing entries are zeros):

i     j = 0     j = 1     j = 2     j = 3     j = 4
1     18.118    12.869    28.935    32.041    19.279
2     12.869    47.053    32.041    19.279
3     12.390    44.910    53.943
4     15.174    31.669    29.736    34.664
5     19.279    15.174    12.390    29.736    34.664
6     16.546    12.869    30.507    32.041    19.279
7     16.867    12.390    28.043    53.943

REVERSING C/X INDICES ON D AND 3D MATRICES

Consider a separation measure between vertex i and every other vertex j in the graph that grows fast enough with distance (e.g., the cubic power of the distances). The overall sum of the inverse values of such a measure then gives good information on the location of i in the graph. A precedent exists in the work of Hall and Kier.3 To explore such topological information, we defined a series of c and x LOVIs, l(m,p)_i, on the matrices D and 3D as follows:

$$ l(m,p)_i = \Big\{ \sum_{j \neq i} \big[ m_{ij}^{3}\, (p_i\, p_j)^{c} \big]^{b} \Big\}^{a} \qquad (8) $$

where l labels the c and x LOVIs, and m denotes the separation property and the corresponding matrix (i.e., D or 3D). The vertices can be either uncharacterized (p_u = 1 for each vertex u) or characterized. They are characterized either by their degrees, k (p_u = k_u), or by Sanderson-adjusted electronegativities,9 s (p_u = s_u). The values of the exponents a, b, and c and the symbols of the corresponding LOVIs are listed in Table 1. Summation of l(m,p)_i over all vertices i in G gives the corresponding C and X global indices. Interchanging the signs of a and b converts a c LOVI (and TI) into the corresponding x one, and vice versa; for this reason they are called reversing indices. Examples are given in the section Intramolecular Ordering.

Table 1. Reversing c/x LOVIs: Labels and Exponent Values (cf. eq 8)
LOVI        a     b     c      p
c(D)       -1    +1     0      1
x(D)       +1    -1     0      1
c(3D)      -1    +1     0      1
x(3D)      +1    -1     0      1
c(3D, k)   -1    +1   -1/2     k
x(3D, k)   +1    -1   -1/2     k
c(3D, s)   -1    +1   -1/2     s
x(3D, s)   +1    -1   -1/2     s

INDICES ON LAYER MATRIX LM3D

The centricity index c(LM3D)_i was defined in agreement with eq 3. For the x LOVI, eq 4 was modified as follows:

$$ x(LM3D)_i = \Big[ \sum_{j=0}^{ecc_i} \frac{lm_{ij}\, 10^{-zj}}{k_i} + l_i \Big]^{1/t_i} \qquad (9) $$

where the lm_ij are entries of LM3D, built from the sums 3D_u of eq 7. Summation over all vertices i in G provides the corresponding global indices, denoted C(LM3D) and X(LM3D), respectively.

INTRAMOLECULAR ORDERING

The c or x character of the local invariants obtained with the reversing operators changes when the signs of the a and b exponents in eq 8 are interchanged. This is exemplified on 2,2-dimethylnonane (22M2C9), G2 (Figure 2).

G2 (22M2C9)

c LOVIs (×10^2), listed as vertex, value
rank  c(D)_i        c(3D)_i       c(3D,k)_i     c(3D,s)_i     c(LM3D)_i
1     4   0.31746   5   0.16345   4   0.25700   5   0.23785   5   6.11819
2     5   0.30488   4   0.16058   5   0.25667   4   0.23275   4   4.59918
3     3   0.21459   6   0.11117   6   0.17659   6   0.16125   6   4.09266
4     6   0.19569   3   0.10222   3   0.16914   3   0.14702   3   3.17255
5     2   0.12706   7   0.06230   2   0.14062   7   0.08997   7   2.67781
6     7   0.11338   2   0.05874   7   0.10067   2   0.07907   2   2.12072
7     1   0.07622   10  0.04981   8   0.06119   10  0.07710   8   1.70948
8     10  0.07622   11  0.04981   10  0.05992   11  0.07710   1   1.39544
9     11  0.07622   8   0.03729   11  0.05992   1   0.05716   10  1.39508
10    8   0.06798   1   0.03700   1   0.04495   8   0.05366   11  1.39508
11    9   0.04310   9   0.02302   9   0.02698   9   0.03579   9   1.08531

x LOVIs, listed as vertex, value
rank  x(D)_i        x(3D)_i       x(3D,k)_i     x(3D,s)_i     x(LM3D)_i
1     2   4.19321   2   1.16857   2   2.62915   2   1.60840   2   0.11833
2     3   2.56529   3   0.81636   3   1.74616   3   1.12819   4   0.08139
3     4   2.42177   4   0.77307   4   1.54225   4   1.07417   5   0.07818
4     5   2.38657   5   0.73848   5   1.46869   5   1.02403   3   0.07592
5     6   2.36370   6   0.72652   6   1.44006   6   1.00770   6   0.07143
6     7   2.32455   7   0.70717   7   1.37688   7   0.98547   7   0.06120
7     8   2.19904   8   0.64740   8   1.13150   8   0.92739   8   0.05136
8     1   1.44516   10  0.51401   10  0.83364   10  0.76152   10  0.03479
9     10  1.44516   11  0.51401   11  0.83364   11  0.76152   11  0.03479
10    11  1.44516   1   0.49294   1   0.80216   1   0.73028   1   0.03168
11    9   1.19907   9   0.37233   9   0.52595   9   0.55792   9   0.02492

Figure 2. c and x ordering of vertices in 2,2-dimethylnonane, G2 (optimized geometry).

Table 2. Values of Global TIs for 17 Geometrical Isomers of Heptane(a)

C indices
graph      C(LK)     C(D)×10^2   C(3D)×10^2   C(3D,K)×10^2   C(3D,S)×10^2   C(LM3D)
C7         1.32495    4.56222     2.11553       3.24221        3.10206      0.58938
C7         1.32495    4.56222     2.42672       3.76273        3.54885      0.59800
C7         1.32495    4.56222     2.45578       3.76062        3.60175      0.59975
2MC6       1.47244    6.00622     3.54742       5.40582        5.22565      0.74061
2MC6       1.47244    6.00622     3.56967       5.37848        5.27559      0.74258
2MC6       1.47244    6.00622     3.04090       4.57529        4.49299      0.73001
3MC6       1.54056    7.24297     3.91708       6.00291        5.77210      0.78580
3MC6       1.54056    7.24297     3.66401       5.59445        5.40426      0.78246
24M2C5     1.73782    8.34368     4.58103       6.82365        6.81571      0.99623
22M2C5     1.78726    9.62898     4.89901       7.28299        7.32021      1.02187
3EC5       1.80268    9.44105     5.19622       8.05560        7.63888      1.03608
3EC5       1.80268    9.44105     5.25442       8.14142        7.72468      1.03730
23M2C5     1.83800   10.29196     5.86807       9.00538        8.68786      1.06819
23M2C5     1.83800   10.29196     5.47451       8.29493        8.12380      1.06293
33M2C5     1.93840   12.60393     6.69405      10.30236        9.94592      1.14982
33M2C5     1.93840   12.60393     6.36465       9.78058        9.46510      1.14236
223M3C4    2.14856   14.54199     7.12919      10.77349       10.68455      1.34806

X indices
graph      X(LK)      X(D)       X(3D)     X(3D,K)   X(3D,S)   X(LM3D)
C7         14.39506   13.68131   4.10895   7.35749   5.85054   0.85993
C7         14.39506   13.68131   4.14010   7.39726   5.89811   0.88553
C7         14.39506   13.68131   4.14459   7.42188   5.90124   0.89694
2MC6       14.61504   13.92205   4.28829   7.78408   6.12085   0.97793
2MC6       14.61504   13.92205   4.30982   7.84379   6.14824   0.99767
2MC6       14.61504   13.92205   4.25306   7.73902   6.06682   0.94935
3MC6       14.63682   13.98012   4.30147   7.92740   6.12818   1.01281
3MC6       14.63682   13.98012   4.29778   7.92905   6.12245   1.00820
3EC5       14.65860   14.03819   4.38751   8.16986   6.24714   1.08685
3EC5       14.65860   14.03819   4.39871   8.18192   6.26502   1.08945
24M2C5     14.83680   14.17130   4.41880   8.16495   6.31601   1.05926
23M2C5     14.87640   14.25694   4.49663   8.43149   6.42484   1.12948
23M2C5     14.87640   14.25694   4.47666   8.39795   6.39518   1.11269
22M2C5     15.05460   14.39005   4.49844   8.39053   6.45383   1.09245
33M2C5     15.09420   14.47569   4.58674   8.68318   6.57482   1.17568
33M2C5     15.09420   14.47569   4.56549   8.64335   6.54519   1.16101
223M3C4    15.31200   14.69444   4.65702   8.84922   6.70325   1.20190

(a) C_n stands for the longest path in the graph; M and E are labels for the methyl and ethyl groups, respectively, with the corresponding multiplicity and location, e.g., 22M2C5 = 2,2-dimethylpentane.

Among the c-type LOVIs, c(LM3D)_i gives the best centric ordering of the vertices in 22M2C9. Its ordering alternates around the center of the longest chain in the graph (Bonchev's first centric criterion: minimum eccentricity11). The c(D)-induced ordering is improved by c(3D), which uses 3D-metric distances. Despite the different orderings, the c trend is obvious within this set of LOVIs.

The x character denotes a location relative to a crucial vertex or subgraph (the one with the highest LOVI). Most frequently, the LOVI is a function of vertex degree. The x character of the LOVI values therefore increases with increasing order of L_n (iterated line derivatives); these values are supplied as a spectrum by an operator within the MOLORD algorithm (see ref 12). If the vertex ordering does not change along the spectrum of LOVI values, the LOVI has a pure x character. The best x-type LOVI tested in ref 1 was x(LK) (denoted there as BX). For 22M2C9, this LOVI gives the following vertex ordering: 2, 3, 4, 5, 6, 7, 8, 1, 10, 11, 9. 3D LOVIs cannot be computed for L_n, except for the initial graph (n = 0), whose metric distances come from MMX calculations. We have therefore taken the ordering induced by the x(LK) LOVI as the standard x ordering. Figure 2 shows the good x ordering given by the operators based on the D and 3D matrices. The x(LM3D) index induced a quite different ordering, owing to the particular construction of the LM3D matrices (Balaban's J index gives the same ordering). However, the most important vertex, 2, was correctly located.
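The x(LK) ordering quoted above can be reproduced with a few lines of Python. This sketch applies eq 4 to the layer matrix of vertex degrees, LK. It assumes l_i = 0 and t_i = 1 for these saturated hydrocarbons; the text states t_i = 1 for X(LK), while l_i = 0 is our assumption for graphs without multiple bonds. Under these assumptions the same function should also give the n-heptane value of X(LK) in Table 2 (14.39506). Vertex numbering follows Figure 2, shifted to 0-based indices in the code.

```python
from collections import deque


def topological_distances(n, edges):
    """All-pairs topological distances by breadth-first search."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def x_lk(n, edges):
    """Local x(LK)_i values: eq 4 on the degree layer matrix, assuming l_i = 0 and t_i = 1."""
    dist = topological_distances(n, edges)
    degree = [row.count(1) for row in dist]          # vertex degrees k_u
    diameter = max(max(row) for row in dist)
    lk = [[0] * (diameter + 1) for _ in range(n)]    # layer matrix LK (eqs 1 and 2)
    for i in range(n):
        for u in range(n):
            lk[i][dist[i][u]] += degree[u]
    z = len(str(max(max(row) for row in lk)))        # digits of the largest lk_ij
    return [sum(v * 10.0 ** (-z * j) for j, v in enumerate(row)) for row in lk]


# 2,2-dimethylnonane, G2: chain 1-9 plus methyls 10 and 11 on vertex 2 (0-based here).
G2_EDGES = [(k, k + 1) for k in range(8)] + [(1, 9), (1, 10)]
x_g2 = x_lk(11, G2_EDGES)
order = sorted(range(1, 12), key=lambda v: -x_g2[v - 1])  # stable sort keeps label order on ties
print("x(LK) ordering of G2:", order)

# n-heptane: the global X(LK) is the sum of the local values.
heptane = x_lk(7, [(k, k + 1) for k in range(6)])
print("X(LK), n-heptane:", round(sum(heptane), 5))
```

The factor 10^(-zj) gives each layer its own block of decimal digits. Inner layers therefore dominate the ordering, and outer layers only break ties.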

INTERMOLECULAR ORDERING

The global TIs supplied by the C and X indices for the set of 17 geometrical heptane isomers are listed in Table 2. The proposed indices order the heptane isomers differently (Tables 2 and 3), depending on their c or x character. For comparison, we introduced the C(LK) and X(LK) indices1 as representatives of c and x character, respectively.

In the C set of indices, C(D) induces only one inversion (3EC5 before 22M2C5) relative to C(LK) and C(LM3D) (Table 3). Treating the path sequences13 of the heptane isomers according to the 1P-3P criteria of graph centricity11 gave the same ordering as C(D) (Tables 3 and 4). The best x ordering of the heptane isomers is given by the X(LK), X(D), X(3D), and X(3D,S) indices. This ordering is identical to the lexicographic ordering of the path sequences (Tables 4 and 5). Notice that the X(LK) index was computed with the weighting factor t_i = 1. The X(3D,K)- and X(LM3D)-induced orderings show two inversions (24M2C5, 3EC5 and 22M2C5, 23M2C5) relative to the X(LK) ordering. For comparison, the J and DJ indices1 induced only one inversion (24M2C5, 3EC5), and Randić's ID number14 also one (3EC5, 3MC6). None was observed for the DM1 and DM2 indices.15
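The C(D) and X(D) entries of Table 2 and the c(D)/x(D) values of Figure 2 can be spot-checked from topological distances alone. We read eq 8 and Table 1 for p = 1 as c(D)_i = (Σ_j d_ij³)^-1 and x(D)_i = Σ_j d_ij^-3, with j ≠ i; this reading is our interpretation. The Python sketch below applies it to three heptanes and to G2. The hydrogen-depleted skeletons and their vertex numbering are our own. If the reading is right, the output should match Table 2 to the printed precision and place 3EC5 before 22M2C5 in the C(D) ordering.

```python
from collections import deque


def topological_distances(n, edges):
    """All-pairs topological distances by breadth-first search."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def reversing_local(n, edges):
    """Unweighted reversing LOVIs on D: c(D)_i = 1/sum_j d_ij^3, x(D)_i = sum_j d_ij^-3."""
    dist = topological_distances(n, edges)
    c_local = [1.0 / sum(d ** 3 for d in row if d > 0) for row in dist]
    x_local = [sum(d ** -3 for d in row if d > 0) for row in dist]
    return c_local, x_local


HEPTANES = {
    "C7": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
    "3EC5": [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)],
    "22M2C5": [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (1, 6)],
}
results = {}
for name, edges in HEPTANES.items():
    c_loc, x_loc = reversing_local(7, edges)
    results[name] = (100 * sum(c_loc), sum(x_loc))   # C(D)*10^2 and X(D)
    print(f"{name:7s} C(D)*10^2 = {results[name][0]:.5f}  X(D) = {results[name][1]:.5f}")

# Local values for 2,2-dimethylnonane, G2 (vertex v of the paper is index v - 1).
G2_EDGES = [(k, k + 1) for k in range(8)] + [(1, 9), (1, 10)]
c_g2, x_g2 = reversing_local(11, G2_EDGES)
print("c(D)_4 * 10^2 =", round(100 * c_g2[3], 5), " x(D)_2 =", round(x_g2[1], 5))
```

The reversal shows up clearly in the code. The c index inverts a sum dominated by the farthest vertices (cubed distances), so it rewards central vertices. The x index sums inverse cubes, which are dominated by the nearest neighbors, so it rewards highly branched vertices.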

Table 3. C and X Ordering in Heptane Isomers

C ordering
rank  C(LK)     C(D)      C(3D)     C(3D,K)   C(3D,S)   C(LM3D)   path seq
1     C7        C7        C7        C7        C7        C7        C7
2     2MC6      2MC6      2MC6      2MC6      2MC6      2MC6      2MC6
3     3MC6      3MC6      3MC6      3MC6      3MC6      3MC6      3MC6
4     24M2C5    24M2C5    24M2C5    24M2C5    24M2C5    24M2C5    24M2C5
5     22M2C5    3EC5      22M2C5    22M2C5    22M2C5    22M2C5    3EC5
6     3EC5      22M2C5    3EC5      3EC5      3EC5      3EC5      22M2C5
7     23M2C5    23M2C5    23M2C5    23M2C5    23M2C5    23M2C5    23M2C5
8     33M2C5    33M2C5    33M2C5    33M2C5    33M2C5    33M2C5    33M2C5
9     223M3C4   223M3C4   223M3C4   223M3C4   223M3C4   223M3C4   223M3C4

X ordering
rank  X(LK)     X(D)      X(3D)     X(3D,K)   X(3D,S)   X(LM3D)   path seq
1     C7        C7        C7        C7        C7        C7        C7
2     2MC6      2MC6      2MC6      2MC6      2MC6      2MC6      2MC6
3     3MC6      3MC6      3MC6      3MC6      3MC6      3MC6      3MC6
4     3EC5      3EC5      3EC5      24M2C5    3EC5      24M2C5    3EC5
5     24M2C5    24M2C5    24M2C5    3EC5      24M2C5    3EC5      24M2C5
6     23M2C5    23M2C5    23M2C5    22M2C5    23M2C5    22M2C5    23M2C5
7     22M2C5    22M2C5    22M2C5    23M2C5    22M2C5    23M2C5    22M2C5
8     33M2C5    33M2C5    33M2C5    33M2C5    33M2C5    33M2C5    33M2C5
9     223M3C4   223M3C4   223M3C4   223M3C4   223M3C4   223M3C4   223M3C4

Table 4. X and C Ordering in Heptane Isomers According to Path Sequences13

X ordering
graph      path sequence (p1-p6)   X(LK)      X(D)       DM1
C7         6 5 4 3 2 1             14.39506   13.68131   13.42462
2MC6       6 6 4 3 2 0             14.61504   13.92205   14.76562
3MC6       6 6 5 3 1 0             14.63682   13.98012   15.08212
3EC5       6 6 6 3 0 0             14.65860   14.03819   15.36658
24M2C5     6 7 4 4 0 0             14.83680   14.17130   16.36313
23M2C5     6 7 6 2 0 0             14.87640   14.25694   16.94921
22M2C5     6 8 4 3 0 0             15.05460   14.39005   17.94975
33M2C5     6 8 6 1 0 0             15.09420   14.47569   18.48528
223M3C4    6 9 6 0 0 0             15.31200   14.69444   20.54701

C ordering
graph      C(D)×10^2
C7          4.56222
2MC6        6.00622
3MC6        7.24297
24M2C5      8.34368
3EC5        9.44105
22M2C5      9.62898
23M2C5     10.29196
33M2C5     12.60393
223M3C4    14.54199
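The path sequences in Table 4, and the lexicographic x ordering built on them, can be regenerated mechanically. In an acyclic graph exactly one path joins each pair of vertices. The path sequence (p1, p2, ...) is therefore just the number of vertex pairs at each topological distance. The Python sketch below counts these pairs for the nine heptane skeletons. The vertex numbering of the skeletons is our own. Sorting the tuples in ascending lexicographic order should reproduce the X-ordering column of Table 4.

```python
from collections import deque


def topological_distances(n, edges):
    """All-pairs topological distances by breadth-first search."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def path_sequence(n, edges, max_len):
    """Numbers of paths of length 1..max_len in a tree (= vertex pairs at each distance)."""
    dist = topological_distances(n, edges)
    counts = [0] * max_len
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= max_len:
                counts[dist[i][j] - 1] += 1
    return tuple(counts)


# Hydrogen-depleted skeletons of the nine heptanes (our own vertex numbering).
HEPTANES = {
    "C7": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
    "2MC6": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6)],
    "3MC6": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)],
    "3EC5": [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)],
    "24M2C5": [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (3, 6)],
    "23M2C5": [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 6)],
    "22M2C5": [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (1, 6)],
    "33M2C5": [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (2, 6)],
    "223M3C4": [(0, 1), (1, 2), (2, 3), (1, 4), (1, 5), (2, 6)],
}

seqs = {name: path_sequence(7, edges, 6) for name, edges in HEPTANES.items()}
x_order = sorted(seqs, key=seqs.get)   # ascending lexicographic order of the tuples
for name in x_order:
    print(f"{name:8s}", *seqs[name])
```

For graphs with rings this shortcut no longer holds, because several paths can join the same pair of vertices. An explicit path enumeration would be needed there.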

Table 5. X and C Ordering in Octane Isomers According to Path Sequences13

X ordering
graph      path sequence (p1-p7)   X(LK)      X(D)       DM1
C8         7 6 5 4 3 2 1           16.83951   16.06772   15.61028
2MC7       7 7 5 4 3 2 0           17.05950   16.31189   17.02015
3MC7       7 7 6 4 3 1 0           17.08148   16.37670   17.56044
4MC7       7 7 6 5 2 1 0           17.08346   16.39195   17.56044
3EC6       7 7 7 5 2 0 0           17.10544   16.45677   17.91494
25M2C6     7 8 5 4 4 0 0           17.27968   16.55937   18.60840
24M2C6     7 8 6 5 2 0 0           17.30344   16.63269   19.20822
23M2C6     7 8 7 4 2 0 0           17.32324   16.67552   19.60890
34M2C6     7 8 8 4 1 0 0           17.34502   16.73359   20.00839
3E2MC5     7 8 8 5 0 0 0           17.34700   16.74884   20.10744
22M2C6     7 9 5 4 3 0 0           17.49946   16.79337   20.51561
33M2C6     7 9 7 4 1 0 0           17.54302   16.90952   21.42983
234M3C5    7 9 8 4 0 0 0           17.56480   16.96759   22.07279
3E3MC5     7 9 9 3 0 0 0           17.58460   17.01042   22.13693
224M3C5    7 10 5 6 0 0 0          17.72320   17.05787   22.80578
223M3C5    7 10 8 3 0 0 0          17.78260   17.18634   24.14856
233M3C5    7 10 9 2 0 0 0          17.80240   17.22917   24.49869
2233M4C4   7 12 9 0 0 0 0          18.23800   17.66667   29.75000

C ordering
graph      C(LM3D)
C8         0.44479
2MC7       0.56848
3MC7       0.60339
4MC7       0.63236
25M2C6     0.71165
22M2C6     0.73761
3EC6       0.75937
24M2C6     0.76325
23M2C6     0.76779
34M2C6     0.81452
33M2C6     0.82412
224M3C5    1.01891
3E2MC5     1.04532
234M3C5    1.05845
3E3MC5     1.11623
223M3C5    1.08177
233M3C5    1.13486
2233M4C4   1.39893

The set of octane isomers was tested similarly; the path-sequence ordering is given in Table 5. The best c index is C(LM3D), with one inversion relative to the path-sequence c ordering (223M3C5, 3E3MC5). It is followed by C(D), with two inversions (24M2C6, 3EC6 and 223M3C5, 3E3MC5), whereas the other 3D indices give very different orderings. This is probably due to the increasing flexibility of the longest carbon chain in octanes and to the geometry optimization procedure (see below). The x indices, particularly the 3D ones, again show a large variety of orderings. In contrast, the purely topological x indices X(LK) and X(D) give the same ordering as the DM1 superindex15 and as the lexicographic ordering of path sequences in octanes (see Table 5). By comparison, the X(3D,S) index shows two inversions (25M2C6, 3EC6 and 22M2C6, 3E2MC5).

Topological indices are frequently intercorrelated, and the intercorrelations generally change with the number and type of molecules studied. Table 6 gives the intercorrelation matrices for the set of 17 geometric isomers of heptane. Along with the c- and x-type indices, they contain two van der Waals (VDW) parameters, computed as described in the section Molecular Geometries. The intercorrelations are discussed further in the section QSPR/QSAR Studies.

Table 6. Intercorrelation Matrices for 17 Geometric Isomers of Heptane (Correlation Coefficient r)

(a) van der Waals parameters and C global indices
            VDW area   SPI       C(3D)     C(3D,K)   C(3D,S)   C(LM3D)
VDW area    1.00000
SPI         0.91192    1.00000
C(3D)       0.99178    0.92355   1.00000
C(3D,K)     0.99005    0.90885   0.99918   1.00000
C(3D,S)     0.99135    0.92906   0.99986   0.99849   1.00000
C(LM3D)     0.97707    0.93059   0.98196   0.97659   0.98311   1.00000

(b) van der Waals parameters and X global indices
            VDW area   SPI       X(3D)     X(3D,K)   X(3D,S)   X(LM3D)
VDW area    1.00000
SPI         0.91192    1.00000
X(3D)       0.97011    0.98186   1.00000
X(3D,K)     0.97508    0.97010   0.99629   1.00000
X(3D,S)     0.96087    0.98835   0.99882   0.99252   1.00000
X(LM3D)     0.98594    0.93806   0.98477   0.99176   0.97580   1.00000

PROPERTIES OF C AND X ORDERINGS

While the c ordering in a set of molecules is easily conceived, the x ordering requires some comment. The MOLORD algorithm works on the basis of L_n, so the intermolecular ordering can take into account subgraphs larger than one vertex. For example, the X(D) index in heptanes preserves the ordering C7, 2MC6, 3MC6, 3EC5, 24M2C5, 23M2C5, 22M2C5, 33M2C5, 223M3C4 for subgraphs of 0, 1, and 2 edges. For subgraphs of three edges, one inversion (24M2C5, 3EC5) appears. This is exactly the sequence of Bertz's index.16 The other indices can give more varied orderings, according to their vertex separation ability. This is not a problem since, according to Bertz,16 the number of edges in L_n (the Bertz index, B) within a set of molecules can vary along a string of L_n. From the above examples some conclusions emerge:
(1) The c ordering given by the C(LM3D) index is very close to that obtained by applying Bonchev's 1P-3P criteria to the graph path sequences. The question of the "true" ordering remains open.
(2) The x ordering implies that the vertex degree is the most important local invariant.
(3) As Bertz already pointed out, an ordering that follows the branching in graphs will always remain a matter of intuition.

QSPR/QSAR STUDIES

Two van der Waals parameters were tested in the set of 17 geometric isomers of heptane: the area and the surface potential index, SPI (see the section Molecular Geometries). The areas correlate well with the c indices: C(LK), 0.97689; C(D), 0.96638; C(3D), 0.99178; C(3D,K), 0.99005; C(3D,S), 0.99135; C(LM3D), 0.97707 (see Tables 3 and 6). The x indices correlate better with the SPI parameter: X(LK), 0.99407; X(D), 0.99498; X(3D), 0.98186; X(3D,S), 0.98835. This holds for indices with well-differentiated c and x characters (e.g., C(LK) vs X(LK), 0.91863). The effect becomes less pronounced as the c vs x correlation increases: C(D) vs X(D), 0.96041; C(3D,S) vs X(3D,S), 0.977218; C(3D) vs X(3D), 0.97621; C(LM3D) vs X(LM3D), 0.97669; C(3D,K) vs X(3D,K), 0.97698. This behavior can be partly explained by the fact that the c indices "feel" the remote points of the molecular graph better than the x indices do (i.e., the points located on the van der Waals envelope). Conversely, the x indices better express the identity of atoms, which is sometimes associated with their electronegativity. Notice that X(LK) and X(D) do not use electronegativities (t_i = 1). Some QSPR equations are given in Table 7. We further tested the correlating ability of the indices introduced here against selected physicochemical and biological properties of a set of 25 ethers17 (Tables 8 and 9).

Table 7. QSPR Equations in the Set of 17 Geometric Isomers of Heptane

van der Waals area
y = 235.43228 - 501.17591 C(3D)       r = 0.99178   s = 1.00021
y = 235.31912 - 327.03903 C(3D,K)     r = 0.99005   s = 1.10039
y = 235.11990 - 333.75613 C(3D,S)     r = 0.99135   s = 1.02614

SPI (surface potential index)
y = 0.00030 + 0.02023 X(3D)           r = 0.98186   s = 0.00063
y = 0.03147 + 0.00709 X(3D,K)         r = 0.97010   s = 0.00081
y = 0.00599 + 0.01325 X(3D,S)         r = 0.98835   s = 0.00051

The physicochemical parameters considered are the van der Waals area, SPI, and relative conformational energy; the biological property studied is the toxicity of ethers to mice. Table 10 presents some QSPR and QSAR equations for the set of ethers in Table 8. It shows the following dependences:
- The van der Waals areas correlate directly with the number of carbon atoms, NC, in the ethers, and this dependence is modulated by the topological indices.
- The SPI and the relative conformational energy are inversely correlated with NC, through the normalized topological indices TI/NC.
- The toxicity of ethers to mice shows a parabolic dependence on NC or the TIs. It also shows an excellent linear correlation with SPI combined with SUMX/NC, a combination of the normalized TIs.

These QSAR results are very close to those obtained by Mekenyan et al.17 with three variables: NC, NC², and the electropy index (r = 0.9784).

MOLECULAR GEOMETRIES

The molecular geometries used in the 3D index calculations were obtained from molecular mechanics calculations. We used an updated version, written in Turbo Pascal 6.0 for the PC environment, of our MM workstation described in ref 18. The empirical force field used was Allinger's MM2.19 To obtain the absolute minimum of potential energy, a simple procedure of mapping the conformational subspace was used. According to the multiplicity of each torsional barrier, the workstation assumes a set of optimal values for the torsional angles of the molecule. The user is free to modify both the multiplicity and the minimum positions of each torsional axis. The user may also ignore several degrees of freedom, either because they are judged to have little influence on the molecular potential or because the rotation around an axis leads to degenerate conformations. The next step consists of generating all possible conformations (the initial geometries). All possible combinations of the minimum values for the torsional angles are obtained by rotations around the corresponding axes, and a simple criterion is used to test the validity of each conformation. This conformational setup does not use any molecular energy evaluations and provides only rough initial geometries. For this reason, only structures showing a very large degree of overlap of the van der Waals spheres can safely be discarded. The degree of overlapping between the nonbonded atoms i and j is simply defined as

$$ ov_{ij} = \frac{R_{w,i} + R_{w,j}}{d_{ij}} \qquad (11) $$

The user is asked for the maximal overlapping degree to be admitted in the conformations, and all structures showing stronger interactions are ignored. A threshold of about 0.5 was found to work well. It gives a reasonable limit on the number of valid conformations while ensuring that at least one initial geometry leading to the absolute minimum is obtained. Since optimization can dramatically rearrange the molecular structure, the degree of overlapping in these initial conformations does not correlate with the energy of the stable conformations obtained by optimizing them. An overlapping index was used to characterize each valid conformation.

This index accounts not only for the maximal overlapping degree encountered in the molecule but also for the mean overlapping degree (n is the number of nonbonded pairs). It can therefore identify degenerate or chiral conformations (having the same IOVR value). Different conformations may share the same maximal overlapping degree, so the first term alone cannot decide whether they are actually identical.

The van der Waals surface was calculated using polar coordinates. Each elemental area on an atomic van der Waals sphere was tested to determine whether it belongs to the external surface or lies in an overlapping zone. In the first case, the area is added to the total van der Waals area. By trial and error, we found that dividing π by 30 for both angular coordinates is enough to make the method stable. Van der Waals radii were taken from ref 20. The fractional atomic charges of the studied molecules were obtained by Del Re21 calculations. In addition to the van der Waals area, an electrostatic index on the van der Waals area was calculated. On each elemental area dA of the van der Waals surface, the sum of the squared atomic contributions to the electrostatic potential was computed. Notice that this treatment adds up all the atomic contributions, irrespective of the sign of the atomic fractional charges. The square root of the sum of these squared values over the entire surface was divided by the van der Waals surface area. The result, named the surface potential index (SPI), was used as an index of the overall polarity of the van der Waals area:

$$ SPI = \Big[ \frac{1}{A} \sum \Phi(dA)\, dA \Big]^{1/2} \qquad (14) $$

where Φ(dA) is the sum of the squared atomic contributions to the electrostatic potential on the elemental area dA and A is the total van der Waals area.

Table 8. Toxicity to Mice (pC), Number of Carbon Atoms (NC), Conformational Energy, and van der Waals Parameters of Ethers (List Taken from Ref 17)
no.  ether                        pC     NC   conf. energy (kcal/mol)   VDW area (Å²)   SPI (euc/Å)
1    dimethyl ether               1.43   2     4.7018                   105.6058        0.2586
2    ethyl methyl ether           1.74   3     5.8029                   134.1241        0.2407
3    propyl methyl ether          2.45   4     6.4223                   160.2592        0.2264
4    isopropyl methyl ether       2.26   4     8.5567                   156.2889        0.2299
5    cyclopropyl methyl ether     2.75   4    18.8305                   143.5338        0.2343
6    butyl methyl ether           2.70   5     6.8500                   183.4762        0.2111
7    isobutyl methyl ether        2.79   5     7.9113                   180.5055        0.2174
8    sec-butyl methyl ether       2.79   5     9.2527                   181.3286        0.2190
9    tert-butyl methyl ether      2.79   5    11.2903                   175.0331        0.2192
10   pentyl methyl ether          2.88   6     7.7073                   211.2203        0.2018
11   ethyl ethyl ether            2.22   4     6.8643                   162.9201        0.2291
12   propyl ethyl ether           2.60   5     7.4748                   188.3811        0.2159
13   isopropyl ethyl ether        2.60   5     9.6103                   184.9085        0.2193
14   cyclopropyl ethyl ether      3.00   5    19.8335                   172.5336        0.2233
15   butyl ethyl ether            2.82   6     8.1156                   214.7855        0.2055
16   isobutyl ethyl ether         2.82   6     7.8514                   209.7583        0.2094
17   sec-butyl ethyl ether        2.85   6    10.2410                   209.3766        0.2076
18   tert-butyl ethyl ether       2.92   6    12.3225                   203.2090        0.2112
19   pentyl ethyl ether           3.00   7     8.7536                   240.2706        0.1962
20   neopentyl ethyl ether        3.15   7     8.1182                   227.6520        0.2032
21   vinyl ethyl ether            2.34   4    12.2599                   147.3691        0.2379
22   propyl propyl ether          2.79   6     8.0819                   214.2605        0.2071
23   isopropyl propyl ether       2.82   6    10.1934                   211.0182        0.2093
24   isopropyl isopropyl ether    2.82   6    12.2733                   205.7203        0.2109
25   divinyl ether                2.33   4    20.9689                   135.1905        0.2566

Table 9. 3D Topological Indices for the 25 Ethers in Table 8
G    X(3D,K)   X(3D,S)   X(3D)    X(3D,K)/NC   X(3D,S)/NC   X(3D)/NC
1     2.1185    2.4446   1.5426   1.0592       1.2223       0.7713
2     3.6004    3.4498   2.2715   1.2001       1.1499       0.7572
3     5.0505    4.4635   3.0020   1.2626       1.1159       0.7505
4     5.4570    4.6870   3.1468   1.3643       1.1717       0.7867
5     7.6192    5.1757   3.6472   1.9048       1.2939       0.9118
6     6.5355    5.5198   3.7604   1.3071       1.1040       0.7521
7     6.9053    5.7138   3.8880   1.3811       1.1428       0.7776
8     7.1365    5.7918   3.9448   1.4273       1.1584       0.7890
9     7.6069    6.1102   4.1323   1.5214       1.2220       0.8265
10    7.9554    6.4850   4.4612   1.3259       1.0808       0.7435
11    5.2003    4.5129   3.0431   1.3001       1.1282       0.7608
12    6.6436    5.5048   3.7608   1.3287       1.1010       0.7522
13    7.1011    5.7435   3.9188   1.4202       1.1487       0.7838
14    9.3066    6.2316   4.4216   1.8613       1.2463       0.8843
15    8.1062    6.5201   4.4938   1.3510       1.0867       0.7490
16    8.5393    6.7789   4.6642   1.4232       1.1298       0.7774
17    8.8285    6.8746   4.7365   1.4714       1.1458       0.7894
18    9.3296    7.1968   4.9289   1.5549       1.1995       0.8215
19    9.5731    7.5371   5.2286   1.3616       1.0767       0.7469
20   10.7574    8.2583   5.6928   1.5368       1.1798       0.8133
21    5.8788    5.1956   3.4942   1.4697       1.2989       0.8736
22    8.1284    6.5257   4.4989   1.3547       1.0876       0.7498
23    8.6044    6.7760   4.6653   1.4341       1.1293       0.7775
24    9.1264    7.0566   4.8516   1.5211       1.1761       0.8086
25    6.6080    5.8495   3.9349   1.6520       1.4624       0.9837

Table 10. QSPR and QSAR Equations for the 25 Ethers

van der Waals area
y = 46.5146 + 26.9513 NC                                    r = 0.97781   s = 6.96178
y = 44.1547 + 38.0474 NC - 7.3692 X(3D,K)                   r = 0.99267   s = 4.01784
y = 54.1935 + 45.7464 NC - 25.5681 X(3D)                    r = 0.99839   s = 1.88436
y = 56.2506 + 46.5791 NC + 1.8354 X(3D,K) - 30.4604 X(3D)   r = 0.99851   s = 1.81203

SPI (surface potential index)
y = 0.1867 + 0.0721 X(3D)/NC - 0.0101 NC                    r = 0.98524   s = 0.00276
y = 0.2482 - 0.0004 VDW area + 0.0393 X(3D,S)/NC            r = 0.97706   s = 0.00343

Relative conformational energy(a)
y = -34.5923 + 15.9551 X(3D,K)/NC + 18.5892 X(3D,S)/NC      r = 0.95724   s = 1.21944
y = -36.1417 + 11.4469 X(3D,K)/NC + 37.3173 X(3D)/NC        r = 0.96181   s = 1.15370
y = -36.1693 + 13.5831 SUMX/NC                              r = 0.96027   s = 1.17634

Toxicity (pC)
y = 0.1796 + 0.1289 X(3D,K) + 0.5552 NC - 0.0481 NC²             r = 0.96724   s = 0.10035
y = 1.9848 - 4.9526 SPI + 0.3539 X(3D,K) - 0.0149 [X(3D,K)]²      r = 0.97165   s = 0.09345
y = 5.4395 - 24.1071 SPI + 0.7328 SUMX/NC(b)                     r = 0.97699   s = 0.08432

(a) With SUMX/NC = X(3D,K)/NC + X(3D,S)/NC + X(3D)/NC. (b) With SUMX/NC as specified in footnote a.
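The first van der Waals area equation of Table 10 depends only on Table 8, so it can be re-derived by ordinary least squares. The Python sketch below is our own check, not the authors' regression code, and makes two assumptions:
- NC = 4 for vinyl ethyl ether (no. 21) and divinyl ether (no. 25). Both are C4 ethers, and the normalized indices of Table 9 use NC = 4 for them.
- The standard error s uses n - 2 degrees of freedom.

Under these assumptions our hand check reproduces the printed intercept, slope, and s to within rounding.

```python
import math

# Table 8: (NC, van der Waals area in A^2) for the 25 ethers, in table order.
# Assumption: NC = 4 for entries 21 (vinyl ethyl ether) and 25 (divinyl ether).
DATA = [
    (2, 105.6058), (3, 134.1241), (4, 160.2592), (4, 156.2889), (4, 143.5338),
    (5, 183.4762), (5, 180.5055), (5, 181.3286), (5, 175.0331), (6, 211.2203),
    (4, 162.9201), (5, 188.3811), (5, 184.9085), (5, 172.5336), (6, 214.7855),
    (6, 209.7583), (6, 209.3766), (6, 203.2090), (7, 240.2706), (7, 227.6520),
    (4, 147.3691), (6, 214.2605), (6, 211.0182), (6, 205.7203), (4, 135.1905),
]


def simple_ols(points):
    """Least-squares fit y = a + b*x; returns a, b and the standard error s (n - 2 dof)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return intercept, slope, math.sqrt(rss / (n - 2))


a, b, s = simple_ols(DATA)
print(f"VDW area = {a:.4f} + {b:.4f} NC   (s = {s:.5f})")
```

The multivariate rows of Table 10 would need a multiple-regression solver (e.g., normal equations or a QR factorization) together with the Table 9 indices. This one-variable case only illustrates the procedure.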

CONCLUSIONS

The matrices 3D and LM3D are introduced by replacing the topological distances in the matrices D and LMD with 3D-metric distances. The 3D-metric distances are obtained from MMX molecular geometry calculations. On the basis of these matrices, we generate a series of new local and global topological indices characterizing the centricity, c, and centrocomplexity, x, of graphs. The indices introduced are shown to be useful in QSAR/QSPR studies. They were tested on the heptane and octane isomers and on a set of ethers.

REFERENCES AND NOTES

(1) Balaban, A. T.; Diudea, M. V. Real number vertex invariants: regressive distance sums and related topological indices. J. Chem. Inf. Comput. Sci. 1993, 33, 421-428.
(2) Diudea, M. V. Layer matrices in molecular graphs. J. Chem. Inf. Comput. Sci. 1994, 34, 1064-1071.
(3) Hall, L. H.; Kier, L. B. Determination of topological equivalence in molecular graphs from the topological state. Quant. Struct.-Act. Relat. 1990, 9, 115-131.
(4) Skorobogatov, V. A.; Dobrynin, A. A. Metric analysis of graphs. MATCH 1988, 23, 105-151.
(5) Diudea, M. V.; Pârv, B. A new centric connectivity index (CCI). MATCH 1988, 23, 65-87.
(6) Diudea, M. V.; Minailiuc, O. M.; Balaban, A. T. Regressive vertex degrees (new graph invariants) and derived topological indices. J. Comput. Chem. 1991, 12, 527-535.
(7) Diudea, M. V.; Kacso, I. E.; Minailiuc, O. M. Y indices in homogeneous dendrimers. MATCH 1992, 28, 61-99.
(8) Diudea, M. V.; Horvath, D.; Kacso, I. E.; Minailiuc, O. M.; Pârv, B. Centricities in molecular graphs. The MOLCEN algorithm. J. Math. Chem. 1992, 11, 259-270.
(9) Diudea, M. V.; Silaghi-Dumitrescu, I. Valence group electronegativity as a vertex discriminator. Rev. Roum. Chim. 1989, 34, 1175-1182.
(10) Randić, M. Generalized molecular descriptors. J. Math. Chem. 1991, 7, 155-168.
(11) (a) Bonchev, D.; Balaban, A. T.; Randić, M. The graph center concept for polycyclic graphs. Int. J. Quantum Chem. 1981, 19, 61-82. (b) Bonchev, D.; Mekenyan, O.; Balaban, A. T. Iterative procedure for the generalized graph center in polycyclic graphs. J. Chem. Inf. Comput. Sci. 1989, 29, 91-97.
(12) Diudea, M. V.; Horvath, D.; Topan, M. MOLORD algorithm and real number subgraph invariants. Croat. Chem. Acta, in press.
(13) Randić, M.; Wilkins, C. L. Graph-theoretical ordering of structures as a basis for systematic searches for regularities in molecular data. J. Phys. Chem. 1979, 83, 1525-1540.
(14) Randić, M. On molecular identification numbers. J. Chem. Inf. Comput. Sci. 1984, 24, 164-175.
(15) Balaban, A. T.; Ciubotariu, D.; Ivanciuc, O. Design of topological indices. Part 2. Distance measure connectivity indices. MATCH 1990, 25, 41-70.
(16) Bertz, S. H. Branching in graphs and molecules. Discr. Appl. Math. 1988, 19, 65-83.
(17) Mekenyan, O.; Bonchev, D.; Sabljić, A.; Trinajstić, N. Applications of topological indices to QSAR. The use of the Balaban index and the electropy index for correlations with toxicity of ethers on mice. Acta Pharm. Jugosl. 1987, 37, 75-86.
(18) Horvath, D.; Silaghi-Dumitrescu, I. An interactive workstation for molecular mechanics modelling of chemical structures. Rev. Roum. Chim. 1992, 37, 1165-1174.
(19) Burkert, U.; Allinger, N. L. Molecular Mechanics; American Chemical Society: Washington, DC, 1982.
(20) Labanowski, J.; Motoc, I.; Dammkoehler, R. A. The physical meaning of topological indices. Comput. Chem. 1991, 15, 47-53.
(21) Del Re, G. A simple MO-LCAO method for the calculation of charge distributions in saturated organic molecules. J. Chem. Soc. 1958, 4031-4040.
