The walk ID number revisited - Journal of Chemical Information and

The walk ID number revisited. Wolfgang R. Mueller, Klaus Szymanski, Jan V. Knop, Zlatko Mihalic, and Nenad Trinajstic. J. Chem. Inf. Comput. Sci. , 19...

J. Chem. Inf. Comput. Sci. 1993, 33, 231-233

The Walk ID Number Revisited

Wolfgang R. Müller, Klaus Szymanski, and Jan V. Knop
The Computer Center, The Heinrich Heine University, D-4000 Düsseldorf, The Federal Republic of Germany

Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, HR-41001 Zagreb, The Republic of Croatia

Nenad Trinajstić
The Rugjer Bošković Institute, HR-41001 Zagreb, The Republic of Croatia

Received August 27, 1992

The self-returning walk ID number, SID, a modification of the walk ID number, WID, is introduced as a graph-theoretical descriptor with the highest known discriminative ability.

The concept of an identification (ID) number was introduced in this journal a few years ago by Randić¹ to characterize chemical structures by a single number. To increase the discriminative power of previous graph-theoretical indices, an ID number considers not only the elements of a graph (i.e., vertices and edges) but also defines weights for classes of substructures and adds them up. In the case of Randić's connectivity ID number (CID),¹ the prime ID number (PID),² and Balaban's ID number (BID),³ all paths in a graph are inspected. For a general graph this requires enumerating all of its paths, which takes a prohibitive amount of computing time for polycyclic structures. Only in the special case of trees can many paths be weighted in parallel without considering them separately, so a very efficient method exists for computing the CID, PID, and BID numbers of trees. This made possible a search for counterexamples, which showed that all three are highly discriminative molecular descriptors but that the first two are not unique. Thus, the CID number is unique only for alkane trees with up to 14 vertices,⁴ the PID number is unique only for alkane trees with up to 19 vertices,⁵ and the BID number is unique for alkane trees with up to at least 20 vertices.
To obtain an ID number that can be computed economically for all kinds of graphs, the walk ID number (WID) was proposed.⁶ The WID number is obtained by adding up the weights of all walks of a graph up to a certain length. Since the weight of a walk is defined as the product of the weights of its participating edges, a few matrix multiplications and additions suffice to obtain the sum of the weights of all these walks. Among the trees with up to 20 vertices, the WID number proved to be unique, like the BID number. After a long search, two pairs of polycyclic graphs with identical WID numbers were found.⁶ Close inspection of these counterexamples showed that summing over all walks may drop some vital information about the cyclic structure of a graph, which can be retained if only self-returning walks (walks that start and end at the same vertex) are considered. This leads to the definition of the self-returning walk ID number (SID) of a graph with N vertices as the sum, over all self-returning walks of length less than N, of the product of the weights of the edges making up the walk, where the weight of an edge is 1 divided by the square root of the product of the distance sums of the two vertices incident to the edge; that is, for an edge between vertices i and j with distance sums D(i) and D(j), $w_{ij} = [D(i)\,D(j)]^{-1/2}$.

[Figure 1: graph drawings not reproduced. SID values printed in the figure, in order: 10.01081518, 10.01043058, 10.01012185, 10.01168803, 10.01117323, [illegible], 10.01150291, 10.01007110, 10.01007478, 10.00872173, 10.00845735, 10.00887084.]
Figure 1. SID numbers for a class of cyclic graphs with 10 vertices and 10 edges.

A walk of length 0 is simply a vertex and has the empty product 1 as its weight, so the identity matrix I contains at position ij just the sum of the weights of all walks of length 0 from i to j (1 if i = j and 0 otherwise). A walk of length 1 is simply an edge, so the edge-weight matrix (containing at position ij the weight of the edge from i to j, if present, and 0 otherwise) contains at position ij just the sum of the weights of all walks of length 1. If two matrices contain at position ij the sums of the weights of all walks from vertex i to vertex j of lengths l and m, respectively, then their matrix product contains at position ij just the sum of the weights of all walks of length l + m from i to j. This is because the walks of length l + m from i to j are in one-to-one correspondence with the pairs consisting of a walk of length l from i to some vertex k and a walk of length m from k to j, and the weight of the composite walk is just the product of the weights of its two components. In symbols, $(W^{l+m})_{ij} = \sum_k (W^l)_{ik}\,(W^m)_{kj}$.
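The correspondence between matrix products and walk sums can be checked numerically. The sketch below is pure Python; the 4-vertex graph and its edge weights are hypothetical, chosen only for illustration (they are not SID weights), and the helper names are our own. It compares $(W^l W^m)_{ij}$ with an explicit enumeration of every vertex sequence of length l + m from i to j; the enumeration is exponential in the walk length and serves only as a check.

```python
from itertools import product
from math import isclose

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Schoolbook product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matpow(W, k):
    """W^k by repeated multiplication; W^0 is the identity (walks of length 0)."""
    P = identity(len(W))
    for _ in range(k):
        P = matmul(P, W)
    return P

def brute_force_walk_sum(W, i, j, length):
    """Sum, over every walk i = v0, v1, ..., v_length = j, of the product of its edge weights.
    Vertex sequences that use a non-edge get weight 0 and therefore add nothing."""
    if length == 0:
        return 1.0 if i == j else 0.0  # a single vertex, empty product 1
    total = 0.0
    for middle in product(range(len(W)), repeat=length - 1):
        seq = (i, *middle, j)
        weight = 1.0
        for a, b in zip(seq, seq[1:]):
            weight *= W[a][b]
        total += weight
    return total

# Hypothetical example: triangle 0-1-2 plus a pendant vertex 3 attached to vertex 0,
# with arbitrary illustrative edge weights.
W = [[0.0] * 4 for _ in range(4)]
for (a, b), w in {(0, 1): 0.5, (1, 2): 0.25, (0, 2): 0.2, (0, 3): 0.1}.items():
    W[a][b] = W[b][a] = w

# (W^l W^m)_ij must equal the explicit walk sum for length l + m.
for l in range(3):
    for m in range(3):
        prod_lm = matmul(matpow(W, l), matpow(W, m))
        for i in range(4):
            for j in range(4):
                assert isclose(prod_lm[i][j], brute_force_walk_sum(W, i, j, l + m), abs_tol=1e-12)
print("matrix products agree with explicit walk enumeration")
```

On the diagonal, the same check covers self-returning walks: for instance, $(W^3)_{00}$ picks up only the two traversals of the triangle.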

0095-2338/93/1633-0231$04.00/0 © 1993 American Chemical Society

Table I. Computation of the SID Number

(a) Graph G
[Drawing of G not reproduced; its vertices are numbered 1 to 10 as in the matrices below.]

(b) Adjacency Matrix of G
A =
0 1 1 1 1 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0
1 1 0 0 0 0 1 0 0 0
1 0 0 0 0 0 0 1 1 0
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 1 0 1
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 1 1 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0

(c) Distance Matrix and Distance Sums of G
D =                      D(i)
0 1 1 1 1 2 2 2 2 3      15
1 0 1 2 2 1 2 2 3 2      16
1 1 0 2 2 2 1 2 3 3      17
1 2 2 0 2 2 2 1 1 3      16
1 2 2 2 0 3 3 3 3 4      23
2 1 2 2 3 0 2 1 3 1      17
2 2 1 2 3 2 0 1 3 3      19
2 2 2 1 3 1 1 0 2 2      16
2 3 3 1 3 3 3 2 0 4      24
3 2 3 3 4 1 3 2 4 0      25

(d) Matrix of Edge Weights of G
W =
0.0    0.0645 0.0626 0.0645 0.0538 0.0    0.0    0.0    0.0    0.0
0.0645 0.0    0.0606 0.0    0.0    0.0606 0.0    0.0    0.0    0.0
0.0626 0.0606 0.0    0.0    0.0    0.0    0.0556 0.0    0.0    0.0
0.0645 0.0    0.0    0.0    0.0    0.0    0.0    0.0625 0.0510 0.0
0.0538 0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
0.0    0.0606 0.0    0.0    0.0    0.0    0.0    0.0606 0.0    0.0485
0.0    0.0    0.0556 0.0    0.0    0.0    0.0    0.0574 0.0    0.0
0.0    0.0    0.0    0.0625 0.0    0.0606 0.0574 0.0    0.0    0.0
0.0    0.0    0.0    0.0510 0.0    0.0    0.0    0.0    0.0    0.0
0.0    0.0    0.0    0.0    0.0    0.0485 0.0    0.0    0.0    0.0

(e) Matrix of Weighted Walks of G
W* =
1.01597 0.06993 0.06806 0.06599 0.05466 0.00453 0.00405 0.00463 0.00337 0.00022
0.06993 1.01225 0.06594 0.00479 0.00376 0.06175 0.00391 0.00427 0.00024 0.00299
0.06806 0.06594 1.01140 0.00464 0.00366 0.00424 0.05645 0.00379 0.00024 0.00021
0.06599 0.00479 0.00464 1.01086 0.00355 0.00416 0.00391 0.06366 0.05155 0.00020
0.05466 0.00376 0.00366 0.00355 1.00294 0.00024 0.00022 0.00025 0.00018 0.00001
0.00453 0.06175 0.00424 0.00416 0.00024 1.00985 0.00378 0.06167 0.00021 0.04898
0.00405 0.00391 0.05645 0.00391 0.00022 0.00378 1.00648 0.05825 0.00020 0.00018
0.00463 0.00427 0.00379 0.06366 0.00025 0.06167 0.05825 1.01106 0.00325 0.00299
0.00337 0.00024 0.00024 0.05155 0.00018 0.00021 0.00020 0.00325 1.00263 0.00001
0.00022 0.00299 0.00021 0.00020 0.00001 0.04898 0.00018 0.00299 0.00001 1.00238

(f) The SID Number
SID = 10.0858
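The entries of Table I can be recomputed from the definitions given in the text. The following is a minimal sketch in pure Python under these assumptions: the edge list is read off the adjacency matrix in part (b); distances are obtained by breadth-first search (our choice, since the text does not say how distances were computed); and W* is accumulated power by power rather than with Horner's rule. The names `distance_matrix` and `sid_number` are hypothetical helpers.

```python
from collections import deque
from math import sqrt

# Edges read off the adjacency matrix in Table I(b); vertices numbered 1..10.
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 6), (3, 7),
         (4, 8), (4, 9), (6, 8), (6, 10), (7, 8)]

def distance_matrix(n, edges):
    """Topological distances by breadth-first search from each vertex (assumes a connected graph)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u - 1].append(v - 1)
        adj[v - 1].append(u - 1)
    D = []
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        D.append(dist)
    return D

def sid_number(n, edges):
    """Hypothetical helper: returns distance sums, edge weights W, weighted walks W*, and SID."""
    D = distance_matrix(n, edges)
    dsum = [sum(row) for row in D]                         # Table I(c): D(i)
    W = [[0.0] * n for _ in range(n)]                      # Table I(d): w_ij = [D(i) D(j)]^(-1/2) on edges
    for u, v in edges:
        i, j = u - 1, v - 1
        W[i][j] = W[j][i] = 1.0 / sqrt(dsum[i] * dsum[j])
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # W^0 = I
    wstar = [row[:] for row in power]
    for _ in range(n - 1):                                 # Table I(e): add W^1, ..., W^(N-1)
        power = [[sum(power[i][k] * W[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        wstar = [[wstar[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    sid = sum(wstar[i][i] for i in range(n))               # Table I(f): trace of W*
    return dsum, W, wstar, sid

dsum, W, wstar, sid = sid_number(10, EDGES)
print(dsum)                          # [15, 16, 17, 16, 23, 17, 19, 16, 24, 25]
print(f"{W[0][1]:.4f} {sid:.4f}")    # compare with w_12 in Table I(d) and SID in Table I(f)
```

Because the printed table entries are rounded, the last printed digit of a recomputed value may differ slightly from the table.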

The SID number is computed by the following algorithm of order N⁴. Given a graph G with N vertices, we set up

(1) the adjacency matrix $A = (a_{ij})$;
(2) the distance matrix $D = (d_{ij})$;
(3) the distance sums $D(i) = \sum_j d_{ij}$;
(4) the edge-weight matrix $W = (w_{ij})$, where
$$w_{ij} = \begin{cases} [D(i)\,D(j)]^{-1/2} & \text{for } d_{ij} = 1 = a_{ij},\\ 0 & \text{otherwise;} \end{cases}$$
(5) the matrix powers $W^0 = I, W^1 = W, \ldots, W^{N-1}$, which, by the argument given above, contain at position ij just the sum of the weights of all walks from i to j of length 0, 1, ..., N - 1, respectively;
(6) the matrix of weighted walks (the sum of the matrix powers)
$$W^* = (w^*_{ij}) = \sum_{k=0}^{N-1} W^k,$$
which contains at position ij just the sum of the weights of all walks from i to j of length less than N and, thus, at position ii on the diagonal just the sum of the weights of all self-returning walks of vertex i of length less than N.

Thus, the self-returning walk ID (SID) number is obtained as the sum of the diagonal elements of the matrix of weighted walks:
$$\mathrm{SID} = \sum_i w^*_{ii}.$$

The order N⁴ stated above is achieved by combining steps 5 and 6 with an optimization technique known in numerical analysis as Horner's rule,¹⁴ i.e., we need only N - 2 matrix multiplications if we compute
$$\sum_{k=0}^{N-1} W^k = I + W\bigl(I + W\bigl(I + \cdots + W(I + W)\cdots\bigr)\bigr).$$
Since each multiplication of two N × N matrices takes of order N³ operations, N - 2 of them take of order N⁴ operations.
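Horner's rule for the matrix series can be sketched as follows. The function names are illustrative, and plain nested lists stand in for a matrix library. The 4-cycle used as a check is our own example, not one from the paper: every vertex has distance sum 1 + 1 + 2 = 4, so every edge weight is 1/4, and because the graph is bipartite only walks of even length return to their start, giving SID = 4(1 + 2/16) = 4.5.

```python
def matmul(X, Y):
    """Schoolbook product of two square matrices stored as nested lists (O(n^3))."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def weighted_walks_horner(W):
    """Return (W*, number_of_products), where W* = I + W + ... + W^(N-1)
    is evaluated as I + W(I + W(... W(I + W) ...))."""
    n = len(W)
    I = identity(n)
    if n == 1:
        return I, 0                      # only walks of length 0
    # innermost bracket I + W needs no multiplication
    acc = [[I[i][j] + W[i][j] for j in range(n)] for i in range(n)]
    products = 0
    for _ in range(n - 2):               # each pass raises the highest power by one
        prod = matmul(W, acc)
        products += 1
        acc = [[I[i][j] + prod[i][j] for j in range(n)] for i in range(n)]
    return acc, products

def weighted_walks_direct(W):
    """Cross-check: add the powers W^0, ..., W^(N-1) one at a time."""
    n = len(W)
    power, total = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, W)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Example: the 4-cycle 0-1-2-3-0, where every edge weight is 1/sqrt(4*4) = 0.25.
n = 4
W = [[0.0] * n for _ in range(n)]
for a in range(n):
    b = (a + 1) % n
    W[a][b] = W[b][a] = 0.25

w_star, products = weighted_walks_horner(W)
sid = sum(w_star[i][i] for i in range(n))
print(products, sid)  # 2 4.5
```

The nested form reaches the highest power N - 1 after N - 2 products, matching the count stated above; the direct summation is included only to confirm that both evaluations give the same W*.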