J. Chem. Inf. Comput. Sci. 1994, 34, 534-538
Mathematical Relation between Extended Connectivity and Eigenvector Coefficients

Christoph Rücker and Gerta Rücker*

Institut für Organische Chemie und Biochemie, Universität Freiburg, Albertstrasse 21, D-79104 Freiburg, FRG

Received March 25, 1993

Formulas are derived for the limit distribution of weights of vertices in a graph as obtained from extended connectivities. A fundamental difference between bipartite and nonbipartite graphs is seen: For the latter the eventual distribution coincides with the one resulting from the coefficients in the principal eigenvector. For the former, in contrast, the last eigenvector also has to be taken into account, and there is no single limit distribution. This is the reason why in some bipartite graphs the ranks derived from extended connectivities switch indefinitely for certain atoms.

INTRODUCTION

In 1965, H. L. Morgan at Chemical Abstracts Service (CAS) proposed a method for numerically ordering the vertices in a graph, now known as the Morgan algorithm, which consists of iteratively assigning to each vertex the sum of the previous values of its neighbors, starting from the degrees (connectivities).1 These values, later called extended connectivities (ECs), measure how centrally involved a vertex is in a graph.2 As soon as a further iteration fails to further split the atoms into classes, the algorithm stops. This simple method is still the basis of canonicalization in the CAS registration process and thus of most computerized literature searches for compounds performed today. Some problems associated with the Morgan algorithm (“nonuniform convergence, oscillatory behavior, instability, indeterminancy”) have been discussed,3 and improvements were proposed.4
From a more fundamental point of view, the Morgan algorithm was criticized as being “somewhat ad hoc”,5 and in 1975 Randić, aiming at a more objective process, proposed a seemingly independent numbering scheme for vertices, ranking them by their respective coefficients in the eigenvector associated with the largest eigenvalue of the adjacency matrix. This method, too, has its deficiencies, as was pointed out by Balaban. Interestingly, for many molecules these two different ranking procedures gave complete agreement for the numbering of atoms, and in some others only a few vertices were exchanged. These observations led Randić to suspect “some intimate relationship between the coefficients of the first eigenvector and extended connectivity values”. The relationship was “not transparent” at that time, and as far as we know the question is still not satisfactorily answered. Balasubramanian and Munk8 cite as generally valid a known procedure by which the principal eigenvector is obtained as the limit of a sequence of vectors which result from repeated action of the adjacency matrix on the vector of degrees.9 This procedure actually is valid for nonbipartite graphs only.

In view of the enormous importance of the Morgan algorithm we thought it worthwhile to reconsider this old problem. In the meantime it has become clear that the Morgan algorithm is firmly mathematics-based, in that the ECs are identical to the counts of walks of a certain length in a graph starting at a particular vertex, which are (less easily) obtainable as the sums over rows in the higher powers of the adjacency matrix.10,11 In the present paper we clarify the connection between ECs and eigenvector coefficients.

Abstract published in Advance ACS Abstracts, April 1, 1994.

RESULTS

The ECs of atom i form a sequence, one particular member of which is chosen to be used for assigning a rank to atom i by the arbitrary end criterion built into the Morgan algorithm. This fact suggests that one should look for something like a limit of the ECs which might be used for a more objective ranking. This will be done in the following.

Let $A$ be the adjacency matrix of a nondirected, connected simple graph of $n$ vertices, and $A^k$ its $k$th power ($k = 1, 2, \ldots$) containing elements $a_{ij}^{(k)}$ ($i, j = 1, \ldots, n$). $a_{ij}^{(k)}$ is the number of walks of length $k$ steps from vertex $i$ to vertex $j$. The atomic walk count of $i$, $\mathrm{awc}_k(i)$, is the number of walks of length $k$ beginning at vertex $i$.

$$\mathrm{awc}_k(i) = \sum_{j=1}^{n} a_{ij}^{(k)}$$
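The row-sum definition of the atomic walk count can be checked numerically on a small graph. The following is a minimal sketch, not taken from the paper: the helper names (`adjacency_matrix`, `mat_mul`, `mat_pow`, `awc_from_powers`) and the example graph (the carbon skeleton of 2-methylbutane, numbered arbitrarily) are assumptions made for illustration only.

```python
# Illustrative sketch: atomic walk counts as row sums of A^k.
# All names below are hypothetical helpers, not part of the original paper.

def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected simple graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a

def mat_mul(x, y):
    """Plain square-matrix product using integer arithmetic."""
    n = len(x)
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(a, k):
    """Return A^k for k >= 1 by repeated multiplication."""
    result = [row[:] for row in a]
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result

def awc_from_powers(a, k):
    """awc_k(i) is the sum over row i of A^k."""
    return [sum(row) for row in mat_pow(a, k)]

# Assumed example: 2-methylbutane skeleton, vertex 1 is the branching atom.
ISOPENTANE_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, ISOPENTANE_EDGES)

print(awc_from_powers(A, 1))  # [1, 3, 2, 1, 1], the vertex degrees
print(awc_from_powers(A, 3))  # [4, 10, 6, 4, 4]
```

For k = 1 the row sums are simply the vertex degrees, which is the starting point of the Morgan algorithm described in the Introduction.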
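As is easily seen,11 $\mathrm{awc}_k(i)$ can be calculated without matrix multiplication by the recursion formula

$$\mathrm{awc}_k(i) = \sum_{j=1}^{n} a_{ij}\,\mathrm{awc}_{k-1}(j) = \sum_{j\ \text{adjacent to}\ i} \mathrm{awc}_{k-1}(j)$$

where the first summation extends over all atoms $j$, the second over those atoms $j$ which are neighbors of $i$. This is the Morgan recursion; therefore $\mathrm{awc}_k(i)$ is equal to the $k$th EC of $i$.1,10,11 The molecular walk count $\mathrm{mwc}_k$ is the sum of all elements of matrix $A^k$.

$$\mathrm{mwc}_k = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{(k)} = \sum_{i=1}^{n} \mathrm{awc}_k(i)$$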
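The Morgan recursion and the molecular walk count can be sketched without any matrix multiplication, using neighbor lists only. This is an illustrative sketch under stated assumptions: the function names (`neighbor_lists`, `extended_connectivities`, `molecular_walk_counts`) and the example graph (the 2-methylbutane skeleton with an arbitrary numbering) are hypothetical and do not come from the paper.

```python
# Illustrative sketch of the Morgan recursion: each new value of a vertex is
# the sum of the previous values of its neighbors, starting from the degrees.
# All names are hypothetical helpers chosen for this example.

def neighbor_lists(n, edges):
    """Return, for each vertex, the list of its neighbors."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return neighbors

def extended_connectivities(neighbors, k_max):
    """Return [awc_1, awc_2, ..., awc_k_max], each a list indexed by vertex."""
    current = [len(nb) for nb in neighbors]      # awc_1 = degrees
    history = [current]
    for _ in range(k_max - 1):
        current = [sum(current[j] for j in nb) for nb in neighbors]
        history.append(current)
    return history

def molecular_walk_counts(history):
    """mwc_k is the sum of awc_k(i) over all vertices i."""
    return [sum(values) for values in history]

# Assumed example: 2-methylbutane skeleton, vertex 1 is the branching atom.
NEIGHBORS = neighbor_lists(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
HISTORY = extended_connectivities(NEIGHBORS, 3)

print(HISTORY)                         # [[1, 3, 2, 1, 1], [3, 4, 4, 2, 3], [4, 10, 6, 4, 4]]
print(molecular_walk_counts(HISTORY))  # [8, 16, 28]
```

In this example the second iteration does not distinguish vertices 1 and 2 (both have the value 4), while the third one does, which illustrates why the rank assigned to an atom depends on where the sequence is cut off.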
While the $\mathrm{awc}_k$ are atomic descriptors, measuring something like “involvedness” or centrality of atom $i$, or complexity of its environment, the $\mathrm{mwc}_k$ are molecular descriptors measuring branching and size, or complexity of the graph.11 For intramolecular ordering of vertices according to their $\mathrm{awc}_k$ values, it is convenient to normalize $\mathrm{awc}_k(i)$ by dividing it by the sum $\mathrm{mwc}_k$:

$$p_k(i) = \mathrm{awc}_k(i)/\mathrm{mwc}_k$$
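The behavior of the normalized values $p_k(i)$ for growing $k$ can be explored with exact rational arithmetic. The sketch below is an illustration only and assumes two small example graphs not taken from the paper: a three-vertex path (bipartite) and a triangle with one pendant vertex (nonbipartite). The names `normalized_ecs`, `max_change`, `PATH3`, and `PAW` are hypothetical. The recursion is started from the value 1 on every vertex (walks of length 0), so that one step yields the degrees.

```python
# Illustrative sketch: normalized extended connectivities p_k(i) = awc_k(i) / mwc_k.
# Graphs are given as neighbor lists; all names are hypothetical.
from fractions import Fraction

def normalized_ecs(neighbors, k):
    """Return [p_k(i)] as exact fractions for k >= 1."""
    awc = [1] * len(neighbors)        # awc_0: one walk of length 0 per vertex
    for _ in range(k):
        awc = [sum(awc[j] for j in nb) for nb in neighbors]
    mwc = sum(awc)
    return [Fraction(value, mwc) for value in awc]

def max_change(neighbors, k):
    """Largest absolute change of any p(i) between steps k and k + 1."""
    p_now = normalized_ecs(neighbors, k)
    p_next = normalized_ecs(neighbors, k + 1)
    return max(abs(float(a - b)) for a, b in zip(p_now, p_next))

PATH3 = [[1], [0, 2], [1]]             # bipartite: path 0-1-2
PAW = [[1, 2, 3], [0, 2], [0, 1], [0]]  # nonbipartite: triangle 0-1-2 plus pendant 3

print(normalized_ecs(PATH3, 5))  # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(normalized_ecs(PATH3, 6))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(max_change(PAW, 60) < 1e-6)  # True
```

For the path, the distribution alternates between two different vectors for odd and even $k$, so there is no single limit, whereas for the nonbipartite example successive distributions become practically indistinguishable. This is consistent with the distinction between bipartite and nonbipartite graphs stated in the abstract, although two examples are of course no substitute for the derivation.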
First case, $\lambda_1 > |\lambda_n|$
Therefore
n
0