J. Chem. Inf. Comput. Sci. 1999, 39, 299-303


An Effective Topological Symmetry Perception and Unique Numbering Algorithm

Zheng Ouyang,† Shengang Yuan,*,† Josef Brandt,‡ and Chongzhi Zheng†

Laboratory of Computer Chemistry, Chinese Academy of Sciences and Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 354 Fenglin Lu, Shanghai 200032, P.R. China, and Institut für Organische Chemie und Biochemie, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany

Received May 26, 1998

Determination of the equivalence classes of atoms in molecules and the unique numbering of molecular graphs are of major interest for many structure processing tasks, and many programs have been reported for this purpose. Most of them are based on graph invariants, but such methods reportedly fail to give the correct partitioning for certain structures. The only theoretically rigorous method is based on atom-by-atom matching,1 which is considered computationally impractical. This work analyzes the mechanism of the Morgan algorithm and, to avoid both the partitioning failures and the time-consuming atom-by-atom matching, proposes two improvements to the original Morgan algorithm. The first improvement avoids the oscillatory behavior of the Morgan algorithm. The second improvement, referred to as the single-vertex Morgan algorithm, decomposes the Morgan algorithm into single-vertex processing. By incorporating these improvements, effective topological symmetry perception and unique numbering algorithms were devised. The high performance of these algorithms is demonstrated on several graphs that are difficult to partition.

INTRODUCTION

Recognition of topological symmetry in molecular graphs is of major interest for many graph processing tasks. The only theoretically rigorous method to derive the equivalence partitioning is atom-by-atom matching,1 but it is so time-consuming that much effort has been aimed at speeding it up. To circumvent the computational burden, most graph isomorphism (automorphism) algorithms use heuristics based on graph invariants to reduce the number of vertex permutations that atom-by-atom matching must examine. However, no graph invariant (or ensemble of graph invariants) is known to be sufficient to establish isomorphism or the equivalent problem, automorphism partitioning. Since Morgan's work,2 extended connectivity (the sum of the connectivities of adjacent vertexes) has been used extensively as the graph invariant in iterative vertex-classification methods for automorphism partitioning, and Morgan-type algorithms were developed on this basis. Although these algorithms are efficient for numbering many molecular structures, several authors3-5 have pointed out two problems. (1) These algorithms frequently exhibit oscillatory behavior, which makes the termination condition unreliable, and in the worst case they fail to give the automorphism partitioning. (2) They can assign the same extended connectivity (EC) values to nonequivalent vertexes, even in very simple graphs such as graph 1 in Table 1, because the information needed to distinguish nonequivalent vertexes may be lost during each summation step. In the present work, an improved Morgan algorithm (IMA) is devised that avoids the oscillatory behavior completely. Based on an analysis of the nature of the Morgan algorithm, a new concept, the single-vertex Morgan algorithm (SVMA), is introduced to enhance the ability to discriminate nonequivalent vertexes. We then apply IMA and SVMA to topological symmetry perception. Finally, we present another application of IMA: it assigns distinct numbers to topologically symmetrical vertexes and so yields a complete unique numbering algorithm.

* To whom correspondence should be addressed. † Chinese Academy of Sciences. ‡ Technische Universität München.

DECOMPOSITION OF MORGAN ALGORITHM

The essence of the Morgan algorithm is the simultaneous summation of EC values for all vertexes in a graph. Consequently, the Morgan algorithm cannot avoid losing, during the summation, the information needed to distinguish topologically nonequivalent vertexes. This is the most basic cause of the counterexamples to the Morgan algorithm. To demonstrate this, we decompose the Morgan algorithm into separate summations for each vertex of the graph. We set the initial EC value of one vertex to 1 and the ECs of all other vertexes to 0, and then apply the Morgan algorithm. We call the algorithm carried out in this way the single-vertex Morgan algorithm (SVMA). Table 1 demonstrates SVMA on graph 1, together with the Morgan algorithm for comparison. After each of the five atoms has been processed by SVMA, the sum of the EC values obtained for each atom equals the result obtained with the Morgan algorithm. Having tested many molecular graphs, we found that the result of the Morgan algorithm can always be reproduced in this way by applying SVMA to each atom. We also found that equal EC values of different vertexes may arise from different evolutions. For instance, vertexes 3 and 4 finally had the same EC value in the Morgan algorithm. However, the EC value 4 of vertex 3 was made up of 0, 0, 3, 0, and 1, whereas the same EC value of vertex 4 was the sum of 1, 1, 0, 2, and 0. This observation indicates that the failure of the Morgan algorithm is due to the simultaneous summation of the EC values.

On the other hand, the evolution of the EC values of vertexes 3 and 4 reveals oscillatory behavior: a vertex oscillates between different equivalence classes. For example, vertexes 3 and 4 belong to the same class in the first iteration, to different classes in the second iteration, and to the same class again in the third iteration. Introducing other vertex and bond properties into the Morgan algorithm as the initial weight is known to enhance its discriminating ability.6,7 However, its side effect is usually neglected: the oscillatory behavior becomes more serious.

10.1021/ci9800918 © 1999 American Chemical Society. Published on Web 01/29/1999.

Table 1. The Results of Graph 1 Treated with SVMA in Comparison with the Morgan Algorithm

Figure 1. Some counterexamples belonging to class 1.

Figure 2. An example of IMA.

COUNTEREXAMPLES OF MORGAN ALGORITHM AND SOLUTIONS

After investigating the evolution of EC values in a large number of graphs, we divided the counterexamples of the Morgan algorithm into class 1 and class 2, according to whether or not they show oscillatory behavior. Figures 1 and 3 provide some examples. To avoid oscillatory behavior, we improved the Morgan algorithm in two respects. First, we added each vertex's own connectivity into its EC value. This modification is simple but effective: the modified Morgan algorithm perceives all automorphism equivalence classes for all graphs in Figure 1 except graph 10. Second, we propose an IMA that avoids the oscillatory behavior completely.

IMA.
(1) Set the initial EC of each vertex to the number of its non-hydrogen neighbors.
(2) Rank the vertexes by their EC values in ascending order, and set the rank as the class number (CN) of each vertex.
(3) Count the number of different CNs (NDCN).
(4) Set the new EC of each vertex to the sum of its own EC and the ECs of its non-hydrogen neighbors.
(5) Taking CN as the primary class number, further classify the vertexes that share the same CN value by their new EC in ascending order, and assign each vertex the resulting class number as its new CN.
(6) Count the number of different new CNs (NDNCN).
(7) If NDNCN is not greater than NDCN, go to step 9.
(8) Set the EC of each vertex to its new EC, set the CN of each vertex to its new CN, and set NDCN to NDNCN. Go to step 4.
(9) The process is complete.

The evolution of the IMA process is shown in Figure 2, using molecular graph 2 as an example. In IMA, once two vertexes have different CNs, they never share a CN again in subsequent steps, because the previous CN always serves as the primary sorting key. NDCN therefore evolves monotonically and never decreases. For this reason, the stability of NDCN is a more reliable termination criterion than the one used in the Morgan algorithm. Table 2 shows that IMA has a more powerful partitioning ability than the Morgan algorithm.

Table 2. Comparison of Performance of Morgan Algorithm and IMA for Automorphism Vertex Partitioning of Graphs in Figure 1

graph | Morgan algorithm: NDCN | Morgan algorithm: iterations | IMA: NDCN | IMA: iterations
1 | 3 | 2 | 4 | 2
2 | 5 | 2 | 6 | 3
3 | 4 | 2 | 5 | 3
4 | 4 | 2 | 5 | 3
5 | 5 | 2 | 6 | 3
6 | 7 | 3 | 8 | 3
7 | 9 | 3 | 12 | 4
8 | 6 | 2 | 7 | 3
9 | 8 | 4 | 9 | 4
10 | 3 | 1 | 4 | 2
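The SVMA decomposition and the IMA steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' C++ program. The adjacency-list input, the helper names (`morgan_step`, `morgan_ec`, `svma_contributions`, `dense_rank`, `ima`) and the use of 1-based dense ranking for "classify in ascending order" are all hypothetical choices. The decomposition works because each Morgan summation is a linear map of the EC vector. An all-ones start is the sum of the single-vertex starts, so the single-vertex contributions to a vertex always add up to its Morgan EC.

```python
from typing import List, Optional, Sequence

Graph = List[List[int]]  # hypothetical input: adjacency lists of a hydrogen-suppressed graph


def morgan_step(adj: Graph, ec: List[int]) -> List[int]:
    """One original Morgan summation: new EC of v = sum of its neighbors' ECs."""
    return [sum(ec[u] for u in adj[v]) for v in range(len(adj))]


def morgan_ec(adj: Graph, start: List[int], k: int) -> List[int]:
    """EC vector after k Morgan summations from the given start vector."""
    ec = start
    for _ in range(k):
        ec = morgan_step(adj, ec)
    return ec


def svma_contributions(adj: Graph, v: int, k: int) -> List[int]:
    """EC of vertex v after k summations in each single-vertex (SVMA) run,
    one entry per seed s (start: EC = 1 at s, 0 everywhere else)."""
    n = len(adj)
    return [morgan_ec(adj, [int(i == s) for i in range(n)], k)[v] for s in range(n)]


def dense_rank(keys: Sequence) -> List[int]:
    """'Classify in ascending order': 1-based rank among the distinct keys."""
    order = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def ima(adj: Graph, ec: Optional[List[int]] = None) -> List[int]:
    """Improved Morgan algorithm, steps (1)-(9); returns the final CN of each vertex."""
    n = len(adj)
    if ec is None:
        ec = [len(nbrs) for nbrs in adj]              # (1) EC = number of neighbors
    cn = dense_rank(ec)                               # (2) class numbers
    ndcn = len(set(cn))                               # (3)
    while True:
        # (4) own EC plus the ECs of all neighbors
        new_ec = [ec[v] + sum(ec[u] for u in adj[v]) for v in range(n)]
        # (5) the old CN stays the primary key, so classes can only split
        new_cn = dense_rank(list(zip(cn, new_ec)))
        ndncn = len(set(new_cn))                      # (6)
        if ndncn <= ndcn:                             # (7) no class was split
            return cn                                 # (9)
        ec, cn, ndcn = new_ec, new_cn, ndncn          # (8)
```

Consider the hypothetical graph `adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]`, a triangle 0-1-2 with the chain 2-3-4. For this graph, `svma_contributions(adj, 3, 2)` returns `[1, 1, 0, 2, 0]`. These contributions sum to 4, the Morgan EC of vertex 3 after two summations from an all-ones start. `ima(adj)` returns `[3, 3, 4, 2, 1]`, which places only the symmetrical pair 0 and 1 in a shared class.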

Figure 3. Some counterexamples belonging to class 2.

Figure 4. An example of the application of SVMA.

In Figure 3, we present some counterexamples belonging to class 2, and we show in detail that SVMA is a good tool for solving this problem. (Adding a vertex's own connectivity into its EC improves the Morgan algorithm, so all SVMAs presented hereafter in this paper include it.) Figure 4 gives an example of SVMA. Vertexes 1 and 3 in graph 3 are topologically nonequivalent, but the Morgan algorithm fails to distinguish them. Every vertex has the same degree, so all vertexes keep identical EC values at every iteration. To make use of the information revealed by SVMA, we arrange the EC values of a vertex during SVMA processing in a sequence, named the sum of coefficient ensemble (SCE). For example, the sequence [1, 1, 4, 12] in Figure 4a is the SCE of vertex 3 in graph 3. The SCE of vertex 1 is obtained in the same way, as shown in Figure 4b. The first two numbers of any SCE are always 1, 1, so we omit them from SCEs hereafter. The SCE of vertex 1, [4, 12, 42], differs from that of vertex 3, [4, 12], so the two vertexes can be assigned to different classes.

Figure 5. Graph 4.

Figure 6. Unique numbering of graph 4.

TOPOLOGICAL SYMMETRY PERCEPTION ALGORITHM

The Morgan algorithm can be used for topological symmetry perception in some cases, but it is not rigorous enough to avoid many counterexamples. This is not surprising, because the Morgan algorithm was not originally designed for this kind of problem. Based on the work presented in the previous section, we devised a more rigorous and effective topological symmetry perception algorithm. The three-step procedure is illustrated below using molecular graph 4 in Figure 5 as an example.

Symmetry Perception Algorithm. Step 1: Apply IMA.

Step 2: 2.1. Apply SVMA to the vertexes that belong to classes containing more than one vertex, and record the SCE of each of these vertexes. 2.2. Taking CN as the primary class number, further classify the vertexes that share the same CN value by their SCEs in ascending order. Each vertex then gets a new CN value.

vertexes | CN after step 1 | SCE | CN after step 2
19 | 1 | - | 1
1, 10 | 2 | [4, 10] | 2
2, 3 | 3 | [4, 12, 44, 158] | 4
11, 12 | 3 | [4, 12, 42, 146] | 3
4, 5, 6, 7 | 4 | [4, 14, 52, 192, 716] | 6
13, 14, 15, 16 | 4 | [4, 12, 44] | 5
8, 9 | 5 | [4, 12] | 8
17, 18 | 5 | [4, 10, 38, 134, 514, 1966] | 7

Step 3: Set the EC of each vertex to its CN value after step 2. Apply IMA again.
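The three steps can be sketched in Python. This is a minimal, self-contained illustration under explicit assumptions, not the authors' program, and all function names are hypothetical. The sketch takes an SCE to be the seed vertex's own EC after each single-vertex iteration (own EC plus neighbor sum, as in IMA step 4), and it assumes the single-vertex run stops by IMA's own termination rule, which the text does not spell out. It also compares SCEs lexicographically, with a shorter prefix first. That ordering agrees with the graph 4 table above: [4, 10, 38, ...] ranks below [4, 12], and [4, 12, 42, ...] ranks below [4, 12, 44, ...]. Under these assumptions, a vertex of the triangular prism (a cubic graph) gets the SCE [4, 12].

```python
from collections import Counter
from typing import List, Sequence, Tuple

Graph = List[List[int]]  # hypothetical input: adjacency lists of a hydrogen-suppressed graph


def dense_rank(keys: Sequence) -> List[int]:
    """1-based rank of each key among the distinct keys, sorted ascending."""
    order = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def refine(adj: Graph, ec: List[int]) -> Tuple[List[int], List[List[int]]]:
    """IMA steps (2)-(9) from a given initial EC.

    Returns the final CNs and every EC vector computed along the way."""
    n = len(adj)
    cn = dense_rank(ec)
    history = [ec]
    while True:
        new_ec = [ec[v] + sum(ec[u] for u in adj[v]) for v in range(n)]
        history.append(new_ec)
        new_cn = dense_rank(list(zip(cn, new_ec)))
        if len(set(new_cn)) <= len(set(cn)):  # class count stopped growing
            return cn, history
        ec, cn = new_ec, new_cn


def svma_sce(adj: Graph, seed: int) -> List[int]:
    """SCE of `seed`: its own EC at every iteration of a single-vertex run
    (EC = 1 at the seed, 0 elsewhere), with the leading 1, 1 dropped.
    ASSUMPTION: the single-vertex run stops by the same rule as IMA."""
    start = [int(v == seed) for v in range(len(adj))]
    _, history = refine(adj, start)
    return [ec[seed] for ec in history][2:]


def perceive_symmetry(adj: Graph) -> List[int]:
    """Three-step symmetry perception; equal final CN = topologically symmetrical."""
    n = len(adj)
    cn, _ = refine(adj, [len(nbrs) for nbrs in adj])  # step 1: IMA, EC = degree
    size = Counter(cn)
    # step 2.1: SCEs only for vertexes in classes with more than one member
    sce = [tuple(svma_sce(adj, v)) if size[cn[v]] > 1 else () for v in range(n)]
    # step 2.2: old CN first, then SCE compared lexicographically (assumption)
    cn = dense_rank([(cn[v], sce[v]) for v in range(n)])
    cn, _ = refine(adj, cn)  # step 3: EC = CN after step 2, apply IMA again
    return cn
```

Take the hypothetical cubic graph `[[2, 3, 4], [2, 3, 5], [0, 1, 3], [0, 1, 2], [0, 6, 7], [1, 6, 7], [4, 5, 7], [4, 5, 6]]`, which is two copies of K4 minus an edge, joined by the edges 0-4 and 1-5. IMA alone keeps all eight vertexes in one class, because every vertex has degree 3. `perceive_symmetry` returns `[1, 1, 2, 2, 1, 1, 2, 2]`. This separates the vertexes lying on one triangle (0, 1, 4, 5) from those lying on two (2, 3, 6, 7).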

In graph 4, vertexes 1 and 10 are the last to be partitioned. All vertexes that have the same final CN value are then topologically symmetrical.

UNIQUE NUMBERING ALGORITHM

If the graph contains no topologically symmetrical vertexes, each vertex is already uniquely numbered at the topological symmetry perception stage. In most cases, however, distinct numbers must be assigned to topologically symmetrical vertexes in order to number the graph uniquely. In practical use, unique numbering is needed more often than symmetry perception itself, although symmetry perception is the key step of the unique numbering process. To meet the requirements of unique numbering of molecular graphs, we designed an algorithm that further classifies the vertexes within the equivalence classes obtained at the symmetry perception stage.


Figure 7. Examples of various graphs whose topological symmetry has been correctly perceived.

Unique Numbering Algorithm.
Step 1: Arbitrarily choose a vertex from the class with the largest CN among all classes containing more than one vertex. Increase by 1 the CN of this vertex and of every vertex whose CN is higher than that of the chosen vertex. If every vertex now has a different CN value, stop.
Step 2: Set the initial EC of each vertex to its CN value and apply IMA.
Step 3: If every vertex has a unique CN value, stop; otherwise, go to step 1.

Applying this algorithm yields the unique numbering of graph 4 shown in Figure 6.
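The tie-breaking loop can be sketched in Python as a minimal, self-contained illustration. The function names are hypothetical, and the "arbitrary" choice in step 1 is made deterministic here by picking the lowest-indexed vertex of the class, which is our assumption. The input `cn` is meant to be the output of the symmetry perception stage. In the short usage below, IMA's classes for a five-vertex path already coincide with its symmetry classes (0 with 4, 1 with 3), so they are used directly.

```python
from collections import Counter
from typing import List, Optional, Sequence

Graph = List[List[int]]  # hypothetical input: adjacency lists of a hydrogen-suppressed graph


def dense_rank(keys: Sequence) -> List[int]:
    """1-based rank of each key among the distinct keys, sorted ascending."""
    order = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def ima(adj: Graph, ec: Optional[List[int]] = None) -> List[int]:
    """Improved Morgan algorithm; the initial EC defaults to the vertex degree."""
    n = len(adj)
    if ec is None:
        ec = [len(nbrs) for nbrs in adj]
    cn = dense_rank(ec)
    while True:
        new_ec = [ec[v] + sum(ec[u] for u in adj[v]) for v in range(n)]
        new_cn = dense_rank(list(zip(cn, new_ec)))  # old CN is the primary key
        if len(set(new_cn)) <= len(set(cn)):
            return cn
        ec, cn = new_ec, new_cn


def unique_numbering(adj: Graph, cn: List[int]) -> List[int]:
    """Break ties between symmetrical vertexes until every CN is distinct."""
    n = len(adj)
    cn = list(cn)
    while len(set(cn)) < n:
        size = Counter(cn)
        # Step 1: the largest CN among classes with more than one vertex ...
        target = max(c for c, count in size.items() if count > 1)
        # ... and an "arbitrary" member of it (hypothetically: the lowest index)
        chosen = min(v for v in range(n) if cn[v] == target)
        # raise the chosen vertex and every vertex with a higher CN by 1
        cn = [c + 1 if v == chosen or c > target else c for v, c in enumerate(cn)]
        if len(set(cn)) == n:
            break
        # Step 2: EC = CN, then apply IMA; Step 3 is the loop condition
        cn = ima(adj, cn)
    return cn
```

With `path = [[1], [0, 2], [1, 3], [2, 4], [3]]`, `ima(path)` gives `[1, 2, 3, 2, 1]`, and `unique_numbering(path, ima(path))` returns `[2, 4, 5, 3, 1]`. The loop always terminates: each pass splits the chosen vertex off its class, and IMA never merges classes, because the previous CN stays the primary key.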

RESULTS AND DISCUSSION

The topological symmetry perception and unique numbering algorithms were implemented on an IBM PC-compatible platform (DX-80486, 66 MHz) running Windows 3.11. The program was written in C++ and is very short, no more than 400 lines of C++ statements. Because IMA and SVMA form the core of these algorithms, the greatest strength of the Morgan algorithm, its simplicity, is preserved. Because SCEs are used, there is no need to store a large amount of information to partition the vertexes. A test set of more than 60 000 topological ring systems (i.e., coded only for topology, not for vertex and bond properties) was used to evaluate the performance of the topological symmetry perception algorithm; the vertexes of such ring systems are considered more difficult to partition. In addition, many other kinds of graphs were tested, including some hypothetical graphs with high topological symmetry. In all cases, the vertexes were correctly partitioned. Some examples are presented in Figure 7 and Table 3. These results demonstrate that the algorithms are reliable in practical use, although their theoretical rigor still needs to be proved. The most remarkable aspect of this work is the decomposition of the Morgan algorithm into several parallel subalgorithms, which led to the development of the SVMA. Such decomposition features the addition of discrete EC


Table 3. Equivalence Classes of the Graphs in Figure 7 and CPU Time Needed for Perceiving the Topological Symmetry

graph | n | classes | CPU time^a (s)
1 | 20 | 6 | 0.11
2 | 20 | 1 | 0.05
3 | 20 | 10 | 0.16
4 | 17 | 17 | 0.11
5 | 10 | 3 |
6 | 8 | 3 |
7 | 12 | 7 |
8 | 8 | 2 |
9 | 10 | 3 |
10 | 21 | 3 |