960
J. Chem. In& Comput. Sci. 1994,34, 960-961
Bit-tuple Notation for Trees Wolfgang R. Miiller, Klaus Szymanski, and Jan V. Knop Computer Centre, The Heinrich Heine University, D-40225Diisseldorf, The Federal Republic of Germany Nenad TrinajstiC’ The Rugjer BdkoviC Institute, P.0. Box 1016, HR-41001 Zagreb, The Republic of Croatia Received February 7, 1994’ A bit-tuple notation, based on the N-tuple concept, is introduced for trees. Recently in this Journal Riicker and Riickerl stated in their article on counts of all walks as atomic and molecular descriptors, “As far as we know, no system for the unique but simple graph-theoretical description of atomic environments has been proposed, not even in treegraphs .... If such a system does exist, the code for a single atom would suffice for reconstruction of the whole tree, and it would be the natural basis for a substructure search system for atoms in tree molecules.” However, we introduced such a system for trees in 1981 in this Journal,* namely, the N-tuple code for (rooted) trees. A rooted tree is a tree with one selected vertex, its root.’ Therefore, a unique description of a rooted tree is at the same time the description of the atomic environment of its root. In our 1981 paper, we define for a rooted tree with N vertices (and hence with N - 1 edges) its code as a sequence of N nonnegative integer numbers by recursion. To construct the code for a given rooted tree, we first eliminate the root vertex and all of its M incident edges, getting M subtrees with the neighbors of the former root as new roots. Then, we construct the codes for these rooted subtrees and append them in reverse lexicographic order to the 1-tuple (M) to get the code for the given rooted tree. As the subtrees become smaller for every level, the recursion will definitely terminate with cases where M = 0, Le., at the minimum (rooted) tree with only one vertex, which by the same rule gets the code 0. The lexicographic ordering makes this code unique, and obviously the rooted trees are easily reconstructed. When representing very large graph-theoretical (rooted) trees by N-tuples, there is a practical problem of storage size for the integeres composing the N-tuple. If the storage size is chosen too large, there may be plenty of wasted room, but if chosen too small, this may limit the size of the trees which are otherwise representable. Clearly, this is no problem for chemical trees since the valencies of their vertices are strictly limited. For a general graph-theoretical tree the following notation may help. The integers which compose the N-tuple and whose sum is N - 1 are replaced by a sequence of 2N - 1 bits of which N - 1 are equal to unity and N are equal to zero. This is done in the following way: Each integer k in theN-tuple is replaced by k + 1 bits of which the first k are 1 and the last is 0, and these bit sequences are concatenated. Obviously, the N-tuple is straightforwardly reconstructed by counting the 1-bits up to the next 0-bit, the sum N - 1 of the integers implies the
* Abstract published in Aduance ACS Absrracrs, June 1, 1994. oO95-2338/94/ 1634-0960504.50/0
A givm naled free with a rw( X is analyzed as stmbom below:
-
o-l&:-M 0 I
0 I
(F-x..
.: . x.-I
x.
,.
.. . x 4
,
..x
->
->
x o
X
The N-tuple comspondinp the Wed tree is synthesized (tuple8 of subaees are ordered lexicographily) as followa:
The N-tuple mde:
3
2
andthemrqmmdingbitrequcnce: Ill0 110
1
0
0
1
0
0
10
0
0
IO
0
0
The gamntcdbit-tuple may be imcrprrtedas the inttga number: 111011010001000
O f t t t
o
+
o
O
f
*
1
*
*
0
t
+
*
0
*
1
t t
0
t +
f
1
1
+
* *
o
+ 1 + 1
f
t1
t
1 2 4 8 16 32 64 12 8 256 512 1024 2048 4096 8192 16384
= 30344
or as the frsaion:
1
f
*
t l
t t
t t
+
t t
t t t
+
+ +
1 o
f
1
+
1
f
0
f
1
f
0
0
f
*
0
1
f
0
+
0
f
0
f
0.5 0.25 0.125 0.0625 0.03125 0.015625 0.0078125 0.00390625 0.001953125 0.0009765625 0.00048828125 0.000244140625 0.0001220703125 0.00006103515625 0.000030517578125
D
0.926025390625
Figure 1. Computation of the N-tuple for a given rooted tree, the constructionof the related bit-tuple,and its conversionintothe integer number and the fraction.
total of N - 1 1-bits, and the count N of members in the N-tuple leads to NO-bits. We propose to name this code the “bit-tuple” notation for trees. The number of bits is small enough to allow moderately sized rooted trees to be represented by a single number, if we Q 1994 American Chemical
Society
BIT-TUPLE NOTATION FOR TREES
N-tuple coder (soned Icxicognphically): 1 : 1' ! 2' : 2 : 5' : 5 : 3 : 3':
6 : 8': 6' : 8 : 4' : 4 : 9 : 7' : 7 :
111414iOC2003311C110110 11142301420011311001130 12414200110110311011000 1i4200141~0110113110033 142001420C1101100110110 14200200141101101101103 214142002C0001101101100 2142001420CllC110011000 i4~420020000llC11011010 2420~1101100420C1101100 i420C1420011C1100110310 24202200004110110110110 3414203110110011C110300 3420~1411011011011003CO 51423021000110110110110 52001420011011001101100 52c0ic0141101101101103c
Bit-tuple notations: 1 : 1' : 2': 2 : 5' : 5 : 3 : 3': 6 : 8' : 6' : 8 : 4' : 4 : 9 : 7' : 7 :
:c10::11110101:1101100311033001c1101010010103 :01010111101103010111101100010:30101000101000 1311c111101011110110001010010100010l001010000 :0:10111:011000101:1101c10010:0010100101~0000 131111c11c0010111101100010100101000131001c100 L31111011000110301c11l101010c1010010100101000 11?10111101c111:31100011000001010c10100101000 1:01c1111311000101111011000101001010001010090 Llcl1llolol:llollooollooooololoololcololoolo3 L~c11:1011000:0100101000111l011300:01~0101000 110111101100010111101l00010100101000101003100 11311110113001100000111101013010100101001010c 111311110101111011000:0100101000101001c100000 1110111101103013111101010c:010010:00101000000 :11110101111011000110000010100101001010010100 111110110001c11110113001010010100010100101000 11:11c113~3110001c1:1101010010100l01001310000
Bit-tuples intcrpnted as integers or fractions:
_23618422655528 ___
1 .. 7-4617860555412 0.671259970073720069 .......~~ _
1' : 2' : 2 : 5' : 5 : 3 : 3': 6 : 8': 6': 8 : 4' : 4 : 9 : 7': 7 :
25245487630928 25246611707040 26051041600148 26051322619176 29643535066408 29644659266640 30613210043556 30617297143080 30617706844484 30617780245140 32898789217440 32901037369664 34491909952148 34509897155880 34510459193936
0.671275945919887818 0.717519913875094062 0.717551862039726984 0.740415134718773516 0.740423121759931746 0.842519940147440138 0.842551891839775635 0.870079760589874240 0.870195922944958511 0.870207567359216228 0.870209653531333060 0.935039827750188124 0,935103724079453968 0.980319042359610648 0.980830269437547031 0,980846243519863492
Figure 2. Illustrativeexample from a paper by Ivanciucand Balaban (equivalent vertices are labeled with the same number). Reprinted with permission from ref 4. Copyright 1992 Baltzer.
interpret the bit sequence as an integer binary number or as the binary digits behind the binary point. The latter way of
J . Chem. Inf. Comput. Sci., Vol. 34, No. 4, 1994 961
storing contains all necessary information for a reconstruction because of the self-terminating character of the N-tuple code (the last position in the N-tuple is the first position where the sum of the numbers up to this position is less than the position itself, so that appended zeros do not cause any problem). This quality would also allow one to omit all trailing zero bits, if equal code length for equal tree size is not mandatory. As the last bit is always zero and for all but the trivial tree without edges the first bit is always 1 and the last two bits are always zero, there is a chance of squeezing the bit-tuple notation a little when short of available bits. In Figure 1 we show how to compute the N-tuple of arooted tree and the corresponding bit-tuple and how to convert the bit-tuple into the integer number and the fraction. The efficiency of the bit-tuple notation is perhaps best documented by the fact that including the next higher number of vertices requires only two additional bits; Le., the range of notation values must only be enlarged by a factor of 4, when the total number of rooted trees to be counted grows by a factor of more than 2.5 for vertex counts above 10 (with increasing tendency, e.g., more than 2.9 above 80). Figure 2 displays an application of the bit-tuple notation to two difficult branched trees with 23 vertices taken from Ivanciuc and Balaban4 (and discussed also by Rucker and Riicker * ). ACKNOWLEDGMENT
N.T.was supported in part by the Ministry of Science and Technology of the Republic of Croatia through Grant 1-07159. We are thankful to Dr. Christoph Riicker (Freiburg, FRG) for constructive comments on the early version of this paper. We also thank the reviewers for helpful comments. REFERENCES AND NOTES (1) Riicker, G.; Riicker, Ch. Counts of All Walks as Atomic and Molecular Descriptors. J. Chem. Inf, Comput. Sci. 1993, 33, 683695. (2) Knop, J. V.; Miiller, W. R.; JeriEeviC, TrinajstiC, N. Computer Enumeration and Generation of Trees and Rooted Trees. J . Chem. In$ Comput. Sci. 1981, 21, 91-99. See also: TrinajstiE, N.; NikoliC, S.; h o p , J. V.; Miiller, W. R.; Szymanski, K. Computational Chemical Graph Theory: Characterization, Enumeration and Generation of Chemical Structures by Computer Methods; Simon & Schuster: New York, 1991; Chapter 3. (3) TrinajstiC, N. Chemical Graph Theory, 2nd revised ed.; CRC Press: Boca Raton, FL, 1992; p 13. (4) Ivanciuc, 0.;Balaban, A. T. Nonisomorphic Graphs with Identical Atomic Counts of Self-Returning Walks: Isocodal Graphs. J. Math. Chem. 1992, 1 1 , 155-167.
e.;