Compact numeric alkane codes derived from IUPAC nomenclature

J. Chem. Inf. Comput. Sci. 1991, 31, 417-422

time for a graph of size n. If a chemical structure of size n is transformed into a usual graph and the algorithm is applied, the total processing time becomes

$$c^k a_k n^k + c^{k-1} a_{k-1} n^{k-1} + \cdots + c a_1 n + a_0 + b$$

where b denotes the time required for the subroutine call. However, the unique naming algorithm for graphs of bounded valence is too complicated to implement (it is based on group theory), and the degree of the polynomial is large, so it is not considered practical. For almost all inputs that occur in practice, algorithms such as the Morgan algorithm seem to work much more efficiently. What we want to point out here is that more practical and efficient algorithms for unique naming of graphs may yet be found. Once such an algorithm is found, it can be applied to stereochemically unique naming by using the results of this paper.

The transformation method can be applied not only to unique naming but also to substructure matching, because transformed graphs are locally isomorphic if and only if the original chemical structures are locally and stereochemically isomorphic. Unfortunately, no polynomial time algorithm is known for subgraph matching, even if graphs are restricted to those of bounded valence (5). However, once an efficient algorithm for subgraph matching is found, it can be directly applied to stereochemical substructure matching.

CONCLUSION

A method which transforms stereochemical structures into graphs (structures which do not have stereochemical information) is presented. The transformation is very simple, and two structures are transformed into isomorphic graphs if and only if they are stereochemically isomorphic. By using the method, algorithms for usual graphs, such as unique naming algorithms and subgraph matching algorithms, can be applied to stereochemical structures. That is, when an efficient algorithm for usual graphs is found, it can be applied to stereochemical structures. A polynomial time algorithm for


stereochemically unique naming is implied as an example.

ACKNOWLEDGMENT

I thank Prof. Ohsuga of the University of Tokyo for giving me a chance to study computer applications in chemistry. I would also like to express my gratitude to Prof. Sasaki and Dr. Funatsu of Toyohashi University of Technology for their valuable discussions and suggestions on chemical information processing.

REFERENCES AND NOTES

(1) Aho, A. V.; Hopcroft, J. E.; Ullman, J. D. The Design and Analysis of Computer Algorithms; Addison-Wesley: Reading, MA, 1974.
(2) Akutsu, T.; Suzuki, E.; Ohsuga, S. A Logic Based Approach to Expert Systems in Chemistry. Knowl. Based Syst. (in preparation).
(3) Dubois, J. E. French National Policy for Chemical Information and the DARC System as a Potential Tool of This Policy. J. Chem. Doc. 1973, 13, 8-13.
(4) Furer, M.; Schnyder, W.; Specker, E. Normal Forms for Trivalent Graphs and Graphs of Bounded Valence. Proc. ACM Symp. Theory Comput. 1983, No. 15, 161-170.
(5) Garey, M. R.; Johnson, D. S. Computers and Intractability; Freeman: San Francisco, 1979.
(6) Hendrickson, J. B.; Toczko, A. G. Unique Numbering and Cataloguing of Molecular Structures. J. Chem. Inf. Comput. Sci. 1983, 23, 171-177.
(7) Kudo, Y.; Sasaki, S. Principle of Exhaustive Enumeration of Unique Structures Consistent with Structural Information. J. Chem. Inf. Comput. Sci. 1976, 16, 43-49.
(8) Luks, E. M. Isomorphism of Graphs of Bounded Valence Can Be Tested in Polynomial Time. Proc. IEEE Symp. Found. Comput. Sci. 1980, No. 21, 42-49.
(9) Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-113.
(10) Petrarca, A. E.; Lynch, M. F.; Rush, J. E. A Method for Generating Unique Computer Structural Representation of Stereoisomers. J. Chem. Doc. 1967, 7, 154-165.
(11) Randić, M. On Canonical Numbering of Atoms in a Molecule and Graph Isomorphism. J. Chem. Inf. Comput. Sci. 1977, 17, 171-180.
(12) Sussenguth, E. H. A Graph-Theoretic Algorithm for Matching Chemical Structures. J. Chem. Doc. 1965, 5, 36-43.
(13) Wipke, W. T.; Dyott, T. M. Simulation and Evaluation of Chemical Synthesis-Computer Representation and Manipulation of Stereochemistry. J. Am. Chem. Soc. 1974, 96, 4825-4834.
(14) Wipke, W. T.; Dyott, T. M. Stereochemically Unique Naming Algorithm. J. Am. Chem. Soc. 1974, 96, 4834-4842.

Compact Numeric Alkane Codes Derived from IUPAC Nomenclature

SCOTT DAVIDSON*
Computer Data Systems, Inc., One Curie Court, Rockville, Maryland 20850

Received January 18, 1991

A reversible binary coding scheme for storing and ordering all alkane isomers through C21 as 32-bit integers is described. The method is derived from a modified set of IUPAC rules formerly utilized to cite side-chain names by increasing complexity (Davidson, S. J. Chem. Inf. Comput. Sci. 1989, 29, 151-5). The ordering enables construction of a bitwise tiebreaker series in which the number of bits assigned at each step is determined by the remaining choices. The compactness of the resulting codes compares favorably with previously reported graph-based codes. Manual encoding/decoding is not difficult because bit fields are small, and the logic is based upon already familiar considerations of chain sizes, lengths, and locants.

INTRODUCTION

Many numeric codes for alkanes have been reported in the literature. However, only a few have proved to be both unique and reversible. Gordon and Kennedy (1) developed a compact, ordered integer code from combinatorial equations related to enumeration of rooted trees, ordering alkanes by increasing chain length. Decoding is obtained by iterative solution of the same equations. Knop et al. (2) designed an "N-tuple" code for alkanes consisting of a string of N digits for an N-carbon alkane. The string represents a maximal sequence of the number of uncounted bonds at each carbon in a traverse of the longest unexplored path from a most substituted carbon, then backtracking to visit all other carbons in that path. Randić (3) has extended this code to polycyclic structures and has reviewed other codes for comparison. A numeric code derived from standard nomenclature and ordered by size would have the advantage of being a more

*Address correspondence to 240 Manor Circle No. 2, Takoma Park, MD 20912.
0095-2338/91/1631-0417$02.50/0 © 1991 American Chemical Society


[Scheme I and Chart I are not recoverable from the extracted text. The surviving fragments of Chart I show a seven-carbon main chain with numbered locants, a two-carbon side chain, and the label "IUPAC name".]

END

FUNCTION IGET(hi, lo)
  If hi>lo then nxtfw=INT(LOG2(hi-lo))+1 else nxtfw=0   {next field width}
  If nxtfw>0 then READ nxtfw bits as IGET else IGET=0
  IGET=IGET+lo
  RETURN
END
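The IGET function above reads a field whose width depends only on the range of values still possible, which is how the abstract's "number of bits assigned at each step is determined by the remaining choices" works in practice. The following Python sketch illustrates that rule together with a matching write routine. The names BitWriter, BitReader, field_width, and iput are hypothetical and do not appear in the paper, and the most-significant-bit-first packing order is an assumption.

```python
class BitWriter:
    """Packs fields into one integer, most significant field first (assumed order)."""

    def __init__(self):
        self.value = 0
        self.nbits = 0

    def write(self, x, width):
        self.value = (self.value << width) | x
        self.nbits += width


class BitReader:
    """Reads fields back in the order they were written."""

    def __init__(self, value, nbits):
        self.value = value
        self.remaining = nbits

    def read(self, width):
        self.remaining -= width
        return (self.value >> self.remaining) & ((1 << width) - 1)


def field_width(hi, lo):
    # Same rule as the listing: INT(LOG2(hi-lo))+1 bits when hi > lo, else none.
    # For a positive integer n, n.bit_length() equals floor(log2(n)) + 1.
    return (hi - lo).bit_length() if hi > lo else 0


def iput(writer, x, hi, lo):
    """Hypothetical inverse of IGET: store x, known to lie in lo..hi."""
    if not lo <= x <= hi:
        raise ValueError("value outside the allowed range")
    width = field_width(hi, lo)
    if width:
        writer.write(x - lo, width)


def iget(reader, hi, lo):
    """Counterpart of the listing's IGET(hi, lo)."""
    width = field_width(hi, lo)
    offset = reader.read(width) if width else 0
    return offset + lo
```

When hi equals lo there is only one possible value, so no bits are stored and the decoder simply returns lo. This is what keeps the codes compact: a forced choice costs nothing.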

SUBROUTINE SIMPLE([illegible], Cs, L, maxsml)
  {1. Get butyl, propyl, and ethyl side chains: "make change"}
  LENBU(4)/4,3,3,2/; LENPR(2)/3,2/   {length of n,s,i,t-Bu; n,i-Pr}
  sym=[illegible]   {possible on main only}
  lohalf=INT((L+1)/2)   {max loc for 1 most cplx side chain}
  Do size=maxsml,2,-1 while Cs>1
    maxgps=INT(Cs/size)   {max alkyl groups of this size}
    ngps=IGET(maxgps,0)   {actual #}
    [lines illegible; a surviving fragment reads "= type"]
    End {Do ngps}
    Cs=Cs-ngps*size   {left for next size}
  End {Do size}
  {2. Locate methyl groups}
  nme=Cs; nxtfw=2   {field width}
  If cplx then loloc=1 else loloc=2
  If sym then COMPUTE sym   {after Et added above}
  If nme=1 & sym then hiloc=lohalf else hiloc=L-1
  Do while Cs>0 & hiloc [operator illegible] loloc
    If nme [operator illegible] 4 or Cs=1 then method=1 else method=2
    If method=1 then   {individual Me locants}
      nhere=1; meloc=IGET(hiloc,loloc)
      hiloc=meloc   {reduce position uncertainty}
    Else   {method=2: position count 0-2}
      nhere=IGET(nxtfw,0)   {nxtfw bits read}
      If nxtfw=1 then [illegible]; nxtfw=2   {1 or 2 here, 0 [illegible]}
      Else if nhere=0 then hiloc=hiloc-1   {none here, next}
      Else if nhere=1 then nhere=0; nxtfw=1   {none here, 1 or 2 next}
      Else nhere=nhere-1   {1 or 2 here}
      [illegible]=hiloc-1
  [remainder of listing missing]

NOTE: To handle nested complex side chains, each subroutine argument would have to be stacked by level: arg -> ARG(lvl). The first case by size is the C23 alkane 7-(1-tert-pentylpentyl)tridecane.

REFERENCES AND NOTES

(1) Gordon, M.; Kennedy, J. W. SIAM J. Appl. Math. 1975, 28, 376-398.
(2) Knop, J. V.; Müller, W. R.; Jeričević, Ž.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1981, 21, 91-99.
(3) Randić, M. J. Chem. Inf. Comput. Sci. 1986, 26, 136-148.
(4) Read, R. C. J. Chem. Inf. Comput. Sci. 1983, 23, 135-149.
(5) Davidson, S. J. Chem. Inf. Comput. Sci. 1989, 29, 151-155.
(6) IUPAC. Nomenclature of Organic Compounds; Pergamon Press: New York, 1979; pp 10-11.
(7) IUPAC. Nomenclature of Organic Compounds; Butterworths: London, 1958; pp 8-9.
(8) Morgan, H. L. J. Chem. Doc. 1965, 5, 107-113.
(9) Lozac'h, N. Angew. Chem., Int. Ed. Engl. 1979, 18, 887-899.
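The legible part of step 1 of the SIMPLE listing ("make change") walks the side-chain sizes from largest to smallest and reads how many groups of each size are present, with the field width shrinking as carbons are used up. The Python sketch below illustrates only that counting loop. It assumes a hypothetical bit stream that holds nothing but the group counts, whereas the original listing also reads type and locant fields in lines that did not survive. The function name read_group_counts and the reading of Cs as the number of side-chain carbons still unassigned are assumptions.

```python
def read_group_counts(bits, carbons, maxsml):
    """Decode the number of alkyl groups of each size, largest size first.

    bits    -- iterator yielding 0/1 integers (hypothetical stream, counts only)
    carbons -- side-chain carbons still unassigned (Cs in the listing, assumed)
    maxsml  -- largest simple side-chain size considered
    """
    counts = {}
    size = maxsml
    # Mirrors "Do size=maxsml,2,-1 while Cs>1" in the listing.
    while size >= 2 and carbons > 1:
        maxgps = carbons // size        # most groups of this size that fit
        width = maxgps.bit_length()     # IGET width for the range 0..maxgps
        ngps = 0
        for _ in range(width):          # read the field, high bit first (assumed)
            ngps = (ngps << 1) | next(bits)
        counts[size] = ngps
        carbons -= ngps * size          # carbons left for the next size
        size -= 1
    return counts, carbons              # leftover carbons are methyl groups
```

For example, with seven side-chain carbons and maxsml of 4, each size gets a one-bit field because at most one group of that size fits. The bits 1, 0, 1 then decode to one butyl group, no propyl group, and one ethyl group, leaving one carbon for a methyl group.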

Isocodal and Isospectral Points, Edges, and Pairs in Graphs and How To Cope with Them in Computerized Symmetry Recognition

GERTA RÜCKER and CHRISTOPH RÜCKER*
Institut für Organische Chemie und Biochemie, Universität Freiburg, Albertstrasse 21, D-7800 Freiburg, FRG

Received April 3, 1991

It is demonstrated that in certain graphs isospectral edges and pairs exist, in analogy to the well-known isospectral points. A pair is any relationship between two vertices (an edge is thus a special kind of pair), and isospectral pairs are pairs which, when arbitrarily but identically perturbed, always yield isospectral graphs. The significance of isospectral points, edges, and pairs is that computer programs for symmetry perception and for graph isomorphism testing tend to encounter difficulties when processing graphs containing such features; they tend to take isospectrality for equivalence by symmetry. It is shown how in the authors' programs TOPSYM and MATSYM these difficulties are overcome by using the newly developed "class matrix procedure".

INTRODUCTION, DEFINITIONS, AND EXAMPLES

Recognition of constitutional (topological) symmetry in (molecular) graphs is of major interest for any molecular structure processing task, and ever more powerful computer programs for this purpose are still being developed; see refs 1-3 and references cited therein. For example, Balasubramanian et al. recently used their vertex partitioning program to predict the number and intensity ratios of NMR signals (4). Since their program does not partition the pairs of vertices, they are not able to predict the number of coupling constants. A pair is a defined relation between two vertices, either a long-range relation or a one-bond relation (edge). For an illustration of the term pair see Figure 1 (5). We recently developed the program TOPSYM (as an aid in the machine generation of IUPAC names for polycyclic compounds (6)) that partitions both the vertices and the pairs of vertices in a graph into equivalence classes (7). The program relies primarily on different entries in the higher powers of