J. Chem. Inf. Comput. Sci. 1999, 39, 475-482


Exhaustive Generation of Organic Isomers. 5. Unsaturated Optical and Geometrical Stereoisomers and a New CIP Subrule

M. L. Contreras,*,† G. M. Trevisiol,‡ J. Alvarez,‡ G. Arias,† and R. Rozas†

Faculty of Chemistry and Biology, Department of Chemical Sciences, and Faculty of Engineering, Department of Information Technology, University of Santiago de Chile, Casilla 40, Santiago-33, Chile

Received July 24, 1998

This work, based on graph theory, presents US-CAMGEC, software developed for the analysis, generation, and counting of unsaturated organic stereoisomers with isolated or cumulated double bonds, in structures that may contain chiral carbons and heteroatoms with variable valences. The N-tuple notation was extended to describe Z, E, r, or s double bond stereoisomerism as well as R, S stereoisomerism. Algorithms were developed for transforming constitutional isomer N-tuples into their corresponding graphs and graphs into the corresponding configurational N-tuples. A computational implementation of the Cahn-Ingold-Prelog (CIP) rules was developed, including a new subrule for ranking complex ligands that already contain chiral stereocenters and show no difference in their atomic number partial ranks. Test results for families of symmetric and nonsymmetric hydrocarbons and their monohalogenated, mono- and dioxygenated, and mono- and dinitrogenated derivatives are presented.

1. INTRODUCTION

Stereochemistry plays a fundamental role in modern organic chemistry, particularly in the field of drugs and natural products. Computational organic chemistry has greatly stimulated interest in developing software able to recognize and manage 2D and 3D molecular structures to assist in solving problems of synthesis,1 structure elucidation,2 database management,3,4 similarity,5-7 drug design,8,9 and others. Active research has been carried out on organic isomer enumeration10-15 and generation,16,17 particularly on computer-aided stereoisomer generation.18,19 Numerous computer systems with very different global strategies have been reported for generating and counting constitutional isomer molecular structures.20-31 Among them, CAMGEC22 generates and counts, in an exhaustive and irredundant way, all of the constitutional or topological isomers with different combinations of double bonds and cycles, starting just from a given molecular formula. Many of these systems are related to structure elucidation and can integrate some spectrum interpretation capabilities with constrained structure generation;28,31 other systems are oriented to molecular design32-35 and combinatorial chemistry.10

Development of stereoisomer modules was considered a natural further step following CAMGEC.22 For instance, S-CAMGEC, a very specific system for stereoisomers of saturated acyclic chiral compounds, was developed.23 Also specific were a system that enumerates alkane isomers, with special reference to decane stereoisomers,36 and another for fluoroalkane stereoisomers.37 More general programs for the generation of stereoisomers have been reported38-40 and have further contributed to the pioneering work on stereoisomer generation systems.41-43

* E-mail address: [email protected]. † Faculty of Chemistry and Biology. ‡ Faculty of Engineering.

On the other hand, applying the Cahn-Ingold-Prelog (CIP) rules in computer programs has proved to be a nontrivial task,44-46 and unresolved priority assignments sometimes appear that must be solved. Proposals for a revision of the CIP rules have been reported.44 In particular, the known sequence rules for ranking ligands, and in this way for assigning the corresponding stereocenter configurations, briefly consist of a comparison of ligands based on atomic numbers, followed by a comparison of atomic masses (not relevant in our case). The procedure is then applied recursively, comparing the total summation of atomic numbers by levels until a difference arises. If no difference arises, there is no topological difference among the ligands. If one does, priority is assigned to the ligand with the larger atomic number. Cycles and multiple bonds must first be decomposed in terms of single-bonded groups, and then the previous rules are applied. Strategies such as hierarchical digraphs45 and canonical numbering methods44 have been used. Proposals have also been made for ranking ligands around a center when the ligands already contain stereogenic units,45,47 for instance
•chiral > pseudoasymmetric > nonstereogenic
•cis > trans > nonstereogenic
•like descriptor pairs (such as RR or SS) > unlike descriptor pairs (such as RS or SR)
•dv with higher description number > dv with lower description number, for a set of interdependent stereogenic centers characterized by description vectors (dv).44

The following sections describe our efforts on this subject, focused on the generation and counting of organic compounds which may contain heteroatoms, chiral carbon atoms, and several isolated or cumulated double bonds. The resulting software was called US-CAMGEC (for Unsaturated Stereoisomers in Computer Assisted Molecular GEneration and Counting).

10.1021/ci9700586 CCC: $18.00 © 1999 American Chemical Society Published on Web 04/28/1999


Figure 1. US-CAMGEC global data flow.

Figure 3. Molecule I. Its graph and N-tuples: (a) canonical N-tuple; (b) extended N-tuple for the Z,r-stereoisomer; (c) extended N-tuple for the Z,s-stereoisomer; (d) extended N-tuple for the E,r-stereoisomer; and (e) extended N-tuple for the E,s-stereoisomer.

Figure 2. US-CAMGEC main processes.

2. SYSTEM DESCRIPTION

The main purpose of the US-CAMGEC system is to analyze organic molecules containing double bonds (db), to verify whether one or more true db-stereocenters or constitutional stereogenic centers occur, and, in that case, to generate the correct stereoisomers exhaustively and irredundantly. Two types of db-stereoisomers are considered:
(a) Geometrical isomers, normally described with the descriptors Z and E. The stereogenic unit in this case contains one or any odd number of adjacent (cumulated) db. The simplest case is the cis-trans isomerism of ethylene hydrocarbons.
(b) Optical isomers, described here with the descriptors r and s to differentiate them from the chiral isomers known as R and S. The stereogenic unit in this case contains an even number of cumulated db. The most thoroughly studied cumulenes of this type are allenes.

Input and output of the US-CAMGEC system (see Figures 1 and 2) are described as follows:

Input. A file with one or more constitutional isomers, each represented in linear N-tuple notation. It may also include stereoisomer molecules with chiral centers. Specifically, it can be the output of a generator of constitutional isomers such as CAMGEC,22 the output of a stereoisomer generator such as S-CAMGEC,23 or a manually created file.

Output. A file containing the corresponding extended N-tuples representing the Z, E, r, or s stereoisomers generated over db stereocenters.

The extension of the original N-tuple notation to incorporate chiral atoms and db stereoisomer characteristics is explained below through the respective data structures. The global data flow of US-CAMGEC in relation to input and output is given in Figure 1. The key data structures of the system are
•N-tuple external representation of a molecule in the input text file.
•Internal graph representation of a molecule for processing in computer memory (MGraph).
•Stereocenter list of the molecular graph in computer memory (SC-list).
•Extended N-tuple for external representation of the generated molecule in an output text file.

The N-tuple external notation for constitutional isomers at the input is a canonical tree representation of a graph whose depth-first search produces a linear sequence, as explained before.22 The internal memory representation of molecular graphs is a linked graph structure, i.e., nodes and pointers linked together in a convenient way. The main processes are shown in the data flow diagram in Figure 2 and are listed below:
•N-tuple input and validation.
•Graph construction.
•Symmetry determination.
•Construction of the list of potential stereocenters or stereogenic centers (SC).
•Potential SC-list analysis and SC determination.
•Stereoisomer generation, elimination of symmetric redundant stereoisomers, final extended N-tuple creation, and its output.

Figure 3 presents the graph and canonical N-tuple (a) and some extended N-tuples (b-e) for the stereoisomers generated for a molecule with two SC. Stereocenter 1, constituted by atoms 1 and 6, is designated E1, while stereocenter 2, constituted by atoms 2, 3, and 4, is designated E2. The N-tuple notation used in this work is similar to one previously proposed.23 For atoms that are terminal atoms of a stereocenter, a character following the stereocenter specification (E1, E2, or other) indicates the stereochemical attribute; for instance, the character z indicates the Z configuration. Atoms constituting a nonterminal part of a stereocenter, such as atom 3 in Figure 3, are not given that character.
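As an illustration of the SC-list idea, the two stereocenters of Figure 3 could be held in a small record type that derives the descriptor pair from the number of cumulated double bonds (odd gives Z/E, even gives r/s, as defined at the start of this section). This is only a sketch in Python: the class name `Stereocenter`, its fields, and the assumption that the atoms of a stereocenter form one unbranched cumulated chain are ours, not the data structures of the original C program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stereocenter:
    """Hypothetical SC-list entry: a label plus the atoms of the cumulated chain."""
    label: str
    atoms: tuple  # atom numbers along the chain of cumulated double bonds

    @property
    def n_double_bonds(self):
        # Assumption: k atoms in an unbranched cumulated chain share k - 1 double bonds.
        return len(self.atoms) - 1

    @property
    def descriptors(self):
        # Odd number of cumulated db -> geometrical (Z/E); even -> optical (r/s).
        return ("Z", "E") if self.n_double_bonds % 2 == 1 else ("r", "s")


# The two stereocenters of the molecule in Figure 3.
sc_list = [
    Stereocenter("E1", (1, 6)),      # one double bond
    Stereocenter("E2", (2, 3, 4)),   # two cumulated double bonds (allene-like)
]

for sc in sc_list:
    print(sc.label, sc.n_double_bonds, sc.descriptors)
```

With these two entries the sketch assigns Z/E to E1 and r/s to E2, which matches the Z,r, Z,s, E,r, and E,s stereoisomers listed in the caption of Figure 3.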
This notation facilitates interaction with graphical interfaces, topological index programs, and other calculation programs.

The symmetry determination process involves construction of graph automorphism groups,48 together with a structural analysis of constitutional N-tuples and an algorithm to eliminate redundancy.49 One of the most complex and important subprocesses is the analysis of the potential SC-list to obtain the list of real stereogenic centers. The potential SC-list contains no terminal db, i.e., no H-ends, which are discarded right from the beginning. The analysis ranks the ligands of each potential stereocenter and then establishes its configuration assignment, which allows its transformation into a real generating SC. According to general organic chemistry knowledge and some recent specialized studies,44,45 a number of CIP rules have been proposed for ranking ligands and in this way assigning the corresponding stereocenter configuration descriptors. However, this problem is not completely solved, and these rules do not allow proper ranking of SC containing ligands such as those described in Figure 4.

Figure 4. SC ligands with the same number and type of configurational centers and different connectivities, which cannot be ranked using pre-established rules.44,45 Configuration R for carbon atoms (9) to (18) is considered.

Let us define one ligand in Figure 4 as ligand (i) and the other as ligand (j). Their analysis by levels is as follows:

| level | Ligand (i) | Ligand (j) |
| --- | --- | --- |
| 1 | C1 | C2 |
| 2 | C3; C4; C5 | C6; C7; C8 |
| 3 | C9, H, H; C10, H, H; C11, C12, C13 | C14, H, H; C15, C16, H; C17, C18, H |
| 4 | 5 × HClF | 5 × HClF |
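The level-by-level listing above can be checked mechanically. The following Python sketch encodes the two ligands of Figure 4 as lists of levels and compares the atomic numbers found at each level; it also counts how many carbon substituents each level-2 atom carries. The data layout and the function names are illustrative assumptions, and the comparison is a simplified multiset check in the spirit of the summation-by-levels description, not a full CIP hierarchical digraph.

```python
from collections import Counter

# Atomic numbers for the elements in the Figure 4 example.
Z = {"H": 1, "C": 6, "F": 9, "Cl": 17}

# Each ligand is a list of levels; each level is a list of substituent groups,
# one group per parent atom (taken from the level table above).
LIGAND_I = [
    [("C",)],
    [("C", "C", "C")],
    [("C", "H", "H"), ("C", "H", "H"), ("C", "C", "C")],
    [("H", "Cl", "F")] * 5,
]
LIGAND_J = [
    [("C",)],
    [("C", "C", "C")],
    [("C", "H", "H"), ("C", "C", "H"), ("C", "C", "H")],
    [("H", "Cl", "F")] * 5,
]


def level_profile(level):
    """Multiset of atomic numbers found at one level, ignoring the grouping."""
    return Counter(Z[atom] for group in level for atom in group)


def first_difference(lig_a, lig_b):
    """Return the 1-based level where the profiles differ, or None if none does."""
    for depth, (a, b) in enumerate(zip(lig_a, lig_b), start=1):
        if level_profile(a) != level_profile(b):
            return depth
    return None


def carbon_connectivities(level):
    """Number of carbon substituents carried by each parent atom of a level."""
    return sorted(group.count("C") for group in level)


print(first_difference(LIGAND_I, LIGAND_J))   # no atomic-number difference
print(carbon_connectivities(LIGAND_I[2]))     # substituents of C3, C4, C5
print(carbon_connectivities(LIGAND_J[2]))     # substituents of C6, C7, C8
```

Under these assumptions no level shows an atomic-number difference, while the carbon counts per parent atom differ between the two ligands, which is the situation the ranking procedure has to resolve.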

As shown, both paths are equivalent from the point of view of atomic numbers, but they have different connectivity: ligand (i) presents at level 2 three carbon atoms with connectivities 1, 1, and 3; that is, those carbon atoms are bonded to 1, 1, and 3 chiral carbon atoms, respectively. Ligand (j) also presents three carbon atoms at level 2 but with connectivities 1, 2, and 2. If both ligands have the same number of identical configuration descriptors at level 3 (5 for each one, all of them R), the known rules44-46 cannot rank these ligands, and a new criterion for elucidating ligand priorities is needed. As a way of establishing priorities for ranking both ligands at this level, a new concept for determining configuration values is proposed in this work. A numerical value of 1 is given to the S configuration and a value of 2 to the R configuration. Then, a subtotal value is calculated for each carbon atom of level 2. This subtotal is defined as the sum of the configuration values of all the chiral carbons bonded to that particular atom. In other words, and as shown in Figure 5, the numerical descriptor values of sons coming from the same father are summed.

Figure 5. US-CAMGEC flow diagram of the procedure for ranking ligands. It can be applied to ligands with identical atomic numbers at each level and an identical number and type of descriptors, such as those of Figure 4.

| path i | path j |
| --- | --- |
| C3: 2 | C6: 2 |
| C4: 2 | C7: 2 + 2 = 4 |
| C5: 2 + 2 + 2 = 6 | C8: 2 + 2 = 4 |

As shown, ligand (i) has three subtotals: the first two with a value of 2 and the third with a value of 6. These subtotals are arranged in increasing order and read together as a single number, which gives a sublevel chiral value of 226. For ligand (j) the subtotals are 2, 4, and 4, and a sublevel chiral value of 244 is obtained. So, ligand (j) ranks higher than ligand (i) because 244 > 226. The same holds if all the chiral carbons in both ligands have the S configuration, because in that case the final values are 122 > 113. On the other hand, if the five chiral carbon atoms of ligand (i) are all R and those of ligand (j) are all S, then our rule ranks ligand (i) higher than ligand (j) (226 > 122), in complete accord with existing rules.
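The arithmetic of the sublevel chiral value can be reproduced with a few lines of Python. The sketch below is not the original C implementation; the function names and the input layout (one list of descriptors per level-2 atom) are assumptions made for illustration, and the digit concatenation assumes every subtotal is a single digit, as in the example.

```python
# Numerical values used by the proposed subrule: R (also Z, r) -> 2, S (also E, s) -> 1.
DESCRIPTOR_VALUE = {"R": 2, "Z": 2, "r": 2, "S": 1, "E": 1, "s": 1}


def sublevel_chiral_value(nodes):
    """Sum the descriptor values of the stereocenters bonded to each node,
    sort the subtotals in increasing order, and read them as one number.
    Assumes single-digit subtotals, as in the Figure 4 example."""
    subtotals = sorted(sum(DESCRIPTOR_VALUE[d] for d in node) for node in nodes)
    return int("".join(str(s) for s in subtotals))


def higher_ranked(name_a, nodes_a, name_b, nodes_b):
    """Return the name of the ligand with the higher sublevel chiral value,
    or None when the values are equal."""
    va, vb = sublevel_chiral_value(nodes_a), sublevel_chiral_value(nodes_b)
    if va == vb:
        return None
    return name_a if va > vb else name_b


# Ligands of Figure 4 with all chiral carbons R.
ligand_i = [["R"], ["R"], ["R", "R", "R"]]   # C3, C4, C5
ligand_j = [["R"], ["R", "R"], ["R", "R"]]   # C6, C7, C8

print(sublevel_chiral_value(ligand_i))                  # 226
print(sublevel_chiral_value(ligand_j))                  # 244
print(higher_ranked("i", ligand_i, "j", ligand_j))      # j
```

Replacing every "R" by "S" in both inputs gives 113 and 122, so ligand (j) still ranks higher, as stated above.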

Table 1. Binary Matrix Used for Controlling the Stereoisomer Generation Process for a Molecule Having Three SC

| SC1 | SC2 | SC3 | swapping indication |
| --- | --- | --- | --- |
| 0 | 0 | 0 | original molecule |
| 0 | 0 | 1 | ligand swapping of SC3 |
| 0 | 1 | 0 | ligand swapping of SC2 |
| 0 | 1 | 1 | ligand swapping of SC2 and SC3 |
| 1 | 0 | 0 | ligand swapping of SC1 |
| 1 | 0 | 1 | ligand swapping of SC1 and SC3 |
| 1 | 1 | 0 | ligand swapping of SC1 and SC2 |
| 1 | 1 | 1 | ligand swapping of SC1, SC2, and SC3 |

In that way, a new subrule was implemented for solving this situation, and Figure 5 gives the general ranking procedure implemented here for ligands having the same number and type of SC but different connectivity. The new subrule proposed in this work, to be added to the previously reported rules, can be described as follows: If two ligands contain an identical number of stereocenters of the same configuration (same descriptors) and have different connectivities, a sublevel chiral value is calculated at the level of different connectivities. For that, a numerical value of 2 is assigned to configuration descriptor R (also Z and r) and a numerical value of 1 is assigned to configuration descriptor S (also E and s), and a sublevel value is calculated for each of the nodes of this level. The values obtained are arranged in increasing numerical order to give the chiral sublevel value of the ligand. Then the subrule “Higher sublevel chiral value precedes lower one” is applied.

Once stereogenic centers are validated, the stereoisomer generation process has to be accomplished. This consists basically of swapping a pair of ligands at one of the SC terminal places. Different molecules (stereoisomers) are generated when different combinations of SC swappings are carried out. This point involves two problems: (i) deciding which SC to swap and (ii) swapping the SC ligands. Fortunately, ligand swapping is a single pointer-swapping operation. For the first problem, each of the n occurring SC can be found in one of two configurations. For the entire molecule, this can be visualized as a binary vector of length n with values 0 or 1 corresponding to the configurations R or S, Z or E, and r or s, as required for each of the n SC, where a value of 1 means “swap” and 0 means “do not swap”. The initial binary vector has all SC set to 0. The next vector and the following ones are obtained by adding 1 to the immediately preceding vector. In that way, only two molecular graphs are kept in memory at any time: the current molecular graph and the next one to be generated. Table 1 shows a generation matrix for a molecule having three SC. Each row corresponds to a binary control vector state for the generation of one of the 2^n possible stereoisomers by swapping a pair of ligands at the SC marked with 1 in the matrix. As explained before and as shown in Figure 2, the redundancy effect occurring for symmetric molecules is properly managed.

US-CAMGEC does not address aromatic compounds. It can work with molecules that contain cycles; however, at the moment it does not consider SC constituted only by atoms that belong to a cyclic structure.
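The binary control vector can be sketched in Python as a simple counter. The first two functions reproduce the rows of Table 1; the third shows one possible way to drop symmetric duplicates and is a simplification of ours: it assumes that every molecular symmetry is supplied as a permutation of SC positions (all non-identity elements of the symmetry group listed) and that the starting molecule, the all-zero vector, is itself symmetric. None of the names below come from the original program.

```python
def swap_vectors(n):
    """Yield the 2**n binary control vectors in counting order (add 1 each step)."""
    for k in range(2 ** n):
        yield tuple((k >> (n - 1 - i)) & 1 for i in range(n))


def describe(vec):
    """Human-readable swapping indication for one control vector."""
    swapped = [f"SC{i + 1}" for i, bit in enumerate(vec) if bit]
    if not swapped:
        return "original molecule"
    return "swap ligands at " + ", ".join(swapped)


def irredundant(n, symmetries=()):
    """Keep one control vector per symmetry class.
    `symmetries` is a hypothetical input: each entry is a permutation of the
    SC indices induced by a molecular symmetry."""
    seen, kept = set(), []
    for vec in swap_vectors(n):
        images = [vec] + [tuple(vec[p] for p in perm) for perm in symmetries]
        canon = min(images)          # canonical representative of the class
        if canon not in seen:
            seen.add(canon)
            kept.append(vec)
    return kept


for vec in swap_vectors(3):
    print(vec, describe(vec))

# Two SC exchanged by a symmetry, as in a symmetric diene: 4 vectors, 3 kept.
print(irredundant(2, symmetries=[(1, 0)]))
```

For two SC related by symmetry, the sketch keeps three of the four vectors, which corresponds to the Z,Z, Z,E, and E,E isomers of 2,5-heptadiene discussed in Section 3.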


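Table 2. Acyclic Hydrocarbon Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n | 5 | 1 |  | 5 | 4 | 1 | 1 |  |  | 6 | 6 |
| CnH2n | 6 | 1 |  | 13 | 9 | 4 | 4 |  |  | 17 | 17 |
| CnH2n | 7 | 1 |  | 27 | 18 | 9 | 9 |  |  | 36 | 36 |
| CnH2n-2 | 5 | 2 |  | 6 | 4 | 2 | 2 |  |  | 8 | 8 |
| CnH2n-2 | 6 | 2 |  | 16 | 10 | 6 | 5 | 1 |  | 24 | 23 |
| CnH2n-2 | 7 | 2 |  | 44 | 23 | 21 | 18 | 3 |  | 71 | 70 |
| CnH2n-4 | 5 | 3 |  | 2 | 2 |  |  |  |  | 2 | 2 |
| CnH2n-4 | 6 | 3 |  | 10 | 6 | 4 | 4 |  |  | 14 | 14 |
| CnH2n-4 | 7 | 3 |  | 32 | 17 | 15 | 13 | 2 |  | 51 | 51 |
| CnH2n-4 | 5 | 1 | 1 | 4 | 3 | 1 | 1 |  |  | 5 | 5 |
| CnH2n-4 | 6 | 1 | 1 | 12 | 8 | 4 | 4 |  |  | 16 | 16 |
| CnH2n-4 | 7 | 1 | 1 | 34 | 21 | 13 | 13 |  |  | 47 | 47 |
| CnH2n-6 | 5 | 4 |  | 1 | 1 |  |  |  |  | 1 | 1 |
| CnH2n-6 | 6 | 4 |  | 3 | 3 |  |  |  |  | 3 | 3 |
| CnH2n-6 | 7 | 4 |  | 15 | 9 | 6 | 6 |  |  | 21 | 21 |
| CnH2n-6 | 5 | 2 | 1 | 1 | 1 |  |  |  |  | 1 | 1 |
| CnH2n-6 | 6 | 2 | 1 | 7 | 5 | 2 | 2 |  |  | 9 | 9 |
| CnH2n-6 | 7 | 2 | 1 | 29 | 15 | 14 | 13 | 1 |  | 45 | 45 |

^a Abbreviations: no. C: number of carbon atoms; db: number of double bonds; tb: number of triple bonds; CI: number of constitutional isomers generated by CAMGEC; no SC: number of isomers without any stereocenter; with SC: number of isomers that contain one or more stereocenters; 1EC, 2EC, and 3EC: number of isomers that contain one, two, or three stereocenters, respectively; uncorr SI: uncorrected stereoisomers including isomers without any stereocenter; and real isomers: number of irredundant total isomers.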
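The columns of Table 2 can be cross-checked against each other. Assuming, as in the generation matrix of Table 1, that an isomer with k stereocenters yields 2^k uncorrected stereoisomers, the "uncorr SI" entry should equal no SC + 2·1EC + 4·2EC + 8·3EC. The short Python sketch below applies this relation to two rows of Table 2; the function name is ours and the relation is our reading of the column definitions, not a formula stated by the authors.

```python
def uncorrected_si(no_sc, ec_counts):
    """Uncorrected stereoisomer count.
    no_sc: isomers without stereocenters (each counted once).
    ec_counts: [1EC, 2EC, 3EC, ...]; an isomer with k SC contributes 2**k."""
    return no_sc + sum(count * 2 ** k for k, count in enumerate(ec_counts, start=1))


# C7H12 with two double bonds (Table 2): no SC = 23, 1EC = 18, 2EC = 3.
print(uncorrected_si(23, [18, 3]))   # 71

# C7H14 with one double bond (Table 2): no SC = 18, 1EC = 9.
print(uncorrected_si(18, [9]))       # 36
```

Both values agree with the "uncorr SI" column; for C7H12 the "real isomers" entry (70) is one lower because one symmetry-redundant stereoisomer is removed.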

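Table 3. Acyclic Monohalogenated Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-1X | 5 | 1 |  | 21 | 12 | 9 | 9 |  |  | 30 | 30 |
| CnH2n-1X | 6 | 1 |  | 56 | 30 | 26 | 26 |  |  | 82 | 82 |
| CnH2n-1X | 7 | 1 |  | 149 | 76 | 73 | 73 |  |  | 222 | 222 |
| CnH2n-3X | 5 | 2 |  | 20 | 9 | 11 | 10 | 1 |  | 33 | 33 |
| CnH2n-3X | 6 | 2 |  | 69 | 26 | 43 | 36 | 7 |  | 126 | 126 |
| CnH2n-3X | 7 | 2 |  | 228 | 75 | 153 | 122 | 31 |  | 443 | 442 |
| CnH2n-5X | 5 | 3 |  | 7 | 4 | 3 | 3 |  |  | 10 | 10 |
| CnH2n-5X | 6 | 3 |  | 37 | 14 | 23 | 20 | 3 |  | 66 | 66 |
| CnH2n-5X | 7 | 3 |  | 165 | 47 | 118 | 88 | 29 | 1 | 347 | 347 |
| CnH2n-5X | 5 | 1 | 1 | 14 | 7 | 7 | 7 |  |  | 21 | 21 |
| CnH2n-5X | 6 | 1 | 1 | 50 | 24 | 26 | 26 |  |  | 76 | 76 |
| CnH2n-5X | 7 | 1 | 1 | 166 | 77 | 89 | 89 |  |  | 255 | 255 |
| CnH2n-7X | 5 | 4 |  | 1 | 1 |  |  |  |  | 1 | 1 |
| CnH2n-7X | 6 | 4 |  | 9 | 5 | 4 | 4 |  |  | 13 | 13 |
| CnH2n-7X | 7 | 4 |  | 60 | 20 | 40 | 34 | 6 |  | 112 | 112 |
| CnH2n-7X | 5 | 2 | 1 | 3 | 2 | 1 | 1 |  |  | 4 | 4 |
| CnH2n-7X | 6 | 2 | 1 | 25 | 10 | 15 | 14 | 1 |  | 42 | 42 |
| CnH2n-7X | 7 | 2 | 1 | 134 | 42 | 92 | 77 | 15 |  | 256 | 256 |

^a Same as in Table 2.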

3. RESULTS AND DISCUSSION

The results of the exhaustive and irredundant double bond stereoisomer generation process developed in this work are shown in Tables 2-8. Hydrocarbons and their monohalogenated, mono- and dioxygenated, and mono- and dinitrogenated derivatives were used for isomer generation. Input files coming from CAMGEC22 were created for sets of acyclic constitutional isomers under different constraints on bond type. The unsaturated stereoisomer generation process was checked successfully by decoding the final extended N-tuples for the studied compounds with fewer than 500 total stereoisomers. From the analysis of the results shown in Tables 2-8, the following points can be made.


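Table 4. Acyclic Monooxygenated Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-6O | 5 | 4 |  | 5 | 5 |  |  |  |  | 5 | 5 |
| CnH2n-6O | 6 | 4 |  | 34 | 24 | 10 | 10 |  |  | 44 | 44 |
| CnH2n-6O | 7 | 4 |  | 182 | 85 | 97 | 87 | 10 |  | 299 | 299 |
| CnH2n-6O | 5 | 2 | 1 | 11 | 9 | 2 | 2 |  |  | 13 | 13 |
| CnH2n-6O | 6 | 2 | 1 | 67 | 39 | 28 | 27 | 1 |  | 97 | 97 |
| CnH2n-6O | 7 | 2 | 1 | 322 | 148 | 174 | 156 | 18 |  | 532 | 532 |
| CnH2n-4O | 5 | 3 |  | 22 | 16 | 6 | 6 |  |  | 28 | 28 |
| CnH2n-4O | 6 | 3 |  | 97 | 50 | 47 | 43 | 4 |  | 152 | 152 |
| CnH2n-4O | 7 | 3 |  | 405 | 163 | 242 | 197 | 44 | 1 | 741 | 740 |
| CnH2n-4O | 5 | 1 | 1 | 29 | 20 | 9 | 9 |  |  | 38 | 38 |
| CnH2n-4O | 6 | 1 | 1 | 103 | 64 | 39 | 39 |  |  | 142 | 142 |
| CnH2n-4O | 7 | 1 | 1 | 342 | 99 | 143 | 143 |  |  | 485 | 485 |
| CnH2n-2O | 5 | 2 |  | 44 | 26 | 18 | 17 | 1 |  | 64 | 64 |
| CnH2n-2O | 6 | 2 |  | 151 | 76 | 75 | 66 | 9 |  | 244 | 243 |
| CnH2n-2O | 7 | 2 |  | 485 | 211 | 274 | 232 | 42 |  | 843 | 842 |
| CnH2nO | 5 | 1 |  | 41 | 28 | 13 | 13 |  |  | 54 | 54 |
| CnH2nO | 6 | 1 |  | 109 | 69 | 40 | 40 |  |  | 149 | 149 |
| CnH2nO | 7 | 1 |  | 294 | 176 | 118 | 118 |  |  | 412 | 412 |

^a Same as in Table 2.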

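Table 5. Acyclic Dioxygenated Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-6O2 | 5 | 4 |  | 30 | 24 | 6 | 6 |  |  | 36 | 36 |
| CnH2n-6O2 | 6 | 4 |  | 216 | 123 | 93 | 87 | 6 |  | 321 | 319 |
| CnH2n-6O2 | 7 | 4 |  | 1341 | 523 | 818 | 682 | 133 | 3 | 2443 | 2438 |
| CnH2n-6O2 | 5 | 2 | 1 | 52 | 39 | 13 | 13 |  |  | 65 | 65 |
| CnH2n-6O2 | 6 | 2 | 1 | 362 | 200 | 162 | 153 | 9 |  | 542 | 541 |
| CnH2n-6O2 | 7 | 2 | 1 | 1966 | 876 | 1090 | 963 | 127 |  | 3310 | 3309 |
| CnH2n-4O2 | 5 | 3 |  | 115 | 70 | 45 | 43 | 2 |  | 164 | 164 |
| CnH2n-4O2 | 6 | 3 |  | 614 | 273 | 341 | 296 | 44 | 1 | 1049 | 1046 |
| CnH2n-4O2 | 7 | 3 |  | 2840 | 1013 | 1827 | 1429 | 384 | 14 | 5519 | 5517 |
| CnH2n-4O2 | 5 | 1 | 1 | 122 | 84 | 38 | 38 |  |  | 160 | 160 |
| CnH2n-4O2 | 6 | 1 | 1 | 519 | 322 | 197 | 197 |  |  | 716 | 716 |
| CnH2n-4O2 | 7 | 1 | 1 | 2009 | 1165 | 844 | 844 |  |  | 2853 | 2853 |
| CnH2n-2O2 | 5 | 2 |  | 234 | 126 | 108 | 99 | 9 |  | 360 | 359 |
| CnH2n-2O2 | 6 | 2 |  | 907 | 421 | 486 | 419 | 67 |  | 1527 | 1519 |
| CnH2n-2O2 | 7 | 2 |  | 3328 | 1358 | 1970 | 1639 | 331 |  | 5960 | 5949 |
| CnH2nO2 | 5 | 1 |  | 204 | 133 | 71 | 71 |  |  | 275 | 275 |
| CnH2nO2 | 6 | 1 |  | 641 | 389 | 252 | 252 |  |  | 893 | 893 |
| CnH2nO2 | 7 | 1 |  | 1946 | 1132 | 814 | 814 |  |  | 2760 | 2760 |

^a Same as in Table 2.

Table 6. Acyclic Mononitrogenated Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-5N | 5 | 4 |  | 9 | 9 |  |  |  |  | 9 | 9 |
| CnH2n-5N | 6 | 4 |  | 60 | 44 | 16 | 16 |  |  | 76 | 76 |
| CnH2n-5N | 7 | 4 |  | 326 | 172 | 154 | 140 | 14 |  | 508 | 508 |
| CnH2n-5N | 5 | 2 | 1 | 25 | 20 | 5 | 5 |  |  | 30 | 30 |
| CnH2n-5N | 6 | 2 | 1 | 134 | 84 | 50 | 48 | 2 |  | 188 | 188 |
| CnH2n-5N | 7 | 2 | 1 | 587 | 308 | 279 | 254 | 25 |  | 916 | 916 |
| CnH2n-3N | 5 | 3 |  | 37 | 28 | 9 | 9 |  |  | 46 | 46 |
| CnH2n-3N | 6 | 3 |  | 165 | 96 | 69 | 64 | 5 |  | 244 | 244 |
| CnH2n-3N | 7 | 3 |  | 664 | 309 | 355 | 300 | 54 | 1 | 1133 | 1132 |
| CnH2n-3N | 5 | 1 | 1 | 50 | 38 | 12 | 12 |  |  | 62 | 62 |
| CnH2n-3N | 6 | 1 | 1 | 166 | 116 | 50 | 50 |  |  | 216 | 216 |
| CnH2n-3N | 7 | 1 | 1 | 531 | 348 | 183 | 183 |  |  | 714 | 714 |
| CnH2n-1N | 5 | 2 |  | 69 | 46 | 23 | 22 | 1 |  | 94 | 94 |
| CnH2n-1N | 6 | 2 |  | 228 | 131 | 97 | 88 | 9 |  | 343 | 342 |
| CnH2n-1N | 7 | 2 |  | 725 | 368 | 357 | 313 | 44 |  | 1170 | 1168 |
| CnH2n+1N | 5 | 1 |  | 56 | 42 | 14 | 14 |  |  | 70 | 70 |
| CnH2n+1N | 6 | 1 |  | 149 | 105 | 44 | 44 |  |  | 193 | 193 |
| CnH2n+1N | 7 | 1 |  | 398 | 265 | 133 | 133 |  |  | 531 | 531 |

^a Same as in Table 2.

Table 7. Acyclic Dinitrogenated Stereoisomers Generated by US-CAMGEC^a

| formula | no. C | db | tb | CI | no SC | with SC | 1EC | 2EC | 3EC | uncorr SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-4N2 | 5 | 4 |  | 116 | 96 | 20 | 20 |  |  | 136 | 136 |
| CnH2n-4N2 | 6 | 4 |  | 782 | 502 | 280 | 265 | 15 |  | 1092 | 1089 |
| CnH2n-4N2 | 7 | 4 |  | 4467 | 2190 | 2277 | 1978 | 294 | 5 | 7362 | 7355 |
| CnH2n-4N2 | 5 | 2 | 1 | 255 | 191 | 64 | 63 | 1 |  | 321 | 321 |
| CnH2n-4N2 | 6 | 2 | 1 | 1362 | 871 | 491 | 467 | 24 |  | 1901 | 1900 |
| CnH2n-4N2 | 7 | 2 | 1 | 6324 | 3538 | 2786 | 2555 | 231 |  | 9572 | 9570 |
| CnH2n-2N2 | 5 | 3 |  | 369 | 255 | 114 | 110 | 4 |  | 491 | 491 |
| CnH2n-2N2 | 6 | 3 |  | 1783 | 996 | 787 | 712 | 74 | 1 | 2724 | 2720 |
| CnH2n-2N2 | 7 | 3 |  | 7730 | 3635 | 4095 | 3460 | 621 | 14 | 13151 | 13145 |
| CnH2n-2N2 | 5 | 1 | 1 | 341 | 272 | 69 | 69 |  |  | 410 | 410 |
| CnH2n-2N2 | 6 | 1 | 1 | 1298 | 965 | 333 | 333 |  |  | 1631 | 1631 |
| CnH2n-2N2 | 7 | 1 | 1 | 4697 | 3298 | 1399 | 1399 |  |  | 6096 | 6096 |
| CnH2nN2 | 5 | 2 |  | 561 | 374 | 187 | 178 | 9 |  | 766 | 765 |
| CnH2nN2 | 6 | 2 |  | 2071 | 1228 | 843 | 772 | 71 |  | 3056 | 3047 |
| CnH2nN2 | 7 | 2 |  | 7323 | 3909 | 3414 | 3041 | 373 |  | 11483 | 11470 |
| CnH2n+2N2 | 5 | 1 |  | 375 | 290 | 85 | 85 |  |  | 460 | 460 |
| CnH2n+2N2 | 6 | 1 |  | 1162 | 846 | 316 | 316 |  |  | 1478 | 1478 |
| CnH2n+2N2 | 7 | 1 |  | 3513 | 2449 | 1064 | 1064 |  |  | 4577 | 4577 |

^a Same as in Table 2.

(1) The total number of stereoisomers (SI) increases with the number of carbon atoms, owing to the combinatorial nature of the generation process. For instance, in Table 5, the total number of isomers for molecules CnH2n-4O2 with three db and 5, 6, and 7 carbon atoms was 164, 1046, and 5517, respectively. The total number of isomers also increases with heteroatom features, in particular with their valences and number in the molecule, as expected from the higher number of possible combinations. Thus, for the families of hydrocarbons and their monohalogenated, monooxygenated, mononitrogenated, dioxygenated, and dinitrogenated derivatives having seven carbon atoms and three db (see Tables 2-7), the total isomer numbers were 51, 347, 740, 1132, 5517, and 13145, respectively.

(2) For a family of compounds with a constant number of carbon atoms, the total number of isomers varies with the number of db in the molecule: it increases up to a maximum and then decreases. For instance, for hydrocarbon molecules (Table 2) having seven carbon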

atoms with 1, 2, 3, and 4 db, the total number of isomers was 36, 70, 51, and 21, respectively. For the same compounds, the number of isomers listed as “with SC” in Table 2 is 9, 21, 15, and 6, respectively. The same tendency is observed for the other families studied.

(3) Redundant stereoisomers arising from molecular symmetry are eliminated during generation. For instance, the stereoisomer generation process applied to 2,5-heptadiene generates the Z,Z, Z,E, and E,E isomers and eliminates the E,Z isomer, because it recognizes that Z,E and E,Z are the same molecule. Table 8 shows several cases where redundant isomers were eliminated. The cases affected by this redundancy are, in general, symmetrical molecules with more than one SC. A greater number of redundant isomers is observed for molecules containing two N atoms or two O atoms.

Table 8. Acyclic Stereoisomers Generated by US-CAMGEC: Symmetry Redundant Cases^a

| formula | no. C | db | tb | CI | sym CI | unsym CI | uncorr SI | red SI | real isomers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CnH2n-2 | 6 | 2 |  | 16 | 4 | 12 | 24 | 1 | 23 |
| CnH2n-2 | 7 | 2 |  | 44 | 8 | 36 | 71 | 1 | 70 |
| CnH2n-3X | 7 | 2 |  | 228 | 5 | 223 | 443 | 1 | 442 |
| CnH2n-2O | 6 | 2 |  | 151 | 7 | 144 | 244 | 1 | 243 |
| CnH2n-2O | 7 | 2 |  | 485 | 6 | 479 | 843 | 1 | 842 |
| CnH2n-4O | 7 | 3 |  | 405 | 5 | 400 | 741 | 1 | 740 |
| CnH2n-2O2 | 5 | 2 |  | 234 | 14 | 220 | 360 | 1 | 359 |
| CnH2n-2O2 | 6 | 2 |  | 907 | 29 | 878 | 1527 | 8 | 1519 |
| CnH2n-2O2 | 7 | 2 |  | 3328 | 54 | 3274 | 5960 | 11 | 5949 |
| CnH2n-4O2 | 6 | 3 |  | 614 | 19 | 595 | 1049 | 3 | 1046 |
| CnH2n-4O2 | 7 | 3 |  | 2840 | 14 | 2826 | 5519 | 2 | 5517 |
| CnH2n-6O2 | 6 | 4 |  | 216 | 9 | 207 | 321 | 2 | 319 |
| CnH2n-6O2 | 7 | 4 |  | 1341 | 28 | 1313 | 2443 | 5 | 2438 |
| CnH2n-6O2 | 6 | 2 | 1 | 362 | 6 | 356 | 542 | 1 | 541 |
| CnH2n-6O2 | 7 | 2 | 1 | 1966 | 7 | 1959 | 3310 | 1 | 3309 |
| CnH2n-1N | 6 | 2 |  | 228 | 8 | 220 | 343 | 1 | 342 |
| CnH2n-1N | 7 | 2 |  | 725 | 11 | 714 | 1170 | 2 | 1168 |
| CnH2n-3N | 7 | 3 |  | 664 | 6 | 658 | 1133 | 1 | 1132 |
| CnH2nN2 | 5 | 2 |  | 561 | 22 | 539 | 766 | 1 | 765 |
| CnH2nN2 | 6 | 2 |  | 2071 | 47 | 2024 | 3056 | 9 | 3047 |
| CnH2nN2 | 7 | 2 |  | 7323 | 86 | 7237 | 11483 | 13 | 11470 |
| CnH2n-2N2 | 6 | 3 |  | 1783 | 34 | 1749 | 2724 | 4 | 2720 |
| CnH2n-2N2 | 7 | 3 |  | 7730 | 36 | 7694 | 13151 | 6 | 13145 |
| CnH2n-4N2 | 6 | 4 |  | 782 | 18 | 764 | 1092 | 3 | 1089 |
| CnH2n-4N2 | 7 | 4 |  | 4467 | 53 | 4414 | 7362 | 7 | 7355 |
| CnH2n-4N2 | 6 | 2 | 1 | 1362 | 11 | 1351 | 1901 | 1 | 1900 |
| CnH2n-4N2 | 7 | 2 | 1 | 6324 | 16 | 6308 | 9572 | 2 | 9570 |

^a Abbreviations: no. C: number of carbon atoms; db: number of double bonds; tb: number of triple bonds; CI: number of constitutional isomers generated by CAMGEC; sym CI: symmetric CI; unsym CI: unsymmetric CI; uncorr SI: uncorrected stereoisomers including isomers without any stereocenter; red SI: redundant symmetric stereoisomers; and real isomers: number of irredundant total isomers.

US-CAMGEC is able to exhaustively generate db stereoisomers for molecules that may contain heteroatoms, even with different valences (e.g., an S atom with valence 2 and another one with valence 6 within the same molecule; in fact, they are treated by the system as different atoms20), multiple bonds of different types, and cycles. Its results are completely consistent with those in the literature.30,38 Perception of db stereogenic units in a wide range of cyclic molecular structures works very well. Priority determinations according to the CIP rules and the assignment of SC configurations are also set correctly.

For cases where it is necessary to rank two ligands, each having the same number of chiral carbon atoms with the same configuration but with a different level distribution, as in the example of Figure 4, US-CAMGEC offers a new way of solving the problem: it assigns different numerical values to the configuration descriptors R and S (R = 2; S = 1) and defines the concept of a chiral vector. This is done by considering all the sons coming from the same parent node in the tree of the molecule (digraph; see the system description and Figure 5). As explained before, that problem cannot be solved by the known CIP rules, which fail in this particular case because no ligand difference is found either at the atomic number level or for configuration descriptor pairs.

Although numerical values assigned to configuration descriptors directly help to rank ligands, care should be taken when applying them so that known priority rules44-47 are not violated. One foreseeable problem is, for instance, the case where one ligand has two atoms with an S configuration and another ligand has one atom with an R and one atom with an S configuration. Known rules will find that the first ligand, with the two S configurations, ranks higher than the other ligand. Our system will find the inverse. Its use is therefore recommended only for ranking ligands that have identical configuration descriptors, as in the case shown in Figure 4. The treatment proposed in this work for R and S configuration descriptors may be extended to other kinds of configuration descriptors such as Z-E or r-s.

Finally, the extended N-tuple representation developed for db SI that also contain chiral atoms could be a valuable tool for 3D molecular structure representation, storage, and retrieval, and for the calculation of topological indexes and other variables, such as similarity indexes, that depend on molecular structure. In addition, our system has proved to be a good tool especially for studying families of allene derivatives and their optical stereoisomers. All these points make US-CAMGEC a good contribution to molecular design.

4. IMPLEMENTATION ASPECTS

The program was developed on a SUN-IPX computer running SunOS 4.1.3 (28.5 MIPS). Inputs to the program were files created under particular structural constraints with the help of CAMGEC22 and S-CAMGEC.23 Some manually created N-tuples, automatically validated by the program, were also used as input in order to compare US-CAMGEC generation results with published data. The program works both with canonical N-tuples, like the ones coming from CAMGEC, and with extended (noncanonical) N-tuples. The main output is the extended N-tuple archive, in which each isomer is stored on one line. N-tuple storage is optional, and the output can be just the total number of isomers generated. Another important output is the statistical summary of the generation results, which reports the name of the input file, date of processing, empirical formula, number of carbon atoms, number of double bonds, number of triple bonds, number of constitutional isomers generated by CAMGEC, number of isomers without any stereocenter, number of isomers that contain one or more stereocenters, total number of stereoisomers generated by the program, and number of isomers that contain 1, 2, 3, or 4 stereocenters, respectively.

5. CONCLUSIONS

Useful computer software, US-CAMGEC, has been developed for the generation, perception, and counting of unsaturated stereoisomers, focused on alkenes and cumulenes and also including acyclic chiral carbons. It is based on graph theory and has the following capabilities:
•It perceives double bond stereocenters, correctly identifying geometrical and optical stereogenic centers, and exhaustively generates both kinds of stereoisomers.
•It ranks complex ligands that already contain stereocenters in their structure.
•It contains a computer implementation of the CIP rules.
•It defines and implements a new subrule for ranking ligands that are not differentiated by their atomic numbers, contain a given number of stereocenters with identical configuration descriptors, and have different connectivity, and that cannot be ranked by the existing CIP rules and their extensions.


•It develops an extended N-tuple notation which could be of great help for studying and calculating molecular properties and graph invariants for many structures.
•Its results are reliable.

The program, developed in C, is easily portable to different hardware architectures. This work can serve as a basis for educational purposes and for research, especially in organic chemistry, molecular design, structure elucidation, and combinatorial chemistry. At the moment we are beginning to extend the program to include cyclic chiral carbons, in order to treat properly the dependency and interdependency among tetrahedral and planar/helicoidal SC.

ACKNOWLEDGMENTS

Financial support from the University of Santiago de Chile is appreciated. Discussions with Mr. J. M. Dorado of the University of Salamanca, Spain, and help with the computational aspects of symmetry from Eng. A. Toro are also appreciated.

REFERENCES AND NOTES

(1) Barberis, F.; Barone, R.; Arbelot, M.; Baldy, A.; Chanon, M. CONAN (CONnectivity ANalysis): A Simple Approach in the Field of Computer-Aided Organic Synthesis. Example of the Taxane Framework. J. Chem. Inf. Comput. Sci. 1995, 35, 467-471.
(2) Balasubramanian, K.; Basak, S. C. Characterization of Isospectral Graphs Using Graph Invariants and Derived Orthogonal Parameters. J. Chem. Inf. Comput. Sci. 1998, 38, 367-373.
(3) Contreras, M. L.; Deliz, M.; Galaz, A.; Rozas, R.; Sepulveda, N. A Microcomputer-based system for chemical information and molecular structure search. J. Chem. Inf. Comput. Sci. 1986, 26, 105-108.
(4) Contreras, M. L.; Deliz, M.; Rozas, R. Personal microcomputer based system of chemical information with topological structure data elaboration. J. Chem. Inf. Comput. Sci. 1987, 27, 163-167.
(5) Sadowski, J.; Wagener, M.; Gasteiger, J. Assessing Similarity and Diversity of Combinatorial Libraries by Spatial Autocorrelation Functions and Neural Networks. Angew. Chem., Int. Ed. Engl. 1995, 34, 2674-2677.
(6) Basak, S. C.; Grunwald, G. D. Molecular Similarity and Estimation of Molecular Properties. J. Chem. Inf. Comput. Sci. 1995, 35, 366-372.
(7) Skvortsova, M.; Baskin, I. I.; Stankevich, I. V.; Palyulin, V. A.; Zefirov, N. S. Molecular Similarity. 1. Analytical Description of the Set of Graph Similarity Measures. J. Chem. Inf. Comput. Sci. 1998, 38, 785-790.
(8) Contreras, M. L.; Gonzalez, F. D.; Rozas, R. MM2 Parameterization of the Azido Group with Electron Lone Pairs. Conformational Analysis of 3′-Azidothymidine (AZT) and Structure Activity Relationship for its Analogues as Potential Anti-HIV Drugs. Bol. Soc. Chil. Quim. 1995, 40(1), 33-40.
(9) Contreras, M. L.; Gonzalez, F. D.; Muñoz, V. C.; Rozas, R. Computer Assisted Molecular Design and Biophore Determination. Application to AIDS Drugs. Bol. Soc. Chil. Quim. 1995, 40(3), 279-291.
(10) Bone, R. G. A.; Villar, H. O. Exhaustive Enumeration of Molecular Substructures. J. Comput. Chem. 1997, 18(1), 86-107.
(11) Cyvin, S. J.; Wang, J.; Brunvoll, J.; Cao, S.; Li, Y.; Cyvin, B. N.; Wang, Y. Staggered Conformers of Alkanes: Complete Solution of the Enumeration Problem. J. Mol. Struct. 1997, 413-414, 227-239.
(12) Brunvoll, J.; Cyvin, S. J.; Cyvin, B. N. Enumeration of Tree-like Octagonal Systems. J. Math. Chem. 1997, 21(2), 193-196.
(13) Caporossi, G.; Hansen, P. Enumeration of Polyhex Hydrocarbons to h = 21. J. Chem. Inf. Comput. Sci. 1998, 38, 610-619.
(14) John, P. E.; Mallion, R. B.; Gutman, I. An Algorithm for Counting Spanning Trees in Labeled Molecular Graphs Homeomorphic to Cata-Condensed Systems. J. Chem. Inf. Comput. Sci. 1998, 38, 108-112.
(15) Fujita, S. Pseudo-Point Groups and Subsymmetry-Itemized Enumeration for Characterizing the Symmetries of 1,4-Dioxane and 1,4-Oxathiane Derivatives. J. Chem. Inf. Comput. Sci. 1998, 38, 876-884.
(16) Wieland, T.; Kerber, A.; Laue, R. Principles of the Generation of Constitutional and Configurational Isomers. J. Chem. Inf. Comput. Sci. 1996, 36, 413-419.
(17) Molchanova, M.; Zefirov, N. S. Irredundant Generation of Isomeric Molecular Structures with Some Known Fragments. J. Chem. Inf. Comput. Sci. 1998, 38, 8-22.
(18) Agarwal, K. K. An Algorithm for Computing the Automorphism Group of Organic Structures with Stereochemistry and a Measure of its Efficiency. J. Chem. Inf. Comput. Sci. 1998, 38, 402-404.
(19) Jaritz, R. The Mathematical Modeling of Space Models of Chemical Molecules. Match 1998, 37, 179-193.
(20) Contreras, M. L.; Valdivia, R.; Rozas, R. Exhaustive Generation of Organic Isomers. 1. Acyclic Structures. J. Chem. Inf. Comput. Sci. 1992, 32, 323-330.
(21) Contreras, M. L.; Valdivia, R.; Rozas, R. Exhaustive Generation of Organic Isomers. 2. Cyclic Structures: New Compact Molecular Code. J. Chem. Inf. Comput. Sci. 1992, 32, 483-491.
(22) Contreras, M. L.; Rozas, R.; Valdivia, R. Exhaustive Generation of Organic Isomers. 3. Acyclic, Cyclic, and Mixed Compounds. J. Chem. Inf. Comput. Sci. 1994, 34, 610-616.
(23) Contreras, M. L.; Rozas, R.; Valdivia, R.; Agüero, R. Exhaustive Generation of Organic Isomers. 4. Acyclic Stereoisomers with One or More Chiral Carbon Atoms. J. Chem. Inf. Comput. Sci. 1995, 35, 752-758.
(24) Read, R. C. In Chemical Applications of Graph Theory; Balaban, A. T., Ed.; Academic Press: New York, 1976; pp 11-60.
(25) Trinajstic, N.; Jericevic, Z.; Knop, J. V.; Muller, W. R.; Szymanski, K. Computer Generation of Isomeric Structures. Pure Appl. Chem. 1983, 55, 379-390.
(26) Abe, H.; Okuyama, T.; Fujiwara, I.; Sasaki, S. I. A Computer Program for Generation of Constitutionally Isomeric Structural Formulas. J. Chem. Inf. Comput. Sci. 1984, 24, 220-229.
(27) Knop, J. V.; Szymanski, K.; Klasing, L.; Trinajstic, N. Computer Enumeration of Substituted Polyhexes. Comput. Chem. 1984, 8, 107-115.
(28) Funatsu, K.; Miyabayashi, N.; Sasaki, S. Further Development of Structure Generation in the Automated Structure Elucidation System CHEMICS. J. Chem. Inf. Comput. Sci. 1988, 28, 18-28.
(29) Hendrickson, J. B.; Parks, C. A. Generation and Enumeration of Carbon Skeletons. J. Chem. Inf. Comput. Sci. 1991, 31, 101-107.
(30) Luinge, H. J.; Van der Maas, J. H. AEGIS, an Algorithm for the Exhaustive Generation of Irredundant Structures. Chemom. Intell. Lab. Sys. 1990, 8, 157-165.
(31) Bangov, I. P. Computer-Assisted Structure Generation from a Gross Formula. 3. Alleviation of the Combinatorial Problem. J. Chem. Inf. Comput. Sci. 1990, 30, 277-289.
(32) Nilakantan, R.; Bauman, N.; Venkataraghavan, R. A Method for Automatic Generation of Novel Chemical Structures and Its Potential Applications to Drug Discovery. J. Chem. Inf. Comput. Sci. 1991, 31, 527-530.
(33) Pivina, T. S.; Molchanova, M. S.; Shcherbukhin, V. V.; Zefirov, N. S. Computer Generation of Caged Frameworks Which Can be Used as Synthons for Creating High-Energetic Materials. Propell. Expl. Pyrotech. 1994, 19, 286-289.
(34) Pivina, T. S.; Shcherbukhin, V. V.; Molchanova, M. S.; Zefirov, N. S. Computer-Assisted Prediction of Novel Target High-Energy Compounds. Propell. Expl. Pyrotech. 1995, 20, 144-146.
(35) Clark, D. E.; Frenkel, D.; Levy, S. A.; Li, J.; Murray, C. W.; Robson, B.; Waszkowycz, B.; Westhead, D. R. PRO_LIGAND: An Approach to de novo Molecular Design. 1. Application to the Design of Organic Molecules. J. Comp.-Aided Mol. Des. 1995, 9, 13-32.
(36) Whyte, J. R. C.; Clugston, M. J. The Enumeration of Isomers with Special Reference to the Stereoisomers of Decane. J. Chem. Ed. 1993, 70, 874-876.
(37) Gu, F.; Wang, J. The Numbers of Structural Isomers, Stereoisomers, and Chiral and Achiral Stereoisomers of Fluorochloroalkanes. J. Chem. Inf. Comput. Sci. 1992, 32, 407-410.
(38) Razinger, M.; Balasubramanian, K.; Perdih, M.; Munk, M. E. Stereoisomer Generation in Computer-Enhanced Structure Elucidation. J. Chem. Inf. Comput. Sci. 1993, 33, 812-825.
(39) Benecke, C.; Grund, R.; Hohberger, R.; Kerber, A.; Laue, R.; Wieland, Th. MOLGEN, A Generator of Connectivity Isomers and Stereoisomers for Molecular Structure Elucidation. Anal. Chim. Acta 1995, 314, 141-147.
(40) Wieland, T. Enumeration, Generation, and Construction of Stereoisomers of High-Valence Stereocenters. J. Chem. Inf. Comput. Sci. 1995, 35, 220-225.
(41) Nourse, J. G.; Smith, D. H.; Carhart, R. E.; Djerassi, C. Exhaustive Generation of Stereoisomers for Structure Elucidation. J. Am. Chem. Soc. 1979, 101, 1216-1223.
(42) Robinson, R. W.; Harary, F.; Balaban, A. T. The Number of Chiral and Achiral Alkanes and Monosubstituted Alkanes. Tetrahedron 1976, 32, 355-361.
(43) Abe, H.; Hayasaka, H.; Miyashita, Y.; Sasaki, S. Generation of stereoisomeric structures using topological information alone. J. Chem. Inf. Comput. Sci. 1984, 24, 216-219.
(44) Perdih, M.; Razinger, M. Stereochemistry and Sequence Rules. A Proposal for Modification of Cahn-Ingold-Prelog System. Tetrahedron: Asymmetry 1994, 5(5), 835-861.
(45) Mata, P.; Lobo, A. M. Implementation of the Cahn-Ingold-Prelog System for Stereochemical Perception in the LHASA Program. J. Chem. Inf. Comput. Sci. 1994, 34, 491-504.
(46) Figueras, J. Computer Coding of Configuration. J. Chem. Inf. Comput. Sci. 1996, 36, 491-496.
(47) Prelog, V.; Helmchen, G. Basic Principles of the CIP-System and Proposals for a Revision. Angew. Chem., Int. Ed. Engl. 1982, 21, 567-583.
(48) Balasubramanian, K. Computer Perception of Molecular Symmetry. J. Chem. Inf. Comput. Sci. 1995, 35, 761-770.
(49) Contreras, M. L. et al. (to be published).

CI9700586