Chemical Applications of Graph Theory
Part I. Fundamentals and Topological Indices

Peter J. Hansen and Peter C. Jurs
Pennsylvania State University, University Park, PA 16802

Graph theory is a subdiscipline of mathematics that is closely related to both topology and combinatorics. Undergraduate math students probably first encounter graph theory in their discrete mathematics course. The origins of graph theory date back over 200 years to the work of individuals from several different fields. Euler (1707-1783) is generally considered to be the father of graph theory with his publication of the solution to the Königsberg Bridge problem (1, 2). Kirchhoff's work with electrical circuits (3) and Cayley's work leading up to his enumeration of organic isomers (4) are also recognized as independent discoveries of graph theory (5).

As evidenced by the above, graph theory is by no means a new field of mathematics nor an area that lacks applications in other disciplines. In addition to physics and chemistry, graph theory has been applied to anthropology, architecture, civil engineering, communications, computer science, economics, electrical engineering, genetics, geography, industrial management, linguistics, operations research, political science, psychology, and sociology. Within chemistry, graph theory has been applied to problems from a wide range of research areas: synthetic chemistry, polymer chemistry, quantum chemistry, organometallic chemistry, petroleum chemistry, thermochemistry, chemical kinetics, statistical mechanics, phase equilibria, spectroscopic analysis, Hückel theory, and chemical information storage and retrieval. In spite of its widespread usage, however, many chemists still lack any familiarity with the basic vocabulary and concepts of graph theory. Several review articles and monographs devoted entirely to the applications of graph theory to chemistry have been published (6-9).

This paper will introduce the reader to some basic terms and concepts from graph theory and will present a brief discussion of one of its many applications to chemistry, namely, the use of topological indices in quantitative structure-activity relationship (QSAR) studies and quantitative structure-property relationship (QSPR) studies.

Fundamentals of Graph Theory

Mathematically a graph is a pair of sets: (1) a set of elements, and (2) a set of pairs of these elements. It is much easier, however, to conceive of a graph in terms of its pictorial representation, that is, (1) a set of points, and (2) a set of lines that join some or all pairs of points. (It should be noted that the nomenclature of graph theory has not been standardized; points are also referred to as vertices, nodes, and junctions, and lines as edges, arcs, and branches.) Many of the terms and concepts used in graph theory have their counterparts in the vocabulary of the chemist or layperson. The most basic of these will be defined here, while others will be introduced later as needed. A graph is a topological concept rather than a geometrical concept, and hence metric lengths, angles, and three-dimensional spatial configurations have no meaning. The word "graph", as used here, should not be confused with the Cartesian graph (e.g., y versus x) commonly used in science and mathematics to present the relationship between two or more variables visually.

Two points joined by a line are adjacent points, while two lines having a point in common are adjacent lines. The degree of a point is the number of lines joined to it. A walk is an alternating sequence of points and lines that begins with a point and ends with a point. A path is a walk in which no point occurs more than once. A connected graph has every pair of points joined by a path. A cyclic graph must include at least one walk of three or more points that begins and ends with the same point and contains no point more than once (except for the first and last). A tree is a connected acyclic graph. The distance between two points is the number of lines in the shortest path joining the two points. Two graphs are isomorphic if there exists a one-to-one correspondence between their points and their lines that preserves which points each line joins. Isomorphic graphs can give the appearance of being very different pictorially. Figure 1 includes a number of different graphs, and Figure 2 exemplifies a number of different graph theory concepts.

Chemical Structures as Graphs

Graphs used in chemistry generally belong to one of two broad categories: (1) structural graphs, also referred to as chemical graphs, molecular graphs, or constitutional graphs,

Journal of Chemical Education

Figure 1. Graphs a and b are disconnected, but all of the other graphs are connected. Graphs c and d are trees. Graphs e-i are cyclic. The graphs f and g are isomorphic, as are the graphs h and i. Examples of a regular graph (all of whose points have the same degree) are b and Ci. Graphs h and i are complete graphs (that is, every pair of points is adjacent).

Figure 2. Pairs of adjacent points are AB, BC, BE, and CD. Pairs of adjacent lines are ab, ad, bd, and bc. Points A, D, and E are of degree 1, point C of degree 2, and point B of degree 3. One example of a walk is CbBdEdBaA. A path of length 2 is EdBbC. Points D and E are separated by a distance of 3.

Table 1. Corresponding Concepts in Graph Theory and Chemistry

Graph theory     Chemistry: structural graph    Chemistry: reaction graph
point            atom                           chemical species
line             chemical bond                  chemical reaction
path             chemical substructure          reaction sequence
degree           atom valency
cyclic graph     ring compound
tree             acyclic structure

and (2) reaction graphs. As the names suggest, a structural graph corresponds to a specific chemical structure, while a reaction graph corresponds to a set of chemical reactions. Table 1 provides a list of graph theory concepts and the corresponding terms used in chemistry for each of these two types of graphs. The bulk of this paper will consider only concepts related to structural graphs.

Reaction graphs have found their principal application in chemical kinetics (10) and computer-assisted organic synthesis (11). In kinetics applications a graph characterizes a reaction mechanism; each point corresponds to a reactant, product, or intermediate, and each line corresponds to an elementary reaction. In organic synthesis applications points represent reactants, molecular fragments, intermediates, or target molecules, and the lines represent steps in the synthesis. Balaban (12) coined the term synthon graph for the latter type of graph.

Chemists employ various types of names and formulas when they wish to communicate information about chemicals and their structures. For the most part, however, names and formulas have no direct, immediate, or explicit mathematical meaning. Graph theory, on the other hand, provides many different methods of characterizing chemical structures numerically. Figure 3 gives the molecular formula and IUPAC name of a chemical, its structural formula, and the chemical graph that corresponds to its hydrogen-depleted structure; two matrices are shown that represent alternative means of depicting this graph. The adjacency matrix is defined such that each element $a_{ij}$ equals 1 if and only if atoms i and j are adjacent (i.e., bonded to each other), while all other $a_{ij}$ equal zero. The distance matrix is defined such that each element $d_{ij}$ equals the length of the shortest path (i.e., the fewest number of bonds) joining atoms i and j.

Figure 3. 2,Mimethylpentane represented as a molecular formula, IUPAC names, structural formula, structural graph, adjacency matrix, and distance matrix.
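The adjacency and distance matrices defined above can be illustrated with a short sketch. The Python code below is an illustration added here, not part of the original paper; it uses the hydrogen-suppressed graph of 2-methylbutane with an assumed atom numbering (chain C1-C2-C3-C4, methyl carbon C5 on C2) rather than the compound of Figure 3, and the function names are hypothetical. The distance matrix is obtained by a breadth-first search from each atom, and the degree of each point falls out as a row sum of the adjacency matrix.

```python
from collections import deque

# Hydrogen-suppressed graph of 2-methylbutane (assumed numbering):
# chain C1-C2-C3-C4 with the methyl carbon C5 attached to C2.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]
N_ATOMS = 5


def adjacency_matrix(edges, n):
    """a[i][j] = 1 if and only if atoms i+1 and j+1 are bonded, else 0."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def distance_matrix(a):
    """d[i][j] = fewest number of bonds joining atoms i+1 and j+1."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search outward from `start`
            u = queue.popleft()
            for v in range(n):
                if a[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, steps in dist.items():
            d[start][v] = steps
    return d


A = adjacency_matrix(EDGES, N_ATOMS)
D = distance_matrix(A)
degrees = [sum(row) for row in A]  # degree of a point = row sum of A

print(degrees)  # [1, 3, 2, 1, 1]
print(D[3])     # distances from C4: [3, 2, 1, 0, 3]
```

Both matrices are symmetric with zeros on the diagonal, which is why only one triangle is needed when pair distances are summed.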

Topological Indices and QSAR/QSPR Studies

Figure 4. Structural graphs for selected alkane isomers and their normal boiling points (°C).

It is well known that the chemical behavior of a compound is dependent upon the structure of its molecules. Quantitative structure-activity relationship (QSAR) studies and quantitative structure-property relationship (QSPR) studies are active areas of chemical research that focus on the nature of this dependency. The quantitative relationships are mathematical models that enable either the prediction of a continuous variable (e.g., boiling point, LC50 toxicity, flavor threshold concentration, antiviral activity) or the classification of a discrete variable (e.g., sweet/bitter, toxic/nontoxic, carcinogenic/noncarcinogenic) from structural parameters. These two types of models are constructed using regression analysis and pattern recognition techniques, respectively. Hundreds of QSAR and QSPR research papers have been published, everything from "Computer-Assisted Prediction of Liquid Chromatographic Retention Indices of Polycyclic Aromatic Hydrocarbons" (13) to "The Structure-Activity Relationship in Barbiturates and Its Similarity to That in Other Narcotics" (14). Testimony to the wide applicability and utility of these methods is the broad range of scientific journals in which this work has appeared, including: Analytical Chemistry, Journal of Medicinal Chemistry, Fundamental and Applied Toxicology, Drug Information Journal, Chemical Senses, Environmental Health Perspectives, Journal of Pharmaceutical Sciences, and I&EC Process Design and Development. In addition, in 1982 the journal Quantitative Structure-Activity Relationships was

Volume 65 Number 7 July 1988

Table 2. The Normal Boiling Points (°C), Wiener Numbers, Randić Indices, and Molecular ID Numbers for the C4-C7 Alkanes

Columns: compound; bp (obs); Wiener number; Randić index; molecular ID number (footnote a).

Graph theory has been found to be a useful tool in QSAR and QSPR research. Figure 4 shows the structural graphs and normal boiling points of a series of normal alkanes and of the hexane isomers. (As is apparent from Figure 4, hydrogen atoms are generally not represented as points in structural graphs. To be more precise one should refer to these as "hydrogen-suppressed" or "hydrogen-depleted" structural graphs, but usually these modifiers are omitted.) By examining the normal alkanes, it is apparent that boiling points are related to molecular size; however, consideration of the hexane isomers demonstrates that boiling points are also related to molecular shape or topology. Various measures of size (molecular weight, molecular volume, chain length, and carbon number) have been found to correlate highly with the boiling points of the normal alkanes. But how does one measure or quantify shape in general? For organic structures we often think of shape in terms of the degree or extent of branching. By definition normal hexane is unbranched, and most chemists would agree that 2,3-dimethylbutane is more highly branched than 2-methylpentane, but is 3-methylpentane more or less branched than 2-methylpentane? Can branching be quantified, and, if so, how?

Numerous attempts have been made to answer questions such as these by using what are called topological indices. A topological index is a numeric quantity that is mathematically derived in a direct and unambiguous manner from the structural graph of a molecule. (The term graph theoretical index would be more accurate than topological index for this concept, but the latter is more common in the chemical literature.) Since isomorphic graphs possess identical values for any given topological index, these indices are referred to as graph invariants. Topological indices usually reflect both molecular size and shape.

Trinajstić and coworkers recently reported that 39 topological indices are "presently available in the literature" (15). In the sections that follow we discuss several of the most noteworthy topological indices and give published examples of their use.

Wiener Number

The first reported use of a topological index in chemistry was by Wiener in his study of paraffin boiling points (16). Wiener employed both a polarity number, p, which is a count of all paths of length three, and a path number, w, which is "the sum of the distances between any two carbon atoms in the molecule, in terms of carbon-carbon bonds". Here we will focus only on the path number, which is now commonly referred to as the Wiener number, W. Its definition has been broadened to include all nonhydrogen atoms, so that it can now be applied to nonhydrocarbons as well as hydrocarbons. Although W is relatively easy to compute for small molecules, the number of terms in this sum increases approximately as the square of the number of nonhydrogen atoms, and hence computer methods are normally employed for its calculation. The Wiener number is identically equal to one-half the sum of the elements of the distance matrix; one-half since this matrix includes the distance from atom i to atom j as well as from atom j to atom i.

$$W = \frac{1}{2} \sum_{i,j} d_{ij}$$
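The half-sum over the distance matrix can be sketched in a few lines. The Python code below is an illustration added here under stated assumptions: the function name `wiener_number` and the atom numberings of the five hexane skeletons are hypothetical choices, and each row of the distance matrix is generated by a breadth-first search rather than stored.

```python
from collections import deque


def wiener_number(edges):
    """Half the sum of all entries of the distance matrix of a connected graph."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    total = 0
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search gives one row of the distance matrix
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair (i, j) was counted from both ends


# Hydrogen-suppressed skeletons of the hexane isomers (assumed numbering).
BUTANE_CHAIN = [(1, 2), (2, 3), (3, 4)]
PENTANE_CHAIN = BUTANE_CHAIN + [(4, 5)]
HEXANES = {
    "n-hexane": PENTANE_CHAIN + [(5, 6)],
    "2-methylpentane": PENTANE_CHAIN + [(2, 6)],
    "3-methylpentane": PENTANE_CHAIN + [(3, 6)],
    "2,3-dimethylbutane": BUTANE_CHAIN + [(2, 5), (3, 6)],
    "2,2-dimethylbutane": BUTANE_CHAIN + [(2, 5), (2, 6)],
}

for name, edges in HEXANES.items():
    print(name, wiener_number(edges))
# n-hexane 35, 2-methylpentane 32, 3-methylpentane 31,
# 2,3-dimethylbutane 29, 2,2-dimethylbutane 28
```

The five results agree with the W values quoted for the hexane isomers in the discussion of Table 2.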


Footnote a to Table 2: From Randić, after rounding (28).

The adjacency matrix is quite easy to construct, and the distance matrix can be readily computed from the adjacency matrix using a very efficient algorithm (17). The values shown in Table 2 demonstrate that the Wiener number increases with molecular size but tends to decrease with what intuition would suggest is greater branching. For example, of the five hexane isomers, the unbranched isomer has a W of 35, the two isomers with only one methyl branch have W values of 32 and 31, and the two isomers with two methyl branches have W values of 29 and 28.

What is the logical basis for the Wiener numbers? Trees with an identical number of points (e.g., acyclic isomers) have equal numbers of paths (this may not be obvious from observing two such trees, but it should be apparent if one recognizes that their distance matrices are of the same order). Branching reduces the number of long paths and increases the number of short paths; hence the sum of all path lengths must decrease. Platt (18) has suggested that the cube root of the Wiener number is a "sort of mean distance between the carbon atoms in a molecule [and] is an approximate inverse measure of the probability of one part of the molecule being attracted to another by van der Waals forces".

Table 2 reveals two apparent shortcomings of the Wiener number. The alkanes in this table are sorted in order of increasing boiling point. If the Wiener number is to be employed as a predictor of alkane boiling points, then ideally the Wiener numbers in this table should either increase or decrease monotonically. Clearly this is not the case since 5 of the 19 alkanes are out of order. Table 2 also shows that two pairs of isomers have identical Wiener numbers (both 2,2- and 2,3-dimethylpentane have a W of 46, while both 3-ethylpentane and 2,4-dimethylpentane have a W of 48). For at least two reasons this degeneracy is a bit troubling. First, it is not readily apparent that isomers such as 3-ethylpentane and 2,4-dimethylpentane are equally branched. Second, assuming these two pairs of compounds are equally branched, would it not seem reasonable to expect each pair of compounds to possess more nearly equal boiling points? Clearly, it would be advantageous for only isomorphic graphs to have equal topological indices, in which case the topological index would possess the properties of both unambiguousness and uniqueness.
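The route from adjacency matrix to distance matrix, and the degeneracy just described, can be checked with a small sketch. The Python code below is an illustration added here; it uses the Floyd-Warshall all-pairs shortest-path algorithm as a stand-in, which is an assumption on our part and not necessarily the algorithm of reference 17. The atom numberings and function names are likewise hypothetical.

```python
INF = float("inf")


def adjacency(edges, n):
    """Adjacency matrix of an n-atom hydrogen-suppressed graph (atoms numbered from 1)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = a[j - 1][i - 1] = 1
    return a


def distances(a):
    """Distance matrix from the adjacency matrix by Floyd-Warshall relaxation."""
    n = len(a)
    d = [[0 if i == j else (1 if a[i][j] else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):          # allow atom k as an intermediate point
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def wiener(edges, n):
    """One-half the sum of all elements of the distance matrix."""
    return sum(map(sum, distances(adjacency(edges, n)))) // 2


PENTANE_CHAIN = [(1, 2), (2, 3), (3, 4), (4, 5)]
HEPTANES = {
    "2,2-dimethylpentane": PENTANE_CHAIN + [(2, 6), (2, 7)],
    "2,3-dimethylpentane": PENTANE_CHAIN + [(2, 6), (3, 7)],
    "2,4-dimethylpentane": PENTANE_CHAIN + [(2, 6), (4, 7)],
    "3-ethylpentane": PENTANE_CHAIN + [(3, 6), (6, 7)],
}
W = {name: wiener(edges, 7) for name, edges in HEPTANES.items()}
print(W)
# {'2,2-dimethylpentane': 46, '2,3-dimethylpentane': 46,
#  '2,4-dimethylpentane': 48, '3-ethylpentane': 48}
```

The triple loop costs on the order of n cubed operations, which is unimportant for molecules of this size; the four nonisomorphic skeletons collapse onto only two W values, which is the degeneracy noted in the text.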

Figure 5. The observed normal boiling points (°C) versus the Wiener number for the C4 to C7 alkanes.

Figure 5 is a plot of the normal boiling points versus the Wiener numbers for the C4-C7 alkanes. Although the two are highly correlated (correlation coefficient r = 0.971), clearly the points possess significant scatter. It should be remembered, however, that the original model developed by Wiener included a second independent variable, the polarity number. QSAR and QSPR studies often employ multivariate models. Although perhaps not obvious from this figure, the relationship between these two variables is decidedly nonlinear; this does not, however, diminish their utility.

The Wiener number has not seen widespread use in the QSAR/QSPR community. It has been said, however, that its high correlation with other very successful topological indices suggests that it deserves far greater attention (15). In addition to developing predictive models for alkane boiling points (16), Wiener also developed models for predicting other alkane properties, including: molar refraction, molar volume, and heat of formation (19); vapor pressure as a function of temperature (20); and surface tension, "specific dispersion", and "critical solution temperature in aniline" (21). More recently Trinajstić, Bonchev, and co-workers reported a two-parameter model for the prediction of the gas chromatographic retention indices of monoalkyl- and o-dialkylbenzenes from their Wiener numbers (22).

Figure 6. The Randić branching index (path-one molecular connectivity) of 2-methylbutane: (a) the structural formula, (b) the structural graph with the valency of each point shown, (c) the structural graph with $(1/mn)^{1/2}$ for each edge shown, and (d) the summation of $(1/mn)^{1/2}$, which yields the Randić branching index.

Randić Branching Index and Molecular Connectivity

In 1975, Randić proposed a topological index (23) that has become one of the most widely used in both QSAR and QSPR studies. The Randić branching index, R, is defined as

$$R = \sum_{\text{edges}} \left( \frac{1}{mn} \right)^{1/2}$$
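The index R, a sum of $(1/mn)^{1/2}$ over the edges of the graph as depicted in Figure 6, can be sketched as follows. The Python code below is an illustration added here, not code from the paper: m and n are taken to be the degrees (valencies) of the two points joined by each edge of the hydrogen-suppressed graph, and the function name and atom numbering are assumptions.

```python
from math import sqrt


def randic_index(edges):
    """Sum of (1/(m*n))**0.5 over all edges, with m and n the degrees of the joined points."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in edges)


# 2-Methylbutane (assumed numbering): chain C1-C2-C3-C4, methyl C5 on C2.
two_methylbutane = [(1, 2), (2, 3), (3, 4), (2, 5)]
n_butane = [(1, 2), (2, 3), (3, 4)]

print(round(randic_index(two_methylbutane), 4))  # 2.2701
print(round(randic_index(n_butane), 4))          # 1.9142
```

For 2-methylbutane the four edges are of types 1-3, 3-2, 2-1, and 1-3, so the sum is 2/√3 + 1/√6 + 1/√2.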

The summation in the definition of R includes one term for each edge in the hydrogen-suppressed structural graph. The variables m and n are the valencies of the adjacent points joined by each edge. Figure 6 demonstrates the sequence of steps leading from the structural formula of 2-methylbutane to its Randić branching index. Owing to the normal valency of 4 observed for carbon in organic chemistry, the valencies of the points in the hydrogen-suppressed structural graphs of the alkanes are limited to the values 1, 2, 3, and 4. It follows that the number of edge types is limited to the 10 possible pairs of these four valencies (i.e., 1-1, 1-2, 1-3, 1-4, 2-2, 2-3, 2-4, 3-3, 3-4, 4-4). The 1-1 edge type, of course, can occur only in ethane. Furthermore, since the edge types 1-4 and 2-2 both yield the same mn product, they must also produce the same value for $(1/mn)^{1/2}$. Hence, practically speaking, the Randić branching index is based on the decomposition of a compound into eight different carbon-carbon bond types.

Examination of Table 2 reveals that for the 19 C4-C7 alkanes listed a change in position of only two of them would result in a monotonically increasing sequence of Randić index values. Although no degeneracies are observed for these 19 compounds, this is not the case if one expands the list to include alkanes with greater carbon numbers (for example, 3-methylheptane and 4-methylheptane yield exactly the same decomposition of bond types and hence must have identical values for the Randić index). The high correlation (r = 0.994) of boiling point with the Randić index is evident from Figure 7. As with the Wiener number, the functional relationship of the Randić index to alkane boiling point is somewhat nonlinear. It has been shown that for the 21 C2-C7 alkanes, use of the cube root of the Randić index results in an improved correlation with boiling point (r = 0.999) and a reduction in the standard deviation by more than half (45).

Figure 7. The observed normal boiling points (°C) versus the Randić branching index for the C4 to C7 alkanes.

An alternative way of looking at the Randić branching index is to consider it as a summation over all paths of length

Figure 8. Path-two molecular connectivity for 2-methylbutane. Each path of length two is shown in bold together with the path's contribution to $^2\chi$.

Table 3. Atomic and Molecular Path Counts for 2-Methylbutane

                         Count of paths of length
Atom number              0    1    2    3    4    Sum
1                        1    1    2    1    0    5
2                        1    3    1    0    0    5
3                        1    2    2    0    0    5
4                        1    1    1    2    0    5
5                        1    1    2    1    0    5
Molecular path counts    5    4    4    2    0

Molecular path count total = 15

one in the structural graph. The natural extension of this view is to define additional indices corresponding to paths with lengths greater than one (25), and to other subgraphs (clusters, path-clusters, and cycles), some of which may even include points of degree greater than two (26). In fact Randić's original paper alludes to the "concept of extended connectivity which acknowledges the presence of more distant neighbors" (23). Hence the chemical literature now uses the term molecular connectivity to refer to an entire family of topological indices. The Randić branching index is generally referred to as the path-one molecular connectivity, $^1\chi$. This nomenclature is readily adaptable to the naming of the extended molecular connectivities; for example, path-two molecular connectivity, $^2\chi$, which is defined as

$$^2\chi = \sum_{k} \left( \frac{1}{mno} \right)^{1/2}$$

where k corresponds to all paths of length two, and m, n, and o are the valencies of the three points contained in these paths (see Figure 8).

The original work of Randić (23) involved only the acyclic alkanes; these, of course, represent only a very small fraction of the organic compounds of interest to chemists. Hence almost immediately the concept of molecular connectivity was altered and expanded to broaden its application to compounds containing rings (27), multiple bonds (28), and heteroatoms (29, 30). These modifications will not be considered in this paper; interested readers are directed to the original papers, or to either of two books by Kier and Hall, two of the strongest advocates of the use of molecular connectivities in QSAR or QSPR work (26, 31). Since 1975, Kier and Hall have authored or coauthored over 35 papers in which molecular connectivities were employed in QSAR and QSPR studies.

Molecular connectivity represents perhaps the most widely used topological index (or family of indices) in QSAR and QSPR work. Kier and Hall (26) include a list of 158 journal references (through 1984) in which molecular connectivity played a prominent role. Since that date the use of molecular connectivity has probably even increased. For example, molecular connectivity has been used in the QSAR or QSPR studies of gas chromatographic retention indices of nitrated polycyclic aromatic hydrocarbons (32), alkane solubility in water (33), rational drug design (34), halocarbon anesthetics (35), enzymatic reactions (36), hallucinogenic mescaline analogs (37), bioconcentration factor of hazardous chemicals (38), and chemical carcinogenicity (39).

Figure 9. For 2-methylbutane: (a) The atom numbering system employed in Table 3. (b) The edge or bond weights related to the atom valency. (c) The five paths associated with carbon atom C-1 together with their lengths and weights.

Molecular Identification Numbers

Attempting to find a topological index that would be easy to derive, unique, and structurally significant, Randić, in 1984, proposed the molecular identification (ID) number (40). This topological index is a hybrid of molecular connectivity and the molecular path count. More precisely, it is a weighted molecular path count in which the weights are related to molecular connectivity. Before defining the molecular ID number, it will be helpful to consider in more detail both atom and molecular path counts.

Table 3 gives individual and total path counts for all carbon atoms and the 2-methylbutane molecule as a whole (corresponding to its hydrogen-suppressed graph). Figure 9 shows the numbering system that was assumed for this molecule. In addition the associated paths and their lengths are shown for carbon atom C-1 (the other information shown can be ignored for now). Some of the data and relationships shown in Table 3 are perhaps confusing. First, for mathematical completeness paths of length zero are counted and included; these paths consist of merely a single point or atom. Second, except for paths of length zero, the molecular path count for any given length is not equal to the sum of the atom path counts, but rather to one-half of this sum; this stems from the fact that each of these paths has been counted twice, once from each end. For example, the sum of atom path counts for length two is 2 + 1 + 2 + 1 + 2 = 8, while the molecular path count for length two is 4. It follows that to compute the molecular path count total from the tabulated atom path counts, one must sum the number of paths of length zero together with one-half the sum of the path counts for lengths greater than zero. Third, it is mere coincidence that the sum of the atomic path counts for each atom is 5; this will not generally be the case. Fourth, atoms C-1 and C-5 have identical values for all path counts; this is to be expected since these two atoms are chemically equivalent and related by symmetry.

Molecular path counts (also called molecular path codes) can themselves be used in QSAR and QSPR studies. Randić and Wilkins (41) studied the quantitative structural similarities of a set of 29 monocyclic monoterpenes by comparing the following measure of dissimilarity:

$$D^2 = \sum_{i} (a_i - b_i)^2$$

Here $a_i$ and $b_i$ refer to the molecular path counts for path length i of the two structures. The dissimilarity measure, D, is somewhat analogous to a Euclidean distance.

As stated above, the molecular ID number is a weighted molecular path count. What weights are used? Each path of length zero is given a weight of unity. For paths of length greater than zero the weight is equal to the product of $(1/mn)^{1/2}$ terms, one term for each edge included in the path; for each of these terms m and n equal the valencies of the atoms joined by the edge. (Hence the close relationship with molecular connectivity.) For the paths associated with carbon atom C-1, Figure 9 shows the method used to calculate these weights. The molecular ID number is exactly analogous to the molecular path count total except that a path-weighted count is taken:

$$\text{molecular ID} = \sum_{j} w_{0j} + \frac{1}{2} \sum_{i,k} w_{ik}$$

where w is the path weight, 0j corresponds to all paths of length zero, and ik corresponds to all paths of length greater than zero. Table 4 gives individual and total path-weighted counts for all carbon atoms and the 2-methylbutane molecule as a whole (note the similarity to Table 3). Table 2 includes the molecular ID values for all of the C4-C7 alkanes. These values exhibit no degeneracy, but the positions of four alkanes would have to be altered to achieve a monotonically increasing sequence of values.

What is the inherent logic behind the molecular ID number as a topological index? As stated by Randić, the use of bond or edge weights smaller than 1 results in "a gradual attenuation of the role of paths of longer lengths"; hence "the weighted molecular path sum represents an average atomic path count in which local features are more pronounced" (40).
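The path counts of Table 3 and the path-weighted counts that make up the molecular ID number can both be generated by enumerating every path that starts at each atom. The Python code below is a sketch added here under stated assumptions: it is not the ALL-PATH program or any published code, the function names are hypothetical, and the atom numbering (chain C1-C2-C3-C4, methyl C5 on C2) is inferred from the statement that C-1 and C-5 are equivalent.

```python
from math import sqrt

# 2-Methylbutane skeleton (assumed numbering): chain C1-C2-C3-C4, methyl C5 on C2.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]


def neighbor_map(edges):
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    return neighbors


def paths_from(start, neighbors):
    """Yield every path beginning at `start`, including the path of length zero."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        yield path
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # a path may not revisit a point
                stack.append(path + (nxt,))


def path_weight(path, neighbors):
    """Product of (1/(m*n))**0.5 over the edges of the path; 1 for length zero."""
    weight = 1.0
    for a, b in zip(path, path[1:]):
        weight /= sqrt(len(neighbors[a]) * len(neighbors[b]))
    return weight


def atom_tables(edges):
    """Per-atom path counts and path-weighted counts, indexed by path length."""
    neighbors = neighbor_map(edges)
    n = len(neighbors)
    counts = {a: [0] * n for a in sorted(neighbors)}
    weights = {a: [0.0] * n for a in sorted(neighbors)}
    for a in counts:
        for path in paths_from(a, neighbors):
            length = len(path) - 1
            counts[a][length] += 1
            weights[a][length] += path_weight(path, neighbors)
    return counts, weights


def molecular_row(per_atom):
    """Column sums; halved for lengths > 0 because each path is seen from both ends."""
    columns = zip(*per_atom.values())
    return [sum(col) if length == 0 else sum(col) / 2
            for length, col in enumerate(columns)]


counts, weights = atom_tables(EDGES)
mol_counts = molecular_row(counts)
mol_weights = molecular_row(weights)
molecular_id = sum(mol_weights)
atom_ids = {a: sum(row) for a, row in weights.items()}

print(counts[1])    # [1, 1, 2, 1, 0]
print(mol_counts)   # [5, 4.0, 4.0, 2.0, 0.0]
```

Rounded to four decimal places, `mol_weights` gives 5, 2.2701, 1.0934, 0.3333, and 0 for path lengths 0 through 4; their sum is the molecular ID number, and `atom_ids` holds the corresponding per-atom sums. Exhaustive enumeration is adequate for small acyclic skeletons like this one, though the number of paths grows quickly for larger or polycyclic structures.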

Table 4. Atomic and Molecular Path-Weighted Counts for 2-Methylbutane (footnote a)

Columns: atom number; path-weighted count for path length 0, 1, 2, 3, 4; atom ID number.

Molecular path-weighted counts (path lengths 0 to 4): 5, 2.2701, 1.0934, 0.3333, 0

Molecular ID Number = 8.6968019

Footnote a to Table 4: Most results have been rounded to four decimal places.

To some the claim that molecular ID numbers are "easy to derive" may seem arguable. Most would agree, however, that the advent and ready accessibility of computers has changed the meaning of this phrase. Clearly the arithmetic operations in the calculation of molecular ID numbers do not require any advanced or sophisticated mathematics. Randić, Wilkins, and co-workers have published the BASIC program ALL-PATH (containing fewer than 100 lines of code), which can be used to compute molecular ID numbers in a relatively straightforward manner (42).

Randić (40) computed molecular ID numbers for over 400 structures (including all acyclic alkanes up to n = 10 as well as a variety of monocyclics and polycyclics) and observed no degeneracy. He recognized that if molecular ID numbers were unique, they might also serve as a coding scheme, perhaps supplementing the CAS Registry Numbers, which themselves contain no structural information. This would require, of course, that the ID numbers be computed with high precision, for if ID numbers are rounded to only a few decimal digits, the maximum number of compounds that could be encoded without degeneracy would obviously be very limited.

In 1985, Trinajstić and co-workers reported (43) that their systematic examination of the 618,000 alkane isomers through n = 20 revealed the existence of 124 pairs and 1 trio of nonisomorphic structures with identical molecular ID numbers. For example, 2,3-dimethyl-6-ethyl-5-isopropyloctane and 2,6-dimethyl-5-ethyl-3-isopropyloctane both possessed the same ID number (this was true even when the ID numbers were expressed as integer expressions, precluding round-off error). Their work clearly demonstrated that the molecular ID number did not possess uniqueness. Randić subsequently proposed (44) a second molecular ID number that uses as weights the square roots of the reciprocals of the first nine prime numbers for the nine edge types commonly encountered in chemical graphs (rather than the weights based on atom valency). This modification will be considered no further in this paper. Trinajstić also indicated that "for complicated structures (e.g., polyhexes) the ID numbers are not easily computed".

The calculation of molecular ID numbers gives rise to what might be called atom ID numbers (see the last column of Table 4). Whereas most topological indices are whole-molecule descriptors, the atom ID numbers are atom based and hence hold promise for applications in QSAR and QSPR work in which the activity or property of interest is localized (e.g., 13C NMR chemical shifts or carcinogenicity associated with a specific molecular site). In addition, by summing the appropriate atom ID numbers, subgraph ID numbers could be computed for functional groups consisting of several atoms such as a nitro group or a carboxylic acid group.

The development of molecular ID numbers is recent enough that one could hardly expect them to have been employed in many QSAR and QSPR studies. Randić did, however, apply cluster analysis to the molecular ID numbers of a set of 41 therapeutically active substances including antihistamines, anticholinergics, antipsychotics, antidepressants, analgesics, and antiparkinsonians, with some success at classification (45).
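The remark above that ID numbers can be expressed exactly, precluding round-off error, can be made concrete with a sketch. The Python code below is an illustration added here and is an assumption about one possible exact representation, not the procedure used in reference 43. Every path weight has the form $1/\sqrt{k}$ with k an integer product of valencies; writing $k = s^2 f$ with f squarefree gives $1/\sqrt{k} = (1/s)/\sqrt{f}$, so the ID number can be stored as rational coefficients keyed by f. The function names and the atom numbering are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def squarefree_split(k):
    """Return (s, f) with k == s*s*f and f squarefree."""
    s, f, p = 1, 1, 2
    while p * p <= k:
        while k % (p * p) == 0:  # pull out each squared prime factor
            s *= p
            k //= p * p
        if k % p == 0:           # one leftover copy of p stays under the root
            f *= p
            k //= p
        p += 1
    return s, f * k


def exact_id(edges):
    """Map f -> c such that the ID number equals the sum of c / sqrt(f)."""
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    coeff = defaultdict(Fraction)
    stack = [((a,), 1) for a in neighbors]  # (path, product of m*n over its edges)
    while stack:
        path, k = stack.pop()
        # Count zero-length paths once, longer paths from one end only.
        if len(path) == 1 or path[0] < path[-1]:
            s, f = squarefree_split(k)
            coeff[f] += Fraction(1, s)
        last = path[-1]
        for nxt in neighbors[last]:
            if nxt not in path:
                step = len(neighbors[last]) * len(neighbors[nxt])
                stack.append((path + (nxt,), k * step))
    return dict(coeff)


# 2-Methylbutane (assumed numbering): chain C1-C2-C3-C4, methyl C5 on C2.
print(exact_id([(1, 2), (2, 3), (3, 4), (2, 5)]))
```

For 2-methylbutane this yields the coefficients 17/3, 5/3, 5/2, and 1 for f = 1, 2, 3, and 6, that is, ID = 17/3 + (5/3)/√2 + (5/2)/√3 + 1/√6. Because square roots of distinct squarefree integers are linearly independent over the rationals, two structures have equal ID numbers exactly when these coefficient tables coincide, so a comparison of this kind does not depend on the number of decimal digits retained.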

Literature Clted

Conclusion

011 7 r r r ~ . ! # U > Chrmcrol Croph l"rrr).CRC H.re R ~ l u nFI. . 1961 \ nlr I and 11 9 H .-:S:n C f'.r J M s o Ch.nz I982,17.W7-111 lr lV3,lo.l' D ' K s r . L H .Hell.L H J P h a r m C r , . 197Y.S.rs