Molecular Structure-Property Relationships
Paul G. Seybold,¹ Michael May, and Ujvala A. Bagal
Wright State University, Dayton, OH 45435
The axiom that "form follows function" is one of the most fundamental in science and technology. Biologists learn quite early that a bird's feathered wings, hollow bones, and powerful flight muscles uniquely prepare it for flight, and that a polar bear's white fur, insulating fat, and low surface/volume ratio suit it well for arctic life. Engineers streamline vehicles to reduce air resistance and design computer chips to optimize speed and minimize heat production. Conversely, in chemistry it is recognized that the properties and behaviors of molecules follow from their structures, that is, that "function follows form". Yet this connection, with some notable exceptions (often in biochemistry), has been less emphasized in the teaching of chemistry, possibly because of perceived difficulties in its quantitative application.
Whereas it is obvious that the structure of a molecule, geometric and electronic, must contain the features responsible for its physical and chemical properties, it is less obvious that these features can be discerned in any simple way. To be sure, a full-scale quantum mechanical treatment should in principle yield accurate values for these properties. However, one is reminded of Dirac's famous statement in 1929 that "the underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble." Moreover, even if such quantum chemical calculations were now possible for bulk systems, the abstract results generated would likely be less than satisfying to most practicing chemists. Clearly at present one must seek to describe molecular structure in some less exalted way.
Our purpose here is to note that (1) useful methods are currently available for the description of some aspects of molecular structure and that (2) the study of molecular structure-property relations can play a valuable role in the training of undergraduate chemistry students. There are normally two goals in structure-property investigations: accuracy and understanding. Our emphasis here, rather than being directed toward achieving ultimate "best fit" correlations, will be upon describing how molecular structure can be represented mathematically and how this can then lead to a better understanding of the connection between molecular structures and properties.
The simplest way to represent a molecule's structure is to assign to the structure a number or a set of numbers, termed indices. There have been many attempts to design effective molecular structural indices (for collections and reviews see refs 1-4). These attempts have variously fallen into such categories as molecular design, chemical graph theory (see Appendix), quantitative structure-property relations (QSPR), quantitative structure-activity relations (QSAR), and quantitative drug design. Most applications have been in pharmacology and toxicology (7-9), but many physical and chemical properties have also been modeled. We shall confine our attention to two approaches, those of Wiener and Randić, which have been the most widely applied and most successful, and a third approach based on ad hoc molecular descriptors, which has certain advantages of directness and simplicity.
Wiener's Approach
Nearly four decades ago H. Wiener carried out a pioneering investigation of the relationship between the structures and properties of saturated hydrocarbons (10-14). It was recognized that the properties of these compounds varied with molecular bulk and branching (Table 1). In order to explain the observed variations, Wiener proposed an index of molecular structure based on path distances between carbon atoms (10-12). The Wiener index w is the sum of the lengths of the shortest carbon-carbon bond paths between all pairs of carbon atoms in a hydrocarbon. For example, the carbon skeleton of 2-methylbutane
can also be pictured as the hydrogen-suppressed graph (see Appendix)
[Graph: hydrogen-suppressed carbon skeleton of 2-methylbutane with numbered atoms]
¹ Author to whom correspondence should be addressed.
Table 1. Selected Properties of Butanes and Pentanes
Columns: Compound; Structure; Boiling point (°C); Molar volume (mL); Critical temperature (°C); Heat of vaporization (cal/mol); Heat of combustion (kcal/mol)
Volume 64 Number 7 July 1987
575
The index w is then readily calculated by forming a table of shortest path distances between all carbon atoms and summing its elements:
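The table of shortest path distances for 2-methylbutane can be rebuilt with a short breadth-first search. The sketch below is only an illustration in Python: the atom numbering (chain 1-2-3-4 with the methyl carbon 5 on atom 2) is an assumption chosen to agree with the atom pairs quoted in the text, and the function names are hypothetical.

```python
from collections import deque

# Hydrogen-suppressed graph of 2-methylbutane as an adjacency list.
# Assumed numbering: chain 1-2-3-4, methyl carbon 5 attached to atom 2.
SKELETON = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}


def shortest_distances(graph, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in graph[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(graph):
    """Sum of shortest path lengths over all unordered pairs of atoms."""
    total = 0
    for atom in graph:
        dist = shortest_distances(graph, atom)
        # Count each pair once by keeping only partners with a larger label.
        total += sum(d for other, d in dist.items() if other > atom)
    return total


if __name__ == "__main__":
    for atom in sorted(SKELETON):
        row = shortest_distances(SKELETON, atom)
        print(atom, [row[j] for j in sorted(SKELETON)])
    print("w =", wiener_index(SKELETON))
```

Run as a script, this prints one row of the distance table per atom and gives w = 18, the value listed for 2-methylbutane in Table 2.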
An alternative method for calculating w is to take for each C-C bond the product of the numbers of carbon atoms on each of its sides, and to form a sum of these values over all C-C bonds:
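The bond-by-bond recipe can be sketched in a few lines of Python. This is a minimal illustration that assumes an acyclic skeleton (so that removing a bond always splits the molecule in two); the bond list uses the same assumed numbering for 2-methylbutane as above (chain 1-2-3-4, methyl carbon 5 on atom 2), and the helper names are hypothetical.

```python
# Carbon-carbon bonds of 2-methylbutane (assumed numbering, see lead-in).
BONDS = [(1, 2), (2, 3), (3, 4), (2, 5)]


def side_size(bonds, start, blocked):
    """Count atoms reachable from `start` without crossing the bond start-blocked."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        for a, b in bonds:
            if atom not in (a, b):
                continue
            other = b if atom == a else a
            if {atom, other} == {start, blocked}:
                continue  # this is the bond being cut
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen)


def wiener_by_bonds(bonds):
    """Sum over bonds of (atoms on one side) x (atoms on the other side)."""
    return sum(side_size(bonds, a, b) * side_size(bonds, b, a) for a, b in bonds)


if __name__ == "__main__":
    for a, b in BONDS:
        print((a, b), side_size(BONDS, a, b), "x", side_size(BONDS, b, a))
    print("w =", wiener_by_bonds(BONDS))
```

For 2-methylbutane the four bond products are 1 x 4, 3 x 2, 4 x 1, and 4 x 1, which again sum to w = 18.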
Figure 1. Plot of observed boiling points of alkanes vs. boiling points calculated by eq 4. (Horizontal axis: experimental bp.)
It will be apparent that the lower the value of w, the more "compact" is the molecular framework. Wiener also proposed a second index p equal to the number of pairs of carbon atoms separated by three bonds. For 2-methylbutane there are just two such pairs, atoms (1,4) and (4,5), so that p = 2. Originally described as a "polarity" index, p is more properly related to steric aspects of structure (15). Wiener observed that values for many molecular properties P could be represented by equations of the general form
$$P = a\,\frac{w}{N^2} + b\,p + c$$
where N is the number of carbon atoms, and a, b, and c are constants appropriate to the property. It is therefore convenient to define the reduced Wiener index as w_r = w/N². (In fact, Wiener worked directly with differences Δw between a normal hydrocarbon and its isomers.) As an example, let us examine the boiling points (bp) of straight-chain and branched-chain alkanes having two to eight carbons. Multiple linear regression using w_r and p yields
With this index the fit becomes (see Fig. 1)
It is desirable to extend the Wiener scheme to include heteroatoms such as N, O, S, etc. One way to do this is to define an atomic site index s_i, in which s_i is the sum of all shortest distances (in terms of C-C and C-X bonds) from atom i to other atoms (16,17). For example, for oxygen in isobutanol, s_O = 3 + 3 + 2 + 1 = 9. For a set of 37 alcohols including the normal alcohols through decanol and all branched alcohols through the hexanols, one obtains bp(°C)
where n is the number of compounds, r is the correlation coefficient, and s is the root-mean-square error of the fit. The lighter hydrocarbons ethane (calcd, -63.5; obsd, -88.6) and propane (calcd, -34.0; obsd, -42.1) are very poorly represented. If these are eliminated, the fit improves to
Note that r² measures the fraction of the variation in the experimental data that can be accounted for by the model. Wiener showed that a number of other alkane properties, including heats of isomerization and vaporization, surface tensions, and vapor pressures, could be fit in the above manner (10-14). Platt (15) later independently analyzed the performance of Wiener's indices and concluded that for several properties they yielded values falling largely within the experimental uncertainties. Platt introduced an additional structural index f, equal to the number of adjacent C-C bonds summed over all C-C bonds of the molecule. For 2-methylbutane above, f = 2 + 2 + 3 + 1 = 8.
576
Journal of Chemical Education
= 60.87
- 0.9526w, + 3.182~+ 6 . 4 3 7 ~ ~ ~
There have been a number of applications of Wiener's method to additional properties. For example, it has been employed to fit the gas chromatographic retention indices of alkylbenzenes (18). Extended to aromatic compounds, it has been used to analyze the roles of structure (the "bay region" geometry) and position of methyl substitution in determining the relative carcinogenicities of polycyclic aromatic hydrocarbons (17).
Connectivity Indices
In an analysis of molecular branching, Randić introduced a different topological index (19). One begins by assigning to each carbon atom of a hydrocarbon a valence δ equal to the number of C-C bonds in which the atom participates. Thus, for 2,2-dimethylbutane the atomic valences are δ = 1, 4, 2, and 1 for carbons 1 through 4 of the chain, and δ = 1 for each of the two methyl substituents on carbon 2.
Each bond (ij) is then characterized by the number c_ij = (1/δ_i δ_j)^(1/2). The molecular connectivity index χ (originally the branching index) is the sum of such terms for all carbon-carbon bonds,
$$\chi = \sum_{\text{bonds}} c_{ij} = \sum_{\text{bonds}} \frac{1}{(\delta_i \delta_j)^{1/2}}$$
For 2,2-dimethylbutane,
$$\chi = 3\,(1 \cdot 4)^{-1/2} + (4 \cdot 2)^{-1/2} + (2 \cdot 1)^{-1/2} = 2.561$$
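The connectivity index defined above can be evaluated directly from a bond list. The following Python sketch is an illustration only; the numbering of 2,2-dimethylbutane (chain 1-2-3-4 with methyl carbons 5 and 6 on atom 2) and the function names are assumptions made for the example.

```python
from math import sqrt

# C-C bonds of 2,2-dimethylbutane (assumed numbering, see lead-in).
BONDS = [(1, 2), (2, 3), (3, 4), (2, 5), (2, 6)]


def valences(bonds):
    """delta for each carbon: the number of C-C bonds it takes part in."""
    delta = {}
    for a, b in bonds:
        delta[a] = delta.get(a, 0) + 1
        delta[b] = delta.get(b, 0) + 1
    return delta


def connectivity_index(bonds):
    """Sum of (delta_i * delta_j)**-0.5 over all C-C bonds."""
    delta = valences(bonds)
    return sum(1.0 / sqrt(delta[a] * delta[b]) for a, b in bonds)


if __name__ == "__main__":
    print(valences(BONDS))
    print(round(connectivity_index(BONDS), 5))
```

The result agrees with the entry for 2,2-dimethylbutane in Table 2 (2.56066).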
This index by itself gives a fair account of the alkane boiling points:
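A generalization and extension of the Randić approach has been given by Kier, Hall, Randić, and their co-workers (3,20). In this, different orders ᵐχ of the connectivity index are defined, according to the numbers of contiguous bonds included, as
$${}^m\chi = \sum \prod_{i=1}^{m+1} \delta_i^{-1/2}$$
where the sum runs over all subunits of m contiguous bonds (m + 1 atoms).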
the indices ¹χᵛ are 1.023 and 1.204, respectively. Values for other atoms were calculated in a similar manner, except that in some cases, such as for halogens and sulfur, it was found to be desirable to employ empirical values for δᵛ (3,22). More recently, Kier and Hall (51) have suggested the formula
$$\delta^v = \frac{Z^v - h}{Z - Z^v - 1}$$
where Z is the total number of electrons, to account for elements not in the second row of the periodic table. Note that this formula reduces to eq 10 for second-period elements. For our test sample of aliphatic alcohols the index ¹χᵛ alone gives only a very modest fit,
Adding ¹χ considerably improves the fit,
$$\mathrm{bp}\,(^\circ\mathrm{C}) = -25.76 - 99.65\,{}^1\chi^v + 141.60\,{}^1\chi$$
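The ¹χᵛ values that enter fits of this kind follow from the valence rule described in the text, δᵛ = (valence electrons) - (attached hydrogens). The Python sketch below is a hedged illustration of that rule for the three molecules the text mentions; the atom labels and function names are hypothetical, and each multiple bond is treated as a single edge of the hydrogen-suppressed graph.

```python
from math import sqrt


def delta_v(valence_electrons, hydrogens):
    """Valence delta of an atom: valence electrons minus attached hydrogens."""
    return valence_electrons - hydrogens


def chi1_valence(deltas, bonds):
    """First-order valence connectivity: sum of (delta_i * delta_j)**-0.5 over bonds."""
    return sum(1.0 / sqrt(deltas[a] * deltas[b]) for a, b in bonds)


# (delta-v values, bond list) for each molecule; labels are illustrative only.
ETHYLENE = ({"C1": delta_v(4, 2), "C2": delta_v(4, 2)}, [("C1", "C2")])
ETHANOL = (
    {"C1": delta_v(4, 3), "C2": delta_v(4, 2), "O": delta_v(6, 1)},
    [("C1", "C2"), ("C2", "O")],
)
ACETONE = (
    {"C1": delta_v(4, 3), "C2": delta_v(4, 0), "C3": delta_v(4, 3), "O": delta_v(6, 0)},
    [("C1", "C2"), ("C2", "C3"), ("C2", "O")],
)

if __name__ == "__main__":
    for name, (deltas, bonds) in [
        ("ethylene", ETHYLENE),
        ("ethanol", ETHANOL),
        ("acetone", ACETONE),
    ]:
        print(name, round(chi1_valence(deltas, bonds), 3))
```

The three values come out as 0.5, 1.023, and 1.204, matching the figures quoted in the text for ethylene, ethanol, and acetone.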
It is apparent that ⁰χ is an atomic sum index and that ¹χ is the original Randić index. For orders higher than 2 it becomes necessary to specify the type of subunit of the molecular framework over which the summation is to be taken. For example, for m = 3 one may take a path (p) of three bonds or a cluster (c) of three bonds:
[Diagrams: a three-bond path and a three-bond cluster]
These forms are designated ³χ_p and ³χ_c. For higher orders more complicated combinations are possible. Values of these indices for alkanes and alkyl benzenes have been summarized in an appendix of ref 3. Kier and Hall (3) showed that for a collection of 51 C5-C9 noncyclic alkanes
However, inclusion of the path-cluster term ⁴χ_pc considerably improves the fit,
The term ⁴χ_pc represents sums over subunits of the form
[Diagram: four-bond path-cluster subunit]
It remained desirable to include multiple bonds and heteroatoms in the Randić method in some natural way. For this purpose Kier and Hall (3, 21, 22) originally proposed that the valence values of atoms be assigned according to
$$\delta_i^v = Z_i^v - h_i$$
in which Z_iᵛ is the number of valence electrons of atom i and h_i is the number of hydrogens bonded to it. Valence connectivity indices χᵛ are then calculated from these in the same manner as above. For alkanes the δ and δᵛ carbon terms are identical. In ethylene, however, each carbon of the double bond has δᵛ = 2, so that ¹χᵛ = 0.500. For a hydroxyl oxygen δᵛ = 6 - 1 = 5, whereas for an ether oxygen δᵛ = 6 - 0 = 6. Thus, for ethanol and acetone,
The Randić/Kier/Hall approach has been applied successfully to an impressive variety of physical and biological properties (3). Physical and chemical properties have included boiling points (22, 23), solubilities (23), densities (20), partition coefficients (21), molar refractions (22), heats of atomization (3), heats of formation (3), solvent polarities (24), etc. (3). Applications to gas chromatographic retention indices have been especially common (e.g., 25-30). In his study of GC retention indices (RI) Randić (25) introduced an index T₃, the number of terminal methyls separated by three bonds, to model steric crowding. For alkanes he obtained an excellent fit with the equation
Biological applications have included, for example, anesthetic (31-33), narcotic (3), and hallucinogenic (34) activities, toxicities (3, 35-37), enzyme inhibitions (3, 38, 39), factors influencing sweet and bitter taste (40), and other properties (3,4).
Ad Hoc Molecular Descriptors
It is also possible to characterize molecular structure using less sophisticated indices. We shall term this the ad hoc descriptors, or AHD, approach. In practice, many attempts to obtain molecular structure-property relations have been of this general type, relying at least partly on a combination of ad hoc indices of structure (e.g., 41-44). Considering the alkanes, the most direct index is the number of carbon atoms, N_c. This is a measure of gross molecular bulk and is linearly related to molecular weight. Because each branch of an alkane must terminate in a methyl group, the number of terminal methyl groups, T_m, can be taken as a measure of branching or compactness. These two simple indices give a reasonably good account of the alkane boiling points:
$$\mathrm{bp}\,(^\circ\mathrm{C}) = -126.19 + 33.42\,N_c - 6.286\,T_m$$
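As a quick check on how the two-parameter AHD boiling point relation behaves, it can be evaluated for a few alkanes and compared with the observed values in Table 2. The Python sketch below reads the relation as bp = -126.19 + 33.42 N_c - 6.286 T_m; the function name and the choice of sample compounds are illustrative assumptions, and the observed boiling points are copied from Table 2.

```python
def bp_alkane(n_c, t_m):
    """Boiling point (deg C) from carbon count N_c and terminal methyl count T_m."""
    return -126.19 + 33.42 * n_c - 6.286 * t_m


# (name, N_c, T_m, observed bp in deg C from Table 2)
SAMPLES = [
    ("n-pentane", 5, 2, 36.074),
    ("n-octane", 8, 2, 125.665),
    ("2,2,3,3-tetramethylbutane", 8, 6, 106.470),
]

if __name__ == "__main__":
    for name, n_c, t_m, observed in SAMPLES:
        calc = bp_alkane(n_c, t_m)
        print(f"{name}: calcd {calc:.1f}, obsd {observed:.1f}")
```

Each additional carbon raises the calculated boiling point by about 33 degrees, while each additional terminal methyl lowers it by about 6 degrees, which is the separation of bulk and branching effects that the text describes.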
A formulation in this manner distinguishes the relative influences of molecular mass and branching on the boiling points of alkanes. Other properties are also well modeled by the AHD indices. Alkane (gas phase, 25 °C) heats of combustion, for example, are very well described by N_c alone,
$$\Delta H^\circ_c\,(\mathrm{kcal/mol}) = 47.27 + 146.66\,N_c$$
Heats of vaporization at 25 °C are given by
$$\Delta H_v\,(\mathrm{cal/mol}) = 759.60 + 1246.09\,N_c - 390.04\,T_m$$
For aliphatic alcohols a convenient additional index is the number of carbons bonded to the alpha carbon, C_α. This index accounts for steric hindrance. Alcohol boiling points, gas phase heats of combustion, and critical temperatures are
Figure 2. Plot of observed boiling points of aliphatic alcohols vs. boiling points calculated by eq 18. (Horizontal axis: experimental bp.)
The AHD indices have previously been used to describe the position of equilibrium between the zwitterion (Z) and lactone (L) forms of the dye rhodamine B in various alcohol solvents (45). For the equilibrium constant K = [Z]/[L] at 25 °C:
(In Table 1 it is apparent that heats of combustion vary only slightly among isomers; the reasons for this are discussed in the following section.) Addition of T_m further improves the fit,
Table 2. Alkane Parameters and Observed Boiling Points
No.  Alkane                      N   w    p   f    ¹χ        T_m  bp (°C)
1    Ethane                      2   1    0   0    1.00000   2    -88.630
2    Propane                     3   4    0   2    1.41421   2    -42.070
3    n-Butane                    4   10   1   4    1.91421   2    -0.500
4    2-Methylpropane             4   9    0   6    1.73205   3    -11.730
5    n-Pentane                   5   20   2   6    2.41421   2    36.074
6    2-Methylbutane              5   18   2   8    2.27005   3    27.852
7    2,2-Dimethylpropane         5   16   0   12   2.00000   4    9.503
8    n-Hexane                    6   35   3   8    2.91421   2    68.740
9    2-Methylpentane             6   32   3   10   2.77005   3    60.271
10   3-Methylpentane             6   31   4   10   2.80806   3    63.282
11   2,2-Dimethylbutane          6   28   3   14   2.56066   4    49.741
12   2,3-Dimethylbutane          6   29   4   12   2.64273   4    57.988
13   n-Heptane                   7   56   4   10   3.41421   2    98.427
14   2-Methylhexane              7   52   4   12   3.27005   3    90.052
15   3-Methylhexane              7   50   5   12   3.30806   3    91.850
16   3-Ethylpentane              7   48   6   12   3.34606   3    93.475
17   2,2-Dimethylpentane         7   46   4   16   3.06066   4    79.197
18   2,3-Dimethylpentane         7   46   6   14   3.18073   4    89.784
19   2,4-Dimethylpentane         7   48   4   14   3.12589   4    80.500
20   3,3-Dimethylpentane         7   44   6   16   3.12132   4    86.064
21   2,2,3-Trimethylbutane       7   42   6   18   2.94337   5    80.882
22   n-Octane                    8   84   5   12   3.91421   2    125.665
23   2-Methylheptane             8   79   5   14   3.77005   3    117.647
24   3-Methylheptane             8   76   6   14   3.80806   3    118.925
25   4-Methylheptane             8   75   6   14   3.80806   3    117.709
26   3-Ethylhexane               8   72   7   14   3.84606   3    118.534
27   2,2-Dimethylhexane          8   71   5   18   3.56066   4    106.840
28   2,3-Dimethylhexane          8   70   7   16   3.68073   4    115.607
29   2,4-Dimethylhexane          8   71   6   16   3.66390   4    109.429
30   2,5-Dimethylhexane          8   74   5   16   3.62589   4    109.103
31   3,3-Dimethylhexane          8   67   7   18   3.62132   4    111.969
32   3,4-Dimethylhexane          8   68   8   16   3.71784   4    117.725
33   2-Methyl-3-ethylpentane     8   67   8   16   3.71784   4    115.650
34   3-Methyl-3-ethylpentane     8   64   9   16   3.68198   4    118.259
35   2,2,3-Trimethylpentane      8   63   8   20   3.48138   5    109.841
36   2,2,4-Trimethylpentane      8   66   5   20   3.41650   5    99.238
37   2,3,3-Trimethylpentane      8   62   9   20   3.50403   5    114.760
38   2,3,4-Trimethylpentane      8   65   8   18   3.55341   5    113.467
39   2,2,3,3-Tetramethylbutane   8   58   9   24   3.25000   6    106.470
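The integer parameters tabulated for the alkanes (N, w, p, f, and the terminal methyl count) can all be regenerated from a list of C-C bonds. The Python sketch below does this for 2-methylbutane as an illustration; the numbering (chain 1-2-3-4, methyl carbon 5 on atom 2) and all function names are assumptions made for the example, not the authors' procedure.

```python
from itertools import combinations

# C-C bonds of 2-methylbutane (assumed numbering, see lead-in).
BONDS = [(1, 2), (2, 3), (3, 4), (2, 5)]


def neighbors(bonds):
    """Adjacency sets for the hydrogen-suppressed graph."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distance(adj, src, dst):
    """Number of bonds on the shortest path from src to dst (breadth-first)."""
    frontier, seen, steps = {src}, {src}, 0
    while dst not in frontier:
        frontier = {n for atom in frontier for n in adj[atom]} - seen
        if not frontier:
            raise ValueError("atoms are not connected")
        seen |= frontier
        steps += 1
    return steps


def parameters(bonds):
    """Return N, w, p, f and T_m as defined in the text."""
    adj = neighbors(bonds)
    dists = [distance(adj, a, b) for a, b in combinations(sorted(adj), 2)]
    return {
        "N": len(adj),                                   # carbon atoms
        "w": sum(dists),                                 # Wiener index
        "p": sum(1 for d in dists if d == 3),            # pairs three bonds apart
        "f": sum(len(adj[a]) + len(adj[b]) - 2 for a, b in bonds),  # Platt index
        "T_m": sum(1 for atom in adj if len(adj[atom]) == 1),       # terminal methyls
    }


if __name__ == "__main__":
    print(parameters(BONDS))
```

For 2-methylbutane this gives N = 5, w = 18, p = 2, f = 8, and T_m = 3, in agreement with the corresponding row of Table 2.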
Why Does It Work?
Having seen the perhaps surprising effectiveness of relatively crude structural parameters in accounting for variations in a variety of bulk properties, it is natural to inquire into the physical origins of these correlations. That is, why do they work? At least part of the answer is that many molecular properties can be classified as either shape-dependent (constitutive, depending on the specific arrangement of atoms) or part-dependent (additive, resulting from a summation of individual contributions) (3).
For shape-dependent properties a crude picture is highly instructive. Consider two extreme cases, in which the molecules are either spheres or cylinders, as illustrated in Figure 3. It is apparent that, at least in the limit of perfect packing, there is more wasted space in the packing of spheres than in the packing of cylinders. A brief calculation shows that, neglecting surface corrections, the cylinders fill 78.5% of the available volume, whereas spheres in a cubic arrangement fill only 52.4%. Even the densest packing of spheres, hexagonal or cubic close packing, fills only 74.1% of the available volume (46), still less than cylinders. Thus we expect cylindrical molecules to exhibit higher densities than their spherical isomer counterparts. Conversely, the molar volumes should increase not only as the molecules contain more atoms, but within an isomer set, as they become more spherical. These are just the trends that are observed, as seen, for example, in the molar volumes of Table 1. The parameters above appear to reflect these variations in shape relatively well.
Figure 3. Packing of spheres compared to packing of cylinders. (Spheres: cubic packing = 52.4%, hcp/ccp = 74.1%. Cylinders: packing = 78.5%.)
Table 3. Aliphatic Alcohol Parameters and Observed Boiling Points
No.  Alcohol                   N_c  w     s_O  ¹χ        ¹χᵛ      T_m  bp (°C)
1    Methanol                  1    1     1    1.00000   0.4472   1    64.7
2    Ethanol                   2    4     3    1.41421   1.0233   1    78.3
3    1-Propanol                3    10    6    1.91421   1.5233   1    97.2
4    2-Propanol                3    9     5    1.73205   1.4130   2    82.3
5    1-Butanol                 4    20    10   2.41421   2.0233   1    117.7
6    2-Butanol                 4    18    8    2.27005   1.9509   2    99.6
7    2-Methyl-1-propanol       4    18    9    2.27005   1.8792   2    107.9
8    2-Methyl-2-propanol       4    16    7    2.00000   1.7236   3    82.4
9    1-Pentanol                5    35    15   2.91421   2.5233   1    137.8
10   2-Pentanol                5    32    12   2.77005   2.4509   2    119.0
11   3-Pentanol                5    31    11   2.80806   2.4888   2    115.3
12   2-Methyl-1-butanol        5    31    13   2.80806   2.4171   2    128.7
13   3-Methyl-1-butanol        5    32    14   2.77005   2.3792   2    131.2
14   2-Methyl-2-butanol        5    28    10   2.56066   2.2843   3    102.0
15   3-Methyl-2-butanol        5    29    11   2.64273   2.3237   3    111.5
16   2,2-Dimethyl-1-propanol   5    28    12   2.56066   2.1698   3    113.1
17   1-Hexanol                 6    56    21   3.41421   3.0233   1    157.0
18   2-Hexanol                 6    52    17   3.27005   2.9509   2    139.9
19   3-Hexanol                 6    50    15   3.30806   2.9888   2    135.4
20   2-Methyl-1-pentanol       6    50    18   3.30806   2.9171   2    148.0
21   3-Methyl-1-pentanol       6    50    19   3.30806   2.9171   2    152.4
22   4-Methyl-1-pentanol       6    52    20   3.27005   2.8792   2    151.8
23   2-Methyl-2-pentanol       6    46    14   3.06066   2.7843   3    121.4
24   3-Methyl-2-pentanol       6    46    15   3.18073   2.8616   3    134.2
25   4-Methyl-2-pentanol       6    48    16   3.12589   2.8068   3    131.7
26   2-Methyl-3-pentanol       6    46    14   3.18073   2.8616   3    126.5
27   3-Methyl-3-pentanol       6    44    13   3.12132   2.8450   3    122.4
28   2-Ethyl-1-butanol         6    48    17   3.34606   2.9550   2    146.5
29   2,2-Dimethyl-1-butanol    6    44    16   3.12132   2.7305   3    136.8
30   2,3-Dimethyl-1-butanol    6    46    17   3.18073   2.7899   3    149.0
31   3,3-Dimethyl-1-butanol    6    46    18   3.06066   2.6698   3    143.0
32   2,3-Dimethyl-2-butanol    6    42    13   2.94337   2.6671   4    118.6
33   3,3-Dimethyl-2-butanol    6    42    14   2.94337   2.6243   4    120.0
34   1-Heptanol                7    84    28   3.91421   3.5233   1    178.3
35   1-Octanol                 8    120   36   4.41421   4.0233   1    195.2
36   1-Nonanol                 9    165   45   4.91421   4.5233   1    213.1
37   1-Decanol                 10   220   55   5.41421   5.0233   1    230.2
Packing effectiveness also helps to explain variations in a number of other properties, although less directly. Properties such as boiling points, heats of vaporization, and critical temperatures are determined by intermolecular forces. For neutral molecules these forces consist of the dipolar van der Waals forces of attraction and, in certain cases, hydrogen bonding. The latter force inserts a complicating feature, the importance of which, however, normally diminishes as higher members of a series, such as the aliphatic alcohols, are considered. Van der Waals energies of interaction decrease with the inverse sixth power of separation, so that molecules in more densely packed arrangements are expected to be more strongly held together than those less densely packed. In many cases dispersion forces are dominant among the van der Waals interactions; Meyer et al. (47, 48) have shown this to be true even for acetone, in which orientation (permanent dipole) forces might be expected to be relatively strong.
Therefore, to the extent that the parameters employed successfully represent molecular shape and this in turn influences packing ability, these properties will be reasonably well accounted for. Unlike boiling temperatures, melting points (mp) are not well accounted for by the parameters described above. For alkanes, for example,
$$\mathrm{mp}\,(\mathrm{K}) = 190.0 + 9.1\,N_c + 4.3\,T_m \qquad (22)$$
$$n = 32, \quad r^2 = 0.247, \quad s = 32.7$$
This serves as a reminder that melting is a more subtle transition than boiling. Whereas boiling represents an almost complete disruption of intermolecular forces, melting entails a less severe transformation from the highly ordered crystal to a somewhat less ordered, but still condensed, liquid phase.
Regularities in certain other properties, the part-dependent, or additive properties, result from their direct dependence on the number of component atoms or bonds. Combustion of an alkane with m carbons
$$\mathrm{C}_m\mathrm{H}_{2m+2} + \tfrac{1}{2}(3m+1)\,\mathrm{O_2} \rightarrow (m+1)\,\mathrm{H_2O} + m\,\mathrm{CO_2}$$
involves breaking (m - 1) sp³-sp³ C-C bonds, (2m + 2) C-H bonds, and (3m + 1)/2 O=O bonds, and formation of 2(m + 1) O-H bonds and 2m C=O bonds. It is not surprising, therefore, that except for minor variations due to steric crowding or strain, the energies of such transformations follow m (or N_c) linearly. It is worth noting that heats of formation and atomization, which are normally derived at least in part from heats of combustion via Hess's law, are smaller numbers and are less accurately known as a consequence. This leads to a reduction in the accuracy of the fit for these properties.
Kier and Hall (3,49) have discussed the physical bases of these correlations, with special reference to the connectivity functions. They have concluded that the sum δᵛ + δ for a bonding atom is related to its volume, whereas the difference δᵛ - δ is related to its electronegativity (49). They cite Sanderson's principle of electronegativity equalization (50) as a justification for use of the form 1/(δ_iᵛ δ_jᵛ)^(1/2) for bond connectivity. Further developments of the connectivity index have recently been described and discussed (51, 52). The index ³χ_p may be associated with molecular flexibility (53), and ⁴χ_pc models ring substitutions in benzene derivatives (54). Edward (55) has examined the molar volumes, heats of vaporization, and solubilities of alkanes in water and concluded that connectivity indices successfully model these properties because the indices correlate with the numbers and types (primary, secondary, ...) of carbon atoms. He has emphasized the importance of including the average number of gauche arrangements (Z_g) present in the alkanes in accounting for steric influences on these properties (55,56).
Global indices alone, such as w_r and χ, will be most useful when the property or activity in question depends on nonspecific features of the molecules, as distinguished from specific stereochemical interactions (3,5). Anesthetic gases, for example, appear to represent such a case (5). The global indices are expected to be inappropriate when highly stereoselective interactions, such as lock-key fits of agents to receptors, determine the activity of interest.
Student Use
Students in the third quarter of a junior/senior level physical chemistry course were assigned individual QSPR or QSAR projects employing a computer-based statistical package. Students were asked to choose a property and class of compounds of interest, a condition being that data for a sufficient number of compounds (usually 20 or more) be available to allow meaningful statistical analysis. Properties chosen included physical and chemical properties and biological activities. (Useful secondary data sources include refs 3, 57-59.) Individual student accounts were opened on the university's INTERACT system allowing access via terminals to the Statistical Analysis System (SAS) (60) on the university's mainframe computer. (Other systems or corresponding microcomputer-based packages could, of course, be substituted for this.) A class lecture was devoted to structural indices, and short descriptions of the use of INTERACT and SAS were provided, with handouts. Students were asked to present brief written and oral reports on their results.
The purposes of the project were several. A major aim was to review the various properties studied during the year: their natures, methods of measurement, units, accuracies, etc. A second aim was to emphasize the intrinsic relationship between molecular structure and the physical, chemical, and biological properties of substances. A third goal was to introduce students to statistical methods of analysis and to provide "hands-on" experience with an available computer package for this purpose. Other benefits included (1) a break from the routine lecture/text study of material, (2) generation of a useful tool for the analysis of results obtained in the physical chemistry laboratory, and (3) experience in the presentation of oral and written reports on "original" results. After some initial misgivings on the parts of a few students unfamiliar with computers, student reactions were quite positive. Clearly, most enjoyed the opportunity to exercise problem-solving skills, work on an individual project, design new indices, acquire some computer experience, and learn to use the statistical package.
Mark Twain attributed to the British statesman Benjamin Disraeli the statement that there are "three kinds of lies: lies, damned lies, and statistics." There are, of course, perils in the indiscriminate use of statistics (61, 62), and students should be suitably cautioned. The most common fault is the use of too few observations or too many parameters. If enough parameters are screened even random numbers show correlations (62). Use of redundant parameters is another common pitfall for beginners. Students should also be warned that correlation does not necessarily imply causation. The stock market has tended to rise when women's skirts have become shorter and in years when teams from the "old" NFL have won the Super Bowl, but it would be risky to attribute this to a causal relationship! Nonetheless, statistical techniques, properly used, can be a powerful inferential aid.
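The warning about screening many parameters can be made concrete with a small simulation. The Python sketch below is an illustration only, not part of the class project described above: it draws a random "property" for a handful of compounds, screens a number of equally random "descriptors", and reports the best r² found. The function names, sample sizes, and seed are arbitrary assumptions.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_chance_r2(n_obs, n_candidates, seed=0):
    """Largest r**2 between a random property and n_candidates random descriptors."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, pearson_r(x, y) ** 2)
    return best


if __name__ == "__main__":
    for n_candidates in (1, 10, 100, 1000):
        print(n_candidates, round(best_chance_r2(10, n_candidates), 3))
```

Because every run with the same seed screens the same descriptors in the same order, the best r² can only stay the same or grow as more candidates are examined, even though none of the descriptors has any real connection to the property.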
Conclusions
A great deal of information about molecular properties can be obtained merely from the way in which the atoms of molecules are connected, without resort to more elaborate aspects of molecular structure. The study of structure-property relationships has become an important area of chemistry and pharmacology/toxicology in recent years. With the development of new and more sophisticated indices and methods, this field can be expected to be of even greater importance in the future. For these reasons, study of molecular structure-property relations should be a valuable addition to the undergraduate chemistry curriculum. Exercises such as those illustrated can serve as an end in themselves, reinforcing and/or introducing many useful concepts. They can also serve as a prelude to more sophisticated laboratory "computer experiments", for example, in molecular mechanics (63) or molecular orbital theory (64). Finally, it should be noted that a number of important related areas, such as the classic approaches of Free and Wilson (65) and of Hansch et al. (66-68), and the more recent use of information indices (69,70), have been omitted from the present discussion. Interested readers are directed to the references cited, as well as to a very recent overview by Rouvray (71).
Appendix: Chemical Graph Theory
Graph theory is a branch of mathematics drawing upon ideas from topology and combinatorics (72, 73). It was first introduced in 1736 by the Swiss mathematician L. Euler in a solution of the famous "Königsberg bridge problem". Kirchhoff applied it in 1847 to an analysis of the currents in electrical circuits. The first chemical use of graph theory was by Cayley in 1857, in his enumeration of the isomers of the alkanes. Present applications of graph theory range from electrical network design, coding theory, and computer storage problems in engineering to mapping problems in geography and the analysis of logistics and military strategy.
A graph is a representation consisting of points, or vertices, connected by lines, or edges. The degree of a vertex is the number of edges ending on the vertex. For example,
[Graph with lettered vertices]
is a graph, and vertex d has degree 3. In some graphs, called digraphs, the edges have directions, as do one-way streets. However, we shall only be concerned with undirected graphs. A path is a sequence of edges connecting two vertices; for example, the edges (ab, bd, de) form a path of length three between vertices a and e in the graph above. In a connected graph all vertices in the graph are connected by paths. In chemical graph theory vertices represent atoms and edges represent bonds. The chemical graph of propane might be
[Structural formula of propane with all hydrogens shown, drawn beside the equivalent graph]
More commonly, hydrogen atoms are ignored, and the hydrogen-suppressed graph
1 - 2 - 3
is used. The adjacency matrix A shows how the atoms are connected by including a 1 (or some other suitable metric) to indicate bonded atoms; for example, for propane,
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
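An adjacency matrix of this kind is easy to build from a bond list, and its row sums give the vertex degrees defined earlier in the Appendix. The Python sketch below is a minimal illustration for the hydrogen-suppressed graph of propane; the function name and the 1-based atom numbering are assumptions made for the example.

```python
def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix with a 1 wherever two atoms are bonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i - 1][j - 1] = 1  # atoms are numbered from 1 in the text
        matrix[j - 1][i - 1] = 1
    return matrix


PROPANE = adjacency_matrix(3, [(1, 2), (2, 3)])
DEGREES = [sum(row) for row in PROPANE]  # number of edges ending on each vertex

if __name__ == "__main__":
    for row in PROPANE:
        print(row)
    print("degrees:", DEGREES)
```

For a hydrocarbon skeleton the degrees obtained this way are the same valences δ used in the connectivity index.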
More complete discussions of graph theory (72, 73) and chemical graph theory (2,5,6) can be found elsewhere.
Acknowledgment
The authors thank M. Randić, N. Trinajstić, and L. B. Kier for helpful discussions and correspondence on this topic.
Literature Cited
1. Rouvray, D. H. Am. Sci. 1973, 61, 729.
2. Balaban, A. T., Ed. Chemical Applications of Graph Theory; Academic: New York, 1976.
3. Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic: New York, 1976.
4. Sabljić, A.; Trinajstić, N. Acta Pharm. Jugosl. 1981, 31, 189.
5. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983 (two vols).
6. King, R. B., Ed. Chemical Applications of Topology and Graph Theory. Amsterdam,