Symposium on Graph Theory in Chemistry
A Graph-Theoretical Approach to Structure-Property Relationships
Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia
Nenad Trinajstić
The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia
A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its properties (1). This was pointed out in the middle of the last century by Crum Brown and Fraser (2), who had also devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).

A Topological Model of Matter
The origin of the structure-property concept can be traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6). (His major work was the theory of a single law of forces.) By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances. In this way the Bošković model may be considered the forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest importance in chemistry, was that substances have different properties because they have different structures. This idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).
Table 2. List of Properties that Are Desirable for Topological Indices as Proposed by Randić (18)
 1. Direct structural interpretation
 2. Good correlation with at least one molecular property
 3. Good discrimination of isomers
 4. Locally defined
 5. Generalizable
 6. Linearly independent
 7. Simplicity
 8. Not based on physical or chemical properties
 9. Not trivially related to other indices
10. Efficiency of construction
11. Based on familiar structural concepts
12. Correct size dependence
13. Gradual change with gradual change in structures
QSPR
The structure-property relationships quantify the connection between the structure and properties of [...] 0.99, while s depends on the property. For example, for boiling points, s < 5 °C. Therefore, Step 3 is a central step in the design of the structure-property models.
Step 4. Predictions are made for the values of the molecular property for species that are not part of the training set via the obtained initial QSPR model. The unknown molecules are structurally related to the initial set of compounds.
Step 5. The predictions are tested with unknown molecules by experimental determination of the predicted properties. This step is rather involved because it requires acquiring or preparing the test molecules.
Step 6. If the tests support the predictions, one presents the QSPR model in its final form with all necessary statistical characteristics.
If the tests do not support the initial QSPR model, it must be revised and the procedure repeated. The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with programmed properties (12).

Table 11. Continued

Alkane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
2,2,3,3,4-pentamethylpentane
2,2,3,3-tetramethylhexane
3-ethyl-2,2,3-trimethylpentane
3,3,4,4-tetramethylhexane
2,2,3,4,4-pentamethylpentane
2,2,3,4-tetramethylhexane
3-ethyl-2,2,4-trimethylpentane
2,3,4,4-tetramethylhexane
2,2,3,5-tetramethylhexane
2,2,3-trimethylheptane
2,2-dimethyl-3-ethylhexane
3,3,4-trimethylheptane
3,3-dimethyl-4-ethylhexane
2,3,3,4-tetramethylhexane
3,4,4-trimethylheptane
3,4-dimethyl-3-ethylhexane
3-ethyl-2,3,4-trimethylpentane
2,3,3,5-tetramethylhexane
2,3,3-trimethylheptane
2,3-dimethyl-3-ethylhexane
3,3-diethyl-2-methylpentane
2,2,4,4-tetramethylhexane
2,2,5-trimethylheptane
2,5,5-trimethylheptane
2,2,6-trimethylheptane
2,2-dimethyloctane
3,3-dimethyloctane
4,4-dimethyloctane
3-ethyl-3-methylheptane
4-ethyl-4-methylheptane
3,3-diethylhexane
2,3,4,5-tetramethylhexane          121    58    4.4641    3.8140    436    13.9933    161

Volume 69 Number 9 September 1992 707

Table 11. Continued

Alkane                             J
2,3,4-trimethylheptane             3.5833
2,3-dimethyl-4-ethylhexane         3.7561
2,3-dimethyl-4-ethylhexane         3.7561
2,4-dimethyl-3-ethylhexane         3.7979
3,4,5-trimethylheptane             3.6854
2,4-dimethyl-3-isopropylpentane    3.9835
3-isopropyl-2-methylhexane         3.7280
2,3,5-trimethylheptane             3.4617
2,5-dimethyl-3-ethylhexane         3.6033
2,4,5-trimethylheptane             3.5027
2,3,6-trimethylheptane             3.3014
2,3-dimethyloctane                 3.1296
3-ethyl-2-methylheptane            3.3978
3,4-dimethyloctane                 3.3088
4-isopropylheptane                 3.4999
4-ethyl-3-methylheptane            3.5637
4,5-dimethyloctane                 3.3759
3-ethyl-4-methylheptane            3.5299
3,4-diethylhexane                  3.6982
2,4,6-trimethylheptane             3.3374
2,4-dimethyloctane                 3.1600
4-ethyl-2-methylheptane            3.3908
3,5-dimethyloctane                 3.2686
3-ethyl-5-methylheptane            3.4123
2,5-dimethyloctane                 3.1244
5-ethyl-2-methylheptane            3.2555
3,6-dimethyloctane                 3.1682
2,6-dimethyloctane                 3.0333
2,7-dimethyloctane                 2.9095
2-methylnonane                     2.7732
3-methylnonane                     2.8862
4-methylnonane                     2.9680
3-ethyloctane                      3.0869
5-methylnonane                     2.9984
4-ethyloctane                      3.2055
4-propylheptane                    3.2951
decane                             2.6476
decane, further entries: 165    89    4.9142
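The index values printed in Table 11 can be checked directly from the carbon skeletons. The following Python sketch assumes the hydrogen-suppressed graph of each alkane and the standard literature definitions of the Wiener number W, the Hosoya index Z, the Randić connectivity index χ, and the Balaban index J. It also assumes that MTI denotes Schultz's molecular topological index. All function names and the vertex numbering are hypothetical choices made for illustration, and this is not the authors' own program.

```python
from collections import deque
from math import sqrt

def adjacency(edges):
    """Build an adjacency map from a list of (u, v) carbon-carbon bonds."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distance_sums(adj):
    """For every vertex, the sum of topological distances to all others (BFS)."""
    sums = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sums[start] = sum(dist.values())
    return sums

def wiener(edges):
    """Wiener number W: half the sum of all distance sums."""
    return sum(distance_sums(adjacency(edges)).values()) // 2

def hosoya_z(edges):
    """Hosoya index Z: number of sets of mutually non-adjacent edges."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    without_edge = hosoya_z(rest)
    with_edge = hosoya_z([e for e in rest if u not in e and v not in e])
    return without_edge + with_edge

def randic(edges):
    """First-order connectivity index: sum over edges of (d_u * d_v)^-0.5."""
    adj = adjacency(edges)
    return sum(1.0 / sqrt(len(adj[u]) * len(adj[v])) for u, v in edges)

def balaban_j(edges):
    """Balaban index J for a connected graph (mu = 0 for acyclic alkanes)."""
    adj = adjacency(edges)
    s = distance_sums(adj)
    q = len(edges)
    mu = q - len(adj) + 1
    return q / (mu + 1) * sum(1.0 / sqrt(s[u] * s[v]) for u, v in edges)

def schultz_mti(edges):
    """Schultz index: sum over vertices of degree * (degree + distance sum)."""
    adj = adjacency(edges)
    s = distance_sums(adj)
    return sum(len(adj[v]) * (len(adj[v]) + s[v]) for v in adj)

# Carbon skeletons (vertex labels are arbitrary).
decane = [(i, i + 1) for i in range(1, 10)]
tetramethylhexane_2345 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                          (2, 7), (3, 8), (4, 9), (5, 10)]

for name, g in [("decane", decane),
                ("2,3,4,5-tetramethylhexane", tetramethylhexane_2345)]:
    print(name, wiener(g), hosoya_z(g), round(randic(g), 4),
          round(balaban_j(g), 4), schultz_mti(g))
```

Under these assumptions the decane skeleton gives 165, 89, 4.9142, and 2.6476 for W, Z, χ, and J, and the 2,3,4,5-tetramethylhexane skeleton gives 121, 58, 4.4641, 3.8140, and 436 for W, Z, χ, J, and MTI. These agree with the entries printed beside those two alkanes in Table 11.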
Journal of Chemical Education

An Instructive Example
We will apply the procedure from the preceding section to give an instructive example of the design of the QSPR model for predicting the boiling points of alkanes. As the initial set we will consider alkanes with up to 8 carbon atoms (40 molecules).

Step 1
The boiling points (°C) of the alkanes are taken from the CRC Handbook of Chemistry and Physics (49) and Beilstein (50).

Step 2
We will consider at this stage all six topological indices discussed in this report.

Step 3
The following structure-property models are the most successful for each index considered:

$\mathrm{bp} = 77.93\,(\pm[\ldots])\,[\ldots]^{\,0.30899\,(\pm 0.0137)} - (3.35 \pm 1.02) \times 10^{[\ldots]}\,[\ldots] - 164.24\,(\pm 4.99)$    MTI    (13)
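The statistical work of Step 3 comes down to fitting a functional form and judging it by the correlation coefficient r and the standard error s. The Python sketch below illustrates only that bookkeeping. It assumes a plain straight line in χ, which is not one of the article's models, and it uses rounded literature boiling points of the unbranched alkanes from ethane to octane rather than the 40-molecule training set. The thresholds r > 0.995 and s < 5 °C are those stated in the Conclusions. The function names are hypothetical.

```python
from math import sqrt

def chi_n_alkane(n):
    """Randic index of the unbranched alkane with n >= 2 carbon atoms."""
    return 1.0 if n == 2 else (n - 3) / 2 + sqrt(2)

def linear_fit(x, y):
    """Least-squares line y = a + b*x; returns (b, a, r, s).

    r is the correlation coefficient and s the standard error of the
    estimate with n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    return slope, intercept, r, s

def meets_criteria(r, s, r_min=0.995, s_max=5.0):
    """Accuracy test using the limits quoted in the Conclusions."""
    return r > r_min and s < s_max

# Illustrative data only: ethane through octane, boiling points in deg C
# (rounded literature values, not the article's training set).
carbons = [2, 3, 4, 5, 6, 7, 8]
bp = [-88.6, -42.1, -0.5, 36.1, 68.7, 98.4, 125.7]
chi = [chi_n_alkane(n) for n in carbons]

slope, intercept, r, s = linear_fit(chi, bp)
print(f"bp = {intercept:.2f} + {slope:.2f} * chi, r = {r:.4f}, s = {s:.2f}")
print("meets r and s limits:", meets_criteria(r, s))
```

For this toy data set the straight line correlates strongly with χ but falls short of the s < 5 °C limit. The curvature of boiling point against χ is the likely reason a more flexible functional form is needed.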
Table 12. The Predicted Values of Boiling Points (°C) of Nonanes

                                   Predicted boiling point
Nonane                             eq 14      eq 15
2,2,3,3-tetramethylpentane         119.26     119.40
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane

Figure 3. A flow diagram of the steps involved in the design of a QSPR model. 1: Source of experimental data. 2: Selection of the topological index. 3: Statistical work and setting up the QSPR model. 4: Predictions. 5: Testing the predictions. 6: The final form of the QSPR model. S: Tests confirmed the initial model. The model appears to be satisfactory for further work. NS: Tests rejected the initial model as not satisfactory. The model must be revised and the procedure repeated until a satisfactory model is obtained.
The most accurate models are those based on ln Z (eq 14) and χ (eq 15). They will be used in the next step.

Step 4
We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).

Step 5
We compare the predicted and experimental values of the nonane boiling points (see Table 13). Both models have problems with some members of the nonane series. However, when Step 3 was repeated using the boiling points of all alkanes with up to 9 carbon atoms, the QSPR models based on ln Z and χ did not improve. A slight improvement occurred only when a biparametric model (with χ and N, where N is the number of carbon atoms in the alkane) was used. This model is given by eq 19.

Step 6
The procedure may be repeated, and we will eventually arrive at the best possible QSPR model for predicting the boiling points of alkanes.
All three models, expressed as eqs 14, 15, and 19, may serve as reliable models for predicting the alkane boiling points. Plots of bp vs ln Z, bp vs χ, and bp vs Nχ and the accompanying statistical data are given, respectively, in Figures 4-6. The boiling points of alkanes have been predicted many times (8, 13, 15, 30-37, 40, 51). Although most of the QSPR models produced are very accurate (r > 0.998, s < 2 °C), they suffer from several shortcomings.
i. Methane was not considered. In some cases other lighter alkanes, such as ethane and propane, were also eliminated from the study.
ii. Models were built for a limited set of alkanes, usually for C4-C7 families.
iii. The complexity of some of the accurate QSPR models in the literature is forbidding. For example, one of the most accurate QSPR models for predicting boiling points of alkanes is the following (40). (All alkanes with up to 9 carbon atoms except methane have been considered.)
Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes

Nonane                             (bp)exp    Model (14)    Model (15)
2,2,3,3-tetramethylpentane
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
Figure 4. A plot of bp vs ln Z for the first 40 alkanes.
Figure 5. A plot of bp vs χ for the first 40 alkanes.
Figure 6. A plot of bp vs Nχ for the first 75 alkanes.
Figure 7. Examples of a path (3rd order), a cluster (3rd order), and a path-cluster (4th order) for a tree T corresponding to 3-methylpentane.

The [...] in eq 20 are defined as follows. The extended connectivity index is

${}^{m}\chi = \sum \left[ d(1)\, d(2) \cdots d(m+1) \right]^{-0.5}$    (21)

where m represents the order of possible fragments. When m = 1, fragments are edges, which lead to the first-order connectivity index ${}^{1}\chi$.

The zero-order connectivity index is

${}^{0}\chi = n_1 + n_2/\sqrt{2} + n_3/\sqrt{3} + n_4/2$
where $n_1$, $n_2$, $n_3$, and $n_4$ are the numbers of vertices with valencies 1, 2, 3, and 4, respectively. Connectivity indices ${}^{m}\chi_t$ of order m and type t can be obtained by summing analogous terms over subgraphs involving paths (t = p), clusters (t = c), or path-cluster (t = pc) combinations of m edges. Examples of a path, a cluster, and a path-cluster are given in Figure 7.

To conclude this section we stress that there is no simple QSPR model for predicting boiling points over a wide range of alkanes. However, if we limit ourselves to a simple family of alkanes (especially with fewer than 10 carbon atoms), then simple accurate models are possible (34).

Conclusions
In this report we presented a strategy for designing quantitative structure-property relationships based on topological indices. The instructive example was directed to the design of the structure-property model for predicting the boiling points of alkanes. Six selected topological indices were tested. The most accurate QSPR models for alkane boiling points are based on ln Z, χ, and Nχ. The accuracy of the model was judged according to the correlation coefficient and the standard error. The limits for accurate models were set at r > 0.995 and s < 5 °C. We conclude that there is no simple single-parameter QSPR model for predicting the boiling points over a wide range of alkanes, owing to the great diversity among experimental values. Multivariate regression models appear to be very accurate because of the variety of parameters involved in the correlation. Each of these parameters takes care of a certain structural detail of a large alkane. When all diverse structural features of alkanes are considered, the model usually gives extremely good agreement between the experimental and calculated boiling points.
Acknowledgement
We are thankful to the Ministry of Science, Technology, and Informatics of the Republic of Croatia for support.
3. Lipnick, R. L. Environ. Toxicol. Chem. 1989, 8, 1.
4. Rooney, R. J. Chem. Educ. 1985, 62, 846.
5. Dadić, Ž. Rugjer Bošković; Školska knjiga: Zagreb, 1987. This is a bilingual edition: Croatian and English.
6. Boscovich, R. J. Theoria philosophiae naturalis redacta ad unicam legem virium in natura existentium; Remondini: Venice, 1763. The English translation is also available: The Theory of Natural Philosophy; MIT: Cambridge, MA, 1966.
7. Davy, H. Elements of Chemical Philosophy; London, 1812.
8. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. II, Chapter [...].
11. Rouvray, D. H. In Chemical Applications of Topology and Graph Theory; King, R. B., Ed.; Elsevier: Amsterdam, 1983; p 159.
12. Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 337.
13. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 575.
14. Randić, M. J. Math. Chem. 1990, 4, 337.
15. Hosoya, H. Bull. Chem. Soc. Jpn. 1971, 44, 2332.
16. Trinajstić, N. Chemical Graph Theory, 2nd revised ed.; CRC: Boca Raton, FL, 1992; Chapter 3.
17. Rouvray, D. H. J. Mol. Struct. (Theochem) 1989, 185, 187.
18. Randić, M. J. Math. Chem. 1991, 7, 155.
19. Bonchev, D.; Trinajstić, N. J. Chem. Phys. 1977, 67, 4517.
20. Balaban, A. T.; Quintas, L. V. Math. Chem. (Mülheim/Ruhr) 1983, 14, 213.
21. Müller, W. R.; Szymanski, K.; Knop, J. V.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1990, 30, 160.
22. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., in press.
23. Szymanski, K.; Müller, W. R.; Knop, J. V.; Trinajstić, N. Int. J. Quantum Chem.: Quantum Chem. Symp. 1986, 20, 173.
24. Harary, F. Graph Theory; Addison-Wesley: Reading, MA, 1971; 2nd printing.
25. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. I.
26. Chartrand, G. Graphs as Mathematical Models; Prindle, Weber, and Schmidt: Boston, MA, 1977.
27. Trinajstić, N. In MATH/CHEM/COMP 1987; Lacher, R. C., Ed.; Elsevier: Amsterdam, 1988; p 83.
28. Sylvester, J. J. Nature 1878, 17, 284.
29. Roberts, F. S. Discrete Mathematical Models; Prentice-Hall: Englewood Cliffs, NJ, 1976; p 56.
30. Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
31. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609.
32. Razinger, M.; Chrétien, J. R.; Dubois, J. E. J. Chem. Inf. Comput. Sci. 1985, 25, 23.
33. Rouvray, D. H. Sci. Am. 1986, 254, 40.
34. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575.
35. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Wiley: New York, 1986.
36. Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
37. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227.
38. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., submitted for publication.
39. Randić, M.; Jerman-Blažič, B.; Grossman, S. C.; Rouvray, D. H. Math. Comput. Modelling 1986, 8, 571.
40. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.
41. Nizhnii, S. V.; Epshtein, N. A. Russ. Chem. Rev. 1978, 47, 383.
42. Hol, W. G. J. Angew. Chem., Int. Ed. Engl. 1986, 25, 767.
43. Basak, S. C.; Niemi, G. J.; Veith, G. D. In Computational Chemical Graph Theory; Rouvray, D. H., Ed.; Nova: New York, 1990; p 235.
44. Testa, B.; Mayer, J. M. Acta Pharm. Jugosl. 1990, 40, 315.
45. Karcher, W.; Devillers, J. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W.; Devillers, J., Eds.; Kluwer: Dordrecht, 1990; p 1.
46. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066.
47. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238.
48. Bonchev, D.; Mekenyan, O. J. Math. Chem., in press.
49. Weast, R. C. CRC Handbook of Chemistry and Physics, 67th ed., 3rd printing; CRC: Boca Raton, FL, 1987.
50. Beilsteins Handbuch der Organischen Chemie.
51. Filip, P. A.; Balaban, T.-S.; Balaban, A. T. J. Math. Chem. 1987, 1, 61.