
Energy & Fuels 2005, 19, 922-935

How Chemical Structure Determines Physical, Chemical, and Technological Properties: An Overview Illustrating the Potential of Quantitative Structure-Property Relationships for Fuels Science

Alan R. Katritzky* and Dan C. Fara

Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611

Received March 5, 2004

A brief summary of quantitative structure-property relationship methodology, together with an explanation of the approach using theoretical molecular descriptors to study diverse physical, chemical, and technological properties of organic compounds, is given. Studies of several properties of importance to fuels science are described, including (i) physical properties of single molecular species (e.g., boiling points, melting points, refractive index), (ii) properties involving interactions between different molecules (e.g., critical micelle concentration, flash points, solvent effects, free-radical polymerization, vulcanization of rubber), (iii) solution properties (e.g., solvent polarity, solubility, chromatography), and (iv) biological properties (e.g., toxicity). Future objectives of interest to fuel scientists are discussed.

1. Introduction

Chemical structure is of paramount importance, not only to chemists but also to all scientists and to humanity in general. As soon as a chemical structure is written, we have defined all the properties of the compound in question: physical, chemical, biological, and technological. Of great importance is that chemical structure is invariant: it is not affected by geography or history. Chemical structures therefore offer a unique, lasting, and definitive means of representing each compound.

Trying to deduce the properties of a compound from its chemical structure has been an ongoing effort. Quantum chemistry has made enormous advances, particularly in the past 25 years. Using the semiempirical and ab initio methods now available, we can deduce a great deal about a compound, including its geometry, its charge distribution, and the way in which it interacts with UV-vis and IR radiation, and thus its spectral characteristics. Further, we can predict, for example, many aspects of the NMR spectrum of a compound, together with many other properties. However, despite these major advances, a very large number of properties still cannot be satisfactorily predicted using quantum-chemical methods. While this picture will undoubtedly change for the better, it seems likely that for the foreseeable future we will need other methods to correlate and predict many properties: not only biological and technological properties, but also many physical and chemical ones. The other method most frequently applied is the analysis of “quantitative structure-property/activity relationships” (QSPRs/QSARs). This approach takes a set of structures for which a property has been measured and relates the quantitative values of that property to the chemical structures of the compounds.

The concept of QSPRs dates back more than a century. In 1884 Mills developed a QSPR for predicting the melting points and boiling points of homologous series.1 Soon after, similar pioneering work appeared on QSARs between the potency of local anesthetics and the oil/water partition coefficient,2 and between narcosis and chain length.3 In 1925, Langmuir proposed linking intermolecular interactions in the liquid state to the surface energy.4 The first theoretical whole-molecule descriptors, the Wiener index5 and the Platt number,6 were proposed in 1947 to model the boiling points of hydrocarbons. Seminal contributions to the area were made by Hammett7,8 and Taft9-12 with their development of linear free energy relationships (LFERs). Significant progress in QSAR methodology was made in the 1960s by Hansch and Fujita,13 who developed models connecting biological activities with the hydrophobic, electronic, and steric properties of compounds,

* To whom correspondence should be addressed. Phone: (352) 392-0554. Fax: (352) 392-9199.

(1) Mills, E. J. Philos. Mag. 1884, 17, 173.
(2) Meyer, H. Arch. Exp. Pathol. Pharmakol. 1899, 42, 109.
(3) Overton, E. Studien über die Narkose zugleich ein Beitrag zur allgemeinen Pharmacologie; Verlag Gustav Fischer: Jena, Germany, 1901; p 141.
(4) Langmuir, I. Colloid Symp. Monogr. 1925, 3, 48.
(5) Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
(6) Platt, J. R. J. Chem. Phys. 1947, 15, 419.
(7) Hammett, L. P. Chem. Rev. 1935, 17, 125.
(8) Hammett, L. P. Physical Organic Chemistry; McGraw-Hill: New York, 1940.
(9) Taft, R. W. J. Am. Chem. Soc. 1952, 74, 2729.
(10) Taft, R. W. J. Am. Chem. Soc. 1952, 74, 3120.
(11) Taft, R. W. J. Am. Chem. Soc. 1953, 75, 4231.
(12) Taft, R. W. J. Am. Chem. Soc. 1953, 75, 4538.
(13) Hansch, C.; Fujita, T. J. Am. Chem. Soc. 1964, 86, 1616.

10.1021/ef040033q CCC: $30.25 © 2005 American Chemical Society Published on Web 04/21/2005

Potential of QSPR for Fuels Science

Energy & Fuels, Vol. 19, No. 3, 2005 923

Figure 1. Molecular descriptors.

and by Free and Wilson in their models of additive group contributions to biological activity.14 In the past 40 years QSAR methodology has expanded exponentially and has become indispensable for productive applications in pharmaceutical chemistry and in computer-assisted drug design.15-21 Until 1970, QSPR analysis, that is, the quantitative structure-related analysis of physicochemical properties, was essentially limited to analytical chemistry. The past three decades, however, have seen much effort put into developing the theoretical basis of QSPRs, with important contributions from the groups of Abraham,22,23 Balaban,24 Hilal,25 Jurs,26 Katritzky and Karelson,27 Kier and Hall,28 Politzer,29 Randic,30 Trinajstic,31 and many others. The development of methodology was

(14) Free, S. M.; Wilson, J. W. J. Med. Chem. 1964, 7, 395.
(15) Martin, Y. C. Perspect. Drug Discovery 1998, 12, 3.
(16) Norinder, U. Perspect. Drug Discovery 1998, 12, 25.
(17) Maddalena, D. J. Expert Opin. Ther. Pat. 1998, 8, 249.
(18) Kubinyi, H. Drug Discovery Today 1997, 2, 538.
(19) Hansch, C.; Fujita, T. Status of QSAR at the End of the Twentieth Century. In Classical and Three-Dimensional QSAR in Agrochemistry; Hansch, C., Fujita, T., Eds.; American Chemical Society: Washington, DC, 1995; pp 1-12.
(20) Hansch, C.; Leo, A. Exploring QSAR, Fundamentals and Applications in Chemistry and Biology; American Chemical Society: Washington, DC, 1995.
(21) Katritzky, A. R.; Fara, D. C.; Petrukhin, R. O.; Tatham, D. B.; Maran, U.; Lomaka, A.; Karelson, M. Curr. Top. Med. Chem. 2002, 2 (12), 1333.
(22) Abraham, M. H. New Solute Descriptors for Linear Free Energy Relationships and Quantitative Structure-Activity Relationships. In Quantitative Treatments of Solute/Solvent Interactions; Politzer, P., Murray, J. S., Eds.; Elsevier: Amsterdam, 1994; pp 83-133.
(23) Abraham, M. H.; Chadha, H. S.; Dixon, J. P.; Rafols, C.; Treiner, C. J. Chem. Soc., Perkin Trans. 2 1995, 887.
(24) Balaban, A. T. J. Chem. Inf. Comput. Sci. 1997, 37, 645.
(25) Hilal, S. H.; Carreira, L. A.; Karickhoff, S. W. Estimation of Chemical Reactivity Parameter and Physical Properties of Organic Molecules Using SPARC. In Quantitative Treatments of Solute/Solvent Interactions; Politzer, P., Murray, J. S., Eds.; Elsevier: Amsterdam, 1994; pp 291-353.
(26) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. Computer-Assisted Studies of Chemical Structure and Biological Function; John Wiley & Sons: New York, 1979.
(27) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Rev. 1996, 96 (3), 1027.
(28) Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; John Wiley & Sons: New York, 1986.

also supported by the simultaneous development of molecular-structure-based descriptors32,33 that allow an increasingly precise description of molecules. QSPR analysis is now a well-established and highly respected technique for correlating diverse simple and complex physicochemical properties of a compound with its molecular structure through a variety of descriptors. The basic strategy of QSPR analysis is to find optimum quantitative relationships, which can then be used to predict the properties of molecular structures. Once a reliable equation has been obtained, it can be used to predict that same property for other structures not yet measured or even not yet prepared. There are certain, rather obvious limitations to its use: (i) the family of compounds used to derive the QSPR/QSAR (the “training set”) should be chemically similar, and (ii) realistic predictions can only be made for compounds that are chemically related to some of those from which the QSPR/QSAR model was derived; i.e., predictions should be interpolations or short extrapolations. At present, a major limitation of the QSPR/QSAR approach is that mixtures cannot easily be modeled.

Equations set up in this way utilize what are known as “descriptors” of the chemical structures. Strictly, a descriptor is any parameter that can be defined quantitatively from a chemical structure alone. Molecular descriptors can be divided into various classes, as shown in Figure 1. The simplest are the constitutional descriptors, derived from the atomic composition of the compound. Topological descriptors describe how the individual atoms of a compound are bonded to each other. Geometric and electrostatic descriptors relate to the geometry and the charge distribution, respectively. Finally, there is a very large number of quantum-chemical descriptors obtained by quantum mechanics from the structure.

(29) Murray, J. S.; Politzer, P. A General Interaction Properties Function (GIPF): An Approach to Understanding and Predicting Molecular Interactions. In Quantitative Treatments of Solute/Solvent Interactions; Politzer, P., Murray, J. S., Eds.; Elsevier: Amsterdam, 1994; pp 243-289.
(30) Randic, M.; Razinger, M. On the Characterization of Three-Dimensional Molecular Structure. In From Chemical Topology to Three-Dimensional Geometry; Balaban, A. T., Ed.; Plenum Press: New York, 1996; pp 159-236.
(31) Lucic, B.; Trinajstic, N. J. Chem. Inf. Comput. Sci. 1999, 39, 121.
(32) Karelson, M. Molecular Descriptors in QSAR/QSPR; John Wiley & Sons: New York, 2000.
(33) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, Germany, 2000.

Katritzky and Fara

Figure 2. QSPR and QSAR models derived with CODESSA software.

QSPR methodology has become more feasible and practical with the development of new software tools, which allow chemists to elucidate and understand how molecular structure influences properties. Very importantly, this helps them to predict and prepare structures with optimum properties. The software is now also more amenable to chemical and physical interpretation. There are tremendous opportunities for further developments in purely structure-based molecular descriptors32,33 for QSAR models and in the application of quantitative structure-activity/property relationships (QSARs/QSPRs) to the prediction of physicochemical properties. In the past 15 years our group at the University of Florida, in close collaboration with the group of Professor Mati Karelson at the University of Tartu, Estonia, has developed multipurpose statistical analysis software in the form of the CODESSA (Comprehensive Descriptors for Structure and Statistical Analysis) program, more recently updated as the CODESSA PRO program.34 A selection of the properties that we have modeled with CODESSA or CODESSA PRO is shown in Figure 2.

A flowchart of a typical QSAR/QSPR treatment is shown in Figure 3. For a satisfactory treatment, it is essential that good-quality input data are available in the form of a set of structures and quantitative measurements of the property, measured under similar conditions with satisfactory reproducibility and accuracy. The first stage of the treatment is the preparation of the input data using a molecular editor or by transferring structures from a chemical database.
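As an illustration of a topological descriptor, the sketch below computes the Wiener index (ref 5), i.e., the sum of bond-count distances between all pairs of non-hydrogen atoms, from a hydrogen-suppressed molecular graph. The dictionary-based graph format and the function name are illustrative assumptions; they are not the input format or interface of CODESSA PRO.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path (bond-count) distances over all unordered atom pairs.

    adjacency: dict mapping each heavy-atom label to the labels of its bonded
    neighbours (hydrogen-suppressed graph). Assumes a single connected molecule.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in dist:
                    dist[nbr] = dist[atom] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    # Each pair was counted twice (once from each end).
    return total // 2


# Hypothetical input: hydrogen-suppressed graphs of two C5H12 isomers.
n_pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

print(wiener_index(n_pentane), wiener_index(neopentane))
```

Both isomers have the same constitution (five carbons), so constitutional descriptors cannot tell them apart, whereas the Wiener index does: the more compact, branched neopentane graph gives the smaller value (16 versus 20 for n-pentane).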
The structures and the corresponding experimental property data are imported. Note that in some cases analysis of the experimental data clearly shows that they are not linear, so a simple linear model is not appropriate. In these cases, another function of the measured property (e.g., its logarithm) can be used for the development of linear regression equations. Next, the 3D geometry is generated, most frequently using a molecular mechanics program; the geometry is then optimized, usually with a semiempirical quantum-chemical method such as AM1, as implemented in MOPAC.35 The next stage is the generation of descriptors. The CODESSA PRO program enables many thousands of descriptors to be calculated. The definitions of the descriptors, together with the original references, are freely available on the CODESSA PRO home page.34 The descriptors are then combined with the measured property values in a statistical program that attempts to extract an equation which uses a small number of descriptors (usually not more than four or five) and satisfactorily correlates the measured values of the property. The larger the number of compounds in the training set, the larger the acceptable number of descriptors. Furthermore, the descriptors involved in any proposed model should not be highly intercorrelated.

Figure 3. Flowchart of a QSPR/QSAR treatment.

(34) www.codessa-pro.com.
(35) Stewart, J. J. P. MOPAC 7.0, QCPE No. 455, http://qcpe.chem.indiana.edu/.
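The statistical step described above can be sketched as ordinary least-squares multilinear regression of the property on a few descriptors, together with a check that no two chosen descriptors are strongly intercorrelated. This is a minimal, standard-library-only illustration; it is not CODESSA PRO's descriptor-selection procedure (which is not detailed here), and the function names and toy data are hypothetical.

```python
import math


def fit_ols(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk.

    X: descriptor rows (one per compound); y: property values.
    Solves the normal equations (A^T A) b = A^T y by Gaussian elimination.
    Assumes the descriptor columns are not perfectly collinear.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # intercept column first
    p = len(A[0])
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)] for r in range(p)]
    v = [sum(a[r] * yi for a, yi in zip(A, y)) for r in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    b = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        b[r] = (v[r] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    return b


def r_squared(b, X, y):
    """Coefficient of determination of the fitted equation on (X, y)."""
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def pearson(u, w):
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (c - mw) for a, c in zip(u, w))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((c - mw) ** 2 for c in w))


def max_intercorrelation(X):
    """Largest |r| between any two descriptor columns of X."""
    cols = list(zip(*X))
    return max((abs(pearson(cols[i], cols[j]))
                for i in range(len(cols)) for j in range(i + 1, len(cols))), default=0.0)


# Hypothetical two-descriptor data set generated from y = 2 + 3*x1 - 0.5*x2.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [4.0, 7.5, 8.5, 12.5, 13.0]
b = fit_ols(X, y)
print([round(c, 6) for c in b], round(r_squared(b, X, y), 6), round(max_intercorrelation(X), 3))
```

When the data are clearly nonlinear, as noted above, y can be replaced by a transform such as [math.log10(v) for v in y] before fitting. The cutoff at which max_intercorrelation is considered "too high" is a modeling decision that is not specified here.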


Figure 4. Experimental versus calculated boiling points by the two-parameter correlation equation. (Reprinted from ref 45. Copyright 1996 American Chemical Society.)

Figure 5. Prediction of melting points for substituted benzenes. (Reprinted from ref 46. Copyright 1997 American Chemical Society.)

It is essential to validate any proposed QSPR model. For an internal validation, the parent data set can be divided into three subsets: the first, fourth, seventh, etc. entries (listed in order of the magnitude of the property in question) go into the first subset (No. 1), the second, fifth, eighth, etc. into the second subset (No. 2), and the third, sixth, ninth, etc. into the third subset (No. 3). Then three training sets, A, B, and C, are prepared, each as a combination of two subsets (A from Nos. 1 and 2, B from Nos. 1 and 3, and C from Nos. 2 and 3). The remaining subsets (Nos. 3, 2, and 1, respectively) are then used as test sets. For each training set the correlation equation is derived with the same descriptors, and the equation obtained is used to predict the property values for the compounds in the corresponding test set. The predictive power of the QSPR models is also estimated by leave-one-out cross-validation for both the full set and each training set; in other words, all available data are used for both fitting and assessing. As Hawkins et al.36 suggested recently, it is preferable to use all data for the calibration step and to check the fit by cross-validation, making sure that the cross-validation is carried out correctly.

(36) Hawkins, D. M.; Basak, S. C.; Mills, D. J. Chem. Inf. Comput. Sci. 2003, 43, 579.

During the past decade, our group has provided general reviews covering “QSPR and QSAR models derived using large molecular descriptor spaces”,37 “structurally diverse QSPR correlations of technologically relevant physical properties”,38 “interpretation of QSPR and QSAR relationships”,39 and “the present utility and future potential for medicinal chemistry of QSAR/QSPR with whole molecule descriptors”.21 Among many other important reviews from other groups, the following are illustrative: “QSAR and QSPR based solely on surface properties?”,40 “selection of molecular descriptors for quantitative structure-activity relationships”,41 “artificial neural networks in molecular structures-property studies”,42 “computer-aided drug design: the role of quantitative structure-property, structure-activity and structure-metabolism relationships (QSPR, QSAR, QSMR)”,43 and “uniform-length molecular descriptors for quantitative structure-property relationships (QSPR) and quantitative structure-activity relationships (QSAR): classification studies and similarity searching”.44

(37) Karelson, M.; Maran, U.; Wang, Y.; Katritzky, A. R. Collect. Czech. Chem. Commun. 1999, 64 (10), 1551.
(38) Katritzky, A. R.; Maran, U.; Lobanov, V. S.; Karelson, M. J. Chem. Inf. Comput. Sci. 2000, 40 (1), 1.
(39) Katritzky, A. R.; Petrukhin, R.; Tatham, D.; Basak, S.; Benfenati, E.; Karelson, M.; Maran, U. J. Chem. Inf. Comput. Sci. 2001, 41 (3), 679.
(40) Clark, T. J. Mol. Graphics Modell. 2004, 22 (6), 519.
(41) Sutter, J. M.; Jurs, P. C. Data Handl. Sci. Technol. 1995, 15, 111.
(42) Novic, M.; Vracko, M. Data Handl. Sci. Technol. 2003, 23, 231.
(43) Buchwald, P.; Bodor, N. Drugs Future 2002, 27 (6), 577.
(44) Baumann, K. Trends Anal. Chem. 1999, 18 (1), 36.
(45) Katritzky, A. R.; Mu, L.; Lobanov, V. S.; Karelson, M. J. Phys. Chem. 1996, 100 (24), 10400.

Figure 6. Correlation of the refractive indices of some organic compounds. (Reprinted from ref 48. Copyright 1998 American Chemical Society.)

Figure 7. Correlation and prediction of the refractive indices of 95 amorphous homopolymers by QSPR analysis. (Reprinted from ref 49. Copyright 1998 American Chemical Society.)

2. Correlation of the Physical Properties of Single Molecular Species

2.1. Boiling Points. Boiling point was one of the first properties for which we derived QSPRs.45 We found a two-parameter correlation of good statistical quality (Figure 4); significantly, both parameters in question were physically reasonable. It is intuitively evident

that boiling point is critically influenced by two characteristics of a molecule: first, its molecular weight, and second, the attractive forces between molecules. Of the two parameters in the model, the cube root of the gravitation index addresses the first, and the hydrogen donor charged surface area addresses the second. Both descriptors have explicit physical meaning, the first (gravitation index) being connected with dispersion and cavity-formation effects in liquids, and the second with the hydrogen-bonding ability of compounds. The data set of 298 important organic compounds used in the study includes saturated and unsaturated hydrocarbons, halogenated compounds, and compounds with hydroxyl, cyano, amino, ester, ether, carbonyl, and carboxyl functionalities.45

2.2. Melting Points. The prediction of melting points is much more difficult. For a start, the melting point is not a unique characteristic of a compound: many compounds crystallize in more than one polymorphic form, each of which has its own melting point. In such a situation it is necessary to restrict the range of structures, and our results for the correlation of the melting points of substituted benzenes46 are shown in Figure 5. While 9 descriptors were found necessary, this


Figure 8. Correlation of the viscosities of organic liquids. (Reprinted with permission from ref 50. Copyright 2000 John Wiley & Sons.)

is statistically quite in order, since there are 443 data points (mono- and disubstituted benzenes); also, there are no obvious outliers.46 Of course, in the future it should be possible to predict melting points much more accurately, but for this we will first need to be able to predict the crystal habit (or habits) in which a compound would crystallize and then to estimate more exactly the attractive and repulsive forces existing in the crystal. We have also undertaken the correlation of the melting points of some ionic liquids.47

2.3. Refractive Index. Correlation of the refractive indices of liquids48 is much more tractable than that of melting points. Our five-parameter correlation developed for 125 diverse organic compounds is shown in Figure 6 (see the Supporting Information for details of the descriptors). The most important descriptor is the HOMO-LUMO energy gap, which is physically understandable with respect to refractive index. This descriptor is defined as the energy difference between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). Both the refractive index and the HOMO-LUMO energy gap are related to the polarizability of a molecule: a small HOMO-LUMO gap usually means that the molecule is relatively easy to polarize. Figure 7 shows an extension of this work to correlate the refractive indices of linear polymers.49 The techniques used to calculate descriptors for small molecules cannot be applied to polymeric molecules. However, for linear polymers above a certain chain length, we have found that it is possible to use the repeating unit to calculate appropriate descriptors, and this was done for 95 amorphous homopolymers49 for which the refractive indices were known. The majority of the polymers examined fall into the classes of homochain polymers (only carbon atoms in the main

(46) Katritzky, A. R.; Maran, U.; Karelson, M.; Lobanov, V. S. J. Chem. Inf. Comput. Sci. 1997, 37 (5), 913.
(47) Katritzky, A. R.; Jain, R.; Lomaka, A.; Petrukhin, R.; Karelson, M.; Visser, A. E.; Rogers, R. D. J. Chem. Inf. Comput. Sci. 2002, 42 (2), 225.
(48) Katritzky, A. R.; Sild, S.; Karelson, M. J. Chem. Inf. Comput. Sci. 1998, 38 (5), 840.
(49) Katritzky, A. R.; Sild, S.; Karelson, M. J. Chem. Inf. Comput. Sci. 1998, 38 (6), 1171.

Figure 9. Representative structures of nonionic surfactants. (Reprinted from ref 51. Copyright 1996 American Chemical Society.)

chain) and polyoxides, but several polyamides and polycarbonates were also included. Note that the HOMO-LUMO energy gap is again the most important descriptor. The other four parameters involved in the QSPR model are described in the Supporting Information.

2.4. Viscosity. Figure 8 shows our correlation of the viscosities of organic liquids.50 Viscosity is highly important industrially, for example, in the transfer or movement of bulk quantities of liquids in the petroleum industry and in the general field of chemical engineering. The best five-parameter correlation equation obtained for the entire data set of 361 organic compounds containing C, H, N, O, S, and/or halogens involves the following theoretical molecular descriptors: (i) the hydrogen-bonding donor charged surface area,

(50) Katritzky, A. R.; Chen, K.; Wang, Y.; Karelson, M.; Lučić, B.; Trinajstić, N.; Suzuki, T.; Schüürmann, G. J. Phys. Org. Chem. 2000, 13 (1), 80.


Figure 10. Correlation of the critical micelle concentrations of nonionic surfactants. (Reprinted from ref 51. Copyright 1996 American Chemical Society.)

HDCA(2), (ii) the gravitational index (GI) over all bonded atoms i, j in the molecule, (iii) the relative number of rings in the molecule, Nrings, (iv) the fractional positive partial charged surface area, FPSA(3), and (v) the minimum atomic state energy for a C atom, Emin(C) (see the details in the Supporting Information). The most important descriptor is the hydrogen-bonding donor charged surface area, which shows that hydrogen bonding is a key factor in determining the viscosity of liquids. The quality of the correlation shown in Figure 8 is less than perfect, but this may be related to errors in the measurement of viscosity, which is highly dependent on temperature.50

3. Correlation of Properties Involving Interactions between Different Molecules

We now pass to the correlation of properties which, while still of single species, involve interactions between two chemical compounds, and at the outset we consider surfactants, which are of great industrial and practical significance.

3.1. Critical Micelle Concentration. A major characteristic of surfactants is their critical micelle concentration. We have examined the critical micelle concentrations of nonionic surfactants,51 representative structures of which are shown in Figure 9. A total of 77 critical micelle concentrations of compounds measured under the same experimental conditions were collected from the literature. In the case of surfactants, we can use not only whole-molecule descriptors but also “fragment descriptors”, since surfactants have two distinct parts: the hydrophilic head and the hydrophobic tail. Our program calculated 400 molecular descriptors for each molecule, one-third of which related to the whole molecule, one-third to the tails, and one-third to the heads.51 Figure 10 demonstrates an extremely good correlation with just three descriptors.
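The idea of fragment descriptors, i.e., evaluating the same descriptor separately for the whole molecule, the hydrophobic tail, and the hydrophilic head, can be sketched with one simple topological index, the first-order (Randic, or Kier-Hall) connectivity index. The fragment boundary, the choice to count atom degrees within the fragment rather than within the whole molecule, and the toy molecules are assumptions for illustration only; they are not the descriptors or fragmentation used in ref 51.

```python
import math


def connectivity_index(bonds, atoms=None):
    """First-order connectivity index: sum over bonds (i, j) of 1/sqrt(d_i * d_j),
    where d is the heavy-atom degree.

    bonds: (i, j) pairs of a hydrogen-suppressed graph.
    atoms: optional set of atom indices defining a fragment; if given, only bonds
    with both ends in the fragment are kept and degrees are counted within it
    (an illustrative assumption).
    """
    if atoms is not None:
        bonds = [(i, j) for i, j in bonds if i in atoms and j in atoms]
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# Hypothetical linear C4E1 amphiphile CH3CH2CH2CH2-O-CH2CH2-OH, heavy atoms 0..7.
bonds = [(k, k + 1) for k in range(7)]
tail = {0, 1, 2, 3}  # butyl chain: hydrophobic fragment
head = {4, 5, 6, 7}  # -O-CH2CH2-OH: hydrophilic fragment

# Branched isomer (CH3)3C-O-CH2CH2-OH; atom 1 is the quaternary carbon.
branched = [(0, 1), (2, 1), (3, 1), (1, 4), (4, 5), (5, 6), (6, 7)]

for name, frag in [("whole", None), ("tail", tail), ("head", head)]:
    print(name, round(connectivity_index(bonds, frag), 4))
print("branched tail", round(connectivity_index(branched, tail), 4))
```

Branching the tail lowers its connectivity index (from about 1.91 to about 1.73 here) while leaving the head value unchanged, which is the kind of tail-specific size-and-branching information a fragment descriptor can carry and a whole-molecule descriptor mixes together.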
All three are fragment descriptors: two topological descriptors reflect the bulk and branching of the tail (hydrophobic fragment), and the remaining descriptor is a measure of the size of the hydrophilic

(51) Huibers, P. D. T.; Lobanov, V. S.; Katritzky, A. R.; Shah, D. O.; Karelson, M. Langmuir 1996, 12 (6), 1462.

Figure 11. Correlation of flash points with predicted boiling points. (Reprinted from ref 54. Copyright 2001 American Chemical Society.)

fragment. Interpretation of the developed model led to the following observations: (i) the critical micelle concentration of nonionic surfactants in aqueous solution is primarily determined by the hydrophobic part of the molecule, (ii) the logarithm of the critical micelle concentration decreases with an increase in the size of the hydrophobic fragment and increases with an increase in the relative size of the hydrophilic fragment, and (iii) hydrophobicity is affected by branching of the hydrophobic fragment and by the presence of heteroatoms.51 A similar result was obtained with anionic surfactants.52

At this point it is useful to contemplate the utility of the QSAR/QSPR approach.53 Obviously there are direct benefits, in that finding an equation that links a property or activity with chemical structure enables the prediction of this property for unmeasured compounds, which

(52) Huibers, P. D. T.; Lobanov, V. S.; Katritzky, A. R.; Shah, D. O.; Karelson, M. J. Colloid Interface Sci. 1997, 187 (1), 113.
(53) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 24 (4), 279.


Figure 12. Acceleration of vulcanization: rubber rheometer curve. (Reprinted from ref 60. Copyright 1999 American Chemical Society.)

Table 1. Correlations of the Activities of All Accelerator Classes(a)

property / accelerator class                        no. of data points  no. of descriptors  R2      R2cv
time to scorch, ts2
  sulfenamides and sulfenimides                     23                  4                   0.942   0.908
  zinc complexes of sulfenamides and sulfenimides   23                  4                   0.926   0.871
max cure rate, mxr
  sulfenamides and sulfenimides                     23                  4                   0.925   0.881
  zinc complexes of sulfenamides and sulfenimides   23                  4                   0.967   0.945

(a) Reprinted from ref 60. Copyright 1999 American Chemical Society.

leads directly to the refinement of synthetic routes; more precisely, it enables targeted molecular design. In addition to these direct benefits, however, there are also significant indirect benefits. As we have just seen for the nonionic surfactants, the equations found allow us to better understand just how molecular structures control molecular behavior, thus helping to elucidate the internal mechanisms and enabling classifications.

3.2. Flash Points. Flash points are also of great industrial and practical importance. We have derived QSPRs for flash points,54 as shown in Figure 11. It is well-known that flash points correlate well with boiling points. In the equation shown in Figure 11 we have included not the experimental boiling point but the boiling point predicted by our previously derived equation.55 This enables the correlation of Figure 11 to be

used to predict flash points of compounds for which no measured boiling point is available. The other two parameters in the regression equation developed for 271 diverse organic compounds are (i) the difference between the positively charged partial surface area and the negatively charged partial surface area, DPSA, and (ii) the minimum electron attraction for a C atom, Ee-n,C (see the Supporting Information for descriptor details). DPSA accounts for polar interactions between molecules, whereas the quantum-chemical descriptor Ee-n,C can be related to the reactivity of any carbon atom within the molecule in a combustion reaction.54

3.3. Solvent Effects on Decarboxylation Rates. Another chemical property, which is of considerable biological importance and has been much discussed,56,57

(54) Katritzky, A. R.; Petrukhin, R.; Jain, R.; Karelson, M. J. Chem. Inf. Comput. Sci. 2001, 41 (6), 1521.

(55) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. J. Chem. Inf. Comput. Sci. 1998, 38 (1), 28.


Figure 13. Correlation of glass transition temperatures for a general set of polymers. (Reprinted from ref 61. Copyright 1998 American Chemical Society.)

Figure 14. A unified treatment of solvent polarity. (Reprinted from ref 63. Copyright 1999 American Chemical Society.)

is the solvent effect on the decarboxylation rates of 6-nitrobenzisoxazole-3-carboxylates,58 as measured in 23 pure organic solvents and water. The descriptors relate to the solvent and indicate that the branching and connectivity of the solvent molecules have an important bearing on the interactions of the solvent with the substrate and/or the transition state in the decarboxylation process. The three-parameter correlation derived58 (R2 = 0.909, R2cv = 0.870, F = 66.21, s2 = 0.507) relates the log k values for the decarboxylation of 6-nitrobenzisoxazole-3-carboxylates to (i) the hydrogen acceptor accessible surface area, HASA, (ii) the structural information content (order 1), 1SIC, and (iii) the image of the Onsager-Kirkwood solvation energy, SEOK (see the discussion in the Supporting Information). The significance of solvent polarity is revealed by the contribution of SEOK: as the value of this descriptor increases, the energy of activation for the decarboxylation process diminishes.

3.4. Free-Radical Polymerization. Free-radical polymerization of styrene and other olefins is of great

(56) Catalan, J.; Diaz, C.; Garcia-Blanco, F. J. Org. Chem. 2000, 65 (11), 3409.
(57) Shirai, M.; Smid, J. J. Am. Chem. Soc. 1980, 102 (8), 2863.
(58) Katritzky, A. R.; Perumal, S.; Petrukhin, R. J. Org. Chem. 2001, 66 (11), 4036.

industrial importance. To control the average chain length of the polymers formed, an additive is used that is able to transfer the radical center, thus ending one chain and starting another. The study employed a random set of 90 organic additives.59 The so-called “transfer constants” for these additives have been extensively documented, but apparently never previously correlated with structure. The experimental “transfer constant” data relate to the radical polymerization of styrene at 60 °C. Notably, the five descriptors in the proposed model (R2 = 0.818, R2cv = 0.795, F = 75.56, s2 = 0.680) are consistent with the assumed mechanism of chain transfer and reveal the following: (i) additives with low LUMO energies are more reactive, (ii) the polarity and hydrogen-bonding ability of the transfer agents are important in facilitating the transfer reaction, and (iii) the reactivity of the weakest bond at the C atom59 is significant (more details about the descriptors are available in the Supporting Information).

3.5. Vulcanization of Rubber. The vulcanization of rubber is of great industrial importance, particularly for the manufacture of tires. As is well-known, vulcanization is effected using sulfur to form cross-links between the linear polymer chains of natural or synthetic rubber. Sulfur alone works too slowly, and thus an “accelerator” has to be added. Accelerators used for the vulcanization of rubber are generally heterocyclic disulfides, sulfenamides, or sulfenimides.60 It is important that in the vulcanization process there is a delay before the onset of cross-linking, and that after this delay the vulcanization proceeds rapidly and irreversibly. Compounds are tested as accelerators for these properties by constructing a “rubber rheometer curve” (see Figure 12)60 in a machine that measures the change in torque (the “stiffness” of the rubber undergoing vulcanization).
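Two numbers read off such a rheometer curve are the quantities modeled in this study: the time to scorch, ts2, and the maximum cure rate, mxr. The sketch below extracts both from sampled torque-time data, assuming the common rheometer convention that ts2 is the time at which the torque first rises 2 torque units above its minimum, and taking mxr as the steepest slope of the curve. These definitions, the function names, and the readings are assumptions for illustration and are not taken from ref 60.

```python
def scorch_time(times, torque, rise=2.0):
    """Time at which torque first exceeds its minimum by `rise` units (assumed
    ts2 convention), by linear interpolation after the minimum; None if never reached."""
    i_min = min(range(len(torque)), key=torque.__getitem__)  # first minimum
    target = torque[i_min] + rise
    for k in range(i_min, len(torque) - 1):
        t0, t1 = times[k], times[k + 1]
        m0, m1 = torque[k], torque[k + 1]
        if m0 < target <= m1:
            return t0 + (target - m0) * (t1 - t0) / (m1 - m0)
    return None


def max_cure_rate(times, torque):
    """Largest forward-difference slope d(torque)/dt along the sampled curve."""
    return max((torque[k + 1] - torque[k]) / (times[k + 1] - times[k])
               for k in range(len(times) - 1))


# Hypothetical rheometer readings: time (min) and torque (torque units).
times = [0, 1, 2, 3, 4, 5, 6]
torque = [5.0, 4.0, 4.0, 5.0, 7.0, 11.0, 14.0]
print(scorch_time(times, torque), max_cure_rate(times, torque))
```

A good accelerator in the sense described in the text would give a long scorch time (a delay before cross-linking) combined with a high maximum cure rate.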
(59) Ignatz-Hoover, F.; Petrukhin, R.; Karelson, M.; Katritzky, A. R. J. Chem. Inf. Comput. Sci. 2001, 41 (2), 295.
(60) Ignatz-Hoover, F.; Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Rubber Chem. Technol. 1999, 72 (2), 318.

Figure 15. Polarity scales: PCA loadings. (Reprinted from ref 63. Copyright 1999 American Chemical Society.)

Figure 16. Solvents: PCA scores. (Reprinted from ref 63. Copyright 1999 American Chemical Society.)

It is important that the torque does not increase immediately, but that for a certain period the change is
delayed long enough to formulate the tire or other article, after which it proceeds rapidly and irreversibly to maximum hardness. We considered 23 compounds that have been measured for their potential as accelerators. Together with colleagues at the Flexsys Co.,60 we investigated whether CODESSA could correlate the structure of accelerators with (i) the time to scorch, ts2, and (ii) the maximum rate of vulcanization, mxr. Modeling was done both on the parent accelerator molecule (12 sulfenamides, 11 sulfenimides) and on a zinc complex of the accelerator with thiolate fragments.60 The statistical characteristics of the QSPR models are shown in Table 1, and the list of descriptors is given in the Supporting Information.

3.6. Glass Transition Temperatures

Glass transition temperatures are of great importance in polymer chemistry. We collected glass transition temperatures for 88 well-known non-cross-linked homopolymers from the literature. We were able to correlate them using descriptors derived from the polymer repeating unit,61 as shown in Figure 13.

(61) Katritzky, A. R.; Sild, S.; Lobanov, V. S.; Karelson, M. J. Chem. Inf. Comput. Sci. 1998, 38, 300.

The
molecular descriptors involved in the correlation relate rationally to the physical phenomena. Tg is believed to be determined both by the shape/bulkiness of the repeating units and by intermolecular electrostatic interactions. In our equation, shape and bulkiness are reflected by the moment of inertia and the Kier shape index. The electrostatic interactions are accounted for by the most negative atomic charge, HASA-2/TFSA, and FPSA-3. The R² of the correlation for Tg/M values was 0.946, and the standard error was 0.33 K mol g⁻¹.61 A complete list of descriptors is given in the Supporting Information.

4. Solvents, Partitioning, and Chromatography

This section deals with questions related to solvents, solubility, chromatography, and partitioning. So far we have restricted ourselves to one-dimensional correlations, in which only one parameter (i.e., the chemical structure) has been varied at a time. We now consider cases in which several parameters are varied simultaneously, which we have termed “multidimensional QSPRs”. We deal first with solvent polarity.

Figure 17. Correlation of air/water partition coefficients for 406 organic compounds. (Reprinted from ref 64. Copyright 1996 American Chemical Society.)

Figure 18. Calculated vs experimental values of response factors: the case of a gas flame ionization detector. (Reprinted from ref 73. Copyright 1994 American Chemical Society.)

4.1. Solvent Polarity

Solvent polarity is of enormous significance both in practical applications and as
a fundamental parameter. It has long been the aim of investigators to provide a quantitative assessment of solvent polarity. Indeed, literally hundreds of scales62 have been proposed, based on diverse properties including kinetics, solvatochromic effects, and enthalpies. In our first comprehensive approach63 to this problem (see Figure 14), we selected 40 scales and 40 solvents and constructed a matrix. This matrix had very many gaps. By carrying out a QSPR analysis for each individual scale, however, we were able to predict values for the missing points and fill in the matrix. Each scale was modeled with four descriptors, and most of the QSPRs were of good statistical quality.

(62) Katritzky, A. R.; Fara, D. C.; Yang, H.; Tämm, K.; Tamm, T.; Karelson, M. Chem. Rev. 2004, 104 (1), 175.
(63) Katritzky, A. R.; Tamm, T.; Wang, Y.; Karelson, M. J. Chem. Inf. Comput. Sci. 1999, 39 (4), 692.

With the complete matrix, it was possible to carry out a principal component analysis (see Figure 14). We found that the first three components describe 75% of the variance, which allowed us to classify the scales
from the principal component loadings, as shown in Figure 15. The scales cluster logically, and each cluster is defined by rather similar physical measurements. Even more impressively, the plot of the first principal component score against the second classifies the solvents in a highly rational manner (see Figure 16). The lower right-hand quadrant contains all the hydroxylic solvents and only the hydroxylic solvents. The dipolar aprotic solvents, the polar solvents, and the nonpolar solvents likewise form their own clusters. We are presently engaged in a major expansion of this work to a much larger number of scales and solvents.

Figure 19. Insect methylalkane identification. (Reprinted from ref 74. Copyright 2000 American Chemical Society.)

Figure 20. Prediction of UV spectral absorbance using QSPR analysis. (Reprinted from ref 75. Copyright 2002 American Chemical Society.)

4.2. Solubility

We now consider the phenomenon of solubility. One of our papers correlated the air/water partition coefficients for 406 structurally very diverse organic compounds (see Figure 17); these quantities are of great environmental importance.64 The data set includes saturated and unsaturated hydrocarbons, halogenated compounds, and compounds containing hydroxyl, cyano,
amino, nitro, thio, ester, ether, carbonyl, and carboxyl functional groups, as well as furan, pyran, pyridine, and pyrazine rings. The correlation shown uses theoretical descriptors only (see the Supporting Information), so the air/water partition coefficient can be predicted for compounds that have not been measured. Analogous work has been carried out for other systems, for example, the solubility of gases and vapors in ethanol.65 More recently we have been involved in a general treatment of solubility, as described in two published papers66,67 and in unpublished work. The overall data collected include about 154 solvents and 397 solutes.

(64) Katritzky, A. R.; Mu, L.; Karelson, M. J. Chem. Inf. Comput. Sci. 1996, 36 (6), 1162.
(65) Katritzky, A. R.; Tatham, D. B.; Maran, U. J. Chem. Inf. Comput. Sci. 2001, 41 (2), 358.

Table 2. Correlation of Relative Sweetness: Subsets

subset/class of sweeteners      no. of data points  no. of descriptors  R²         F  s²
acesulfamates                                    8                   2  0.996  677.0  0.0007
aldoximes                                       47                   2  0.835  112.0  0.0100
α-arylsulfonylalkanoic acids                    10                   2  0.941   55.3  0.0920
guanidines                                      27                   3  0.802   31.1  0.0462
natural products                                20                   2  0.905   81.1  0.0480
peptides                                        87                   5  0.689   35.7  0.0200
sulfamates                                       9                   2  0.919   34.0  0.0034
ureas and thioureas                             30                   3  0.888   68.4  0.0112
miscellaneous                                    7                   1  0.957  112.0  0.0697

In the reported results,66,67 the number of solutes used for
each solvent varies from 226 for hexadecane to fewer than 3 for several solvents.

(66) Katritzky, A. R.; Oliferenko, A. A.; Oliferenko, P. V.; Petrukhin, R.; Tatham, D. B.; Maran, U.; Lomaka, A.; Acree, W. E., Jr. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1794.

Figure 21. Correlation of class I aqueous toxicities (90 compounds). (Reprinted from ref 77. Copyright 2001 American Chemical Society.)

Figure 22. Nitrobenzene toxicities. (Reprinted with permission from ref 78. Copyright 2003 John Wiley & Sons.)

We have reported our results66 on the modeling of 69 of these solvents, for which the number of solutes ranges from
14 to 226 solutes, with an average of 48 solutes per solvent. The remaining 72 solvents have fewer than 14 experimental points available. We also analyzed and reported the modeling of 80 selected solutes,67 chosen as those with reliable solubility data in at least 15 solvents. The correlations were examined by holding the solute constant and varying the solvent, and vice versa.66,67 Good-quality statistical correlations were obtained, which should enable us to construct a large matrix and fill in the missing points (work in progress). At that stage we plan a principal component analysis, which should give considerable insight into the whole phenomenon of solubility (also in progress).

(67) Katritzky, A. R.; Oliferenko, A. A.; Oliferenko, P. V.; Petrukhin, R.; Tatham, D.; Maran, U.; Lomaka, A.; Acree, W. E., Jr. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1806.

4.3. Chromatography

We have applied QSPR analysis to gas chromatography. Similar work on the correlation between molecular structures and retention indices of different classes of compounds has been carried out
by other authors.68-72 The correlation of retention times73 was studied for a data set of 152 individual structures covering a wide cross section of classes of organic compounds. Retention times, of course, depend on many experimental factors, such as the carrier gas, the length and packing of the column, the temperature, and so on. Importantly, response factors are unaffected by these experimental factors; these factors are needed to convert a gas chromatographic analysis from qualitative to quantitative. We have been able to correlate response factors for a gas flame ionization detector,73 as shown in Figure 18. Note that here we have introduced descriptors based on the concept of “effective carbon atoms”.

(68) Stanton, D. T.; Jurs, P. C. Anal. Chem. 1989, 61, 1328.
(69) Stanton, D. T.; Jurs, P. C. Anal. Chem. 1990, 62, 2323.
(70) Whalen-Pedersen, E. K.; Jurs, P. C. Anal. Chem. 1981, 53, 2184.
(71) Georgakopoulos, C. G.; Kiburis, J. C.; Jurs, P. C. Anal. Chem. 1991, 63, 2021.
(72) Georgakopoulos, C. G.; Tsika, O. G.; Kiburis, J. C.; Jurs, P. C. Anal. Chem. 1991, 63, 2025.
(73) Katritzky, A. R.; Ignatchenko, E. S.; Barcock, R. A.; Lobanov, V. S.; Karelson, M. Anal. Chem. 1994, 66 (11), 1799.

This concept comes
from flame ionization detector theory, which has shown73 that certain carbon atoms in a structure are more effective than others in producing pyrolysis products that conduct electricity. The quality of the Figure 18 correlation is good, and the molecular descriptors involved in this six-parameter regression equation are listed in the Supporting Information. An application of gas chromatographic retention times to the identification of methylalkanes74 is shown in Figure 19. Many insect pheromones are straight-chain paraffins with a number of methyl side chains. We have been able to correlate retention times with structure for a large number of such compounds. This offers a useful additional tool for identifying insect pheromones, which are produced and isolated in extremely small quantities. Gas chromatographic response factors vary within a factor of 2 or 3. By contrast, the widely used technique of high-pressure liquid chromatography frequently employs a UV spectroscopic detector, whose response factors can vary by factors of many thousands. Recently,75 we took a first step toward predicting UV spectral absorbance for such detectors using QSPR analysis, as shown in Figure 20. The descriptors found are physically reasonable, as described in the Supporting Information. At present there is considerable scatter, but it is hoped that further work will refine this approach.75

5. Biological Properties

We have derived QSPRs for relative sweetness, RS.76 Table 2 shows the statistical characteristics of the correlations for nine subsets of sweet compounds; each subset consists of a distinct structural class, as given in the table. However, when all the sweet compounds were treated together, the five-parameter regression equation gave only a poor correlation76 (R² = 0.686, F = 101, s² = 0.0110).
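The statistics quoted throughout this overview (R², the cross-validated R²cv, F, and s²) are the usual multilinear-regression diagnostics. The dependency-free Python sketch below computes them under these assumed standard definitions:

- F = (R²/k)/((1 − R²)/(n − k − 1)) for k descriptors and n data points.
- s² = (residual sum of squares)/(n − k − 1).
- R²cv comes from leave-one-out cross-validation.

It is an illustration only, not CODESSA's own code, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))


def f_statistic(r2: float, n: int, k: int) -> float:
    """Fisher F for k descriptors and n points: (R2/k) / ((1 - R2)/(n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def qspr_stats(X: Sequence[Sequence[float]], y: Sequence[float]) -> Dict[str, float]:
    """R2, leave-one-out R2cv, F and s2 for a k-descriptor linear QSPR."""
    X, y = [list(x) for x in X], list(y)
    n, k = len(y), len(X[0])
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    coef = fit_mlr(X, y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    r2 = 1.0 - ss_res / ss_tot
    press = 0.0  # predicted residual sum of squares
    for i in range(n):
        c = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without point i
        press += (y[i] - predict(c, X[i])) ** 2
    return {
        "R2": r2,
        "R2cv": 1.0 - press / ss_tot,
        "F": f_statistic(r2, n, k),
        "s2": ss_res / (n - k - 1),  # squared standard error of the estimate
    }


if __name__ == "__main__":
    # Five-descriptor chain-transfer model: n = 90 additives, R2 = 0.818.
    print(round(f_statistic(0.818, 90, 5), 1))  # 75.5
```

As a check, f_statistic(0.818, 90, 5) gives about 75.5. This is in line with the F = 75.56 reported for the 90-additive chain-transfer model; the small difference comes from rounding of R². The F values in Table 2 can be reproduced in the same way from their R², n, and k. For the pooled sweetness model above, R² = 0.686 leaves roughly a third of the variance of the modeled sweetness values unexplained.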
The poor pooled correlation is perhaps to be expected, because it is now known that there is more than one biological detector of sweetness in the mouth.76 The descriptors that emerged from these correlations influence sweetness potency to different extents, as shown by the relative magnitudes of their coefficients in the QSPR models. The constitutional, topological, geometrical, and surface- and/or size-related descriptors may influence the binding of the sweetener to the receptor. The electronic and charge-related descriptors may influence the chemical reactivity and/or the electrostatic interactions between the sweetener and the receptor. The molecular descriptors involved in the models given in Table 2 are listed in the Supporting Information.

(74) Katritzky, A. R.; Chen, K.; Maran, U.; Carlson, D. A. Anal. Chem. 2000, 72 (1), 101.
(75) Fitch, W. L.; McGregor, M.; Katritzky, A. R.; Lomaka, A.; Petrukhin, R.; Karelson, M. J. Chem. Inf. Comput. Sci. 2002, 42 (4), 830.
(76) Katritzky, A. R.; Petrukhin, R.; Perumal, S.; Karelson, M.; Prakash, I.; Desai, N. Croat. Chem. Acta 2002, 75 (2), 475.
(77) Katritzky, A. R.; Tatham, D. B.; Maran, U. J. Chem. Inf. Comput. Sci. 2001, 41 (5), 1162.

Further correlations of biological properties are shown in Figure 21 for aqueous toxicities77 (LC50 for Poecilia
reticulata measured for 293 organic compounds divided into four classes of toxins). They are also shown in Figure 22 for nitrobenzene toxicities78 (IGC50 for the aquatic ciliate Tetrahymena pyriformis, calculated for 97 nitrobenzene derivatives). We hope at some future time to tackle a general treatment of toxicity, which would assemble a large matrix of structures against measured toxicities. However, the task is enormous: toxicities have been measured for several hundred thousand structures, using tens of thousands of different methods. Such a treatment would therefore need to be a collaborative effort and would initially need to consider a relatively small portion of the whole.

6. Future Developments

In addition to the possibility of treating toxicity in the way just mentioned, many other challenges remain. So far, we have treated only single structures, whereas in practice mixtures of molecules are often involved. Another big problem is molecular conformation. This is particularly important when several aliphatic side chains are present. In such cases determining the true energy minimum is difficult, and sometimes not even appropriate, because many different conformations exist simultaneously. Descriptors are frequently calculated for isolated molecules in the gas phase, which is usually not the state in which the property is measured. Another problem is the treatment of temperature dependence. Thus, many objectives still remain.

Acknowledgment. We acknowledge the help of all our colleagues and, above all, that of Professor Mati Karelson and his group at the University of Tartu, Estonia. Postdoctoral researchers and students who have been involved in our work are mentioned in the references; in particular we thank Viktor Lobanov, Andre Lomaka, Uko Maran, Alexander Oliferenko, Ruslan Petrukhin, and Tarmo Tamm.
In addition, we have received significant financial and scientific help from the following industrial colleagues: (i) Michael Siskin, Bill Green, George Knudsen, and Manuel Francisco from Exxon, Corporate Lab, (ii) Jack Johnson and Rich Schlossberg from Exxon, Baton Rouge, (iii) Kam Wah Law from 3M, (iv) Fred Ignatz-Hoover from Monsanto/Flexsys, (v) Indra Prakash from NutraSweet, (vi) Xiangfu Lan from Sandoz/Clariant, (vii) Ramiah Murugan, Martin Grandze, and Joe Toomey from Reilly, (viii) Dinesh Shah and Paul Huibers from the University of Florida, Department of Chemical Engineering, (ix) Dave Carlson from the University of Florida, USDA, and (x) Andy Holder from Semichem. Finally, we thank Dr. Michael Siskin and Dr. Charles D. Hall for critical comments and Dr. Minati Kuanar for checking a draft of this manuscript.

Supporting Information Available: Table SAI giving details about the (i) properties studied, (ii) data sets used, and (iii) descriptors involved in the QSPR models discussed in the paper. This material is available free of charge via the Internet at http://pubs.acs.org.

EF040033Q

(78) Katritzky, A. R.; Oliferenko, P.; Oliferenko, A.; Lomaka, A.; Karelson, M. J. Phys. Org. Chem. 2003, 16 (10), 811.