
J. Phys. Chem. 1996, 100, 10400-10407

Correlation of Boiling Points with Molecular Structure. 1. A Training Set of 298 Diverse Organics and a Test Set of 9 Simple Inorganics

Alan R. Katritzky,* Lan Mu, and Victor S. Lobanov
Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200

Mati Karelson*
Department of Chemistry, University of Tartu, 2 Jakobi Street, Tartu, EE 2400, Estonia

Received: October 31, 1995; In Final Form: March 3, 1996

A quantitative structure-property relationship (QSPR) treatment of the normal boiling points was performed for a structurally wide variety of organic compounds using the CODESSA (comprehensive descriptors for structural and statistical analysis) technique. A highly significant two-parameter correlation (R² = 0.9544, s = 16.2 K) employs just two molecular parameters: a bulk cohesiveness descriptor, GI^(1/3), and the area-weighted surface charge of the hydrogen-bonding donor atom(s) in the molecule. A more refined QSPR model (with R² = 0.9732 and s = 12.4 K) includes, in addition, the most negative atomic partial charge and the number of chlorine atoms in the molecule. The four-parameter equation gives an average prediction error of 2.3% for a standard set of compounds with an average experimental error of 2.1%. The QSPR equations developed allow remarkably accurate predictions of the normal boiling points for a number of simple inorganic compounds, including water.

Introduction

The boiling point of a compound is determined by the intermolecular interactions in the liquid and by the difference in the molecular internal partition function in the gas phase and in the liquid at the boiling temperature. Therefore, it should be directly related to the chemical structure of the molecule, and numerous methods have indeed been developed for estimating the normal boiling point of a compound from its structure.1 Other physical properties such as critical temperatures2 and flash points3 can be estimated from boiling points. Various rules and formulas were proposed early on to correlate boiling points of homologous hydrocarbons with the number of carbon atoms or molecular weight.4 Later, other methods employed physical parameters such as the parachor and the molar refractivity.5 Previous methods to estimate boiling points have been summarized by Rechsteiner3 and by Horvath.6

The group contribution additivity (GCA) method, extensively used for predicting boiling points, is based on the assumption that the cohesion forces in the liquid predominantly have a short-range character.7 It proceeds from the division of a molecule into predefined structural groups, each of which adds a constant increment to the value of a property for a compound.8 In general, the group contribution methods give good predictions of boiling points for small and nonpolar molecules. A collection of 85 group increments was derived from a set of 4426 diverse organic compounds assembled by Joback and Reid9 and Stein and Brown10 and provided an average absolute error of 15.5 K for predicted boiling points. Group contribution methods for estimating boiling points are limited to the types of compounds for which all group contributions have been established. An alternative solution has been sought within the framework of the quantitative structure-property relationship (QSPR) approach.
The QSPR approach can employ descriptors derived solely from the molecular structure to fit experimental data. Pioneering work in applying QSPR to boiling points was done by Wiener, who introduced the path number w (named later the Wiener index), defined as the sum of the distances between any two carbon atoms in the molecule,11 and predicted the boiling points of paraffins with an average error of 1°. Other topological indices, including the Randic12 and Kier and Hall molecular connectivity indices,13 have been successful in correlating the boiling points of alkanes and amines.13 Seybold et al.14 obtained an excellent correlation (R² = 0.999) of the boiling points of 74 normal and branched alkanes using a model which combined five different connectivity indices: the most significant parameter, the first-order connectivity index ¹χ,14 already described most of the variance (R² = 0.969). The factor analysis of the same 74 alkanes demonstrated that most of the variance in the boiling point can be attributed to the molecular mass, with smaller contributions from branching and steric factors.14 Using the topological QSPR approach, Balaban et al.15 developed four six-parameter models of similar quality (R² = 0.97) to correlate the normal boiling points of 532 haloalkanes C1-C4 with their chemical structure. All four of these models included various connectivity indices and numbers of halogen atoms, with the first-order connectivity index involved in each model. In a simultaneous study,16 the normal boiling points of 185 saturated acyclic compounds with one or two divalent oxygen or sulfur heteroatoms were correlated with their chemical structure using topological descriptors and counts of heteroatoms. Thus, topological indices work well for homologous and congeneric series of compounds. Indeed, there are foundations for such success.

Abstract published in Advance ACS Abstracts, May 15, 1996.
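The Wiener path number described above can be illustrated with a short calculation. The following is a minimal sketch, assuming the molecule is supplied as a hydrogen-suppressed carbon skeleton and that "distance" means the number of bonds on the shortest path; the function name `wiener_index` and the adjacency-dictionary input format are illustrative choices, not part of the original work.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path (bond-count) distances over all unordered atom pairs.

    `adjacency` maps each carbon atom label to a list of the carbon atoms
    bonded to it (hydrogen-suppressed skeleton; an assumed input format).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search yields bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    # Every pair was counted twice, once from each end.
    return total // 2


# Carbon skeletons of two C4 isomers.
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(wiener_index(n_butane), wiener_index(isobutane))  # prints "10 9"
```

The branched isomer has the smaller index, which is the kind of structural information the index contributes beyond a simple carbon count.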
In a factor analysis of six physical properties related to intermolecular interactions in the liquid state (including the boiling point), Cramer found that two factors, namely, the "bulk" and "bulk-corrected cohesiveness" of a molecule, determine the observed physical properties.17 It has been shown that Wiener and Randic molecular connectivity indices effectively represent molecular van der Waals volume, or molecular bulk in the liquid state.18,19 Cohesiveness can be described by a shape-dependent variable, such as a topological shape descriptor.20 Although molecular connectivity indices also offer a suitable basis for attributing shape-dependent variance, this is often disguised by their bulk dependence.21 Orthogonalized connectivity indices were suggested by Randic as a possible solution to this difficulty.21,22 However, the uniform applicability of topological indices to compounds of wide structural diversity still presents difficulties. For instance, no rigorous method of weighting heteroatoms has been found to calculate topological indices for compounds containing heteroatoms.20 Topological indices also fail to account for hydrogen-bonding and long-range effects in condensed media.20 In general, topological indices are appropriate when intermolecular interactions parallel the increase of the molecular size, as is observed for homologous and congeneric series of organic compounds,20 but additional molecular descriptors are needed for large diverse sets.

Numerous molecular QSPR characteristics, accounting for inter- and intramolecular interactions in condensed media in more detail, have been developed on the basis of physical models of different complexity. For instance, Grigoras23 estimated boiling points of organic compounds using the assumption that the dominant intermolecular interaction is related to the molecular surface energy, derived from the molecular surface area and the charge density distribution. The corresponding multilinear model, which gave a good correlation (R² = 0.958) with the boiling points of 137 diverse organic compounds, included four parameters: total molecular surface area, the sums of the positively and negatively charged atomic surface areas multiplied by the corresponding partial charges, and a hydrogen-bonding term. However, to achieve this correlation, atomic charge scaling factors to correct the partial charges calculated by the extended Hückel theory had to be introduced.
For a better representation of the intermolecular interactions, Jurs and Stanton introduced (simultaneously with Grigoras) charged partial surface area (CPSA) descriptors, which combine solvent accessible surface areas with partial atomic charges.24 Using CPSA descriptors in combination with some constitutional, topological, and other descriptors, Jurs and co-workers found correlations with normal boiling point for five large sets, each of a single class of heterocycle: furans and tetrahydrofurans (R² = 0.969),25 thiophenes (R² = 0.974), pyrans (R² = 0.978),26 pyrroles (R² = 0.962),26 and pyridines (R² = 0.933),27 all of which demonstrated standard errors ranging from 8 to 15 K for boiling point estimates. Although the correlation models obtained lacked uniformity and involved different, sometimes even class-dependent, descriptors for different classes of compounds, the CPSA descriptors were demonstrated to be useful, especially where hydrogen-bonding specific descriptors were added to the studies.26,27 The utility of the CPSA descriptors was also evident in a model developed to predict the boiling point for 298 diverse organic compounds (R² = 0.976, s = 11.85 K) which employs 8 parameters, 4 of which are CPSA descriptors.28

In a recent publication,29 a highly accurate linear six-parameter model (R² = 0.994, s = 6.3 K) to predict the normal boiling points of hydrocarbons has been presented by Wessel and Jurs. Besides being more accurate than previous correlation models, it contains easy-to-comprehend descriptors. Thus, the dispersion interaction attributed to the molecular bulk is represented by the square root of the molecular weight. The authors note that initial models exhibited nonlinear behavior, which was eliminated by replacing the molecular weight by its square root. Indeed, a rough correlation between the normal boiling point and the square root of the molecular weight was first described by Walker 100 years ago,4 and later confirmed by Kamlet et al., who presented a theoretically justified correlation equation to predict boiling points of 80 organic and inorganic liquids.30 The equation deals with dispersion and dipolar interactions which depend on the size, polarizability, and dipolarity of the liquids. The size was described by the square root of the molecular weight, the dipolarity was represented by the square of the dipole moment, and the polarizability was reflected by the number of atoms adjusted empirically to take into account different types of atoms.30

The ratio of the isopotential molecular volume to isopotential molecular surface area (V/S) was introduced as the leading parameter in the recent nonlinear QSPR model by Le and Weers31 to predict the normal boiling point of fluorocarbons. Notably, the dimensionality of the molecular size had been reduced to first order (length) in this case.

Recently, we achieved a quite successful correlation of the boiling points of 85 substituted pyridines (R² = 0.948)32 with a six-parameter model including the gravitation index (bulk quantity), CPSA descriptors, the point-charge component of the molecular dipole, and nitrogen-specific parameters. This promising correlation of the boiling point of the set of substituted pyridines prompted us to apply the CODESSA approach to a larger set of diverse organic compounds in an attempt to uncover the main molecular structural characteristics that determine the normal boiling point and the relevant intra- and intermolecular interactions in liquids.

Results and Discussion

The compilation of boiling points of 298 important organic compounds drawn from the Design Institute for Physical Property Data (DIPPR) database was chosen as the complete set to develop the QSPR models. The same data set was previously used by Jurs et al.28 in their ADAPT (automated data analysis using pattern recognition techniques) QSPR treatment.
This set is structurally sufficiently diverse and includes saturated and unsaturated hydrocarbons, halogenated compounds, and hydroxyl, cyano, amino, ester, ether, carbonyl, and carboxyl functionalities, and yet it is compact enough to allow calculation, within a reasonable time frame, of the numerous semiempirical quantum-chemical molecular descriptors to be employed in the subsequent QSPR development. The structures were drawn from scratch and preoptimized by the molecular mechanics MMX method using the PCMODEL33 program. The final geometry optimizations were performed on an IBM RISC/6000 Model 320 with the semiempirical quantum-chemical AM1 parametrization,34 using the MOPAC 6.0 program.35 The MOPAC output files of individual compounds were loaded into our CODESSA program for MS Windows36 along with the boiling point data.

The CODESSA program implements procedures which enable the calculation of a large selection of descriptors including numbers of atoms and bonds, molecular weight, the gravitation index,32 Wiener indices,11 Randic connectivity indices,12 Kier and Hall connectivity indices,13 information content indices,37 molecular volume, shadow indices,38 and numerous quantum-chemical indices extracted from the AM1 output.32,39-41 The CPSA descriptors proposed by Jurs et al.24,26 are also included in the CODESSA program: AM1 calculated atomic partial charges were used to calculate these descriptors. The quantum-chemical descriptors used in this work included the most positive and the most negative Mulliken net atomic charges, frontier molecular orbital (FMO) energies, and the respective Fukui FMO nucleophilic, electrophilic, and one-electron reactivity indices. The total dipole moment of the molecule, dipole moment components, and molecular bond orders were used as descriptors. Additional, more specific, descriptors of this type included the valence state energies of atoms and total Coulombic and exchange energies between atoms in the molecule. The zero-point energy, the calculated electronic and vibrational transition energies, the rotational, vibrational, translational, internal, and total enthalpies, entropies, and heat capacities were also used. Altogether more than 600 molecular descriptors programmed into CODESSA were calculated for all compounds. Various modifications of the original descriptors, as will be discussed later, were calculated using CODESSA's descriptor construction facility.

The correlation analysis to find the best QSPR model of a given size was carried out using two procedures based on the linear regression technique: (i) by the best multilinear regression analysis32 and (ii) by a heuristic method.32 In all cases a preselection of descriptors was implemented. Descriptors for which values were not available for every structure in the data in question were discarded. Descriptors having a constant value for all structures in the data set were also discarded.

The first strategy used to develop physically meaningful multilinear QSPR equations from the very large pool of descriptors is a combination of the possible regressions and forward selection procedures.45 This strategy involved the following steps:

(1) All orthogonal pairs of descriptors i and j (with pair correlation coefficient Rij < Rmin) were found in a given descriptor pool. The chance for the absolute orthogonality of two descriptors is negligible, and Rmin was therefore defined as a practical limit for two descriptors being approximately orthogonal. The value of Rmin = 0.1 was used throughout this work. In the further treatment, 400 orthogonal descriptor pairs (R < 0.1) with highest R² for two-parameter correlations were used.

(2) The complete set of two-parameter regression equations utilizing all the orthogonal pairs of descriptors, obtained in step 1, were then found for the property studied.
The Nc (≤400) significant pairs with the highest multilinear regression correlation coefficients were chosen for the next step.

(3) For each significant descriptor pair ij, obtained in the previous step, a noncollinear descriptor scale, k (with Rik < Rnc and Rkj < Rnc), was added, and the corresponding three-parameter regression treatment was performed. When the Fisher criterion at a given probability level, F, was smaller than that for the best two-parameter correlation, the latter was chosen as the final result and the search terminated; otherwise the Nc (≤400) descriptor triples with the highest regression correlation coefficients were considered in the next step. The noncollinearity limit, Rnc, is a subjective parameter and has to be of a value higher than Rmin in order to incorporate a more substantial part of the descriptor space. Different Rnc values were tested in the present treatment, and Rnc = 0.6 was chosen as leading to the most stable correlations.

(4) For each significant descriptor set, obtained in the previous step, an additional noncollinear descriptor scale was added, and the appropriate (n + 1)-parameter regression treatment was performed. When the Fisher criterion at the given probability level, F (or the cross-validated correlation coefficient,3 Rcv), obtained for any of these correlations was smaller than that for the best correlation of the previous rank, the latter was designated as the final result and the search was terminated. Otherwise, the Nc descriptor sets with the highest regression correlation coefficients were stored, and the current step was repeated with the number of parameters again increased by one (n = n + 1).

The final result had therefore the maximum value of the Fisher criterion and the highest value of the cross-validated correlation coefficient. According to these statistical criteria, it was considered as the best representation of the property in the given (large) descriptor space.

The second strategy used to develop the best multilinear QSPR equations is based on the stepwise regression procedure.45 The following procedure was carried out using this approach:

(1) To reduce the number of descriptors in the starting set, descriptors were eliminated if (a) the F value for the one-parameter correlation with the descriptor was below 1, (b) the correlation coefficient for the one-parameter equation was less than Rmin, a user-defined value for insignificant correlations (Rmin = 0.1 was used in the present work), (c) the t value for the descriptor in the one-parameter correlation was less than t1 = 1.5, or (d) the descriptor was highly intercorrelated with another descriptor which was characterized by a higher single-parameter correlation coefficient value for a given property.

(2) All two-parameter regression models with the remaining descriptors were calculated and ranked by their correlation coefficients R². The best 10 two-parameter models were submitted to the following stepwise regression procedure.

(3) Each of the remaining descriptors not significantly correlated with any of the descriptors already in the model (a correlation coefficient above 0.8 being taken as significant)46 was added, in turn, to the current n-parameter model, and the resulting (n + 1)-parameter models were tested. The best 10 (n + 1)-parameter models were again submitted to the same procedure.

(4) When the optimum number of parameters in a model was reached, the correlation equations with the highest correlation coefficients and with the highest F test values were selected.

Importantly, both strategies described above yielded the same best correlations, which adds confidence both to the reliability of the methodology and to the QSPR equations developed.

TABLE 1: One-Parameter Correlations for the Set of Structures without H Bonding (137 Compounds)

descriptor                                          R²
square root of the gravitation index, GI^(1/2)      0.9647
cube root of the gravitation index, GI^(1/3)        0.9626
gravitation index, GI                               0.9497
gravitation index for all pairs of atoms, GP        0.9477
square root of the molecular weight, (MW)^(1/2)     0.9266
molecular weight, MW                                0.9027
first order Randic index, ¹χ                        0.9011
AM1 α-polarizability                                0.8819
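Steps 1 and 2 of the first strategy (find approximately orthogonal descriptor pairs, then rank the two-parameter regressions built on them) can be sketched in a few lines. This is an illustrative sketch only, not the CODESSA implementation: the function names, the dictionary-of-lists input format, and the toy descriptor values ("bulk", "bulk_copy", "hbond") are all assumptions made for the example. Constant descriptors are assumed to have been discarded beforehand, as in the preselection described above.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pair correlation coefficient between two descriptor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of the least-squares fit y ~ intercept + sum(coefficient * column)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def best_orthogonal_pairs(descriptors, y, r_min=0.1, keep=400):
    """Keep pairs with |R_ij| < r_min and rank their two-parameter fits by R^2."""
    ranked = []
    for a, b in combinations(sorted(descriptors), 2):
        if abs(pearson_r(descriptors[a], descriptors[b])) < r_min:
            ranked.append((r_squared([descriptors[a], descriptors[b]], y), a, b))
    ranked.sort(reverse=True)
    return ranked[:keep]


# Toy data (hypothetical values, not taken from the paper).
descriptors = {
    "bulk": [-3.0, -1.0, 1.0, 3.0, -3.0, -1.0, 1.0, 3.0],
    "bulk_copy": [-6.0, -2.0, 2.0, 6.0, -6.0, -2.0, 2.0, 7.0],
    "hbond": [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
}
y = [300.0 + 20.0 * b + 40.0 * h
     for b, h in zip(descriptors["bulk"], descriptors["hbond"])]

for r2, a, b in best_orthogonal_pairs(descriptors, y):
    print(f"{a} + {b}: R2 = {r2:.4f}")
```

In this toy set the nearly collinear pair ("bulk", "bulk_copy") is rejected by the orthogonality filter, so only two pairs reach the regression stage. Steps 3 and 4 would extend each retained pair with further noncollinear descriptors and stop when the Fisher criterion no longer improves.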
In our previous study of the boiling points of pyridines,32 we found it helpful to divide the original set of compounds into two subsets consisting of compounds capable and not capable of forming hydrogen bonds. This eliminates the obscuring effect of difficult-to-quantify hydrogen bonding and allows the elucidation of structural features attributable to bulk and bulk-corrected cohesiveness as defined by Cramer.17 Accordingly, we proceeded from a subset of 137 hydrocarbons and halogenated hydrocarbons. Several bulk-related descriptors performed satisfactorily with this reduced set (see Table 1). The best performance was observed for the gravitational index over all bonded atoms i, j in the molecule,32 defined as

$$GI = \sum_{i,j} \frac{m_i m_j}{r_{ij}^{2}} \qquad (1)$$

(where m_i and m_j are the atomic masses of the bonded atoms and r_ij denotes the respective bond lengths) and, particularly, for its square root, GI^(1/2) (R² = 0.9647), and cube root, GI^(1/3), derivatives. Also, significant one-parameter correlations of the normal boiling point were obtained for the limited set of 137 compounds with their molecular weight (MW) and its square root, the Randic index of the first order,12 the AM1 calculated α-polarizability of the molecule, and the gravitational index over all pairs of atoms i, j in the molecule,32 defined as

$$GP = \sum_{i,j} \frac{m_i m_j}{r_{ij}^{2}} \qquad (2)$$

(where r_ij denotes interatomic distances). Notably, the correlations with each of the bulk descriptors, whether it be molecular weight, molecular volume, total molecular surface area, or molecular connectivity index, all exhibited systematic curve-distributed deviations from the regression line, indicating a nonlinear dependence on that descriptor. Indeed, the correlations were significantly improved if the square root or cube root of the descriptor was used. This observation suggests that the lower dimensionality representation of the molecular bulk better describes the related effective inter- and intramolecular interaction, at least for the boiling points. Notably, the gravitational index accounts simultaneously for both the atomic masses (volumes) and for their distribution within the molecular space.

Proceeding to two-parameter equations, the second descriptor in the best correlation model for the boiling points of 137 hydrocarbons and halogenated hydrocarbons was found to be the AM1 calculated most negative atomic charge in the molecule (Table 2, supporting information).
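Equations 1 and 2 differ only in the set of atom pairs summed over: bonded pairs for GI and all pairs for GP. A minimal sketch of both sums is given below, assuming atomic masses and Cartesian coordinates are available from an optimized geometry; the function name `gravitation_index`, its argument layout, and the three-atom toy geometry are illustrative assumptions and do not correspond to a real molecule or to the CODESSA code.

```python
from itertools import combinations
from math import dist


def gravitation_index(masses, coords, bonds=None):
    """Sum of m_i * m_j / r_ij**2.

    With `bonds` (a list of index pairs) the sum runs over bonded atoms (eq 1);
    with `bonds=None` it runs over all atom pairs (eq 2).
    """
    pairs = bonds if bonds is not None else combinations(range(len(masses)), 2)
    return sum(
        masses[i] * masses[j] / dist(coords[i], coords[j]) ** 2
        for i, j in pairs
    )


# Toy linear triatomic with unit bond lengths (hypothetical values).
masses = [1.0, 12.0, 1.0]
coords = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]

gi = gravitation_index(masses, coords, bonds)  # bonded pairs only
gp = gravitation_index(masses, coords)         # all pairs
print(gi, gp, gi ** (1.0 / 3.0))
```

For the toy geometry the bonded sum is 24.0 and the all-pairs sum is 24.25, the difference being the single nonbonded pair at distance 2. The cube root in the last printed value is the form of the descriptor that enters the correlation equations.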

Figure 1. Calculated vs experimental normal boiling points according to the one-parameter correlation equation with the cube root of the gravitation index, GI^(1/3), for the data set of 298 compounds.

TABLE 4: Best Two-Parameter Correlation of the Normal Boiling Point for the Data Set of 298 Diverse Structures (R² = 0.9544, s = 16.15, F = 3126.5)

descriptor    X ± ΔX          t test
intercept     -170.7 ± 7.5    -23.21
GI^(1/3)      65.88 ± 0.86    76.93
HDSA(2)       18470 ± 540     34.25

The corresponding two-parameter equation for the subset of 137 compounds is

$$T_b = (63.53 \pm 8.96) + (15.10 \pm 0.22)\,GI^{1/2} + (222.7 \pm 30.1)\,\delta_{max}$$

n = 137, R² = 0.9749, F = 2602.5, s = 12.0

If our model follows the intrinsic dimensionality of intermolecular interactions revealed by Cramer,17 the second descriptor (the most negative atomic charge, δmax) should account for the bulk-corrected cohesiveness, or, in other words, the degree to which the molecular bulk is cohesive. Indeed, the variance described by this two-parameter equation matches the variance ascribed by Cramer to his two-dimensional model of liquid state intermolecular interactions.

We proceeded further to treat the complete data set of 298 compounds. As expected, the best one-parameter correlation with the cube root of GI is substantially poorer (R² = 0.7753) than the corresponding correlation for the limited data set of hydrocarbons and halogenated hydrocarbons (Table 3, supporting information).

$$T_b = (-98.01 \pm 16.6) + (59.63 \pm 1.85)\,GI^{1/3}$$

n = 298, R² = 0.7753, F = 1035.2, s = 35.8

The larger dispersion of the calculated boiling points originates mainly from the hydrogen-bonded compounds (Figure 1). Accordingly, a dramatic improvement of the correlations for the whole data set was observed if one of the hydrogen-bonding related descriptors was added to the QSPR equation. The best result was obtained for the two-parameter equation involving the cube root of GI and the area-weighted surface charge of the hydrogen-bonding donor atom(s) in the molecule (Table 4). The latter was calculated as

$$HDSA(2) = \sum \frac{q_D S_D^{1/2}}{S_{tot}} \qquad (3)$$

where q_D is the partial charge on the hydrogen-bonding donor (H) atom(s), S_D denotes the surface area for this atom, and S_tot is the total molecular surface area, calculated from the van der Waals radii of the atoms (overlapping spheres). The summation in eq 3 is performed over the number of simultaneously possible hydrogen-bonding donor and acceptor pairs per molecule. Also, the hydrogen atoms at the α-position of carbonyl and cyano groups were counted as possible hydrogen-bonding donor centers (their effectiveness is, of course, much smaller because of the smaller partial charge on them).

Figure 2. Calculated vs experimental normal boiling points according to the best two-parameter correlation equation for the data set of 298 compounds.

The two-parameter equation presented in Table 4 is physically highly significant and demonstrates that two practically orthogonal molecular descriptor scales (the intercorrelation coefficient between GI^(1/3) and HDSA(2) is 0.2041) describe most of the variance of the normal boiling points for a wide variety of organic substances (cf. also Figure 2). Both descriptors have explicit physical meaning; the first is connected with the dispersion and cavity-formation effects in liquids (gravitation index), and the second with the hydrogen-bonding ability of compounds.

Figure 3. Calculated vs experimental normal boiling points according to the best four-parameter correlation equation for the data set of 298 compounds.

TABLE 5: Best Four-Parameter Correlation of the Normal Boiling Point for the Data Set of 298 Diverse Structures (R² = 0.9732, s = 12.41, F = 2700.0)

descriptor                                   X ± ΔX          t test
intercept                                    -151.3 ± 6.3    -24.12
GI^(1/3)                                     67.39 ± 0.67    101.14
HDSA(2)                                      21540 ± 480     45.14
AM1 most negative atomic charge, δmax        140.4 ± 13.1    10.75
number of Cl atoms, NCl                      17.51 ± 2.31    7.57

Further adjustment of the correlation brings up a four-parameter equation with R² = 0.9732 (Figure 3 and Table 5) and involves, as additional descriptors, the AM1 most negative atomic partial charge and the number of chlorine atoms in the molecule. The most negative atomic partial charge could account for the bulk-corrected cohesiveness, as discussed above. The chlorine-substituted compounds are distinct outliers from the two-parameter equation given in Table 4 (see also Figure 2 and Table 6), with systematically lower predicted normal boiling point values. In principle, this may have several causes. In the first place, the gravitation index may need to be adjusted to correct for large atoms, such as chlorine, for which van der Waals radii are comparable with bond lengths (e.g., 1.80 Å for the Cl atom vs 1.74 Å for the C-Cl bond). Secondly, a minor hydrogen-bonding contribution may exist for these compounds (for instance, hydrogen-bonded systems with chloroform are well-known). Nevertheless, accounting for the presence of chlorine atoms is substantial in the final QSPR equation developed (Table 5).

It should be emphasized that the equation in Table 5 is characterized by almost the same correlation coefficient and standard deviation values as the 8-parameter equation developed by Jurs et al.28 for a subset of 268 compounds of the set used in this work (R = 0.987 vs 0.988 and s = 12.41 vs 11.85). However, the number of correlation parameters has been halved, with all four of them having a distinct physical meaning. The stability of the correlation equations presented in Tables 4 and 5 is characterized by the corresponding cross-validated correlation coefficients,36 which are almost identical to the correlation coefficients themselves (Rcv² = 0.9534 and 0.9719, respectively).
Also, the experimental uncertainties which accompany the set of data used in this work have been established by the DIPPR and estimated at 2.1% error in boiling point values.28 The average prediction error by the best two-parameter correlation (Table 4) is 3.0% and by the best four-parameter correlation (Table 5) 2.3%. Consequently, the uncertainty of the four-descriptor equation matches the experimental imprecision of the data, and therefore no statistically better model representation of the data used is to be expected.

Although we screened a large number of descriptors, we believe that the possibility of our having derived chance correlations can be discounted. Firstly, not all the descriptors used are independent variables, and some correlate significantly with others: this intercorrelation is accounted for in our CODESSA analysis, effectively reducing the number of variables screened. Secondly, each descriptor added to our models was checked for significance by the t test and the one-parameter equation for this descriptor. Thirdly, cross-validation was carried out. Finally, the good predictions found for the test set (see below) are the best proof that the correlations do not arise by chance.

The power of our approach was tested by reference to a set of compounds of completely different character: nine simple inorganics containing one or two atoms of first-row elements (Table 7). In this severe test, the two-parameter equation performs credibly with an average deviation of 22°, supporting the basic approach which allocates seminal influence to a gravitational index function and a surface charge function. The fact that the three- and four-parameter equations give much less satisfactory results for this set of nine inorganics indicates that, in a truly general treatment, the third and fourth descriptors need to be modified.

Most of the structural descriptors used in the correlation equations were calculated from the AM1 optimized molecular geometries and charge distribution.
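The four-parameter AM1 equation of Table 5 and the average percentage error quoted above are straightforward to evaluate once the descriptors are known. The sketch below hard-codes the Table 5 coefficients; everything else is an assumption made for illustration: the function names, the simplified form of eq 3 (a plain sum over donor hydrogens rather than over donor-acceptor pairs), and all numerical inputs in the usage lines, which are hypothetical and not taken from Table 6.

```python
from math import sqrt

# Coefficients of the four-parameter AM1 equation as listed in Table 5.
INTERCEPT = -151.3
C_GI = 67.39       # multiplies GI**(1/3)
C_HDSA = 21540.0   # multiplies HDSA(2)
C_DELTA = 140.4    # multiplies the most negative atomic charge, delta_max
C_NCL = 17.51      # multiplies the number of chlorine atoms, N_Cl


def hdsa2(donors, total_area):
    """Simplified eq 3: sum of q_D * sqrt(S_D) over donor atoms, divided by S_tot.

    `donors` is a list of (partial charge, atomic surface area) tuples.
    """
    return sum(q * sqrt(area) for q, area in donors) / total_area


def predict_tb(gi, hdsa, delta_max, n_cl):
    """Normal boiling point (K) from the four descriptors of Table 5."""
    return (INTERCEPT
            + C_GI * gi ** (1.0 / 3.0)
            + C_HDSA * hdsa
            + C_DELTA * delta_max
            + C_NCL * n_cl)


def mean_percent_error(predicted, experimental):
    """Average of |predicted - experimental| / experimental, in percent."""
    ratios = [abs(p - e) / e for p, e in zip(predicted, experimental)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical descriptor values for a monochlorinated compound without donors.
tb = predict_tb(gi=1000.0, hdsa=0.0, delta_max=-0.1, n_cl=1)
print(round(tb, 2))

# Hypothetical predicted and experimental values, to show the error measure.
print(mean_percent_error([102.0, 196.0], [100.0, 200.0]))
```

Replacing the five constants with the PM3 coefficients would give the corresponding PM3-based prediction without changing the rest of the code.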
To study the dependence of the descriptors and correlations obtained on the quantum-chemical method used, the calculation of the descriptors and regression treatment were repeated using PM343 quantum-chemical optimized geometries and charge distribution of the molecules. The resulting QSPR correlation equations were similar to those obtained using AM1 molecular descriptors. The best two-parameter correlation with PM3 descriptors has the following form:

$$T_b = (-160.1 \pm 8.0) + (64.96 \pm 0.94)\,GI^{1/3} + (20317 \pm 673)\,[HDSA(2)]$$

n = 298, R² = 0.9445, F = 2544.5, s = 17.81

The best four-parameter equation is represented as follows:

$$T_b = (-161.7 \pm 7.0) + (66.65 \pm 0.82)\,GI^{1/3} + (22999 \pm 641)\,[HDSA(2)] + (73.68 \pm 9.53)\,\delta_{max} + (17.88 \pm 6.39)\,N_{Cl}$$

n = 298, R² = 0.9598, F = 1772.0, s = 15.21

where δmax denotes the most negative atomic partial charge and NCl is the number of chlorine atoms in the molecule. The statistical fit of these equations is slightly worse than that of the same equations with AM1 descriptors (Tables 4 and 5). Nevertheless, the equations obtained seem to have little dependence on the quantum-chemical method used to obtain the molecular geometry and charge distribution. Both AM1 and PM3 charge-dependent descriptors were calculated proceeding from the Mulliken atomic partial charges. For comparison, the AMPAC44 electrostatic potential charges were applied in the calculation of the charge-dependent descriptors. However, the statistical fit of the corresponding regression


TABLE 6: Comparison of the Experimental and Predicted Normal Boiling Points (K) for the Set of 298 Compounds

| No. | compound | Tb(exp)a | Tb(2)b | Tb(4)c |
|---|---|---|---|---|
| 1 | 1,1-dichloropropane | 361.25 | 337.48 | 373.10 |
| 2 | 1,1-diphenylethane | 545.78 | 532.55 | 539.22 |
| 3 | 1,2,3,4-tetrahydronaphthalene | 480.77 | 460.33 | 472.59 |
| 4 | 1,2,3-trimethylbenzene | 449.27 | 431.39 | 438.70 |
| 5 | 1,2,4-trimethylbenzene | 442.53 | 429.52 | 437.45 |
| 6 | 1,2-dichloropropene | 361.25 | 345.82 | 380.69 |
| 7 | 1,2-diphenylethane | 553.65 | 532.99 | 550.05 |
| 8 | 1,2-propylene glycol | 460.75 | 470.04 | 478.33 |
| 9 | 1,3-butadiene | 268.74 | 265.62 | 265.93 |
| 10 | 1,3-butanediol | 480.15 | 473.94 | 480.13 |
| 11 | 1,3-cyclohexadiene | 353.49 | 352.17 | 361.75 |
| 12 | 1,3-dichloropropane | 393.55 | 338.28 | 379.42 |
| 13 | 1,3-propylene glycol | 487.55 | 471.32 | 481.93 |
| 14 | 1,4-butanediol | 501.15 | 490.56 | 498.53 |
| 15 | 1,4-dichlorobutane | 427.05 | 366.48 | 409.52 |
| 16 | 1,5-dichloropentane | 453.15 | 392.45 | 435.61 |
| 17 | 1,5-hexadiene | 332.61 | 335.79 | 334.89 |
| 18 | 1,5-pentanediol | 512.15 | 503.24 | 509.93 |
| 19 | 1,6-hexanediol | 516.15 | 509.64 | 514.00 |
| 20 | 1-bromobutane | 374.75 | 363.97 | 365.08 |
| 21 | 1-bromopropane | 344.15 | 335.14 | 335.73 |
| 22 | 1-butene | 266.90 | 264.77 | 262.52 |
| 23 | 1-chlorobutane | 351.58 | 320.27 | 338.63 |
| 24 | 1-chloropentane | 381.54 | 350.70 | 369.69 |
| 25 | 1-decanol | 503.35 | 503.97 | 498.87 |
| 26 | 1-decene | 443.75 | 433.33 | 434.27 |
| 27 | 1-dodecane | 486.50 | 475.78 | 477.96 |
| 28 | 1-heptene | 366.79 | 363.15 | 362.00 |
| 29 | 1-hexadecene | 558.02 | 543.43 | 547.64 |
| 30 | 1-hexanal | 401.45 | 399.23 | 395.44 |
| 31 | 1-hexanol | 430.15 | 429.57 | 425.12 |
| 32 | 1-hexene | 336.63 | 335.23 | 334.87 |
| 33 | 1-octadecene | 587.97 | 572.98 | 577.66 |
| 34 | 1-octene | 394.44 | 390.28 | 391.12 |
| 35 | 1-pentanol | 410.95 | 407.56 | 403.28 |
| 36 | 1-pentene | 303.11 | 302.60 | 301.07 |
| 37 | 1-tetradecane | 524.25 | 511.45 | 515.19 |
| 38 | 2,2,3,3-tetramethylpentane | 413.44 | 411.75 | 415.10 |
| 39 | 2,2,3-trimethylbutane | 354.03 | 361.68 | 364.10 |
| 40 | 2,2,3-trimethylpentane | 383.00 | 387.72 | 390.41 |
| 41 | 2,2,4-trimethylpentane | 372.39 | 388.07 | 391.24 |
| 42 | 2,2-dimethyl-1-propanol | 386.25 | 408.15 | 404.61 |
| 43 | 2,2-dimethylbutane | 322.88 | 333.24 | 334.81 |
| 44 | 2,3,3-trimethylpentane | 387.92 | 387.86 | 390.60 |
| 45 | 2,3-butanediol | 453.85 | 477.31 | 483.76 |
| 46 | 2,3-dimethyl-1-butene | 328.76 | 335.10 | 334.02 |
| 47 | 2,3-dimethyl-2-butene | 346.35 | 336.89 | 342.97 |
| 48 | 2,3-dimethyl-3-butadiene | 341.93 | 336.61 | 337.29 |
| 49 | 2,3-dimethylbutane | 331.13 | 333.59 | 335.40 |
| 50 | 2,3-dimethylhexane | 388.76 | 388.78 | 391.52 |
| 51 | 2,3-dimethylpentane | 362.93 | 362.46 | 364.61 |
| 52 | 2,4,4-trimethyl-1-pentene | 374.59 | 389.42 | 389.50 |
| 53 | 2,4,4-trimethyl-2-pentene | 378.06 | 389.65 | 393.40 |
| 54 | 2,6-xylenol | 474.22 | 484.37 | 490.28 |
| 55 | 2-bromobutane | 364.37 | 361.72 | 363.42 |
| 56 | 2-bromopropane | 332.56 | 332.70 | 333.64 |
| 57 | 2-ethylbutyric acid | 466.95 | 477.07 | 470.70 |
| 58 | 2-ethyl-1-butanol | 419.65 | 409.78 | 402.33 |
| 59 | 2-ethyl-1-butene | 337.82 | 335.55 | 334.33 |
| 60 | 2-ethyl-1-hexanol | 457.75 | 471.36 | 466.98 |
| 61 | 2-ethylhexyl acrylate | 489.15 | 509.90 | 495.36 |
| 62 | 2-hexanol | 413.04 | 422.19 | 416.35 |
| 63 | 2-hexanone | 400.85 | 399.46 | 394.39 |
| 64 | 2-methyl-1-butanol | 401.85 | 382.34 | 374.23 |
| 65 | 2-methyl-1-butene | 304.30 | 303.03 | 300.85 |
| 66 | 2-methyl-1-pentanol | 421.15 | 432.23 | 428.66 |
| 67 | 2-methyl-1-pentene | 335.25 | 335.54 | 333.97 |
| 68 | 2-methyl-2-butanol | 375.15 | 397.31 | 391.92 |
| 69 | 2-methyl-2-pentene | 340.45 | 336.21 | 338.22 |
| 70 | 2-methyl-3-ethylpentane | 388.80 | 388.41 | 391.09 |
| 71 | 2-methylbutyric acid | 450.15 | 459.75 | 453.97 |
| 72 | 2-methylhexane | 363.20 | 362.83 | 364.85 |
| 73 | 2-methylpentane | 333.41 | 333.66 | 334.95 |
| 74 | 2-methylpyridine | 402.55 | 421.11 | 432.38 |
| 75 | 2-pentanol | 392.15 | 400.21 | 394.71 |
| 76 | 2-pentanone | 375.46 | 379.49 | 341.88 |
| 77 | 2-propanol | 355.41 | 346.26 | 341.88 |
| 78 | 3,3-diethylpentane | 419.34 | 411.98 | 415.16 |
| 79 | 3,3-dimethyl-1-butene | 314.40 | 333.87 | 333.42 |
| 80 | 3-chloropropene | 318.11 | 286.48 | 306.60 |
| 81 | 3-hexanone | 396.65 | 401.91 | 397.99 |
| 82 | 3-methyl-1-butanol | 404.35 | 407.44 | 403.44 |
| 83 | 3-methyl-1-butene | 293.21 | 302.01 | 301.19 |
| 84 | 3-methyl-1-pentene | 327.33 | 334.55 | 334.48 |
| 85 | 3-methyl-2-butanol | 384.65 | 401.88 | 396.78 |
| 86 | 3-methyl-2-butene | 311.71 | 304.02 | 308.83 |
| 87 | 3-methylhexane | 365.00 | 362.82 | 364.76 |
| 88 | 3-methylpentane | 336.42 | 334.11 | 335.64 |
| 89 | 3-methylpyridine | 417.29 | 418.24 | 430.53 |
| 90 | 3-pentanol | 388.45 | 401.78 | 396.58 |
| 91 | 4-methyl-1-pentene | 327.01 | 334.79 | 333.95 |
| 92 | 4-methyl-2-pentanol | 404.85 | 424.00 | 418.79 |
| 93 | 4-methylpyridine | 418.50 | 418.34 | 429.35 |
| 94 | acetal | 376.75 | 415.9 | 407.16 |
| 95 | acetone | 329.44 | 322.06 | 319.49 |
| 96 | acetophenone | 475.15 | 466.93 | 464.63 |
| 97 | acetylacetone | 413.55 | 411.38 | 410.06 |
| 98 | acrylaldehyde | 325.84 | 343.98 | 345.26 |
| 99 | acrylonitrile | 350.50 | 343.17 | 362.42 |
| 100 | adiponitrile | 568.15 | 517.07 | 546.67 |
| 101 | α-methylstyrene | 438.65 | 430.43 | 432.67 |
| 102 | allyl acetate | 377.15 | 389.37 | 373.53 |
| 103 | allyl alcohol | 370.23 | 355.69 | 353.09 |
| 104 | allylamine | 326.45 | 325.54 | 317.37 |
| 105 | aniline | 457.60 | 445.26 | 441.43 |
| 106 | benzaldehyde | 451.90 | 440.16 | 436.92 |
| 107 | benzene | 353.24 | 353.94 | 367.06 |
| 108 | benzoic acid | 522.40 | 517.81 | 512.72 |
| 109 | benzyl acetate | 486.65 | 486.61 | 471.24 |
| 110 | benzyl alcohol | 477.85 | 476.17 | 473.85 |
| 111 | benzyl benzoate | 596.65 | 576.62 | 563.43 |
| 112 | bicyclohexyl | 512.19 | 493.30 | 506.29 |
| 113 | bromobenzene | 429.24 | 432.48 | 442.38 |
| 114 | butyl vinyl ether | 366.97 | 378.94 | 372.11 |
| 115 | chlorobenzene | 404.87 | 397.38 | 429.25 |
| 116 | cis-1,2-dimethylcyclohexane | 402.94 | 401.34 | 404.80 |
| 117 | cis-1,3-dimethylcyclohexane | 393.24 | 401.50 | 404.93 |
| 118 | cis-1,4-dimethylcyclohexane | 397.47 | 401.31 | 404.76 |
| 119 | cis-2-butene | 276.87 | 266.07 | 269.58 |
| 120 | cis-2-hexene | 342.03 | 336.12 | 337.55 |
| 121 | cumene | 425.56 | 428.10 | 431.79 |
| 122 | cyclohexane | 353.87 | 349.55 | 359.40 |
| 123 | cyclohexanol | 434.00 | 449.02 | 446.48 |
| 124 | cyclohexanone | 428.90 | 418.73 | 414.87 |
| 125 | cyclohexylamine | 407.65 | 423.85 | 416.35 |
| 126 | cyclopentadiene | 314.65 | 318.54 | 323.60 |
| 127 | cyclopentane | 322.40 | 317.91 | 326.57 |
| 128 | cyclopentene | 317.38 | 318.27 | 324.41 |
| 129 | cyclohexene | 356.12 | 350.90 | 358.93 |
| 130 | di-n-butyl ether | 413.44 | 425.34 | 419.04 |
| 131 | di-n-hexyl ether | 498.85 | 502.92 | 498.50 |
| 132 | di-n-propyl ether | 362.79 | 377.53 | 369.89 |
| 133 | di-n-propylamine | 382.00 | 408.15 | 404.00 |
| 134 | dibutyl phthalate | 613.15 | 636.69 | 623.95 |
| 135 | dibutyl sebacate | 622.15 | 648.02 | 643.77 |
| 136 | diethyl ether | 308.58 | 319.69 | 311.29 |
| 137 | diethyl ketone | 375.14 | 373.59 | 369.52 |
| 138 | diethyl phthalate | 567.15 | 580.26 | 567.20 |
| 139 | diethylamine | 328.60 | 357.15 | 351.52 |
| 140 | diisopropyl ether | 341.45 | 376.48 | 368.65 |
| 141 | diisopropylamine | 357.05 | 404.15 | 398.85 |
| 142 | dimethyl phthalate | 556.85 | 555.73 | 542.68 |
| 143 | dimethyl terephthalate | 561.15 | 548.53 | 535.92 |
| 144 | diphenyl ether | 531.46 | 526.20 | 535.86 |
| 145 | diphenylamine | 575.15 | 558.24 | 564.87 |
| 146 | diphenylmethane | 537.42 | 516.89 | 533.64 |
| 147 | divinyl ether | 301.45 | 322.72 | 315.87 |
| 148 | ethyl acetate | 350.21 | 361.83 | 343.81 |
| 149 | ethyl acrylate | 372.65 | 387.52 | 370.72 |
| 150 | ethyl benzoate | 486.55 | 486.42 | 470.49 |
| 151 | ethyl formate | 327.46 | 332.60 | 321.92 |
| 152 | ethyl isobutyrate | 383.00 | 411.64 | 394.42 |
| 153 | ethyl isopropyl ketone | 386.55 | 400.07 | 396.43 |
| 154 | ethyl n-butyrate | 394.65 | 412.14 | 395.28 |
| 155 | ethyl propionate | 372.25 | 388.07 | 370.76 |
| 156 | ethyl propyl ether | 337.01 | 350.35 | 342.15 |
| 157 | ethyl vinyl ether | 308.70 | 321.38 | 303.14 |
| 158 | ethylbenzene | 409.35 | 405.97 | 409.11 |
| 159 | ethylcyclohexane | 404.95 | 401.03 | 404.10 |
| 160 | ethylcyclopentane | 376.62 | 376.05 | 378.68 |
| 161 | ethylenecarboxylic acid | 414.15 | 425.43 | 422.75 |
| 162 | hexamethylene imine | 404.85 | 425.63 | 424.05 |
| 163 | hexanenitrile | 436.75 | 397.98 | 405.10 |
| 164 | isopentane | 300.99 | 301.34 | 302.21 |
| 165 | isobutane | 261.43 | 263.22 | 263.52 |
| 166 | isobutanol | 380.81 | 383.66 | 380.60 |
| 167 | isobutene | 266.25 | 265.20 | 261.54 |
| 168 | isobutyl acetate | 389.80 | 411.77 | 394.62 |
| 169 | isobutyl acrylate | 405.15 | 434.49 | 417.87 |
| 170 | isobutyl formate | 371.22 | 387.09 | 376.95 |
| 171 | isobutyl isobutyrate | 420.65 | 454.30 | 437.96 |
| 172 | isobutylamine | 340.88 | 360.72 | 351.56 |
| 173 | isobutylbenzene | 445.94 | 449.20 | 453.46 |
| 174 | isobutyraldehyde | 337.25 | 344.59 | 340.70 |
| 175 | isobutyric acid | 427.85 | 439.26 | 434.01 |
| 176 | isobutyronitrile | 376.76 | 347.55 | 354.89 |
| 177 | isophorone | 488.35 | 486.97 | 484.14 |
| 178 | isoprene | 307.21 | 303.67 | 303.83 |
| 179 | isopropyl acetate | 361.65 | 387.95 | 370.19 |
| 180 | isopropyl chloride | 308.85 | 283.95 | 299.92 |
| 181 | isopropylamine | 305.55 | 321.89 | 312.98 |
| 182 | isovaleric acid | 448.25 | 458.94 | 453.05 |
| 183 | m-cresol | 475.43 | 478.56 | 486.64 |
| 184 | m-diethylbenzene | 454.29 | 450.10 | 454.27 |
| 185 | m-diisopropylbenzene | 476.33 | 487.63 | 493.32 |
| 186 | m-ethyltoluene | 434.48 | 429.07 | 432.82 |
| 187 | m-toluidine | 476.55 | 465.61 | 461.15 |
| 188 | m-xylene | 412.27 | 406.63 | 414.11 |
| 189 | mesityl oxide | 402.95 | 409.89 | 406.09 |
| 190 | mesitylene | 437.89 | 429.59 | 437.69 |
| 191 | methacrolein | 341.15 | 360.72 | 358.52 |
| 192 | methyl acrylate | 353.35 | 362.23 | 344.00 |
| 193 | methyl ethyl ketone | 352.79 | 352.93 | 349.48 |
| 194 | methyl acetate | 330.09 | 332.87 | 314.20 |
| 195 | methyl ethyl ether | 280.50 | 284.85 | 275.60 |
| 196 | methyl isobutyl | 389.65 | 398.70 | 393.68 |
| 197 | methyl isobutyl ether | 331.70 | 350.10 | 342.41 |
| 198 | methyl isopropenyl ketone | 371.15 | 382.56 | 380.67 |
| 199 | methyl isopropyl ether | 323.75 | 319.48 | 310.35 |
| 200 | methyl isopropyl ketone | 367.55 | 378.47 | 374.98 |
| 201 | methyl n-butyrate | 375.90 | 388.15 | 371.24 |
| 202 | methyl propionate | 352.60 | 362.09 | 343.95 |
| 203 | methyl sec-butyl ether | 332.15 | 349.82 | 341.93 |
| 204 | methyl tert-butyl ether | 328.35 | 349.31 | 341.72 |
| 205 | methyl tert-pentyl ether | 359.45 | 376.82 | 368.99 |
| 206 | methyl vinyl ether | 278.65 | 286.71 | 277.52 |
| 207 | methanal | 315.00 | 337.61 | 332.16 |
| 208 | methylcyclohexane | 374.08 | 376.73 | 379.83 |
| 209 | methylcyclopentadiene | 345.93 | 349.18 | 354.42 |
| 210 | methylcyclopentane | 344.96 | 348.47 | 351.09 |
| 211 | N,N-dimethylaniline | 466.69 | 461.90 | 464.93 |
| 212 | n-butane | 272.65 | 263.62 | 263.51 |
| 213 | butanol | 390.81 | 383.98 | 380.36 |
| 214 | n-butyl acetate | 399.15 | 412.09 | 395.11 |
| 215 | n-butyl acrylate | 421.00 | 434.84 | 420.11 |
| 216 | n-butyl ethyl ether | 365.35 | 377.64 | 370.26 |
| 217 | n-butyl formate | 379.25 | 387.70 | 378.19 |
| 218 | n-butyl stearate | 623.15 | 660.04 | 649.17 |
| 219 | n-butylamine | 350.55 | 359.40 | 349.42 |
| 220 | n-butylbenzene | 456.46 | 449.68 | 453.80 |
| 221 | n-butylcyclohexane | 454.13 | 446.35 | 449.70 |
| 222 | n-butyraldehyde | 347.95 | 351.58 | 348.33 |
| 223 | n-butyric acid | 436.42 | 420.76 | 421.56 |
| 224 | n-butyronitrile | 390.75 | 346.21 | 352.37 |
| 225 | n-decane | 447.30 | 432.19 | 433.67 |
| 226 | n-dodecane | 489.47 | 475.44 | 480.17 |
| 227 | n-heptane | 371.58 | 363.09 | 365.26 |
| 228 | n-hexadecane | 560.01 | 542.97 | 549.27 |
| 229 | n-hexane | 341.88 | 334.18 | 335.50 |
| 230 | n-hexanoic acid | 478.85 | 474.62 | 467.94 |
| 231 | n-hexylamine | 404.65 | 413.15 | 403.45 |
| 232 | n-nonadecane | 603.05 | 586.37 | 593.70 |
| 233 | n-nonane | 423.97 | 412.92 | 415.45 |
| 234 | n-octadecane | 589.86 | 572.43 | 579.42 |
| 235 | n-octane | 398.83 | 389.60 | 392.36 |
| 236 | n-pentane | 309.22 | 301.68 | 302.42 |
| 237 | n-pentyl formate | 406.60 | 411.91 | 402.86 |
| 238 | n-pentylamine | 377.65 | 387.64 | 377.78 |
| 239 | propanol | 370.35 | 352.70 | 349.39 |
| 240 | n-propionaldehyde | 321.15 | 319.46 | 316.39 |
| 241 | n-propyl acetate | 374.65 | 378.51 | 374.60 |
| 242 | n-propyl chloride | 319.67 | 285.62 | 303.05 |
| 243 | n-propyl formate | 353.97 | 361.72 | 343.97 |
| 244 | n-propyl propionate | 395.65 | 411.89 | 395.16 |
| 245 | n-propylamine | 321.65 | 330.26 | 320.70 |
| 246 | n-propylcyclohexane | 432.39 | 428.68 | 432.16 |
| 247 | n-propylcyclohexane | 429.90 | 424.49 | 428.02 |
| 248 | n-propylcyclopentane | 404.11 | 400.97 | 403.91 |
| 249 | n-tetradecane | 526.73 | 511.27 | 514.68 |
| 250 | neopentane | 282.65 | 300.48 | 302.22 |
| 251 | neopentyl glycol | 483.00 | 476.02 | 479.41 |
| 252 | o-cresol | 464.15 | 478.19 | 486.5 |
| 253 | o-dichlorobenzene | 453.57 | 435.42 | 486.77 |
| 254 | o-diethylbenzene | 456.61 | 449.98 | 454.15 |
| 255 | o-ethyltoluene | 438.33 | 429.01 | 432.85 |
| 256 | o-toluidine | 473.55 | 455.39 | 450.10 |
| 257 | o-xylene | 417.58 | 406.46 | 413.89 |
| 258 | p-cresol | 475.13 | 473.46 | 480.87 |
| 259 | p-cymene | 450.28 | 449.54 | 453.80 |
| 260 | p-diethylbenzene | 456.94 | 450.10 | 454.28 |
| 261 | p-diisopropylbenzene | 483.65 | 487.76 | 493.49 |
| 262 | p-ethyltoluene | 435.16 | 429.06 | 432.63 |
| 263 | p-hydroquinone | 558.15 | 545.17 | 563.59 |
| 264 | p-toluidine | 473.40 | 463.68 | 459.38 |
| 265 | p-xylene | 411.51 | 406.61 | 414.35 |
| 266 | phenol | 454.99 | 457.50 | 465.71 |
| 267 | piperidine | 379.55 | 402.81 | 403.09 |
| 268 | propane | 231.11 | 217.37 | 216.19 |
| 269 | propionic acid | 414.32 | 417.96 | 413.73 |
| 270 | propionitrile | 370.50 | 318.84 | 326.42 |
| 271 | propylene | 225.43 | 218.75 | 215.44 |
| 272 | pyridine | 388.41 | 362.07 | 368.29 |
| 273 | quinoline | 510.75 | 467.81 | 476.11 |
| 274 | sec-butyl acetate | 385.15 | 411.68 | 394.43 |
| 275 | sec-butyl alcohol | 372.70 | 374.98 | 370.33 |
| 276 | sec-butyl chloride | 341.25 | 319.17 | 336.30 |
| 277 | sec-butylamine | 336.15 | 347.82 | 338.40 |
| 278 | sec-butylbenzene | 446.48 | 449.26 | 453.08 |
| 279 | stearic acid | 648.35 | 640.23 | 631.57 |
| 280 | styrene | 418.31 | 406.51 | 409.82 |
| 281 | tert-butyl acetate | 369.15 | 410.93 | 393.63 |
| 282 | tert-butyl alcohol | 355.57 | 369.72 | 364.45 |
| 283 | tert-butyl chloride | 323.75 | 317.87 | 335.60 |
| 284 | tert-butylamine | 317.55 | 350.86 | 342.49 |
| 285 | tert-butylbenzene | 442.30 | 448.58 | 453.45 |
| 286 | tetrahydrofuran | 338.00 | 334.67 | 325.96 |
| 287 | toluene | 383.78 | 381.46 | 388.34 |
| 288 | trans-1,2-dimethylcyclohexane | 396.58 | 401.21 | 404.66 |
| 289 | trans-1,3-dimethylcyclohexane | 397.61 | 401.19 | 404.62 |
| 290 | trans-1,4-dimethylcyclohexane | 392.51 | 401.20 | 404.65 |
| 291 | trans-2-butene | 274.03 | 265.94 | 269.42 |
| 292 | trans-2-hexene | 341.02 | 336.11 | 337.55 |
| 293 | trans-crotonic acid | 458.15 | 421.22 | 422.45 |
| 294 | trimethylamine | 276.02 | 283.59 | 276.37 |
| 295 | valeraldehyde | 376.15 | 377.87 | 373.61 |
| 296 | valeric acid | 458.65 | 456.73 | 450.63 |
| 297 | valeronitrile | 414.45 | 372.44 | 378.65 |
| 298 | vinyl acetate | 345.65 | 363.10 | 346.58 |

a Experimental value from DIPPR (ref 6). b Calculated according to the two-parameter correlation (Table 4). c Calculated according to the four-parameter correlation (Table 5).
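As an illustration of how the Table 6 columns can be compared, the following minimal sketch computes the mean absolute error and mean absolute percentage error of the two- and four-parameter predictions for five chloroalkanes copied from the table. The function and variable names are hypothetical, and the five-row subset is chosen only to show the effect of the chlorine-count descriptor; it is not a substitute for the statistics over all 298 compounds.

```python
# Rows copied from Table 6: (compound, Tb_exp, Tb_two_param, Tb_four_param), all in K.
CHLOROALKANES = [
    ("1,1-dichloropropane", 361.25, 337.48, 373.10),
    ("1,3-dichloropropane", 393.55, 338.28, 379.42),
    ("1,4-dichlorobutane", 427.05, 366.48, 409.52),
    ("1-chlorobutane", 351.58, 320.27, 338.63),
    ("1-chloropentane", 381.54, 350.70, 369.69),
]


def mean_abs_error(rows, column):
    """Mean |Tb(pred) - Tb(exp)| in K; column 2 = two-parameter, 3 = four-parameter."""
    return sum(abs(row[column] - row[1]) for row in rows) / len(rows)


def mean_abs_pct_error(rows, column):
    """Mean absolute error expressed as a percentage of the experimental value."""
    return 100.0 * sum(abs(row[column] - row[1]) / row[1] for row in rows) / len(rows)


mae_two = mean_abs_error(CHLOROALKANES, 2)
mae_four = mean_abs_error(CHLOROALKANES, 3)
pct_two = mean_abs_pct_error(CHLOROALKANES, 2)
pct_four = mean_abs_pct_error(CHLOROALKANES, 3)

print(f"two-parameter:  MAE = {mae_two:.2f} K ({pct_two:.1f}%)")
print(f"four-parameter: MAE = {mae_four:.2f} K ({pct_four:.1f}%)")
```

For this subset the four-parameter column lies considerably closer to experiment than the two-parameter column, which is the behavior one would expect from adding a chlorine-count term.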


TABLE 7: Comparison of QSPR Predicted and Experimental Normal Boiling Points Tb (K) for Some Inorganic Substances

| compound | Tb(exp)a | Tb(2)b | Tb(3)c | Tb(4)d |
|---|---|---|---|---|
| H2O | 373 | 371 | 400 | 372 |
| H2O2 | 425 | 493 | 545 | 543 |
| NH3 | 240 | 267 | 267 | 270 |
| N2H4 | 387 | 357 | 376 | 375 |
| NH2OH | 330 | 339 | 347 | 350 |
| HCN | 299 | 310 | 338 | 333 |
| CH3F | 195 | 179 | 185 | 181 |
| CH3NH2 | 267 | 250 | 241 | 241 |
| HF | 293 | 296 | 338 | 324 |

a Experimental value (ref 42). b Calculated according to the best two-parameter correlation (Table 4). c Calculated according to the three-parameter correlation (cube root of the gravitation index, the area-weighted surface charge of the hydrogen-bonding donor atom(s) in the molecule, and the AM1 most negative atomic partial charge). d Calculated according to the four-parameter correlation (Table 5).
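The comparison of the three models on the inorganic test set can be checked directly from the Table 7 entries. The sketch below is one way to do so; the names are hypothetical, and because it works from the rounded values printed in the table, the result need not reproduce the quoted average deviation of 22° exactly.

```python
# Table 7 rows: compound -> (Tb_exp, Tb_2param, Tb_3param, Tb_4param), all in K.
TABLE_7 = {
    "H2O": (373, 371, 400, 372),
    "H2O2": (425, 493, 545, 543),
    "NH3": (240, 267, 267, 270),
    "N2H4": (387, 357, 376, 375),
    "NH2OH": (330, 339, 347, 350),
    "HCN": (299, 310, 338, 333),
    "CH3F": (195, 179, 185, 181),
    "CH3NH2": (267, 250, 241, 241),
    "HF": (293, 296, 338, 324),
}


def mean_abs_deviation(model_index):
    """Mean |Tb(pred) - Tb(exp)| over the nine inorganics for one model column (1, 2, or 3)."""
    deviations = [abs(row[model_index] - row[0]) for row in TABLE_7.values()]
    return sum(deviations) / len(deviations)


mad_two, mad_three, mad_four = (mean_abs_deviation(i) for i in (1, 2, 3))

# Compound with the largest two-parameter deviation.
worst_two = max(TABLE_7, key=lambda name: abs(TABLE_7[name][1] - TABLE_7[name][0]))

print(f"two-parameter:   {mad_two:.1f} K")
print(f"three-parameter: {mad_three:.1f} K")
print(f"four-parameter:  {mad_four:.1f} K")
print(f"largest two-parameter deviation: {worst_two}")
```

On these nine compounds the two-parameter column has the smallest mean absolute deviation of the three, in line with the statement that the three- and four-parameter equations perform less satisfactorily for this set.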

equations was worse, and therefore the AM1 Mulliken charges seem to operate more adequately in the QSPR description of the normal boiling points of compounds.

Conclusions

The two- and four-parameter QSPR models presented here allow the prediction of boiling points of structurally diverse organic compounds with standard errors of 16.2 and 12.4 K, respectively. The models are theoretically justified and provide significant additional insight into the relationship between the structure and the boiling points of the compounds. The cube root of the gravitation index reflects most adequately the molecular size-dependent bulk effects (dispersion and cavity formation), whereas the second most important parameter, the area-weighted surface charge on hydrogen-bonding donor atoms, is directly related to the specific hydrogen-bonding interactions in the molecule. Clearly, the proposed four-parameter equation is capable of improvement; in particular, the number of chlorine atoms is not an optimal descriptor. However, improvement of the present relation will require a much larger data set, as the variance to be described by the fourth parameter is already not much greater than the experimental uncertainty in the present data set. The collection and treatment of very large data sets, of course, demand much larger computational efforts. The successful description of the normal boiling points with a few physically significant molecular descriptors, however, encourages application of the same methodology to other solvation-related phenomena, such as the solubilities of gases in different media, solvation free energies of compounds, and various thermodynamic, kinetic, and spectroscopic solvent effect parameters. Corresponding studies are in progress and will be reported elsewhere.
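For readers who wish to apply the regression equations, the sketch below evaluates the PM3-based two- and four-parameter correlations quoted in the discussion, using the central coefficient values only (uncertainties are ignored). The function names and the descriptor values in the usage lines are hypothetical and serve only to show the arithmetic; real inputs would have to come from a CODESSA-style descriptor calculation. The positive sign on the chlorine-count term is an assumption, read from the printed equation and consistent with the higher four-parameter predictions for chlorinated compounds in Table 6.

```python
def tb_pm3_two_param(gi_cuberoot, hdsa2):
    """Two-parameter PM3 correlation: Tb (K) from GI^(1/3) and HDSA(2)."""
    return -160.1 + 64.96 * gi_cuberoot + 20317.0 * hdsa2


def tb_pm3_four_param(gi_cuberoot, hdsa2, delta_max, n_cl):
    """Four-parameter PM3 correlation.

    delta_max is the most negative atomic partial charge and n_cl the number
    of chlorine atoms. The '+' sign on the n_cl term is an assumption.
    """
    return (-161.7
            + 66.65 * gi_cuberoot
            + 22999.0 * hdsa2
            + 73.68 * delta_max
            + 17.88 * n_cl)


# Hypothetical descriptor values for a monochlorinated compound with no
# hydrogen-bond donor (HDSA(2) = 0); these are NOT taken from the paper.
tb2_example = tb_pm3_two_param(gi_cuberoot=8.0, hdsa2=0.0)
tb4_example = tb_pm3_four_param(gi_cuberoot=8.0, hdsa2=0.0, delta_max=-0.2, n_cl=1)

print(f"two-parameter estimate:  {tb2_example:.2f} K")
print(f"four-parameter estimate: {tb4_example:.2f} K")
```

Keeping the two equations as separate functions mirrors the way they are reported, each with its own intercept and coefficients, rather than treating the two-parameter form as a truncation of the four-parameter one.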
Supporting Information Available: Tables 2 and 3 giving the best two-parameter correlation for the set of structures without H-bonding and the correlation of the normal boiling point with GI1/3 over all bonds in a molecule for the data set of 298 diverse structures (2 pages). Ordering information is given on any current masthead page.

References and Notes

(1) Rechsteiner, C. E. In Handbook of Chemical Property Estimation Methods; Lyman, W. J., Reehl, W. F., Rosenblatt, D. H., Eds.; McGraw-Hill: New York, 1982; Chapter 12.
(2) Fisher, C. H. Chem. Eng. 1989, 96, 157.
(3) Satyanarayana, K.; Kakati, M. C. Fire Mater. 1991, 15, 97.
(4) Walker, J. J. Chem. Soc. 1894, 65, 193.
(5) Meissner, H. P. Chem. Eng. Prog. 1949, 45, 149.
(6) Horvath, A. L. Molecular Design: Chemical Structure Generation from the Properties of Pure Organic Compounds; Elsevier: Amsterdam, 1992; Chapter 2.
(7) Benson, S. W.; Buss, J. H. J. Chem. Phys. 1958, 29, 546.
(8) Copeman, T. W.; Mathias, P. M.; Klotz, H. C. In Physical Property Prediction in Organic Chemistry; Jochum, C., Hicks, M. G., Sunkel, J., Eds.; Springer-Verlag: New York, 1988; p 351.
(9) Joback, K. G.; Reid, R. C. Chem. Eng. Commun. 1987, 57, 233.
(10) Stein, S. E.; Brown, R. L. J. Chem. Inf. Comput. Sci. 1994, 34, 581.
(11) Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
(12) Randic, M. J. Am. Chem. Soc. 1975, 97, 6609.
(13) Kier, L. B.; Hall, L. H. In Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976; pp 27-39, 64.
(14) Needham, D. E.; Wei, I-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.
(15) Balaban, A. T.; Joshi, N.; Kier, L. B.; Hall, L. H. J. Chem. Inf. Comput. Sci. 1992, 32, 233.
(16) Balaban, A. T.; Kier, L. B.; Joshi, N. J. Chem. Inf. Comput. Sci. 1992, 32, 237.
(17) Cramer, R. D., III. J. Am. Chem. Soc. 1980, 102, 1837.
(18) Motoc, I.; Balaban, A. T. Rev. Roum. Chim. 1981, 26, 593.
(19) Labanowski, J. K.; Motoc, I.; Dammkoehler, R. A. Comput. Chem. 1991, 15, 47.
(20) Rouvray, D. H. J. Mol. Struct.: THEOCHEM 1989, 185, 187.
(21) Randic, M.; Seybold, P. G. SAR QSAR Environ. Res. 1993, 1, 77.
(22) Randic, M. J. Chem. Inf. Comput. Sci. 1991, 31, 311.
(23) Grigoras, S. J. Comput. Chem. 1990, 11, 493.
(24) Stanton, D. T.; Jurs, P. C. Anal. Chem. 1990, 62, 2323.
(25) Stanton, D. T.; Jurs, P. C.; Hicks, M. G. J. Chem. Inf. Comput. Sci. 1991, 31, 301.
(26) Stanton, D. T.; Egolf, L. M.; Jurs, P. C.; Hicks, M. G. J. Chem. Inf. Comput. Sci. 1992, 32, 306.
(27) Egolf, L. M.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1993, 33, 616.
(28) Egolf, L. M.; Wessel, M. D.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1994, 34, 947.
(29) Wessel, M. D.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1995, 35, 68.
(30) Kamlet, M. J.; Doherty, R. M.; Taft, R. W.; Abraham, M. H.; Koros, W. J. J. Am. Chem. Soc. 1984, 106, 1205.
(31) Le, T. D.; Weeres, J. G. J. Phys. Chem. 1995, 99, 6739.
(32) Katritzky, A. R.; Lobanov, V. S.; Karelson, M.; Murugan, R.; Grendze, M. P.; Toomey, J. E., Jr. Unpublished work.
(33) PCMODEL, 5th ed.; Serena Software: Bloomington, IN, 1992.
(34) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
(35) Stewart, J. J. P. MOPAC Program Package 6.0. QCPE 455, 1990.
(36) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. CODESSA Training Manual; 1995.
(37) Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 191.
(38) Rohrbaugh, R. H.; Jurs, P. C. Anal. Chim. Acta 1987, 199, 99.
(39) Katritzky, A. R.; Ignatchenko, E. S.; Barcock, R. A.; Lobanov, V. S.; Karelson, M. Anal. Chem. 1994, 66, 1799.
(40) Murugan, R.; Grendze, M. P.; Toomey, J. E., Jr.; Katritzky, A. R.; Karelson, M.; Lobanov, V. S.; Rachwal, P. CHEMTECH 1994, 24, 17.
(41) Karelson, M.; Lobanov, V. S.; Katritzky, A. R. Chem. Rev., submitted for publication.
(42) Handbook of Chemistry and Physics; Weast, R. C., Ed.; CRC Press: Cleveland, OH, 1976.
(43) Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
(44) User's Manual, AMPAC 5.0; Semichem, 7128 Summit, Shawnee, KS, 1994; Chapter 7.
(45) Draper, N. R.; Smith, H. Applied Regression Analysis; Wiley: New York, 1966.
(46) Reading and Understanding Multivariate Statistics; Grimm, L. G., Yarnold, P. R., Eds.; American Psychological Association: Washington, DC, 1994; p 45.

JP953224Q