Ind. Eng. Chem. Res. 2006, 45, 6860-6863
CORRELATIONS

Kinney Revisited: An Improved Group Contribution Method for the Prediction of Boiling Points of Acyclic Alkanes

Joseph A. Palatinus,† Cassandra M. Sams,† Christopher M. Beeston,† Felix A. Carroll,*,† André B. Argenton,‡ and Frank H. Quina‡

Department of Chemistry, Davidson College, Davidson, North Carolina 28035, and Instituto de Química, Universidade de São Paulo, CP 26077, São Paulo 05513-970 and Centro de Capacitação e Pesquisa em Meio Ambiente (CEPEMA-USP), Cubatão, Brazil
The classical Kinney method for predicting the boiling points of acyclic alkanes is taken as the starting point for a much more accurate group contribution method developed using multiparametric linear regression. The procedure involves calculating a revised “boiling point number” ($Y_R$) from a count of structural features, including the length of the longest carbon chain, the nature and location of substituents, and the overall shape of the molecule. For a combined data set of 198 acyclic alkanes having from 6 to 30 carbon atoms, the correlation of predicted and literature boiling points has an $R^2$ of 0.999 and an average absolute deviation of 1.45 K. Thus, the method reported here is comparable in accuracy to, but much easier to apply than, more elaborate molecular connectivity, nonlinear regression, and neural network methods that were developed for narrower ranges of molecular weights.

Introduction

The correlation of physical properties with molecular structure continues to be an active area of investigation. The properties of the acyclic alkanes are of particular interest because their intermolecular interactions are generally considered to be dominated by dispersion forces, without complications from dipole-dipole attractions or hydrogen bonding. Therefore, accurate methods to predict the physical properties of alkanes can provide insight into the subtle differences in dispersion interactions among constitutional isomers. Recently, we reported the development of a new molecular volume parameter, $V_Y$, which can be used to predict a variety of physical properties of acyclic alkanes, including vapor pressures, log $L^{16}$ values, HPLC and GC retention indices, and aqueous solubilities.1 The $V_Y$ values were calculated from experimental “boiling point number” ($Y_{BP}$) values, as shown in eq 1. In turn, the $Y_{BP}$ values were obtained from literature boiling points through the relationship shown in eq 2.
$$V_Y = 0.0498\,Y_{BP} + 0.0015 \qquad (1)$$

$$Y_{BP} = \left(\frac{\mathrm{BP(K)} + 270}{230.14}\right)^{3} \qquad (2)$$
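Eqs 1 and 2 chain together: a literature boiling point gives $Y_{BP}$, which in turn gives $V_Y$. The minimal Python sketch below assumes boiling points in kelvin; the function names are hypothetical, and n-hexane (normal boiling point about 341.9 K) is used only as an illustrative input.

```python
def y_bp_kinney(bp_kelvin: float) -> float:
    """Eq 2: experimental boiling point number from a boiling point in K."""
    return ((bp_kelvin + 270.0) / 230.14) ** 3


def v_y_from_y_bp(y_bp: float) -> float:
    """Eq 1: molecular volume parameter V_Y from Y_BP."""
    return 0.0498 * y_bp + 0.0015


if __name__ == "__main__":
    y = y_bp_kinney(341.88)  # n-hexane, illustrative input
    print(f"Y_BP = {y:.2f}, V_Y = {v_y_from_y_bp(y):.3f}")
```

Eq 2 is simply Kinney's boiling point relationship (eq 3 below) solved for Y, so cubing is the only nonlinear step.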
One constraint on the prediction of physical properties with $V_Y$ values is the limited availability of experimental boiling point data for the alkanes. Although literature boiling points are available for the acyclic alkanes having up to 10 carbon atoms, reliable values are available for only some of the 159 constitutional isomers of undecane and the 355 constitutional isomers of dodecane,2,3 and few experimental values have been reported for higher molecular weight branched alkanes. Therefore, a boiling point prediction method must be used to obtain the $V_Y$ values of many alkanes of interest.

Among the approaches to predicting the boiling points of alkanes that have been reported are (i) group contribution methods based on counts of structural features, such as the number of atoms or groups of a particular type;4,5 (ii) mathematical correlations based on molecular graphs or atom connectivity patterns;6 and (iii) computational methods incorporating nonobservable parameters, such as theoretical functions of a molecular surface.7,8 In principle, each of these approaches may be based either on a linear correlation of boiling point with the chosen parameters or on a nonlinear correlation, often carried out through neural network analysis.9,10 Each approach has its particular merits and limitations. The computational and neural network methods can offer useful accuracy, but to the inexperienced practitioner they may seem to be “black box” procedures that depend on unfamiliar theoretical concepts. Molecular connectivity approaches can also provide accurate estimates of boiling points, but only at the expense of considerable mathematical analysis, especially for larger structures. Moreover, the underlying chemical significance of the various connectivity parameters may be uncertain.11,12 Group contribution methods based on simple structural counts are the most straightforward and, thus, may be more readily applied by chemists or engineers,7 but such methods have had limited accuracy. To extend the utility of our method for predicting alkane physical properties from $Y_{BP}$ values, we sought to develop a group contribution method for predicting boiling points that is comparable in accuracy to molecular connectivity and computational approaches but easier to use.

* To whom correspondence should be addressed. Tel.: (704) 894-2544. Fax: (704) 894-2709. E-mail: [email protected]. † Davidson College. ‡ Instituto de Química-USP and CEPEMA-USP.
10.1021/ie0604425 CCC: $33.50 © 2006 American Chemical Society. Published on Web 08/25/2006.

Figure 1. Correlation of predicted and literature boiling points for 198 acyclic alkanes using the original Kinney method. The diagonal line represents a perfect correlation of literature and predicted values.

One of the earliest empirical approaches for predicting alkane boiling points directly from structure was a group contribution method offered by Kinney.13,14 In this scheme, the normal boiling point of an acyclic alkane was predicted from its calculated boiling point number (Y) with eq 3, where the Y value of the alkane was obtained with eq 4.
$$\mathrm{BP(K)} = 230.14\,Y^{1/3} - 270 \qquad (3)$$
$$Y = 0.8C + H + 3.05M + 5.5E + 7P - 0.4D + 0.5V_{2\,\mathrm{or}\,3} + V_{4+} \qquad (4)$$

Here, C is the number of carbon atoms in the longest carbon chain, H is the number of hydrogen atoms attached to this main chain, M is the number of methyl groups, E is the number of ethyl groups, and P is the number of propyl groups. To account for the effect of specific substitution patterns, D is the number of 2,2-dimethyl groups, $V_{2\,\mathrm{or}\,3} = 1$ for structures having either two or three adjacent substituents on a chain of six carbons or fewer (otherwise it is 0), and $V_{4+} = 1$ for compounds having four or more adjacent substituents on a chain of six carbons or fewer (otherwise it is 0).

We evaluated the original Kinney method with a set of 198 linear and branched acyclic alkanes having from 6 to 30 carbon atoms. The experimental data for the n-alkanes and for branched alkanes having from 6 to 10 carbon atoms were taken from the compilation by Wilhoit and Zwolinski.15 Boiling points for branched alkanes having more than 10 carbon atoms were taken from a listing by Cao et al.16 and from the NIST Chemistry WebBook.17 The Kinney Y values for these 198 compounds were calculated with eq 4 (as described in detail in the Supporting Information), and the boiling points were then predicted with eq 3. The results are shown in Figure 1. The correlation between predicted and reported boiling points has an $R^2$ of 0.996 and a standard error of 5.01, and the average absolute deviation (AAD) between literature and predicted values is 4.46 K. Figure 1 reveals two important limitations of Kinney’s method. First, there is considerable scatter in the regions containing data points for many isomeric compounds. Second, the predicted boiling points deviate markedly from the literature values for the higher molecular weight compounds; for example, the difference between literature and predicted values exceeds 23 K for triacontane.
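Eqs 3 and 4 are easy to apply by hand, but a short Python sketch makes the bookkeeping explicit. It assumes the structural counts have already been assigned; the function and keyword names (including `v23` and `v4plus` for $V_{2\,\mathrm{or}\,3}$ and $V_{4+}$) are hypothetical. n-Hexane (C = 6, H = 14) and 2-methylpentane (C = 5, H = 11 on the pentane main chain, M = 1) serve as worked inputs.

```python
def kinney_y(C: int, H: int, M: int = 0, E: int = 0, P: int = 0,
             D: int = 0, v23: int = 0, v4plus: int = 0) -> float:
    """Eq 4: Kinney boiling point number from hand-assigned structural counts.

    v23 and v4plus are the 0/1 flags V_{2 or 3} and V_{4+}.
    """
    return (0.8 * C + H + 3.05 * M + 5.5 * E + 7.0 * P
            - 0.4 * D + 0.5 * v23 + v4plus)


def kinney_bp(y: float) -> float:
    """Eq 3: normal boiling point in K from a Kinney Y value."""
    return 230.14 * y ** (1.0 / 3.0) - 270.0


if __name__ == "__main__":
    # n-hexane: the main chain is the whole molecule and carries all 14 H.
    print(round(kinney_bp(kinney_y(C=6, H=14)), 1))
    # 2-methylpentane: 5-carbon main chain bearing 11 H, plus one methyl group.
    print(round(kinney_bp(kinney_y(C=5, H=11, M=1)), 1))
```

For any n-alkane with n carbons, eq 4 reduces to Y = 0.8n + (2n + 2) = 2.8n + 2; for n-hexane this gives Y = 18.8 and, from eq 3, a boiling point of about 341.9 K.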
Nevertheless, the conceptual simplicity and easy application of the Kinney approach are highly attractive features in any boiling point prediction method. Therefore, we sought to improve Kinney’s original method by reevaluating the relationship between boiling point and Y values and also by reconsidering the structural parameters used to predict Y. The results presented here considerably improve the accuracy of the Kinney method, particularly for the higher molecular weight alkanes.

Figure 2. Correlation of literature boiling points for 25 linear alkanes with values predicted using eq 5. The diagonal line represents a perfect correlation of literature and predicted values.

Results and Discussion

Figure 1 demonstrates that the dependence of boiling point on $Y^{1/3}$ is more complex than the linear relationship (eq 3) proposed by Kinney. To improve this relationship, we first considered a data set consisting only of the n-alkanes having from 6 to 30 carbon atoms. A cubic equation gave an excellent correlation between boiling points and Kinney’s original $Y^{1/3}$ values for the normal alkanes, but it did not afford a convenient analytical inverse that would enable easy calculation of experimental $Y_{BP}$ values of the branched alkanes from their boiling points. As we have noted, there is an inherent tension between mathematical complexity and conceptual simplicity in chemical models,18 and the most accurate model is not always the most useful one. Therefore, we used a simpler quadratic relationship in $Y^{1/3}$ (eq 5), which is nearly as accurate as the cubic equation over the range of molecular weights considered; the best fit for the boiling points of the linear alkanes was obtained with a = -16.802, b = 337.377, and c = -437.8835.
$$\mathrm{BP(K)} = a\,Y_{BP}^{2/3} + b\,Y_{BP}^{1/3} + c \qquad (5)$$
As shown in Figure 2, this approach corrects the curvature in Figure 1 quite well. Solving eq 5 for $Y_{BP}^{1/3}$ produced eq 6, which allows the calculation of more accurate $Y_{BP}$ values of higher molecular weight branched alkanes from their experimental boiling points than is possible with eq 2.
$$Y_{BP}^{1/3} = \frac{-b + \sqrt{b^{2} - 4a(c - \mathrm{BP})}}{2a} \qquad (a = -16.802,\ b = 337.377,\ c = -437.883) \qquad (6)$$
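Eqs 5 and 6 form a forward/inverse pair, so a round trip is a convenient consistency check. The minimal Python sketch below assumes BP in kelvin; the function names are hypothetical. The docstring explains why the “+” root in eq 6 is the relevant one when a is negative.

```python
import math

# Eq 5 coefficients. The text quotes c as -437.8835 with eq 5 and as -437.883
# with eq 6; the more precise value is used for both here.
A, B, C = -16.802, 337.377, -437.8835


def bp_from_y(y: float) -> float:
    """Eq 5: BP(K) = a*Y^(2/3) + b*Y^(1/3) + c."""
    u = y ** (1.0 / 3.0)
    return A * u * u + B * u + C


def y_bp_quadratic(bp_kelvin: float) -> float:
    """Eq 6: solve eq 5 for Y^(1/3) and cube the result.

    With a < 0, the '+' root lies below the parabola's vertex at
    Y^(1/3) = -b/(2a) (about 10), i.e. on the rising branch that covers
    the C6-C30 alkanes (Y^(1/3) roughly 2.7 to 4.4).
    """
    disc = B * B - 4.0 * A * (C - bp_kelvin)
    u = (-B + math.sqrt(disc)) / (2.0 * A)
    return u ** 3


if __name__ == "__main__":
    for bp in (341.88, 450.0, 600.0):
        y = y_bp_quadratic(bp)
        print(f"BP = {bp:.2f} K -> Y_BP = {y:.3f} -> BP = {bp_from_y(y):.2f} K")
```

Taking the other root would give Y^(1/3) values above 10, far outside the range spanned by the alkanes in the data set.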
Next, we randomly divided the total data set of 198 linear and branched alkanes into a training set of 126 compounds and a test set of 72 compounds. Using eq 6, we calculated experimental $Y_{BP}$ values directly from the literature boiling points of the 126 compounds in the training set. We then used multiparametric linear regression to explore a series of correlations of these $Y_{BP}$ values with different sets of structural parameters. The Y values predicted by these correlations were designated $Y_R$ (for revised Kinney) values. After each correlation, we looked for common features among compounds whose predicted $Y_R$ values did not closely match the experimental $Y_{BP}$ values, and we modified the set of structural parameters used in the next correlation accordingly. We retained Kinney’s C, the number of carbon atoms in the longest carbon chain, and also included counts of the methyl, ethyl, and propyl groups attached to this chain. We did not include Kinney’s parameter H, however, because the number of hydrogen atoms on the main chain is determined by the number of carbon atoms in the longest chain and the number of substituents attached to it. Another change was in the counting of methyl groups. Kinney assumed the same contribution to Y for all methyl groups, although 3-methylalkanes are known to have higher boiling points than isomeric 2-methylalkanes or 4-methylalkanes.19,20 Therefore, we included $M_3$ (for 3-methyl) and M (for methyl substitution on any other carbon) as separate parameters. Similarly, better correlations resulted from using $E_3$ as the count of 3-ethyl groups and E as the count of other ethyl groups. Kinney had observed that some patterns of dimethyl substitution produced boiling points different from those of others, and he used the D parameter to account for the substantially lower boiling points of 2,2-dimethylalkanes in comparison with isomers having the methyl groups at other positions.
We retained D as a specific parameter for 2,2-dimethyl substitution, but we added the parameter G for all other geminal disubstitutions. Kinney used two parameters (denoted above as $V_{2\,\mathrm{or}\,3}$ and $V_{4+}$) to account for the effects of vicinal substitution. Those parameters are not very sensitive to structure, however: each takes the same value for more than one substitution pattern, and both are counted only for structures with six or fewer carbon atoms in the main chain. We decided to use just one such parameter, V, which is the total count of vicinal alkyl relationships, and we included this parameter for all of the compounds in the data set. Our preliminary correlations suggested that substitution patterns on nonadjacent carbons can also influence boiling points. Therefore, we included the parameter T, which is the number of groupings with two methyl substituents on both carbons 1 and 3 of a three-carbon segment of the main chain. (Examples of the use of these parameters are provided in the Supporting Information.) We also incorporated a simple shape parameter S, defined as the square of the ratio of the total number of carbons to the number of carbons in the longest chain. This approach resulted in the correlation shown in eq 7. Statistical details of the correlation are provided in the Supporting Information.
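The paper does not say which regression software was used. As a hedged illustration of the fitting step described above (a random 126/72 split followed by an ordinary least-squares fit of $Y_{BP}$ on structural counts), here is a stdlib-only Python sketch. The helper names `split_indices` and `fit_linear` are hypothetical, and the demonstration data are synthetic, not the paper's alkanes.

```python
import random


def split_indices(n: int, n_train: int, seed: int = 0):
    """Randomly partition range(n) into training and test index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def fit_linear(X, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting; adequate for a dozen or so structural counts.
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, p):
            f = m[k][col] / m[col][col]
            for j in range(col, p + 1):
                m[k][j] -= f * m[col][j]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        b[i] = (m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))) / m[i][i]
    return b


if __name__ == "__main__":
    # Synthetic demonstration only: made-up coefficients applied to hypothetical
    # (chain length, methyl count, ethyl count) rows, then recovered by the fit.
    rng = random.Random(42)
    true_b = [2.0, 2.8, 1.5, 4.0]
    X = [[rng.randint(6, 30), rng.randint(0, 4), rng.randint(0, 2)] for _ in range(198)]
    y = [true_b[0] + sum(c * v for c, v in zip(true_b[1:], row)) for row in X]
    train, test = split_indices(len(X), 126)
    b = fit_linear([X[i] for i in train], [y[i] for i in train])
    print("recovered coefficients:", [round(v, 3) for v in b])
    print("training/test sizes:", len(train), len(test))
```

In practice a numerical library's least-squares routine would be preferable to the normal equations, which can lose precision when predictors are nearly collinear.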
$$Y_R = 1.726 + 2.779C + 1.716M_3 + 1.564M + 4.204E_3 + 3.905E + 5.007P - 0.329D + 0.241G + 0.479V + 0.967T + 0.574S \qquad (7)$$

(n = 126, $R^2$ = 0.999, standard error = 0.289, F = 24 575)

The $Y_R$ values calculated with eq 7 were then used in eq 5 to predict the normal boiling point of each of the 126 compounds in the training set, and the results were compared with the literature values for these compounds (Figure 3). The correlation between literature and predicted boiling points has $R^2$ = 0.999 and a standard error of 1.87, and the average absolute deviation (AAD) between predicted and reported boiling points is 1.45 K. Equation 7 was also used to predict the $Y_R$ values of the 72 compounds in the test set, and their boiling points were then predicted with eq 5. The resulting correlation of literature and predicted boiling points had an $R^2$ of 0.999 and a standard error of 1.81, and the AAD between literature and predicted boiling points for the test set was 1.45 K. Figure 4 shows the excellent correlation of predicted and literature boiling points for the test set. For the combined data set of 198 compounds, the AAD between literature and predicted boiling points was also 1.45 K.

Figure 3. Correlation of literature values for alkane boiling points with those predicted using eqs 5 and 7 for 126 linear and branched alkanes in the training set. The solid line represents a perfect fit of literature and predicted values.

Figure 4. Correlation of literature values for alkane boiling points with those predicted using eqs 5 and 7 for 72 linear and branched alkanes in the test set. The solid line represents a perfect fit of literature and predicted values.

In principle, many more parameters could be included in a correlation such as eq 7. For example, 2-methyl, 3-methyl, 4-methyl, etc. substitutions might each be counted with a different parameter. However, the average percent absolute error between literature and predicted boiling points for the combined data set in this study (0.32%) is much less than the average error of about 1% estimated for literature boiling point values.23 Thus, we believe that the accuracy of the method reported here is as good as can be expected given the inherent uncertainty of the experimental data.

It is difficult to compare the results of this study with those of other methods because the various studies used different data sets. In particular, many of the other studies do not include undecanes and dodecanes, and very few include higher molecular weight compounds. Nevertheless, it is instructive that Cordes and Rarey compared five group contribution methods for predicting the boiling points of a set of 166 acyclic alkanes and found AADs ranging from 6.5 to 26.7 K.21 Iwai et al. reported a second-order group contribution method, requiring the counting of 17 different substitution patterns, that predicted the boiling points of the C6 to C10 acyclic alkanes with an AAD of 1.26 K.22 A neural network based method produced a 1.3 K AAD for a set of 140 alkanes having from 1 to 10 carbon atoms,23 a connectivity method yielded a 1.47 K AAD for a set of 143 C6 to C10 acyclic alkanes,24 and a model based on molecular polarizability gave a 5 K AAD for a set of 152 C6 to C12 acyclic alkanes.16 Thus, the method presented here covers a much greater range of molecular sizes than most other methods, yet its accuracy compares quite favorably with theirs.
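The full prediction path (hand-assigned counts, then eq 7, then eq 5) and the summary statistics quoted above can be sketched in Python as follows. All class and function names are hypothetical, the structural counts must be assigned by hand following the definitions in the text, and the metric definitions are assumptions because the paper does not spell them out: AAD is the mean absolute deviation in K, the percent error is taken relative to the literature value, and $R^2$ is the squared Pearson correlation between literature and predicted values.

```python
from dataclasses import dataclass
from math import fsum

A, B, C0 = -16.802, 337.377, -437.8835  # eq 5 coefficients


@dataclass
class Counts:
    total_carbons: int
    C: int       # carbons in the longest chain
    M3: int = 0  # 3-methyl groups
    M: int = 0   # methyl groups at other positions
    E3: int = 0  # 3-ethyl groups
    E: int = 0   # other ethyl groups
    P: int = 0   # propyl groups
    D: int = 0   # 2,2-dimethyl substitutions
    G: int = 0   # other geminal disubstitutions
    V: int = 0   # vicinal alkyl relationships
    T: int = 0   # 1,3-groupings as defined for T in the text

    @property
    def S(self) -> float:
        """Shape parameter: (total carbons / longest-chain carbons) ** 2."""
        return (self.total_carbons / self.C) ** 2


def y_r(k: Counts) -> float:
    """Eq 7: revised boiling point number."""
    return (1.726 + 2.779 * k.C + 1.716 * k.M3 + 1.564 * k.M
            + 4.204 * k.E3 + 3.905 * k.E + 5.007 * k.P - 0.329 * k.D
            + 0.241 * k.G + 0.479 * k.V + 0.967 * k.T + 0.574 * k.S)


def predict_bp(k: Counts) -> float:
    """Eq 5 evaluated at Y = Y_R; returns the normal boiling point in K."""
    u = y_r(k) ** (1.0 / 3.0)
    return A * u * u + B * u + C0


def aad(lit, pred):
    """Average absolute deviation (K)."""
    return fsum(abs(est - ref) for ref, est in zip(lit, pred)) / len(lit)


def avg_percent_abs_error(lit, pred):
    """Average percent absolute error relative to the literature values."""
    return 100.0 * fsum(abs(est - ref) / ref for ref, est in zip(lit, pred)) / len(lit)


def r_squared(lit, pred):
    """Squared Pearson correlation between literature and predicted values."""
    n = len(lit)
    m_ref, m_est = fsum(lit) / n, fsum(pred) / n
    sxy = fsum((ref - m_ref) * (est - m_est) for ref, est in zip(lit, pred))
    sxx = fsum((ref - m_ref) ** 2 for ref in lit)
    syy = fsum((est - m_est) ** 2 for est in pred)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    hexane = Counts(total_carbons=6, C=6)
    methylpentane = Counts(total_carbons=6, C=5, M=1)  # 2-methylpentane
    print(f"n-hexane: Y_R = {y_r(hexane):.3f}, BP = {predict_bp(hexane):.1f} K")
    print(f"2-methylpentane: BP = {predict_bp(methylpentane):.1f} K")
```

For n-hexane (S = 1) the sketch gives $Y_R$ = 18.974 and a boiling point of about 342.4 K, close to the literature value of 341.88 K. Applying `aad` and `avg_percent_abs_error` to a full set of literature and predicted values is how figures such as the 1.45 K AAD and 0.32% average error would be reproduced, given the Supporting Information data.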
Conclusions

The group contribution method reported here allows highly accurate prediction of the boiling points of acyclic alkanes directly from structure, yet it requires nothing more than pencil, paper, and a hand-held calculator. Therefore, this approach is a useful alternative to more elaborate molecular connectivity, multiple nonlinear regression, or neural network methods for predicting the boiling points of acyclic alkanes. Combined with our previous work,1 the present results allow the prediction of a wide range of physical properties of acyclic alkanes from molecular structure.

Acknowledgment

Financial and fellowship support from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and Davidson College is gratefully acknowledged. This work was performed in part during a visiting professorship (F.H.Q.) at Wake Forest University, Winston-Salem, NC.

Supporting Information Available: Included are data sets of acyclic alkanes along with their literature boiling points; counts of the structural parameters used in eqs 4 and 7; examples of Y and $Y_R$ calculations; tables of Y, $Y_{BP}$, and $Y_R$ values; boiling points predicted using the Y and $Y_R$ values; and statistical data for eq 7. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Palatinus, J. A.; Carroll, F. A.; Argenton, A. A.; Quina, F. H. An Improved Characteristic Molecular Volume Parameter for Linear Solvation Energy Relationships of Acyclic Alkanes. J. Phys. Org. Chem., accepted for publication.
(2) Henze, H. R.; Blair, C. M. The Number of Isomeric Hydrocarbons of the Methane Series. J. Am. Chem. Soc. 1931, 53, 3077.
(3) Trinajstić, N.; Nikolić, S.; Knop, J. V.; Müller, W. R.; Szymanski, K. Computational Chemical Graph Theory: Characterization, Enumeration and Generation of Chemical Structures by Computer Methods; Ellis Horwood: New York, 1991.
(4) Boethling, R. S.; Mackay, D. Handbook of Property Estimation Methods for Chemicals: Environmental and Health Sciences; Lewis Publishers: Boca Raton, FL, 2000; Chapter 2.
(5) Lyman, W. J.; Reehl, W. F.; Rosenblatt, D. H. Handbook of Chemical Property Estimation Methods: Environmental Behavior of Organic Compounds; McGraw-Hill: New York, 1982; Chapter 12.
(6) Randić, M. Wiener-Hosoya Index: A Novel Graph Theoretical Molecular Descriptor. J. Chem. Inf. Comput. Sci. 2004, 44, 373.
(7) Dyekjær, J. D.; Jónsdóttir, S. Ó. QSPR Models Based on Molecular Mechanics and Quantum Chemical Calculations. 2. Thermodynamic Properties of Alkanes, Alcohols, Polyols, and Ethers. Ind. Eng. Chem. Res. 2003, 42, 4241.
(8) Ehresmann, B.; de Groot, M. J.; Alex, A.; Clark, T. New Molecular Descriptors Based on Local Properties at the Molecular Surface and a Boiling-Point Model Derived from Them. J. Chem. Inf. Comput. Sci. 2004, 44, 658.
(9) Lai, W. Y.; Chen, D. H.; Maddox, R. N. Application of a Nonlinear Group-contribution Model to the Prediction of Physical Constants. 1. Predicting Normal Boiling Points with Molecular Structure. Ind. Eng. Chem. Res. 1987, 26, 1072.
(10) Lučić, B.; Amić, D.; Trinajstić, N. Nonlinear Multivariate Regression Outperforms Several Concisely Designed Neural Networks on Three QSPR Data Sets. J. Chem. Inf. Comput. Sci. 2000, 40, 403.
(11) Randić, M.; Zupan, J. On Interpretation of Well-Known Topological Indices. J. Chem. Inf. Comput. Sci. 2001, 41, 550.
(12) Charton, M. The Nature of Topological Parameters. I. Are Topological Parameters Fundamental Properties? J. Comput.-Aided Mol. Des. 2003, 17, 197.
(13) Kinney, C. R. A System Correlating Molecular Structure of Organic Compounds with their Boiling Points. I. Aliphatic Boiling Point Numbers. J. Am. Chem. Soc. 1938, 60, 3032.
(14) Kinney, C. R. Calculation of Boiling Points of Aliphatic Hydrocarbons. Ind. Eng. Chem. 1940, 32, 559.
(15) Wilhoit, R. C.; Zwolinski, B. J. Handbook of Vapor Pressures and Heats of Vaporization of Hydrocarbons and Related Compounds; Thermodynamics Research Center: College Station, TX, 1971; pp 21-35.
(16) Cao, C.; Liu, S.; Li, Z. On Molecular Polarizability: 2. Relationship to the Boiling Point of Alkanes and Alcohols. J. Chem. Inf. Comput. Sci. 1999, 39, 1105.
(17) Brown, R. L.; Stein, S. E. Boiling Point Data. In NIST Chemistry WebBook, NIST Standard Reference Database Number 69; Linstrom, P. J., Mallard, W. G., Eds.; National Institute of Standards and Technology: Gaithersburg, MD, June 2005 (http://webbook.nist.gov).
(18) Carroll, F. A. Perspectives on Structure and Mechanism in Organic Chemistry; Brooks/Cole: Pacific Grove, CA, 1997.
(19) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17.
(20) Wiener, H. Correlation of Heats of Isomerization, and Differences in Heats of Vaporization of Isomers, Among the Paraffin Hydrocarbons. J. Am. Chem. Soc. 1947, 69, 2636.
(21) Cordes, W.; Rarey, J. A new method for the estimation of the normal boiling point of nonelectrolyte organic compounds. Fluid Phase Equilib. 2002, 201, 409.
(22) Iwai, Y.; Yamanaga, S.; Arai, Y. Calculation of normal boiling points for alkane isomers by a second-order group contribution method. Fluid Phase Equilib. 1999, 163, 1.
(23) Espinosa, G.; Yaffe, D.; Cohen, Y.; Arenas, A.; Giralt, F. Neural Network Based Quantitative Structural Property Relations (QSPRs) for Predicting Boiling Points of Aliphatic Hydrocarbons. J. Chem. Inf. Comput. Sci. 2000, 40, 859.
(24) Burch, K. J.; Wakefield, D. K.; Whitehead, E. G., Jr. Boiling Point Models of Alkanes. MATCH 2003, 47, 25.
Received for review April 8, 2006. Revised manuscript received July 5, 2006. Accepted July 13, 2006.

IE0604425