A Group-Contribution Method for Predicting Pure Component

Ind. Eng. Chem. Res. 2004, 43, 6253-6261


A Group-Contribution Method for Predicting Pure Component Properties of Biochemical and Safety Interest

Emmanuel Stefanis,† Leonidas Constantinou,‡ and Costas Panayiotou*,†

Department of Chemical Engineering, Aristotle University of Thessaloniki, GR 54124, Thessaloniki, Greece, and Frederick Research Center, P.O. Box 24729, Nicosia, Cyprus

A simple, yet quite accurate method for predicting properties of organic compounds of environmental and nutraceutical interest is presented. It is an extension of a previous successful group-contribution method (Constantinou, L.; Gani, R. AIChE J. 1994, 40, 1697) and uses two kinds of groups: first-order groups that describe the basic molecular structure of the compounds and second-order groups, which are based on conjugation theory and improve the accuracy of the predictions. Twenty-six new first-order groups have been defined to ensure that the molecular structures of any compound of biochemical interest, including complex aromatic, multiring, and heterocyclic compounds, can be easily described. Furthermore, 12 new second-order groups have been defined to enhance the reliability of the predictions and the applicability of the method. The three properties that have been estimated by the new method are the octanol-water partition coefficient (logKow), the total (Hildebrand) solubility parameters at 25 °C, and the flash point. These properties have many applications in the chemical, pharmaceutical, and food industries, as well as in the protection of the environment.

Introduction

Computer-aided molecular design is a very important process and tool that is used for the prediction of properties of organic compounds when reliable experimental data are not available, for checking questionable values of already measured properties, and especially for the selection of compounds with desired properties. In the past several decades, many group-contribution methods have been widely used for the prediction of physicochemical properties of pure organic compounds. One of the first widely used group-contribution methods was the UNIFAC method,1 where the value of each property was obtained as the sum of contributions of simple first-order groups. The methods of Joback and Reid2 and of Horvath3 are also methods of this kind.
More recently, a new class of group-contribution methods has been proposed. In this kind of method,4,5 second-order groups are defined to provide more structural information, to distinguish isomers, and to afford more accurate predictions. Second-order groups have a strong physicochemical meaning and can significantly improve the accuracy of property predictions. The definition of second-order groups is based on the theory of conjugation operators.5,6 Marrero and Gani7 introduced a higher level of approximation by defining third-order groups to provide more structural information about systems of fused aromatic and nonaromatic rings.

The Existing Constantinou-Gani Model

One of the most accurate group-contribution methods is the model proposed by Constantinou and Gani.4 According to this model, the molecular structure of each organic compound can be described by using two kinds of functional groups: first-order groups (UNIFAC groups) and second-order groups that are based on conjugation theory. The second-order groups give a physical meaning to the method, and this is an advantage compared to the other group-contribution methods. These groups improve the accuracy of the predictions significantly.

The definition of second-order groups is based on the ABC framework.5 According to this framework, each compound is represented as a hybrid of many conjugate forms. Every conjugate form is considered as a structure with integer-order localized bonds and integer charges on atoms. The purely covalent conjugate form is the dominant conjugate, and the ionic forms are the recessive conjugates, which are generated by using a conjugate operator. When a conjugate operator is applied to a dominant conjugate, it can generate a series of recessive conjugates. Conjugate operators consist of subchains with two or three bonds, such as O=C-C or C-C-C-H. In this theory, the properties of each compound are estimated by combining the corresponding properties of its conjugate forms. The properties of the conjugate forms are estimated through conjugation operators. Each operator has a fixed contribution, which is determined through regression and reflects the contribution of a whole series of conjugate forms.

The basic property in the ABC framework is the standard enthalpy of formation at 298 K. For the estimation of this property, the contributions of the conjugate forms can be expressed in terms of their physical significance rather than adjustable parameters. Therefore, the most important conjugate forms, i.e., the forms that exert the strongest influence on the standard enthalpy of formation (and also on the other properties to be estimated), can be distinguished.

* To whom correspondence should be addressed. Fax: +302310996222. E-mail: [email protected].
† Aristotle University of Thessaloniki.
‡ Frederick Research Center.
The conjugation operators that are related to the most important conjugate forms make the largest contributions.6 It is possible to identify the classes of conjugate forms with the highest conjugation activity by examining the contributions of their operators. The identification of

10.1021/ie0497184 CCC: $27.50 © 2004 American Chemical Society Published on Web 07/29/2004

Table 1. First-Order Group Contributions for logKow, Total Solubility Parameter (25 °C), and Flash Point

Columns, in order: group; logKow contribution; total solubility parameter contribution; flash point contribution; sample group assignment (occurrences)

-CH3; -CH2; -CH<; >C<; CH2=CH-; -CH=CH-; CH2=C<; -CH=C<; >C=C<; CH2=C=CH-; CH≡C-; C≡C; ACH; AC; ACCH3; ACCH2-; ACCH<; CH3CO; CH2CO; CHO; COOH; CH3COO; CH2COO; HCOO; COO; OH; ACOH; CH3O; CH2O; CHO; C2H5O2; CH2O (cyclic); CH2NH2; CHNH2; CH3NH; CH2NH; CHNH; CH3N; CH2N; ACNH2; CONH2; CONHCH3; CONHCH2; CON(CH3)2; CON(CH2)2; C5H4N; C5H3N; CH2SH; CH3S; CH2S; CHS; I; BR; CH2Cl; CHCl; CCl; CHCl2; CCl3; ACCl; ACF; Cl-(C=C); CF3; CH2NO2; CHNO2; ACNO2; CH2CN; CF2; C4H3S; F (except as above); CH2=C=C<; CH=C=CH-; CHCO; O (except as above); Cl (except as above); NH2 (except as above); >C=N-

0.6998 0.4707 0.0405 -0.4723 0.9737 0.6749 0.8361 0.1234 2.6256 0.2159 0.1597 0.3633 0.2497 0.7748 0.4036 0.1910 -0.5433 -0.9379 -0.3524 -0.5994 -0.3164 -0.7454 -0.9078 -0.8772 -1.0577 -0.2851 -0.3088 -0.8032 -0.5994 -1.5767 -0.4673 -0.9178 -2.0541 -0.7114 -1.1628 -1.5361 -0.7715 -1.1007 -0.7834 -1.8463 -1.5663 -1.8559 0.3803 0.5426 0.2730 -0.0200 -0.0961 1.0874 0.7195 0.7601 0.4039 -0.3759 0.9738 0.4075 -0.0890 0.8115 -0.3970 -0.6961 0.0451 -0.6326 1.5387 -1.0761 -0.0431 0.2919 0.0088 -0.5341

-2308.6 -277.1 -355.5 -176.2 -2766.2 -381.9 -980.2 1887.1 1601.8 -3745.0 -975.5 2169.3 -6.4 684.3 -221.8 1023.4 605.5 3269.1 7274.2 5398.2 9477.8 1865.1 5194.2 1716.0 3671.8 12228.9 8456.1 -480.8 -206.7 1229.1 3733.9 3650.7 560.4 8616.2 4183.8 3381.8 2166.5 -2662.6 9228.4 14930.1 27386.9 12770.8 4686.3 6574.7 2191.2 1271.1 3585.2 3183.8 2163.8 1923.3 426.3 -1415.6 1164.0 -1208.7 1332.2 -701.5 -473.5 -5199.5 10030.7 12706.7 6303.5 9359.8 -3464.4 4722.7 -2965.3 -2326.1 -795.6 7805.8 2467.6 636.3 -841.5 3380.7

0.6 10.8 12.2 12.4 10.3 19.1 19.1 37.9 21.8 24.5 11.5 20.5 23.8 31.5 36.4 47.4 69.7 35.5 97.3 53.0 61.9 38.4 54.1 60.3 68.5 22.4 29.8 28.8 91.9 28.1 39.4 33.6 44.4 41.3 51.4 34.1 87.4 150.1 126.3 104.3 46.1 63.8 44.2 37.6 37.8 105.8 40.7 -15.7 7.6 -6.2 81.5 96.5 69.7 6.7 47.6 -

propane (2) butane (2) isobutane (1) neopentane (1) propylene (1) cis-2-butene (1) isobutene (1) 2-methyl-2-butene (1) 2,3-dimethyl-2-butene (1) 1,2-butadiene (1) propyne (1) 2-butyne (1) benzene (6) naphthalene (2) toluene (1) m-ethyltoluene (1) sec-butylbenzene (1) methyl ethyl ketone (1) cyclopentanone (1) 1-butanal (1) vinyl acid (1) ethyl acetate (1) methyl propionate (1) n-propyl formate (1) ethyl acrylate (1) 2-propanol (1) phenol (1) methyl ethyl ether (1) ethyl vinyl ether (1) diisopropyl ether (1) 2-methoxyethanol (1) 1,4-dioxane (2) 1-amino-2-propanol (1) isopropylamine (1) n-methylaniline (1) di-n-propylamine (1) diisopropylamine (1) trimethylamine (1) triethylamine (1) aniline (1) 2-methacrylamide (1) n-methylacetamide (1) n-butylacetamide (1) n,n-dimethylacetamide (1) n,n-diethylacetamide (1) 2-methylpyridine (1) 2,6-dimethylpyridine (1) n-butyl mercaptan (1) methyl ethyl sulfide (1) diethyl sulfide (1) diisopropyl sulfide (1) isopropyl iodide (1) 2-bromopropane (1) n-butyl chloride (1) isopropyl chloride (1) tert-butyl chloride (1) 1,1-dichloropropane (1) benzotrichloride (1) m-dichlorobenzene (2) fluorobenzene (1) 2,3-dichloropropene (1) perfluorohexane (2) 1-nitropropane (1) 2-nitropropane (1) nitrobenzene (1) n-butyronitrile (1) perfluoromethylcyclohexane (5) 2-methylthiophene (1) 2-fluoropropane (1) 3-methyl-1,2-butadiene (1) 2,3-pentadiene (1) diisopropyl ketone (1) divinyl ether (1) hexachlorocyclopentadiene (2) melamine (3) 2,4,6-trimethylpyridine (1)

Table 1 (Continued)

Columns, in order: group; logKow contribution; total solubility parameter contribution; flash point contribution; sample group assignment (occurrences)

-CH=N-; NH (except as above); N=N-; CN (except as above); NO2 (except as above); O=C=N-; CHSH; CSH; SH (except as above); S (except as above); SO2; >C=S; O=P<; >P-; >C=O (except as above); NHCO; -N=O; N (except as above)

-0.5597 -0.5432 -0.0403 -0.8772 0.5157 0.3793 -2.7494 -3.4646 -0.6046 -0.5028 -1.0134 -0.1526 -0.5546

5026.4 3459.4 -7339.6 10253.0 1655.1 2694.6 1234.8 2230.2 4770.2 14215.0 26271.8 -1643.4 -

38.8 60.1 42.3 45.9 17.8 190.7 2.7 -

isoquinoline (1) dibenzopyrrole (1) p-aminoazobenzene (1) cis-crotonitrile (1) nitroglycerine (3) n-butyl isocyanate (1) cyclohexyl mercaptan (1) tert-butyl mercaptan (1) 2-mercaptobenzothiazole (1) thiophene (1) sulfolene (1) n-methylthiopyrrolidone (1) bis-2-chloroethyl-2-chloroethyl phosphonate (1) triphenylphosphine (1) anthraquinone (2) phenylurea (1) nitrosobenzene (1) triphenylamine (1)

second-order groups is based on the operators with much higher contributions than the others. The structure of a second-order group should incorporate a subchain with at least one important conjugate operator. Because of a possible structural similarity of the operators, a second-order group can contain more than one conjugate operator. For example, the second-order group CH3COCH2 contains the following operators: O=C-C, O=C-C-H, and C-C-C-H. The structure of a second-order group should be built with first-order groups and should be as small as possible.

The methodology that is followed for the identification of second-order groups is as follows:8
(a) identification of all first-order groups present in the structural formula of a given compound;
(b) definition of all possible substructures of two or three adjacent first-order groups;
(c) identification of all two-bond and three-bond conjugation operators in the substructures;
(d) estimation of the conjugation operator energy of all substructures by addition of the energies of all of the conjugation operators; and
(e) identification of substructures with much higher conjugation energies than the others. These substructures are the second-order groups.

The basic equation that gives the value of each property according to the molecular structure is

$$f(p) = \sum_i n_i F_i + \sum_j m_j S_j \qquad (1)$$

where Fi is the contribution of the first-order group of type i that appears ni times in the compound and Sj is the contribution of the second-order group of type j that appears mj times in the compound. f(p) is a single equation for the property under consideration, p, and is selected after a thorough study of the physicochemical and thermodynamic behavior of the property.

The determination of the contributions is made by a two-step regression analysis. In the first step of the regression, the aim is to determine the contributions of the first-order groups only (that is, the Fi). In the second step, using the Fi contributions, the second-order groups are activated, and the second-order group contributions (Sj) are calculated through regression. These contributions act as a correction to the first-order approximation.

Ten properties have been estimated so far by the Constantinou-Gani method. These are the critical temperature, Tc; critical pressure, Pc; critical volume, Vc; melting point, Tm; normal boiling point, Tb; standard Gibbs energy at 298 K, ∆Gf; standard enthalpy of vaporization at 298 K, ∆Hvap; standard enthalpy of formation at 298 K, ∆Hf;4 acentric factor, ω; and liquid molar volume at 298 K, Vl.9 Recently,10 the method has been extended to the prediction of polymer properties, such as the glass transition temperature, Tg, through the estimation of three scaling constants, namely, T*, P*, and ρ*, of the lattice fluid (LF) model of Sanchez and Lacombe.11 The first- and second-order group contributions of each scaling constant were estimated through regression.

Proposed Model

The model of Constantinou-Gani, even though it is one of the most accurate group-contribution methods,12 cannot be applied to all homologous series of organic compounds because the 78 first-order groups (UNIFAC groups) that it uses cannot describe the molecular structure of complex compounds. Compounds of complex multiring, heterocyclic, and aromatic structures are of significant importance in the chemical, biochemical, pharmaceutical, and food industries, as well as for environmental protection.

The first target of the present work was to introduce a new, accurate model that can predict the properties of both simple- and complex-structured compounds. This was achieved by adding 26 new first-order groups to the 78 already existing UNIFAC groups. The new, simple groups ensure that any molecular structure can be described at an initial, basic level. Thus, the model is able to provide a basic description of each of the compounds that occur in the DIPPR database of thermophysical properties. Twelve new second-order groups were also added to the existing second-order groups of the Constantinou-Gani method to provide more details about the molecular structure of the recently introduced complex compounds, such as fused aromatic and multiring structures. These groups lead to more accurate results, and in some cases, they allow isomers to be distinguished.

The second target was to apply the new model to the prediction of important properties of chemical, biochemical, and safety interest. These are (a) octanol-water partition coefficients (logKow), (b) total (Hildebrand) solubility parameters at 25 °C, and (c) flash points.
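The two-step regression used to determine the Fi and Sj contributions can be illustrated with a small sketch. It is only an illustration under stated assumptions: the occurrence counts and property values are made up, the property function f(p) is taken to be linear with no additive constant, and plain linear least squares (normal equations) stands in for the modified Levenberg approach named in the paper. The helper names `least_squares`, `predict`, and `sse` are hypothetical.

```python
def least_squares(rows, targets):
    """Solve min ||A x - y||^2 through the normal equations (A^T A) x = A^T y."""
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(rows, coeffs):
    """Return sum_k coeffs[k] * row[k] for every row of occurrence counts."""
    return [sum(c * x for c, x in zip(coeffs, r)) for r in rows]


def sse(predicted, observed):
    """Sum of squared errors between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical data: occurrences of two first-order groups (n_i) and one
# second-order group (m_j) in six made-up compounds, with made-up property values.
first_counts = [[2, 1], [2, 2], [2, 3], [3, 1], [4, 0], [2, 4]]
second_counts = [[0], [0], [0], [1], [1], [0]]
observed = [1.9, 2.4, 2.9, 2.8, 3.0, 3.4]

# Step 1: fit the first-order contributions F_i alone.
F = least_squares(first_counts, observed)
first_order = predict(first_counts, F)

# Step 2: keep F_i fixed and fit S_j to the first-order residuals.
residuals = [o - p for o, p in zip(observed, first_order)]
S = least_squares(second_counts, residuals)
second_order = [p + q for p, q in zip(first_order, predict(second_counts, S))]

sse_first = sse(first_order, observed)
sse_second = sse(second_order, observed)
print("F =", F, "S =", S)
print("SSE first order:", sse_first, "SSE second order:", sse_second)
```

Because the second step only adds a correction fitted to the residuals of the first, the second-order sum of squared errors in this sketch can never exceed the first-order one, which mirrors the role of the Sj as a correction to the first-order approximation.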


The octanol-water partition coefficient (logKow) is defined as the common logarithm (base 10) of the ratio of a compound’s concentration in n-octanol to its concentration in water in a two-phase system in equilibrium. The logKow of n-octanol is equal to 3.00. This means that the concentration of n-octanol in the n-octanol phase of the two-phase system is considered to be 1000 times greater than that in the water phase (log 1000 = 3).

The total (Hildebrand) solubility parameter is defined as the square root of the cohesive energy density, where the cohesive energy density is the ratio of the cohesive energy (Ecoh) to the molar volume (V) of a compound. Cohesive energy is equal to ∆Hvap - RT, where ∆Hvap is the standard enthalpy of vaporization, R is the universal gas constant, and T is the temperature. Thus

$$\delta = \sqrt{\frac{E_{coh}}{V}} = \sqrt{\frac{\Delta H_{vap} - RT}{V}} \qquad (2)$$
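As a numerical illustration of eq 2, the following minimal sketch evaluates the solubility parameter from an enthalpy of vaporization and a molar volume. The function name `hildebrand_parameter` and the input values (40 kJ/mol and 100 cm3/mol) are hypothetical and chosen only for illustration; they do not come from the paper. Inputs are taken in SI units (J/mol, m3/mol, K), so the result is in (J/m3)^(1/2).

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def hildebrand_parameter(dh_vap, molar_volume, temperature=298.15):
    """Return sqrt((dHvap - R*T) / V) for dHvap in J/mol and V in m^3/mol."""
    cohesive_energy = dh_vap - R * temperature  # E_coh, J/mol
    return math.sqrt(cohesive_energy / molar_volume)  # (J/m^3)^(1/2)


# Hypothetical liquid: dHvap = 40 kJ/mol, V = 100 cm^3/mol = 1.0e-4 m^3/mol.
delta = hildebrand_parameter(40000.0, 1.0e-4)
# 1 MPa = 1e6 J/m^3, so dividing by 1000 converts to MPa^(1/2).
delta_mpa = delta / 1000.0
print(round(delta_mpa, 2))
```

The RT term is small compared with typical enthalpies of vaporization near room temperature (about 2.5 kJ/mol at 298 K), so the result is dominated by the ratio of ∆Hvap to V.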

The flash point is the minimum temperature at which the vapor pressure of a liquid is sufficient to form an ignitable mixture with air near the surface of the liquid.

The sources of reliable experimental data were the handbook Exploring QSAR: Hydrophobic, Electronic, and Steric Constants by Hansch, Leo, and Hoekman for the octanol-water partition coefficients (logKow); the DIPPR database of thermophysical properties14 for the total (Hildebrand) solubility parameters at 25 °C; and Fire Protection Guide to Hazardous Materials (National Fire Protection Association)15 for the flash points of organic compounds.

A least-squares analysis was carried out to estimate the first-order and second-order contributions for all properties. The modified Levenberg approach was used to minimize the total sum of squared errors between the experimental and predicted values of the properties. This was the criterion for the selection of the most appropriate equation to fit the experimental data. The model is applicable to organic compounds with three or more carbon atoms, excluding the atom of the characteristic group (e.g., -COOH or -CHO).

In Table 1, the first-order group contributions for the three previously mentioned properties are presented. Table 2 lists the second-order group contributions for the same properties. (Dashes indicate that the contributions of the group in the specific property are not available.) The equations selected for the estimation of each property are as follows:

Octanol-water partition coefficient (logKow):

$$\log K_{ow} = \sum_i n_i F_i + \sum_j m_j S_j + 0.097 \qquad (3)$$

Total solubility parameter (25 °C) [(kJ/m3)^(1/2)]:

$$\delta = \Big(\sum_i n_i F_i + \sum_j m_j S_j + 75954.1\Big)^{0.383837} - 56.14 \qquad (4)$$

Flash point (K):

$$\text{flash point} = \sum_i n_i F_i + \sum_j m_j S_j + 216 \qquad (5)$$

The quantity ∑mjSj is considered to be zero for compounds that do not have second-order groups.
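Equations 3-5 translate directly into code. The sketch below is a minimal illustration, with hypothetical function names, that represents each compound as lists of (occurrences, contribution) pairs and reproduces three of the worked examples given in the Appendix of the paper (1-octanol, salicylic acid, and naphthalene), using the group contributions quoted there.

```python
def contribution_sum(first_order, second_order=()):
    """Return sum(n_i * F_i) + sum(m_j * S_j) for (occurrences, contribution) pairs."""
    return (sum(n * f for n, f in first_order)
            + sum(m * s for m, s in second_order))


def log_kow(first_order, second_order=()):
    """Eq 3: octanol-water partition coefficient."""
    return contribution_sum(first_order, second_order) + 0.097


def solubility_parameter(first_order, second_order=()):
    """Eq 4: total solubility parameter at 25 C."""
    return (contribution_sum(first_order, second_order) + 75954.1) ** 0.383837 - 56.14


def flash_point(first_order, second_order=()):
    """Eq 5: flash point in K."""
    return contribution_sum(first_order, second_order) + 216


# 1-Octanol, logKow: -CH3 (1), -CH2 (7), OH (1); no second-order groups.
octanol_logkow = log_kow([(1, 0.6998), (7, 0.4707), (1, -1.0577)])

# Salicylic acid, solubility parameter: ACH (4), AC (1), COOH (1), ACOH (1),
# plus the second-order group ACCOOH (1).
salicylic_delta = solubility_parameter(
    [(4, -6.4), (1, 684.3), (1, 9477.8), (1, 8456.1)],
    [(1, -3076.5)],
)

# Naphthalene, flash point: ACH (8), AC (2), plus AC(ACHm)2AC(ACHn)2 (1).
naphthalene_fp = flash_point([(8, 11.5), (2, 20.5)], [(1, 3.2)])

print(round(octanol_logkow, 3), round(salicylic_delta, 2), round(naphthalene_fp, 1))
```

Passing an empty second-order list corresponds to setting ∑mjSj to zero, which gives the first-order approximation for the same compound.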

Table 3 illustrates the overall improvement of the estimations of the three properties that was achieved after the introduction of second-order groups in the regression. The following three parameters are used to measure the accuracy of the estimations:

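$$\text{standard deviation (SD)} = \sqrt{\frac{\sum (X_{est} - X_{exp})^2}{N}}$$

$$\text{average absolute error (AAE)} = \frac{1}{N} \sum |X_{est} - X_{exp}|$$

$$\text{average absolute percent error (AAPE)} = \frac{1}{N} \sum \frac{|X_{est} - X_{exp}|}{X_{exp}} \times 100\%$$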



where N is the number of data points, Xest is the estimated value of the property, and Xexp is the experimental value. Scatter plots of estimated vs experimental values for the three properties are presented in Figures 1-3. In Table 4, the statistical logKow values of the proposed method are compared with the values of similar existing methods of logKow estimation, according to the standard deviations, average absolute errors, and the correlation coefficient r2

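$$r^2 = \frac{\sum (X_{est} - \bar{X}_{exp})^2}{\sum (X_{exp} - \bar{X}_{exp})^2}$$

where X̄exp denotes the mean of the experimental values.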
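The three error measures SD, AAE, and AAPE can be computed with a few lines of code. The sketch below is a minimal illustration with hypothetical function names; the two data points are the rounded logKow estimates and experimental values for 1-octanol and 2-methylpyridine from the Appendix, used here only to show the arithmetic and far too few to characterize the method.

```python
import math


def standard_deviation(estimated, experimental):
    """SD = sqrt(sum((Xest - Xexp)^2) / N)."""
    n = len(experimental)
    return math.sqrt(sum((e - x) ** 2 for e, x in zip(estimated, experimental)) / n)


def average_absolute_error(estimated, experimental):
    """AAE = (1/N) * sum(|Xest - Xexp|)."""
    n = len(experimental)
    return sum(abs(e - x) for e, x in zip(estimated, experimental)) / n


def average_absolute_percent_error(estimated, experimental):
    """AAPE = (1/N) * sum(|Xest - Xexp| / Xexp) * 100%."""
    n = len(experimental)
    return 100.0 * sum(abs(e - x) / x for e, x in zip(estimated, experimental)) / n


# logKow of 1-octanol and 2-methylpyridine (estimated vs experimental).
estimated = [3.03, 1.12]
experimental = [3.00, 1.11]

sd = standard_deviation(estimated, experimental)
aae = average_absolute_error(estimated, experimental)
aape = average_absolute_percent_error(estimated, experimental)
print(round(sd, 4), round(aae, 4), round(aape, 2))
```

AAPE divides by the experimental value, so it is not meaningful for properties whose experimental values can be zero or change sign.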

In Table 5, the standard deviation and the average absolute error of the method of Marrero and Gani18 for logKow estimation and that of the present new method are compared at each level of approximation.

Discussion

The proposed model features many advantages over existing similar models. First, it is able to describe the basic molecular structure of organic compounds, both simple and complex, with a relatively small set of first-order groups. In addition, the set of second-order groups, which can provide more accurate results, is probably the only set of functional groups in the literature that has a sound physical meaning because it is based on conjugation theory. The application of the model is simple, and the accuracy is exceptional compared to other existing methods.

Concerning the octanol-water partition coefficients (logKow), this model seems to be more accurate than other existing models by Meylan and Howard16 (KowWin software17 is based on this theory) and Marrero and Gani.18 In Tables 4 and 5, one can see that the proposed method has smaller standard deviations, smaller average percent errors, and higher correlation coefficients (r2) between the experimental and estimated values. The other models for estimating logKow16-18 use larger sets of experimental data but also hundreds of groups to describe the molecular structures. These groups have little physical meaning, whereas the second-order groups of the proposed new method are based on conjugation theory. On the other hand, even though the proposed method employs a significantly smaller number of first- and second-order groups, it still can describe most of the existing molecular structures of organic

Table 2. Second-Order Group Contributions for logKow, Total Solubility Parameter, and Flash Point

Columns, in order: group; logKow contribution; total solubility parameter contribution; flash point contribution; sample group assignment (occurrences)

(CH3)2-CH-; (CH3)3-C-; -CH(CH3)-CH(CH3)-; -CH(CH3)-C(CH3)2-; -C(CH3)2-C(CH3)2-; ring of 5 carbons; ring of 6 carbons; -C=C-C=C-; CH3-C=; -CH2-C=; >C{H or C}-C=; string in cyclic; >CHCHO; CH3(CO)CH2-; C(cyclic)=O; ACCOOH; >C{H or C}-COOH; CH3(CO)OC{H or C}<; (CO)C{H2}COO; (CO)O(CO); ACHO; >CHOH; >CN{H or C}(in cyclic); -S-(in cyclic); ACBr; ACI; CH3(CO)CH<; ring of 3 carbons; ring of 4 carbons; ring of 7 carbons; ACCOO; AC(ACHm)2AC(ACHn)2; Ocyclic-Ccyclic=O; AC-O-AC; CcycHm=Ncyc-CcycHn=CcycHp; NHm-CHn-COOH; CHn-O-OH; CHm-O-O-CHn; NcycHm-Ccyc=O; Ocyc-CcycHm=Ncyc; -O-CHm-O-CHn-; AC-NH-AC; C(=O)-C-C(=O)

0.0341 0.3415 0.4434 0.3986 0.3436 -0.1577 0.0020 0.1639 -0.2217 -0.1947 0.4878 -0.2086 0.0749 0.0339 0.0229 -0.0238 -0.2811 0.0293 0.1716 -0.2029 0.0732 0.4087 0.2166 0.1263 0.0000 -0.2375 0.2113 0.6085 -0.0047 -0.0814 -0.9326 0.0930 -0.0582 -1.0851 0.2315 -0.0487 0.2923 0.2617 0.9193

142.1 592.3 1581.2 2678.4 5677.6 -2637.7 -524.2 -426.8 11.9 -762.7 -1257.2 626.1 -1634.4 142.0 -3745.0 -3076.5 511.1 134.4 1060.5 -2875.9 3315.0 -359.5 -23.4 5020.6 3306.4 4022.7 -228.5 2493.0 -492.7 2389.4 337.4 1267.1 -437.1 -9764.5 -3673.4 -1486.4 -83.5 -69.8 9215.6 -4646.5 2348.2 -7854.3 2002.5 -2029.1 11489.1 -8721.6 -620.3 2.8 -3668.9

2.7 0.6 -6.8 -24.6 -9.6 -22.8 -4.3 10.2 20.2 23.9 -3.7 1.1 -7.6 9.1 2.2 3.5 8.4 -17.0 6.4 -1.4 -2.8 5.9 21.4 -1.7 -14.4 2.8 -0.5 -3.6 -9.4 3.2 64.8 6.1 0.0 49.3 -2.3 -14.7

isobutane (1) neopentane (1) 2,3-dimethylbutane (1) 2,2,3-trimethylbutane (1) 2,2,3,3-tetramethylpentane (1) cyclopentane (1) cyclohexane (1) 1,3-butadiene (1) isobutene (2) 1-butene (1) 3-methyl-1-butene (1) ethylcyclohexane (1) 2-methylpropanal (1) methyl ethyl ketone (1) cyclopentanone (1) benzoic acid (1) isobutyric acid (1) isopropyl acetate (1) ethyl acetoacetate (1) acetic anhydride (1) benzaldehyde (1) 2-propanol (1) tert-butanol (1) 1,2-propanediol (1) 1-amino-2-propanol (1) cyclohexanol (1) ethyl vinyl ether (1) methyl phenyl ether (1) cyclopentimine (1) tetrahydrothiophene (1) bromobenzene (1) iodobenzene (1) methyl isopropyl ketone (1) cyclopropane (1) cyclobutane (1) cycloheptane (1) methyl benzoate (1) naphthalene (1) diketene (1) diphenyl ether (1) 2,6-dimethylpyridine (1) ethylenediaminetetraacetic acid (4) ethylbenzene hydroperoxide (1) di-tert-butyl peroxide (1) 2-pyrrolidone (1) oxazole (1) methylal (1) dibenzopyrrole (1) 2,4-pentanedione (1)

Table 3. Comparison of the First- and Second-Order Approximations of the Proposed Method

                                          SD                     AAE                    AAPE (%)
property                    data points   first     second      first     second      first     second
                                          order     order       order     order       order     order
logKow                      422           0.315     0.267       0.226     0.188       -         -
total solubility parameter  1017          1.468     1.308       0.996     0.901       5.15      4.67
flash point                 418           16.530    14.733      11.898    10.689      3.66      3.27

compounds. Introduction of a third-order approximation level18 makes the method rather inconvenient for scientists to use and also complex for the needs of computer-aided design. In addition, this third-order approximation does not seem to exceed the accuracy of the second-order approximation of the proposed method (see Tables 4 and 5).

The key compound that can provide an indication of the accuracy of each logKow method is n-octanol. Theoretically, the logKow value of n-octanol is 3.00. This is also the experimental value for this compound. As can be seen in the Appendix, the estimated value of the present new method is 3.03 (absolute error = 1%). The estimated value for n-octanol obtained by the Marrero and Gani method18 is 2.85 (absolute error = 4.8%), and the logKow value that the KowWin software17 estimates is 2.81 (absolute error = 6.3%).

The group-contribution methods by van Krevelen and Hoftyzer20 and Fedors21 have been used widely for decades for predicting the cohesive energy, liquid molar volume, and solubility parameters of polymers. Fedors’ contributions can also give satisfactory results for the total solubility parameters of pure compounds. In fact, for relatively simple molecules, Fedors’ method is comparable to and sometimes better than ours. However, for complex molecules, as are most of the molecules of biochemical interest, Fedors’ method appears rather inappropriate. We demonstrate this fact by means of examples in the Appendix. The proposed model has an error of -0.5% for salicylic acid, whereas Fedors’ method gives a result for the total solubility parameter of this compound with an error of 29.5%. In the example of 2-cyclohexyl cyclohexanone, the errors were 0.1% and 8.8% for our method and for Fedors’ method, respectively. In general, the differences in accuracy are probably due to the extensive use of second-order groups that can provide a more detailed description of molecular structures in multiring, heterocyclic, or aromatic compounds of biochemical interest, which was one of the most important aims of this new method.

In the literature, there are no relevant group-contribution methods for estimating the flash point exclusively from the molecular structure and without any other data, so there cannot be any direct comparisons with our method. In the flash point estimation, the mean error is only 3.27% (Table 3), and the average absolute error is about 10 K. In some cases, an accuracy of 10 K would not be appropriate for safety purposes, but it could act as a warning when the actual temperature reaches the estimated flash point minus 10 K. This warning is very important for the safety of the personnel in both laboratories and industries, especially when substances without any available experimental data are present or in the production of new compounds with completely unknown flammability behavior.

Table 4. Comparison of the Accuracy of Existing Methods for logKow Estimation

                                KowWin software         Marrero, Gani   new
                                version 1.66 (ref 17)   (ref 18)        method
standard deviation (SD)         0.44                    0.34            0.27
absolute average error (AAE)    0.32                    0.24            0.19
correlation coefficient (r2)    0.95                    0.97            0.99

Table 5. Comparison of the Marrero and Gani (ref 18) Method and the New Method for the logKow Estimation

                                first-order      second-order     third-order
                                approximation    approximation    approximation
standard deviation (SD)
  Marrero, Gani                 0.42             0.38             0.34
  new method                    0.31             0.27
absolute average error (AAE)
  Marrero, Gani                 0.35             0.27             0.24
  new method                    0.23             0.19

Figure 1. Scatter plot of estimated vs experimental logKow values.

Figure 2. Scatter plot of estimated vs experimental total solubility parameters.

Figure 3. Scatter plot of estimated vs experimental flash points.

Conclusion

With the group contributions and model equations that were presented in this work, crucial environmental and safety properties of organic compounds can be predicted easily and with satisfactory accuracy. By adding the results for the octanol-water partition coefficients (logKow), the total solubility parameters at 25 °C, and the flash points to the other 10 properties previously predicted by Constantinou et al.,4,8 the total set of properties of this method is much more complete. In addition, the experimental database of the method now includes a larger number of compounds that cover homologous series with multiring, aromatic, or heterocyclic structures and have special applications in the chemical, pharmaceutical, or food industries. The present method is currently being extended to temperature-dependent properties, such as surface tension, through its combination with the QCHB model.19

Acknowledgment

The authors express their appreciation to the Cyprus Research Promotion Foundation for the partial financial support of the project.

List of Symbols

AAE = average absolute error
AAPE = average absolute percentage error
C = universal constant of the model
f(p) = equation of property p
Fi = contribution of a first-order group of type i
logKow = octanol-water partition coefficient
mj = number of occurrences of a second-order group of type j in the compound
ni = number of occurrences of a first-order group of type i in the compound
N = number of data points
Pc = critical pressure
R = universal gas constant
r2 = correlation coefficient
SD = standard deviation
Sj = contribution of a second-order group of type j
T = temperature
Tb = normal boiling point
Tc = critical temperature
Tm = melting point
V = molar volume
Vc = critical volume
Vl = liquid molar volume
Xest = estimated value of a property
Xexp = experimental value of a property
δ = total (Hildebrand) solubility parameter
∆Gf = standard Gibbs energy
∆Hf = standard enthalpy of formation
∆Hvap = standard enthalpy of vaporization
ω = acentric factor

First-order Approximation:

first-order group

occurrences, ni

contribution, Fi

1 1

0.6998 0.3803

-CH3 C5H4N ∑niFi universal constant, C

niFi 0.6998 0.3803 1.0801 0.097

First-order approximation value

logKow )

∑i niFi + 0.097 ) 1.1771

First-order approximation error ) (1.18-1.11)/1.11 ) 6.3% Second-Order Approximation:

second-order group CcycHmdNcycCcycHndCcycHp ∑mjSj

occurrences, mj

contributions, Sj

mj S j

1

-0.0582

-0.0582 -0.0582

Appendix Examples of Predictions of the Octanol-Water Partition Coefficients (logKow). Example 1: 1-Octanol.

first-order group -CH3 -CH2 -OH ∑niFi universal constant, C

occurrences, ni

contribution, Fi

1 7 1

0.6998 0.4707 -1.0577

niFi

Second-order approximation value

logKow )

∑i niFi + ∑i mjSj + 0.097 ) 1.1189

Estimated logKow ) 1.12 Experimental logKow ) 1.11 Percentage error ) (1.12-1.11)/1.11 ) 0.9% Other Methods of Estimation Marrero, Gani: Estimated logKow ) 1.40. Error ) (1.40-1.11)/1.11 ) 26.1% KowWin version 1.66: Estimated logKow ) 1.35. Error ) (1.35-1.11)/1.11 ) -21.6% Examples of Predictions of Total Solubility Parameters. Example 1: Salicylic Acid.

0.6998 3.2949 -1.0577 2.9370 0.097

No second-order groups are involved. First-order approximation value

logKow )

∑i niFi + 0.097 ) 3.034

Estimated logKow ) 3.03 Experimental logKow ) 3.00 Percentage error ) (3.03-3.00)/3.00 ) 1% Other Methods of Estimation Marrero, Gani: Estimated logKow ) 2.85. Error ) (2.85-3.00)/3.00 ) -4.8% KowWin version 1.66: Estimated logKow ) 2.81. Error ) (2.81-3.00)/3.00 ) -6.3% Example 2: 2-Methylpyridine.

First-Order Approximation:

first-order group ACH AC COOH ACOH ∑niFi universal constant, C

occurrences, ni

contribution, Fi

4 1 1 1

-6.4 684.3 9477.8 8456.1

niFi -25.6 684.3 9477.8 8456.1 18592.6 75954.1

First-order approximation value

∑i niFi + C)0.383837 - 56.14

solubility parameter ) (

) (94546.7)0.383837 - 56.14 ) 25.11 (kJ/m3)1/2 First-order approximation error ) (25.11-24.21)/ 24.21 ) 3.7%

6260 Ind. Eng. Chem. Res., Vol. 43, No. 19, 2004

Second-Order Approximation:

second-order group       occurrences, mj    contribution, Sj    mjSj
ACCOOH                   1                  -3076.5             -3076.5
∑mjSj                                                           -3076.5

Second-order approximation value

$$\text{solubility parameter} = \Bigl(\sum_i n_i F_i + \sum_j m_j S_j + C\Bigr)^{0.383837} - 56.14 = (91470.2)^{0.383837} - 56.14 = 24.09\ (\mathrm{kJ/m^3})^{1/2}$$

Estimated solubility parameter = 24.09 $(\mathrm{kJ/m^3})^{1/2}$
Experimental solubility parameter = 24.21 $(\mathrm{kJ/m^3})^{1/2}$
Percentage error = (24.09 - 24.21)/24.21 = -0.5%

Other Method of Estimation
Fedors: Estimated solubility parameter = 31.36. Error = (31.36 - 24.21)/24.21 = 29.5%

Example 2: 2-Cyclohexyl Cyclohexanone.

First-Order Approximation:

first-order group        occurrences, ni    contribution, Fi    niFi
-CH2                     8                  -277.1              -2216.8
-CH                      2                  -355.5              -711
CH2CO                    1                  7274.2              7274.2
∑niFi                                                           4346.4
universal constant, C                                           75954.1

First-order approximation value

$$\text{solubility parameter} = \Bigl(\sum_i n_i F_i + C\Bigr)^{0.383837} - 56.14 = (80300.5)^{0.383837} - 56.14 = 20.18\ (\mathrm{kJ/m^3})^{1/2}$$

First-order approximation error = (20.18 - 18.77)/18.77 = 7.5%

Second-Order Approximation:

second-order group       occurrences, mj    contribution, Sj    mjSj
C(cyclic)=O              1                  -3745               -3745
∑mjSj                                                           -3745

Second-order approximation value

$$\text{solubility parameter} = \Bigl(\sum_i n_i F_i + \sum_j m_j S_j + C\Bigr)^{0.383837} - 56.14 = (76555.5)^{0.383837} - 56.14 = 18.79\ (\mathrm{kJ/m^3})^{1/2}$$

Estimated solubility parameter = 18.79 $(\mathrm{kJ/m^3})^{1/2}$
Experimental solubility parameter = 18.77 $(\mathrm{kJ/m^3})^{1/2}$
Percentage error = (18.79 - 18.77)/18.77 = 0.1%

Other Method of Estimation
Fedors: Estimated solubility parameter = 20.42. Error = (20.42 - 18.77)/18.77 = 8.8%

Examples of Predictions of Flash Point.

Example 1: Naphthalene.

First-Order Approximation:

first-order group        occurrences, ni    contribution, Fi    niFi
ACH                      8                  11.5                92
AC                       2                  20.5                41
∑niFi                                                           133
universal constant, C                                           216

First-order approximation value

$$\text{flash point} = \sum_i n_i F_i + 216 = 349\ \mathrm{K}$$

First-order approximation error = (349 - 352)/352 = -0.9%

Second-Order Approximation:

second-order group       occurrences, mj    contribution, Sj    mjSj
AC(ACHm)2AC(ACHn)2       1                  3.2                 3.2
∑mjSj                                                           3.2

Second-order approximation value

$$\text{flash point} = \sum_i n_i F_i + \sum_j m_j S_j + 216 = 352.2\ \mathrm{K}$$

Estimated flash point = 352.2 K
Experimental flash point = 352 K
Percentage error = (352.2 - 352)/352 = 0.06%

Example 2: D-Limonene [Cyclohexene, 1-Methyl-4-(1-methylethenyl)-, (R)-].

First-Order Approximation:

first-order group        occurrences, ni    contribution, Fi    niFi
-CH3                     2                  0.6                 1.2
-CH2                     3                  10.8                32.4
-CH                      1                  12.2                12.2
CH2=C<                   1                  19.1                19.1
-CH=C<                   1                  37.9                37.9
∑niFi                                                           102.8
universal constant, C                                           216

No second-order groups are involved.

First-order approximation value

$$\text{flash point} = \sum_i n_i F_i + 216 = 318.8\ \mathrm{K}$$

Estimated flash point = 318.8 K
Experimental flash point = 318 K
Percentage error = (318.8 - 318)/318 = 0.25%

Example 3: Camphor (Bicyclo-2.2.1-heptan-2-one, 1,7,7-Trimethyl-).

First-Order Approximation:

first-order group        occurrences, ni    contribution, Fi    niFi
-CH3                     3                  0.6                 1.8
-CH2                     2                  10.8                21.6
-CH                      1                  12.2                12.2
-C                       2                  12.4                24.8
CH2CO                    1                  69.7                69.7
∑niFi                                                           130.1
universal constant, C                                           216

First-order approximation value

$$\text{flash point} = \sum_i n_i F_i + 216 = 346.1\ \mathrm{K}$$

First-order approximation error = (346.1 - 339)/339 = 2.1%

Second-Order Approximation:

second-order group       occurrences, mj    contribution, Sj    mjSj
C(cyclic)=O              1                  -7.6                -7.6
∑mjSj                                                           -7.6

Second-order approximation value

$$\text{flash point} = \sum_i n_i F_i + \sum_j m_j S_j + 216 = 338.5\ \mathrm{K}$$
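Because the flash point estimate is a plain sum of contributions plus the constant 216, the camphor figures are easy to verify. The snippet below is a minimal Python sketch under that assumption; the function name `flash_point` and the list-of-pairs layout are hypothetical, introduced only for illustration, and the group values are copied from the camphor tables.

```python
# Universal constant (in K) as used in the flash point examples.
FLASH_POINT_CONSTANT = 216.0


def flash_point(first_order, second_order=()):
    """Estimate the flash point in K from group contributions.

    Both arguments are iterables of (occurrences, contribution) pairs.
    Passing no second-order groups gives the first-order approximation.
    """
    first_sum = sum(n * f for n, f in first_order)
    second_sum = sum(m * s for m, s in second_order)
    return first_sum + second_sum + FLASH_POINT_CONSTANT


# Camphor, first-order groups (n_i, F_i): -CH3, -CH2, -CH, -C, CH2CO
camphor_first = [(3, 0.6), (2, 10.8), (1, 12.2), (2, 12.4), (1, 69.7)]
# Camphor, second-order group (m_j, S_j): C(cyclic)=O
camphor_second = [(1, -7.6)]

first = flash_point(camphor_first)
second = flash_point(camphor_first, camphor_second)
experimental = 339.0

print(f"first-order:  {first:.1f} K, error {100 * (first - experimental) / experimental:.1f}%")
print(f"second-order: {second:.1f} K, error {100 * (second - experimental) / experimental:.2f}%")
```

The same function applies to the naphthalene and D-limonene examples once their group lists are substituted.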

Estimated flash point = 338.5 K
Experimental flash point = 339 K
Percentage error = (338.5 - 339)/339 = -0.15%

Literature Cited

(1) Fredenslund, Aa.; Gmehling, J.; Rasmussen, P. Vapor-Liquid Equilibria Using UNIFAC; Elsevier Scientific: Amsterdam, 1977.
(2) Joback, K. G.; Reid, R. C. Estimation of Pure-Component Properties from Group Contributions. Chem. Eng. Commun. 1983, 57, 233.
(3) Horvath, A. L. Molecular Design; Elsevier: Amsterdam, 1992.
(4) Constantinou, L.; Gani, R. New Group Contribution Method for Estimating Properties of Pure Compounds. AIChE J. 1994, 40, 1697.
(5) Mavrovouniotis, M. L. Estimation of Properties from Conjugate Forms of Molecular Structures. Ind. Eng. Chem. Res. 1990, 32, 1734.
(6) Constantinou, L.; Prickett, S. E.; Mavrovouniotis, M. L. Estimation of Thermodynamic and Physical Properties of Acyclic Hydrocarbons Using the ABC Approach and Conjugation Operators. Ind. Eng. Chem. Res. 1993, 32 (8), 1734.
(7) Marrero, J.; Gani, R. Group Contribution Based Estimation of Pure Component Properties. Fluid Phase Equilib. 2001, 183-184, 183-208.
(8) Constantinou, L. Property Estimation Method for Accurate Process Design. Petroleum Technology Quarterly 2001, 6/2, 103-109.
(9) Constantinou, L.; Gani, R.; O’Connell, J. Estimation of the Acentric Factor and the Liquid Molar Volume at 298 K Using a New Group Contribution Method. Fluid Phase Equilib. 1995, 103, 11-22.
(10) Boudouris, D.; Constantinou, L.; Panayiotou, C. Prediction of Volumetric Behavior and Glass Transition Temperature of Polymers: A Group-Contribution Approach. Fluid Phase Equilib. 2001, 167, 1-19.
(11) Sanchez, I. C.; Lacombe, R. Statistical Thermodynamics of Polymer Solutions. Macromolecules 1978, 11, 1145.
(12) Poling, B. E.; Prausnitz, J. M.; O’Connell, J. P. The Properties of Gases and Liquids; McGraw-Hill: New York, 2000.
(13) Hansch, C.; Leo, A.; Hoekman, D. Exploring QSARs Hydrophobic, Electronic and Steric Constants; ACS Professional Reference Book; American Chemical Society: Washington, DC, 1995.
(14) Daubert, T. E.; Danner, R. P. Physical and Thermodynamic Properties of Pure Compounds: Data Compilation; Hemisphere: New York, 1989.
(15) Fire Protection Guide to Hazardous Materials, 11th ed.; National Fire Protection Association: Quincy, MA, 1994.
(16) Meylan, W.; Howard, P. Atom/Fragment Contribution Method for Estimating Octanol-Water Partition Coefficients. J. Pharm. Sci. 1995, 84, 83.
(17) KowWin; U.S. Environmental Protection Agency: Washington, DC, 2000 (free distribution at www.epa.gov/oppt/exposure/docs/episuite.htm).
(18) Marrero, J.; Gani, R. Group-Contribution-Based Estimation of Octanol/Water Partition Coefficient and Aqueous Solubility. Ind. Eng. Chem. Res. 2002, 41, 6623.
(19) Panayiotou, C. The QHCB model of fluids and their mixtures. J. Chem. Thermodyn. 2003, 35, 349.
(20) Van Krevelen, D. W.; Hoftyzer, P. J. Properties of Polymers, Their Estimation and Correlation with Chemical Structure, 2nd ed.; Elsevier: New York, 1976.
(21) Fedors, R. F. Method for estimating both the solubility parameters and molar volumes of liquids. Polym. Eng. Sci. 1974, 14, 147.

Received for review April 8, 2004
Revised manuscript received June 10, 2004
Accepted June 17, 2004
IE0497184