Prediction of the Temperature Dependency of Henry's Law Constant

Aug 2, 2005 - Department of Chemical Ecotoxicology, UFZ Centre for ... for Henry's law constant at any temperature, but so far such methods are not ...
0 downloads 0 Views 181KB Size
Environ. Sci. Technol. 2005, 39, 6705-6711

Prediction of the Temperature Dependency of Henry’s Law Constant from Chemical Structure RALPH KU ¨ HNE, RALF-UWE EBERT, AND GERRIT SCHU ¨ U ¨ RMANN* Department of Chemical Ecotoxicology, UFZ Centre for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany

A new model to estimate the temperature dependency of Henry’s law constant in water for organic compounds from the two-dimensional structure is presented. Air/water partition enthalpies of 456 chemicals were fitted to 46 substructural parameters with a squared correlation coefficient r2 of 0.81 and a standard error of 7.1 kJ/mol. The compound set covers various organic compound classes with the atom types C, H, N, O, F, Cl, Br, I, and S. Application of the model together with experimental data for 25 °C to a set of 462 compounds with 2119 experimental Henry’s law constants at temperatures below 20 °C yields a predictive squared correlation coefficient q2 of 0.99 and a standard error of 0.21 logarithmic units. The prediction capability is further evaluated using cross validation and permutation.

Introduction Henry’s law constant H in water is one of the major determinants of the environmental fate of chemicals. Primarily, it quantifies the equilibrium partitioning of a chemical substance between a water phase such as in rivers, lakes, or seas and air as the adjacent bulk gas phase. In environmental chemodynamics, wet depositions through washout as well as the rate of volatilization are affected by H. Moreover, the octanol/air partition coefficient as a rough parameter to characterize the bioconcentration of airborne xenobiotics in terrestrial plants can also be estimated from the octanol/ water partition coefficient and H. In fugacity models to estimate multimedia partitioning (1, 2), Henry’s law constant is one of the key input parameters. Here, uncertainties in H may lead to significant scatter in the model output as was demonstrated with respective Monte Carlo simulations (3). Recently, Beyer et al. (4) investigated the temperature dependence of the long-range transport potential. They used temperature-specific partition coefficients and degradation rates to demonstrate the importance of considering temperature dependence. In the absence of experimental data, the 25 °C value of Henry’s law constant can be estimated using various models (5). However, realistic environmental scenarios usually refer to temperatures T significantly below 25 °C, which appears to have been largely ignored in many modeling exercises that deal with the environmental fate of organic compounds. In principle, quantum-chemical approaches to estimate the solvation enthalpy in water would enable calculations for Henry’s law constant at any temperature, but so far such * Corresponding author phone: +49-341-235-2309; fax: +49-341235-2401; e-mail: [email protected]. 10.1021/es050527h CCC: $30.25 Published on Web 08/02/2005

 2005 American Chemical Society

methods are not generally applicable (5). At present, there are only four methods available that allow a more or less rough prediction of temperature-dependent H values from the molecular two-dimensional (2D) structure of chemicals. The method of Cabani et al. (6) yields estimations of both Henry’s law constant at 25 °C and the enthalpy term ∆Hø from structural fragments, and the method of Nirmalakhandan et al. (7) is based on topological indices. While the first method was calibrated with only the 197 experimental enthalpy data available in 1981, the latter is trained with only 20 different compounds and tested for another 18 chemicals. Additionally, the training set covers only hydrocarbons containing either an aromatic ring or chlorine atoms, and there is no information about the predictive performance with regard to other compound classes. An approximate but more general option to predict H(T) is given by estimating the ratio of the vapor pressure at the temperature of interest and the water solubility at 25 °C as suggested in ref 8, exploiting the fact that in most cases the temperature dependence of H is governed by the vapor pressure term. Finally, the EPI Suite of the U. S. Environmental Protection Agency (9) includes an update of the 25 °C model from Meylan and Howard (10), augmented by 15 simple structural rules that yield an estimated temperature variation of H values in the range from 0 to 50 °C. However, the respective temperatures are fixed to predefined values, and there is little information about respective calibration or prediction statistics; moreover, this feature is not available in the batch mode of the EPI software. In the present paper, a new 2D increment model is introduced that enables extrapolation of Henry’s law constant from 25 °C to temperatures of environmental interest through estimation of the air/water phase transition enthalpy from chemical structure.

Theoretical Background Mainly for historical reasons, there are several differing definitions of Henry’s law constant. The fundamental thermodynamic property is the Henry coefficient, denoting the pressure dependence of the solubility of ideal gases in ideal solutions. Henry coefficients are expressed in pressure units. In environmental sciences, two other definitions are in use. The air/water partition coefficient

Kaw )

ca cw

(1)

refers to the thermodynamic equilibrium concentrations of a chemical in air, ca, and water, cw, respectively. This partition coefficient is also called the dimensionless Henry’s law constant, Kaw, and in the environmental context, estimation methods are usually calibrated with log Kaw values. Although this parameter does not have a formal unit, attention must be paid to the underlying concentration unit. Note that Kaw based on molar fractions differs both in relative and absolute values from Kaw based on molar concentrations. A further definition is based on the ratio of the partial pressure P and the concentration in water cw

H)

P cw

(2)

which is usually considered to be Henry’s law constant, H. Most of the organic chemicals are not completely miscible with water. The concentration in water at saturation is the water solubility, Sw. When the solubility of water in the organic VOL. 39, NO. 17, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

6705

chemical is neglected, the partial pressure at saturation equals the vapor pressure PV of the pure compound, leading to

H)

PV Sw

(3)

a well-known approximate formula for Henry’s law constant. For gases in real world systems, there is another restriction. Their partial pressure cannot exceed the atmospheric pressure. As a consequence, their measured water solubility is below the saturation solubility at vapor pressure and corresponds to the atmospheric pressure. In a strict thermodynamic sense, eq 2 is valid only at infinite dilution, and thus (a) Henry’s law constant is not a constant but the limiting value at a molar fraction of 0, and (b) eq 3 is always an approximation, because saturation never equals infinite dilution. However, these considerations are of minor importance in the context of the accuracy needed for environmental modeling. Combining eq 1 and 2 and relating the concentration in air to the partial pressure in air by the equation of state for an ideal gas leads to

Kaw )

H RT

(4)

at the temperature T of interest, with the universal gas constant R in some appropriate unit (e.g., 8.314 J/mol K). From eq 4, the temperature dependence of the relation between the partition coefficient air/water and Henry’s law constant is known. In addition, however, there is the intrinsic temperature dependence of the partition coefficient itself. In principle, the temperature dependence of either Kaw or H could be modeled in the same manner. In this paper, we will address Kaw. The van’t Hoff equation for the temperature dependency of a partition coefficient K is

ln

(

K2 ∆Hø 1 1 )K1 R T2 T1

)

(5)

with the standard enthalpy of the phase transition, ∆Hø, and K1 and K2 denoting the K values at temperatures T1 and T2, respectively. This is an approximation already, because ∆Hø also depends on temperature. However, the intrinsic temperature dependence of ∆Hø is usually quite small within the environmentally relevant temperature interval and thus can be neglected in our context. For convenience, in our model ln K is replaced by log K, and R is included in the modeled parameter. When Kaw is inserted at T1 and T2 for K1 and K2 in eq 5, ∆Hø is the phase transfer enthalpy from water to air (the desolvation enthalpy). When Kaw,0 is denoted as a reference value of Kaw at some temperature T0, eq 5 can be converted to the final model equation

log Kaw ) log Kaw,0 - B

(

1 1 T T0

)

(6)

with B ) ∆Hø/(R ln(10)). The constant ln(10) addresses the conversion to the natural logarithm.

Methods and Materials Data Compilation. In the past few years, a large number of data related to the temperature dependence of Henry’s law constant have been published. Besides the more comprehensive paper of Staudinger et al. (11), typically specific compounds or compound classes with only a few individual substances have been measured. 6706

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 17, 2005

A set containing 7455 experimental data for 581 compounds, mainly organic chemicals, together with a few inorganic gases, was collected from the original literature. There were three different types of sources. For some compounds, ∆Hø was available directly. In other cases, Henry’s law constants were measured at different temperatures. Then, ∆Hø was obtained by linear regression by applying eq 6. The third group of data was built from the experimental vapor pressure and water solubility by eq 3, when temperature-dependent data were available for both properties. To obtain the respective vapor pressure and water solubility values at the same temperature, interpolation had to take place occasionally. As with the second group, ∆Hø data were then obtained by linear regression of these (partly interpolated) values. We emphasize, however, that at this stage any regression was strictly confined to interpolation, so there was no extrapolation outside the data interval available from original experimental data. One could argue that the use of such different kinds of data sources might compromise the model. However, data uncertainties were, as far as possible, reduced through rigorous consistency checks and the condition that for every compound to be included in the training set experimental data for at least three different temperatures (different sources) or two different temperatures (single source) were available. For every compound, data consistency was assessed through visual inspection of the log Kaw(T) trend as well as through evaluation of the statistics (F-test, r2, standard error) when fitting ∆Hø from the log Kaw(T) versus 1/T data distribution. Through this procedure, all apparently inconsistent data were removed from the training set. Thus, the approach is a compromise between maximizing the range of data and compounds available for the model development and confining the training data to those of high accuracy and homogeneous quality. After the data quality checks, 5763 data remained, yielding ∆Hø values for 524 chemicals. A subsequent validation of the desolvation enthalpies led to the exclusion of some values to remove inconsistencies in terms of controversial data for members of homologous series, resulting in a final set of 456 ∆Hø values obtained from 5488 individual data. The maximum value of ∆Hø is 108 kJ/mol, and a temperature range from 0 to almost 100 °C is covered. This final data set has been compiled in our inhouse database within the software system ChemProp (12). Model Architecture and Development. Basically, the model counts the occurrence of substructure fragments in the hydrogen-suppressed molecular structures of the compounds. Each atom of the molecule has to belong to exactly one of the defined fragments, and for each molecule there is a unique decomposition into the relevant fragments. For each fragment, the number of occurrences is multiplied by the fragment value. Furthermore, correction factors are applied to account for particular substructures. Again, the number of occurrences is multiplied by the fragment value, but correction factors are applied independently from each other and in addition to the basic fragments. Finally, indicator values are also added for respective substructures but only once per molecule regardless of the actual number of occurrences. A three-step procedure was used to train the model. In the first step, compounds containing not more than one functional group were used to fit an initial parameter set. These parameters were not changed anymore in the subsequent steps. Second, compounds with multiple occurrences of not more than one functional group were added to fit some additional parameters. In the final step, additional parameters have been adjusted to the full data set, now including compounds with complex occurrences of functional groups. The parameters have been calculated by multiple linear regressions in each step.

Application Domain. The training set compounds include the following atom types: C, H, F, Cl, Br, I, N, O, and S. Even though a wide range of organic chemical classes is covered, the chemical domain is currently restricted as follows: (a) Only compounds completely described by the basic fragmentation scheme may be treated, which, for example, excludes any substances containing phosphorus, and (b) whenever two heteroatoms are bonded directly, a correction factor for this particular constellation has to be available. Furthermore, the model may be less reliable in the case of missing correction factors for heteroatoms attached to the same carbon atom or ortho substitution in aromatic rings. Finally, neither any fragment count nor any correction factor count for a given molecule should exceed the maximum count of the respective item in any training set molecule. These rules may appear quite conservative but should prevent misuse of the model. A more systematic approach to characterize the application domain in future updates might be based on molecular similarity defined through atom-centered fragments (ACFs), as was already used for error corrections (13). ACFs are atoms characterized by their atom type and bonding environment, basic features such as, for example, aromaticity and hydrogen number, and their neighboring atoms. Comparing the ACFs of a given molecule to the pools of all ACFs of the training set will provide a criterion as to what degree the structural features of a given chemical are already covered by the training set. With respect to the data range applied to develop the model, prediction results above 100 kJ/mol may be doubtful. The respective training set range of T is 0 to 100 °C, which at least at the upper end is significantly above temperatures of environmental interest. Model Validation. The prediction capability for ∆Hø was evaluated employing cross validation and permutation. Cross validation is supposed to be common knowledge and therefore is not explained in detail here. In the less common technique of permutation, the ∆Hø values were allocated randomly to the compounds while keeping the input descriptors in the original order. In other words, no number was changed, but the target values were mixed up. If such scrambled data can be fitted similarly well as the original data, then the model apparently memorizes rather than generalizes the data, without actually mapping a mechanistic relationship. Cross validation and permutation can be combined. A good model is expected to yield good crossvalidation results for the real data but poor results for permuted data without and with cross validation. Although ∆Hø is actually modeled, the focus of this paper is on Henry’s law constant. Hence, experimental Henry’s law constants at varying temperatures were taken to further validate the model. Due to the rather limited number of available data, it was decided not to leave out data for a separate prediction set. Instead, mainly the raw data H(T) used to generate the ∆Hø values were taken for the validation in terms of Henry’s law constant. While this is not a strictly independent validation, it is still useful because the model calibration employs substructure-based fragment values to convert raw H(T) data into compound-specific desolvation enthalpies ∆Hø, which in turn are used to predict Henry’s law constant in the temperature range of environmental interest (see below). To apply eq 6, a known reference value at another temperature is required. For practical reasons, 25 °C is the best choice due to the high number of data available from experiments and also estimation methods. To confine the validation to data at temperatures sufficiently different from the reference temperature (to truly challenge the predicted temperature variation), only low-temperature data up to 20 °C were taken into account for the validation set, resulting in 1970 values of 382 compounds. For 80 other compounds,

the number of data points besides the reference value was too low for the model training, but these 149 data could still be included in the model validation. Accordingly, this latter subset provides an external validation, but its value is limited due to its rather limited chemical domain, with 50% of these compounds being polychlorinated biphenyls (PCBs). The final validation set thus consisted of 2119 experimental Henry’s law constants for 462 compounds. The experimental reference values at 25 °C were again taken from the ChemProp database, including partly calculations from vapor pressure and water solubility. Alternatively, the estimation methods for Henry’s law constant at 25 °C from fragments by Meylan and Howard in the software version within the EPI Suite (EPI) as well as the estimation model of Viswanadhan et al. (14) (V) were applied for the purpose of a strict prediction from structure without the application of experimental data. Comparison to Existing Models. The prediction performance of the new model was compared to the application of eq 3 with a temperature-dependent vapor pressure and a water solubility at 25 °C. This requires vapor pressures PV(T) at the relevant temperatures T. In principle, such data could be obtained in high quality from tabulated vapor pressure equations for particular compounds based on experimental data. However, such an approach is not suitable for large data sets as in the present investigation. Instead, three different estimation methods to predict PV(T) from the molecular structure were applied. The neural network method of Ku ¨ hne et al. (15) (KU ¨ ), quite accurate, but restricted to hydrocarbons and halogenated hydrocarbons, the equation of state by Lee and Kesler (16) (LK) when estimating the critical data and the reduced boiling point from fragments according to Joback and Reid (17), and the modified Antoine equation according to Grain (18) (AnG) with boiling points predicted from fragments according to Stein and Brown (19), both overall less accurate, but more general. For water solubility at 25 °C, three different models were used. The first is the estimation method from fragments by Marrero and Gani (20) (MG). The second method was the one of Hou et al. (21) (HOU), and the third was the model included in the EPI Suite (EPI). The vapor pressure methods require melting points for solid compounds. These have been calculated by a model from Marrero and Gani (22). In addition, the performance of the two existing models for ∆Hø from Cabani et al. (CAB) and Nirmalakhandan et al. (NIR) in their respective application domains were tested. All property calculations except EPI runs were accomplished automatically by means of ChemProp. For EPI, an appropriate structure file was prepared within ChemProp and imported into EPI, and finally the EPI output was reimported into ChemProp for the subsequent statistical analysis. Statistical Evaluation. The calibration performance (goodness of fit) of the model to predict ∆Hø from molecular fragments was assessed through calculation of the squared correlation coefficient n

∑(y r2 ) 1 -

fit i

2 - yobs i )

i

(7)

n

∑(y

obs i

- ymean)2

i

obs and the associated standard error (se). Here, yfit i , yi , and ymean denote the regression-fitted and observed target values (in our case ∆Hø) and the observed mean and n the number

VOL. 39, NO. 17, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

6707

TABLE 1. Fragments and Correction Factors to Calculate ∆Hø/(R ln(10)) (in K with ∆Hø Denoting the Desolvation Enthalpy) Coded According to Ref 23 SMARTS code (see remarks for additional explanation)

nT

value

nC

nmax

remark

General 522

456

456

1

-26 -258 90 88 85 247 570 774 1800 854 1273 -675 1282 1083 932 724 220 313

1327 22 1344 3597 158 329 20 4 82 79 96 2 21 35 9 43 14 4

368 8 170 368 45 117 13 4 78 66 87 2 21 34 9 36 13 4

CdC [F,Cl,Br,I]-c [O]-c [OdC]-c [$(OdNX3)]([$∼OH0X1])-c N-c (other than above) [F,Cl,Br,I]-[$(CdC)] [O,$(OdCH1),$(N#C)]-[$(CdC)] [OR1,SR1] [$(F,Cl,Br,I)]-[$(CR1)] [$(CR0c)]:[$(cCR0)] [$(Fc,Clc,Brc,Ic)]:[$(cF,cCl,cBr,cI)] [$(CR0X4)c)]:[$(cNX3)] [$(OC(dO)c)]:[$(cC(dO)O)] OdN([∼OH0X1])[$(cccc)][NX3,OX2] [Cl][$(c:c-c)]

235 -128 -760 -495 -360 -206 -36 -458 -352 93 -231 145 118 2317 -1046 -258

38 207 50 11 27 13 31 6 18 45 8 85 10 2 15 50

35 62 33 9 20 13 10 6 12 7 7 29 7 2 13 26

2 10 4 2 3 1 6 1 3 12 2 8 2 1 2 4

[OdC]C([*(CX4)])[$(sCX4),(dC)] OdN([∼OH0X1])sOsC([$(*)])[$(*)] [$(*)]∼[CR0](∼[$(*)])sOs[*(#6)] c-c [OsO],[$(OdC)]s[$(CdO)],[SsS] [O,N,S]s[$(CdO)],[O]s[$(SdO)] [OH0]s[$(N(∼OH0X1)dO)] N∼O not appropriate

-395 -430 297 -383 -845 -1307 -352 -2124 -668

7 4 5 39 10 42 12 6 13

7 4 5 35 10 36 12 6 10

1 1 1 2 1 2 1 1 3

689

9

9

1

2638

3

3

1

C [$(c(:c):c)]:c:[$(c(:c):c)] [c] (other than above) [H][$(#6)] F [Cl] [Br] I [OH] [#8H0D2] [(CdO),(CdS)] OX1d[$(CdO),$(CdS)] [#7H2,#7HX3] [#7H0X3,#7X2] C#N [NX3](dO)∼[OH0X1] [#16X2,SX3dO] [H][$(S)]

regression constant

Fragments 12 6 20 27 8 12 3 1 2 2 2 1 1 2 1 3 2 1

C in alkyl groups fused aromatic C at two others other aromatic C H at C (including CdO, but not CtN) F Cl Br I OH sOs (nonaromatic or aromatic) OdC or SdC Od at CdO or CdS NH2 or NH (nonaromatic or aromatic) >Ns or -Nd (nonaromatic or aromatic) NtC NO2 S or SO (nonaromatic or aromatic) H at S

Correction Factors

[$(OH0R0)]sCs[$(OH0R0)] [$(OJ1)]C[$(CH0dO)] [$(OJ1)]C[$(OH0sO)] [$(OdC)]∼C∼[$(CdO)] [$(CX4sOsCX4)][F,Cl,Br,I] [$(c-O-CX4)[F,Cl,Br,I] [F,Cl,Br,I]-CX4_[$(CdO)([H],#6)] [F,Cl,Br,I]-CX4_[$(C(dO)O)]

CdC halogen at aromatic C O at aromatic C OdC at aromatic C NO2 at aromatic C other N at aromatic. C halogen at CdC O, CHO, or NtC at CdC cyclic O or S (nonaromatic or aromatic) halogen at cyclic C acyclic alkyl groups in ortho position halogen atoms in ortho position acyclic alkyl group in ortho position to N (sp3) OC(dO) groups in ortho position (phthalates) nitro group in ortho position to O (sp3) or N (sp3) halogen in ortho position to biphenyl bridge (aromatic rings must not contain N) OdC at alkyl branch nitrate group at alkyl branch sOs at acyclic alkyl branch biphenyl bridge C(aromatic)sC(aromatic) OsO, OdCCdO, SsS OsCO, NsCO, SsCO, OsSO OsNO2 (nitrate) NsO, NdO (not in nitro or in nitrate) 2 functional groups separated by 1 alkyl group in following cases (X denotes any halogen): sOsCsOs (both O atoms acyclic) OHsCsCdO OHsCsOsO OdCsCsCdO XsCsOsC (C atoms only sp3) X-C-O-C(aromatic) X-C-CdO (CdO not at O,N,S) X-C-C(dO)O

Indicator [F,Cl,Br,I]cc[$(#8R1)]

halogen in 2-position to ring O as in polychlorinated dibenzodioxins or -furans

a The numbers n , n , and n T C max denote the total fragment occurrence in the training set, the number of compounds containing the fragment, and the maximum count in a single compound, respectively. Increment values for all groups occurring in the molecule have to be added. If the molecule contains a certain group more than once, then the increment value has to be applied for each occurrence of the respective fragment. All atoms have to be assigned to fragments with increment values. If uncounted atoms remain, then the model will not be applicable to the respective compound. Atoms must not be considered more than once. In contrast to the increment values, correction factors have to be applied in addition, after the complete association of fragments to the atoms in the molecule. Finally, the general constant has to be added.

of respective data. For prediction performance, the predictive squared correlation coefficient n

∑(y q2 ) 1 -

pred i

2 - yobs i )

i

(8)

n



(yobs i

-y

mean 2

)

i

6708

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 17, 2005

and the respective standard error were used. In eq 8, ypred i denotes the predicted (not fitted) target value, which applies for cross validation (internal validation) or when predicting data of compounds not used for the model derivation (external validation). While r2 corrects for systematic errors (which are eliminated through a least-squares fit) and is limited to values from 0 to 1, q2 has an upper limit of 1 (perfect model) but no lower limit and thus can become even negative. An overall model performance identical to just using ymean

FIGURE 1. Predicted vs experimental desolvation enthalpies ∆Hø for 456 compounds (r2 ) 0.82, se ) 7.1 kJ/mol).

TABLE 2. Error Distribution Ranges in kJ/mol for the Desolvation Enthalpy ∆Hø error range

number of results

0‚‚‚1 1‚‚‚2 2‚‚‚3 3‚‚‚5 5‚‚‚10 10‚‚‚20 20‚‚‚50

66 53 49 103 129 52 4

for all predictions corresponds to q2 ) 0, and (increasingly) negative q2 values indicate that the model is (increasingly) inferior to simply using ymean as a predictor for all target values.

Results and Discussion Model. Table 1 contains the fragment definitions (encoded in SMARTS (23)) and the increment values of the model to estimate ∆Hø/(R ln(10)). The result of the fragment summation can be entered directly into eq 6 to predict the logarithm of the dimensionless air/water partition coefficient, log Kaw, based on molar concentrations. The final model consists of 46 parameters: 18 fragments, 26 correction factors, 1 indicator, and 1 absolute constant. The model calibration r2 is 0.82 for the B term (∆Hø/(R ln(10))) of eq 6, with a standard error of 371 K or 7.1 kJ/mol in terms of ∆Hø (Figure 1). Of the predicted desolvation enthalpies, 60% are within an error range of 5 kJ/mol (Table 2). To explore the derivation of a much simpler model, a linear regression of ∆Hø against the 25 °C values of log Kaw has been performed. However, the poor statistics (q2 ) 0.41, se ) 11.2 kJ/mol) indicate that such an approach is not feasible across different chemical classes. The derived fragment values reveal some interesting trends with regard to the impact of structural features on the desolvation enthalpy of organic compounds. Increasing molecular size tends to increase the desolvation enthalpy and thus the energy penalty associated with the phase transfer from water to air. The reason is the vapor pressure, which decreases with increasing molecular size, and may overrule the possibly opposing trend of the water solubility. Consequently, the temperature dependence of Henry’s law constant

increases with decreasing volatility (decreasing vapor pressure) of the compounds. The molecular size effect is seen when comparing the B values of F (85 K), Cl (247 K), Br (570 K), and I (774 K) that correspond to desolvation enthalpies ∆Hø of 1.63 kJ/mol (F) to 14.8 kJ/mol (I). Taking 1-fluorobenzene and 1-iodobenzene as examples, the desolvation enthalpy of the latter (41.1 kJ/mol) is ca. 50% greater than that of the former (27.9 kJ/mol), corresponding to calculated log Kaw values at 5 °C (EPI/new model) and 25 °C (EPI) of -0.94 and -0.59 (1-fluorobenzene) as well as -1.81 and -1.29 (1-iodobenzene), respectively. Note that in this case the significantly lower water solubility of the iodo compound (log Sw (mol/L) -2.91 vs -1.68, HOU) is overcompensated by a 2 orders of magnitude lower vapor pressure (log Pv (Pa) 2.16 vs 4.00, KU ¨ ). It follows further that multiply halogenated compounds are associated with large temperature dependencies of their Henry’s law constants. Interestingly, halogen substitution in the ortho position to ring oxygen as in dibenzodioxins and -furans requires a specific and large positive correction (2638 K) regardless of the number of such occurrences, which was accounted for by a respective indicator term (last entry in Table 1). Increasing polarity as well as increasing hydrogen bond interaction capacity tend to increase the desolvation enthalpy and thus the temperature dependence of Henry’s law constant. Accordingly, functional groups such as OH, the ether oxygen, aldehyde, the ketone carbonyl group, NH2, CN, NO2, and SH have relatively large B values, with the largest being found for the aliphatically bound hydroxyl group (B ) 1800 K, corresponding to ∆Hø ) 34.5 kJ/mol). This is illustrated through the B values of OH, NH2 (1282 K), and F (85K), all of which have quite similar atomic masses (17 vs 16 vs 19 amu). Further examples are SH (33 amu) vs Cl (35.5 amu) with B values of 532 K (calculated from the B values for singly bound S and H attached to S) and 247 K, respectively, and the quite similar B values of NO2 (724 K) and I (774 K) despite their large difference in molecular weight (46 vs 126.9 amu). CH2 and aromatic CH units provide moderate positive contributions to the desolvation enthalpy (B 150 and 178 K), while fused aromatic carbons yield a negative B value (-258 K) that reflects an enthalpic preference for the gas phase as compared to aqueous solvation. The usually positive B contribution of functional groups is reduced when these are attached to an aromatic carbon (negative correction factors in Table 1). Ortho substitution of a halogen yields a small positive B value (145 K), while ortho substitution of acyclic alkyl groups supports the gasphase state (-231 K). The latter also holds true for the separation of various functional groups by one alkyl carbon (-668 K, cf. Table 1). Interestingly, the correction factor for aromatic attachment of OH is -760 K, resulting in a net B value of 1040 K. It follows that the temperature dependence of Henry’s law constant is greater for aliphatic alcohols than for phenols. The largest positive correction factor is derived for the phthalate moiety (two COO units in ortho positions at aromatic rings with B ) 2317 K). Here, the sum of the two COO B values (2 × 2127 K) is increased by ca. 50%, indicating that the phthalate function has a particularly large preference for the solvated state as compared to the gas phase, and shows the pronounced temperature dependence of Henry’s law constant. Validation. Cross validation of 10 runs, leaving out 4546 compounds (10% of the training set compounds) in each run for prediction, yields an average q2 of 0.76 with an average standard error of 7.6 kJ/mol for ∆Hø. Comparison with the calibration statistics indicates the good predictive capability of the model. The permutation test was performed 10 times with varying degrees of scrambling, ranging from about 20% reallocated VOL. 39, NO. 17, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

6709

FIGURE 2. Permutation test, calibration and cross-validation performance (r2, q2) for predicting ∆Hø vs the degree of permutation expressed as scrambling q2.

FIGURE 3. Predicted vs experimental values of the temperaturedependent air/water partition coefficient in logarithmic form (q2 ) 0.99, se ) 0.21), covering 462 compounds with 2119 data in the temperature range from 0 to 20 °C.

target values up to 100% permutation. For models representing a mechanistically sound relationship, increasing scrambling should result in decreasing calibration quality, with the fully scrambled data set yielding particularly poor statistics. Correspondingly, increasing scrambling should yield decreasing cross-validation statistics, reflecting decreasing predictive capability. In Figure 2, the model calibration and prediction performances in terms of r2 and q2 values are plotted versus the degree of scrambling, the latter of which is expressed as q2 between the permuted and the original data (scrambling q2). Here, the leftmost entries (scrambling q2 around -1) refer to 100% scrambling of the target values, while the entries at the right end of the x-axis (q2 ) 1) refer to using the original data (0% scrambling). For the fully scrambled data set, a calibration r2 of 0.07 was obtained, which is indeed much inferior to both the original calibration and cross-validation statistics. Above scrambling q2 values of around 0 (middle of the x-axis), the goodness of fit increases systematically with a decreasing degree of scrambling, which also holds true for the cross-validation q2. Moreover, the difference between calibration and prediction performance increases with an

increasing degree of scrambling. These results indicate that the newly derived model is based on a proper relationship between molecular descriptors and the target property and that this relationship has a good predictive capability. Application of the model to the validation set of experimental dimensionless Henry’s law constants at temperatures below 20 °C (Figure 3) yields q2 ) 0.99 and se ) 0.21 logarithmic units, when basing the estimation on the experimental reference data for 25 °C. For the pure prediction subset (80 compounds, half of which are however PCBs), the separate validation yields q2 ) 0.96 and se ) 0.32. When EPI is used to predict the reference data Kaw (25 °C), the overall model performance decreases to q2 ) 0.91, se ) 0.56, and to q2 ) 0.77, se ) 0.82, when using the V method. It suggests that there is still room for improvement in predicting Henry’s law constant at 25 °C from molecular structure. Comparison with Literature Methods. In Table 3, the log Kaw(T) prediction statistics are summarized for the new model in comparison to various literature methods, all of which (except EPI) have been implemented in our ChemProp

TABLE 3. Statistical Performance of Various Methods to Predict Temperature-Dependent Henry’s Law Constant in Logarithmic Dimensionless Form method

number of compounds

data + new model

441

standard error

minimum error

maximum error

0.02

-1.28

1.36

0.05 -0.17

-3.71 -6.14

2.94 3.16

0.14 -0.27 0.10 -0.37 -0.62 -0.27 0.10 -0.26 0.09

-1.78 -13.41 -3.43 -3.04 -13.71 -3.76 -1.45 -12.74 -2.79

1.85 3.83 3.12 2.17 3.98 3.59 4.98 4.79 4.04

-0.02 0.02

-1.48 -1.06

0.84 1.26

bias

2069

0.99

0.21

Estimated 25 °C Data + New Model 453 402

KU ¨ /EPI LK/EPI AnG/EPI KU ¨ /MG LK/MG AnG/MG KU ¨ /HOU LK/HOU AnG/HOU

233 430 430 233 428 427 233 433 430

data + CAB data + NIR

207 139

9

q2

Experimental 25 °C Data + New Model

EPI + new model V + new model

6710

number of valid results

2081 1784

0.91 0.77

0.56 0.82

From Vapor Pressure/Water Solubility 1456 1998 1997 1456 1981 1980 1456 1998 1997

0.85 0.27 0.85 0.67 0.18 0.68 0.80 0.43 0.81

0.51 1.59 0.71 0.83 1.72 1.04 0.58 1.41 0.80

Experimental 25 °C Data + Other Models 1161 1053

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 17, 2005

0.99 0.97

0.19 0.19

software package (12). Note that the approach Pv(T)/Sw (25 °C) should be compared only to methods employing predicted reference data (e.g., EPI + new model), because both Pv and Sw are also predicted from the molecular structure. Since the combination V + new model suffers from the shortcomings of the 25 °C model, the focus will be on the combination EPI + new model. Among the Pv(T)/Sw (25 °C) methods, the best statistics are achieved with KU ¨ /EPI; note, however, that the respective chemical domain is restricted to hydrocarbons and halogenated hydrocarbons. Here, AnG/EPI is more widely applicable, with a performance clearly inferior to EPI + new model (q2 0.85 vs 0.91, se 0.71 vs 0.56). Any application of LK clearly fails. The accuracy is further affected by the choice of the water solubility method, with the best results achieved with EPI, followed by HOU and MG. The excellent predictive capability of the models of Cabani et al. (q2 ) 0.99) and Nirmalakhandan et al. (q2 ) 0.97) is again due to their severely restricted application range. For the respective subsets, the new model achieves similar performances (q2 ) 0.99, se ) 0.16, and q2 ) 0.97, se ) 0.19), indicating also that these subsets include many of the structurally simpler compounds as compared to the full data set.

Supporting Information Available Complete list of the training set including compounds, CAS registration numbers, validated experimental desolvation enthalpies ∆Hø, and predicted values. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited (1) Mackay, D. Multimedia Environmental Models; Lewis Publishers: Chelsea, MI, 1991. (2) Mackay, D. Finding fugacity feasible, fruitful, and fun. Environ. Toxicol. Chem. 2004, 23, 2282-2289. (3) Ku ¨ hne, R.; Breitkopf, C.; Schu ¨u ¨ rmann, G. Error propagation in fugacity level-III models in the case of uncertain physicochemical compound properties. Environ. Toxicol. Chem. 1997, 16, 2067-2069. (4) Beyer, A.; Wania, F.; Gouin, T.; Mackay, D.; Matthies, M. Temperature dependence of the characteristic travel distance. Environ. Sci. Technol. 2003, 37, 766-771. (5) Dearden, J. C.; Schu ¨u ¨rmann, G. Quantitative structure-property relationships for predicting Henry’s law constant from molecular structure. Environ. Toxicol. Chem. 2003, 22, 1755-1770. (6) Cabani, S.; Gianni, P.; Mollica, V.; Lepori, L. Thermodynamic study of the partitioning of organic compounds between water and octan-1-ol. J. Solution Chem. 1981, 10, 563-595. (7) Nirmalakhandan, N.; Brennan, R. A.; Speece, R. E. Predicting Henry’s Law constant and the effect of temperature on Henry’s Law constant. Water Res. 1997, 31 (6), 1471-1481. (8) Ku ¨ hne, R.; Ebert, R.-U.; Kleint, F.; Schu ¨u ¨ rmann, G. Estimation of Henry’s law constants at varying temperatures. In ECOINFORMA’97 Information and Communication in Environmental and Health Issues, Proceedings of ECO-INFORMA’97, Neuherberg, Germany, Oct 6-9, 1997; Alef, K., Brandt, J., Fiedler, H., Hauthal, W., Hutzinger, O., Mackay, D., Matthies, M., Morgan, K., Newland, L., Robitaille, H., Schlummer, M.,

(9) (10) (11)

(12)

(13)

(14)

(15)

(16) (17) (18)

(19) (20) (21)

(22) (23)

Schu ¨u ¨ rmann, G., Voigt, K., Eds.; Eco-Informa Press: Bayreuth, Germany, 1997; Vol. 12; pp 464-469. U. S. Environmental Protection Agency. EPI Suite, version 3.12; Environmental Protection Agency: Washington, DC, 2004; http://www.epa.gov/oppt/exposure/docs/episuitedl.htm. Meylan, W. M.; Howard, P. H. Bond contribution method for estimating Henry’s law constants. Environ. Toxicol. Chem. 1991, 10, 1283-1293. Staudinger, J.; Roberts, P. V. A critical compilation of Henry’s law constant temperature dependence relations for organic compounds in dilute aqueous solutions. Chemosphere 2001, 44, 561-576. Schu ¨u ¨ rmann, G.; Ku ¨ hne, R.; Kleint, F.; Ebert, R.-U.; Rothenbacher, C.; Herth, P. A software system for automatic chemical property estimation from molecular structure. In Quantitative Structure Activity Relationships in Environmental Sciences, VII, Proceedings of QSAR ‘96, Elsinore, Denmark, June 24-28, 1996; Chen, F., Schu ¨u ¨ rmann, G., Eds.; SETAC Press: Pensacola, FL, 1997; pp 93-114. Ku ¨ hne, R.; Kleint, F.; Ebert, R.-U.; Schu ¨u ¨ rmann, G. Calculation of compound properties using experimental data from sufficiently similar chemicals. In Software Development in Chemistry 10, Proceedings of the 10th CIC Workshop, Nov 19-22, 1995, Hochfilzen, Austria; Gasteiger, J., Ed.; Gesellschaft Deutscher Chemiker: Frankfurt, Germany, 1996; pp 125-134. Viswanadhan, V. N.; Ghose, A. K.; Singh, U. C.; Wendoloski, J. J. Prediction of solvation free energies of small organic molecules. Additive-constitutive models based on molecular fingerprints and atomic constants. J. Chem. Inf. Comput. Sci. 1999, 39, 405412. Ku ¨ hne, R.; Ebert, R.-U.; Schu ¨u ¨ rmann, G. Estimation of vapour pressures for hydrocarbons and halogenated hydrocarbons from chemical structure by a neural network. Chemosphere 1997, 34, 671-686. Lee, B. I.; Kesler, M. G. A generalized thermodynamic corelation based on three-parameter corresponding states. AIChE J. 1975, 21, 510-527. Joback, K. G.; Reid, R. C. Estimation of pure-component properties from group-contributions. Chem. Eng. Commun. 1987, 57, 233-243. Grain, C. F. Vapor pressure. In Handbook of Chemical Property Estimation Methods; Lyman, W. J., Reehl, W. F., Rosenblatt, D. H., Eds.; American Chemical Society: Washington, DC, 1990; pp 14-1-14-20. Stein, S. E.; Brown, R. L. Estimation of normal boiling points from group contribution J. Chem. Inf. Comput. Sci. 1994, 34, 581-587. Marrero, J.; Gani, R. Group-contribution-based estimation of octanol/water partition coefficient and aqueous solubility. Ind. Eng. Chem. Res. 2002, 41, 6623-6633. Hou, T. J.; Xia, K.; Zhang, W.; Xu, X. J. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. J. Chem. Inf. Comput. Sci. 2004, 44, 266275. Marrero, J.; Gani, R. Group-contribution based estimation of pure component properties. Fluid Phase Equilib. 2001, 183184, 183-208. Daylight Chemical Information Systems, Inc.; http://www.daylight.com/dayhtml_tutorials/languages/smarts/.

Received for review March 17, 2005. Revised manuscript received June 20, 2005. Accepted July 7, 2005. ES050527H

VOL. 39, NO. 17, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

6711