Article pubs.acs.org/jced

Predicting Temperature-Dependent Aqueous Henry’s Law Constants Using Group Contribution Methods

Sarah A. Brockbank,* Neil F. Giles, Richard L. Rowley, and Wade Vincent Wilding

Department of Chemical Engineering, Brigham Young University, 350 Clyde Building, Provo, Utah 84602, United States

Supporting Information

ABSTRACT: A first-order temperature-dependent group contribution method was developed to predict Henry’s law constants of hydrocarbons, alcohols, ketones, and formates in which none of the functional groups are attached directly to a benzene ring. Efforts to expand this method to include ester and ether groups were unsuccessful. Second-order groups were developed at a reference condition of 298.15 K and 100 kPa. A second-order temperature-dependent group contribution method was then developed for hydrocarbons, ketones, esters, ethers, and alcohols. These methods were compared to existing literature prediction methods.



INTRODUCTION

Legislation throughout the world requires health, safety, and environmental information of commercial chemicals in order to manage and predict risks. This includes the Toxic Substances Control Act (TSCA) in the United States and the Registration, Evaluation and Authorization of Chemicals (REACH) in the European Union.1,2 It is estimated that experimental values are available for less than 1% of the approximately 84 000 compounds included in TSCA and the 100 000 compounds included in REACH.3 This means that environmental regulations rely heavily on prediction methods. A property used in risk assessments is the aqueous Henry’s law constant (kH).4,5 The definition of kH is given by

$$ k_H \equiv \lim_{x_i \to 0} \frac{\hat{f}_i^{L}}{x_i} \tag{1} $$

where $\hat{f}_i^{L}$ is the partial fugacity of compound i in solution, and xi is the mole fraction in the liquid phase.6 If a solution is dilute and ideal gas behavior is assumed, eq 1 simplifies to

$$ y_i P = x_i k_H \tag{2} $$

where yi is the mole fraction in the vapor phase and P is the total pressure. Typically kH increases with temperature, reaches a maximum between 373 and 473 K, and then decreases.6,7 The temperature-dependence is often modeled using an equation of the form

$$ \ln(k_H) = A + \frac{B}{T} + C \ln(T) \tag{3} $$

A variation of eq 3 is given by

$$ \ln\left(\frac{k_H}{k_{H,ref}}\right) = \left(\sum_i B_i n_i\right)\left(\frac{1}{T_{ref}} - \frac{1}{T}\right) + \left(\sum_i C_i n_i\right)\ln\left(\frac{T}{T_{ref}}\right) \tag{4} $$

where Bi and Ci are group contribution values for fragment i, ni is the number of times a group occurs, T is temperature, and superscript “ref” refers to a reference value.

Special Issue: In Honor of Grant Wilson
Received: August 24, 2013 Accepted: November 25, 2013 Published: December 5, 2013
© 2013 American Chemical Society

COMPUTATIONAL METHODS

Existing Prediction Methods. Quantitative structure−property relationships (QSPRs) are based on structural descriptors. Nirmalakhandan et al.8 initially developed a QSPR to predict kH values at 298.15 K for several compounds covering multiple chemical families. The method was later expanded to model temperature-dependence, but the model only applies to a small number of compounds over a limited temperature range.9 Closely related to QSPRs are linear free energy relationships (LFERs). LFERs use Abraham descriptors that originate from experimental data which are not always available for every compound.10 Two temperature-dependent LFERs are the methods of Goss11 and Abraham et al.12 Group contribution methods parse the chemical’s structural formula into functional groups and fragments, and the property of a compound is estimated by summing together the contributions of each segment.13 First-order methods use groups that do not take into account interactions with neighboring compounds. Second-order methods are more complex because neighboring atoms are included in the group definitions. Bond contribution methods work in a similar fashion by adding up the contributions from each bond in the molecule. Sedlbauer et al.7 developed a group contribution method that applies to hydrocarbons up to temperatures and pressures of approximately 570 K and 100 MPa, respectively. Parameters for additional compounds were later added to the method.14,15 This method requires an equation of state for water making it more


dx.doi.org/10.1021/je400770a | J. Chem. Eng. Data 2014, 59, 1052−1061

Journal of Chemical & Engineering Data


Table 1. Temperature-Dependent kH Prediction Method Comparison for Select Compounds

chemical family | Plyasunov and Shock AAD %a | nCb | Kühne AAD %a | nCb | Lau GCM AAD %a | nCb | Lau BCM AAD %a | nCb | Sedlbauer AAD %a | nCb
overall | 139.8 | 157 | 40.4 | 157 | 16.9 | 98 | 13.8 | 69 | 30.1 | 46
acetates | 71.8 | 14 | 15.2 | 14 | - | - | - | - | - | -
aliphatic ethers | 176.8 | 11 | 18.6 | 11 | - | - | - | - | - | -
dimethylalkanes | 87.4 | 6 | 113.8 | 6 | 12.7 | 6 | 12.6 | 6 | 65.7 | 6
formates | 84.3 | 7 | 14.9 | 7 | - | - | - | - | - | -
ketones | 91.9 | 24 | 34.6 | 24 | 39.5 | 21 | 23.6 | 22 | - | -
methylalkanes | 43.9 | 8 | 18.3 | 8 | 3.9 | 8 | 4.0 | 8 | 20.8 | 8
n-alcohols | 62.9 | 10 | 29.7 | 10 | 14.5 | 10 | - | - | - | -
n-alkanes | 64.3 | 14 | 102.6 | 14 | 11.9 | 14 | 12.2 | 14 | 32.1 | 14
n-alkylbenzenes | 139.2 | 8 | 18.5 | 8 | 7.8 | 8 | 7.7 | 8 | 19.0 | 8
other aliphatic alcohols | 87.9 | 21 | 20.5 | 21 | 11.5 | 21 | - | - | - | -
other alkanes | 82.0 | 2 | 14.8 | 2 | 11.2 | 2 | 2.7 | 2 | 24.2 | 2
other alkylbenzenes | 949.3 | 9 | 145.0 | 9 | 10.5 | 8 | 10.1 | 9 | 22.0 | 8
other ethers/diethers | 92.3 | 5 | 10.9 | 5 | - | - | - | - | - | -
other polyfunctional C, H, O | 97.9 | 2 | 10.6 | 2 | - | - | - | - | - | -
other saturated aliphatic esters | 84.0 | 3 | 31.0 | 3 | - | - | - | - | - | -
propionates and butyrates | 93.6 | 13 | 26.4 | 13 | - | - | - | - | - | -

a Average absolute deviation. b Number of applicable compounds.

difficult to use than other methods. Plyasunov and Shock16 also developed a group contribution method that predicts values from 273 K to the critical temperature of water (647.1 K). This method uses water vapor pressure, and liquid and vapor water densities obtained from the IAPWS-95 equation of state.17 The method of Kühne et al.18 is a temperature-dependent group contribution method that requires a reference value. This can be an experimental value or a value predicted using a different method. This method only predicts coefficient “B” in eq 4. Lau et al.19 developed both bond and group contribution methods. From a statistical standpoint, these methods are equivalent in their prediction capabilities. As with the method of Kühne et al.,18 a reference value is required that can come from experimental data or a different prediction method. This method predicts coefficients “B” and “C” in eq 4 making it applicable over a broader temperature range than the Kühne et al.18 method. However, it does not apply to as many chemical families. The group- and bond-contribution methods were compared in this study using data for 157 compounds. The reference point used in the Lau et al.19 and Kühne et al.18 methods is the DIPPR 801E recommended value at 298.15 K determined from experimental data.20 Some of these methods apply to compounds outside of the chemical families included in the comparison. Only the compounds included in later method development were included to make the comparisons more meaningful. The number of applicable compounds (nC) and the average absolute deviations (AAD) for each chemical family and method are summarized in Table 1. The AAD is given by

$$ \mathrm{AAD} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| k_{H}^{mod,i} - k_{H}^{exp,i} \right|}{k_{H}^{exp,i}} \tag{5} $$

The AAD is calculated on a normal scale and not a log scale to provide better insight into the method performance. The methods of Plyasunov and Shock16 and Kühne et al.18 apply to all of the compounds included in the comparison, but the method of Kühne et al.18 has a smaller overall AAD. The Lau et al.19 methods have comparable overall performances, but the group-contribution method applies to more of the compounds included in the comparison. The Lau et al.19 methods are better than the Kühne et al.18 method, and this can be attributed to the additional temperature-dependent coefficient. The Sedlbauer et al.7,14,15 method applies to the smallest group of compounds. It does not perform as well as the Lau et al.19 methods. However, the Lau et al.19 methods use an experimental reference value whereas the Sedlbauer et al.7,14,15 method is completely predictive.

Prediction Method Development. While the methods of Kühne et al.18 and Lau et al.19 have reasonable prediction capabilities, they both require a reference value that comes from a separate source (either an experimental value or a predicted value from a separate prediction method). Prediction methods that can be used to calculate a reference value are not available that use the same group definitions as Kühne et al.18 or Lau et al.19 It was decided that a prediction method should be developed that uses the same group definitions for all parameters including the reference value. This simplifies the calculations because each compound only has to be parsed once. Initially efforts were made to develop a group contribution method that predicted coefficients “A”, “B”, and “C” of eq 4 simultaneously. This did not work well due to the high number of parameters required by the model. Instead, it was decided to use the method of Plyasunov et al.21 for the reference values, and then develop a model using the same group definitions to predict coefficients “B” and “C” in eq 4. There are many methods that predict kH values at 298.15 K.22−24 The Plyasunov et al.21 method was chosen as the reference method due to the simplicity of the model and the reliable data used to develop the model. Plyasunov et al.21 did a thorough literature review of kH and derivative property data for hydrocarbons. They provided recommended values for each compound and developed their prediction methods based on these values.

Initially Plyasunov et al.21 developed a first-order group contribution prediction method for hydrocarbons and alcohols. Additional functional group contribution terms were added to their method in later publications.25−28 In these later publications, second-order group contribution parameters were also developed. However, second-order hydrocarbon groups were not defined for all of the hydrocarbons included in the


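Table 2. Summary of Performance for First-Order Group Contribution Temperature-Dependent Method

group | set | no. compounds | AADa | R2
hydrocarbons | training | 42 | 14 % | 0.994
hydrocarbons | validation | 6 | 13 % | 0.993
hydrocarbons | 10-fold cross-validation, learning (mean) | - | 14 % | 0.994
hydrocarbons | 10-fold cross-validation, prediction (mean) | - | 27 % | 0.951
alcohols, ketones, and formates | training | 54 | 24 % | 0.920
alcohols, ketones, and formates | validation | 7 | 33 % | 0.969
alcohols, ketones, and formates | 10-fold cross-validation, learning (mean) | - | 24 % | 0.921
alcohols, ketones, and formates | 10-fold cross-validation, prediction (mean) | - | 29 % | 0.934

a Average absolute deviation.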
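The splitting behind the statistics in Table 2 can be illustrated with a short sketch. It assumes a purely random holdout and uses hypothetical compound labels; the function names and the seed are illustrative and not from this study, which in some cases chose validation compounds by chemical class.

```python
import random

def split_for_validation(compounds, n_validation, k=10, seed=0):
    """Hold out a validation set, then divide the rest into k folds."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(compounds)
    rng.shuffle(shuffled)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    # Deal the training compounds into k folds of nearly equal size.
    folds = [training[i::k] for i in range(k)]
    return training, validation, folds

def cross_validation_rounds(folds):
    """Yield (learning set, prediction set) so each fold predicts once."""
    for i, prediction in enumerate(folds):
        learning = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield learning, prediction

# Hypothetical labels standing in for the 48 hydrocarbons.
compounds = [f"compound_{i}" for i in range(48)]
training, validation, folds = split_for_validation(compounds, n_validation=6)
rounds = list(cross_validation_rounds(folds))
```

A preliminary model would be fitted to each learning set and scored on the matching prediction set, with the ten sets of statistics averaged afterward.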

The first two options were not directly used due to the different temperature ranges of available data. The recommended coefficients in the DIPPR 801E database do not refer to the same temperature ranges. In addition, a second temperature-dependent coefficient is not statistically significant for some compounds due to the limited data. The data were not used directly due to the extreme distribution of the number of points per compound in addition to the experimental uncertainty. For example, benzene has 341 experimental points with a range of experimental uncertainties (some of the data are more reliable), whereas 1,2,3,4-tetramethylbenzene has three experimental points. A variation of these options was determined to be the best method. During the data analysis process, temperature-dependent regressions were obtained based on the experimental data. Uncertainties were assigned to each regression based on 95% confidence intervals. For prediction method development, smoothed values were calculated from these regressions at every unique temperature found within the data of a given compound. For example, if data are available for a compound at temperatures of 275 K, 298 K, 298 K, and 330 K, values calculated from the regression at 275 K, 298 K, and 330 K were included in the method development. The uncertainties assigned to the regression were then assigned to these smoothed points. Compounds with more data will have smaller uncertainties and will also inherently have more weight due to the existence of more unique temperatures. A weighted least-squares procedure was used to determine the group contribution values. The objective function, O, to be minimized is defined by

original first-order method. In this study, both the first- and second-order methods were expanded. The method of Plyasunov et al.21 uses the relationship between kH and the Gibbs energy of hydration, ΔG∞hyd. Plyasunov et al.21 define the standard state in terms of molality (standard molality of 1 mol/1000 g at a Pref of 100 kPa), so a unit conversion must be used with the definition of kH in this study,

$$ RT \ln\left(\frac{k_H}{P_{ref} N_W}\right) = \Delta G^{\infty}_{hyd} \tag{6} $$

where NW = (1000 g/1 mol)/(18.0153 g/mol) ≈ 55.5084 (mol/mol). The group contribution values determined by Plyasunov et al.21 are given in terms of ΔG∞hyd. For the first- and second-order methods of this study, values for kH,ref are calculated using eq 6. Using a group contribution scheme, any property (Y) can be determined by

$$ Y = Y_0 + \sum_i n_i Y_i \tag{7} $$

where ni is the number of times group i appears in the compound, Yi is the contribution of the group to the overall property, and Y0 is a constant. Plyasunov et al.21 defined a Y0 value of 7.95 kJ/mol for ΔG∞hyd based on theoretical models (the contribution of a material point). The Y0 values for B and C are 0 for the first- and second-order methods of this study.

Data Selection. The Design Institute for Physical Properties (DIPPR) 80129 database contains evaluated thermophysical property data for industrially important compounds. The DIPPR 801E database contains kH, aqueous infinite dilution activity coefficient (γi∞) and water solubility (xiaq) data. The structure of the 801E database is compatible with the DIPPR 80129 database and DIPPR Information and Data Evaluation Manager (DIADEM), and data are included for a subset of industrially important compounds found in the DIPPR 80129 database. Thermodynamic relationships between the three properties and chemical family trends were used to evaluate the available literature data and provide recommended values.20 Recommended temperature-dependent coefficients and confidence intervals were determined as part of the analysis. With temperature-dependent models, there are multiple ways that data are used:
(1) Regression coefficients are determined from the data, and the prediction method is then developed from the regression coefficients (A, B, C, etc.) and not directly from data. Examples are the methods of Kühne et al.,18,30 Nagvekar,30 and Hsu.31
(2) Regression coefficients are determined from the data, and values are calculated at equal temperature intervals (such as every 10 K) using the regression. The prediction method is developed from these values. Examples include prediction methods developed by Elbro et al.32 and Harrison.33
(3) The prediction method is developed directly from data.
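The variation adopted in this study, in which regression values are evaluated at each unique measured temperature and weighted by the regression uncertainty, can be sketched as follows. This is a minimal illustration only: the regression coefficients and the uncertainty are hypothetical placeholders, the function names are not from the original work, and the objective follows the weighted least-squares form of eq 8.

```python
import math

def eq3_ln_kh(T, A, B, C):
    """ln(kH) from the regression form of eq 3."""
    return A + B / T + C * math.log(T)

def smoothed_points(temperatures, A, B, C, sigma):
    """One smoothed value per unique temperature, each tagged with sigma."""
    unique_T = sorted(set(temperatures))
    return [(T, eq3_ln_kh(T, A, B, C), sigma) for T in unique_T]

def weighted_objective(ln_k_model, points):
    """Sum of squared, uncertainty-weighted residuals on a natural log scale."""
    return sum(((ln_k_model(T) - ln_k) / sigma) ** 2
               for T, ln_k, sigma in points)

# Hypothetical regression coefficients and uncertainty for one compound.
A, B, C, SIGMA = 150.0, -8000.0, -20.0, 0.1
points = smoothed_points([275.0, 298.0, 298.0, 330.0], A, B, C, SIGMA)

# A model offset by exactly one sigma contributes 1 per smoothed point.
offset_objective = weighted_objective(
    lambda T: eq3_ln_kh(T, A, B, C) + SIGMA, points)
```

The duplicate measurement at 298 K yields a single smoothed point, so a compound gains weight only through additional unique temperatures and a smaller uncertainty.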

$$ O = \sum_i \left[ \frac{\ln k_{H}^{mod,i} - \ln k_{H}^{exp,i}}{\sigma_i} \right]^2 \tag{8} $$

where ln kHmod,i are the natural log values calculated from the model, ln kHexp,i are the natural log experimental values, and σi is the uncertainty (on a natural log scale) of the smoothed value. By using weighted values, smoothed values with smaller uncertainty are given greater weight in the regression. In this study, the weighted least-squares procedure was completed using SAS software.

Validation Techniques. A general procedure was used multiple times throughout method development. Approximately 10 % of the compounds were assigned to a validation set. The remaining compounds (training set) were then randomly divided into 10 subsets to perform a 10-fold cross-validation. In the 10-fold cross-validations, a single subset of the training set was reserved as a prediction set and the remaining nine were used as a learning set. A preliminary model was developed using the learning set and applied to the prediction set. This process was repeated so that each subset was used as the prediction set once. The statistics from the 10 iterations were averaged to provide a better estimation of the overall method accuracy. The entire


each classification for the validation set. This ensured that a diverse set of groups were represented in the validation set. The remaining 42 compounds (training set) were randomly divided into 10 subsets to perform a 10-fold cross-validation. The group contribution parameters were determined using the entire training set, and the prediction method was then applied to the six compounds that were initially set aside in the validation set. Results of the 10-fold cross-validation and overall method performance for the hydrocarbons are summarized in Table 2. The statistics indicate that the method has reasonable predictive capabilities. The results are shown in Figure 1. The compound with the biggest discrepancy in the validation set is 1,4-cyclohexadiene. It is the only cycloalkene with more than one double bond. It randomly ended up in the validation group instead of the training group. The cyclic compounds also have the largest error in the training set. There are fewer data for these compounds and more uncertainty in the available data compared to the alkanes and benzene derivatives.

The method was expanded for aliphatic alcohols, ketones, and formates. Data were found for 61 compounds. Seven compounds were randomly chosen for the validation set. The remaining 54 training set compounds were divided into 10 subgroups. A 10-fold cross-validation was performed as previously described. The group contribution parameters were then determined using the entire training set, and the prediction method was then applied to the seven compounds in the validation set. Results of the 10-fold cross-validation and overall method performance are summarized in Table 2. The results are shown in Figure 2. The two compounds with the largest discrepancies are acetophenone and acetylacetone. Acetophenone is the only aromatic ketone included in the entire data set, and acetylacetone is the only diketone included in the entire data set.
The large errors could be attributed to experimental error and/or the structures. More data from similar compounds would be necessary to determine the exact cause. The statistics given in Table 2 include the results of these two compounds. It is recommended that this method should only be used for alcohols, ketones, and formates that are aliphatic. In addition, correction factors might be necessary for multifunctional compounds. The temperature-dependent coefficients determined in this study are summarized in Table 3 along with the standard errors and two-tailed p-values. The table also includes the reference values developed by Plyasunov et al.,21,25,26,28,34 the number of compounds per group (nC), the total number of temperature-dependent points per group (nG), and the temperature range of the data used for the individual groups. Multiple coefficients have large standard errors and are not statistically significant (p-values > 0.05). In general, the groups corresponding to these coefficients do not have as much data compared to the groups with statistically significant coefficients resulting in higher uncertainty. When Plyasunov et al. initially developed the group contribution value for ketones, they did not include a correction factor.27 A correction factor was added later for tertiary carbons attached to a ketone group.34 The only compound influenced by this correction in the training and validation sets was diisopropyl ketone. The reference value used in method development included the correction factor. However, the correction factor was not included in the temperature-dependent development because it was not needed. Sample calculations are included in the Supporting Information. Efforts to expand the model to include esters and ethers were unsuccessful because the first-order contributions were not always adequate in modeling the behavior of compounds with

training set was then used to develop model parameters. The model was finally applied to the validation set as an additional check on the method performance. The square of the Pearson product-moment correlation coefficient, R2 (the R2 value returned by the Microsoft Excel RSQ function), is given by

$$ R^2 = \frac{\left[\sum_i^n \left(\ln k_{H}^{mod,i} - \ln k_{H}^{mod,avg}\right)\left(\ln k_{H}^{exp,i} - \ln k_{H}^{exp,avg}\right)\right]^2}{\sum_i^n \left(\ln k_{H}^{mod,i} - \ln k_{H}^{mod,avg}\right)^2 \sum_i^n \left(\ln k_{H}^{exp,i} - \ln k_{H}^{exp,avg}\right)^2} \tag{9} $$

where ln kHexp,avg is the average natural log experimental value used in the training set (or learning set during cross-validation) and ln kHmod,avg is the average natural log model value used in the validation set (or prediction set during cross-validation). A reliable method will have R2 values that approach unity and low AADs for both the training and validation sets.
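The two statistics used throughout, the AAD of eq 5 and the R2 of eq 9, can be computed along the following lines. This is a sketch under two assumptions: the AAD is taken as the mean relative deviation expressed in percent, and R2 is taken as the squared Pearson correlation of the natural log values, which is what the Excel RSQ function returns. The function names are illustrative.

```python
import math

def aad_percent(k_mod, k_exp):
    """Average absolute deviation in percent, on a normal (non-log) scale."""
    n = len(k_exp)
    return 100.0 * sum(abs(m - e) / e for m, e in zip(k_mod, k_exp)) / n

def r_squared(k_mod, k_exp):
    """Squared Pearson correlation between ln(model) and ln(experiment)."""
    ln_m = [math.log(v) for v in k_mod]
    ln_e = [math.log(v) for v in k_exp]
    m_avg = sum(ln_m) / len(ln_m)
    e_avg = sum(ln_e) / len(ln_e)
    cov = sum((m - m_avg) * (e - e_avg) for m, e in zip(ln_m, ln_e))
    var_m = sum((m - m_avg) ** 2 for m in ln_m)
    var_e = sum((e - e_avg) ** 2 for e in ln_e)
    return cov ** 2 / (var_m * var_e)
```

Because R2 measures correlation only, a model that is biased by a constant factor still returns a value of unity, which is one reason the AAD is reported alongside it.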



RESULTS AND DISCUSSION First-Order Group Contribution Method. Evaluated temperature-dependent data were available for 48 hydrocarbons.

Figure 1. Plot of experimental versus model values for the first-order temperature-dependent group contribution method for alkanes, alkenes, cycloalkanes, cycloalkenes, and benzene derivatives: +, training set; ■, validation set.

Figure 2. Plot of experimental versus model values for the first-order temperature-dependent group contribution method for alcohols, ketones, and formates: +, training set; ■, validation set; □, acetylacetone values; ○, acetophenone values.

More data are available for compounds at 298.15 K; however, these compounds were not used because they would only test the validity of the Plyasunov et al.21 method and not the temperature-dependent method. Compounds were classified as alkanes, alkenes, cycloalkanes, cycloalkenes, or benzene derivatives, and one or two compounds were randomly chosen from


Table 3. Summary of Parameter Estimates for the First-Order Method

group | ΔG∞hyd,i (kJ/mol)a | Bi value (K) | Bi SEb | Bi p-valuec | Ci value | Ci SEb | Ci p-valuec | training nCd | training nGe | validation nCd | validation nGe | overall temp range (K)
c-C=Cf | −9.47 | 94550 | 16375 | < 0.001 | −326 | 55 | < 0.001 | 3 | 24 | 1 | 12 | 273−318
c-CHf | −1.03 | 335 | 990 | 0.74 | −3.6 | 3.0 | 0.22 | 4 | 78 | 0 | 0 | 273−444
c-CH2f | 0.83 | 3275 | 276 | < 0.001 | −8.26 | 0.82 | < 0.001 | 10 | 681 | 2 | 39 | 273−444
C | −4.51 | 1203 | 2676 | 0.65 | −8.1 | 8.6 | 0.34 | 3 | 19 | 1 | 4 | 273−424
CH | −1.72 | 1987 | 1305 | 0.13 | −8.0 | 4.2 | 0.06 | 6 | 56 | 0 | 0 | 273−423
CH2 | 0.70 | 2962 | 291 | < 0.001 | −8.43 | 0.90 | < 0.001 | 17 | 935 | 1 | 92 | 273−456
CH3 | 3.67 | 3683 | 557 | < 0.001 | −8.73 | 1.76 | < 0.001 | 33 | 1034 | 4 | 170 | 262−568
Carg | −3.85 | −102 | 586 | 0.86 | 0.05 | 1.8 | 0.98 | 13 | 388 | 1 | 90 | 262−568
CHarg | −0.65 | 1849 | 26 | < 0.001 | −4.16 | 0.07 | < 0.001 | 13 | 1940 | 1 | 450 | 262−568
C=C | −10.23 | 9750 | 23970 | 0.68 | −34 | 80 | 0.67 | 3 | 59 | 1 | 6 | 273−361
Hh | 3.91 | −153 | 6566 | 0.98 | 2.3 | 22 | 0.92 | 6 | 246 | 2 | 30 | 273−361
I(C−C)i | −1.01 | 6047 | 4279 | 0.16 | −19.2 | 13.9 | 0.17 | 5 | 46 | 0 | 0 | 273−353
HCOOj | −15.33 | −1650 | 5960 | 0.78 | 14 | 20 | 0.48 | 6 | 62 | 1 | 12 | 272−364
C=Ok | −22.74 | 4520 | 1010 | < 0.001 | −4.0 | 3.1 | 0.20 | 19 | 332 | 3 | 34 | 265−453
OH | −25.4 | 5094 | 1039 | < 0.001 | 0.2 | 3 | 0.96 | 29 | 480 | 3 | 48 | 273−375
Y0 | 7.95 | 0 | - | - | 0 | - | - | - | - | - | - | -

a From Plyasunov et al.25,26,28,34 b Standard error. c Two-tailed p-values. d Number of compounds with group. e Number of groups. f Cyclic (nonaromatic) group. g Aromatic carbon. h Hydrogen bound to alkene group. i Correction for nearest-neighbor interactions of two −CH3 or −CH2− groups attached to the benzene ring or to the cyclic ring for cis-isomers. j Formate. k Ketone.
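As an illustration of how the entries in Table 3 combine, the sketch below evaluates the first-order method for propane, parsed here as two CH3 groups and one CH2 group. The choice of compound, the function names, and the value of the gas constant are assumptions made for this example; the group values are those read from Table 3, the reference value follows eqs 6 and 7, and the temperature dependence follows eq 4.

```python
import math

R = 8.314462618e-3   # gas constant, kJ/(mol K)
T_REF = 298.15       # reference temperature, K
P_REF = 100.0        # reference pressure, kPa
N_W = 55.5084        # mol of water per 1000 g (eq 6)
Y0_DG = 7.95         # Y0 for the Gibbs energy of hydration, kJ/mol

# (dG_hyd,i in kJ/mol, B_i in K, C_i) as read from Table 3.
FIRST_ORDER = {
    "CH3": (3.67, 3683.0, -8.73),
    "CH2": (0.70, 2962.0, -8.43),
}

def kh_ref(counts):
    """kH at 298.15 K in kPa from eq 7 (group sum) and eq 6."""
    dg = Y0_DG + sum(n * FIRST_ORDER[g][0] for g, n in counts.items())
    return P_REF * N_W * math.exp(dg / (R * T_REF))

def kh(counts, T):
    """kH at temperature T (K) in kPa from eq 4; Y0 is 0 for B and C."""
    b = sum(n * FIRST_ORDER[g][1] for g, n in counts.items())
    c = sum(n * FIRST_ORDER[g][2] for g, n in counts.items())
    ln_ratio = b * (1.0 / T_REF - 1.0 / T) + c * math.log(T / T_REF)
    return kh_ref(counts) * math.exp(ln_ratio)

propane = {"CH3": 2, "CH2": 1}  # illustrative parsing, not from the paper
print(kh_ref(propane), kh(propane, 323.15))
```

With a positive summed B and a negative summed C, the predicted kH rises with temperature near 298 K and passes through a maximum at T = −ΣBini/ΣCini, consistent with the behavior described in the Introduction.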

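Table 4. Summary of Hydrocarbon Second-Order Group Contributions for Reference Values at 298.15 K and 0.1 MPa

group | ΔG∞hyd,i (kJ/mol) | SEc | p-valuee | training nCa | training nGb | validation nCa | validation nGb
C−(H)2(C)(Cb)d | 1.02 | 0.31 | 0.007 | 6 | 6 | 0 | 0
C−(H)3(Cb)d | 4.00 | 0.33 | < 0.0001 | 8 | 21 | 1 | 3
Cb−(C)(Cb)2d | −4.53 | 0.36 | < 0.0001 | 15 | 28 | 1 | 3
Cb−(H)(Cb)2d | −0.60 | 0.04 | < 0.0001 | 15 | 62 | 2 | 9
C−(H)3(O) | 3.48 | 0.11 | < 0.0001 | 6 | 7 | 1 | 1
C−(H)3(C)f | 3.72 | - | - | 7 | 8 | 0 | 0
C−(H)2(C)2f | 0.68 | - | - | 5 | 18 | 0 | 0
C−(H)(C)3f | −1.93 | - | - | 1 | 1 | 0 | 0
C−(C)4f | −4.60 | - | - | 0 | 0 | 0 | 0
[CH3−CH2−O−CH2] corrf,g | −1.42 | - | - | - | - | - | -
[O−(CH2)2−O] corrf,g | 2.22 | - | - | - | - | - | -
C−(C)(H)2(O)f | 0.77 | - | - | - | - | - | -
C−(C)2(H)(O) alcoholf | −1.64 | - | - | - | - | - | -
C−(C)2(H)(O) esterf | −1.46 | - | - | - | - | - | -
C−(C)2(H)(O) etherf | −2.82 | - | - | - | - | - | -
C−(C)3(O) alcoholf | −4.58 | - | - | - | - | - | -
C−(C)3(O) esterf | −2.58 | - | - | - | - | - | -
C−(C)3(O) etherf | −7.77 | - | - | - | - | - | -
C−(CO)(C)3f,h | −2.50 | - | - | - | - | - | -
C−(CO)(H)(C)2f,h | −0.88 | - | - | - | - | - | -
C−(CO)(H)2(C)f,h | 1.15 | - | - | - | - | - | -
CO−(C)2f,h | −23.46 | - | - | - | - | - | -
COO−(C)2f,i | −20.53 | - | - | - | - | - | -
HCOO−(C)f,j | −15.43 | - | - | - | - | - | -
O−(C)(H)f | −25.46 | - | - | - | - | - | -
O−(C)2f | −15.52 | - | - | - | - | - | -
Y0f | 7.95 | - | - | - | - | - | -

a nC is the number of compounds with the group. b nG is the number of times the group appeared in the set. c Standard error. d Cb is an aromatic carbon (benzene). e Two-tailed p-value. f Determined by Plyasunov et al.25,26,28,34 g Ether corrections used in addition to group contributions. h CO represents C=O. i Ester. j Formate.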

functional groups.25,26,28,34 Because of these limitations, a different model was pursued.

Second-Order Group Contribution Method. Plyasunov et al.25,26,28,34 developed a second-order group contribution model due to the first-order limitations. A second-order method takes into account the nearest neighbor interactions for particular groups. This requires defining more groups than a first-order group contribution method. The group definitions as outlined by


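Table 5. Performance Summary of the Second-Order Temperature-Dependent Group Contribution Method

group | set | no. compounds | AADa | R2
alkylbenzenes and alkanes | training | 27 | 10 % | 0.998
alkylbenzenes and alkanes | validation | 4 | 14 % | 0.997
alkylbenzenes and alkanes | 10-fold cross-validation, learning (mean) | - | 10 % | 0.998
alkylbenzenes and alkanes | 10-fold cross-validation, prediction (mean) | - | 19 % | 0.950
esters, ethers, ketones, and alcohols | training | 83 | 17 % | 0.987
esters, ethers, ketones, and alcohols | validation | 11 | 19 % | 0.970
esters, ethers, ketones, and alcohols | 10-fold cross-validation, learning (mean) | - | 18 %b | 0.986b
esters, ethers, ketones, and alcohols | 10-fold cross-validation, prediction (mean) | - | 21 %b | 0.982b

a Average absolute deviation. b Only includes results of the eight iterations that were full rank.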

order contributions. Additional hydrocarbon contributions were first established at 298.15 K before developing temperature-dependent contributions.

Reference Value Additions. The values of the four hydrocarbon groups defined by Plyasunov et al.25,26,28,34 were used without modification. Values for four alkylbenzene groups were developed in this study. Accepted kH values at 298.15 K were found for 17 alkylbenzenes. Two of these compounds were randomly chosen for the validation set. The 15 training set compounds were divided into 10 subgroups. A 10-fold cross-validation was performed as previously described. Only three of the four parameters could be determined in one of the iterations due to the learning model not being full rank, meaning that the groups were not linearly independent resulting in an infinite number of least-squares solutions. Of the remaining nine iterations, the relative absolute standard deviation of the group contribution values did not exceed 5.6 %. Overall parameters were determined using the full training set, and the model was applied to the validation set. The AAD of the overall training set was 18 % and the AAD of the validation set was 14 %. An additional ether term was also developed, C−(H)3(O), for a reference value at 298.15 K. A total of seven compounds were found with this group. One of the compounds was randomly set aside for validation. A leave-one-out cross-validation was performed on the remaining six compounds (similar idea as the 10-fold cross-validation except each compound is a subset). The value of the group contribution did not change significantly during cross-validation (standard deviation of 2 %). The AAD of the overall training set (six compounds) was 21 %, and the AAD of the validation set (1 compound) was 1 %.
Table 4 is a summary of the second-order group contribution values for the prediction method at 298.15 K that were determined in this study and by Plyasunov et al.25,26,28,34 The table includes the number of compounds with each group (nC) and the number of times each group appeared in a set (nG). Temperature-Dependent Contributions. A total of 31 (aliphatic and aromatic) hydrocarbons defined by the groups in Table 4 have temperature-dependent data. Four were randomly selected by subfamily (2 alkanes, 2 benzene derivatives) for validation. The training set of 27 compounds was divided into 10 groups for 10-fold cross-validation. Overall parameters were determined using the full training set. Results of the 10-fold cross-validation and overall method performance are summarized in Table 5 and shown in Figure 3. Temperature-dependent parameters were then determined for functional groups using the hydrocarbon groups. For the model to be full rank, either structural groups had to be redefined or group contribution values had to be defined. There are three possible carbon groups that can be attached to the O−(C)(H) group (hydroxyl group). The hydroxyl group will always be found in a 1:1 ratio with these three groups, so the hydroxyl group cannot be considered an independent variable. The groups

Figure 3. Plot of experimental versus model values for the second-order temperature-dependent group contribution method for alkylbenzenes and alkanes: +, training set, ■, validation set.

Table 6. Second-Order Alcohol Groups

Plyasunov et al.25,26,28,34 group | “new” group
C−(C)(H)2(O) | C[OH]−(C)(H)2a
C−(C)2(H)(O) alcohol | C[OH]−(C)2(H)
C−(C)3(O) alcohol | C[OH]−(C)3
O−(C)(H) | NA

a Only for alcohols. Group treated as original definition for other compounds.

Table 7. Second-Order Method Correction Factor D

chemical families | value | SE
hydrocarbons | 1 | -
alcohols, ketones, esters, ethers | 0.7758 | 0.0034

Figure 4. Plot of experimental versus model values for the second-order temperature-dependent group contribution method for esters, ethers, ketones, and alcohols: +, training set; ■, validation set; ○, 2-butoxyethanol values.

Domalski et al.35 were used for parsing the compounds. Plyasunov et al.25,26,28,34 only define four hydrocarbon second-


Table 8. Summary of Second-Order Temperature-Dependent Model Values

Hydrocarbon Regression

| group | Bi value/K | Bi SE^a | Bi p-value^b | Ci value | Ci SE^a | Ci p-value^b | training nC^c | training nG^d | validation nC^c | validation nG^d | temp range (K) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C−(C)4 | −5451 | 2205 | 0.01 | 13.2 | 7.1 | 0.06 | 4 | 23 | 0 | 0 | 273−424 |
| C−(H)(C)3 | −4267 | 1166 | 0.0003 | 11.8 | 3.8 | 0.002 | 5 | 45 | 1 | 11 | 273−423 |
| C−(H)2(C)2 | 868 | 292 | 0.003 | −1.89 | 0.90 | 0.04 | 13 | 830 | 3 | 56 | 273−456 |
| C−(H)3(C) | 6820 | 542 | < 0.0001 | −18.6 | 1.7 | < 0.0001 | 19 | 645 | 3 | 90 | 273−456 |
| C−(H)2(C)(Cb)^e | 19962 | 33762 | 0.55 | −65 | 111 | 0.56 | 4 | 115 | 1 | 11 | 273−373 |
| C−(H)3(Cb)^e | 16423 | 33753 | 0.63 | −52 | 111 | 0.64 | 7 | 303 | 1 | 38 | 262−568 |
| Cb−(C)(Cb)2^e | −13358 | 33753 | 0.69 | 45 | 111 | 0.69 | 12 | 423 | 2 | 49 | 262−568 |
| Cb−(H)(Cb)2^e | 1870 | 20 | < 0.0001 | −4.28 | 0.06 | < 0.0001 | 13 | 2229 | 2 | 131 | 262−568 |
| Y0 | 0 | | | 0 | | | | | | | |

Functional Group Regression

| group | Bi value/K | Bi SE^a | Bi p-value^b | Ci value | Ci SE^a | Ci p-value^b | training nC^c | training nG^d | validation nC^c | validation nG^d | temp range (K) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [CH3−CH2−O−CH2] corr^f | 1106 | 5853 | 0.85 | −6 | 19 | 0.76 | 2 | 35 | 0 | 0 | 269−364 |
| [O−(CH2)2−O] corr^f | −26034 | 17351 | 0.14 | 72 | 56 | 0.20 | 1 | 11 | 0 | 0 | 273−373 |
| C−(C)(H)2(O) | 22467 | 1574 | < 0.0001 | −49.8 | 4.9 | < 0.0001 | 39 | 608 | 7 | 103 | 269−375 |
| C−(C)2(H)(O) alcohol | 19591 | 994 | < 0.0001 | −39.3 | 3.1 | < 0.0001 | 11 | 174 | 2 | 20 | 273−373 |
| C−(C)2(H)(O) ester | 18533 | 6657 | 0.006 | −38 | 21 | 0.07 | 2 | 11 | 0 | 0 | 273−364 |
| C−(C)2(H)(O) ether | 12226 | 5167 | 0.02 | −15 | 17 | 0.37 | 1 | 40 | 0 | 0 | 273−334 |
| C−(C)3(O) alcohol | 18816 | 2525 | < 0.0001 | −36.8 | 7.9 | < 0.0001 | 3 | 42 | 0 | 0 | 273−371 |
| C−(C)3(O) ester | 34714 | 6219 | < 0.0001 | −90 | 20 | < 0.0001 | 2 | 21 | 0 | 0 | 273−354 |
| C−(C)3(O) ether | 12359 | 9442 | 0.19 | −15 | 30 | 0.62 | 3 | 72 | 1 | 9 | 273−363 |
| C−(CO)(C)3^g | −7084 | 15452 | 0.65 | 25 | 49 | 0.61 | 1 | 10 | 0 | 0 | 273−364 |
| C−(CO)(H)(C)2^g | −14006 | 4324 | 0.001 | 46 | 14 | 0.0007 | 3 | 28 | 0 | 0 | 273−364 |
| C−(CO)(H)2(C)^g | −9550 | 2434 | < 0.0001 | 32.2 | 7.6 | < 0.0001 | 12 | 128 | 4 | 42 | 273−364 |
| CO−(C)2^g | 17819 | 805 | < 0.0001 | −37.8 | 2.5 | < 0.0001 | 19 | 306 | 1 | 13 | 265−453 |
| COO−(C)2^h | −5096 | 2902 | 0.08 | 13 | 9 | 0.17 | 21 | 259 | 4 | 42 | 273−365 |
| HCOO−(C)^i | 1855 | 5873 | 0.75 | −14 | 19 | 0.47 | 6 | 50 | 1 | 14 | 272−364 |
| O−(C)(H) | 0 | | | 0 | | | 27 | 425 | 4 | 54 | 273−375 |
| O−(C)2 | −29379 | 5765 | < 0.0001 | 68.9 | 18.7 | 0.0002 | 9 | 173 | 2 | 14 | 269−373 |
| C−(H)3(O) | 28004 | 7327 | 0.0001 | −69.9 | 23.3 | 0.003 | 6 | 112 | 1 | 6 | 273−373 |

^a Standard error. ^b Two-tailed p-value. ^c nC is the number of compounds with the group. ^d nG is the number of times the group appeared in the set. ^e Cb is an aromatic carbon (benzene). ^f Ether corrections used in addition to group contributions. ^g CO represents C=O. ^h Ester. ^i Formate.
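As a reading aid for the SE and p-value columns of Table 8, the following minimal sketch recomputes an approximate two-tailed p-value from a coefficient and its standard error. It assumes a Wald-type test (coefficient divided by SE) and replaces Student's t-distribution with a normal approximation; the paper does not state which test or degrees of freedom were used, and the helper name `two_tailed_p` is hypothetical.

```python
import math


def two_tailed_p(value, se):
    """Approximate two-tailed p-value for value/SE (normal approximation)."""
    z = abs(value) / se
    # For a standard normal variable, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# (Bi, SE) pairs copied from Table 8.
p_well_determined = two_tailed_p(-5451, 2205)    # C-(C)4
p_poorly_determined = two_tailed_p(19962, 33762)  # C-(H)2(C)(Cb)

print(f"C-(C)4:        p ~ {p_well_determined:.3f}")
print(f"C-(H)2(C)(Cb): p ~ {p_poorly_determined:.3f}")
```

Under this approximation, a group whose standard error exceeds its coefficient, such as C−(H)2(C)(Cb), gives a large p-value, whereas C−(C)4 gives a small one.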

could be redefined to include the hydroxyl group, which would cause the model to be full rank. The groups as defined by Plyasunov et al.25,26,28,34 and possible new group definitions are given in Table 6. However, as originally defined, the C−(C)(H)2(O) group applies to ethers and esters in addition to alcohols, so an additional group would have to be added for ethers and esters if these new group definitions were used. To still use the original group definitions, the hydroxyl contributions were defined as 0. This is essentially treating the carbon groups as if they were the new groups given in Table 6 without formally redefining them. The C−(C)(H)2(O) group is then still applicable for other chemical families.

If a group was only represented in 1 or 2 compounds, the compounds were added to the training set. The remaining compounds were divided into subsets (COO, CH3O, COOH, CO, OH, and O), and compounds were randomly chosen from each subset to ensure multiple groups were represented in the validation set. The training and validation sets contained data for 83 and 11 compounds, respectively.

During model development, it was discovered that the model showed a predictable bias. A correction factor was added to the model to improve the results. The modified model with the correction factor, D, is given by

$$
\ln\left(\frac{k_H}{k_{H,\mathrm{ref}}}\right) = D\left[\left(\sum_i B_i n_i\right)\left(\frac{1}{T_{\mathrm{ref}}} - \frac{1}{T}\right) + \left(\sum_i C_i n_i\right)\ln\left(\frac{T}{T_{\mathrm{ref}}}\right)\right] \qquad (10)
$$
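To make eq 10 concrete, the sketch below evaluates the ratio kH/kH,ref for a hydrocarbon using two second-order group values from Table 8. Several things here are assumptions rather than part of the paper: the function name `kh_ratio`, the parsing of n-hexane into two C−(H)3(C) and four C−(H)2(C)2 groups, and the choice D = 1 for a hydrocarbon. The code returns only the ratio; the reference value kH,ref (from eq 6) is not computed.

```python
import math

T_REF = 298.15  # K, reference temperature

# (Bi in K, Ci) for two hydrocarbon groups, copied from Table 8.
GROUPS = {
    "C-(H)3(C)": (6820.0, -18.6),
    "C-(H)2(C)2": (868.0, -1.89),
}


def kh_ratio(counts, T, D=1.0, T_ref=T_REF):
    """Return kH(T)/kH,ref from eq 10 for a dict {group name: n_i}.

    D = 1.0 is an assumed default for hydrocarbons; for other families
    the value would have to be taken from Table 7 of the paper.
    """
    B = sum(GROUPS[g][0] * n for g, n in counts.items())
    C = sum(GROUPS[g][1] * n for g, n in counts.items())
    ln_ratio = D * (B * (1.0 / T_ref - 1.0 / T) + C * math.log(T / T_ref))
    return math.exp(ln_ratio)


# Illustrative parsing of n-hexane: 2 x CH3 groups and 4 x CH2 groups.
n_hexane = {"C-(H)3(C)": 2, "C-(H)2(C)2": 4}

for T in (298.15, 323.15, 348.15):
    print(f"T = {T:.2f} K  kH/kH,ref = {kh_ratio(n_hexane, T):.3f}")
```

By construction the ratio equals 1 at T = Tref, so multiplying the result by an independently obtained kH,ref gives kH at the temperature of interest.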

The correction factor is only necessary for alcohols, esters, ethers, and ketones. The correction factors are listed in Table 7. The training set was divided into 10 subgroups for 10-fold cross-validation. With three of the folds, the learning sets were not full rank, so parameters could not be determined. The results of the remaining seven iterations were averaged for the cross-validation statistics. Overall parameters were determined using the full training set. The model was then applied to the validation set. Results of the 10-fold cross-validation and overall method performance are summarized in Table 5. Results are shown in Figure 4.

The most extreme outliers in Figure 4 are data points for 2-butoxyethanol, which is a multifunctional compound. Plyasunov et al.23 found it necessary to provide a correction for diethers ([O−(CH2)2−O]). 2-Butoxyethanol has a similar structure except that one ether group is replaced by a hydroxyl group. A correction factor was not added in this study due to the limited amount of data. However, the data from this compound suggest that correction factors may be necessary for multifunctional compounds. Additional temperature-dependent data are needed for other compounds with ether and hydroxyl groups in order to more reliably develop a correction factor for compounds that contain both of these groups.

The second-order group contribution method works slightly better for hydrocarbons than for compounds with functional groups. This is not surprising because of the increased complexity of the compound structures. In addition, there are typically fewer temperature-dependent data available for the compounds with functional groups, and the data generally had higher experimental uncertainties. All of the temperature-dependent groups developed in the two regressions (hydrocarbon and functional groups) and the coefficient standard errors (SE) and two-tailed p-values are summarized in Table 8. Only the first four hydrocarbon groups were used in the functional group regression. The table also includes the number of compounds in which a group appeared (nC) and the number of times a group appeared in all of the temperature-dependent data (nG) in the training and validation sets for each regression. The minimum and maximum temperatures of the data corresponding to each group found in the training and validation sets are also included. The groups with the smallest amount of data typically have the largest standard errors and p-values. The second-order correction factor is summarized in Table 7. Sample calculations and a list of compounds used in the training and validation sets are included in the Supporting Information.

Method Application Summary. The general procedure required to use the first- and second-order bond contribution methods of this study is described in the following steps:
(1) Calculate kH,ref using eq 6 and the ΔG∞hyd contributions listed in Table 3 for the first-order method and Table 4 for the second-order method.
(2) Calculate coefficients B and C using the group contribution values listed in Table 3 for the first-order method and Table 8 for the second-order method.
(3) Determine coefficient D using Table 7 if using the second-order method.
(4) Determine the final equation by combining the coefficients and reference value using eq 4 for the first-order method and eq 10 for the second-order method.

Table 9. Comparison of Methods from This Study with Existing Methods

| method | Kühne AAD %^a | Kühne nC^b | Lau GCM AAD %^a | Lau GCM nC^b | Lau BCM AAD %^a | Lau BCM nC^b | this study 1st order AAD %^a | this study 1st order nC^b | this study 2nd order AAD %^a | this study 2nd order nC^b |
|---|---|---|---|---|---|---|---|---|---|---|
| overall | 49.9 | 144 | 18.6 | 101 | 11.6 | 50 | 24.2 | 108 | 24.5 | 125 |
| 1-alkenes | 12.2 | 2 | 21.0 | 2 | | | 7.4 | 2 | | |
| acetates | 14.9 | 10 | | | | | | | 26.3 | 10 |
| aliphatic ethers | 16.9 | 9 | | | | | | | 25.7 | 9 |
| alkylcyclohexanes | 258.4 | 4 | 11.1 | 4 | | | 48.0 | 4 | | |
| cycloaliphatic alcohols | 9.5 | 1 | 7.0 | 1 | | | 37.7 | 1 | | |
| cycloalkanes | 21.8 | 4 | 14.6 | 4 | | | 52.7 | 4 | | |
| cycloalkenes | 8.1 | 3 | 14.7 | 3 | | | 40.6 | 3 | | |
| dialkenes | 10.8 | 1 | 7.4 | 1 | | | 65.6 | 1 | | |
| dimethylalkanes | 159.3 | 4 | 9.8 | 4 | 11.2 | 4 | 23.9 | 4 | 22.8 | 4 |
| formates | 14.9 | 7 | | | | | 20.9 | 7 | 33.0 | 7 |
| ketones | 35.8 | 22 | 38.1 | 22 | 11.7 | 20 | 30.2 | 22 | 28.3 | 21 |
| methylalkanes | 67.4 | 2 | 13.8 | 2 | 13.3 | 2 | 14.1 | 2 | 14.8 | 2 |
| methylalkenes | 6.3 | 2 | 45.1 | 2 | | | 7.5 | 2 | | |
| n-alcohols | 29.7 | 10 | 14.5 | 10 | | | 21.5 | 10 | 17.6 | 10 |
| n-alkanes | 172.9 | 8 | 14.1 | 8 | 14.5 | 8 | 16.1 | 8 | 15.6 | 8 |
| n-alkylbenzenes | 21.2 | 7 | 8.9 | 7 | 8.7 | 7 | 10.1 | 7 | 8.8 | 7 |
| other aliphatic alcohols | 20.5 | 21 | 11.5 | 21 | | | 22.1 | 21 | 22.7 | 21 |
| other alkanes | 28.7 | 1 | 21.6 | 1 | 4.4 | 1 | 20.3 | 1 | 16.7 | 1 |
| other alkylbenzenes | 163.2 | 8 | 10.9 | 8 | 11.4 | 8 | 14.6 | 8 | 29.8 | 8 |
| other ethers/diethers | 13.1 | 1 | | | | | | | 35.1 | 1 |
| other monoaromatics | 6.2 | 1 | 6.9 | 1 | | | 6.0 | 1 | | |
| other polyfunctional C, H, O | 10.3 | 1 | | | | | | | 89.7 | 1 |
| other saturated aliphatic esters | 27.6 | 2 | | | | | | | 37.9 | 2 |
| propionates and butyrates | 26.4 | 13 | | | | | | | 25.5 | 13 |

^a Average absolute deviation. ^b Number of applicable compounds.

Method Comparison and Recommendations. The methods of this study were compared with the literature methods using experimental data from 144 compounds used in the first- and second-order method development. The reference point used in the Lau et al.19 and Kühne et al.18 methods is the DIPPR 801E recommended value at 298.15 K determined from experimental data.20 The results are summarized in Table 9 by chemical family. The first- and second-order methods from this study are comparable to the methods of Lau et al.19 and better than the method of Kühne et al.18 for hydrocarbons. However, in general the methods of Kühne et al.18 and Lau et al.19 (when applicable) are more accurate for compounds with functional groups. This is partially because the comparisons use experimental values as reference values with these methods, whereas the methods of this study use predicted values for reference values. The reliability of any prediction method that uses reference values is dependent on the quality of those values.

The first-order group contribution method from this study is recommended for hydrocarbons, alcohols, ketones, and formates in which the functional group is not directly attached to a benzene ring. It is easier to use than the Lau et al.19 methods because it does not require an additional prediction method or experimental value. The Kühne et al.18 method is recommended for compounds with functional groups over a limited temperature range that have reliable reference values. The second-order group contribution method developed
in this study is recommended for compounds without a reliable experimental reference point.

CONCLUSIONS

A first-order temperature-dependent group contribution method was developed using reference values predicted at 298.15 K and 100 kPa from the method of Plyasunov et al.21 Temperature-dependent group contribution values were developed using groups as defined in Plyasunov et al.,21 which predicts only at constant temperature and pressure. The first-order method developed in this study works for hydrocarbons, alcohols, ketones, and formates where none of the functional groups are directly attached to a benzene ring. Attempts to add first-order temperature-dependent group contributions for ethers and esters were unsuccessful. Therefore, a second-order temperature-dependent group contribution method was developed using the second-order groups at a reference condition of 298.15 K and 100 kPa from the method of Plyasunov et al.25,26,28,34 Additional second-order hydrocarbon groups were added to the reference predictions before developing temperature-dependent parameters. The second-order method is applicable for hydrocarbons, esters, ethers, ketones, and alcohols. The prediction methods from this study were compared to other temperature-dependent prediction methods, and the first-order group contribution method from this study is recommended when applicable.

ASSOCIATED CONTENT

Supporting Information

Sample calculations using the first- and second-order bond contribution methods. List of chemicals used in method training and validation. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Tel.: 1-702-271-2510. Fax: 1-801-422-0151. E-mail: [email protected].

Funding

The authors thank AIChE and DIPPR801 for project funding.

Notes

The authors declare no competing financial interest.

REFERENCES

(1) DeVito, S. C.; Farris, C. A. Chemistry Assistance Manual for Premanufacture Notification Submitters. Office of Pollution Prevention and Toxics; U.S. Environmental Protection Agency: Washington, DC, 1997.
(2) REACH. http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm (accessed Apr 8, 2009).
(3) SPARC Performs Automated Reasoning in Chemistry. http://www.epa.gov/athens/research/sparc.html (accessed Jan 24, 2013).
(4) Nielsen, F.; Olsen, E.; Fredenslund, A. Henry’s Law Constants and Infinite Dilution Activity Coefficients for Volatile Organic Compounds in Water by a Validated Batch Air Stripping Method. Environ. Sci. Technol. 1994, 28 (12), 2133−2138.
(5) Yaws, C.; Yang, H.-C.; Pan, X. Henry’s Law Constants for 362 organic compounds in water. Chem. Eng. 1991, 98 (11), 179−85.
(6) Smith, F. L.; Harvey, A. H. Avoid common pitfalls when using Henry’s Law. Chem. Eng. Prog. 2007, 103 (9), 33−39.
(7) Sedlbauer, J.; Bergin, G.; Majer, V. Group contribution method for Henry’s Law constant of aqueous hydrocarbons. AIChE J. 2002, 48 (12), 2936−2959.
(8) Nirmalakhandan, N. N.; Speece, R. E. QSAR model for predicting Henry’s constant. Environ. Sci. Technol. 1988, 22 (11), 1349−1357.
(9) Nirmalakhandan, N.; Brennan, R. A.; Speece, R. E. Predicting Henry’s Law constant and the effect of temperature on Henry’s Law constant. Water Res. 1997, 31 (6), 1471−1481.
(10) Katritzky, A. R.; Oliferenko, A. A.; Oliferenko, P. V.; Petrukhin, R.; Tatham, D. B.; Maran, U.; Lomaka, A.; Acree, W. E. A general treatment of solubility. 1. The QSPR correlation of solvation free energies of single solutes in series of solvents. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1794−1805.
(11) Goss, K.-U. Prediction of the temperature dependency of Henry’s law constant using poly-parameter linear free energy relationships. Chemosphere 2006, 64 (8), 1369−1374.
(12) Abraham, M. H.; Acree, W. E., Jr. Prediction of gas to water partition coefficients from 273 to 373 K using predicted enthalpies and heat capacities of hydration. Fluid Phase Equilib. 2007, 262 (1−2), 97−110.
(13) Lin, S.-T.; Sandler, S. I. Henry’s law constant of organic compounds in water from a group contribution model with multipole corrections. Chem. Eng. Sci. 2002, 57 (14), 2727−2733.
(14) Majer, V.; Sedlbauer, J.; Bergin, G. Henry’s law constant and related coefficients for aqueous hydrocarbons, CO2 and H2S over a wide range of temperature and pressure. Fluid Phase Equilib. 2008, 272, 65−74.
(15) Čenský, M.; Šedlbauer, J.; Majer, V.; Růžička, V. Standard partial molal properties of aqueous alkylphenols and alkylanilines over a wide range of temperatures and pressures. Geochim. Cosmochim. Acta 2007, 71 (3), 580−603.
(16) Plyasunov, A. V.; Shock, E. L. Prediction of the vapor−liquid distribution constants for volatile nonelectrolytes in water up to its critical temperature. Geochim. Cosmochim. Acta 2003, 67 (24), 4981−5009.
(17) Wagner, W.; Pruss, A. The IAPWS formulation 1995 for the thermodynamic properties of ordinary water substance for general and scientific use. J. Phys. Chem. Ref. Data 2002, 31 (2), 387−535.
(18) Kühne, R.; Ebert, R.-U.; Schüürmann, G. Prediction of the temperature dependency of Henry’s Law constant from chemical structure. Environ. Sci. Technol. 2005, 39 (17), 6705−6711.
(19) Lau, K.; Rogers, T. N.; Zei, D. A. Modeling the temperature dependence of the Henry’s law constant of organic solutes in water. Fluid Phase Equilib. 2010, 290 (1−2), 166−180.
(20) Brockbank, S. A.; Russon, J. L.; Giles, N. F.; Rowley, R. L.; Wilding, W. V. Critically evaluated database of environmental properties: The importance of thermodynamic relationships, chemical family trends, and prediction methods. Int. J. Thermophys. 2013, DOI: 10.1007/s10765-013-1530-z.
(21) Plyasunov, A. V.; Shock, E. L. Thermodynamic functions of hydration of hydrocarbons at 298.15 K and 0.1 MPa. Geochim. Cosmochim. Acta 2000, 64 (3), 439−468.
(22) Dearden, J. C.; Schüürmann, G. Quantitative structure-property relationships for predicting Henry’s law constant from molecular structure. Environ. Toxicol. Chem. 2003, 22 (8), 1755−1770.
(23) Raventos-Duran, T.; Camredon, M.; Valorso, R.; Mouchel-Vallon, C.; Aumont, B. Structure−activity relationships to estimate the effective Henry’s law constants of organics of atmospheric interest. Atmos. Chem. Phys. 2010, 10 (16), 7643−7654.
(24) Brennan, R. A.; Nirmalakhandan, N.; Speece, R. E. Comparison of predictive methods for Henrys Law Coefficients of organic chemicals. Water Res. 1998, 32 (6), 1901−1911.
(25) Plyasunov, A. V.; Plyasunova, N. V.; Shock, E. L. Group contribution values for the thermodynamic functions of hydration at 298.15 K, 0.1 MPa. 4. Aliphatic nitriles and dinitriles. J. Chem. Eng. Data 2006, 51 (5), 1481−1490.
(26) Plyasunov, A. V.; Plyasunova, N. V.; Shock, E. L. Group contribution values for the thermodynamic functions of hydration at 298.15 K, 0.1 MPa. 3. Aliphatic monoethers, diethers, and polyethers. J. Chem. Eng. Data 2006, 51 (1), 276−290.
(27) Plyasunov, A. V.; Shock, E. L. Group contribution values of the infinite dilution thermodynamic functions of hydration for aliphatic noncyclic hydrocarbons, alcohols, and ketones at 298.15 K and 0.1 MPa. J. Chem. Eng. Data 2001, 46 (5), 1016−1019.
(28) Plyasunova, N. V.; Plyasunov, A. V.; Shock, E. L. Group contribution values for the thermodynamic functions of hydration at 298.15 K, 0.1 MPa. 2. Aliphatic thiols, alkyl sulfides, and polysulfides. J. Chem. Eng. Data 2005, 50 (1), 246−253.
(29) Rowley, R. L.; Wilding, W. V.; Oscarson, J. L.; Giles, N. F. DIPPR Data Compilation of Pure Chemical Properties; Design Institute for Physical Properties, AIChE: New York, NY, 2011.
(30) Nagvekar, M.; Daubert, T. E. A group contribution method for liquid thermal conductivity. Ind. Eng. Chem. Res. 1987, 26, 1362−5.
(31) Hsu, H.-C.; Sheu, Y.-W.; Tu, C.-H. Viscosity estimation at low temperatures (Tr < 0.75) for organic liquids from group contributions. Chem. Eng. J. (Amsterdam, Netherlands) 2002, 88, 27−35.
(32) Elbro, H. S.; Fredenslund, A.; Rasmussen, P. Group contribution method for the prediction of liquid densities as a function of temperature for solvents, oligomers, and polymers. Ind. Eng. Chem. Res. 1991, 30, 2576−82.
(33) Harrison, B. K.; Seaton, W. H. Solution to missing group problem for estimation of ideal gas heat capacities. Ind. Eng. Chem. Res. 1988, 27, 1536−40.
(34) Plyasunov, A. V.; Plyasunova, N. V.; Shock, E. L. Group contribution values for the thermodynamic functions of hydration of aliphatic esters at 298.15 K, 0.1 MPa. J. Chem. Eng. Data 2004, 49 (5), 1152−1167.
(35) Domalski, E. S.; Hearing, E. D. Estimation of the thermodynamic properties of C−H−N−O−S−halogen compounds at 298.15 K. J. Phys. Chem. Ref. Data 1993, 22 (4), 805−1159.