Environ. Sci. Technol. 1988, 22 (11), 1349-1357

QSAR Model for Predicting Henry’s Constant
Nagamany N. Nirmalakhandan and Richard E. Speece*
Environmental and Water Resources Engineering, Vanderbilt University, Nashville, Tennessee 37235

Quantitative structure-activity relationship (QSAR) techniques are used to develop a predictive equation for estimating Henry’s constant, H. Based on theoretical considerations, an empirical model for H is developed in terms of nonexperimentally determined, easy-to-calculate structural descriptors. The model explains over 98% of the variance in the experimental data set containing 180 compounds, with a standard error of 0.262 log unit. The robustness and utility of the model are demonstrated by using it in a predictive mode on 20 compounds of various types not included in the original training set.

Introduction

Henry’s constant is a physical property of a chemical that measures its partitioning between the two phases in an air-water binary system. By virtue of its definition, H will often dictate where and how a chemical will tend to concentrate or “accumulate” at equilibrium. Chemicals with low H will tend to accumulate in the aqueous phase, while those with high H will partition more into the gas phase. Once the major medium of accumulation is established, other properties of the chemical will dictate its fate in that medium. Since air and water are the major “compartments” of the model ecosphere and water is considered to act as the link between all its other compartments, knowledge of H is very important in assessing the environmental risks associated with a chemical. It is also a key parameter in determining the “cleanup” process of choice for contaminated sites and in the detailed design of decontamination processes.

Even for common chemicals such as chloroform and trichloroethylene, reported values of log H vary from -0.54 (1) to -0.95 (2) and from -0.11 (1) to -0.55 (3), respectively. Mackay and Shiu (4) have critically reviewed Henry’s constant for 167 chemicals of environmental concern and reported only 40 experimentally measured data. They used vapor pressure and solubility data to calculate a “recommended” Henry’s constant in the absence of experimental data and concluded that “...considerable discrepancies exist in the literature even for common compounds. It is believed that bringing together vapor pressure, solubility, and H data for homologous series will promote the establishment of more accurate values for all three properties”.
Because of the large number of chemicals in active use [over 60000 (5)] and the lack of experimentally determined data, there is an incentive and a growing need for nonexperimental QSAR methods for establishing the physicochemical properties of chemicals, so that timely action can be taken in evaluating their impact on the environment. Methods for estimating Henry’s constant could therefore be very useful in understanding and predicting the behavior of chemicals when they are released into the environment. Such predictive methods could also assist greatly in corroborating existing H data, as well as solubility and vapor pressure data.

* Present address: Civil and Environmental Engineering, Vanderbilt University, Nashville, TN 37235.

Estimating H from Vapor Pressure-Solubility Data

A review of the literature revealed only one popular method for estimating H. It is an indirect method based on the definition of H and estimates H as the ratio of the vapor pressure, vp, of the compound to its ultimate aqueous solubility. The underlying assumption in this method is that Henry’s law is valid up to the saturation solubility limit of the compound. Provided this assumption is satisfied, the method gives a true value for H. Either experimental data or predictive methods could be used to obtain values for vapor pressure and aqueous solubility. It therefore follows that the errors associated with these two properties will be propagated into the estimated H value. The error in vapor pressure measurement “...is at best 6%, but may be much larger, a factor of 2-3 ...” (6) as in the case of high molecular weight, low vapor pressure solutes. If the solute is a solid at environmental temperatures, and if vapor pressure data for that solute are available only at an elevated temperature (as is usually the case), then extrapolation of vapor pressure through the triple point is necessary, with attendant loss of reliability. In using the vapor pressure/solubility ratio for estimating H, it is also necessary to ensure that the two properties of the solute relate to the same state. Measured data are normally available at higher temperatures and, therefore, have to be extrapolated to environmental temperatures. Vapor pressure data are alternatively and commonly estimated from temperature correlation equations (e.g., Antoine equation) of the form log (vp) = A - B/(T + C), where A, B, and C are chemical-dependent constants and T is the absolute temperature.
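The vapor pressure/solubility estimate described above can be sketched numerically. This is a minimal illustration, not the authors' procedure: the function names, the Antoine constants, the base-10 logarithm, the unit choices, and the division by RT to obtain a nondimensional H are all assumptions made here for the example.

```python
import math

R_ATM_M3 = 8.2057e-5  # gas constant in m^3*atm/(mol*K)


def antoine_vapor_pressure(a, b, c, temperature):
    """Vapor pressure from log10(vp) = A - B / (T + C).

    The pressure unit is whatever unit the constants were fitted in.
    """
    return 10.0 ** (a - b / (temperature + c))


def henry_from_vp_and_solubility(vp_atm, solubility_mol_m3, temperature_k):
    """Nondimensional H from the vapor pressure/solubility ratio.

    Assumes Henry's law holds up to saturation, as the method requires.
    """
    h_dimensional = vp_atm / solubility_mol_m3  # atm*m^3/mol
    return h_dimensional / (R_ATM_M3 * temperature_k)


# Hypothetical compound: made-up Antoine constants (vp in mmHg, T in kelvin)
# and a made-up aqueous solubility of 10 mol/m^3.
vp_mmhg = antoine_vapor_pressure(a=7.0, b=1500.0, c=-50.0, temperature=298.15)
h = henry_from_vp_and_solubility(vp_mmhg / 760.0, solubility_mol_m3=10.0,
                                 temperature_k=298.15)
print(f"vp = {vp_mmhg:.2f} mmHg, log H = {math.log10(h):.2f}")
```

Because H is a simple ratio, any relative error in the vapor pressure or in the solubility carries straight through to the estimated H, which is the propagation concern raised in the text.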
The errors associated with these predictive equations are about 2 to 6% for vapor pressure > 10 mm, and may be up to 80% for compounds of vapor pressure [...]

[...] C=O: in aldehydes and ketones behaves for the most part as though the functional group is “ionized”, as represented by >C+-O::-.

The entire QSAR analyses were carried out in a Macintosh personal computer environment. Simple and valence connectivity indexes up to third order were calculated by using the valence values recommended by Kier and Hall (16). A commercial statistical package (StatView 512+) was used for the stepwise multiple regression analysis procedure and graphical analyses.

Before proceeding with the QSAR analysis, the data set was evaluated to identify any outliers, as their presence can cause outliers to appear inlying (masking) or inliers to appear outlying (swamping). Initial evaluation of the training set revealed the following three compounds to be outliers: ethylene glycol, glycerol, and 4-nitrophenol. The residues of these compounds were found to be more than 3 standard deviations away from zero. Since outliers in a data set could be due to experimental errors or, more importantly, due to members differing from the bulk of the data, these three compounds were first examined for any specific properties. Ethylene glycol and glycerol were the only compounds in the set containing more than one OH group. As suggested by Hine and Mookerjee (11), the OH groups in these two compounds may be internally hydrogen bonded in the gas phase; thus, their H values could be higher than those expected for the monohydroxylic alcohols. Nitrophenol was the only compound in the set containing the nitro group. This nitro group probably interacts with water largely by acting as a base in hydrogen bonding, while the hydroxy group acts as an acid. Hence, these three compounds were considered outliers due to their specific characteristics.
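The three-standard-deviation screen mentioned above can be illustrated with a short sketch. It assumes the comparison is made against the standard error of the fit, computed as the square root of the residual sum of squares over the degrees of freedom; the function name `flag_outliers` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math


def flag_outliers(observed, fitted, n_params, threshold=3.0):
    """Return indexes whose residue exceeds `threshold` standard errors.

    The standard error is sqrt(sum(residue^2) / (n - n_params)).
    """
    residues = [o - f for o, f in zip(observed, fitted)]
    dof = len(residues) - n_params
    std_error = math.sqrt(sum(r * r for r in residues) / dof)
    return [i for i, r in enumerate(residues) if abs(r) > threshold * std_error]


# Synthetic example: 19 well-fitted points and one gross deviation.
fitted_values = [0.0] * 20
observed_values = [0.1 if i % 2 == 0 else -0.1 for i in range(19)] + [2.0]
print(flag_outliers(observed_values, fitted_values, n_params=4))
```

Note that a gross outlier inflates the standard error it is judged against, which is one way the masking effect described in the text can arise.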
Statistical considerations also supported this conclusion: the quality of the correlation improved substantially when these three compounds were left out of the analysis. For instance, the r value increased from 0.918 to 0.965 and the standard error decreased by almost 40% to 0.445 when these compounds were deleted from the training set. Their deletion removed the excessive leverage and changed the coefficients in the model considerably, thus confirming the outlying tendency of these three compounds. On the basis of the above considerations, these three compounds were left out in further analyses. The optimized QSAR model for the remaining 180 compounds was found to be

$\log H = -1.773 + 1.004\,\Phi + 1.156\,{}^{1}\chi - 0.386\,{}^{1}\chi^{v}$ (11)

n = 180; r = 0.965; r² = 0.932; SE = 0.445

Even though the preliminary model (eq 11) explained 93% of the variance in the H data, the standard error of 0.445 is considerably more than that expected in an experimental result. Residues of six compounds (2,3-dimethyl-1,3-butadiene, 1,2,4-trimethylbenzene, tetrachloromethane, 1,3-dichloropropane, dichlorodifluoromethane, and 1,1-difluoroethane) were found to be more than twice the standard error. We could not locate any other literature values for these compounds to check the accuracy of the original data set. The last four of these compounds had shown similar deviations in the studies of Hine and Mookerjee (11) too. Apart from these six compounds, the fitted values agreed satisfactorily with the experimental values except in the case of alkanes and the fluorinated compounds in particular, where the model generally underestimated, with residues ranging up to 1.11 log units. The quality of the fit is shown in Figure 1.

Table III. Comparison between Observed and Fitted log H (H in Nondimensional Form)
(obsd, fitted, and residue columns are log H values)

no.  chemical                              Φ   ¹χᵛ  I   obsd  fitted  residue
  1  ethane                            0.434  1.00  0   1.31    1.27     0.04
  2  propane                           0.771  1.41  0   1.46    1.41     0.05
  3  n-butane                          1.108  1.91  0   1.58    1.51     0.07
  4  2-methylpropane                   1.108  1.73  0   1.68    1.60     0.08
  5  n-pentane                         1.445  2.41  0   1.71    1.61     0.10
  6  2,2-dimethylpropane               1.445  2.00  0   1.95    1.81     0.14
  7  n-hexane                          1.782  2.91  0   1.87    1.71     0.16
  8  2-methylpentane                   1.782  2.77  0   1.85    1.78     0.07
  9  3-methylpentane                   1.782  2.80  0   1.84    1.76     0.08
 10  2,2-dimethylbutane                1.782  2.56  0   1.90    1.88     0.02
 11  n-heptane                         2.119  3.41  0   1.92    1.81     0.11
 12  2,4-dimethylpentane               2.119  3.12  0   2.11    1.95     0.16
 13  n-octane                          2.456  3.91  0   2.12    1.91     0.21
 14  2,2,4-trimethylpentane            2.456  3.41  0   2.09    2.15    -0.06
 15  ethylene                         -0.185  0.50  0   0.94    0.89     0.06
 16  propylene                         0.152  0.99  0   0.93    0.99    -0.06
 17  1-butene                          0.489  1.52  0   1.01    1.08    -0.07
 18  2-methylpropene                   0.489  1.35  0   0.94    1.16    -0.22
 19  1-pentene                         0.826  2.02  0   1.22    1.18     0.04
 20  trans-2-pentene                   0.826  2.02  0   0.98    1.18    -0.20
 21  2-methyl-2-butene                 0.826  1.86  0   0.98    1.25    -0.27
 22  3-methyl-1-butene                 0.826  1.89  0   1.34    1.24     0.10
 23  hexene                            1.163  2.52  0   1.25    1.28    -0.03
 24  4-methyl-1-pentene                1.163  2.37  0   1.40    1.35     0.05
 25  1-octene                          1.837  3.52  0   1.59    1.48     0.11
 26  1,3-butadiene                    -0.130  1.14  0   0.41    0.64    -0.23
 27  1,4-pentadiene                    0.207  1.63  0   0.69    0.74    -0.05
 28  2-methyl-1,3-butadiene            0.207  1.55  0   0.50    0.78    -0.28
 29  1,5-hexadiene                     0.544  2.13  0   0.74    0.84    -0.10
 30  2,3-dimethyl-1,3-butadiene        0.544  1.95  0   0.29    0.93    -0.64
 31  acetylene                        -0.054  0.33  1  -0.01   -0.16     0.15
 32  propyne                           0.283  0.79  1  -0.35   -0.04    -0.31
 33  1-butyne                          0.620  1.34  1  -0.12    0.04    -0.16
 34  1-pentyne                         0.957  1.84  1   0.01    0.14    -0.13
 35  1-hexyne                          1.294  2.34  1   0.21    0.24    -0.03
 36  1-heptyne                         1.631  2.84  1   0.44    0.34     0.10
 37  1-octyne                          1.968  3.34  1   0.52    0.44     0.08
 38  1-nonyne                          2.305  3.84  1   0.77    0.54     0.23
 39  cyclopentane                      0.733  2.50  0   0.88    0.85     0.03
 40  cyclohexane                       1.070  3.00  0   0.90    0.95    -0.05
 41  methylcyclopentane                1.070  2.89  0   1.17    1.01     0.17
 42  methylcyclohexane                 1.407  3.39  0   1.25    1.11     0.15
 43  1,2-dimethylcyclohexane           1.744  3.80  0   1.16    1.25    -0.09
 44  cyclopentene                      0.114  2.15  0   0.41    0.40     0.01
 45  cyclohexene                       0.451  2.65  0   0.27    0.50    -0.23
 46  methylcyclohexene                 0.788  3.05  0   0.49    0.65    -0.16
 47  benzene                           0.165  2.00  1  -0.65   -0.73     0.08
 48  toluene                           0.502  2.41  1  -0.56   -0.59     0.03
 49  ethylbenzene                      0.839  2.97  1  -0.45   -0.52     0.07
 50  o-xylene                          0.839  2.82  1  -0.66   -0.45    -0.22
 51  m-xylene                          0.839  2.82  1  -0.59   -0.45    -0.15
 52  p-xylene                          0.839  2.82  1  -0.59   -0.45    -0.15
 53  propylbenzene                     1.176  3.47  1  -0.39   -0.42     0.03
 54  1,2,4-trimethylbenzene            1.176  3.23  1  -0.63   -0.30    -0.33
 55  2-propylbenzene                   1.176  3.35  1  -0.22   -0.36     0.14
 56  butylbenzene                      1.513  3.97  1  -0.29   -0.32     0.03
 57  2-butylbenzene                    1.513  3.82  1  -0.33   -0.25    -0.09
 58  tert-butylbenzene                 1.513  3.66  1  -0.32   -0.17    -0.15
 59  tert-amylbenzene                  1.850  4.22  1  -0.13   -0.10    -0.03
 60  chloromethane                     0.434  1.00  1  -0.39    0.02    -0.41
 61  dichloromethane                  -0.037  1.60  1  -1.03   -0.74    -0.29
 62  trichloromethane                 -0.104  1.96  1  -0.75   -0.98     0.23
 63  tetrachloromethane               -0.171  2.26  0   0.07    0.06     0.01
 64  bromomethane                     -0.005  1.96  1  -0.58   -0.88     0.30
 65  dibromomethane                   -0.107  2.77  1  -1.44   -1.37    -0.07
 66  tribromomethane                  -0.209  3.40  1  -1.56   -1.77     0.21
 67  iodomethane                       0.624  3.42  1  -0.65   -0.95     0.30
 68  fluoromethane                     0.160 -0.22  1  -0.16    0.32    -0.48
 69  trifluoromethane                  0.286 -0.39  1   0.59    0.53     0.06
 70  tetrafluoromethane                0.349 -0.45  0   2.32    1.87     0.45
 71  chlorofluoromethane               0.093  0.70  1  -0.57   -0.18    -0.39
 72  chlorodifluoromethane             0.156  0.44  1   0.08    0.00     0.08
 73  chlorotrifluoromethane            0.219  0.15  0   1.85    1.46     0.40
 74  dichlorodifluoromethane           0.089  0.98  0   1.24    0.93     0.31
 75  bromotrifluoromethane             0.184  0.66  0   1.31    1.18     0.13
 76  chloroethane                      0.367  1.50  1  -0.46   -0.29    -0.17
 77  bromoethane                       0.332  2.09  1  -0.51   -0.61     0.10
 78  iodoethane                        0.961  3.13  1  -0.53   -0.47    -0.06
 79  1,1-dichloroethane                0.300  1.88  1  -0.62   -0.54    -0.08
 80  1,2-dichloroethane                0.300  2.10  1  -1.27   -0.64    -0.63
 81  1,2-dibromoethane                 0.230  3.27  1  -1.54   -1.27    -0.27
 82  1-chloro-2-bromoethane            0.265  2.69  1  -1.43   -0.96    -0.47
 83  1,1,1-trichloroethane             0.233  2.20  1  -0.18   -0.76     0.58
 84  1,1,2-trichloroethane             0.233  2.51  1  -1.43   -0.91    -0.53
 85  1,1,2,2-tetrachloroethane         0.166  2.95  1  -1.73   -1.18    -0.55
 86  pentachloroethane                 0.099  3.29  1  -1.00   -1.41     0.41
 87  hexachloroethane                  0.032  3.65  0  -1.03   -0.40    -0.63
 88  1,1-difluoroethane                0.560  0.32  1   0.08    0.47    -0.39
 89  2-chloro-1,1,1-trifluoroethane    0.436  1.00  1   0.04    0.02     0.02
 90  1-chloropropane                   0.704  2.00  1  -0.26   -0.19    -0.07
 91  2-chloropropane                   0.704  1.80  1  -0.18   -0.10    -0.09
 92  1-bromopropane                    0.669  2.59  1  -0.41   -0.51     0.10
 93  2-bromopropane                    0.669  2.28  1  -0.35   -0.36     0.01
 94  1-iodopropane                     1.298  3.63  1  -0.43   -0.37    -0.06
 95  2-iodopropane                     1.298  3.13  1  -0.34   -0.13    -0.21
 96  1,2-dichloropropane               0.637  2.44  1  -0.92   -0.47    -0.45
 97  1,3-dichloropropane               0.637  2.60  1  -1.39   -0.54    -0.85
 98  1,2-dibromopropane                0.567  3.50  1  -1.42   -1.04    -0.38
 99  1,3-dibromopropane                0.567  3.77  1  -1.44   -1.17    -0.27
100  1-chlorobutane                    1.041  2.50  1  -0.10   -0.09    -0.01
101  1-bromobutane                     1.006  3.09  1  -0.30   -0.41     0.11
102  1-bromo-2-methylbutane            1.006  2.95  1  -0.02   -0.34     0.32
103  1-iodobutane                      1.635  4.13  1  -0.19   -0.27     0.08
104  1,1-dichlorobutane                0.974  2.92  1  -0.51   -0.36    -0.15
105  1-chloropentane                   1.378  3.00  1  -0.05    0.01    -0.06
106  2-chloropentane                   1.378  2.84  1   0.05    0.09    -0.04
107  3-chloropentane                   1.378  2.88  1   0.03    0.07    -0.04
108  1-bromo-3-methylpentane           1.680  3.99  1   0.15   -0.16     0.31
109  chloroethylene                   -0.252  1.06  1   0.36   -0.70     1.06
110  1,2-dichloroethylene             -0.319  1.64  1  -0.56   -1.05     0.49
111  trichloroethylene                -0.386  2.07  1  -0.32   -1.32     1.00
112  tetrachloroethylene              -0.453  2.51  0  -0.30   -0.34     0.04
113  3-chloropropene                   0.085  1.61  1  -0.42   -0.63     0.21
114  chlorobenzene                     0.098  2.47  1  -0.74   -1.02     0.28
115  bromobenzene                      0.063  2.89  1  -1.07   -1.26     0.19
116  1,2-dichlorobenzene               0.031  2.96  1  -1.00   -1.32     0.32
117  1,3-dichlorobenzene               0.031  2.95  1  -0.72   -1.32     0.60
118  1,4-dichlorobenzene               0.031  2.95  1  -1.10   -1.32     0.22
119  1,4-dibromobenzene               -0.039  3.78  1  -1.69   -1.78     0.09
120  p-bromotoluene                    0.400  3.30  1  -1.02   -1.11     0.09
121  1-bromo-2-ethylbenzene            0.737  3.87  1  -0.87   -1.05     0.18
122  o-bromocumene                     1.074  4.25  1  -0.62   -0.89     0.27
123  acetic acid                      -4.591  0.93  1  -4.91   -4.99     0.08
124  propionic acid                   -4.254  1.48  1  -4.74   -4.92     0.18
125  butyric acid                     -3.917  1.98  1  -4.66   -4.82     0.16
126  methyl formate                   -1.835  0.88  1  -2.04   -2.20     0.16
127  ethyl formate                    -1.498  1.48  1  -1.94   -2.15     0.21
128  methyl acetate                   -1.498  1.11  1  -2.43   -1.98    -0.46
129  propyl formate                   -1.161  1.96  1  -1.82   -2.04     0.22
130  isopropyl formate                -1.161  1.86  1  -1.48   -1.99     0.51
131  ethyl acetate                    -1.161  1.90  1  -2.26   -2.01    -0.25
132  methyl propionate                -1.161  1.87  1  -2.18   -2.00    -0.18
133  isobutyl formate                 -0.824  2.32  1  -1.63   -1.88     0.25
134  propyl acetate                   -0.824  2.40  1  -2.09   -1.91    -0.18
135  isopropyl acetate                -0.824  2.29  1  -1.94   -1.86    -0.08
136  ethyl propionate                 -0.824  2.46  1  -2.05   -1.94    -0.11
137  methyl butyrate                  -0.824  2.37  1  -2.08   -1.90    -0.18
138  butyl acetate                    -0.487  2.90  1  -1.87   -1.81    -0.06
139  isobutyl acetate                 -0.487  2.75  1  -1.73   -1.74     0.01
140  propyl propionate                -0.487  2.96  1  -1.80   -1.84     0.04
141  isopropyl propionate             -0.487  2.85  1  -1.63   -1.79     0.16
142  ethyl butyrate                   -0.487  2.96  1  -1.83   -1.84     0.01
143  methyl pentanoate                -0.487  2.87  1  -1.86   -1.80    -0.06
144  amyl acetate                     -0.150  3.40  1  -1.80   -1.71    -0.09
145  propyl butyrate                  -0.150  3.46  1  -1.67   -1.74     0.07
146  ethyl pentanoate                 -0.150  3.46  1  -1.85   -1.74    -0.11
147  methyl hexanoate                 -0.150  3.37  1  -1.82   -1.70    -0.12
148  hexyl acetate                    -0.150  3.46  1  -1.66   -1.74     0.08
149  amyl propionate                   0.187  3.96  1  -1.46   -1.64     0.18
150  isoamyl formate                  -0.487  2.82  1  -1.56   -1.78     0.22
151  isoamyl acetate                  -0.150  3.25  1  -1.62   -1.64     0.02
152  methyl octanoate                  0.861  4.95  1  -1.50   -1.44    -0.06
153  ethyl heptanoate                  0.861  5.53  1  -1.69   -1.71     0.02
154  methyl benzoate                  -1.430  3.05  1  -3.14   -2.83    -0.31
155  methanol                         -3.484  0.45  1  -3.72   -3.65    -0.07
156  ethanol                          -3.147  1.02  1  -3.59   -3.59     0.00
157  1-propanol                       -2.810  1.52  1  -3.56   -3.49    -0.07
158  2-propanol                       -2.810  1.41  1  -3.46   -3.43    -0.03
159  allyl alcohol                    -3.429  1.13  1  -3.69   -3.92     0.23
160  1-butanol                        -2.473  2.02  1  -3.46   -3.39    -0.07
161  2-butanol                        -2.473  1.95  1  -3.38   -3.35    -0.03
162  tert-butyl alcohol               -2.473  1.72  1  -3.31   -3.24    -0.07
163  2-methyl-1-propanol              -2.473  1.87  1  -3.31   -3.32     0.01
164  1-pentanol                       -2.136  2.52  1  -3.29   -3.29     0.00
165  2-pentanol                       -2.136  2.45  1  -3.22   -3.25     0.03
166  2-methyl-1-butanol               -2.136  2.41  1  -3.24   -3.23    -0.01
167  2-methyl-2-butanol               -2.136  2.28  1  -3.25   -3.17    -0.08
168  1-hexanol                        -1.799  3.02  1  -3.20   -3.19    -0.01
169  3-hexanol                        -1.799  2.98  1  -2.70   -3.17     0.47
170  2,3-dimethylbutanol              -1.799  2.78  1  -2.87   -3.07     0.20
171  2-methyl-3-pentanol              -1.799  2.86  1  -2.85   -3.11     0.26
172  4-methyl-2-pentanol              -1.799  2.80  1  -2.74   -3.08     0.34
173  2-methyl-2-pentanol              -1.799  2.78  1  -2.88   -3.07     0.19
174  1-heptanol                       -1.462  3.52  1  -3.12   -3.09    -0.03
175  1-octanol                        -1.125  4.02  1  -3.01   -2.99    -0.02
176  phenol                           -3.416  2.13  1  -4.79   -4.39    -0.41
177  4-bromophenol                    -3.518  3.02  1  -5.21   -4.91    -0.30
178  4-tert-butylphenol               -2.068  3.78  1  -4.34   -3.82    -0.52
179  2-cresol                         -3.079  2.55  1  -4.30   -4.25    -0.05
180  4-cresol                         -3.079  2.54  1  -4.49   -4.24    -0.25

Figure 1. Comparison between experimental and fitted log H: preliminary model. [Scatter plot; x-axis: calculated log H; annotation: n = 180; r = 0.965; std error = 0.445]

Figure 2. Comparison between experimental and fitted log H: refined model with indicator variable. [Scatter plot; x-axis: calculated log H; annotation: n = 180; r = 0.99; std error = 0.26]

It appeared that the systematic error of certain classes of compounds and the unexplained variance could be due to the effect of hydrogen bonding. To account for this effect, we followed the approach of Hansch et al. (18), where an indicator variable, I, is used to differentiate between compounds on the basis of their ability to take part in hydrogen bonding. This indicator, I, is assigned a value of 1 for all compounds containing an electronegative element (oxygen, nitrogen, halogen, etc.) attached directly to a carbon atom holding a hydrogen atom.

Acetylenic compounds and aromatic compounds with partially substituted hydrogen atoms were also assigned 1. For the remaining compounds, I was set equal to zero. With this indicator variable, a more precise model is obtained:

$\log H = 1.29 + 1.005\,\Phi - 0.468\,{}^{1}\chi^{v} - 1.258\,I$ (12)

n = 180; r = 0.99; r² = 0.98; SE = 0.262

The optimized contributions of the atoms to Φ are summarized in Table II. This model explains over 98% of the variance in the data, leaving only 2% for inadequacy of the model and the experimental errors in the data. The standard error is reduced by over 40%, and is now superior to that in other estimation methods and comparable to that in experimental results. The experimental and fitted log H values are compared in Table III, and the quality of the fit of the model is shown in Figure 2.

Checking for Robustness of the QSAR Model

The statistical validity of the general model is tested first by screening the variables for any intercorrelation. It is known that, in multiple regression analyses, intercorrelations between any two variables (collinearity), or between one and several others (multicollinearity), or among many variables (polycollinearity) can lead to false models and are not simple to measure and report (19-21). The intercorrelation matrix between the parameters in this model indicates minimal collinearity problems:

[Intercorrelation matrix of Φ, ¹χᵛ, and I; legible entries: -0.387, 0.125]

Table IV. Results of Regression Runs for Subsets

run                            no.      no.                               coefficient(a) of
no.  compds deleted            deleted  included  r      SE     constant  Φ          ¹χᵛ         I
1    none                               180       0.988  0.262  1.298     0.971(-)   -0.518(-)   -1.354(-)
                                                                          1.005      -0.468      -1.258
                                                                          1.035(+)   -0.434(+)   -1.258(+)
2    all alkanes               14       166       0.985  0.272  1.277     0.999(√)   -0.479(√)   -1.215(√)
3    all alcohols              25       155       0.981  0.274  1.314     0.997(√)   -0.476(√)   -1.260(√)
4    all esters                24       156       0.988  0.272  1.310     1.004(√)   -0.477(√)   -1.252(√)
5    all aromatics             25       155       0.988  0.268  1.318     0.991(√)   -0.475(√)   -1.272(√)
6    halogenated compounds     25       155       0.992  0.234  1.272     1.002(√)   -0.459(√)   -1.268(√)
7    all alcohols and esters   49       131       0.977  0.286  1.317     0.997(√)   -0.477(√)   -1.259(√)

(a) (-) denotes lower limit and (+) denotes upper limit of estimates of coefficients at 95% confidence level. (√) denotes coefficient lying within lower and upper limits of the coefficients of the general model.

Table V. Comparison between Observed and Predicted log H Values for Testing Set (H in Nondimensional Form)
(obsd, predicted, and error columns are log H values)

no.  chemical                       ref       Φ    ¹χᵛ  I    obsd  predicted   error
  1  2,2,2-trifluoroethanol          10   -2.96   0.32  1   -3.15      -3.09   -0.06
  2  1,1,1-trifluoro-2-propanol      10   -2.62   1.08  1   -3.05      -3.11    0.06
  3  2,2,3,3-tetrafluoropropanol     10   -2.56   0.89  1   -3.59      -2.96   -0.63
  4  2,2,3,3,3-pentafluoropropane    10   -2.50   0.85  1   -3.04      -2.87   -0.17
  5  hexafluoro-2-propanol           10   -2.43   0.73  1   -2.76      -2.75   -0.01
  6  cyclohexanol                    10   -2.51   3.05  1   -3.63      -3.92    0.29
  7  naphthalene                     10   -0.34   3.39  1   -1.77      -1.90    0.13
  8  acenaphthene                    10   -0.38   4.43  1   -2.49      -2.42   -0.07
  9  anthracene                      10    0.86   4.81  1   -3.14      -1.35   -1.79(a)
 10  phenanthrene                    10   -0.86   4.81  1   -2.98      -3.08    0.10
 11  fluorene                         4   -0.57   4.60  1   -2.37      -2.69    0.32
 12  pyrene                           4   -0.56   5.55  1   -3.33      -3.13   -0.20
 13  biphenyl                         4    0.57   4.07  1   -1.77      -1.30   -0.47
 14  1-methylnaphthalene              4    0.00   3.80  1   -1.96      -1.75   -0.21
 15  l&dimethylnaphthalene           23   -0.48   4.24  1   -1.82      -2.43    0.61
 16  1-chloronaphthalene             23   -0.41   3.91  1   -1.84      -2.21    0.37
 17  2-chloronaphthalene             23   -0.41   3.91  1   -1.88      -2.21    0.33
 18  1,2,3-trichlorobenzene          23   -0.04   3.51  1   -1.30      -1.65    0.35
 19  1,2,3,5-tetrachlorobenzene      23   -0.10   3.65  1   -1.19      -1.78    0.59
 20  hexachlorobenzene               24   -0.24   5.08  0   -1.27      -1.33    0.06

(a) If alternate experimental value of -1.52 is used, error = -0.17. See discussion in text.
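As a quick arithmetic check, eq 12 can be evaluated directly from the descriptor values listed in Table V. The sketch below is illustrative only: the function name `predict_log_h` and the choice of rows are assumptions made here, the coefficients are copied from eq 12 as printed (intercept 1.29), and the descriptor values are transcribed from Table V rather than recalculated from molecular structure.

```python
def predict_log_h(phi, chi1v, indicator):
    """Evaluate eq 12: log H = 1.29 + 1.005*Phi - 0.468*(1chi^v) - 1.258*I."""
    return 1.29 + 1.005 * phi - 0.468 * chi1v - 1.258 * indicator


# (chemical, Phi, 1chi^v, I, observed log H, predicted log H) from Table V
TABLE_V_ROWS = [
    ("2,2,2-trifluoroethanol", -2.96, 0.32, 1, -3.15, -3.09),
    ("naphthalene", -0.34, 3.39, 1, -1.77, -1.90),
    ("biphenyl", 0.57, 4.07, 1, -1.77, -1.30),
    ("hexachlorobenzene", -0.24, 5.08, 0, -1.27, -1.33),
]

for name, phi, chi1v, indicator, obsd, table_pred in TABLE_V_ROWS:
    pred = predict_log_h(phi, chi1v, indicator)
    # error follows the table's convention: observed minus predicted
    print(f"{name:24s} predicted {pred:6.2f} (table {table_pred:6.2f}), "
          f"error {obsd - pred:5.2f}")
```

Hexachlorobenzene carries no hydrogen atoms, so its indicator variable is 0 and the last term of eq 12 drops out.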

The question of collinearity is further addressed by use of the "collinearity indicators" defined below: (1) the percentage of total variance in the matrix accounted for by the first principal component. For completely orthogonal data, this indicator is normally