Environ. Sci. Technol. 1988, 22, 328-338
Prediction of Aqueous Solubility of Organic Chemicals Based on Molecular Structure
Nagamany N. Nirmalakhandan and Richard E. Speece*
Environmental Studies Institute, Drexel University, Philadelphia, Pennsylvania 19104
Correlations for aqueous solubility of a range of 200 environmentally relevant chemicals are derived from molecular connectivity indexes and a polarizability factor, calculated solely from molecular structure. The quality and reliability of the correlations are shown to be high enough for environmental applications, even with the minimum number of variables in the equations and without excluding any data to improve the correlations. The robustness and validity of these correlations are demonstrated by use of appropriate statistical techniques. A generalized predictive equation for aqueous solubility is recommended, which employs easily calculable molecular descriptors.
Introduction
The modeling of environmental fate of pollutants requires a knowledge of certain physical properties of the chemicals in question. In some cases, the physical property of the chemical may be unknown, or there may be considerable discrepancy in the reported experimental data. Indeed, in some cases, the chemical may not even be in existence. In addition, when determining physical properties of chemicals, it is advantageous to have a screening mechanism both to predict and corroborate experimental data on the chemical's properties and to guide the setting up of experiments and pilot studies more efficiently. For these reasons, there is a growing interest in the development of quantitative structure-activity relationships (QSAR). This paper is an attempt to develop predictive equations for aqueous solubility of a variety of chemicals based solely on their molecular structure, with no reliance on experimentally derived properties. The data set used included saturated and unsaturated aliphatics, alcohols, and aromatics with alkyl, chloro, bromo, and mixed substitution.
Many models for estimating aqueous solubility of organic chemicals have been proposed; Lyman et al. (1) and Horvath (2) identified many pathways, the most popular ones using the octanol-water partition coefficient P in a direct or indirect manner. Some methods use experimentally or empirically established atomic and structural contributions in an additive manner. Other methods reported include two- or three-variable models using combinations of log P, molar volume, melting point, entropy of fusion, electronic information, etc. Recently, the solvatochromic approach has been used by Kamlet et al. to derive fundamental, high-quality correlations for aqueous solubility (3-5).
The objective of this study was to derive reliable predictive equations for aqueous solubility, using molecular descriptors that are simple to formulate and fundamental in nature, can be derived or calculated purely from structural information, encode information relating to physicochemical properties, and are consistent and calculable for environmentally relevant compounds without any experimental inputs. A series of descriptors originally proposed as a branching index by Randić (6) and later extensively developed and formalized by Kier and Hall (7, 8) under the name of molecular connectivity indexes, $\chi$, appears to meet all the above requirements. Techniques of graph theory are used to calculate these indexes by treating the hydrogen-suppressed structures as linear and star graphs and identifying the various connected paths and clusters or subgraphs. By considering various sectors of the subgraphs, various levels of indexes can be calculated by allocating either the valence or the number of bonds at each node to that node. For further details on the definitions and calculations of these indexes, refer to Kier and Hall (8). The simple $\chi$ values used in this study are calculated with an algorithm slightly modified from that of Kier and Hall (8), by assigning $\delta$ values equal to the number of bonds at the node.
Rationale for the Models
Until recently, aqueous solubility has remained a poorly understood phenomenon (9), and only empirical and experimental observations combined with theoretical concepts have been used to describe the process (2). Most theories are developed on the basis of intermolecular forces or potential energy of interaction in aqueous solutions and the polarity of the solute. A typical result of such a derivation is Ostwald's expression for the solubility coefficient of gases in liquids in terms of the activation energy. The activation energy is generally modeled as the sum of the energy of cavity formation in the solvent and the energy of interaction of the solute molecule with the solvent in the cavity. These models are based on polarizability, cavity surface area, and hard-core diameter of the solute and the dipole moment of the solvent as main parameters. Kamlet and co-workers (3-5) have extended this concept to model solubility-related phenomena by using solvatochromic parameters defined to quantify the intermolecular forces. Known as linear solvation energy relationships (LSERs), these models have been very successful in correlating many solvent-solute interactions, including aqueous solubility.
In our approach, polarizability and connectivity indexes are used to model the solute-solvent interactions and the cavity formation phenomena. One of the common methods of estimating polarizability is the addition of atom/group contribution factors, as proposed by Ketelaar (2). According to this method, the polarizability $\Phi$ is given by an equation such as
$$\Phi = A(\text{no. of H}) + B(\text{no. of C}) + C(\text{no. of Cl}) + D(\text{no. of Br}) + E(\text{no. of double bonds}) \tag{1}$$
328 Environ. Sci. Technol., Vol. 22, No. 3, 1988
where A-E are constants equal to 0.42, 0.93, 2.28, 3.34, and 0.58, respectively, for the range of compounds used in this study. Since this polarizability $\Phi$ carries only partial information relating to solubility, additional descriptors are necessary to adequately represent the variation of solubility among a wide range of compounds. Further, being a simple descriptor based on atom counts, $\Phi$ cannot be expected to differentiate among isomers. However, within particular classes of compounds, $\Phi$ could correlate well with solubility. The connectivity indexes up to third order have been shown to encode information pertaining to molecular size
0013-936X/88/0922-0328$01.50/0
© 1988 American Chemical Society
and related properties, as well as polarizability (7, 8). For example, ${}^1\chi$, ${}^1\chi^v$, ${}^3\chi_p$, ${}^3\chi_c$, and ${}^3\chi_c^v$ have been used in deriving very strong correlations for cavity surface area (r = 0.987), polarizability (r = 0.998), and solubility of selected groups of compounds (7, 8, 10). We have used the above two concepts and combined the polarizability $\Phi$ with the $\chi$ indexes in correlation studies to relate solubility of a large number of organic chemicals, ranging from simple, straight-chain alcohols to chloro/bromo/alkyl-substituted aromatics containing up to 10 carbons. Since polarizability information appears to be duplicated by $\Phi$ and some $\chi$ indexes, the coefficients A-E in eq 1 were kept variable and optimized statistically to get the best predictive equations, in terms of the correlation coefficient, standard error, sample size, and the number of variables in the equation, and a corresponding modified polarizability, $\Phi_m$, was derived.
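The atom-count form of eq 1 is easy to make concrete. The following is a minimal sketch, assuming the contribution constants A-E quoted in the text; the function name `polarizability` and the dictionary keys are hypothetical and chosen only for illustration.

```python
# Atom/group contributions A-E quoted in the text for eq 1.
CONTRIB = {"H": 0.42, "C": 0.93, "Cl": 2.28, "Br": 3.34, "double_bond": 0.58}


def polarizability(counts):
    """Additive polarizability of eq 1 from atom and double-bond counts."""
    return sum(CONTRIB[group] * n for group, n in counts.items())


# Chloroethane, C2H5Cl
chloroethane = {"H": 5, "C": 2, "Cl": 1}
# 1,1-Dichloroethylene, C2H2Cl2, one double bond
dichloroethylene = {"H": 2, "C": 2, "Cl": 2, "double_bond": 1}

print(round(polarizability(chloroethane), 2))      # 6.24
print(round(polarizability(dichloroethylene), 2))  # 7.84
```

Because only counts enter the sum, 1,1-dichloroethane and 1,2-dichloroethane receive the same value, which illustrates why this descriptor alone cannot separate isomers.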
Methods
Simple and valence connectivity indexes up to third-order path and cluster levels were calculated and statistically analyzed with an integrated program developed on a Macintosh computer. After eliminating highly intercorrelated descriptors among a total of 16, a stepwise multiple-regression procedure was used to select the best correlation for log solubility (g/g%). It has been recommended by Topliss and Costello (11) that if $r^2$ = 0.4 is considered an acceptable level for chance correlations, then a sample size of at least 30 is required to test 5 variables and 50 is required for 10 variables in multiple-regression analysis to avoid an unduly high risk of chance correlations. The number of variables tested in this study was restricted to meet the above recommendations. Each correlation was tested by a different method for chance correlations and robustness, as described below for each set of data.
Three different sets of data were used separately and in combination. Set A, consisting of 38 chloro/bromo/alkyl-substituted benzenes with mixed substitution, was assembled with data from compilations by Horvath (2) and Mackay and Shiu (12). Set B, assembled from Horvath (2), consisted of a training subset of 38 alkanes and alkenes with only chloro/alkyl substitutions and a testing subset of 19 additional chloro/bromo/alkyl/mixed-substituted alkanes and alkenes. A total of 50 alcohols selected from a compilation by Yalkowsky and Valvani (9) formed set C. No data were deleted from within any of the series.
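As an illustration of the simple first-order index, the sketch below uses the standard Randić/Kier-Hall definition: each bond of the hydrogen-suppressed skeleton contributes $1/\sqrt{\delta_i \delta_j}$, with $\delta$ taken as the number of skeletal bonds at each atom. It is not the authors' program. The function name and atom numbering are hypothetical, and only saturated skeletons are considered.

```python
import math
from collections import Counter


def first_order_chi(bonds):
    """Simple first-order connectivity index of a hydrogen-suppressed graph.

    `bonds` is a list of (atom_i, atom_j) pairs; delta for each atom is the
    number of skeletal bonds it takes part in.
    """
    delta = Counter()
    for i, j in bonds:
        delta[i] += 1
        delta[j] += 1
    return sum(1.0 / math.sqrt(delta[i] * delta[j]) for i, j in bonds)


# 1-Butanol, C-C-C-C-O (atoms numbered 0..4 along the chain)
butanol = [(0, 1), (1, 2), (2, 3), (3, 4)]
# 2-Methyl-1-propanol: atom 1 is the branched carbon, atom 4 the oxygen
isobutanol = [(0, 1), (1, 2), (1, 3), (3, 4)]

print(round(first_order_chi(butanol), 2))     # 2.41
print(round(first_order_chi(isobutanol), 2))  # 2.27
```

These two values agree with the simple first-order indexes listed for 1-butanol and 2-methyl-1-propanol in Table IV.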
Results and Discussion
The suitability of the polarizability $\Phi$ alone in solubility correlations was first examined in the above data sets separately and in all combinations. Table I summarizes the results of this simple regression in the first line under each set. Only in set A did $\Phi$ explain 85.4% of the variation in solubility with low standard error. In all other cases either the correlation was poor or the error was high. This could be expected because, in set A, the ring is a common element and the polarizability-solubility relationship is probably more dominant. Among members of the other sets, topological variations are more pronounced, and hence $\Phi$ could not correlate well. The second line under each set in Table I shows the results when a stepwise multiple-regression procedure was allowed to select the best variables among the $\chi$ indexes and $\Phi$. Since cavity area information is now available in the form of $\chi$
indexes, the correlations are significantly improved and the error is reduced in all cases, as expected. However, the number of variables entered into the model also increased, thus reducing the utility of the predictive equation. The equations now included C, H, and Cl atom counts, suggesting that the coefficients in eq 1 could be modified to relate to solubility more efficiently. Furthermore, in set B and set C, the polarizability $\Phi$ was completely rejected in preference to $\chi$ and counts of C, Cl, and H, suggesting again that $\Phi$ as defined in eq 1 is rather inefficient in contributing to solubility correlation. The last line under each set in Table I shows the results when the modified polarizability, $\Phi_m$, was used instead of $\Phi$. It can be seen that for all cases, the quality of the correlation is better than in both the earlier schemes. Considering the number of cases and the structural diversity in the combined sets, $\Phi_m$ along with the $\chi$ indexes appears to yield improved models with fewer descriptors for every combination.
The three sets were studied individually with the appropriate modified polarizability descriptor, $\Phi_m$. The robustness of the predictive equations was examined and, wherever possible, compared with results of other researchers.
Study of Set A. By using the stepwise multiple-regression analysis, the best model for solubility was found to be
$$\log S = 1.790 - 0.934\,{}^1\chi^v - 1.01\,\Phi_m \tag{2}$$
n = 38; r = 0.960; $r^2$ = 0.922; adjusted $r^2$ = 0.915; SE = 0.20
$$\Phi_m = 0.1(\text{no. of C}) + 0.227(\text{no. of Cl})$$
The fitted values are compared with the observed ones in Table II. In a similar study of a set of 35 halobenzenes, Kier and Hall (8) reported a stronger correlation for log S with a three-variable model using ${}^1\chi$, ${}^2\chi^v$, and melting point (r = 0.995; SE = 0.15). In order to highlight any "outliers" or "strong points" and to examine the model for chance correlations, a modified form of the "jackknife test" as suggested by Dietrich et al. (13) and Cornish-Bowden and Wong (14) was used. In this method, each observation was deleted from the analysis in turn, and the resulting r was noted as the jackknifed r in the last column of Table II. These r values do not show any unduly high variation, suggesting that the data set is fairly consistent and that the model is not biased by any particular data point.
Study of Set B. The subset of 38 chloro/alkyl-substituted alkanes/alkenes was used as the training set in the stepwise multiple-regression analysis. The fitted values are compared with the observed ones in Table IIIA, and the model for solubility was found to be
$$\log S = 0.795 + 1.934\,\Phi_m + 1.325\,{}^3\chi_c - 1.192\,{}^3\chi_c^v \tag{3}$$
n = 38; r = 0.964; $r^2$ = 0.929; adjusted $r^2$ = 0.923; SE = 0.22
$$\Phi_m = 0.17(\text{no. of H}) - 0.57(\text{no. of C})$$
The testing set consisting of 19 additional chloro/bromo/alkyl/mixed-substituted alkanes and alkenes was used
Table I. Summary of Results of Regression Analysis

| data set | regression analysis | no. of variables in eq | r | $r^2$ | adjusted $r^2$ | SE |
|---|---|---|---|---|---|---|
| Set A, 38 Cl/Br/Alkyl-Substituted Aromatics | using polarizability only | 1 | 0.924 | 0.854 | 0.850 | 0.277 |
| | using polarizability and $\chi$ indexes | 3 | 0.958 | 0.918 | 0.913 | 0.210 |
| | using modified polarizability and $\chi$ indexes | 2 | 0.960 | 0.922 | 0.915 | 0.200 |
| Set B, 57 Cl/Br/Alkyl-Substituted Alkanes/Alkenes | using polarizability only | 1 | 0.798 | 0.637 | 0.630 | 0.476 |
| | using polarizability and $\chi$ indexes | 4ᵃ | 0.943 | 0.890 | 0.881 | 0.270 |
| | using modified polarizability and $\chi$ indexes | 3 | 0.964 | 0.929 | 0.923 | 0.220 |
| Set C, 50 Alcohols | using polarizability only | 1 | 0.844 | 0.713 | 0.707 | 0.294 |
| | using polarizability and $\chi$ indexes | 3ᵃ | 0.980 | 0.961 | 0.959 | 0.110 |
| | using modified polarizability and $\chi$ indexes | 3ᵃ | 0.980 | 0.961 | 0.959 | 0.110 |
| Set AB, 95 Cl/Br/Alkyl-Substituted Aromatics and Alkanes/Alkenes | using polarizability only | 1 | 0.906 | 0.821 | 0.820 | 0.435 |
| | using polarizability and $\chi$ indexes | 4 | 0.937 | 0.878 | 0.873 | 0.365 |
| | using modified polarizability and $\chi$ indexes | 2 | 0.944 | 0.890 | 0.885 | 0.347 |
| Set AC, 88 Cl/Br/Alkyl-Substituted Aromatics and Alcohols | using polarizability only | 1 | 0.895 | 0.802 | 0.800 | 0.604 |
| | using polarizability and $\chi$ indexes | 7 | 0.993 | 0.985 | 0.984 | 0.172 |
| | using modified polarizability and $\chi$ indexes | 3 | 0.994 | 0.988 | 0.987 | 0.156 |
| Set BC, 107 Cl/Br/Alkyl-Substituted Alkanes/Alkenes and Alcohols | using polarizability only | 1 | 0.538 | 0.290 | 0.283 | 0.713 |
| | using polarizability and $\chi$ indexes | 7 | 0.951 | 0.905 | 0.898 | 0.269 |
| | using modified polarizability and $\chi$ indexes | 4 | 0.952 | 0.906 | 0.901 | 0.265 |
| Set ABC, 145 Cl/Br/Alkyl-Substituted Alkanes/Alkenes/Aromatics and Alcohols | using polarizability only | 1 | 0.774 | 0.599 | 0.596 | 0.735 |
| | using polarizability and $\chi$ indexes | 8 | 0.958 | 0.917 | 0.909 | 0.254 |
| | using modified polarizability and $\chi$ indexes | 3 | 0.962 | 0.926 | 0.924 | 0.318 |

ᵃ Polarizability rejected in preference to $\chi$ indexes.
Table II. Observed vs Predicted Solubilities of Set A

| no. | chemical | ref | $\Phi_m$ | ${}^1\chi^v$ | obsd log S | predicted log S | residue | jackknifed r |
|---|---|---|---|---|---|---|---|---|
| 1 | benzene | 12 | -0.60 | 2.00 | -0.749 | -0.669 | -0.080 | 0.954 |
| 2 | toluene | 12 | -0.70 | 2.41 | -1.288 | -1.150 | -0.138 | 0.958 |
| 3 | ethylbenzene | 12 | -0.80 | 2.97 | -1.818 | -1.771 | -0.047 | 0.960 |
| 4 | p-xylene | 12 | -0.80 | 2.82 | -1.733 | -1.632 | -0.101 | 0.960 |
| 5 | m-xylene | 12 | -0.80 | 2.82 | -1.790 | -1.632 | -0.158 | 0.960 |
| 6 | o-xylene | 12 | -0.80 | 2.82 | -1.757 | -1.632 | -0.125 | 0.960 |
| 7 | 1,2,3-trimethylbenzene | 12 | -0.90 | 3.24 | -2.125 | -2.122 | -0.003 | 0.960 |
| 8 | 1,2,4-trimethylbenzene | 12 | -0.90 | 3.23 | -2.244 | -2.113 | -0.131 | 0.961 |
| 9 | 1,3,5-trimethylbenzene | 12 | -0.90 | 3.23 | -2.013 | -2.113 | 0.100 | 0.960 |
| 10 | propylbenzene | 12 | -0.90 | 3.47 | -2.260 | -2.336 | -0.076 | 0.960 |
| 11 | isopropylbenzene | 12 | -0.90 | 3.35 | -2.301 | -2.225 | -0.077 | 0.960 |
| 12 | 1-ethyl-2-methylbenzene | 12 | -0.90 | 3.38 | -2.032 | -2.253 | 0.220 | 0.961 |
| 13 | 1-ethyl-4-methylbenzene | 12 | -0.90 | 3.38 | -2.027 | -2.253 | 0.225 | 0.961 |
| 14 | n-butylbenzene | 12 | -1.00 | 3.97 | -2.921 | -2.902 | -0.020 | 0.959 |
| 15 | isobutylbenzene | 12 | -1.00 | 3.82 | -3.000 | -2.762 | -0.238 | 0.961 |
| 16 | sec-butylbenzene | 12 | -1.00 | 3.89 | -2.770 | -2.827 | 0.057 | 0.960 |
| 17 | tert-butylbenzene | 12 | -1.00 | 3.66 | -2.469 | -2.613 | 0.144 | 0.961 |
| 18 | 1-bromo-2-chlorobenzene | 2 | -0.83 | 3.37 | -1.907 | -2.170 | 0.264 | 0.962 |
| 19 | 1-bromo-3-chlorobenzene | 2 | -0.83 | 3.37 | -1.927 | -2.170 | 0.244 | 0.962 |
| 20 | 1-bromo-4-chlorobenzene | 2 | -0.83 | 3.37 | -2.347 | -2.170 | -0.176 | 0.961 |
| 21 | bromobenzene | 2 | -0.60 | 2.89 | -1.387 | -1.497 | 0.110 | 0.958 |
| 22 | 1,2-dibromobenzene | 2 | -0.60 | 3.79 | -2.125 | -2.335 | 0.210 | 0.961 |
| 23 | 1,3-dibromobenzene | 2 | -0.60 | 3.78 | -2.171 | -2.325 | 0.154 | 0.961 |
| 24 | 1,4-dibromobenzene | 2 | -0.60 | 3.78 | -2.699 | -2.325 | -0.374 | 0.964 |
| 25 | 1,2,3-tribromobenzene | 2 | -0.60 | 4.69 | -3.538 | -3.172 | -0.366 | 0.961 |
| 26 | 1,2,4-tribromobenzene | 2 | -0.60 | 4.68 | -3.004 | -3.163 | 0.159 | 0.960 |
| 27 | 1,3,5-tribromobenzene | 2 | -0.60 | 4.67 | -2.699 | -3.153 | 0.455 | 0.967 |
| 28 | 1,2,4,5-tetrabromobenzene | 2 | -0.60 | 5.58 | -4.300 | -4.000 | -0.300 | 0.954 |
| 29 | chlorobenzene | 2 | -0.83 | 2.47 | -1.304 | -1.333 | 0.029 | 0.958 |
| 30 | 1,2-dichlorobenzene | 2 | -1.05 | 2.96 | -2.036 | -2.016 | -0.020 | 0.960 |
| 31 | 1,3-dichlorobenzene | 2 | -1.05 | 2.95 | -1.907 | -2.006 | 0.100 | 0.960 |
| 32 | 1,4-dichlorobenzene | 2 | -1.05 | 2.95 | -2.051 | -2.006 | -0.044 | 0.960 |
| 33 | 1,2,3-trichlorobenzene | 2 | -1.28 | 3.44 | -2.509 | -2.689 | 0.180 | 0.961 |
| 34 | 1,2,4-trichlorobenzene | 2 | -1.28 | 3.43 | -2.523 | -2.680 | 0.157 | 0.961 |
| 35 | 1,3,5-trichlorobenzene | 2 | -1.28 | 3.43 | -3.180 | -2.680 | -0.500 | 0.966 |
| 36 | 1,2,3,4-tetrachlorobenzene | 2 | -1.51 | 3.92 | -3.367 | -3.362 | -0.004 | 0.958 |
| 37 | 1,2,3,5-tetrachlorobenzene | 2 | -1.51 | 3.92 | -3.456 | -3.362 | -0.093 | 0.958 |
| 38 | 1,2,4,5-tetrachlorobenzene | 2 | -1.51 | 3.92 | -3.250 | -3.362 | 0.113 | 0.959 |
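The leave-one-out jackknife behind the last column of Table II can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the helper names `fit_ols`, `multiple_r`, and `jackknifed_r` are hypothetical, the regression is an ordinary least-squares fit solved through the normal equations, and only eight rows of Table II are used, so the output is not expected to reproduce the tabulated jackknifed r values.

```python
import math


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + sum(b_j * x_j)."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented normal equations (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def multiple_r(X, y):
    """Multiple correlation coefficient r of the fitted regression."""
    b = fit_ols(X, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


def jackknifed_r(X, y):
    """r obtained after deleting each observation in turn."""
    return [multiple_r(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(y))]


# (modified polarizability, first-order valence index) and observed log S for
# benzene, toluene, ethylbenzene, p-xylene, propylbenzene, bromobenzene,
# chlorobenzene, and 1,2-dichlorobenzene, copied from Table II.
X = [(-0.60, 2.00), (-0.70, 2.41), (-0.80, 2.97), (-0.80, 2.82),
     (-0.90, 3.47), (-0.60, 2.89), (-0.83, 2.47), (-1.05, 2.96)]
y = [-0.749, -1.288, -1.818, -1.733, -2.260, -1.387, -1.304, -2.036]

print(round(multiple_r(X, y), 3))
print([round(r, 3) for r in jackknifed_r(X, y)])
```

An observation whose deletion raises r noticeably above the full-set value is a candidate outlier, which is how the jackknifed r column is read in the text.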
Table III. Observed vs Predicted Solubilities

(A) Training Set B

| no. | chemical | ref | ${}^3\chi_c$ | ${}^3\chi_c^v$ | $\Phi_m$ | obsd log S | predicted log S | residue | jackknifed r |
|---|---|---|---|---|---|---|---|---|---|
| 1 | dichloromethane | 2 | 0.000 | 0.000 | -0.26 | 0.115 | 0.300 | -0.185 | 0.962 |
| 2 | trichloromethane | 2 | 0.580 | 0.840 | -0.39 | -0.101 | -0.203 | 0.102 | 0.963 |
| 3 | tetrachloromethane | 2 | 2.000 | 2.910 | -0.53 | -1.102 | -1.048 | -0.054 | 0.964 |
| 4 | chloroethane | 2 | 0.000 | 0.000 | -0.38 | -0.246 | 0.070 | -0.316 | 0.965 |
| 5 | 1,1-dichloroethane | 2 | 0.580 | 0.740 | -0.51 | -0.321 | -0.315 | -0.006 | 0.963 |
| 6 | 1,2-dichloroethane | 2 | 0.000 | 0.000 | -0.51 | -0.065 | -0.195 | 0.130 | 0.963 |
| 7 | 1,1,1-trichloroethane | 2 | 2.000 | 2.650 | -0.65 | -0.824 | -0.969 | 0.145 | 0.964 |
| 8 | 1,1,2-trichloroethane | 2 | 0.410 | 0.520 | -0.65 | -0.358 | -0.544 | 0.186 | 0.964 |
| 9 | 1,1,1,2-tetrachloroethane | 2 | 1.560 | 2.090 | -0.79 | -0.959 | -1.149 | 0.191 | 0.965 |
| 10 | 1,1,2,2-tetrachloroethane | 2 | 0.670 | 0.860 | -0.79 | -0.529 | -0.863 | 0.334 | 0.966 |
| 11 | pentachloroethane | 2 | 1.650 | 2.210 | -0.92 | -1.301 | -1.438 | 0.137 | 0.964 |
| 12 | hexachloroethane | 2 | 2.500 | 3.380 | -1.06 | -2.301 | -1.971 | -0.330 | 0.965 |
| 13 | 1-chloropropane | 2 | 0.000 | 0.000 | -0.63 | -0.577 | -0.425 | -0.151 | 0.964 |
| 14 | 2-chloropropane | 2 | 0.580 | 0.650 | -0.63 | -0.531 | -0.440 | -0.091 | 0.964 |
| 15 | 1,2-dichloropropane | 2 | 0.410 | 0.460 | -0.77 | -0.553 | -0.700 | 0.147 | 0.964 |
| 16 | 1,3-dichloropropane | 2 | 0.000 | 0.000 | -0.77 | -0.565 | -0.690 | 0.125 | 0.964 |
| 17 | 1,2,3-trichloropropane | 2 | 0.290 | 0.330 | -0.91 | -0.722 | -0.963 | 0.241 | 0.965 |
| 18 | 1-chloro-n-butane | 2 | 0.000 | 0.000 | -0.89 | -1.212 | -0.920 | -0.292 | 0.966 |
| 19 | 2-chloro-n-butane | 2 | 0.410 | 0.460 | -0.89 | -1.000 | -0.930 | -0.070 | 0.964 |
| 20 | 1,1-dichloro-n-butane | 2 | 0.410 | 0.520 | -1.02 | -1.302 | -1.269 | -0.033 | 0.964 |
| 21 | 2,3-dichloro-n-butane | 2 | 0.670 | 0.760 | -1.02 | -1.478 | -1.203 | -0.275 | 0.965 |
| 22 | 1-chloro-n-pentane | 2 | 0.000 | 0.000 | -1.14 | -1.703 | -1.415 | -0.288 | 0.965 |
| 23 | 2-chloro-n-pentane | 2 | 0.410 | 0.460 | -1.14 | -1.604 | -1.425 | -0.179 | 0.964 |
| 24 | 3-chloro-n-pentane | 2 | 0.290 | 0.330 | -1.14 | -1.604 | -1.423 | -0.180 | 0.964 |
| 25 | 1-chloro-n-hexane | 2 | 0.000 | 0.000 | -1.40 | -2.041 | -1.910 | -0.131 | 0.963 |
| 26 | 1-chloro-2-methylpropane | 2 | 0.410 | 0.410 | -0.89 | -1.034 | -0.866 | -0.168 | 0.965 |
| 27 | 2-chloro-2-methyl-n-butane | 2 | 1.560 | 1.720 | -1.14 | -1.479 | -1.399 | -0.080 | 0.964 |
| 28 | 2,3-dichloro-2-methyl-n-butane | 2 | 1.650 | 1.830 | -1.28 | -1.544 | -1.675 | 0.131 | 0.964 |
| 29 | chloroethylene | 2 | 0.000 | 0.000 | -0.65 | -0.559 | -0.460 | -0.099 | 0.964 |
| 30 | 1,1-dichloroethylene | 2 | 0.350 | 0.450 | -0.79 | -0.476 | -0.798 | 0.323 | 0.966 |
| 31 | cis-1,2-dichloroethylene | 2 | 0.000 | 0.000 | -0.79 | -0.456 | -0.725 | 0.269 | 0.965 |
| 32 | trichloroethylene | 2 | 0.290 | 0.370 | -0.92 | -0.959 | -1.050 | 0.091 | 0.964 |
| 33 | tetrachloroethylene | 2 | 0.500 | 0.640 | -1.06 | -1.824 | -1.357 | -0.466 | 0.968 |
| 34 | 3-chloro-1-propylene | 2 | 0.000 | 0.000 | -0.91 | -0.544 | -0.955 | 0.411 | 0.967 |
| 35 | cis-1,3-dichloro-1-propylene | 2 | 0.000 | 0.000 | -1.04 | -1.000 | -1.220 | 0.219 | 0.965 |
| 36 | hexachloro-1-propylene | 2 | 1.620 | 2.150 | -1.59 | -2.770 | -2.696 | -0.074 | 0.959 |
| 37 | 1,1,3,4,4-pentachloro-1,2-butadiene | 2 | 0.810 | 0.980 | -1.98 | -2.879 | -3.138 | 0.259 | 0.960 |
| 38 | hexachloro-1,3-butadiene | 2 | 0.750 | 0.930 | -2.12 | -3.387 | -3.414 | 0.027 | 0.953 |
(B) Testing Set

| no. | chemical | ref | obsd log S | predicted log S | residue |
|---|---|---|---|---|---|
| 1 | bromochloromethane | 2 | 0.170 | 0.233 | -0.063 |
| 2 | bromomethane | 2 | 0.182 | 0.477 | -0.295 |
| 3 | dibromomethane | 2 | 0.058 | 0.233 | -0.175 |
| 4 | tribromomethane | 2 | -0.455 | -0.418 | -0.037 |
| 5 | tetrabromomethane | 2 | -1.620 | -1.670 | 0.050 |
| 6 | 1-bromo-2-chloroethane | 2 | -0.166 | -0.223 | 0.057 |
| 7 | bromoethane | 2 | -0.049 | 0.021 | -0.070 |
| 8 | cis-1,2-dibromoethylene | 2 | -0.051 | -0.712 | 0.661 |
| 9 | 1,2-dibromoethane | 2 | -0.382 | -0.223 | -0.159 |
| 10 | 1,1,2,2-tetrabromoethane | 2 | -1.167 | -1.003 | -0.164 |
| 11 | 1-bromo-3-chloropropane | 2 | 0.257 | -0.680 | 0.937 |
| 12 | 1-bromopropane | 2 | -0.616 | -0.436 | -0.180 |
| 13 | 2-bromopropane | 2 | -0.511 | -0.613 | 0.102 |
| 14 | 1,2-dibromo-3-chloropropane | 2 | -1.000 | -1.013 | 0.013 |
| 15 | 1,2-dibromopropane | 2 | -0.848 | -0.783 | -0.065 |
| 16 | 1,3-dibromopropane | 2 | -0.770 | -0.680 | -0.090 |
| 17 | 1-bromobutane | 2 | -1.268 | -0.892 | -0.376 |
| 18 | 1-bromo-2-methylpropane | 2 | -1.292 | -0.991 | -0.301 |
| 19 | 1-bromo-3-methyl-n-butane | 2 | -1.699 | -1.448 | -0.251 |

to validate the above model. As can be seen in Figure 1 and Table IIIB, the agreement between the predicted and the observed values for the members of the testing set is very satisfactory. To test for chance correlations in the training model, 20 regression runs were done with the respective solubility values replaced by 20 sets of random numbers of similar
magnitude. In none of these runs did r exceed 0.585, and the average r was 0.323. In a similar fashion, using the actual solubility values and replacing the descriptors by random numbers, the maximum r was 0.417, and the average r was 0.298 for 20 regression runs. These exercises indicate that the selected descriptors are in fact an ordered set of variables representing in tandem a systematic variation of solubility with molecular structure.
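The chance-correlation check described above, in which the solubility values are replaced by random numbers and the regression is rerun, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the helper names are hypothetical, "random numbers of similar magnitude" is interpreted as uniform draws over the observed range, and only the first 12 rows of Table IIIA are used, so the maximum and average r will differ from the values quoted for the full training set.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + sum(b_j * x_j)."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def multiple_r(X, y):
    """Multiple correlation coefficient r of the fitted regression."""
    b = fit_ols(X, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


def y_randomization(X, y, runs=20, seed=0):
    """Refit with random responses; return (maximum r, average r)."""
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    rs = []
    for _ in range(runs):
        fake_y = [rng.uniform(lo, hi) for _ in y]  # assumed distribution
        rs.append(multiple_r(X, fake_y))
    return max(rs), sum(rs) / len(rs)


# (modified polarizability, 3chi_c, 3chi_c^v) and observed log S for the
# first 12 compounds of Table IIIA.
X_B = [(-0.26, 0.000, 0.000), (-0.39, 0.580, 0.840), (-0.53, 2.000, 2.910),
       (-0.38, 0.000, 0.000), (-0.51, 0.580, 0.740), (-0.51, 0.000, 0.000),
       (-0.65, 2.000, 2.650), (-0.65, 0.410, 0.520), (-0.79, 1.560, 2.090),
       (-0.79, 0.670, 0.860), (-0.92, 1.650, 2.210), (-1.06, 2.500, 3.380)]
y_B = [0.115, -0.101, -1.102, -0.246, -0.321, -0.065,
       -0.824, -0.358, -0.959, -0.529, -1.301, -2.301]

max_r, mean_r = y_randomization(X_B, y_B)
print(round(multiple_r(X_B, y_B), 3), round(max_r, 3), round(mean_r, 3))
```

The converse test in the text, which keeps the real solubilities and randomizes the descriptors, follows the same pattern with the random draws applied to the columns of `X_B` instead of `y_B`.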
Figure 1. Solubility correlation for 57 organic chemicals. Legend: testing set (19 observations).
Study of Set C. The fitted values for 50 alcohols are compared against the observed ones in Table IV. The best model for log S was found to be
$$\log S = 4.52 - 4.018\,{}^1\chi + 2.905\,{}^1\chi^v + 0.185\,{}^3\chi_p^v \tag{4}$$
n = 50; r = 0.980; $r^2$ = 0.961; adjusted $r^2$ = 0.959; SE = 0.11
Yalkowsky and Valvani used log P and melting point for the same data set and obtained a higher r of 0.994, but with a larger error of 0.178. Hall et al. (10) derived a model using connectivity indexes for 51 alcohols with ${}^1\chi$ (r = 0.978; SE = 0.455). They improved this model by deleting 13 observations out of the 51 and introducing ${}^1\chi^v$ (r = 0.9945; SE = 0.234). Further reduction in the standard error was obtained by a four-variable model with ${}^1\chi$, ${}^1\chi^v$, ${}^2\chi$, and ${}^4\chi_c^v$ (r = 0.9966; SE = 0.189). As can be seen from the last column of Table IV, only 3,3-dimethyl-1-butanol (log S = 0.509) appears to be an outlier, causing a relatively higher jackknifed r of 0.986, while all the others in the set are reasonably consistent (between 0.978 and 0.982). This suspicion was supported by a plot of the residues, where the residue relating to this
Table IV. Observed vs Predicted Solubilities of Set C

| no. | chemical | ref | ${}^1\chi$ | ${}^1\chi^v$ | ${}^3\chi_p^v$ | obsd log S | predicted log S | residue | jackknifed r |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1-butanol | 9 | 2.41 | 2.02 | 0.51 | 0.859 | 0.799 | 0.060 | 0.980 |
| 2 | 2-methyl-1-propanol | 9 | 2.27 | 1.87 | 0.37 | 0.929 | 0.899 | 0.030 | 0.979 |
| 3 | 2-butanol | 9 | 2.27 | 1.95 | 0.59 | 1.259 | 1.173 | 0.086 | 0.978 |
| 4 | 1-pentanol | 9 | 2.91 | 2.52 | 0.76 | 0.334 | 0.289 | 0.046 | 0.980 |
| 5 | 2-methyl-1-butanol | 9 | 2.80 | 2.41 | 1.00 | 0.464 | 0.455 | 0.009 | 0.980 |
| 6 | 3-methyl-1-butanol | 9 | 2.77 | 2.37 | 0.71 | 0.434 | 0.405 | 0.029 | 0.980 |
| 7 | 2,2-dimethyl-1-propanol | 9 | 2.56 | 2.16 | 0.47 | 0.524 | 0.596 | -0.072 | 0.980 |
| 8 | 2-pentanol | 9 | 2.77 | 2.45 | 0.71 | 0.634 | 0.638 | -0.003 | 0.980 |
| 9 | 3-pentanol | 9 | 2.80 | 2.48 | 0.94 | 0.704 | 0.648 | 0.057 | 0.980 |
| 10 | 3-methyl-2-butanol | 9 | 2.64 | 2.32 | 0.96 | 0.734 | 0.830 | -0.096 | 0.980 |
| 11 | 2-methyl-2-butanol | 9 | 2.56 | 2.28 | 0.87 | 1.034 | 1.017 | 0.018 | 0.979 |
| 12 | 1-hexanol | 9 | 3.41 | 3.02 | 1.01 | -0.231 | -0.222 | -0.009 | 0.980 |
| 13 | 2-methyl-1-pentanol | 9 | 3.30 | 2.91 | 1.09 | -0.101 | -0.085 | -0.017 | 0.980 |
| 14 | 4-methyl-1-pentanol | 9 | 3.27 | 2.87 | 0.94 | -0.131 | -0.108 | -0.023 | 0.980 |
| 15 | 2,2-dimethyl-1-butanol | 9 | 3.12 | 2.73 | 1.38 | -0.031 | 0.169 | -0.201 | 0.982 |
| 16 | 3,3-dimethyl-1-butanol | 9 | 3.06 | 2.66 | 0.86 | 0.509 | 0.111 | 0.398 | 0.986 |
| 17 | 2-ethyl-1-butanol | 9 | 3.34 | 2.95 | 1.41 | -0.161 | -0.070 | -0.091 | 0.981 |
| 18 | 2-hexanol | 9 | 3.27 | 2.95 | 0.98 | 0.129 | 0.131 | -0.002 | 0.980 |
| 19 | 3-hexanol | 9 | 3.30 | 2.98 | 1.09 | 0.189 | 0.119 | 0.070 | 0.981 |
| 20 | 3-methyl-2-pentanol | 9 | 3.18 | 2.86 | 1.46 | 0.269 | 0.320 | -0.052 | 0.980 |
| 21 | 4-methyl-2-pentanol | 9 | 3.12 | 2.80 | 0.81 | 0.199 | 0.268 | -0.069 | 0.981 |
| 22 | 2-methyl-3-pentanol | 9 | 3.18 | 2.86 | 1.18 | 0.299 | 0.269 | 0.030 | 0.980 |
| 23 | 3,3-dimethyl-2-pentanol | 9 | 2.94 | 2.62 | 1.25 | 0.369 | 0.549 | -0.180 | 0.981 |
| 24 | 2-methyl-2-pentanol | 9 | 3.06 | 2.78 | 0.86 | 0.499 | 0.460 | 0.039 | 0.980 |
| 25 | 3-methyl-3-pentanol | 9 | 3.12 | 2.84 | 1.52 | 0.619 | 0.515 | 0.104 | 0.980 |
| 26 | 2,3-dimethyl-2-butanol | 9 | 2.94 | 2.66 | 1.41 | 0.599 | 0.695 | -0.096 | 0.980 |
| 27 | 2,4-dimethyl-2-pentanol | 9 | 3.41 | 3.14 | 0.91 | 0.104 | 0.108 | -0.003 | 0.980 |
| 28 | 1-heptanol | 9 | 3.91 | 3.52 | 1.26 | -0.766 | -0.732 | -0.033 | 0.979 |
| 29 | 2,2-dimethyl-1-pentanol | 9 | 3.62 | 3.23 | 1.35 | -0.456 | -0.393 | -0.063 | 0.980 |
| 30 | 2,4-dimethyl-1-pentanol | 9 | 3.66 | 3.27 | 1.18 | -0.536 | -0.469 | -0.067 | 0.980 |
| 31 | 4,4-dimethyl-1-pentanol | 9 | 3.56 | 3.16 | 1.08 | -0.486 | -0.405 | -0.080 | 0.980 |
| 32 | 2-heptanol | 9 | 3.77 | 3.45 | 1.22 | -0.486 | -0.381 | -0.105 | 0.980 |
| 33 | 3-heptanol | 9 | 3.80 | 3.48 | 1.36 | -0.376 | -0.388 | 0.013 | 0.980 |
| 34 | 4-heptanol | 9 | 3.80 | 3.48 | 1.24 | -0.336 | -0.410 | 0.075 | 0.980 |
| 35 | 5-methyl-2-hexanol | 9 | 3.62 | 3.30 | 1.16 | -0.316 | -0.225 | -0.091 | 0.980 |
| 36 | 2-methyl-2-hexanol | 9 | 3.56 | 3.28 | 1.14 | -0.256 | -0.045 | -0.210 | 0.982 |
| 37 | 2,2-dimethyl-3-pentanol | 9 | 3.48 | 3.16 | 1.38 | -0.096 | -0.028 | -0.067 | 0.981 |
| 38 | 2,4-dimethyl-3-pentanol | 9 | 3.55 | 3.23 | 1.36 | -0.166 | -0.110 | -0.056 | 0.980 |
| 39 | 2-methyl-2-hexanol | 9 | 3.56 | 3.28 | 1.14 | -0.026 | -0.045 | 0.020 | 0.980 |
| 40 | 3-methyl-3-hexanol | 9 | 3.62 | 3.34 | 1.55 | 0.064 | -0.036 | 0.101 | 0.981 |
| 41 | 2,3-dimethyl-2-pentanol | 9 | 3.48 | 3.20 | 1.81 | 0.154 | 0.167 | -0.013 | 0.982 |
| 42 | 2,3-dimethyl-3-pentanol | 9 | 3.48 | 3.16 | 1.38 | 0.204 | -0.028 | 0.233 | 0.982 |
| 43 | 3-ethyl-3-pentanol | 9 | 3.68 | 3.40 | 1.97 | 0.194 | -0.026 | 0.220 | 0.981 |
| 44 | 2,3,3-trimethyl-2-butanol | 9 | 3.25 | 2.97 | 1.83 | 0.344 | 0.427 | -0.083 | 0.978 |
| 45 | 1-octanol | 9 | 4.41 | 4.02 | 1.51 | -1.256 | -1.243 | -0.013 | 0.979 |
| 46 | 2-ethyl-1-hexanol | 9 | 4.34 | 3.95 | 1.80 | -0.996 | -1.111 | 0.115 | 0.979 |
| 47 | 2-octanol | 9 | 4.27 | 3.95 | 1.47 | -0.976 | -0.891 | -0.085 | 0.980 |
| 48 | 3-methyl-2-heptanol | 9 | 4.18 | 3.86 | 1.83 | -0.606 | -0.724 | 0.118 | 0.980 |
| 49 | 3-methyl-3-heptanol | 9 | 4.12 | 3.84 | 1.83 | -0.486 | -0.541 | 0.055 | 0.980 |
| 50 | 2,2,3-trimethyl-3-pentanol | 9 | 3.81 | 3.53 | 2.30 | -0.156 | -0.109 | -0.047 | 0.980 |
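As a worked check of eq 4 against Table IV, the sketch below evaluates the alcohol model for two tabulated compounds. The function name `predict_log_s_alcohol` is hypothetical; the coefficients are those of eq 4 and the descriptor values are copied from Table IV, where they are rounded to two decimals, so the last digit can differ from the tabulated prediction.

```python
def predict_log_s_alcohol(chi1, chi1_v, chi3p_v):
    """log S from eq 4 using the first-order simple and valence indexes
    and the third-order valence path index."""
    return 4.52 - 4.018 * chi1 + 2.905 * chi1_v + 0.185 * chi3p_v


# (1chi, 1chi^v, 3chi_p^v) from Table IV
butanol = (2.41, 2.02, 0.51)   # 1-butanol
hexanol = (3.41, 3.02, 1.01)   # 1-hexanol

print(round(predict_log_s_alcohol(*butanol), 3))  # 0.799 (Table IV: 0.799)
print(round(predict_log_s_alcohol(*hexanol), 3))  # -0.221 (Table IV: -0.222)
```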
compound was found to be four standard deviations away from the mean of the residues. A review of the literature revealed another observation for this chemical: log S = -0.116 (14), which was also the value used by Hall et al. (10). When this value was substituted, r increased to 0.985 and SE dropped to 0.09.
Another form of the jackknife test was applied to this data set, in which a random number of observations were deleted at a time and a randomly selected number of regression runs were done, with different members deleted every time within each series, to check the validity of the correlation. To maintain stability, the number of deletions was kept below 10% of the total number of observations (8). Results shown in Table V indicate that the model is very stable statistically and that it is not due to a chance correlation.
When the three data sets were combined and tested for correlation with the modified polarizability $\Phi_m$, the following equations were obtained:
set AB
$$\log S = 1.512 - 0.619\,{}^1\chi + 0.996\,\Phi_m \tag{5}$$
n = 95; r = 0.944; $r^2$ = 0.890; adjusted $r^2$ = 0.885; SE = 0.347
$$\Phi_m = -0.131(\text{no. of C}) + 0.082(\text{no. of H})$$
Figure 2. Comparison between observed and fitted log S (horizontal axis: fitted log S).
Table V. Summary of Results of Random Deletion Test no. of no. of cases deleted regression series (