Environ. Sci. Technol. 1992, 26, 1560-1567

Molecular Topology/Fragment Contribution Method for Predicting Soil Sorption Coefficients

William Meylan and Philip H. Howard
Chemical Hazard Assessment Division, Syracuse Research Corporation, Merrill Lane, Syracuse, New York 13210

Robert S. Boethling*

Office of Pollution Prevention and Toxics (TS-798), U.S. Environmental Protection Agency, 401 M Street SW, Washington, D.C. 20460


The first-order molecular connectivity index (¹χ) has been successfully used to predict soil sorption coefficients (Koc) for nonpolar organics, but extension of the model to polar compounds has been problematic. To address this, we developed a new estimation method based on ¹χ and a series of statistically derived fragment contribution factors for polar compounds. After developing an extensive database of measured Koc values, we divided the dataset into a training set of 189 chemicals and an independent validation set of 205 chemicals. Two linear regressions were then performed. First, measured log Koc values for nonpolar compounds in the training set were correlated with ¹χ. Second, for the polar compounds, the deviations between measured log Koc and the log Koc estimated with the nonpolar equation were regressed against the number of occurrences of certain structural fragments. The final equation for predicting log Koc accounts for 96% and 86% of the variation in measured values for the training and validation sets, respectively. Results also show that the model outperforms, and covers a wider range of chemical structures than, models based on the octanol/water partition coefficient (Kow) or water solubility.

Introduction

The soil/sediment adsorption partition coefficient normalized to organic carbon, Koc, is an important parameter in environmental fate assessment for organic chemicals (1). This parameter indicates the extent to which a chemical partitions between the solid and solution phases in soil, or between water and sediment in aquatic ecosystems, and hence whether a chemical is likely to leach through soil or to be immobile. It is important to know the value of Koc for both new and existing chemicals under the Toxic Substances Control Act (TSCA). However, such data are often missing for these chemicals (in 12 years, for example, the EPA has never received a measured Koc value with a new chemical submission), and estimation methods must be employed to supply the missing parameters.

Traditional methods for estimating Koc are based on statistical relationships between Koc and other properties, especially the octanol/water partition coefficient (Kow), water solubility, and bioconcentration factor (1-3). Computerized chemical property estimation programs such as AUTOCHEM and CHEMEST (4) generally estimate Koc from Kow. Koc can also be estimated directly from chemical structure (5-9). Predicting Koc from chemical structure has advantages over methods based on Kow or water solubility (5, 6). For example, the predictive accuracy of methods using Kow or water solubility cannot exceed the accuracy of the input data, and because published values of water solubility and log Kow can vary greatly for the same compound (6), it may be difficult or impossible to find accurate experimental values. Moreover, most estimation equations that use Kow or water solubility are limited to specific classes of chemicals. For example, regression equations are available for polynuclear aromatics, dichlorinated aromatics, triazines, dinitroaniline herbicides, chlorinated hydrocarbons, and other classes (1-3). But which equation should an estimator use when a compound falls into more than one class, or into none of them? Estimation equations applicable to widely diverse structures are available (1, 10), but their r² values are low (on the order of 0.7), and large estimation errors are likely. Estimating Koc from an estimated Kow or water solubility also introduces an additional error into any correlation equation.

Sabljic (5, 6) developed a method of estimating soil sorption coefficients solely from chemical structure. The model is based upon a linear relationship between Kom, the "organic matter" partition coefficient, and the simple first-order molecular connectivity index (MCI), ¹χ, plus an empirically determined correction factor for polar compounds. More recent work confirms the usefulness of ¹χ as an index of the tendency of nonpolar organics to bind to soil organic matter (8), but the correction factors for polar compounds have not proved satisfactory, and the method has never been validated with an independent set of chemicals.

The primary purpose of this study was to develop a Koc estimation method based on ¹χ and a new series of fragment correction factors for polar compounds (the MCI/fragment contribution method), derived from experimental Koc data by multiple linear regression. We also developed an updated and greatly expanded database of measured Koc values and used it to perform an independent validation of the method.
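The first-order index itself is simple to compute. As a minimal sketch, assuming the standard (Kier-Hall) simple index ¹χ = Σ (δiδj)^(-1/2), summed over all bonds of the hydrogen-suppressed structure, where δ is the number of non-hydrogen neighbours of an atom, ¹χ can be obtained from a bond list. The function name `first_order_chi` and the integer-labelled bond-list input are illustrative choices, not part of PCKOC (which works from SMILES); bond order is ignored, as in the simple index.

```python
import math


def first_order_chi(bonds):
    """Simple first-order molecular connectivity index (1chi).

    Assumes the Kier-Hall simple index: `bonds` lists (i, j) pairs of
    bonded non-hydrogen atoms (hydrogen-suppressed graph); bond order
    is ignored.
    """
    # delta: number of non-hydrogen neighbours of each atom
    delta = {}
    for i, j in bonds:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    # Sum (delta_i * delta_j) ** -0.5 over every bond
    return sum(1.0 / math.sqrt(delta[i] * delta[j]) for i, j in bonds)


# Phenol: ring carbons 0-5, hydroxyl oxygen 6 attached to carbon 0
phenol = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
ethanol = [(0, 1), (1, 2)]  # C-C-O

print(round(first_order_chi(phenol), 3))   # 3.394
print(round(first_order_chi(ethanol), 3))  # 1.414
```

These two results match the MCI values tabulated for phenol and ethanol in the validation tables (3.394 and 1.414), which is a useful check that the tabulated ¹χ is the simple, not the valence, index.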

0013-936X/92/0926-1560$03.00/0 © 1992 American Chemical Society

Methods

Collection of Koc Data. Available compilations (5-8, 10, 11) were used as starting points for developing a log Koc database. Additional references to papers containing soil sorption data were obtained primarily from Syracuse Research Corporation's Environmental Fate Data Base (EFDB) (12, 13). Nearly 800 experimental Koc values were eventually located for entry into the database. Many chemicals in the database have Koc values from more than one source; benzene, for example, has nine different Koc values from as many sources. Although an effort was made to minimize duplication of Koc values, some redundancy undoubtedly exists. In various instances, experimental sorption data required conversion to Koc values, usually by dividing Freundlich sorption coefficients by the fraction of organic carbon in the soil. In addition, some sources report sorption coefficients normalized to organic matter content rather than organic carbon content; in such cases, the relationship Koc = 1.724 Kom (1) was used to convert Kom to Koc.

Selection of Koc Values for Regression and Validation Analyses. Model development and validation require a single log Koc value for each compound. Usually the median value was used when more than one Koc value was available for a given compound. However, we excluded some Koc values based on the following considerations: (i) Values from certain sources were consistently high or low when compared to other database values for the same chemicals; in such cases, values from sources that are normally consistent with other sources were selected over values from sources that are consistently high or low. (ii) Koc data based on high-mineral soils or sediments with very low carbon content were not used when values from more common soils were available. (iii) Koc values for salts (e.g., sodium salts of organic acids) were excluded.

Initially, data on 320 discrete organics were available from the experimental database. Prior to model development, these chemicals were divided into a training set of 189 chemicals and an independent validation set of 131 chemicals. The training set was further divided into a nonpolar set of 64 organics and a polar set of 125 organics. Both the training and validation sets were given a diverse mix of simple and complex structures, but many of the larger and more complex structures were placed in the validation set. Data on an additional 74 discrete organics were located subsequently, and these chemicals were placed in the validation set, creating a combined validation set of 205 chemicals. Compilations of the Koc values and references for both the training and validation sets are available from the authors.

Nonpolar Regression. Two separate regression analyses were performed. The first related log Koc of nonpolar compounds to the first-order MCI, ¹χ. Measured values of log Koc were used in a simple linear equation of the form log Koc = a(¹χ) + b, where a and b are coefficients fitted by least-squares analysis. All but 2 of the 64 chemicals used in the nonpolar regression were used by Sabljic (6) in his regression of log Kom vs ¹χ for 72 nonpolar compounds (we added benzene and lindane). The nonpolar training set includes a variety of halogenated and unhalogenated aromatics, polycyclic aromatic hydrocarbons, halogenated aliphatics, and phenols.
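The unit conversions and the single-value rule described under Collection and Selection can be sketched as follows. This is a minimal illustration, assuming the coefficients are already in consistent units (e.g., L/kg) and that exclusions (i)-(iii) have been applied by hand beforehand, since they are judgment calls rather than formulas; the helper names and input values are hypothetical.

```python
from statistics import median

# Conversion factor from the text: Koc = 1.724 * Kom (ref 1)
OM_TO_OC = 1.724


def koc_from_sorption_coefficient(k, f_oc):
    """Normalize a soil sorption coefficient to organic carbon: Koc = K / f_oc.

    As in the text, a Freundlich coefficient is treated like a linear Kd;
    strictly, the two coincide only when the Freundlich exponent is 1.
    """
    if not 0.0 < f_oc <= 1.0:
        raise ValueError("f_oc must be a fraction in (0, 1]")
    return k / f_oc


def koc_from_kom(kom):
    """Convert an organic-matter-normalized coefficient (Kom) to Koc."""
    return OM_TO_OC * kom


def representative_log_koc(accepted_log_koc):
    """One value per compound: the median of the accepted log Koc values."""
    return median(accepted_log_koc)


# Hypothetical inputs, for illustration only
print(koc_from_sorption_coefficient(2.0, 0.02))  # about 100
print(koc_from_kom(100.0))                       # about 172.4
print(representative_log_koc([1.8, 2.1, 2.0]))   # 2.0
```

The median is taken on log Koc values, which makes the choice robust to a single source that is consistently high or low, the situation consideration (i) addresses.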
Several classes of compounds considered nonpolar for purposes of model development require explanation. Phenols, polynuclear aromatic amines (e.g., 2-aminoanthracene), and polynuclear aromatics containing a single ring nitrogen (e.g., acridine) are treated as nonpolar, although they can ionize at environmental pH. Sorption of these compounds varies with the degree of dissociation (14-16). For example, the apparent log Koc of 2,3,4,6-tetrachlorophenol can vary from 2.59 (dissociated) to 3.90 (nondissociated) (14), depending upon the ratio of ionized to un-ionized compound. Nevertheless, most Koc values in the database suggest that, for these compounds, log Koc is adequately described by ¹χ alone. Compounds in these classes were therefore included in the nonpolar regression.

Polar Regression. The second regression was performed as follows. A 125 by 27 matrix was constructed, in which the 125 rows represented the 125 chemicals in the polar training set, and the 27 columns represented 26 preselected structural fragments plus the solution (dependent variable). The solution column contained the deviations between measured log Koc and log Koc predicted by the nonpolar regression equation, whereas counts of the structural fragments were entered into the other columns: a zero whenever an individual chemical did not contain a specific fragment, and otherwise the number of occurrences of that fragment in the molecule. Most of the 125 chemicals in the polar training set contain more than one type of fragment. The 26 fragments used in the regression are listed in Table I, along with the coefficients fitted by the regression, which are the polarity or fragment correction factors (Pf values). Several fragments were suggested by the polarity correction factors in ref 6, but we modified these fragments to allow more general application. However, many new fragments were also identified by studying how the deviations between experimental Koc values and Koc values predicted by the nonpolar regression equation varied with chemical structure. We stipulated that, to be considered as a possible fragment in the regression analysis, a fragment had to be present in at least two chemicals in the data set.

To be considered an aromatic organophosphorus fragment, a compound has to contain the P=O group and an aromatic group; the aliphatic organophosphorus fragment must contain P=O but cannot contain an aromatic group. An aromatic ether must have an aromatic carbon attached to at least one side of the oxygen; if both sides are aromatic carbons, the oxygen cannot be a member of a ring (e.g., tetrachlorodibenzodioxin). Several fragments are counted only once, no matter how many times they occur in a structure, because the regression analyses indicated that counting these fragments only once yields the best results. These include the organophosphorus, nitrogen to nonfused aromatic ring, organic acid, and nitrile/cyanide fragments. The aliphatic organophosphorus fragment is given special treatment: it is the only fragment counted, even if other fragments are present.

Table I. Fragments and Coefficients (Polarity Correction Factors, Pf) from Polar Regression

no.  fragment                                  Pf          frequency in training set^a
     with N
 1   azo                                       -1.028       2
     with N, C
 2   nitrile/cyanide                           -0.722^b     2
 3   nitrogen to noncyclic aliphatic carbon    -0.124^c    44
 4   nitrogen to cycloalkane                   -0.822       3
 5   nitrogen to nonfused aromatic ring        -0.777^b,c  69
 6   pyridine ring with no other fragments     -0.700^d     2
 7   aromatic ring with 2 nitrogens            -0.965       2
 8   triazine ring                             -0.752       7
     with N, O
 9   nitro                                     -0.632      18
     with N, C, O
10   urea (N-CO-N)                             -0.922      26
11   acetamide, aliphatic carbon (N-CO-C)      -0.811      11
12   uracil (-N-CO-N-CO-C=C ring)              -1.806       3
13   N-CO-O-N=                                 -1.920       4
14   carbamate (N-CO-O-phenyl)                 -2.002       2
15   carbamate (N-CO-O or N-CO-S)              -1.025       2
     with C, O
16   ether, aromatic                           -0.643^e     8
17   ether, aliphatic                          -1.264       2
18   ketone                                    -1.248       4
19   ester                                     -1.309      12
20   aliphatic alcohol                         -1.519       3
21   carboxylic acid                           -1.751^b     2
22   carbonyl                                  -1.200^f     -
     with P, O
23   organophosphorus [P=O], aliphatic         -1.698^g     4
24   organophosphorus [P=O], aromatic          -2.879^b    10
     with P, S
25   organophosphorus [P=S]                    -1.263^b     4
     with C, S
26   thiocarbonyl                              -1.100       4
     with S, O
27   sulfone                                   -0.995       6

^a Number of compounds in the training set containing the fragment. ^b Counted only once per structure, regardless of the number of occurrences. ^c Any nitrogen attached by a double bond is not counted; carbonyl and thiocarbonyl are not counted as carbons. ^d Counted only when no other fragments in this list are present. ^e Either one or both carbons aromatic; if both carbons aromatic, cannot be cyclic. ^f Not included in regression derivation; estimated from other carbonyl fragments; counted only when no other carbonyl-containing fragments are present. ^g This is the only fragment counted, even if other fragments are present.

Software. All statistical operations were performed with Co-Stat Statistical Software (17) on an IBM-compatible 25-MHz 386 MS-DOS computer with a numerical math coprocessor. Experimental log Kow values were taken from Hansch and Leo (18); recommended values were used when available. In the absence of experimental data, log Kow values were calculated with the program CLOGP3.3, accessed via the U.S. EPA's PCGEMS (GEMS, Graphical Exposure Modelling System) software. The PCCHEM (PC version of AUTOCHEM) program, in PCGEMS, was used to calculate log Koc from log Kow. The molecular connectivity index (MCI)/fragment contribution method has been encoded in a computer program, PCKOC, designed to run on IBM and IBM-compatible personal computers. The only input required to estimate Koc is the chemical structure of the compound, which is entered in SMILES (simplified molecular identification and line entry system) notation (19). The program automatically determines ¹χ and which (if any) of the designated fragments are present. The output includes ¹χ, the applicable fragments and correction factors, the noncorrected and corrected log Koc, and the estimated Koc.

Results and Discussion

Koc Model. The equation derived by the nonpolar regression is

$$\log K_{oc} = 0.53\,{}^{1}\chi + 0.62 \qquad (1)$$

(n = 64, r = 0.978, SD = 0.267, ME = 0.211, F = 1371)
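Both regressions are ordinary least-squares fits, and one way to reproduce them is sketched below. It assumes a plain line fit for eq 1 and, for the polar step, a fit through the origin (no intercept), since the fitted coefficients are used directly as Pf values; the paper does not say how Co-Stat was configured, so the no-intercept choice is an assumption. The toy data are hypothetical and built from known coefficients so the fit should recover them. In real use, each deviation would be measured log Koc minus the eq 1 estimate, and the count matrix would follow the counting rules given under Polar Regression.

```python
def fit_line(x, y):
    """Least-squares fit of y = a*x + b (the form of eq 1); returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def solve_linear(A, b):
    """Solve A p = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (M[r][n] - tail) / M[r][r]
    return p


def fit_fragment_factors(counts, deviations):
    """Least squares through the origin (assumed): deviation ~ sum_k Pf_k * count_k.

    `counts` has one row per polar compound and one column per fragment;
    `deviations` are measured log Koc minus the nonpolar (eq 1) estimate.
    """
    k = len(counts[0])
    # Normal equations: (X^T X) p = X^T y
    xtx = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * d for row, d in zip(counts, deviations))
           for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical toy data built from known coefficients
chi = [2.0, 4.0, 6.0, 8.0]
a, b = fit_line(chi, [0.53 * c + 0.62 for c in chi])

counts = [[1, 0], [0, 1], [2, 0], [1, 1]]  # two fragment columns
deviations = [-0.6, -1.3, -1.2, -1.9]      # built from Pf = [-0.6, -1.3]
pf = fit_fragment_factors(counts, deviations)
print(round(a, 3), round(b, 3), [round(v, 3) for v in pf])
```

Solving the normal equations directly is adequate for a problem of this size (26 fragment columns); a QR- or SVD-based solver would be more robust if some fragments were nearly collinear.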

The number of chemicals in the dataset (n), correlation coefficient (r), standard deviation (SD), mean error (ME), and F statistic are listed in parentheses. The parameter values in eq 1 are very close to the revised values reported by Sabljic (20) if Kom is converted to Koc using the relationship Koc = 1.724 Kom (1). The coefficient of multiple determination (R²) for the polar regression is 0.988, which indicates the proportion of total variation (differences between experimental log Koc and log Koc predicted by the nonpolar regression equation) that is explained by the polar regression. Summary statistics for the correlation between measured and calculated log Koc for the polar organics are r = 0.958, SD = 0.210, ME = 0.168, and F = 311 (n = 125).

Combination of eq 1 with the correction factors from Table I yields eq 2, which is the general equation used to estimate the log Koc of any compound:

$$\log K_{oc} = 0.53\,{}^{1}\chi + 0.62 + \sum P_f N \qquad (2)$$

where ΣPfN is the sum of the products of all applicable correction factors from Table I multiplied by the number of times (N) that each fragment occurs in the structure, except for the organophosphorus, nitrogen to nonfused aromatic ring, organic acid, and nitrile/cyanide fragments, which are counted only once. In addition, the aliphatic organophosphorus fragment is the only fragment counted in structures containing it. The combined training set includes 189 compounds. Summary statistics for the correlation of experimental vs calculated log Koc for the 189 compounds are r = 0.977, r² = 0.955, SD = 0.230, and ME = 0.182.

Validation. To be effective, an estimation method must be capable of making accurate predictions for chemicals not included in the training set. Therefore, data on 131 compounds were set aside before model development for use in validation, and 74 additional compounds were later added to the validation set. The validation data set included a diverse selection of chemical structures that we expected would rigorously test predictive accuracy. It contained many chemicals that are similar in structure to chemicals in the training set, but also many chemicals that differ from, and are structurally more complex than, chemicals in the training set. Table II lists the experimental and calculated log Koc values for the nonpolar validation chemicals, and Table III lists the same data for the polar validation chemicals (except for chemicals with residuals of ≥ ±1.00, which are listed in Table IV).

Table II. Validation Set: Chemicals with No Correction Factors^a

                                    log Koc
compound                            exp^b   calc^c  MCI^d
biphenyl                            3.27    3.80    5.966
2,2',4-PCB                          4.84    4.44    7.182
2,2',5-PCB                          4.57    4.44    7.182
benzo[a]pyrene                      5.95    5.90    9.916
benz[a]anthracene                   5.30    5.36    8.916
fluorene                            3.85    4.05    6.449
fluoranthene                        4.62    4.85    7.949
phenol                              2.40    2.43    3.394
2-chlorophenol                      2.60    2.65    3.805
3-chlorophenol                      2.54    2.64    3.788
4-bromophenol                       2.41    2.64    3.788
3,4-dichlorophenol                  3.09    2.86    4.198
3,5-dimethylphenol                  2.83    2.85    4.182
2,3,5-trimethylphenol               3.61    3.07    4.609
p-cresol                            2.70    2.64    3.788
5-indanol                           3.40    3.21    4.860
styrene                             2.96    2.71    3.932
o-xylene                            2.25    2.65    3.805
n-propylbenzene                     2.87    2.98    4.432
aldrin                              4.69    5.02    8.276
α-BHC (benzene hexachloride)        3.30    3.53    5.464
β-BHC (benzene hexachloride)        3.50    3.53    5.464
α-chlordane                         4.77    4.94    8.114
mirex                               6.00    5.67    9.500
1,3-dichloropropene                 1.75    1.91    2.414
iodobenzene                         3.10    2.43    3.394
2-aminoanthracene                   4.50    4.52    7.327
1-methylnaphthalene                 3.36    3.48    5.377
1-ethylnaphthalene                  3.78    3.77    5.915
2-ethylnaphthalene                  3.76    3.76    5.898
1-naphthylamine                     3.51    3.48    5.377
6-aminochrysene                     5.20    5.59    9.343
carbazole                           3.40    4.05    6.449
7H-dibenzo[c,g]carbazole            6.03    6.16    10.416
13H-dibenzo[a,i]carbazole           6.02    6.16    10.416
quinoline                           3.10    3.26    4.966
acridine                            4.11    4.31    6.933
benzo[f]quinoline                   4.64    4.32    6.949
benzo[b]thiophene                   3.49    3.00    4.466

^a Chemicals with residuals of ≥ ±1.00 are listed in Table IV. ^b log Koc derived from experimental measurements. ^c log Koc calculated from eq 2. ^d First-order molecular connectivity index (¹χ).

Statistical performance for calculated vs experimental log Koc for the validation set was as follows:

                                  n       r      r²      SD      ME
nonpolar validation chemicals    41   0.930   0.865   0.444   0.269
polar validation chemicals      164   0.895   0.801   0.465   0.363
combined validation data set    205   0.925   0.856   0.462   0.344
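The summary statistics above can be computed from paired experimental and calculated values. This sketch assumes that r is Pearson's correlation coefficient, that ME is the mean absolute residual, and that SD is the sample standard deviation of the residuals; the paper names SD and ME but does not define them, so those two definitions are assumptions. The five pairs are the first five rows of Table II, used only to exercise the function.

```python
import math
from statistics import mean, stdev


def validation_stats(exp, calc):
    """Summary statistics for calculated vs experimental log Koc.

    Assumed definitions: r is Pearson's correlation coefficient, ME is the
    mean absolute residual, and SD is the sample standard deviation of the
    residuals (exp - calc).
    """
    mx, my = mean(exp), mean(calc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exp, calc))
    sxx = sum((x - mx) ** 2 for x in exp)
    syy = sum((y - my) ** 2 for y in calc)
    r = sxy / math.sqrt(sxx * syy)
    residuals = [x - y for x, y in zip(exp, calc)]
    return {
        "n": len(exp),
        "r": r,
        "r2": r * r,
        "SD": stdev(residuals),
        "ME": mean(abs(d) for d in residuals),
    }


# First five rows of Table II (experimental, calculated log Koc)
exp = [3.27, 4.84, 4.57, 5.95, 5.30]
calc = [3.80, 4.44, 4.44, 5.90, 5.36]
stats = validation_stats(exp, calc)
print({key: round(value, 3) for key, value in stats.items()})
```

The reported r² values are simply the squares of the r values, which the function makes explicit.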

Figure 1 illustrates the correlation between calculated and experimental log Koc for the combined validation data set. These results affirm the predictive accuracy of the MCI/fragment contribution method.

[Figure 1: scatter plot of calculated vs experimental log Koc; x axis, Log Koc - Experimental (0 to 8).]
Figure 1. log Koc calculated by eq 2 vs experimental log Koc for the validation data set.
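The calculated values plotted in Figure 1 come from eq 2. A minimal sketch of that calculation follows, assuming ¹χ and the fragment counts are already known (PCKOC detects both from SMILES, which is not attempted here). Only a subset of the Table I factors is included, and the dictionary keys are hypothetical labels. The counting rules coded are the ones stated in the text: the organophosphorus, nitrogen to nonfused aromatic ring, organic acid, and nitrile/cyanide fragments count once, and the aliphatic organophosphorus fragment, when present, is the only one counted. The pyridine and carbonyl special cases (Table I, footnotes d and f) are not handled.

```python
# Subset of Table I: key -> (Pf, counted only once per structure?)
# Keys are illustrative labels, not names used by PCKOC.
PF = {
    "nitrile": (-0.722, True),
    "N_to_nonfused_aromatic_ring": (-0.777, True),
    "nitro": (-0.632, False),
    "urea": (-0.922, False),
    "ether_aromatic": (-0.643, False),
    "ester": (-1.309, False),
    "aliphatic_alcohol": (-1.519, False),
    "carboxylic_acid": (-1.751, True),
    "P=O_aliphatic": (-1.698, True),
    "P=O_aromatic": (-2.879, True),
    "P=S": (-1.263, True),
}


def log_koc_eq2(chi1, fragment_counts):
    """Eq 2: log Koc = 0.53 * 1chi + 0.62 + sum(Pf * N).

    `fragment_counts` maps PF keys to the number of occurrences in the
    structure. Pyridine and carbonyl special cases are not handled.
    """
    present = {f: n for f, n in fragment_counts.items() if n > 0}
    if "P=O_aliphatic" in present:
        # The aliphatic organophosphorus fragment is the only one counted.
        present = {"P=O_aliphatic": 1}
    correction = 0.0
    for fragment, n in present.items():
        pf, counted_once = PF[fragment]
        correction += pf * (1 if counted_once else n)
    return 0.53 * chi1 + 0.62 + correction


# 4-nitrobenzoic acid (Table III): 1chi = 5.626, one nitro, one acid group
print(round(log_koc_eq2(5.626, {"nitro": 1, "carboxylic_acid": 1}), 2))  # 1.22
# Phthalic acid: the acid fragment counts once despite two COOH groups
print(round(log_koc_eq2(5.626, {"carboxylic_acid": 2}), 2))  # 1.85
```

The first result reproduces the calculated value listed for 4-nitrobenzoic acid in Table III. Other entries can differ from the tabulated values by a few hundredths (phthalic acid is listed as 1.87), presumably because the tabulated values were computed with unrounded regression coefficients.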

Comparison to Other Estimation Methods. Numerous regression equations have been developed that correlate log Koc and log Kow, but most of these were developed for specific classes of compounds (1-3). Equation 3 (1),

$$\log K_{oc} = 0.544 \log K_{ow} + 1.377 \qquad (3)$$

is the only equation implemented in the widely used chemical property estimation programs AUTOCHEM and PCCHEM (PC version of AUTOCHEM) for estimating log Koc (21). Equations 3 and 4,

$$\log K_{oc} = -0.55 \log \mathrm{WSol} + 3.64 \qquad (4)$$

where WSol is water solubility in mg/L, are the equations most frequently selected by users of CHEMEST, a mainframe system for estimating chemical properties that is based on the methods described in Lyman et al. (1). Both equations were derived from Koc data on a wide variety of compounds (most of them pesticides). These equations are often preferred in chemical assessment activities because it may be unclear which, if any, of the class-specific equations are applicable to a given structure. For example, chemicals for which premanufacture notices (PMNs) must be submitted under TSCA typically do not fall conveniently into any of the classes for which class-specific estimation methods exist. Summary statistics for the correlation between calculated and experimental log Koc are as follows:

                n       r      r²      SD      ME
eq 3          202   0.788   0.621   0.759   0.653
eq 4          122   0.847   0.718   0.661   0.608
this paper    205   0.925   0.856   0.462   0.344
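For comparison, eqs 3 and 4 are one-line functions of a measured or estimated property. The sketch below (helper names hypothetical, inputs illustrative) also makes concrete the point raised in the Introduction: any error in the input propagates directly, so an error of 1 log unit in log Kow shifts the eq 3 estimate by 0.544 log unit.

```python
import math


def log_koc_eq3(log_kow):
    """Eq 3: log Koc = 0.544 log Kow + 1.377 (illustrative helper)."""
    return 0.544 * log_kow + 1.377


def log_koc_eq4(wsol_mg_per_l):
    """Eq 4: log Koc = -0.55 log WSol + 3.64, WSol in mg/L (illustrative helper)."""
    if wsol_mg_per_l <= 0:
        raise ValueError("water solubility must be positive")
    return -0.55 * math.log10(wsol_mg_per_l) + 3.64


print(log_koc_eq3(2.0))     # about 2.465
print(log_koc_eq4(1000.0))  # about 1.99
# An error of 1 log unit in log Kow moves the eq 3 estimate by 0.544
print(round(log_koc_eq3(3.0) - log_koc_eq3(2.0), 3))  # 0.544
```

Unlike eq 2, both functions need an input property that may itself be missing or uncertain, which is why three compounds could not be evaluated with eq 3 and 83 with eq 4.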

Log Koc could not be calculated with eq 3 for three chemicals in the dataset because experimental log Kow values were not found and CLOGP3.3 produced incomplete estimates owing to missing fragments. For eq 4, 83 of the 205 compounds were excluded because their water solubilities were not readily available from the literature. Most of the water solubility data used in these calculations were taken from common sources such as the Pesticide Manual (22) and the Arizona Database of Aqueous Solubility (23). Under these conditions, the MCI/fragment contribution method is clearly a much better predictor of Koc, as demonstrated not only by its higher correlation coefficient but also by its lower standard deviation and mean error.

Sabljic (5) was the first to show that soil sorption potential was highly correlated with ¹χ for nonpolar compounds, and the first to develop a series of polarity correction factors (6). Revised values for the correction factors were subsequently published (20). However, for two principal reasons, we did not attempt to calculate Koc (or Kom) for our validation set using his methods. First, many of the chemical classes for which polarity correction factors were developed are too vaguely defined to be useful. Second, for many chemicals in our validation set it was difficult to determine to which group of polar compounds the chemical should be assigned, since most contained more than one type of fragment. It should be noted that for many polar compounds in the validation set, no correction factors are available in refs 6 or 20.

Outliers. The experimental database contains 19 chemicals that yield estimation errors of 1 order of magnitude or greater when Koc is predicted by the MCI/fragment contribution method. Sabljic (6) excluded three of these outliers (1,2,3,4- and 1,2,4,5-tetrachlorobenzene and 2,3,4,5-tetrachlorophenol) because the laboratories reporting the measurements tended to report Koc values that were consistently higher than those reported by other investigators. We also excluded those compounds. Five other compounds (3,6-dichloropicolinic acid, malathion, methylamine, dimethylamine, and trimethylamine) were excluded because the accuracy of their experimental Koc values is questionable. In the case of malathion, for example, Rf values of 0.8-0.91 from tests using a variety of soils (24, 25) and the recommended log Kow of 2.36 (18) both suggest moderate to high mobility, in contrast to the sole log Koc value of 3.25 (11). The other 11 compounds, which were included in the validation set, are listed in Table IV along with possible reasons for the observed deviations.
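The outlier criterion used above, an estimation error of 1 order of magnitude or more, corresponds to an absolute residual of at least 1.00 log unit. A minimal sketch follows, with a hypothetical helper name and three rows taken from Tables III and IV. Rounding residuals to two decimals, as the tables do, is a deliberate choice so that a tabulated residual of exactly -1.00 (Dowco 275) is not lost to floating-point error.

```python
def flag_outliers(rows, threshold=1.0):
    """Return (name, residual) for rows with |exp - calc| >= threshold.

    A residual of 1.00 log unit is a tenfold error in Koc. Residuals are
    rounded to two decimals, as in the tables, before the comparison.
    """
    flagged = []
    for name, exp, calc in rows:
        residual = round(exp - calc, 2)
        if abs(residual) >= threshold:
            flagged.append((name, residual))
    return flagged


rows = [
    ("1-dodecanol", 3.52, 2.51),  # Table IV
    ("Dowco 275", 2.41, 3.41),    # Table IV
    ("pronamide", 2.30, 3.20),    # Table III
]
print(flag_outliers(rows))  # [('1-dodecanol', 1.01), ('Dowco 275', -1.0)]
```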
It seems likely that the deviations observed for several of the chemicals in Table IV (4-aminobenzoic acid; 2,2'biquinoline; Dowco 275; and isopropalin) result at least in part from ionization effects. As noted earlier, the apparent KO, of ionizable structures such as phenols, acids, and compounds with aromatic nitrogen can be sensitive to pH in the environmental range, and several researchers have reported large variations in KO, for these compounds (14-16). Unfortunately, it is not now possible to quantitatively consider pH and pK, in most estimation methods because sufficient data are not available. This does not signify that either the MCI/fragment contribution method or experimental value is wrong; it simply indicates that our method does not account for all variables that may possibly affect K,. Of course, neither does any other method for estimating KO,. PCKOC, the computer program that calculates log KO,from SMILES structure entry using our method, addresses this problem by displaying a warning message whenever the program detects an aromatic hydroxyl group, an organic acid, or an aromatic nitrogen. In some cases, the deviations reported in Table IV could be reduced or eliminated with additional correction factors, for fragments that do not presently have such factors. Possible examples of compounds containing such fragments are methazole, metribuzin, napropamide, pronamide, and 2,3,7,8-tetrachlorodibenzo-p-dioxin. Our approach was to require that a fragment be present in at least two compounds before it could be considered for development of a correction factor. It may also be possible to identify new correction factors by examining the 74 chemicals added to the validation set after the initial assignment of chemicals. The lower r2 value observed for Environ. Sci. Technol., Vol. 26, No. 8, 1992

1563

Table 111. Validation Set-Chemicals with Correction Factors" compound acetanilides 3-methylacetanilide 3-(trifluoromethy1)acetanilide 4-methoxyacetanilide 9-acetylanthracene acids acetic acid anthracene-9-carboxylic acid 3,6-dichlorosalicylic acid benzoic acid 4-hydroxybenzoic acid 4-methylbenzoic acid 4-nitrobenzoic acid 3,4-dinitrobenzoic acid hexanoic acid phenylacetic acid phthalic acid alcohols methanol ethanol 1-propanol 1-butanol 1-pentanol 1-hexanol 1-heptanol 1-octanol 1-nonanol 1-decanol sec-phenethyl alcohol anilines 4-chloroaniline 3,4-dichloroaniline 3,5-dichloroaniline 3,5-dinitroaniline N&-dimethylaniline N-methylaniline 3-methyl-4-bromoaniline 2,3,4,5-tetrachloroaniline 2,3,4-trichloroaniline 3-(trifluoromethy1)aniline aldicarb sulfone asulam azinphos methyl benzamides benzamide 2-chlorobenzamide 2-nitrobenzamide 3-nitrobenzamide 4-nitrobenzamide 3,5-dinitrobenzamide 4-methylbenzamide N-methylbenzamide benzidine BMPC 1-butylamine butyranilide butyl benzyl phthalate captafol captan carbamates methyl N-(3-chlorophenyl) methyl N-(3,4-dichlorophenyl) carbophenothion chlorfenvinphos chlornitrofen 6-chloro-2,4-diaminomethylthiopyrimidine chloroneb 3-chloro-4-bromonitrobenzene chloroxuron chlorpropham crotoxyphos cycloate diamidaphos diazinon dicamba 1564

Environ. Sci. Technol., Vol. 26, No. 8, 1992

fragmentsb

expc

calcd

MCI"

5, 11 5, 11 5, 11, 16 18

1.45 1.75 1.40 3.58

1.79 2.43 1.43 3.77

5.182 6.393 5.720 8.271

21 21 21 21 21 21 9, 21 9, 21 21 21 21

0.00 2.67 2.30 1.50 1.43 1.77 1.54 1.53 1.46 1.45 1.07

-0.21

3.27 1.82 1.16 1.37 1.37 1.22 1.29 0.88 1.42 1.87

1.732 8.271 5.537 4.305 4.715 4.715 5.626 6.947 3.770 4.788 5.626

20 20 20 20 20 20 20 20 20 20 20

0.44 0.20 0.48 0.50 0.70 1.01 1.14 1.56 1.89 2.59 1.50

4.36 -0.14 0.12 0.39 0.65 0.92 1.19 1.45 1.72 1.98 1.39

1.000 1.414 1.914 2.414 2.914 3.414 3.914 4.414 4.914 5.414 4.305

1.96 2.29 2.49 2.55 2.26 2.28 2.26 3.03 2.60 2.36 0.50 2.48 2.28

1.86 2.08 2.07 1.77 1.89 1.81 2.08 2.52 2.31 2.50 0.88 2.52 1.84

3.788 4.198 4.182 6.003 4.305 3.932 4.198 5.037 4.626 4.999 6.205 6.954 9.093

22 22 9, 22 9, 22 9, 22 9, 22 22 3, 22 5 3, 14 3 5, 11 19 11 11

1.46 1.51 1.45 1.95 1.93 2.31 1.78 1.42 3.46 1.71 1.88 1.71 4.23 3.32 2.30

1.71 1.93 1.78 1.78 1.78 1.85 1.93 1.87 3.44 2.32 1.78 2.13 3.97 3.44 2.94

4.305 4.715 5.626 5.626 5.626 6.947 4.715 4.843 6.754 7.185 2.414 5.826 11.220 8.343 7.400

5, 15 5, 15 25 24 9, 16 5, 7 16 9 3, 5, 10, 16 5, 15 19, 24 3, 4, 15 3, 24 7, 25 5, 21

2.15 2.74 4.66 2.47 3.90 2.08 3.10 2.60 3.55 2.80 2.00 2.54 1.51 2.75 1.50

1.86 2.08 3.93 2.77 4.12 1.62 2.36 2.72 3.11 2.32 1.70 2.26 0.79 3.13 1.46

5.720 6.130 8.593 9.453 8.969 5.147 5.685 5.109 9.542 6.575 9.898 6.791 6.200 8.898 6.075

5 5 5 5, 9 3, 5 3, 5 5 5 5 5 3, 13, 27 5, 15 1, 3, 22, 25

Table I11 (Continued) compound 3,3'-dichlorobenzidine dieldrin diethylacetamide di-2-ethylhexyl phthalate diflubenzuron dimeton-S-methyl dimethoate 2,6-dinitro-a,a,a-trifluoro-p-toluidine 2,6-dinitro-N-propyl-a,a,a-trifluoro-p-toluidine diphenylamine diphenyl ether dipropetryn disulfoton DNOC (dinitro-o-cresol) EPN esters benzoic acid butyl ester benzoic acid ethyl ester benzoic acid methyl ester benzoic acid phenyl ester ethyl 4-hydroxybenzoate ethyl 4-nitrobenzoate ethyl 3,5-dinitrobenzoate ethyl 4-methylbenzoate ethyl phenylacetate ethyl pentanoate ethyl hexanoate ethyl heptanoate ethyl octanoate ethion ipazine maleic hydrazide methabenzthiazuron methidathion methomyl methoxychlor methyl chloramben metolachlor mo1inate neburon 4-nitrophenol norflurazon oxadiazon permethrin phenazine 3-phenyl-1-cyclohexylurea 3-phenyl-1,l-dimethylureas 3-chlorophenyl 3,5-dimethylphenyl 3,5-dimethyl-4-bromophenyl 3-fluorophenyl 4-fluorophenyl 3-phenyl-1,l-dimethylureas 3-methoxyphenyl 4-methoxyphenyl 4-methylphenyl phenylureas 3-chloro-4-methoxyphenylurea 3,4-dichlorophenylurea 3-methyl-4-bromophenylurea 3-methyl-4-fluorophenylurea 4-phenoxyphenylurea 3-(trifluoromethy1)phenylurea phosalone piperophos pirimicarb profluralin prometon pronamide propachlor propylene glycol methyl ether acetate pyrazon pyroxychlor quintozene secbumeton

fragmentsb

expC

MCI'

5 17 11 19 5, 10, 22 23 3, 11, 25 5, 9 3, 5, 9 5 16 3, 5, 8 25 9 9, 25

4.35 4.10 1.84 4.94 3.83 1.49 1.20 2.56 3.61 2.78 3.29 3.07 3.22 2.41 3.12

3.87 4.03 1.54 5.22 3.03 1.95 1.39 2.65 3.34 3.28 3.41 3.10 2.91 2.78 4.07

7.575 8.776 3.719 13.556 9.969 5.682 5.576 7.642 9.180 6.449 6.449 8.007 6.682 6.430 10.049

19 19 19 19 19 9, 19 9, 19 19 19 19 19 19 19 25 3, 5, 8 11 3, 10 3, 15, 17, 25 3, 13 16 5, 19 3, 5, 11, 17 3, 15 3, 5, 10 9 3, 5, 11 5, 15, 16 16, 19 7 4, 5, 10

2.10 2.30 2.10 3.16 2.21 2.48 2.74 2.59 1.89 1.97 2.06 2.61 3.02 4.06 2.91 0.45 2.81 1.53 1.30 4.90 2.71 2.46 1.92 3.40 2.37 3.28 3.51 4.80 3.37 2.07

2.69 2.16 1.89 3.23 2.36 2.22 2.28 2.36 2.41 1.61 1.87 2.14 2.40 4.12 2.74 0.31 3.29 0.96 1.08 4.63 1.76 2.46 2.46 2.95 2.49 3.75 3.54 5.25 3.34 2.27

6.343 5.343 4.843 7.360 5.736 6.647 7.985 5.753 5.826 4.308 4.808 5.308 5.808 8.593 7.562 3.788 7.220 7.542 4.702 9.952 6.058 9.061 5.843 8.041 4.698 9.342 10.091 12.375 6.933 7.843

3, 5, 10 3, 5, 10 3, 5, 10 3, 5, 10 3, 5, 10

1.79 1.73 2.53 1.73 1.43

1.92 2.12 2.35 1.92 1.92

6.092 6.486 6.914 6.092 6.092

3, 5, 10, 16 3, 5, 10, 16 3, 5, 10

1.72 1.40 1.51

1.56 1.56 1.92

6.630 6.630 6.092

5, 10, 16 5, 10 5, 10 5, 10 5, 10, 16 5, 10 3, 5, 14, 25 23 3, 5, 7, 15 3, 5, 9 3, 5, 8, 16 3, 22 3, 5, 11 17, 19 3, 5, 11 16 9 3, 5, 8, 16

2.00 2.49 2.37 1.78 2.56 1.96 2.63 3.44 1.90 3.93 2.60 2.30 2.42 0.36 2.08 3.48 4.30 2.78

1.54 1.90 1.90 1.90 2.66 2.32 1.77 3.63 1.52 4.26 2.20 3.20 2.45 0.26 2.74 3.13 3.38 2.29

6.130 5.592 5.592 5.592 8.237 6.393 9.987 10.021 7.824 11.146 7.507 7.337 6.664 4.164 7.198 5.931 6.375 7.689

Envlron. Sci. Technol., Vol. 26, No. 8, 1992

1585

Table III (Continued)

compound                          fragments(b)   log Koc exp(c)   log Koc calcd(d)   MCI(e)
silvex                            16, 21         1.75             1.91                6.914
sulfones
  SD13207 (dinitroamino)          3, 5, 9, 27    2.47             2.76               10.189
  SD12030 (dinitroamino)          3, 5, 9, 27    2.86             3.29               11.189
  SD12346 (dinitroamino)          3, 5, 9, 27    3.07             3.56               11.689
tebuthiuron                       3, 10          2.79             2.98                6.858
terbuthylazine                    3, 5, 8        2.32             2.52                6.904
tetrachloroguaiacol               16             2.85             3.17                6.002
thiobencarb                       3, 15          3.27             3.43                7.668
triallate                         3, 15          3.35             3.22                7.269
trichlorfon                       23             1.90             1.73                5.276
trichloroacetamide                11             0.99             1.38                2.943
4,5,6-trichloroguaiacol           16             2.80             2.94                5.575
3,5,6-trichloro-2-pyridinol       6              2.11             2.37                4.609
urea                              10             0.50             0.62                1.732
veratrole                         16             2.03             1.93                4.881

a Chemicals with |residual| ≥ 1.00 are listed in Table IV.
b Applicable fragments from Table I.
c log Koc derived from experimental measurements.
d log Koc calculated from eq 2.
e First-order molecular connectivity index (¹χ).

Table IV. Validation Set: Compounds with Large Deviations between Measured and Calculated log Koc

compound                          fragments(a)   log Koc exp(b)   log Koc calcd(c)   Δ(d)
4-aminobenzoic acid               5, 21          2.05             0.59                1.46
2,2'-biquinoline                  none           4.02             5.89               -1.87
1-dodecanol                       20             3.52             2.51                1.01
Dowco 275                         25             2.41             3.41               -1.00
folpet                            22             3.27             2.16                1.09
isopropalin                       3, 5, 9        4.88             3.83                1.05
methazole                         3, 5, 10, 15   3.42             1.76                1.66
metribuzin                        3, 11          1.98             3.08               -1.10
napropamide                       3, 11, 16      2.75             4.06               -1.31
pentafluorophenyl methyl sulfone  27             1.46             3.18               -1.72
2,3,7,8-TCDD (dioxin)             none           6.50             5.17                1.33

Comment codes(e) as printed: 1; 1, 2; 1, 4; 2, 4; 1, 3; 2; 2; 2; 2, 4; 2.

a Applicable fragments from Table I.
b log Koc derived from experimental measurements.
c log Koc calculated from eq 2.
d Δ is experimental minus calculated.
e 1, ionizes; 2, new correction factor needed?; 3, experimental value wrong?; 4, data source contains values that are consistently lower than values from other sources.
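Two derived columns in Tables III and IV can be recomputed directly from their printed definitions. The Python sketch below assumes the standard definition of the first-order connectivity index, ¹χ = Σ (δᵢδⱼ)^(−1/2) over all bonds of the hydrogen-suppressed graph, where δ is the number of non-hydrogen neighbors of an atom. It applies this definition to three Table III compounds. It also recomputes Δ (Table IV, footnote d) and applies the 1.00 cutoff from Table III, footnote a. The function names and atom labels are illustrative only, and neither the fragment corrections nor the coefficients of eq 2 are reproduced. On these inputs the sketch gives the tabulated MCI values 1.732 (urea), 2.943 (trichloroacetamide), and 4.881 (veratrole). It also reproduces the printed Δ values, with one exception: for folpet, 3.27 − 2.16 gives 1.11 rather than the printed 1.09.

```python
import math
from collections import defaultdict


def first_order_connectivity(bonds):
    """First-order connectivity index: sum over bonds of (delta_i * delta_j) ** -0.5.

    `bonds` lists the edges of the hydrogen-suppressed molecular graph;
    delta is the number of non-hydrogen neighbours of each atom.
    """
    degree = defaultdict(int)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Hydrogen-suppressed bond lists; atom labels are arbitrary identifiers.
UREA = [("C", "O"), ("C", "N1"), ("C", "N2")]
TRICHLOROACETAMIDE = [
    ("C1", "Cl1"), ("C1", "Cl2"), ("C1", "Cl3"),  # CCl3 carbon
    ("C1", "C2"), ("C2", "O"), ("C2", "N"),       # amide carbon
]
VERATROLE = [  # 1,2-dimethoxybenzene
    ("C1", "C2"), ("C2", "C3"), ("C3", "C4"),
    ("C4", "C5"), ("C5", "C6"), ("C6", "C1"),     # benzene ring
    ("C1", "O1"), ("O1", "Me1"), ("C2", "O2"), ("O2", "Me2"),  # methoxy groups
]

# (compound, measured log Koc, calculated log Koc) transcribed from Table IV.
TABLE_IV = [
    ("4-aminobenzoic acid", 2.05, 0.59),
    ("2,2'-biquinoline", 4.02, 5.89),
    ("1-dodecanol", 3.52, 2.51),
    ("Dowco 275", 2.41, 3.41),
    ("folpet", 3.27, 2.16),
    ("isopropalin", 4.88, 3.83),
    ("methazole", 3.42, 1.76),
    ("metribuzin", 1.98, 3.08),
    ("napropamide", 2.75, 4.06),
    ("pentafluorophenyl methyl sulfone", 1.46, 3.18),
    ("2,3,7,8-TCDD (dioxin)", 6.50, 5.17),
]


def residual(exp_log_koc, calc_log_koc):
    """Delta = experimental minus calculated, rounded to two decimals."""
    return round(exp_log_koc - calc_log_koc, 2)


def large_deviations(rows, threshold=1.00):
    """Return (compound, Delta) for every row with |Delta| >= threshold."""
    flagged = []
    for name, exp_v, calc_v in rows:
        delta = residual(exp_v, calc_v)
        if abs(delta) >= threshold:
            flagged.append((name, delta))
    return flagged


for name, graph in [("urea", UREA), ("trichloroacetamide", TRICHLOROACETAMIDE),
                    ("veratrole", VERATROLE)]:
    print(f"{name}: 1chi = {first_order_connectivity(graph):.3f}")
for name, delta in large_deviations(TABLE_IV):
    print(f"{name}: delta = {delta:+.2f}")
```

A row from Table III, such as silvex (1.75 measured, 1.91 calculated), falls below the cutoff and is not flagged. This is consistent with its appearing in Table III rather than Table IV.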

the validation set (0.856) as compared with the training set (0.955) probably reflects, at least in part, the addition of these chemicals. Finally, our results highlight a general concern about Koc estimation. Unlike the measurements underlying virtually all other methods for predicting physical and chemical properties for environmental assessment, Koc measurements have not been standardized with respect to important experimental parameters such as soil type. Sorption has been measured with many different soils having widely varying characteristics, resulting in diverse values for the soil sorption coefficient even when that parameter is normalized to organic carbon content as Koc. This may derive in part from factors such as pH effects on ionizable groups, which at least in theory may be corrected for. It may also derive from less well understood factors, such as differing sorption by different types of organic matter (26). The standardization of test conditions for Koc measurement is therefore a worthy goal.

Conclusions

Even with the many qualifying statements that must be appended to the database of measured Koc values, the present method produces estimates that correlate well with experimental values. Comparison of the MCI/fragment contribution method with methods based on log Kow or water solubility shows that our method produces more accurate estimates of Koc. It is also easier to use, since measured or estimated Kow or water solubility values are not needed. Further, our method covers structurally diverse organic compounds more comprehensively and is better validated than other Koc estimation methods. These characteristics, combined with the availability of convenient software for executing the model, should make it the method of choice for rapid screening of chemical substances. The model could be improved by better accounting for the effects of ionization and by extending it to a still wider variety of organic compounds. Both await further research.
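The discussion notes that pH effects on ionizable groups can, at least in theory, be corrected for, and the conclusions list ionization as an area for improvement. A minimal Python sketch of one common textbook treatment follows; it is an assumption here, not the procedure used in this paper. The sketch does two things. First, it normalizes a distribution coefficient to organic carbon (Koc = Kd/foc). Second, it weights sorption by the neutral fraction of a monoprotic acid, fn = 1/(1 + 10^(pH − pKa)), and treats anion sorption as a separate term that is negligible by default. The function names are hypothetical, as are the pKa and neutral-form log Koc used in the usage lines.

```python
import math


def koc_from_kd(kd: float, f_oc: float) -> float:
    """Organic-carbon-normalized sorption coefficient, Koc = Kd / f_oc.

    kd   -- soil/water distribution coefficient (L/kg)
    f_oc -- organic-carbon mass fraction of the soil, 0 < f_oc <= 1
    """
    if not 0.0 < f_oc <= 1.0:
        raise ValueError("f_oc must be a mass fraction in (0, 1]")
    return kd / f_oc


def neutral_fraction_acid(pH: float, pKa: float) -> float:
    """Fraction of a monoprotic acid present in its neutral (protonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def apparent_koc_acid(koc_neutral: float, pH: float, pKa: float,
                      koc_anion: float = 0.0) -> float:
    """Speciation-weighted Koc for a monoprotic acid.

    ASSUMPTION: the neutral and anionic forms sorb independently, each with
    its own Koc; koc_anion defaults to 0 (anion sorption neglected).
    """
    fn = neutral_fraction_acid(pH, pKa)
    return fn * koc_neutral + (1.0 - fn) * koc_anion


# Hypothetical acid: pKa 4.0, log Koc of the neutral form 3.0 (illustrative only).
for pH in (3.0, 4.0, 5.0, 6.0, 7.0):
    koc = apparent_koc_acid(10.0 ** 3.0, pH, 4.0)
    print(f"pH {pH:.1f}: apparent log Koc = {math.log10(koc):.2f}")
```

Under these assumptions, each pH unit above the pKa lowers the apparent log Koc by roughly one unit. Sorption of the same acid measured on soils of different pH can therefore differ substantially.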

Literature Cited

(1) Lyman, W. J. In Handbook of Chemical Property Estimation Methods, 2nd ed.; Lyman, W. J., Reehl, W. F., Rosenblatt, D. H., Eds.; American Chemical Society: Washington, DC, 1990; Chapter 4.
(2) Karickhoff, S. W. In Environmental Exposure from Chemicals; Neely, W. B., Blau, G. E., Eds.; CRC: Boca Raton, FL, 1985; Vol. I, pp 49-64.
(3) Green, R. E.; Karickhoff, S. W. In Pesticides in the Soil Environment: Processes, Impacts, and Modeling; Cheng, H. H., Ed.; Soil Science Society of America: Madison, WI, 1990; pp 79-101.
(4) Boethling, R. S.; Campbell, S. E.; Lynch, D. G.; LaVeck, G. D. Ecotoxicol. Environ. Saf. 1988, 15, 21.
(5) Sabljic, A. J. Agric. Food Chem. 1984, 32, 243.
(6) Sabljic, A. Environ. Sci. Technol. 1987, 21, 358.
(7) Koch, R. Toxicol. Environ. Chem. 1983, 6, 87.
(8) Bahnick, D. A.; Doucette, W. J. Chemosphere 1988, 17, 1703.
(9) Okouchi, S.; Saegusa, H. Bull. Chem. Soc. Jpn. 1989, 62, 922.
(10) Kenaga, E. E. Ecotoxicol. Environ. Saf. 1980, 4, 26.
(11) Rao, P. S. C.; Davidson, J. M. Retention and transformation of selected pesticides and phosphorus in soil-water systems: A critical review; EPA-600/S3-82-060; U.S. EPA: Athens, GA, 1982.
(12) Howard, P. H.; Sage, G. W.; LaMacchia, A.; Colb, A. J. Chem. Inf. Comput. Sci. 1982, 22, 38.
(13) Howard, P. H.; Hueber, A. E.; Mulesky, B. C.; Crisman, J. S.; Meylan, W. M.; Crosbie, E.; Gray, D. A.; Sage, G. W.; Howard, K. P.; LaMacchia, A.; Boethling, R.; Troast, R. Environ. Toxicol. Chem. 1986, 5, 977.
(14) Lagas, P. Chemosphere 1988, 17, 205.
(15) Schellenberg, K.; Leuenberger, C.; Schwarzenbach, R. P. Environ. Sci. Technol. 1984, 18, 652.
(16) Zachara, J. M.; Ainsworth, C. C.; Cowan, C. E.; Thomas, B. L. Environ. Sci. Technol. 1987, 21, 397.
(17) CoHort Software. CoStat Statistical Software, Version 4.00; CoHort Software: Berkeley, CA, 1990.
(18) Hansch, C.; Leo, A. MEDCHEM Project, Issue No. 26; Claremont College: Pomona, CA, 1985.
(19) Weininger, D. J. Chem. Inf. Comput. Sci. 1988, 28, 31.
(20) Sabljic, A. EHP, Environ. Health Perspect. 1989, 83, 179.
(21) Graphical Exposure Modeling System (GEMS) User’s Guide, Vol. 5: Estimation, Appendix E: AUTOCHEM; Task 3-2 under U.S. EPA Contract 68-02-3970; U.S. Environmental Protection Agency: Washington, DC, 1986.
(22) Worthing, C. R., Walker, S. B., Eds. The Pesticide Manual, 8th ed.; Lavenham: Suffolk, England, 1987.
(23) Yalkowsky, S. H. ARIZONA dATAbASE of Aqueous Solubility; College of Pharmacy, University of Arizona: Tucson, AZ, 1989.
(24) Khan, S.; Kahn, N. N. Soil Sci. 1986, 142, 214.
(25) Sharma, S. R.; Singh, R. P.; Ahmed, S. R. Ecotoxicol. Environ. Saf. 1986, 11, 229.
(26) Jota, M. A. T.; Hassett, J. P. Environ. Toxicol. Chem. 1991, 10, 483.

Received for review October 9, 1991. Revised manuscript received April 1, 1992. Accepted April 16, 1992.

Environ. Sci. Technol. 1992, 26, 1567-1573

Quantitative Structure-Property Relationships for Aqueous Solubilities and Henry’s Law Constants of Polychlorinated Biphenyls

Frank M. Dunnivant and Alan W. Elzerman*

Environmental Systems Engineering, Clemson University, Clemson, South Carolina 29634-0919

Peter C. Jurs and Mohamed N. Hasan

Department of Chemistry, 152 Davey Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802

Quantitative structure-property relationship (QSPR) models have been developed that accurately calculate the congener-specific aqueous solubilities (S) and Henry’s Law constants (HLCs) of polychlorinated biphenyls (PCBs). QSPRs were generated from molecular models that were sensitive to slight changes in chemical structure. PCB aqueous solubilities were found to be a function of total surface area, melting point, and third shadow area. Observed HLCs were a function of the second moment of inertia, the path-four connectivity index, the path-three κ index, and the second and third principal polarizabilities. These newly developed models agree well with other model predictions. They both extend and complement previous approaches, because they can accurately calculate S and HLC for structurally similar PCB congeners. Results from this work provide a better understanding of the chemical and structural characteristics governing the solubility of hydrophobic compounds.
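The abstract names the descriptors in each model but not the fitting procedure or the coefficients. The sketch below assumes the models are ordinary least-squares multiple linear regressions; this is common for QSPRs of this kind but is not stated here. It is a stdlib-only Python sketch of such a fit. The descriptor values and "true" coefficients are synthetic, invented only to confirm that the solver recovers them. They are not PCB data, and the descriptor names are placeholders.

```python
import random


def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones, by Gaussian elimination with partial pivoting.
    Returns [b0, b1, ..., bk].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


# SYNTHETIC demonstration data (not PCB measurements): columns stand in for
# total surface area, melting point, and third shadow area.
rng = random.Random(0)
TRUE_COEFS = [1.5, -0.02, -0.005, -0.03]  # made-up intercept and slopes
X = [[rng.uniform(180.0, 300.0), rng.uniform(20.0, 300.0), rng.uniform(30.0, 60.0)]
     for _ in range(25)]
y = [TRUE_COEFS[0] + sum(c * v for c, v in zip(TRUE_COEFS[1:], row)) for row in X]

fitted = ols_fit(X, y)
for label, est in zip(["intercept", "surface_area", "melting_point", "shadow_area"], fitted):
    print(f"{label}: {est:.4f}")
```

Solving the normal equations directly keeps the sketch dependency-free. For real descriptor sets, which can be strongly collinear, a QR- or SVD-based solver is the more robust choice.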

Introduction

Quantitative structure-activity relationships (QSARs) are a well-established tool in pharmacology and drug development. They are used to predict the biological effects (bioaccumulation, enzyme induction, biodegradation, toxicity) resulting from exposure of organisms to specific compounds under controlled conditions. However, for the estimation of chemical parameters such as the aqueous solubility (S) or Henry’s Law constant (HLC), it is more correct to call these relationships quantitative structure-property relationships (QSPRs), since no biological response or activity is evaluated or implied. QSPRs have recently been applied to predicting chemical and physical parameters of hydrophobic compounds and are rapidly becoming a useful tool in environmental chemistry. Several QSPR-based models are available for predicting aqueous solubilities, HLCs, and environmental partition coefficients of organic contaminants (1-12). Some of these approaches allow each chemical parameter (e.g., S, HLC) to be modeled directly from structural information. However, many of these QSPR-based models are unable to account for slight structural changes between isomers and homologues, or this aspect of the models has not been adequately evaluated.

The importance of highly specific QSPRs becomes obvious when the fate and transport of structurally similar compounds are considered, especially in the case of polychlorinated biphenyls (PCBs) (1, 13-15). PCBs were released into the environment in the form of multicongener mixtures referred to as Aroclors. Each Aroclor (e.g., 1016, 1254, 1260) contains a characteristic mixture of congeners. This “fingerprint of contamination”, based on the relative amounts of the major congeners, has been used to estimate the type of Aroclor(s) present in environmental samples (16, 17). However, numerous field investigations have shown that the congener compositions of Aroclors extracted from environmental samples differ significantly from those of the original Aroclor(s) (13, 14, 18). This change in congener composition has been attributed to selective microbial degradation/dechlorination (19-23) and to chemical/physical partitioning processes (13-15, 18). Thus, to evaluate the relative importance of congener-specific partitioning processes, accurate values of aqueous solubilities, HLCs, and solid/aqueous partition coefficients must be measured or predicted for each congener. The purpose of this investigation was to use experimentally determined S and HLC data from previous investigations (15, 24, 26) to develop QSPR models sensitive to slight changes in molecular structure, such as those occurring in the homologous series of PCBs (e.g., dichlorobiphenyl, trichlorobiphenyl). A comparison of literature data reveals considerable scatter among solubilities and HLCs for identical PCB congeners.
Therefore, an internally consistent data set, obtained under similar conditions, is necessary to develop a QSPR model free of errors arising from interlaboratory differences or from the use of different analytical procedures (25). The data used here were generated for a previous, but unsuccessful, attempt to develop predictive models based solely on readily published QSPR parameters. Recent collaborations between Clemson University and The Pennsylvania State University have allowed the successful prediction of aqueous

© 1992 American Chemical Society
