Environ. Sci. Technol. 1989, 23, 672-679
Screening-Level Model for Aerobic Biodegradability Based on a Survey of Expert Knowledge
Robert S. Boethling*,† and Aleksandar Sabljić‡
U.S. Environmental Protection Agency, Office of Toxic Substances (TS-798), Washington, DC 20460, and Rudjer Bošković Institute, 41001 Zagreb, Croatia, Yugoslavia
A model was developed for classifying untested chemicals as readily or not readily biodegradable in receiving waters. The training set consisted of the collective judgments of 22 biodegradation experts as to the approximate time that might be required for aerobic ultimate degradation (AERUD) of 46 highly diverse chemicals. The first of two model components is a linear equation relating AERUD to three variables: the molecular connectivity indexes ²χᵛ and ⁴χpc, the latter normalized to molecular weight, and the number of covalent chlorine atoms, normalized to molecular weight. The second component is a series of correction factors designed to account for important structural influences not reflected by the other variables, such as the presence of hydrolyzable groups. The two-component model accounts for 88.8% of the variance in AERUD. Based on environmentally relevant experimental data, the model correctly classified 36 of 40 chemicals (90%) in two prediction sets.
Introduction
The U.S. Environmental Protection Agency (EPA) annually reviews more than 2000 premanufacture notices (PMNs) for potential ecological and human health effects and exposure. For most chemicals, consideration of biodegradability is critical to the exposure assessment. New chemicals in initial review are scored for likely biodegradability in wastewater treatment and in receiving waters under both aerobic and anaerobic conditions. Under the Toxic Substances Control Act (TSCA), EPA is also obligated to determine the risk from exposure to the thousands of chemicals already in commerce. The TSCA Interagency Testing Committee recently completed its sixth scoring exercise (1), which involved the review of hundreds of chemicals by panels of experts for potential effects and exposure. These and other EPA activities create a continuing need for rapid methods capable of assisting in the assessment process.
We recently conducted a survey in which 22 experts in the microbial degradation of xenobiotic chemicals were asked to estimate the biodegradability of 50 organic chemicals that approximately represent the range of structures found in PMN chemicals. The purpose of the survey was to determine the feasibility of developing a prototype expert system for predicting biodegradability at the screening level. Such a system would be advantageous because few PMNs (probably fewer than 1 in 100) contain biodegradation data, and most PMN chemicals are sufficiently complex that their biodegradability cannot be estimated by direct structural analogy with chemicals that have been studied.
Although the survey chemicals constituted a highly diverse group with respect to molecular mass and structure, there was substantial consensus on both pathways and rates for most of the chemicals. From this information, a number of generalizations regarding the effects of chemical structure on biodegradability were derived. Among the negative influences are molecular mass, alkyl branching, halogenation, and nitrogen heterocycles. Positive influences include hydrolyzable groups, hydroxyl or carboxylic acid groups, and linear alkyl chains.
Using these results, we have developed a model for predicting aerobic ultimate biodegradability in receiving waters. The model is based on linear statistical modeling with topological indexes as predictors and a series of simple rules designed to account for important structural influences not reflected by the indexes. We view this model primarily as a tool for rapid preliminary classification of untested chemicals as either rapidly or slowly biodegradable.
†U.S. Environmental Protection Agency.
‡Rudjer Bošković Institute.
Symbols: REM, removal by biodegradation in wastewater treatment; AERUD, aerobic ultimate degradation in receiving waters; nCl, number of covalently bound chlorine atoms; ²χᵛ, second-order valence molecular connectivity index; ⁴χpc, fourth-order path/cluster molecular connectivity index.
Environ. Sci. Technol., Vol. 23, No. 6, 1989
Methods
Biodegradation Survey. The survey chemicals are listed in Table I, and their structures are shown in Figure 1. Chemicals were selected to represent the spectrum of organic chemicals subject to review by EPA. The range of structures and the number within a given class grossly approximate the relative numbers of PMN submissions for each class. Polymers are the most significant exception, since there were only two such chemicals in the survey, whereas polymers represent roughly 50% of all PMN submissions. Two criteria were employed in selecting potential panel members: (i) established eminence in the study of biodegradation of xenobiotic chemicals, based mainly on consistent contribution to the primary scientific literature over a number of years; and (ii) area of specialization. We tried to construct a panel with as much technical diversity as possible. We also sought to include experts from the academic sector, industry, and government.
Survey participants were presented with a package that included a questionnaire page for each of the 50 chemicals. At the top of each page were an arbitrary number assigned to the chemical, its name, structure, and molecular weight, and measured values for solubility in water and octanol/water partition coefficient, where available. Below the molecular structure, the questionnaire contained a series of questions (parts A and B) that solicited information on relative rates of biodegradation under a range of conditions. The content and format of this section followed closely that currently employed by EPA in screening-level PMN review. Below this were three free-form questions soliciting information on likely site(s) of initial microbial attack, initial and important intermediary metabolites, and general comments on biodegradation of the chemical (parts C-E). Participants were asked to rate removal by biodegradation in typical wastewater treatment systems (REM) as high, intermediate, or negligible.
0013-936X/89/0923-0672$01.50/0 © 1989 American Chemical Society
Figure 1. Structures of survey chemicals.
For aerobic ultimate degradation in receiving waters (AERUD), participants were asked to check one or more of a series of boxes marked days, weeks, months, and longer, to indicate the approximate time they thought would be required for the process to proceed to completion. As a measure of central tendency of these semiquantitative estimates of biodegradability, we calculated an arithmetic mean score for each chemical after assigning numerical scores to each response as follows: days = 1, weeks = 2, months = 3, longer = 4 (AERUD); high = 1, intermediate = 2, negligible = 3 (REM). For modeling purposes, we used the 10% trimmed mean scores for AERUD (Table I), as recommended by Seber (2). Trimming is a practice useful for reducing the effect of outlying observations (i.e., of a skewed distribution) on the mean of a distribution of observations. In this case we removed 10% of the total number of responses for each chemical, rounded to the next whole number, from each tail of its distribution. This procedure had the greatest effect on the AERUD values for chemicals like maleic anhydride (3), which had the following distribution of
Table I. Biodegradability of 46 Survey Chemicals
Columns: no.; chemical; obsd AERUDᵃ; intᵇ; model 1 AERUD; resᶜ; model 2 AERUD; int
Group Aᵈ: vinyl acetate; methyl methacrylate;
22 21 10 43 12 37 30 11 3 2 41
chlorendic anhydride; N,N-dimethylpropanamide; tripropylene glycol diacrylate; 2-pyrrolidone-5-carboxylic acid; N-ethylbenzamide; maleic anhydride; phthalic anhydride; butonate
16 38 1
carbazole vat dye; methyl 3-amino-5,6-dichloro-2-pyrazinecarboxylate; phthalylsulfathiazole
28* 6 40* 49 26 48 18 24 17 25* 34 23
3,3-dimethyl-1,2-epoxybutane
4-benzamido-2-chloro-5-methylbenzenediazonium chloride
1.56 1.79 2.81 3.88 1.80 2.24 1.80 1.70 1.26 1.63 2.70
H H L L H H H H H H L
1.36 1.82 2.92 4.03 1.97 2.49 2.06 2.06 1.63 2.16 3.26
+0.200 -0.032 -0.108 -0.148 -0.174 -0.253 -0.259 -0.355 -0.369 -0.528 -0.560
1.13 1.59 2.68 3.79 1.74 2.26 1.82 1.82 1.39 1.92 3.03
H H L
3.11 3.00 2.67
L L L
2.82 2.78 2.79
+0.290 +0.218 -0.123
2.95 2.91 2.92
L L L
2.68 1.60 2.60 3.06 2.40 2.15 2.58 1.85 2.00 2.33 2.05 2.00
L H L L H H L H H H H H
2.44 1.53 2.53 3.00 2.52 2.28 2.76 2.16 2.32 2.67 2.44 2.41
+0.238 +0.069 +0.067 +0.060 -0.122 -0.128 -0.175 -0.309 -0.318 -0.344 -0.389 -0.407
2.30 1.38 2.39 2.85 2.38 2.13 2.61 2.01 2.17 2.53 2.29 2.26
H H H L H H L H H L H H
3.41 2.56 3.58 3.21 2.07
L L L L H
2.63 1.90 3.08 3.10 2.19
+0.780 +0.657 +0.498 +0.106 -0.117
3.02 2.29 3.47 3.49 2.57
L H L L L
2.95 2.37 2.94 1.80
L H L H
2.95 2.48 3.17 2.08
+0.004 -0.111 -0.233 -0.282
2.79 2.33 3.02 1.93
L H L H
3.05 2.89 1.60 2.67 3.82 3.10 3.00 2.61 2.55 3.12 2.89
L L H L L L L L L L L
2.48 2.43 1.21 2.35 3.52 2.89 2.80 2.42 2.42 3.11 3.03
+0.574 +0.463 +0.390 +0.322 +0.300 +0.208 +0.202 +0.194 +0.129 +0.010 -0.136
2.72 2.67 1.45 2.59 3.76 3.13 3.04 2.66 2.66 3.35 3.27
L L H L L L L L L L L
L H H H H H H L
Group B
Group C: 2,4-hexadien-1-ol;
4-butyl-N-(4-ethoxybenzylidene)aniline; (4-carboxybutyl)triphenylphosphonium bromide; p-tert-butylphenol;
(dimethylamino)phenethyl alcohol; copper chelate complex; 2-methyl-3-hexanone; epichlorohydrin; resol component of resin; di-n-octyl ether; o-benzoylbenzoic acid
Group D
47 44* 15 42 31*
2,4-dichloro-6-ethoxy-1,3,5-triazine; acetylpyrazine; hydron yellow;
4-amino-3,5,6-trichloropicolinic acid; 3-pyrrolidine-1,2-propanediol
Group E
50 46 32 14
hexatriacontane; 4-decylaniline; dichlorobenzalkonium chloride; n-butylbenzene
13 8 7 36 33 20 35 39 4 9 5
N,N'-diphenyl-p-phenylenediamine; 2-methyl-2-nitro-n-butane; trans-2-butene; cyclopropyl phenyl sulfide; decamethyltetrasiloxane; diazo amino dye; diasone; durene-α1,α2-dithiol; decalin; trichloroethylene; camphane
Group F
ᵃ AERUD, aerobic ultimate degradation in receiving waters. Each observed value is the 10% trimmed mean for all survey responses for that chemical, taken after assigning an integer score to each response as described in the text. ᵇ Interpretation of AERUD according to the following scheme: AERUD < 2.50 = H (high biodegradability); AERUD ≥ 2.50 = L (low biodegradability). Chemicals marked with an asterisk (*) are those for which observed and estimated (model 2) biodegradability disagree. ᶜ Residual. ᵈ Groups are defined in the text and in Figure 2.
responses: days, 16; weeks, 6; months, 0; longer, 1 (n = 23). For this chemical the simple and trimmed means are 1.39 and 1.26, respectively. The simple and trimmed means were generally very close ( hydroxyl > carboxylic acid group, epoxide, site of unsaturation > benzene ring, methyl group, methylene group (about equal where separated by commas). We inferred from this that compounds that are already partially oxidized are generally considered to be more easily attacked than those that are not, all other things being equal, and that hydrolyzable chemicals are considered to be the most readily attacked. It was also apparent from the responses that molecular mass, alkyl branching, halogenation, and nitrogen heterocycles were generally viewed as negative factors. These and other results of the survey are described in more detail elsewhere (Boethling et al. Ecotoxicol. Environ. Saf., in press).
Using this information, we sought to improve model 1 by examining the residuals (Table I). We found that chemicals sharing certain structural features tended to have residuals with the same sign. For example, nearly all of the esters, amides, and anhydrides had negative residuals, which suggests that the variables in model 1 did not adequately account for the apparent positive effect of hydrolyzability on perceived biodegradability. Further analysis showed that the survey chemicals could be separated broadly into five or six such groups. These groups are not mutually exclusive, but when arranged in a hierarchy, each chemical in the 46-chemical training set falls into only one. The logic for the arrangement we prefer is depicted in Figure 2, and the chemicals in Table I are grouped as defined by this logic. We then calculated the mean residual from model 1 for each group (Figure 2) and added the appropriate number to each estimated value of AERUD to arrive at corrected estimates of AERUD.
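The two averaging steps described above, the 10% trimmed mean used to score each chemical and the group mean residuals that become the Figure 2 corrections, can be checked with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' procedure in code. "Rounded to the next whole number" is read here as rounding 10% of n to the nearest integer, because that reading reproduces the 1.26 reported for maleic anhydride (rounding 2.3 up to 3 would give 1.24 instead). The residuals are transcribed from Table I. The four boolean flags (`ester_amide_anhydride`, `heterocyclic_n`, `o_bound_to_c`, `unbranched_alkyl_ge4`) are hypothetical inputs standing in for the structural judgments of Figure 2; the sketch does not perceive them from a structure.

```python
import math
from statistics import mean

# Ordinal scores assigned to AERUD survey responses (days = 1 ... longer = 4).
AERUD_SCORES = {"days": 1, "weeks": 2, "months": 3, "longer": 4}


def trimmed_mean_score(counts, fraction=0.10):
    """Trimmed mean of ordinal AERUD scores, given response counts per box.

    ASSUMPTION: fraction * n is rounded to the nearest whole number, which
    reproduces the 1.26 reported for maleic anhydride (n = 23 -> 2 per tail).
    """
    scores = sorted(AERUD_SCORES[box] for box, k in counts.items() for _ in range(k))
    n = len(scores)
    k = math.floor(fraction * n + 0.5)  # responses dropped from EACH tail
    return mean(scores[k:n - k])


# Model 1 residuals (observed minus model 1) by group, transcribed from Table I.
RESIDUALS = {
    "A": [0.200, -0.032, -0.108, -0.148, -0.174, -0.253,
          -0.259, -0.355, -0.369, -0.528, -0.560],
    "B": [0.290, 0.218, -0.123],
    "C": [0.238, 0.069, 0.067, 0.060, -0.122, -0.128,
          -0.175, -0.309, -0.318, -0.344, -0.389, -0.407],
    "D": [0.780, 0.657, 0.498, 0.106, -0.117],
    "E": [0.004, -0.111, -0.233, -0.282],
    "F": [0.574, 0.463, 0.390, 0.322, 0.300, 0.208,
          0.202, 0.194, 0.129, 0.010, -0.136],
}

# Corrections printed in Figure 2 (mean model 1 residual of each group).
FIGURE2_CORRECTION = {"A": -0.235, "B": 0.128, "C": -0.147,
                      "D": 0.385, "E": -0.156, "F": 0.241}


def figure2_group(ester_amide_anhydride, heterocyclic_n, o_bound_to_c,
                  unbranched_alkyl_ge4):
    """Walk the Figure 2 hierarchy. The boolean flags are HYPOTHETICAL inputs
    that a user must supply; no structure perception is attempted here."""
    if ester_amide_anhydride:
        return "B" if heterocyclic_n else "A"
    if o_bound_to_c:
        return "D" if heterocyclic_n else "C"
    return "E" if unbranched_alkyl_ge4 else "F"


def model2_aerud(model1_aerud, **flags):
    """Model 2 estimate = model 1 estimate + the group's Figure 2 correction."""
    return model1_aerud + FIGURE2_CORRECTION[figure2_group(**flags)]


if __name__ == "__main__":
    maleic = {"days": 16, "weeks": 6, "months": 0, "longer": 1}
    print(round(trimmed_mean_score(maleic, fraction=0.0), 2))  # simple mean
    print(round(trimmed_mean_score(maleic), 2))                # 10% trimmed mean
    for group, res in RESIDUALS.items():
        print(group, round(mean(res), 4), FIGURE2_CORRECTION[group])
    print(model2_aerud(1.63, ester_amide_anhydride=True, heterocyclic_n=False,
                       o_bound_to_c=True, unbranched_alkyl_ge4=False))
```

For maleic anhydride (group A, model 1 estimate 1.63 in Table I), `model2_aerud` gives about 1.395, consistent with the 1.39 listed under model 2. The recomputed group means match the Figure 2 numbers only to within rounding of the tabulated residuals (group C, for example, averages about -0.1465 against the printed -0.147).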
These estimates are listed in Table I under the column "model 2", which we define as the regression model (model 1) in combination with the scheme outlined in Figure 2. This is mathematically equivalent to adding a series of indicator variables to the regression model, with coefficients equal to the mean residuals from model 1 for the specified groups of chemicals. The regression equation for observed and corrected values of AERUD is

$$\mathrm{AERUD}_{\mathrm{obsd}} = 0.946\,\mathrm{AERUD}_{\mathrm{corr}} + 0.137 \qquad (3)$$

for which the correlation coefficient, standard error, and explained variance are 0.942, 0.219, and 88.8%, respectively. This relationship is shown graphically in Figure 3.

Chemical in training set (Table I), with AERUD predicted from model 1:
Contains ester, amide, or anhydride?
  Yes: contains heterocyclic N?
    No: add (-0.235), group A
    Yes: add (+0.128), group B
  No: contains O bound to C?
    Yes: contains heterocyclic N?
      No: add (-0.147), group C
      Yes: add (+0.385), group D
    No: contains unbranched alkyl group with ≥4 carbons?
      Yes: add (-0.156), group E
      No: add (+0.241), group F
Figure 2. Schematic for calculating scores for aerobic ultimate degradation in receiving waters (AERUD) according to model 2. The chemicals in the groups referred to by letter are identified in Table I. Each number to be added is the mean residual from model 1 for the indicated group.

[Plot: observed AERUD vs AERUD corr (model 2); x axis from 1.0 to 4.0.]
Figure 3. Observed scores for aerobic ultimate degradation in receiving waters (AERUD) vs corrected values from model 2, for 46 survey chemicals. The line is described by eq 3 in the text.

Performance with Prediction Sets. It is essential to determine how well model 2 performs as a predictive tool with chemicals for which adequate experimental data are available. This necessitates the adoption of some scheme for treating fractional values of AERUD. We evaluated many such schemes, but in view of our central purpose, which was to develop a method for classifying chemicals as readily or not readily biodegradable in receiving waters, as well as the modest size of the training set, we propose the following:

$$\text{class} = \begin{cases} \text{H (high biodegradability)}, & \mathrm{AERUD} < 2.50 \\ \text{L (low biodegradability)}, & \mathrm{AERUD} \ge 2.50 \end{cases} \qquad (4)$$

The value 2.50 is the midpoint of the ordinal scale (1 = days, 2 = weeks, 3 = months, 4 = longer) used to assign numerical scores to the experts' estimates of degradability. Biodegradability classifications derived from this scheme are given in Table I for the observed values of AERUD and the predicted values from model 2. Model 2 correctly classifies 41 of 46 chemicals (89.1%); model 1 correctly classifies 37 of 46 chemicals (80.4%).
We pursued two different approaches in comparing model 2 predictions to published data. In the first, biodegradation data were obtained from the literature for 23 chemicals. Our requirements were that (i) the test systems must have incorporated natural (preferably fresh) water and detrital sediment, (ii) the investigator must have quantified some measure of ultimate degradation, such as ¹⁴CO2 from ¹⁴C-labeled test chemical, and (iii) the degradation curve must have been described sufficiently to allow reasonable extrapolation to complete ultimate degradation. The second and third items are important because values of AERUD are, by definition, estimates of the approximate time required for complete ultimate degradation in receiving waters. Item i is essential, we believe, because it reflects current understanding of the factors that are most critical in testing to determine environmentally relevant biodegradation rates. Our search concentrated on recent literature but was not exhaustive. After reviewing the data, we assigned words such as "weeks" and "months" to describe approximately how long it should take for a chemical to undergo aerobic ultimate degradation. The results of this analysis are presented in Table III. Any designation of weeks or less is consistent with high biodegradability (H), since H is assigned to any value of AERUD < 2.50 and the integer 2 signified weeks in the scoring of survey responses. An analogous argument can be made for any designation of months or longer and low biodegradability (L). The results therefore show that model 2 correctly classified 22 of the 23 chemicals examined (96%), the only misclassified chemical being 2-methylnaphthalene. Moreover, the chemicals in Table III are listed in approximate order of increasing persistence as indicated by the literature cited, and the predicted values of AERUD generally preserve this order.

Table III. Comparison of Model 2 Predictions to Measured Biodegradability for Selected Chemicals
chemical                             model 2 AERUDᵃ  intᵇ  lit. intᶜ  ref
p-cresol                             1.81            H     d-wk       7
p-nitrophenol                        1.83            H     d-wk       7, 8, 9
methyl parathion                     2.23            H     d-wk       8
p-chlorophenol                       2.25            H     d-wk       10, 11
di-n-butyl phthalate                 2.25            H     d-wk       12
hexadecane                           2.29            H     wk         13
naphthalene                          2.34            H     wk         13, 14
octadecane                           2.37            H     wk         14
2-methylnaphthalene                  2.48            H     mo         13
isopropylphenyl diphenyl phosphate   2.50            L     mo         15
di-2-ethylhexyl phthalate            2.57            L     mo         12
chlorobenzene                        2.60            L     mo         10
tert-butylphenyl diphenyl phosphate  2.62            L     mo         16
phenanthrene                         2.63            L     mo         13
chlorobenzilate                      2.81            L     mo         17
2,4,5-trichlorophenoxyacetic acid    2.91            L     mo         10
2,4,5-trichlorophenol                3.02            L     mo         10
pyrene                               2.80            L     >mo        13
benzo[a]pyrene                       2.96            L     >mo        13
3-methylcholanthrene                 3.06            L     >mo        13
hexachlorophene                      3.51            L     >mo        10
p,p'-DDE                             3.58            L     >mo        10
hexachlorobenzene                    3.98            L     >mo        10
ᵃ AERUD, aerobic ultimate degradation in receiving waters. ᵇ Interpretation of AERUD according to the following scheme: AERUD < 2.50 = H (high biodegradability); AERUD ≥ 2.50 = L (low biodegradability). ᶜ Interpretation of the literature data in terms of the approximate time required for aerobic ultimate degradation in receiving waters; d, day; wk, week; mo, month. See text for details.
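The accuracy figures quoted above follow mechanically from eq 4 once the tabulated values are in hand. The Python sketch below applies the 2.50 cutoff to the Table I columns and to Table III. The values are transcribed from those tables in table order, and the mapping of literature time scales to classes (weeks or less to H, months or longer to L) follows the argument in the text. It is a consistency check of the published counts under those assumptions, not a reimplementation of model 1 itself; the function names are illustrative.

```python
def classify(aerud, threshold=2.50):
    """Eq 4: AERUD < 2.50 -> "H" (high); AERUD >= 2.50 -> "L" (low)."""
    return "H" if aerud < threshold else "L"


def n_agree(predicted, reference_classes):
    """Number of chemicals whose predicted class equals the reference class."""
    return sum(classify(p) == ref for p, ref in zip(predicted, reference_classes))


# Table I values in table order: groups A (11), B (3), C (12), D (5), E (4), F (11).
OBSD = [1.56, 1.79, 2.81, 3.88, 1.80, 2.24, 1.80, 1.70, 1.26, 1.63, 2.70,
        3.11, 3.00, 2.67,
        2.68, 1.60, 2.60, 3.06, 2.40, 2.15, 2.58, 1.85, 2.00, 2.33, 2.05, 2.00,
        3.41, 2.56, 3.58, 3.21, 2.07,
        2.95, 2.37, 2.94, 1.80,
        3.05, 2.89, 1.60, 2.67, 3.82, 3.10, 3.00, 2.61, 2.55, 3.12, 2.89]
MODEL1 = [1.36, 1.82, 2.92, 4.03, 1.97, 2.49, 2.06, 2.06, 1.63, 2.16, 3.26,
          2.82, 2.78, 2.79,
          2.44, 1.53, 2.53, 3.00, 2.52, 2.28, 2.76, 2.16, 2.32, 2.67, 2.44, 2.41,
          2.63, 1.90, 3.08, 3.10, 2.19,
          2.95, 2.48, 3.17, 2.08,
          2.48, 2.43, 1.21, 2.35, 3.52, 2.89, 2.80, 2.42, 2.42, 3.11, 3.03]
MODEL2 = [1.13, 1.59, 2.68, 3.79, 1.74, 2.26, 1.82, 1.82, 1.39, 1.92, 3.03,
          2.95, 2.91, 2.92,
          2.30, 1.38, 2.39, 2.85, 2.38, 2.13, 2.61, 2.01, 2.17, 2.53, 2.29, 2.26,
          3.02, 2.29, 3.47, 3.49, 2.57,
          2.79, 2.33, 3.02, 1.93,
          2.72, 2.67, 1.45, 2.59, 3.76, 3.13, 3.04, 2.66, 2.66, 3.35, 3.27]

# Table III: (chemical, model 2 AERUD, literature time scale), in table order.
TABLE_III = [
    ("p-cresol", 1.81, "d-wk"), ("p-nitrophenol", 1.83, "d-wk"),
    ("methyl parathion", 2.23, "d-wk"), ("p-chlorophenol", 2.25, "d-wk"),
    ("di-n-butyl phthalate", 2.25, "d-wk"), ("hexadecane", 2.29, "wk"),
    ("naphthalene", 2.34, "wk"), ("octadecane", 2.37, "wk"),
    ("2-methylnaphthalene", 2.48, "mo"),
    ("isopropylphenyl diphenyl phosphate", 2.50, "mo"),
    ("di-2-ethylhexyl phthalate", 2.57, "mo"), ("chlorobenzene", 2.60, "mo"),
    ("tert-butylphenyl diphenyl phosphate", 2.62, "mo"),
    ("phenanthrene", 2.63, "mo"), ("chlorobenzilate", 2.81, "mo"),
    ("2,4,5-trichlorophenoxyacetic acid", 2.91, "mo"),
    ("2,4,5-trichlorophenol", 3.02, "mo"), ("pyrene", 2.80, ">mo"),
    ("benzo[a]pyrene", 2.96, ">mo"), ("3-methylcholanthrene", 3.06, ">mo"),
    ("hexachlorophene", 3.51, ">mo"), ("p,p'-DDE", 3.58, ">mo"),
    ("hexachlorobenzene", 3.98, ">mo"),
]

# Weeks or less is read as H, months or longer as L (argument in the text).
LIT_CLASS = {"d-wk": "H", "wk": "H", "mo": "L", ">mo": "L"}


def table3_misclassified(rows):
    """Chemicals whose eq 4 class disagrees with the literature-based class."""
    return [name for name, aerud, lit in rows if classify(aerud) != LIT_CLASS[lit]]


if __name__ == "__main__":
    observed = [classify(x) for x in OBSD]
    print("Table I, model 2:", n_agree(MODEL2, observed), "of", len(OBSD))
    print("Table I, model 1:", n_agree(MODEL1, observed), "of", len(OBSD))
    missed = table3_misclassified(TABLE_III)
    print("Table III:", len(TABLE_III) - len(missed), "of", len(TABLE_III), missed)
```

Run as written, it reports 41 and 37 of 46 for models 2 and 1 and flags only 2-methylnaphthalene (AERUD 2.48, literature time scale of months) in Table III. Note that isopropylphenyl diphenyl phosphate, at exactly 2.50, falls on the L side because eq 4 uses a strict "<" for H.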
Table IV. Comparison of Model 2 Predictions to Biodegradability Codes for Selected Chemicals in BIODEGᵃ
chemical                      AERUDᵇ  intᶜ  BIODEG code
n-butylamine                  1.47    H     BF
p-methoxyaniline              1.78    H     BF
o-cresol                      1.84    H     BF
m-hydroxybenzoic acid         1.85    H     BF
ethyl o-hydroxybenzoate       1.87    H     BF
3,4-dihydroxybenzoic acid     1.97    H     BF
isodecanol                    2.12    H     BF
nicotinic acid                2.26    H     BF
stearic acid                  2.41    H     BF
4-aminopyridine               1.92    H     BSA
2-nitroaniline                2.24    H     BSA
3-nitroaniline                2.24    H     BSA
o-chloronitrobenzene          2.68    L     BSA
benz[a]anthracene             2.83    L     BSA
2,3,6-trichlorobenzoic acid   2.96    L     BSA
bromodichloromethaneᵈ         3.46    L     BSA
pentachlorobenzene            3.84    L     BSA
ᵃ BIODEG is the Evaluated Biodegradation Database (6). Chemicals having aerobic biodegradability summary codes of BF (biodegrades at a fast rate) or BSA (biodegrades at a slow rate even with acclimation) and reliability codes of 1 were selected as described in the text. ᵇ AERUD, aerobic ultimate degradation in receiving waters. ᶜ Interpretation of AERUD according to the following scheme: AERUD < 2.50 = H (high biodegradability); AERUD ≥ 2.50 = L (low biodegradability). ᵈ For purposes of calculating AERUD, the chlorination index nCl/Mw (no. of chlorine atoms divided by molecular weight of the chemical) was treated as a general halogenation index.
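The Table IV comparison can be checked the same way. This Python sketch assumes the correspondence argued in the text (BF, fast biodegradation, maps to H; BSA, slow even with acclimation, maps to L) and transcribes the model 2 values from Table IV in table order; the helper names are illustrative rather than part of BIODEG or of the model.

```python
# Table IV: (chemical, model 2 AERUD, BIODEG aerobic summary code), in table order.
TABLE_IV = [
    ("n-butylamine", 1.47, "BF"),
    ("p-methoxyaniline", 1.78, "BF"),
    ("o-cresol", 1.84, "BF"),
    ("m-hydroxybenzoic acid", 1.85, "BF"),
    ("ethyl o-hydroxybenzoate", 1.87, "BF"),
    ("3,4-dihydroxybenzoic acid", 1.97, "BF"),
    ("isodecanol", 2.12, "BF"),
    ("nicotinic acid", 2.26, "BF"),
    ("stearic acid", 2.41, "BF"),
    ("4-aminopyridine", 1.92, "BSA"),
    ("2-nitroaniline", 2.24, "BSA"),
    ("3-nitroaniline", 2.24, "BSA"),
    ("o-chloronitrobenzene", 2.68, "BSA"),
    ("benz[a]anthracene", 2.83, "BSA"),
    ("2,3,6-trichlorobenzoic acid", 2.96, "BSA"),
    ("bromodichloromethane", 3.46, "BSA"),
    ("pentachlorobenzene", 3.84, "BSA"),
]

# ASSUMPTION (from the text's reasoning): BF implies H and BSA implies L under eq 4.
BIODEG_CLASS = {"BF": "H", "BSA": "L"}


def classify(aerud, threshold=2.50):
    """Eq 4: AERUD < 2.50 -> "H"; AERUD >= 2.50 -> "L"."""
    return "H" if aerud < threshold else "L"


def misclassified(rows):
    """Chemicals whose eq 4 class disagrees with the class implied by BIODEG."""
    return [name for name, aerud, code in rows
            if classify(aerud) != BIODEG_CLASS[code]]


if __name__ == "__main__":
    missed = misclassified(TABLE_IV)
    n_ok = len(TABLE_IV) - len(missed)
    print(f"{n_ok} of {len(TABLE_IV)} correct ({100 * n_ok / len(TABLE_IV):.0f}%)")
    print("misclassified:", ", ".join(missed))
```

It prints 14 of 17 correct (82%), with 4-aminopyridine, 2-nitroaniline, and 3-nitroaniline misclassified: all three are BSA chemicals whose model 2 values fall below the 2.50 cutoff.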
In the second approach we retrieved experimental data and evaluation codes from BIODEG, the Evaluated Biodegradation Database (6). We restricted our attention to chemicals having aerobic summary codes of either BF (biodegrades at a fast rate) or BSA (biodegrades at a slow rate even with acclimation) and the highest rating for reliability of the summary code. Our reasoning was as follows: (i) given the criteria for assigning codes in BIODEG, BF generally implies aerobic ultimate degradation in receiving waters in weeks or less; (ii) similarly, BSA generally implies degradation in periods longer than weeks; (iii) in most cases chemicals rated BF and BSA in BIODEG should therefore be classified H and L, respectively, by the proposed scheme for interpreting model 2 results (eq 4). We retrieved all of the BSA chemicals in the database as of August 25, 1987, and from these selected those chemicals for which a careful inspection of all available data clearly supported the assigned codes. An approximately equal number of BF chemicals were selected in a similar fashion, except that random selection was used for the initial retrieval, since BIODEG contains many times more BF than BSA chemicals. The results of this analysis are shown in Table IV. Model 2 correctly classified 14 of 17 chemicals (82%), the misclassified chemicals being 4-aminopyridine, 2-nitroaniline, and 3-nitroaniline.
Treatability Classification. Values of AERUD from model 2 could also be useful for classifying chemicals with respect to removal due to biodegradation in wastewater treatment (REM). To explore this possibility, we first scored removal of the survey chemicals as high if the actual value of REM from the experts' responses in the survey (data not shown) was 2. By this scheme, removal was indeterminate for 3 of 46 chemicals. It was then necessary to adopt a scheme for assigning treatability descriptors to the model estimates of AERUD. We examined two different schemes.
The first was the same as that proposed for describing aerobic ultimate degradation in receiving waters; i.e.,