Environ. Sci. Technol. 2009, 43, 5001–5006

Uncertainty Reduction in Environmental Data with Conflicting Information

ALBERTO FERNÁNDEZ,† ROBERT RALLO,‡ AND FRANCESC GIRALT*,†

Departament d’Enginyeria Química, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007 Tarragona, Catalunya, Spain, and Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007 Tarragona, Catalunya, Spain

Received December 24, 2008. Revised manuscript received May 13, 2009. Accepted May 15, 2009.

The assessment of ecotoxicological effects of chemicals for regulatory purposes requires large amounts of experimental data, which are expensive to obtain and might eventually entail exhaustive animal testing. The decision-making processes required in this regulatory context must often be carried out with limited or even contradictory sources of information. To benefit from all sources of information without compromising the quality of the decision process, uncertainty management and reduction techniques, such as the Dempster-Shafer theory of evidence, have to be applied. This theory was applied to both experimental and in silico biodegradation data sources to assess chemical persistence. Uncertainties of the initially less uncertain estimates for biodegradation rates in water were reduced by as much as 20-60%. The analysis showed that conflicting evidence can be detected, quantified, and redistributed proportionally among all the feasible subsets of hypotheses. The advantages of the Dempster-Shafer theory over Bayesian approaches to represent evidence concerning hypotheses by assigning probabilities were also analyzed.

* Corresponding author e-mail: [email protected]; phone: +34977559638; fax: +34977559621.
† Departament d’Enginyeria Química.
‡ Departament d’Enginyeria Informàtica i Matemàtiques.

10.1021/es803670c CCC: $40.75. Published on Web 06/03/2009. © 2009 American Chemical Society.

Introduction

Most decision-making processes concerning real-world applications are based on data or information that are uncertain but for which there is some supporting evidence. Medical diagnosis and ecotoxicological assessment fall into the category of tasks that require expert reasoning. The mandatory registration, evaluation, authorization, and restriction of chemical substances (REACH, EC 1907/2006) for all chemicals that are imported or manufactured in quantities over 10 t/year in Europe is a critical decision-making situation, since massive testing and analysis will be needed to fill the enormous gaps in the required information. As a result, the REACH decision-making process will in many instances have to be carried out with limited and often contradictory evidence. Thus, data prediction and evaluation for REACH would require the development of an Intelligent Testing Strategy (ITS) framework suitable to optimize the use of existing data while minimizing economic costs and the need for new animal testing. The resulting ITS framework should integrate heterogeneous information gathered by several methodologies, including quantitative structure-activity relationship (QSAR), threshold of toxicological concern (TTC), read-across, in vitro, and in vivo tests. These methods are affected by different sources of uncertainty, which have to be identified, managed, and reduced in subsequent testing cycles by using decision theory tools. The effects of not accounting for, or just underestimating, uncertainties in the determination or assessment of chemical properties and biological activities have been reported elsewhere (1, 2).

The first objective of the current study is to characterize the uncertainty associated with every piece of information entering an ITS framework to evaluate chemicals. The second objective is to apply decision theory tools based upon the Dempster-Shafer theory of evidence, which can incorporate all sorts of complex information, even conflicting information, into a mathematical framework where the uncertainty of all managed data is of major concern. Both objectives are illustrated in the following sections by showing how knowledge is represented and inference performed with the Dempster-Shafer theory of evidence in a representative case study dealing with the classification of chemicals as Readily and Not Readily biodegradable according to their persistence in the environment.

Materials and Methods

Data Sources for Chemical Biodegradation in Water. The current persistence assessment study consists of classifying a set of chemicals into two biodegradability categories defined according to their biodegradation in water, measured as a percentage of biological oxygen demand (BOD). The categories considered are Readily biodegradable, for chemicals with a BOD value greater than or equal to 60%, and Not Readily biodegradable, for BOD values below 60%. Seven chemicals covering the whole range of situations that could typically be encountered during the data quality analysis were selected as model compounds to illustrate the whole process of uncertainty management (see Table 1). Two different sources of evidence for the percentage of biodegradation in water were considered.

The first source of evidence was obtained from the Biowin5 QSAR model (3), which estimates the probability that a compound is Readily biodegradable under the MITI-I (Ministry of International Trade and Industry) screening test and the OECD (Organization for Economic Co-operation and Development) 301-C test. The results of the MITI-I test are either Readily biodegradable or Not Readily biodegradable. The Biowin5 QSAR model assigns a numeric value of 1 to Readily biodegradable and a numeric value of 0 to Not Readily biodegradable. The probability of rapid biodegradation is estimated by means of linear regression against counts of chemical substructures (molecular fragments) plus molecular weight. Therefore, probability estimates above 0.5 identify Readily biodegradable substances, while values below 0.5 correspond to the Not Readily biodegradable category. The initial Readily biodegradable probability estimates obtained from Biowin5 for the seven illustrative chemicals are given in Table 1.
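The two cutoffs described above, together with the agreement rule behind the decision column of Table 1 (footnote e), can be sketched in a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, and returning no category for a Biowin5 estimate of exactly 0.5 is an assumption, since the text does not define that case.

```python
# Cutoffs quoted in the text; all names here are illustrative, not from the paper.
BOD_CUTOFF = 60.0       # percent BOD: values >= 60% count as Readily biodegradable
BIOWIN5_CUTOFF = 0.5    # Biowin5 probability cutoff


def classify_bod(bod_percent):
    """Category implied by a single BOD measurement (in percent)."""
    return "Readily" if bod_percent >= BOD_CUTOFF else "Not Readily"


def classify_biowin5(probability):
    """Category implied by a Biowin5 estimate.

    Returns None exactly at the cutoff, a case the text leaves undefined
    (assumption made for this sketch).
    """
    if probability > BIOWIN5_CUTOFF:
        return "Readily"
    if probability < BIOWIN5_CUTOFF:
        return "Not Readily"
    return None


def consensus(probability, bod_values):
    """Shared category if every piece of evidence agrees, otherwise '?'."""
    votes = {classify_biowin5(probability)}
    votes.update(classify_bod(b) for b in bod_values)
    return votes.pop() if len(votes) == 1 and None not in votes else "?"


# Values for 2,4-dimethylphenol and chloromethane as listed in Table 1.
print(consensus(0.52, [98, 84, 91]))  # Readily
print(consensus(0.56, [1, 0]))        # ?
```

A rule of this kind can only answer when all evidence points the same way, which is why four of the seven model compounds remain undecided in Table 1.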
The second source of evidence considered was the CERI database (Chemicals Evaluation and Research Institute of Japan), which provides a maximum of three BOD values corresponding to MITI-I experiments for each molecule. The CERI data for the illustrative set of seven molecules are also given in Table 1. The right-hand column of this table lists the evaluation of the biodegradability of each model compound that could be made when all experimental evidence is consistent.

TABLE 1. Initial Data Containing a Readily Biodegradable Probability Estimate and a Maximum of Three BOD Measures^a

                                   Biowin5^c     CERI^d
CASRN^b    name                    probability   BOD1   BOD2   BOD3   decision^e
74-87-3    chloromethane           0.56          1%     0%     -      ?
105-67-9   2,4-dimethylphenol      0.52          98%    84%    91%    Readily
107-05-1   3-chloro-1-propene      0.55          69%    62%    55%    ?
75-09-2    dichloromethane         0.51          9%     5%     26%    ?
79-46-9    2-nitropropane          0.40          14%    8%     -      Not Readily
95-53-4    2-methylbenzenamine     0.31          61%    69%    -      ?
106-50-3   1,4-benzenediamine      0.11          2%     3%     10%    Not Readily

a The right-hand column includes the decision about chemical biodegradability that can be reached solely on the grounds of coincident experimental evidence for each chemical. b Chemical Abstracts Service Registry Number. c Biowin5 QSAR probability estimate of a chemical being Readily biodegradable. d A maximum of three independent BOD measures from the Chemicals Evaluation and Research Institute of Japan. e A decision is reached only when the Biowin5 probability estimate (cutoff value equal to 0.5) and the CERI BOD measures (cutoff value equal to 60%) point to the same biodegradability category.

Dempster-Shafer Theory of Evidence. There are several models that can be used to augment knowledge representation with statistical measures that describe the levels of evidence and belief (4-8), such as the classic probabilistic method (9), Bayesian networks (10-12), rule-based systems with certainty factors (13), the Dempster-Shafer theory of evidence (14-16), and fuzzy logic (17). Associated with each of these formalisms there are methods to represent knowledge and to obtain new information from existing information through a process of reasoning or inference. The Dempster-Shafer theory of evidence has been selected since it is one of the most appropriate tools to deal with the current problem and because of the novelty of its application in the field of risk assessment. This formalism has been used in previous environmental studies to represent uncertainties when climate change is taken into consideration (18), as well as in forestry decision analyses (19, 20). It is a technique for decision-making under uncertainty that considers sets of hypotheses and assigns probabilities to them.

The interest in Dempster-Shafer theory stems from the richness of its uncertainty representation scheme. The Bayesian approach and Dempster-Shafer theory share fundamental ideas and produce identical results when uncertainties are not extreme. In fact, Bayesian analysis arises as a special case within the more generic Dempster-Shafer theory (14). Disagreement between these two theories occurs when quantifying weak evidence and its associated uncertainties, since in such situations the Dempster-Shafer theory offers greater flexibility than the Bayesian approach. The following section describes how knowledge is represented and inference performed by applying the Dempster-Shafer theory of evidence to a case study dealing with the classification of chemicals according to their environmental persistence in water.
Both Dempster-Shafer and Bayesian theories represent evidence concerning hypotheses by assigning probabilities. Probability assignments in either approach must always sum to 1, but the two theories differ in the subjects of the probability assignments. The Bayesian probability assignments must be made to individual hypotheses, while in the Dempster-Shafer formalism the assignments may also be made to unions of hypotheses. Furthermore, these subsets are allowed to share hypotheses (i.e., they may overlap). It is this freedom to choose the subsets and to vary the resolution of the probability assignments that gives flexibility and enhanced representation capabilities to the Dempster-Shafer theory.

In the case of a finite sample space, like the one in the current analysis, the Dempster-Shafer theory of evidence first establishes an exhaustive universe of n mutually exclusive hypotheses h_i. This set is called the frame of discernment and can be expressed as

$$\Theta = \{h_1, h_2, \ldots, h_n\} \tag{1}$$

The key function used in Dempster-Shafer theory of evidence is a probability assignment, usually called basic probability assignment, which we denote as m. The function m is defined not just for individual hypotheses in Θ but for all subsets of it, including Θ itself. According to Klir and Smith (21), for each subset H of Θ, the value m(H) expresses the degree of support of the evidential claim that the true hypothesis is in the set H but not in any special subset X of H. Any additional evidence supporting the claim that the true hypothesis is in a subset X of H must be expressed by another nonzero value m(X). The values assigned to m must satisfy the following two conditions:

$$\sum_{H \subseteq \Theta} m(H) = 1 \tag{2}$$

$$m(\emptyset) = 0 \tag{3}$$

which define a less restrictive probability assignment than that of the classical probability theory. A probability assignment in the classical approach would be defined by the three following propositions: (i) the probability of the union of two disjoint hypotheses is equal to the sum of their individual probabilities, (ii) the sum of all the individual probabilities must be equal to 1, and (iii) no probability can be assigned to the impossible hypothesis (i.e., the empty set). In contrast, the Dempster-Shafer theory of evidence provides a less restrictive definition for a probability assignment by keeping proposition iii, which corresponds to eq 3, and substituting propositions i and ii by eq 2, which states that the sum of evidence on all the subsets of hypotheses (individual or not) must be equal to 1.

If Θ contains n individual hypotheses, then there are 2^n possible subsets of Θ. Although dealing with 2^n values may seem intractable, it usually turns out that many of the subsets will never have to be considered because they have no physical significance in the problem domain (and so their associated value of m will be 0). A conventional Bayesian probability assignment is a special case of a basic probability assignment, where the nonzero probability values are assigned only to the singleton (individual) hypotheses.

Two additional quantities of conceptual importance can be obtained once a basic probability assignment m has been specified. The belief on a subset of hypotheses H, Bel(H), is defined as the sum of all the values m(X) that have been assigned either directly to H or to subsets of H

$$\mathrm{Bel}(H) = \sum_{X \subseteq H} m(X) \tag{4}$$

Thus, Bel(H) measures the strength of the evidence that the correct answer lies somewhere in the set of hypotheses H. It ranges from 0 (indicating no evidence) to 1 (denoting certainty). It should be noted that eqs 2 and 4 state that our belief in the whole set of hypotheses (frame of discernment) must sum up to 1. The plausibility of a subset of hypotheses H, Pl(H), is defined as the sum of all the values m(X) of the subsets of Θ that do not exclude H

$$\mathrm{Pl}(H) = \sum_{X \cap H \neq \emptyset} m(X) \tag{5}$$

TABLE 2. Number of Correct and Incorrect Classifications, Likelihood Probabilities, and Conditional Probabilities of the Biowin5 Model According to Boethling et al. (23)

              predicted^c (counts)     p(T|R)^a         p(R|T)^b
observed^d    T+     T-     totals     T+      T-       T+      T-
R+            75     21     96         0.78    0.22     0.6     0.08
R-            50     228    278        0.18    0.82     0.4     0.92
totals        125    249    374

a Likelihood probabilities: p(T|R) = number of (R,T)/number of (R). b Conditional probabilities: p(R|T) = number of (R,T)/number of (T). c T+, predicted Readily biodegradable by Biowin5 test; T-, predicted Not Readily biodegradable by Biowin5 test. d R+, observed Readily biodegradable; R-, observed Not Readily biodegradable.

Belief and plausibility are related by

$$\mathrm{Pl}(H) = 1 - \mathrm{Bel}(\bar{H}) \tag{6}$$

where $\bar{H}$ stands for the complement of the subset H. Plausibility also ranges from 0 to 1 and measures the extent to which evidence in favor of $\bar{H}$ leaves room for belief in H. The probabilistic nature of a basic probability assignment in Dempster-Shafer theory allows two extreme interpretations. Bel(H) measures the amount of information that supports the truth of proposition H, while Pl(H) measures the absence of information that refutes the truth of proposition H. In spite of this, belief and plausibility should not be interpreted as the lower and upper bounds of unknown true probabilities (16). Although such an interpretation seems possible when we consider only a single source of evidence, it breaks down when we consider different and possibly conflicting sources of evidence.

For every subset of hypotheses H, a belief-plausibility interval [Bel(H), Pl(H)] can be defined to measure not only the level of confidence in the hypotheses in H but also the amount of information that is available, measured as the length of the interval. Of particular interest are the subsets that correspond to singleton hypotheses, that is, H = {h_i}, because their belief-plausibility intervals are important if the Dempster-Shafer theory is to be used for decision-making. For instance, imagine a situation where three hypotheses h_1, h_2, and h_3 compete. If there is no evidence about the hypotheses, then a probability value of 1 would be assigned entirely to the subset that includes all possible hypotheses, Θ = {h_1, h_2, h_3}. From the perspectives of belief and plausibility, any singleton hypothesis has a belief of 0, reflecting that the amount of information supporting that hypothesis under ignorance is 0. But each singleton hypothesis also has a plausibility of 1, reflecting the absence of information refuting it. This can be represented by saying that the likelihood of each individual hypothesis falls within the range [0, 1].
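The definitions in eqs 4 to 6 and the three-hypothesis example can be checked with a short Python sketch. The representation chosen here (a dict mapping `frozenset` subsets to their m values) and the `partial` assignment are assumptions made only for illustration; they do not come from the paper.

```python
def belief(m, hypotheses):
    """Bel(H): sum of m(X) over the nonempty subsets X contained in H (eq 4)."""
    target = frozenset(hypotheses)
    return sum(mass for subset, mass in m.items() if subset and subset <= target)


def plausibility(m, hypotheses):
    """Pl(H): sum of m(X) over the subsets X that intersect H (eq 5)."""
    target = frozenset(hypotheses)
    return sum(mass for subset, mass in m.items() if subset & target)


THETA = frozenset({"h1", "h2", "h3"})

# Complete ignorance: all mass sits on the frame of discernment.
ignorance = {THETA: 1.0}

# A hypothetical assignment with some evidence for h1 and for {h1, h2}.
partial = {frozenset({"h1"}): 0.5, frozenset({"h1", "h2"}): 0.3, THETA: 0.2}

for label, m in (("ignorance", ignorance), ("partial", partial)):
    for h in sorted(THETA):
        print(label, h, [belief(m, {h}), plausibility(m, {h})])

# Eq 6: Pl(H) = 1 - Bel(complement of H), here for H = {h3}.
print(plausibility(partial, {"h3"}), 1 - belief(partial, THETA - {"h3"}))
```

Under ignorance every singleton gets the interval [0, 1]; in the hypothetical `partial` assignment the interval for h1 narrows to [0.5, 1] while h3 is left with [0, 0.2].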
Thus, complete ignorance provides the largest possible range for the probability value of any singleton hypothesis. As evidence is accumulated, this interval is expected to shrink, representing increased confidence in how likely each hypothesis is. Therefore, there is a way of measuring the uncertainty of each hypothesis, which can then be used to decide whether or not more evidence is needed. This stepwise uncertainty refinement would be of great importance for the decision-analysis scheme of an ITS. A pure Bayesian approach, on the other hand, would represent complete ignorance by distributing the prior probability uniformly among the singleton hypotheses, following what is known as the Laplacian concept of insufficient reason (22). For instance, in the example with three hypotheses, the prior probability for each hypothesis h_i would be distributed as p(h_i) = 1/3. The belief-plausibility approach clearly assumes that no information is available to start with. In the Bayesian approach this is not the case, since the same probability values could also result after collecting several pieces of evidence. This difference becomes relevant if one of the decisions concerns whether to collect more evidence or to act on the basis of the evidence that is already available.

If two basic probability assignments on the same set of hypotheses were obtained from two independent sources of evidence, then they could be combined to yield a new basic probability assignment. The combination procedure can be performed with Dempster’s rule of combination, which plays a role analogous to that of Bayes’ equation in the Bayesian scheme. Given two basic probability assignments, m_1 and m_2, the combination rule specifies that the value assigned to any nonempty subset Z of Θ in the resultant basic probability assignment m_3 must be

$$m_3(Z) = \frac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)} \tag{7}$$

The denominator in the above equation considers values assigned to disjoint subsets in the basic probability assignments, m1 and m2, and redistributes them proportionately across the nonempty subsets Z, ensuring that the values assigned in the resultant m3 also sum up to 1. The significance of these cases, where nonzero values are assigned to disjoint subsets, is that there is conflicting evidence. Thus, the Dempster-Shafer theory provides a mechanism to detect conflicting evidence and to redistribute it proportionately among the feasible subsets of hypotheses.
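A minimal Python sketch of the combination rule in eq 7 follows. The dict-of-`frozenset` representation and the function name `combine` are assumptions made for illustration; the example masses are the chloromethane entries listed later in Table 4, so the result can be compared with Table 5.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule (eq 7) for two basic probability assignments.

    Each assignment is a dict mapping frozenset subsets of the frame to masses.
    Returns the combined assignment and the conflict, i.e. the total mass that
    the two sources place on disjoint subsets.
    """
    raw = {}
    conflict = 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mass_x * mass_y
        else:
            conflict += mass_x * mass_y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence: eq 7 is undefined")
    # Renormalize so that the combined masses again sum to 1.
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}, conflict


READILY = frozenset({"Readily"})
NOT_READILY = frozenset({"Not Readily"})
THETA = READILY | NOT_READILY

m1 = {READILY: 0.6, THETA: 0.4}        # Biowin5, chloromethane (Table 4)
m2 = {NOT_READILY: 0.4, THETA: 0.6}    # CERI, chloromethane (Table 4)

m3, conflict = combine(m1, m2)
print(round(conflict, 2))                          # 0.24
print({tuple(sorted(z)): round(v, 2) for z, v in m3.items()})
```

Here the only disjoint pair is {Readily} from one source against {Not Readily} from the other, so the conflict is 0.6 × 0.4 = 0.24 and the remaining products are divided by 0.76.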

Results and Discussion

Bayesian Analysis. A classic Bayesian analysis is first performed with the Biowin5 data for the seven chemicals listed in Table 1. The addition of the CERI data as a second source of evidence is considered afterward to evaluate uncertainty following strictly the Dempster-Shafer theory of evidence. A Bayesian analysis requires information on both the likelihood probabilities of the test method and the prior probabilities (11, 12). The Biowin5 likelihood probabilities can be estimated from any of the several validations of the model that have been reported in the literature, such as the external validation performed by Boethling et al. (23), who used a test data set of 374 substances. The results obtained with this validation process, which has been adopted in the current study to estimate prior probabilities, are shown in Table 2 in terms of correctly and incorrectly classified chemicals. This table also includes the corresponding likelihood probabilities needed to perform the Bayesian analysis.

One of the main drawbacks of Bayesian analyses is their high dependence on prior probabilities (11), which can range from noninformative priors, representing complete uncertainty, to informative priors. To briefly illustrate this dependency in the current uncertainty analysis of biodegradation data, let us consider both the simplest noninformative case, with prior probabilities p(R+) = p(R-) = 0.5, and the informative priors based on the sensitivity/specificity data given in Table 2, that is, p(R+) = 0.26 and p(R-) = 0.74. Table 3 includes the posterior probabilities obtained when the Bayesian analysis is performed with these two different sets of prior probabilities for the same seven model compounds. The posterior probabilities determined from noninformative priors (two left-hand columns of Table 3) are significantly different from those determined from informative priors (two right-hand columns of Table 3). These differences in posterior probabilities clearly illustrate the high sensitivity of Bayesian analyses to different prior probabilities.

TABLE 3. Bayesian Posterior Probabilities Obtained from Biowin5 Data Using Noninformative and Informative Priors

                        posterior probabilities^a       posterior probabilities^a
                        from noninformative priors^b    from informative priors^c
name                    p(R+|T)^d    p(R-|T)^d          p(R+|T)^d    p(R-|T)^d
chloromethane           0.81         0.19               0.6          0.4
2,4-dimethylphenol      0.81         0.19               0.6          0.4
3-chloro-1-propene      0.81         0.19               0.6          0.4
dichloromethane         0.81         0.19               0.6          0.4
2-nitropropane          0.21         0.79               0.08         0.92
2-methylbenzenamine     0.21         0.79               0.08         0.92
1,4-benzenediamine      0.21         0.79               0.08         0.92

a Posterior probabilities: p(R+|T) = [p(R+)p(T|R+)]/[p(R+)p(T|R+) + p(R-)p(T|R-)]; p(R-|T) = [p(R-)p(T|R-)]/[p(R+)p(T|R+) + p(R-)p(T|R-)]. b Noninformative priors, i.e., p(R+) = p(R-) = 0.5. c Informative priors from the sensitivity/specificity analysis carried out by Boethling et al. (23), i.e., p(R+) = 0.26 and p(R-) = 0.74 (see Table 2). d R+, observed Readily biodegradable; R-, observed Not Readily biodegradable; T, Biowin5 test prediction.

Dempster-Shafer Analysis. The evaluation of the Dempster-Shafer theory of evidence is also performed with the same Biowin5 data given in Table 1. The frame of discernment Θ in the current example of chemical persistence assessment is the set {Readily, Not Readily}. Therefore, the amount of information supporting that the true hypothesis lies in the subsets Ø, {Readily}, {Not Readily}, or {Readily, Not Readily} has to be specified to define a basic probability assignment. The last case, {Readily, Not Readily}, denotes lack of evidence (i.e., persistence is unknown). To this end, the proportion of predictions that are correct is calculated from the validation data in Table 2, separating the positive predictions from the negative ones. This calculation determines the conditional probabilities p(R|T) shown in the same table, which are then used to define the basic probability assignments corresponding to the Biowin5 data. The basic probability assignments for the Biowin5 data are given in Table 4. This table also presents the belief-plausibility intervals for each singleton hypothesis (Readily, Not Readily) computed with eqs 4 and 5 for the Biowin5 data.

Up to now, only a single source of evidence has been considered for the Bayesian approach (Table 3) and the Dempster-Shafer theory (Table 4). Comparison of these results shows that the informative posterior probabilities are the extreme values of the belief-plausibility intervals because both have been built upon the same validation data of Table 2. However, it must be kept in mind that this interpretation is not always possible, especially when different and conflicting sources of evidence are considered.

Let us now incorporate the second source of evidence, the CERI data given in Table 1. This information consists of a maximum of three BOD values for each chemical considered. Following Ducey (19) and Walley (24, 25), every single item of evidence falling inside a certain class in k trials could be assigned a belief value equal to 1/(k+2). Since CERI provides a maximum of k = 3 BOD measures for each chemical, a confidence of 1/5 = 0.2 can be assigned to each of these three values. The remaining amount of information can be attributed to the subset containing the two individual hypotheses, that is, {Readily, Not Readily}. Table 4 shows the basic probability assignment m_2 obtained from CERI, as well as the belief-plausibility intervals for each singleton hypothesis. The basic probability assignments m_1 and m_2 given in Table 4 for Biowin5 and CERI, respectively, can be combined according to the rule defined by eq 7, even when there is conflicting information, that is, when the denominator is lower than 1.

TABLE 4. Biowin5 and CERI Basic Probability Assignments (m_1 and m_2 in eq 7) and the Belief-Plausibility Intervals Calculated for Each Singleton Hypothesis (Readily, Not Readily) According to eqs 4 and 5

Biowin5
                        basic probability assignment (m_1)             Readily          Not Readily
name                    Readily   Not Readily   Readily, Not Readily   [Bel     Pl]     [Bel     Pl]
chloromethane           0.6       0             0.4                    0.6      1       0        0.4
2,4-dimethylphenol      0.6       0             0.4                    0.6      1       0        0.4
3-chloro-1-propene      0.6       0             0.4                    0.6      1       0        0.4
dichloromethane         0.6       0             0.4                    0.6      1       0        0.4
2-nitropropane          0         0.92          0.08                   0        0.08    0.92     1
2-methylbenzenamine     0         0.92          0.08                   0        0.08    0.92     1
1,4-benzenediamine      0         0.92          0.08                   0        0.08    0.92     1

CERI
                        basic probability assignment (m_2)             Readily          Not Readily
name                    Readily   Not Readily   Readily, Not Readily   [Bel     Pl]     [Bel     Pl]
chloromethane           0         0.4           0.6                    0        0.6     0.4      1
2,4-dimethylphenol      0.6       0             0.4                    0.6      1       0        0.4
3-chloro-1-propene      0.4       0.2           0.4                    0.4      0.8     0.2      0.6
dichloromethane         0         0.6           0.4                    0        0.4     0.6      1
2-nitropropane          0         0.4           0.6                    0        0.6     0.4      1
2-methylbenzenamine     0.4       0             0.6                    0.4      1       0        0.6
1,4-benzenediamine      0         0.6           0.4                    0        0.4     0.6      1
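The construction of the two basic probability assignments can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the validation counts are those of Table 2, treating a Biowin5 estimate of exactly 0.5 as a negative prediction is an assumption, and the CERI rule is the 1/(k+2) weighting described above with k = 3. The sketch also checks that the Bayesian posterior with priors and likelihoods taken from the same counts coincides with p(R|T), which is why the informative posteriors of Table 3 match the interval bounds of Table 4.

```python
READILY = frozenset({"Readily"})
NOT_READILY = frozenset({"Not Readily"})
THETA = READILY | NOT_READILY

# Validation counts from Table 2: (observed, predicted) -> number of chemicals.
COUNTS = {("R+", "T+"): 75, ("R-", "T+"): 50, ("R+", "T-"): 21, ("R-", "T-"): 228}


def p_r_given_t(r, t):
    """Conditional probability p(R|T) = number of (R,T) / number of (T)."""
    n_t = sum(n for (_, ti), n in COUNTS.items() if ti == t)
    return COUNTS[(r, t)] / n_t


def bayes_posterior(r, t):
    """Posterior p(R|T) from Bayes' equation, with priors p(R) and
    likelihoods p(T|R) both estimated from the Table 2 counts."""
    total = sum(COUNTS.values())
    n_r = {ri: sum(n for (rj, _), n in COUNTS.items() if rj == ri) for ri in ("R+", "R-")}
    joint = {ri: (n_r[ri] / total) * (COUNTS[(ri, t)] / n_r[ri]) for ri in n_r}
    return joint[r] / sum(joint.values())


def biowin5_bpa(probability):
    """m_1: mass p(R|T) on the predicted class, the rest on the whole frame.
    An estimate of exactly 0.5 is treated as T- here (assumption)."""
    if probability > 0.5:
        support = p_r_given_t("R+", "T+")
        return {READILY: support, THETA: 1 - support}
    support = p_r_given_t("R-", "T-")
    return {NOT_READILY: support, THETA: 1 - support}


def ceri_bpa(bod_values, k=3):
    """m_2: each BOD measure adds 1/(k+2) to its class; the rest goes to the frame."""
    weight = 1 / (k + 2)
    n_ready = sum(b >= 60 for b in bod_values)
    n_not = len(bod_values) - n_ready
    m = {THETA: 1 - weight * len(bod_values)}
    if n_ready:
        m[READILY] = weight * n_ready
    if n_not:
        m[NOT_READILY] = weight * n_not
    return m


# 3-chloro-1-propene: Biowin5 estimate 0.55, BOD values 69%, 62%, 55% (Table 1).
print({tuple(sorted(s)): round(v, 2) for s, v in biowin5_bpa(0.55).items()})
print({tuple(sorted(s)): round(v, 2) for s, v in ceri_bpa([69, 62, 55]).items()})
```

For 3-chloro-1-propene this gives 0.6 on {Readily} and 0.4 on the frame for Biowin5, and 0.4, 0.2, and 0.4 on {Readily}, {Not Readily}, and the frame for CERI, in line with the corresponding rows of Table 4.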
When this is done, the new combined basic probability assignment, m3, is obtained, as shown in Table 5. This table also includes the intervals with the corresponding

TABLE 5. Combined Probability Assignment (m3) and Belief-Plausibility Intervals Calculated Using the Dempster’s Combination Rule in eq 7 and the Resulting Length of the Belief-Plausibility Intervals for Biowin5 and CERI Sources of Evidence from Table 4 combined probability assignment (m3) name

Readily

Not Readily

Readily, Not Readily

chloromethane 2,4-dimethylphenol 3-chloro-1-propene dichloromethane 2-nitropropane 2-methylbenzenamine 1,4-benzenediamine

0.47 0.84 0.73 0.375 0 0.05 0

0.21 0 0.09 0.375 0.95 0.87 0.97

0.32 0.16 0.18 0.25 0.05 0.08 0.03

Readily

Not Readily

[Bel Pl]

[Bel

Pl]

0.47 0.84 0.73 0.38 0 0.05 0

0.21 0 0.09 0.38 0.95 0.87 0.97

0.53 0.16 0.27 0.63 1 0.95 1

degrees of belief for each individual hypothesis. Table 5 also presents the underestimate of the uncertainty reduction for the model compounds that results from the combination of the two sources of evidence. For instance, chloromethane has an uncertainty of 0.4 according to the length of the intervals obtained from Biowin5. Analogously, CERI information is of less quality since it is associated with a larger uncertainty value of 0.6. Nevertheless, the combination of both sources of evidence yields an uncertainty equal to 0.32, which represents a 20% reduction in uncertainty with respect to the less uncertain evidence of 0.4 (underestimate of uncertainty reduction) provided by Biowin5. Significant reductions of approximately 60% are obtained for 2,4dimethylphenol and 3-chloro-1-propene where equally uncertain information is provided by both sources, and also for 1,4-benzenediamine, where Biowin5 information is significantly less uncertain than that of CERI. The lengths of the combined intervals in Table 5 coincide exactly with the product of the lengths of the corresponding intervals from the two sources of evidence, in the cases with no conflicting evidence. Otherwise, the lengths of the combined intervals must be rescaled (enlarged) dividing them by (1 - conflict). To see the effects of both factors, let us analyze what happens with the Readily belief-plausibility intervals corresponding to two different subsets of molecules in Table 5 (the Not Readily analysis is symmetric because of the property given in eq 6). The role played by the characteristics of belief-plausibility intervals in the reduction of uncertainty is illustrated in Figures 1 and 2. 
Figure 1 depicts the individual and combined intervals for 2,4-dimethylphenol, 3-chloro-1-propene and dichloromethane where the same length of 0.4 in the uncertainty intervals for both Biowin5 and CERI leads to gradation of different reductions (0.16, 0.18, and 0.25 for each chemical) in the corresponding combined uncertainties in Table 5. This conflicting evidence is obtained numerically from the denominator of eq 7, but it can also be observed at a glance

FIGURE 1. Comparison of the uncertainty intervals obtained for the Readily hypothesis in the case of 2,4-dimethyl-phenol, 3-chloro-1-propene, and dichloro-methane that have the same lengths of 0.4 in the uncertainty intervals for Biowin5 and CERI. The values of the uncertainty intervals for Biowin5, CERI and both evidence combined are taken from Tables 4 and 5, respectively.

0.79 1 0.91 0.63 0.05 0.13 0.03

length of belief-plausibility intervals Biowin5 CERI 0.4 0.4 0.4 0.4 0.08 0.08 0.08

0.6 0.4 0.4 0.4 0.6 0.6 0.4

Biowin5 x uncertainty CERI reduction (%) 0.32 0.16 0.18 0.25 0.05 0.08 0.03

20 60 55 38 38 0 63

decision Uncertain Readily Readily Uncertain Not Readily Not Readily Not Readily

in Table 1. For instance, in the case of 2,4-dimethylphenol we observe a complete agreement between the two sources of evidence, while 3-chloro-1-propene and dichloromethane show both in Table 1 and Figure 1 some degree of contradictory information. Table 1 shows that the three BOD measures given by CERI for 3-chloro-1-propene are partially contradictory while for dichloromethane the contradiction between Biowin5 and CERI sources of evidence is complete. Figure 2 illustrates the completely different situation where the same lengths in the uncertainty intervals for 2-nitropropane and 2-methylbenzenamine lead to completely different reductions of their corresponding combined uncertainties, although in this case the uncertainty values from Biowin5 (0.08) and CERI (0.6) are very different. The differences in the lengths of the resulting combined intervals, 0.05 for 2-nitro-propane with a 38% uncertainty reduction from the less uncertain Biowin5 information and 0.08 for 2-methylbenzenamine with a 0% reduction, arise from the complete contradictory information coming from Biowin5 and CERI evidence in the case of 2-methylbenzenamine (leftside of Figure 2), while no such contradictions are observed in the case of 2-nitropropane. The last column in Table 5 shows the evaluation of biodegradability that can be reached according to the results obtained after combining both sources of evidence. The decision is based on the dominating belief-plausibility interval (one interval is said to dominate over another one when its lower bound is greater than the upper bound of the other interval). In the cases where a final decision can be reached, a numerical value that measures the belief of being a correct decision is also obtained. Thus, we can confirm that 2,4-dimethylphenol is Readily biodegradable with a degree of belief equal to 0.84, while for 2-nitropropane we can conclude that it is Not Readily biodegradable with a degree of belief equal to 0.95. 
FIGURE 2. Comparison of the uncertainty intervals obtained for the Readily hypothesis in the case of 2-nitropropane and 2-methylbenzenamine, which have the same uncertainty interval lengths for Biowin5 (0.08) and CERI (0.6). The values of the uncertainty intervals for Biowin5 and CERI and for both sources of evidence combined are taken from Tables 4 and 5, respectively.

The application of the proposed methodology with the combination of both sources of evidence also indicates that 3-chloro-1-propene and 2-methylbenzenamine, which were uncertain in Table 1, are Readily biodegradable and Not Readily biodegradable, respectively, with a degree of belief of 0.73 and 0.87, as highlighted in boldface in Table 5. Chloromethane and dichloromethane with Bel < 0.5 remain uncertain. Table 5 also shows that the combination of the two sources of evidence has caused the intervals for the hypotheses to shrink, that is, uncertainty has been reduced, for all chemicals considered except for 2-methylbenzenamine.

The results obtained in the current biodegradation assessment indicate that (i) basic probability assignments are functions that can incorporate information from any source of evidence in an incremental manner, (ii) belief-plausibility intervals provide a way to deal with individual hypotheses, which is fundamental in any decision-making scenario, (iii) the width of every belief-plausibility interval is a measure of the uncertainty of the respective hypothesis and as such helps decide whether more evidence is needed, (iv) conflicting evidence can be detected, quantified, and redistributed proportionally among the feasible subsets of hypotheses when Dempster’s rule of combination is applied to two basic probability assignments, and (v) the combination of two independent sources of evidence of chemical persistence in water was shown to reduce the total uncertainty for the majority of model chemicals analyzed.
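Points (iii) and (iv) can be made concrete with a minimal Python sketch of Dempster's rule of combination on a two-hypothesis frame. The mass values below are hypothetical and are not the Biowin5 or CERI assignments used in this work; the function names and the hypothesis label `NotReadily` are likewise illustrative assumptions.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

# A basic probability assignment maps subsets of the frame to masses summing to 1.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Dempster's rule: returns the combined assignment and the conflict mass K."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Dividing by 1 - K redistributes the conflict proportionally.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}, conflict


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Sum of the masses of all subsets contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Sum of the masses of all subsets intersecting the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


R = frozenset({"Readily"})
N = frozenset({"NotReadily"})
THETA = R | N  # the whole frame, i.e., ignorance

# Hypothetical assignments from two independent sources of evidence.
source_1: Mass = {R: 0.6, N: 0.1, THETA: 0.3}
source_2: Mass = {R: 0.5, N: 0.2, THETA: 0.3}

m12, K = combine(source_1, source_2)
width_before = plausibility(source_1, R) - belief(source_1, R)
width_after = plausibility(m12, R) - belief(m12, R)
print(f"conflict K = {K:.2f}")
print(f"interval width for Readily: {width_before:.3f} -> {width_after:.3f}")
```

In this example the two sources mostly agree, so the belief-plausibility interval for the Readily hypothesis narrows after combination. A larger conflict mass K signals contradictory sources of the kind discussed for 2-methylbenzenamine.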

Acknowledgments This research was financially supported by the European Union (OSIRIS Project, European Commission, FP6 Contract No. 037017), the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya (2005SGR-00735), and the European Social Fund. F.G. acknowledges the support received from the Distinció a la Recerca (Generalitat de Catalunya).

Literature Cited
(1) Neumann, M. B.; Gujer, W. Underestimation of uncertainty in statistical regression of environmental models: influence of model structure uncertainty. Environ. Sci. Technol. 2008, 42, 4037–4043.
(2) Weber, C. L.; VanBriesen, J. M.; Small, M. J. A stochastic regression approach to analyzing thermodynamic uncertainty in chemical speciation modeling. Environ. Sci. Technol. 2006, 40, 3872–3878.
(3) Tunkel, J.; Howard, P. H.; Boethling, R. S.; Stiteler, W.; Loonen, H. Predicting ready biodegradability in the Japanese Ministry of International Trade and Industry test. Environ. Toxicol. Chem. 2000, 19, 2478–2485.
(4) Krause, P.; Clark, D. Representing Uncertain Knowledge: An Artificial Intelligence Approach; Intellect Books: Oxford, U.K., 1993.
(5) Rich, E.; Knight, K. Artificial Intelligence, 2nd ed.; McGraw-Hill: New York, 1991.
(6) Ross, T. J.; Booker, J. M.; Parkinson, W. J. Fuzzy Logic and Probability Applications: Bridging the Gap; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2002.
(7) Klir, G. J. Generalized information theory: aims, results, and open problems. Reliab. Eng. Syst. Saf. 2004, 85, 21–38.
(8) Helton, J. C.; Johnson, J. D.; Oberkampf, W. L. An exploration of alternative approaches to the representation of uncertainty in model predictions. Reliab. Eng. Syst. Saf. 2004, 85, 39–71.
(9) Duda, R. O.; Hart, P. E.; Konolige, K.; Reboh, R. A Computer-Based Consultant for Mineral Exploration; Final Report SRI Project 6415; SRI International: Menlo Park, CA, 1979.
(10) Billoir, E.; Delignette-Muller, M. L.; Péry, A. R. R.; Charles, S. A Bayesian approach to analyzing ecotoxicological data. Environ. Sci. Technol. 2008, 42, 8978–8984.
(11) McDowell, R. M.; Jaworska, J. S. Bayesian analysis and inference from QSAR predictive model results. SAR QSAR Environ. Res. 2002, 13, 111–125.
(12) Small, M. J. Methods for assessing uncertainty in fundamental assumptions and associated models for cancer risk assessment. Risk Anal. 2008, 28, 1289–1308.
(13) Shortliffe, E. H.; Buchanan, B. G. A model of inexact reasoning in medicine. Math. Biosci. 1975, 23, 351–379.
(14) Dempster, A. P. A generalization of Bayesian inference. J. R. Stat. Soc. B 1968, 30, 205–247.
(15) Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, 1976.
(16) Shafer, G. Perspectives on the theory and practice of belief functions. Int. J. Approx. Reason. 1990, 4, 323–362.
(17) Zadeh, L. A. A theory of approximate reasoning. In Machine Intelligence; Hayes, J., Michie, D., Mikulich, L. I., Eds.; Halstead Press: New York, 1979; Vol. 9, pp 149–194.
(18) Luo, W. B.; Caselton, B. Using Dempster-Shafer theory to represent climate change uncertainties. J. Environ. Manage. 1997, 49, 73–93.
(19) Ducey, M. J. Representing uncertainty in silvicultural decisions: an application of the Dempster-Shafer theory of evidence. Forest Ecol. Manage. 2001, 150, 199–211.
(20) Kangas, A. S.; Kangas, J. Probability, possibility and evidence: approaches to consider risk and uncertainty in forestry decision analysis. Forest Policy Econ. 2004, 6, 169–188.
(21) Klir, G. J.; Smith, R. M. On measuring uncertainty and uncertainty-based information: recent developments. Ann. Math. Artif. Intell. 2001, 32, 5–33.
(22) Howson, C.; Urbach, P. Scientific Reasoning: The Bayesian Approach, 2nd ed.; Open Court: Chicago, IL, 1993; p 52.
(23) Boethling, R. S.; Lynch, D. G.; Jaworska, J. S.; Tunkel, J. T.; Thom, G. C.; Webb, S. Using Biowin, Bayes, and batteries to predict ready biodegradability. Environ. Toxicol. Chem. 2004, 23, 911–920.
(24) Walley, P. Statistical Reasoning with Imprecise Probabilities; Chapman and Hall: London, 1991; pp 217–226.
(25) Walley, P. Inferences from multinomial data: learning about a bag of marbles. J. R. Stat. Soc. B 1996, 58, 3–57.
