Policy Analysis

Classification Criteria and Probability Risk Maps: Limitations and Perspectives

MICHAELA SAISANA,*,†,‡ GREGOIRE DUBOIS,§ ARCHONTOULA CHALOULAKOU,† AND NIKOLAS SPYRELLIS†

Department of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, Zografou Campus, 15780, Greece, and Technological and Economic Risk Management Unit and Emissions and Health Unit, Joint Research Centre, European Commission, Enrico Fermi 1, 21020, Ispra, Italy

* Corresponding author current address: Technological and Economic Risk Management Unit, Joint Research Centre, European Commission, TP 361, Enrico Fermi 1, 21020, Ispra, Italy; phone: +39 0332 785287; fax: +39 0332 785733; e-mail: [email protected].
† National Technical University of Athens. ‡ Technological and Economic Risk Management Unit. § Emissions and Health Unit.
Delineation of polluted zones with respect to regulatory standards, while accounting for the uncertainty of the estimated concentrations, relies on classification criteria that can lead to significantly different pollution risk maps, which, in turn, can depend on the regulatory standard itself. This paper reviews four popular classification criteria related to the violation of a probability threshold or a physical threshold, using annual (1996-2000) nitrogen dioxide concentrations from 40 air monitoring stations in Milan. The relative advantages and practical limitations of each criterion are discussed, and it is shown that some of the criteria are more appropriate for the problem at hand and that the choice of the criterion can be supported by the statistical distribution of the data and/or the regulatory standard. Finally, the polluted area is estimated over the different years and concentration thresholds, using the spread among the appropriate risk maps as an additional source of uncertainty.
Introduction

Uncertainty assessment associated with the estimated concentrations of environmental variables at unmonitored locations has received increasing attention in recent years, as has the need to incorporate this uncertainty into subsequent decision-making procedures, such as the delineation of areas where human health and the environment are at risk (1). In geostatistical applications, uncertainties in the estimated attribute values have mostly been assessed using nonlinear geostatistical methods (e.g., disjunctive or indicator kriging), which aim at evaluating the local probability that the variable is above a specific threshold (2, 3). Alternatively, stochastic simulation algorithms (e.g., sequential indicator or Gaussian simulation, the turning bands method, p-field simulation, or simulated annealing) can also be used: equiprobable representations of the spatial distribution of the values of the investigated variables are generated and
the differences between the realizations are used to estimate the uncertainty (4-6). Several authors have attempted to compare the performance of nonlinear kriging methods (7) and stochastic simulation algorithms (8). The conclusions of these studies are essentially the same: there is no single best method for modeling the local conditional cumulative distribution function (ccdf) in all cases, but rather a variety of algorithms from which to choose, or from which to build the algorithm best suited to the problem at hand.

Once the choice between a kriging- and a simulation-based approach has been made, the next step consists of applying the chosen method to the data and assessing its performance in terms of the estimation bias (i.e., the difference between estimated and observed values). In geostatistics, the estimation bias is calculated by means of leave-one-out or k-fold cross-validation techniques. In k-fold cross-validation, the data set of n observed values is divided into k subsets, each of which is used in turn as a test set while the remaining observations serve as the training set. The estimation error of the algorithm is then obtained from an average measure of the errors over the k tests (e.g., mean absolute error, root-mean-squared error). The leave-one-out method is the particular case in which a single point is used at a time as the test set and k equals n. These techniques and others, such as bootstrapping and jackknifing (9), are also widely used to calibrate machine-learning algorithms (10). A drawback of these cross-validation techniques, when applied to spatial data, is that the errors may be largely influenced by a few points that are hard to predict, as is often the case for points located at the borders of the investigated area (11). Instead of using the average errors of a cross-validation approach, the accuracy plot (a qualitative measure) and the G statistic (a quantitative measure) can be used to assess the closeness between the estimated values and the fraction of true values falling into the symmetric p-probability intervals (12). An extended version of this measure takes the spread of the ccdfs into account (13).

The ccdfs can then be used to find out where the boundary between polluted and safe areas lies with respect to established thresholds. Such decisions are generally based on classification criteria that can be related to the violation of either a probability threshold (e.g., critical, marginal, negotiated) or a physical threshold, or even to a financial cost function (14-18). Although the uncertainty derived from interpolation errors is accounted for by the local ccdfs, the selection of the criterion for site/area classification will itself introduce another uncertainty, given that, with respect to a regulatory threshold, different criteria can lead to classification results that differ greatly. Legitimate questions are whether some of these criteria are more appropriate for the problem at hand and whether the choice of the classification criterion can be supported by the statistical distribution of the data and/or the regulatory standard. The aim of this paper is therefore to investigate the relative advantages and limitations of applying different classification criteria. The discussion is illustrated by means of an environmental case study involving nitrogen dioxide concentrations measured at several air quality monitoring sites in and around the area of Milan, Italy.
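To make the cross-validation step concrete, the following sketch computes leave-one-out errors for a simple spatial interpolator. The inverse-distance estimator and the synthetic station layout are hypothetical stand-ins for the kriging estimator and the monitoring network of this study; all names and values are illustrative.

import numpy as np

def idw(x_known, y_known, z_known, x0, y0, power=2.0):
    # Inverse-distance weighting: a simple stand-in for a kriging estimator.
    d = np.hypot(x_known - x0, y_known - y0)
    if np.any(d == 0):
        return z_known[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * z_known) / np.sum(w)

def leave_one_out_errors(x, y, z):
    # Re-estimate each observation from the remaining n - 1 points (k = n).
    n = len(z)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        errors[i] = idw(x[mask], y[mask], z[mask], x[i], y[i]) - z[i]
    return errors

# Hypothetical station coordinates (km) and annual NO2 values (ug/m3)
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 40, 40), rng.uniform(0, 40, 40)
z = 50 + 8 * rng.standard_normal(40)
e = leave_one_out_errors(x, y, z)
print(f"MAE = {np.mean(np.abs(e)):.2f}, RMSE = {np.sqrt(np.mean(e**2)):.2f}")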
Theory

The theory of geostatistics is based on regionalized variables, whose properties are intermediate between those of a truly random variable and a completely deterministic one (19). Such variables (e.g., nitrogen dioxide concentration) vary in space
but show a spatial structure that can be described by a model of the spatial correlation (semivariogram). The probabilistic way to model the uncertainty of a regionalized variable at a single location u, where u is a vector of spatial coordinates, consists of viewing the unknown z-value as the realization of a random variable Z(u), and deriving the ccdf of Z(u)
F(u; z|(n)) = Prob{Z(u) ≤ z|(n)}  ∀ z   (1)
where the notation "|(n)" expresses conditioning to the local information, say, n neighboring z-data z(uα). Equation 1 fully models the uncertainty at u because it gives the probability that the unknown value is not greater than any given threshold z, and it can be established using a variety of algorithms. In this work, an indicator approach (20) is adopted, whereby the function F(u; z|(n)) is assessed for K threshold values zk discretizing the range of variation of z
F(u; zk|(n)) = Prob{Z(u) ≤ zk|(n)}  ∀ k = 1, ..., K   (2)
Each observation z(uα) is coded into a series of K binary values indicating whether or not it lies below each threshold zk: the indicator value I(uα; zk) = 1 if z(uα) ≤ zk and zero otherwise, consistent with the ccdf definition of eq 2. The F(u; zk|(n)) values can be obtained by kriging the unknown indicator I(u; zk) using the binary transforms of the neighboring sampling points; e.g., for the ordinary kriging used here

[F(u; zk|(n))]* = Σα=1..n(u) λα(u; zk) · I(uα; zk)   (3)
where the kriging weights λα(u; zk) are the solutions of a system of (n + 1) linear equations (14). The only information required by the kriging system is a set of semivariogram values at different lags, and these are readily derived from the semivariogram model fitted to the experimental values.

Because the [F(u; zk|(n))]* values for the different zk are determined independently of one another, there is no guarantee that they will lie in the interval [0, 1] or that the ccdf will be a nondecreasing function of the threshold value zk. Inconsistent probabilities can be avoided, a priori, by imposing the order relations as constraints in the kriging algorithm (14) or by accounting for the cross-correlation of the indicators at the different cutoffs in a cokriging system (21). Such solutions can be, however, computationally expensive. Instead, the common practice is to correct a posteriori for order relation deviations, using for example an algorithm that determines the ccdf values so as to minimize the weighted sum of squares of the corrections over the K thresholds under (K + 1) linear constraints (22). A more straightforward approach, used hereafter, consists of averaging the results of an upward and a downward correction of the sample ccdf values, after having reset to 1.0 or 0.0 any estimates above 1.0 or below 0.0, respectively (21). The resolution of the discrete local ccdf can then be increased by using a technique known as "linear interpolation between tabulated bounds" to interpolate within each class (zk, zk+1) and extrapolate beyond the extreme thresholds z1 and zK (14).

Given a ccdf model, site (or area) classification can be made on the basis of different criteria. In this paper, four criteria for declaring a location polluted are reviewed: (1) the risk of pollution (i.e., the estimated probability of being above a concentration level) exceeds an a priori defined probability threshold, (2) the risk of pollution exceeds the probability at which the misclassification rate is a minimum, (3) the risk of pollution exceeds the (declustered) proportion of sites that are above the concentration level, and (4) the mean pollutant concentration estimate retrieved from the ccdf is greater than the concentration level.
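As a minimal sketch of the indicator coding and the a posteriori order-relation correction described above, assume the kriging step (eq 3) has already produced raw ccdf estimates at the K thresholds; the function names and numerical values below are illustrative.

import numpy as np

def indicator_transform(z_data, thresholds):
    # Code each observation z(u_a) into K binary values I(u_a; z_k):
    # 1 if z <= z_k (the ccdf convention of eq 2), 0 otherwise.
    return (z_data[:, None] <= thresholds[None, :]).astype(float)

def correct_order_relations(ccdf_raw):
    # Reset estimates above 1.0 (below 0.0) to 1.0 (0.0), then average an
    # upward and a downward correction to obtain a nondecreasing ccdf.
    f = np.clip(ccdf_raw, 0.0, 1.0)
    up, down = f.copy(), f.copy()
    for k in range(1, len(f)):                 # upward pass, left to right
        up[k] = max(up[k], up[k - 1])
    for k in range(len(f) - 2, -1, -1):        # downward pass, right to left
        down[k] = min(down[k], down[k + 1])
    return 0.5 * (up + down)

print(indicator_transform(np.array([48.0, 55.0]), np.array([46.0, 50.0, 54.0])))
raw = np.array([0.12, 0.35, 0.31, 0.58, 0.61, 1.04])   # order-relation deviations
print(correct_order_relations(raw))                    # monotone, within [0, 1]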
The comparative performance of these four classification criteria is investigated using a mapping problem of annual nitrogen dioxide (NO2) concentrations in the ambient air.

Another approach, originally proposed by Srivastava (23) and often applied to soil contamination problems (14, 17, 24), consists of assessing the expected economic impact of both types of misclassification through a loss function and taking the decision that minimizes the economic loss. The loss function is usually asymmetric (different for the two types of misclassification), and the cost of underestimating the extent of pollution is likely to be greater than the cost of overestimating it (25). This approach has not been considered in the present study because of the inherent difficulties in providing loss functions with respect to the concentration level for specific air pollutants (e.g., isolating NO2 from other pollutants and expressing in monetary terms the long- and short-term impact of NO2 on human health and the environment) (26).
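Although the loss-function approach is not pursued here, its decision rule reduces to a comparison of expected losses; in the sketch below the unit costs are purely hypothetical, which is precisely the difficulty noted above for NO2.

def declare_polluted(p_exceed, cost_false_negative=10.0, cost_false_positive=1.0):
    # Minimum-expected-loss rule with an asymmetric (hypothetical) loss:
    # declaring "safe" risks a false negative with probability p_exceed,
    # declaring "polluted" risks a false positive with probability 1 - p_exceed.
    loss_if_safe = p_exceed * cost_false_negative
    loss_if_polluted = (1.0 - p_exceed) * cost_false_positive
    return loss_if_polluted < loss_if_safe

# With a 10:1 cost ratio, even a modest exceedance risk triggers action:
print(declare_polluted(0.15))   # True (0.85 * 1.0 < 0.15 * 10.0)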
Materials and Methods

The data set consists of annual NO2 concentrations from 40 fixed air quality monitoring stations in and around the area of Milan, covering most of the study area, as shown in Figure 1. Stations located close to roads were not included in the data set, to avoid local impacts. The cumulative spatial distributions of the annual data show a shift toward lower concentration levels from the past (1996) to the more recent years (2000).

For each year, the local ccdf values on a 1 km × 1 km grid were determined using point indicator kriging at the nine deciles of each annual cumulative distribution, except for 1998, 1999, and 2000, for which only the first six deciles were used, as the semivariograms for the higher deciles showed pure nugget effects (complete lack of spatial correlation). The sample support for NO2 measurements at stations of urban-background type, such as those of the present study, is several kilometers according to European Council Directive 1999/30/EC, which is, however, a rather general definition of the sample support. Given the significant variation of the annual NO2 concentrations in the study area at distances of less than 2 km, we consider the sample support of the annual measurements at the analyzed monitoring stations to be 1 km2. We chose to work at a 1 km2 grid resolution, which is frequently used in public safety and atmospheric modeling (27, 28).

Order relation deviations in the estimated probabilities were handled by averaging the results of an upward and a downward correction of the ccdf values (21). Depending on the year, corrected order relation deviations fluctuated between 5% (1999) and 12% (1996). The accuracy plots of the proportion of the cross-validated estimates falling within probability intervals versus the probability indicate that the estimated fractions for the annual NO2 concentrations are very close to the theoretical ones, which is further confirmed by the G statistic values (12) of 0.9 or higher for all years. The continuous ccdfs are estimated using "linear interpolation between tabulated bounds" to interpolate linearly within each class (zk, zk+1) and to extrapolate beyond the extreme thresholds z1 and zK using a negatively skewed power model with ω = 2.5 and a hyperbolic model with ω = 1.5, respectively (14).

The ccdfs acknowledge the uncertainty about the NO2 concentrations in the study area, yet there are alternative ways to account for such uncertainty in the decision-making process. In the following, four classification criteria are applied to the five annual data sets, with a view to defining the limitations in the application of the criteria and to comparing their relative performance for six thresholds (46, 50, 54, 58, 62, and 66 µg/m3) that correspond roughly to the various deciles of the annual distributions (from the 1st to the 8th decile, depending on the year). At the same time, two of the selected thresholds, 54 and 50 µg/m3, represent pilot annual limit values for the years 2003 and 2005, respectively, set by the European Council Directive for NO2 to prepare the Member States for meeting the 40 µg/m3 annual limit value for the protection of human health by 2010.
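A sketch of the continuous ccdf model might look as follows, with linear interpolation between the tabulated deciles and power/hyperbolic tail models as described above; the exact tail parameterizations (assumed here to be GSLIB-style), the physical minimum z_min, and the tabulated values are illustrative assumptions.

import numpy as np

def ccdf_value(z, zk, fk, z_min=0.0, omega_low=2.5, omega_up=1.5):
    # Evaluate F(z) from tabulated (zk, fk) pairs: linear interpolation
    # within classes, a power model in the lower tail, and a hyperbolic
    # model in the upper tail (assumed parameterizations).
    if z <= z_min:
        return 0.0
    if z < zk[0]:        # lower tail: power model, omega = 2.5
        return fk[0] * ((z - z_min) / (zk[0] - z_min)) ** omega_low
    if z >= zk[-1]:      # upper tail: hyperbolic model, omega = 1.5
        return 1.0 - (1.0 - fk[-1]) * (zk[-1] / z) ** omega_up
    return float(np.interp(z, zk, fk))   # linear between tabulated bounds

# Tabulated ccdf at six thresholds of a hypothetical annual distribution
zk = np.array([44.0, 47.0, 50.0, 53.0, 56.0, 60.0])
fk = np.array([0.10, 0.25, 0.45, 0.65, 0.80, 0.90])
print(1.0 - ccdf_value(54.0, zk, fk))   # risk of exceeding 54 ug/m3: 0.30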
FIGURE 1. Classed post map in 2000 (left) and cumulative distribution in 1996-2000 (right) of the 40 monitoring sites measuring annual nitrogen dioxide (NO2) concentrations (µg/m3).
TABLE 1. Proportion of the Sites (40 Total) Measuring Annual Nitrogen Dioxide that Cannot Be Classified as Safe or Polluted When Using Criterion 1 (15)

                             unclassified sites (%)
threshold (µg/m3)    1996    1997    1998    1999    2000
46                     10       5      15       7      10
50                     10       5      19      13      22
54                     14      17      21      35      63
58                     29      22      48      97      80
62                     35      55      96      93      90
66                     86      91      98      89      80
Criterion 1. A site is classified as polluted if the risk of pollution (i.e., the estimated probability that the annual NO2 concentration exceeds a concentration level) is greater than a negotiated probability threshold. The choice of the probability threshold is related to information about tolerable risks and is mainly a political or social decision. In the absence of information about tolerable risks, two probability thresholds can be used to delineate safe areas (p ≤ 0.2), hazardous areas (p > 0.8), and uncertain or unclassified areas (0.2 < p ≤ 0.8), where further investigation should be conducted (15). Table 1 presents the proportion of the NO2 sites (40 in total) that could not be classified as safe or polluted using this criterion, for the various concentration levels and years. The proportions of unclassified sites over the years for the 46 µg/m3 threshold are relatively low (about 5-15%). The results of this criterion for thresholds up to 50 µg/m3, where at least 78% of the monitoring sites can be classified as safe or polluted, might be considered satisfactory. For higher thresholds, for example the 58 µg/m3 threshold in 1999, as many as 97% of the sites cannot be classified (probabilities lie in the range (0.2, 0.8]). Consequently, this criterion can be used efficiently for this NO2 data set for concentration thresholds up to 50 µg/m3.
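A sketch of Criterion 1, assuming a vector of estimated exceedance probabilities at the grid nodes (all names and values are illustrative):

import numpy as np

def classify_double_probability(p_exceed, p_low=0.2, p_high=0.8):
    # Criterion 1: "safe" if p <= 0.2, "polluted" if p > 0.8, and
    # "unclassified" otherwise (probability thresholds from ref 15).
    labels = np.full(p_exceed.shape, "unclassified", dtype=object)
    labels[p_exceed <= p_low] = "safe"
    labels[p_exceed > p_high] = "polluted"
    return labels

p = np.array([0.05, 0.35, 0.62, 0.81, 0.97])   # hypothetical probabilities
print(classify_double_probability(p))
# ['safe' 'unclassified' 'unclassified' 'polluted' 'polluted']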
Criterion 2. A site is classified as polluted if the risk of pollution is greater than the critical probability threshold, pc, at which the misclassification rate is a minimum (17). Figure 2 shows the misclassification rate as a function of the probability threshold in 1997 for the six thresholds. The intercepts indicate that the higher the concentration threshold, the higher the proportion of misclassified sites for probability thresholds close to zero. The pc value that corresponds to a minimum misclassification rate identifies the acceptable risk of pollution and is used to delineate safe and polluted sites. In 1997, an 85% risk of pollution is enough to identify an area as polluted with respect to the 58 µg/m3 threshold, whereas a 51% risk of pollution suffices to declare a location polluted with respect to the 66 µg/m3 threshold. However, out of the 30 cases studied (6 thresholds × 5 years), a pc value could be identified in only 15 cases. This lack of a minimum misclassification rate in certain cases could be due to the number of sites; with more monitoring sites available, there would presumably be far fewer cases in which a pc could not be selected. The eventual lack of a minimum misclassification rate for the selection of a pc is one of the limitations of this approach as a classification criterion. Therefore, when the minimum misclassification rate does not correspond to a single probability threshold, we selected pc following eq 4:

pc = max{arg min M(p)}, if max{arg min M(p)} ≠ 1
pc = min{arg min M(p)}, otherwise   (4)

where M(p) is the misclassification rate as a function of the probability threshold.

FIGURE 2. Impact of the probability threshold pc on the proportion of monitoring sites that are wrongly classified as safe or polluted for six nitrogen dioxide thresholds ranging between 46 and 66 µg/m3 in 1997. A minimum misclassification rate is reached for the thresholds of 58 µg/m3 (at pc = 0.85) and 66 µg/m3 (at pc = 0.51).
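The selection of pc can be sketched as below, assuming cross-validated exceedance probabilities and the observed exceedance status of each site are available; M(p) is scanned over a probability grid, and eq 4 resolves non-unique minima. The grid and data are illustrative.

import numpy as np

def misclassification_rate(p_exceed, truly_polluted, p):
    # M(p): fraction of sites wrongly classified when "polluted" means
    # cross-validated exceedance probability greater than p.
    return np.mean((p_exceed > p) != truly_polluted)

def critical_probability(p_exceed, truly_polluted, grid=None):
    # Eq 4: among thresholds minimizing M(p), take the maximum unless
    # that maximum equals 1, in which case take the minimum.
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    m = np.array([misclassification_rate(p_exceed, truly_polluted, p) for p in grid])
    minimizers = grid[m == m.min()]
    return minimizers.max() if minimizers.max() != 1.0 else minimizers.min()

p_hat = np.array([0.1, 0.3, 0.45, 0.6, 0.7, 0.9])          # hypothetical
truth = np.array([False, False, True, True, True, True])   # hypothetical
print(critical_probability(p_hat, truth))                  # about 0.44 here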
FIGURE 3. Basic steps for classifying the study area in 2000 as safe/polluted with respect to the 54 µg/m3 nitrogen dioxide threshold. (a) Local conditional cumulative distribution functions; (b) probability map obtained through indicator kriging (circles indicate the locations of the 40 monitoring sites); (c) E-type estimates (i.e., mean of local ccdfs) in µg/m3 obtained through Latin hypercube sampling. Different subdivisions of the study area into safe (light gray), polluted (black), or unclassified (white) zones using: (d) two probability thresholds defined a priori (0.2 and 0.8), (e) the marginal probability of pollution (0.60), (f) the critical probability threshold (0.44), and (g) the E-type estimates.

Criterion 3. A site is classified as polluted if the risk of pollution is greater than the proportion of sites that exceed the concentration level (marginal probability of pollution) (17, 18). No declustering is needed prior to the calculation of the marginal probability, as the data present no preferential sampling scheme. A significant linear correlation pattern (R2 = 0.707) is revealed between the critical probability threshold and the marginal probability of pollution for the 15 cases of clear pc minima in the 1996-2000 NO2 data set. However, for the remaining cases, the correlation between the critical and the marginal probability thresholds does not indicate any degree of linearity (R2 = 0.047). In a previous study (17), the critical probability threshold (pc = 0.35) for delineating cadmium-contaminated sites, which showed a clear minimum, was appreciably different from the proportion of cadmium data that actually exceeded the concentration threshold (76.4%).

Criterion 4. A site is classified as polluted if the estimated mean concentration derived from the local ccdf (E-type estimate) is greater than the concentration level itself (14, 16). In practical terms, the E-type estimates have been computed using Latin hypercube sampling (LHS) with 100 sampling points to ensure that each ccdf has all proportions of its statistical distribution represented by the sampled values. LHS involves stratifying the probability distribution into M disjoint equiprobable classes and then drawing randomly a value within each class. LHS requires far fewer samples than random sampling for similar accuracy in the estimation of the mean of the population distribution (29).
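A sketch of the E-type estimate for a single location might proceed as follows; the tabulated local ccdf is illustrative, and for brevity the tails are simply pinned to the extreme thresholds rather than extrapolated as in the full procedure.

import numpy as np

def etype_estimate_lhs(zk, fk, m=100, rng=None):
    # Latin hypercube sampling of the local ccdf: stratify [0, 1] into m
    # equiprobable classes, draw one probability per class, invert the
    # tabulated ccdf (linear between bounds), and average the samples.
    rng = np.random.default_rng() if rng is None else rng
    strata = (np.arange(m) + rng.uniform(size=m)) / m
    z_samples = np.interp(strata, fk, zk)   # tails clamp to z1 and zK here
    return z_samples.mean()

zk = np.array([44.0, 47.0, 50.0, 53.0, 56.0, 60.0])   # hypothetical local ccdf
fk = np.array([0.10, 0.25, 0.45, 0.65, 0.80, 0.90])
print(etype_estimate_lhs(zk, fk, m=100, rng=np.random.default_rng(1)))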
Results and Discussion

For nitrogen dioxide, the tolerable maximum in 2003 for the protection of human health, as defined by the European Council Directive, is 54 µg/m3. Figure 3 summarizes the main steps followed for the classification of the study region into polluted/safe zones in 2000 with respect to the 54 µg/m3 threshold: first the local ccdfs (graph a), then the probability map (graph b) and the E-type estimates map (graph c), and finally the four alternative classification maps of the study region (graphs d-g) have been produced. The light gray zones are the areas of lowest risk of pollution (safe areas). Black-shaded areas indicate regions where the annual NO2 concentration is most likely to exceed the 54 µg/m3 threshold (polluted areas). White areas in map d cannot be classified as polluted or safe because the estimated probabilities lie in the interval [0.2, 0.8]. It can be seen that the four criteria generate maps with significantly different patterns. In particular, the contrast between the subdivision of the study region on the basis of the E-type estimates (graph g) and on the basis of the various probability thresholds (graphs d-f) is the most pronounced. Reasonable questions are which criterion is more reliable for the identification of the polluted area in the study region and whether different criteria, though resulting in different outcomes, should be considered equivalent in terms of decision-making consequences.

Figure 4 shows, per classification criterion and threshold, the mean and the range of the misclassification rates (F+, false positives; F−, false negatives; and F±, total misclassifications) over the years, calculated through cross-validation.
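The misclassification rates compared below reduce to simple counts over the cross-validated sites; a minimal sketch with illustrative data:

import numpy as np

def misclassification_summary(declared_polluted, truly_polluted):
    # F+ (false positives), F- (false negatives), and their total, as
    # fractions of all sites.
    fp = float(np.mean(declared_polluted & ~truly_polluted))
    fn = float(np.mean(~declared_polluted & truly_polluted))
    return {"F+": fp, "F-": fn, "total": fp + fn}

declared = np.array([True, True, False, False, True])   # hypothetical
truth = np.array([True, False, False, True, True])      # hypothetical
print(misclassification_summary(declared, truth))
# {'F+': 0.2, 'F-': 0.2, 'total': 0.4}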
FIGURE 4. Average misclassification rates (false positives, false negatives, and total) for four criteria and six annual nitrogen dioxide concentration thresholds. Bounds represent minimum and maximum misclassification rates over 1996-2000.

Criterion 1, based on the double probability to delineate safe areas (p ≤ 0.2), hazardous areas (p > 0.8), and unclassified areas (0.2 < p ≤ 0.8), has been applied efficiently, for all types of misclassification, for thresholds up to 50 µg/m3. Up to this threshold, the number of unclassified sites is rather low (22% in the worst case, in 2000; see Table 1). For higher concentration thresholds, there are too many unclassified sites for this criterion to be reasonably used. The method has similar performance to Criteria 2 and 3 for the low concentration thresholds (≤50 µg/m3) and, thanks to its conceptual simplicity and straightforward application, it can be preferred.

The criterion based on the critical probability threshold (Criterion 2) performs well in terms of F+ and F± for all the concentration thresholds, but the F− rate for the 58 and 62 µg/m3 thresholds can be higher than 15%. Since this criterion underestimates the polluted sites/areas with respect to high thresholds, it would not be recommended when human health issues are of primary concern. Yet, if equal importance is attached to both F+ and F−, this approach scores the lowest total misclassification rates in this case study.

The marginal probability of pollution (Criterion 3) behaves similarly to the previous criterion in terms of F+, particularly for thresholds lower than 54 µg/m3. However, it yields more F− than the previous criterion for thresholds ≤54 µg/m3. Over the years, the F+ and F− can differ significantly between these two criteria. The most pronounced difference is found at 58 µg/m3, where the marginal probability results in a maximum F+ rate of 20%, while for the critical probability the worst case is 5.8%. For the F−, the greatest difference in performance is found at 46 µg/m3, where the worst case is almost 10% for the marginal probability and just 1.7% for the critical probability. All things considered, the marginal probability of pollution is recommended for thresholds ≤54 µg/m3 when a low value of F+ or F± is required.
The E-type estimate (Criterion 4) tends to overestimate the concentrations and thus inaccurately classifies a great number of sites as polluted, particularly with respect to thresholds ≤62 µg/m3. A similar conclusion was reached previously (17). On the other hand, this criterion hardly ever misses the identification of a site as polluted when the annual concentration actually exceeds the concentration threshold. Therefore, its application is recommended when human health issues are the major concern, even if unnecessary actions/costs are entailed. It is interesting to note that all four criteria presented here are conservative enough to be accepted by a regulator for thresholds lower than 54 µg/m3 (the pilot annual limit value for the year 2003), as the false negative rates are slightly over 10% in the worst case.

To summarize the overall conclusions, Table 2 recapitulates the strong and weak points of the four classification criteria based on the five-year NO2 data set. Recommended application boundaries for each criterion are given in Table 3 with respect to the concentration threshold and the type of misclassification. In an environmental/health study where, for example, the total of false positives and false negatives should be kept low, the double probability (Criterion 1) and the critical probability (Criterion 2) are recommended for area classification for thresholds ≤50 µg/m3. For the higher concentration thresholds, the critical probability and the marginal probability of pollution (Criterion 3) are suggested. Table 4 shows the percentage of the total study area (5778 km2) that is expected to be polluted using the recommended criteria per case, as far as total misclassifications are concerned. Thus, for example, between 31 and 43% of the study region is estimated (according to Criteria 2 and 3) to be polluted with respect to the 54 µg/m3 threshold in 2000. The corresponding classification maps were previously shown in Figure 3 (maps e and f, respectively). On the other hand, maps d and g in Figure 3 are not considered to be reliable, given the conclusions drawn above.
TABLE 2. Strong and Weak Points of the Classification Criteria (a)

double probability (C1) (15)
  strong points: Conceptual simplicity and ease of calculation. It can be applied efficiently, in terms of both F+ and F−, for thresholds ≤50 µg/m3.
  weak points: Not recommended for thresholds ≥54 µg/m3 due to many unclassified sites.

critical probability threshold (C2) (17)
  strong points: Good performance in terms of F+ and total misclassified sites for all the concentration thresholds. Fairly good performance in terms of F− up to the 54 µg/m3 threshold.
  weak points: A critical probability that corresponds to the minimum misclassification rate cannot always be identified. Yet, this effect might be reduced with a greater number of monitoring sites/sampling points.

marginal probability of pollution (C3) (2, 19)
  strong points: Good performance in terms of F+ for thresholds ≤54 µg/m3. Good overall performance, similar to that of C1, for thresholds ≥50 µg/m3. Fairly good behavior in terms of F+ and F± for very high thresholds (66 µg/m3).
  weak points: Poorer performance in terms of F− than the critical probability threshold for thresholds ≤54 µg/m3.

E-type estimate (C4) (14, 16)
  strong points: Best performance in terms of F− for all concentration thresholds. Recommended when human health is of concern.
  weak points: Poor performance in terms of F+ for thresholds ≤62 µg/m3. Not recommended due to unnecessary remedial actions/costs.

(a) F+, false positives; F−, false negatives; F±, total misclassifications.
TABLE 3. Recommended Application Boundaries for Each Classification Criterion with Respect to the Concentration Threshold (a)

                                         46-50 µg/m3      54-66 µg/m3
criterion                                F±   F+   F−     F±   F+   F−
double probability (C1)                   x    x    x      -    -    -
critical probability threshold (C2)       x    x    x      x    x    -
marginal probability of pollution (C3)    -    x    -      x    -    -
E-type estimate (C4)                      -    -    x      -    -    x

(a) Classification is assessed in terms of total misclassifications (F±), false positives (F+), and false negatives (F−) for the 1996-2000 nitrogen dioxide data set; x marks the combinations for which a criterion is recommended.
TABLE 4. Estimated Range of the Polluted Area in the 5778 km2 Study Region for the Different Years and Thresholds, Using the Recommended Criteria (for the Case of Total Misclassifications) as a Source of Uncertainty

                                       polluted area (%)
threshold (µg/m3)    1996        1997        1998        1999        2000        criteria used
46                   [72, 77]    [92, 98]    [84, 96]    [73, 92]    [43, 53]    C1, C2
50                   [60, 76]    [73, 96]    [62, 66]    [43, 57]    [42, 46]    C1, C2
54                   [49, 58]    [48, 50]    [53, 54]    [40, 44]    [31, 43]    C2, C3
58                   [34, 37]    [42, 47]    [37, 50]    [12, 37]    [22, 24]    C2, C3
62                   [31, 33]    [30, 35]    [27, 31]    [6, 27]     [21, 23]    C2, C3
66                   [30, 31]    [29, 32]    [23, 29]    [6, 25]     [0, 19]     C2, C3
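The polluted-area percentages in Table 4 follow from counting the grid nodes classified as polluted; a minimal sketch, with a hypothetical probability map and the 1 km2 cells of this study:

import numpy as np

def polluted_area_percent(risk_map, pc, cell_area_km2=1.0):
    # Share of the study area classified as polluted: grid nodes whose
    # exceedance probability is above the criterion's probability
    # threshold, weighted by the cell area (1 km2 in this study).
    polluted_km2 = np.sum(risk_map > pc) * cell_area_km2
    return 100.0 * polluted_km2 / (risk_map.size * cell_area_km2)

risk = np.array([[0.1, 0.5, 0.6, 0.2],    # hypothetical 3 x 4 risk map
                 [0.3, 0.7, 0.9, 0.4],
                 [0.2, 0.5, 0.8, 0.1]])
print(polluted_area_percent(risk, 0.44))  # 50.0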
As for spatial prediction and the modeling of local uncertainty, decision-makers who are entrusted with the task of delineating polluted and safe zones are faced with the difficult choice of the classification criterion. The latter can be related to a probability threshold, a physical threshold, or even a cost-benefit criterion. Studies have so far investigated this issue only with respect to one concentration threshold or a single type of data distribution (14-18). Furthermore, from a regulatory perspective, only a single negotiated probability threshold of 0.2 has actually been used at a real site for making soil remediation decisions (30), whereas there is no information on the application or actual adoption of any of these classification criteria for air quality issues. The current literature on air quality suggests that, in the identification of polluted locations, modeled maps are applied in a straightforward way (simple comparison between estimated concentrations and the regulatory threshold), often without accounting for the uncertainty in the measurements or estimations, as recommended by the European Council Directive for NO2, 1999/30/EC. It was necessary, therefore, to
comparatively assess alternative classification criteria that account for such uncertainty, with respect to different concentration thresholds (regulatory standards in general) and data distributions (over time) for a given data configuration (monitoring network), and to present their advantages and limitations using a data set of annual nitrogen dioxide concentrations in the ambient air. The overall conclusion is that some of the classification criteria are more appropriate for the problem at hand and that the choice of the criterion can be supported by the statistical distribution of the data and/or the regulatory standard.
Acknowledgments M.S. gratefully acknowledges financial support from the European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen (Italy) and the Bodossakis Foundation (Greece). Fesil Mustaq from the Joint Research Centre is kindly thanked for his suggestions in revising the manuscript.
Literature Cited
(1) Spear, M.; Hall, J.; Wadsworth, R. In Proceedings of Spatial Accuracy Assessment in Natural Resources and Environmental Sciences; Fort Collins, CO, 1996; pp 199-207.
(2) Juang, K. W.; Lee, D. Y. Environ. Sci. Technol. 1998, 32, 2487-2493.
(3) Saito, H.; Goovaerts, P. Environ. Sci. Technol. 2000, 34, 4228-4235.
(4) Lapen, D. R.; Topp, G. C.; Hayhoe, H. N.; Gregorich, E. G.; Curnoe, W. E. Geoderma 2001, 104, 325-343.
(5) Pachepsky, Y.; Acock, B. Geoderma 1998, 85, 213-229.
(6) McKenna, S. A. Environ. Eng. Geosci. 1998, 4, 175-184.
(7) Papritz, A.; Dubois, J. P. In GeoENV II - Geostatistics for Environmental Applications; Gómez-Hernández, J., Soares, A., Froidevaux, R., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1999; pp 429-440.
(8) Gotway, C. A.; Rutherford, B. M. In Geostatistical Simulations; Armstrong, M., Dowd, P. A., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994; pp 1-21.
(9) Efron, B.; Gong, G. Am. Stat. 1983, 37, 36-48.
(10) Reich, Y.; Barai, S. V. Artificial Intelligence in Engineering 1999, 13, 257-272.
(11) Davis, B. M. Math. Geol. 1987, 19, 241-248.
(12) Deutsch, C. V. In Geostatistics Wollongong '96; Baafi, E. Y., Schofield, N. A., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997; pp 15-26.
(13) Goovaerts, P. Geoderma 2001, 103, 3-26.
(14) Goovaerts, P. Geostatistics for Natural Resources Evaluation; Oxford University Press: New York, 1997.
(15) Garcia, M.; Froidevaux, R. In GeoENV I - Geostatistics for Environmental Applications; Soares, A., Gómez-Hernández, J., Froidevaux, R., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997; pp 309-325.
(16) Mohammadi, J.; Van Meirvenne, M.; Goovaerts, P. In GeoENV I - Geostatistics for Environmental Applications; Soares, A., Gómez-Hernández, J., Froidevaux, R., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997; pp 360-372.
(17) Goovaerts, P.; van Meirvenne, M. In Proceedings of the Seventh Annual Conference of the International Association for Mathematical Geology; Cancun, Mexico, 2001; pp 1-12.
(18) Saito, H.; Goovaerts, P. Environmetrics 2002, 13, 555-567.
(19) Matheron, G. Econ. Geol. 1963, 58, 1246-1266.
(20) Journel, A. G. Math. Geol. 1983, 15, 445-468.
(21) Deutsch, C. V.; Journel, A. G. GSLIB: Geostatistical Software Library and User's Guide; Oxford University Press: New York, 1998.
(22) Sullivan, J. In Geostatistics for Natural Resources Characterization; Verly, G., David, M., Journel, A. G., Marechal, A., Eds.; Reidel Publishers: Dordrecht, The Netherlands, 1984; pp 365-384.
(23) Srivastava, R. M. CIM Bull. 1987, 80, 63-68.
(24) Goovaerts, P.; Journel, A. G. Eur. J. Soil Sci. 1995, 46, 397-414.
(25) Myers, J. C. Geostatistical Error Management: Quantifying Uncertainty for Environmental Sampling and Mapping; Van Nostrand Reinhold: New York, 1997.
(26) European Commission. Economic Evaluation of Air Quality Targets for Sulphur Dioxide, Nitrogen Dioxide, Fine and Suspended Particulate Matter and Lead; European Commission: Luxembourg, 1997.
(27) Chow, J. C.; Engelbrecht, J. P.; Watson, J. G.; Wilson, W. E.; Frank, N. H.; Zhu, T. Chemosphere 2002, 49, 961-978.
(28) Nussbaum, S.; Remund, J.; Rihm, B.; Mieglitz, K.; Gurtz, J.; Fuhrer, J. Environ. Int. 2003, 29, 385-392.
(29) McKay, M. D.; Beckman, R. J.; Conover, W. J. Technometrics 1979, 21, 239-245.
(30) Buxton, B. E.; Wells, D. E.; Diluise, G. In Geostatistics Wollongong '96; Baafi, E. Y., Schofield, N. A., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997; pp 984-995.
Received for review June 25, 2003. Revised manuscript received November 7, 2003. Accepted December 16, 2003. ES034652+