Simple Indicator Kriging for Estimating the Probability of Incorrectly

up, where Ho: xo is safe; Ha: xo is hazardous. For both hypotheses, safe and hazardous, the stationary mean Io is assigned to be 1 and 0, respectively...
0 downloads 0 Views 328KB Size
Environ. Sci. Technol. 1998, 32, 2487-2493

Simple Indicator Kriging for Estimating the Probability of Incorrectly Delineating Hazardous Areas in a Contaminated Site KAI-WEI JUANG AND DAR-YUAN LEE* Graduate Institute of Agricultural Chemistry, National Taiwan University, Taipei, Taiwan

The probability of incorrectly delineating hazardous areas in a contaminated site is very important for decisionmakers because it indicates the magnitude of confidence that decision-makers have in determining areas in need of remediation. In this study, simple indicator kriging (SIK) was used to estimate the probability of incorrectly delineating hazardous areas in a heavy metal-contaminated site, which is located at Taoyuan, Taiwan, and is about 10 ha in area. In the procedure, the values 0 and 1 were assigned to be the stationary means of the indicator codes in the SIK model to represent two hypotheses, hazardous and safe, respectively. The spatial distribution of the conditional probability of heavy metal concentrations lower than a threshold, given each hypothesis, was estimated using SIK. Then, the probabilities of false positives (R) (i.e., the probability of declaring a location hazardous when it is not) and false negatives (β) (i.e., the probability of declaring a location safe when it is not) in delineating hazardous areas for the heavy metal-contaminated site could be obtained. The spatial distribution of the probabilities of false positives and false negatives could help in delineating hazardous areas based on a tolerable probability level of incorrect delineation. In addition, delineation complicated by the cost of remediation, hazards in the environment, and hazards to human health could be made based on the minimum values of R and β. The results suggest that the proposed SIK procedure is useful for decision-makers who need to delineate hazardous areas in a heavy metalcontaminated site.

Introduction In assessing soil contamination, knowledge of the spatial distribution of pollutants in a contaminated site is often needed for successful risk analysis. For example, if remediation of soils containing heavy metal concentrations above some maximum allowable limit is desired, the locations of these soils within a site should be known (1). A well-known technique known as kriging has been widely used to estimate the contaminant levels at unsampled locations in a contaminated site. Applications of kriging in contaminated sites have been investigated frequently. For example, Litaor (2) investigated plutonium and americium contamination of soils around Rocky Flats, CO, using ordinary kriging. Samra and Gill (3) used ordinary kriging to model the variation in * Corresponding author telephone: 886-2-23630231, ext. 2479; fax: 886-2-23638192; e-mail:[email protected]. S0013-936X(97)00600-7 CCC: $15.00 Published on Web 07/30/1998

 1998 American Chemical Society

a sodium-contaminated soil and associated tree growth. Arrouays et al. (4) used the kriging technique to investigate Pb variability in contaminated soils. Stein et al. (5) studied zinc concentrations in groundwater at different scales using geostatistical techniques. Zhu et al. (6) investigated the distribution of radon concentrations of groundwater using logarithmic kriging. When the threshold that is considered hazardous or undesirable is known, we can make a decision easily by means of comparison between the threshold and the kriging estimate at each unsampled location. Unfortunately, however, for a location in which it is found that the estimated value is close to the threshold, we remain in doubt about the true situation. That is, the true value at that location might exceed the threshold even though the estimate is less, or the true value at that location might be less than the threshold even though the estimate is higher. Therefore, identifying the probability of incorrectly delineating hazardous areas at an unsampled location is more important for decision-makers than is obtaining the best estimated value of a pollutant’s contents. The probability of incorrect delineation indicates the magnitude of confidence in determining hazardous areas. Recently, indicator kriging (IK) has been used to determine the probability distribution of pollutant concentrations lower than a threshold that is considered hazardous (1, 7). IK is a nonparametric approach and is entirely distribution free (8). The indicator approach differs from the theories of traditional multi-Gaussian geostatistics and disjunctive kriging in that it attempts to estimate the conditional probability distribution directly (9, 10). Thus, IK can do both jobs: it can handle highly variant phenomena without having to trim off important high-valued data and provide risk-qualified estimates. The IK procedure under the robust ordinary kriging system (11) is usually called ordinary indicator kriging (OIK). The probability of pollutant concentrations lower than a threshold obtained using OIK should be coupled with estimated pollutant concentrations obtained using ordinary kriging to obtain the probability of incorrectly delineating hazardous areas (7). Thus, in this paper, a different approach is proposed to directly estimate the probability of incorrectly delineating hazardous areas in a contaminated site without coupling OIK with kriging-estimated pollutant concentrations. We incorporate the hypotheses directly into the simple indicator kriging (SIK) procedure. In this procedure, the values of 0 and 1 are assigned to be the stationary means of the indicator codes to represent two hypotheses. The null hypothesis is that an unsampled location is safe (i.e., the true concentration of heavy metal is lower than or equal to the threshold), and the alternative hypothesis is that an unsampled location is hazardous (i.e., the true concentration of heavy metal is greater than the threshold). The spatial distribution of the conditional probability of heavy metal concentrations lower than the threshold, given each hypothesis, can be estimated using the SIK technique (11). Then, the probabilities of false positives (R) (i.e., the probability of declaring a location hazardous when it is not) and false negatives (β) (i.e., the probability of declaring a location safe when it is not) in delineating hazardous areas in a heavy metal-contaminated site can be obtained. Spatial distributions of the probabilities of false positives and false negatives are very important for prediction of the cost of remediation and for assessment of health and environmental hazards. Moreover, when probabilities of both false positives (R) and false negatives (β) are considered simultaneously, delineation of hazardous areas based on the minimum values of R and β can be performed. VOL. 32, NO. 17, 1998 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

2487

Theory

of false positives) is denoted as R in eq 7:

The SIK technique has been described in several reports (8, 11). In the SIK procedure, the values of a pollutant’s concentrations, z(x), are first transformed into indicator codes, I(x,zc), given a threshold zc:

{

(1)

In eq 1, z(x) e zc means that the location x is safe, and z(x) > zc means that the location x is hazardous. Indicator codes transformed from concentration observations can be used to construct spatial structures. The spatial structure is shown by the semivariogram

1

N(h)

∑ [I(x + h,z ) - I(x ,z )]

2

2N(h) i)1

i

c

i

(2)

c

where γI(h) is the semivariance of the indicator codes and N(h) is the number of pairs of I(x,zk) at a separation distance h. Next, suppose the linear estimator used to estimate the conditional probability of z(x) e zc at an unsampled location xo is

prob*[z(xo) e zc|(n),Io] )

∑λ I(x ,z ) + (1 - ∑λ )I i

i

c

i

o

(3)

where λi are weights associated with I(xi,zc) of sampled locations xi, (n) is the information set consisting of the n indicator codes I(xi,zc), where i ) 1, 2, ..., n, and Io is the stationary mean of the indicator codes. Equation 3 is called the SIK estimator. Furthermore, a hypothesis test can be set up, where Ho: xo is safe; Ha: xo is hazardous. For both hypotheses, safe and hazardous, the stationary mean Io is assigned to be 1 and 0, respectively. Thus, eq 3 can be rewritten as

prob*[z(xo) e zc|(n),Io ) 1] )

∑λ I(x ,z ) + (1 - ∑λ ) i

i

c

for hypothesis Ho and

∑λ I(x ,z ) i

i

c

(5)

for hypothesis Ha. Both conditional probabilities, given both hypotheses, at an unsampled location are then estimated using the SIK estimator. When using the SIK system as shown in eq 6, the weight λi associated with I(xi,zc) can be solved. Then, we can obtain the estimate, prob*[z(xo) e zc|(n), Io ) 1] or prob*[z(xo) e zc|(n), Io ) 0]: N(h)

∑ λ γ (x - x ,z ) ) γ (x j

i

c

I

o

- xj,zc)

i, j ) 1, 2, ..., n (6)

where the semivariances γI(xj - xi,zc) and γI(xo - xj,zc) are derived from the semivariogram as shown in eq 2. In the equation, n is the number of indicator codes chosen for estimation. Then, the estimated conditional probabilities for both hypotheses obtained using eqs 4 and 5 are used to calculate the probabilities of false positives (R) and false negatives (β) when delineating hazardous areas. For any location xo, the probability of rejecting Ho (i.e., the probability 2488

9

and the probability of rejecting Ha (i.e., the probability of false negatives) is denoted as β in eq 8:

β ) prob[z(xo) e zc|(n), xo is hazardous] ) prob*[z(xo) e zc|(n), Io ) 0]

(8)

In contaminated sites, various pollutants may be present simultaneously. Delineation of hazardous areas should be performed based on the spatial distributions of multiple pollutants simultaneously. The composite probability (CP) of incorrectly delineating hazardous areas for multiple pollutants is proposed here. For two pollutants, pollutants 1 and 2, with independent misclassification errors, the CP (i.e., the composite probabilities of false positives (RCP) and of false negatives (βCP)) can be calculated using eqs 9 and 10:

RCP ) prob[z1(xo) > zc1 or z2(xo) > zc2|(n), xo is safe (i.e., actually z1(xo) e zc1 and z2(xo) e zc2)] ) R1R2 + R1(1 - R2) + (1 - R1)R2 ) 1 - (1 - R1)(1 - R2)

(9)

where zc1 and zc2 are thresholds for pollutants 1 and 2, respectively, R1 and R2 are the probabilities of false positives when delineating hazardous areas for pollutants 1 and 2, respectively, and

βCP ) prob[z1(xo) e zc1 and z2(xo) e zc2|(n), xo is hazardous (i.e., actually z1(xo) > zc1 or z2(xo) > zc2)] ) β1β2 + (1 - R1)β2 + β1(1 - R2)

prob*[z(xo) e zc|(n),Io ) 0] )

i I

(7)

i

(4)

i)1

) prob*[z(xo) > zc|(n), Io ) 1] ) 1 - prob*[z(xo) e zc|(n), Io ) 1]

1, if z(x) e zc I(x,zc) ) 0, otherwise

γ1(h) )

R ) prob[z(xo) > zc|(n), xo is safe]

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 32, NO. 17, 1998

) (1 - R1 + β1)(1 - R2 + β2) - (1 - R1)(1 - R2) (10) where β1 and β2 are the probabilities of false negatives when delineating hazardous areas for pollutants 1 and 2, respectively. Equation 9 indicates that a composite false positive occurs when either pollutant is predicted to be above the given critical concentration but both actually are not. Equation 10 indicates that a composite false negative occurs only when both pollutants are predicted to be below the given critical concentration but at least one of them is actually above the critical concentration. If one considers that the misclassification errors of pollutants 1 and 2 are dependent, then the definitions of the composite probabilities of false positives RCP and false negative βCP will be more complex, and they cannot be described as the combinations of the probabilities R1, R2, β1, and β2.

Materials and Methods Soil Sampling and Heavy Metal Measurements. The study site is a paddy field about 10 ha in area, situated in Taoyuan County, Taiwan (Figure 1). The soil is Typic Paleudult according to Soil Taxonomy. The paddy field was irrigated with water, which was contaminated by heavy metals from

FIGURE 1. Study site and the sampling points. the discharge of a chemical plant through irrigation channels (12). Because the heavy metals in the discharge from the chemical plant were mainly Cd and Pb, the concentrations of the two metals in the soils of the study area were determined. The basic sampling design for this study site has been presented in detail in our previous study (13). We obtained samples from each block, which was surrounded by paddy ridges and owned by a farmer. One sample was taken from each small block, and two samples were taken from each large block. Figure 1 shows the sampling locations; the total number of sampling points was 55. The soil (0-15 cm) was sampled at each sampling location. Ten grams of soil taken from each soil sample (passed through 20 mesh) and 100 mL of 0.1 M HCl were placed in a 250-mL flask. Each flask was shaken for 1 h. The soil suspensions were filtered through Whatman No. 42 filter paper. Cadmium and Pb in the filtrates were then determined by means of atomic absorption spectroscopy (12, 14). The Cd and Pb concentrations for each sampling point are shown in Figure 2, panels a and b, respectively. All observed Cd and Pb concentrations were higher than the background concentrations of Cd and Pb for this type of soil, which were 0.05 and 8.02 mg/kg, respectively. Spatial Estimations of Probabilities of Incorrectly Delineating Hazardous Areas. In this study, we used the SIK technique to estimate the probability of incorrectly delineating hazardous areas in the heavy metal-contaminated site. In Taiwan, based on an island-wide reconnaissance survey of 0.1 M HCl-extractable heavy metal contents in agricultural soils, soils with 0.1 M HCl-extractable Cd concentrations higher than 10 mg/kg are considered to be hazardous. The upper limit of the background concentration of 0.1 M HClextractable soil Pb is 15 mg/kg in Taiwan. Soils with 0.1 M HCl-extractable Pb concentrations higher than 15 mg/kg are suspected to be Pb contaminated (15). Thus, the thresholds of the soil 0.1 M HCl-extractable Cd and Pb concentrations for declaring a location hazardous were set to be 10 and 15 mg/kg, respectively, in this study. The cutoff value of 15 mg/kg (0.1 M HCl extracted), used as a cleanup standard for Pb, may be too low in practice. First, the 55 data points of

FIGURE 2. Postplots of soil (a) Cd concentrations (mg/kg) (1st quartile, 2.68 < + e6.96; 2nd quartile, 6.96 < × e9.90; 3rd quartile, 9.90 < 0 e15.36; 4th quartile, 15.36 < / e148.15) and (b) Pb concentrations (mg/kg) (1st quartile, 9.52 < + e15.82; 2nd quartile, 15.82 < × e22.14; 3rd quartile, 22.14 < 0 e28.67; 4th quartile, 28.67 < / e126.67) in the study site. VOL. 32, NO. 17, 1998 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

2489

TABLE 1. Descriptive Statistics of Soil Cd and Pb Concentrations in the Study Site (n ) 55) mean SD median minimum maximum (mg/kg) (mg/kg) (mg/kg) (mg/kg) (mg/kg) Cd Pb

19.43 30.80

25.13 26.52

9.93 22.18

2.68 9.52

148.15 126.67

CV (%)

skewness

129.33 86.41

3.27 2.35

FIGURE 3. Postplots of the indicator codes of soil (a) Cd concentrations (indicator code ) 1, where Cd e 10 mg/kg; indicator code ) 0, where Cd > 10 mg/kg) and (b) Pb concentrations (indicator code ) 1, where Pb e 15 mg/kg; indicator code ) 0, where Pb > 15 mg/kg) in the study site. the soil Cd and Pb concentrations were transformed into indicator codes, given as zc ) 10 and 15 mg/kg, using eq 1, respectively. The transformed indicator codes are shown in Figure 3, panels a and b. Then, the experimental semivariograms and least-squares fitted models of the indicator codes transformed from the soil Cd and Pb concentrations were employed using the GS+ software (16). Next, the conditional probabilities of soil Cd concentrations lower than 10 mg/kg for both hypotheses at unsampled locations were estimated using the SIK estimator (eq 4 and 5). Likewise, the conditional probabilities of soil Pb concentrations lower than 15 mg/kg for both hypotheses at unsampled locations were also estimated. SIK estimation was performed using the GEOEAS software (17). The estimated conditional probabilities for both hypotheses were used to calculate the probabilities of false positives (R) and false negatives (β) when delineating hazardous areas using eqs 7 and 8, respectively. Moreover, the composite probabilities of incorrectly delineating hazardous areas when considering Cd and Pb simultaneously were obtained using eqs 9 and 10.

Results and Discussion Descriptive Statistics. The descriptive statistics of the Cd and Pb concentration data are listed in Table 1, including the mean, standard deviation, coefficient of variation (CV, %), skewness, median, minimum, and maximum. The mean values of the soil Cd and Pb concentrations (19.43 and 30.80 mg/kg) of this site are much larger than those of the background soil Cd and Pb concentrations (0.05 and 8.02 mg/kg). Moreover, the means are greater than the medians, 2490

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 32, NO. 17, 1998

FIGURE 4. Semivariograms of the indicator codes of soil (a) Cd and (b) Pb concentrations in the study site. so the distributions of both Cd and Pb concentrations are positively skewed. The high skewness and large CV also indicate that there are some extreme values in the data sets. The descriptive statistics show that the spatial distributions of the Cd and Pb concentrations were not homogeneous in this study area. Thus, delineation of hazardous areas should be done cautiously, and an appropriate method should be applied. Spatial Analysis. The experimental and spherical modelfitted semivariograms of the indicator codes transformed from the soil Cd and Pb concentrations are shown in Figure 4, panels a and b. The semivariogram of Cd has a nugget of 0.082, a sill of 0.291, and a range of 192.5 m while that of Pb has a nugget of 0.104, a sill of 0.183, and a range of 226.8 m. The ratio of nugget to sill expressed as a percentage can be regarded as a criterion for classifying the spatial dependence (18). The ratios of nugget to sill for these spatial structures, as shown in Figure 4, panels a and b, were close to 28% and 57%, respectively. Thus, the indicator codes transformed from the Cd and Pb concentrations indicated strong and moderate spatial dependence, respectively. Therefore, it was appropriate to estimate the spatial distributions of the conditional probabilities of Cd and Pb concentrations lower than the thresholds using the simple indicator kriging technique. Probability of Incorrect Delineation Estimated Using SIK. Shown in Figures 5 and 6 are the spatial distributions of the probabilities of false positives (RCd and RPb) and of

FIGURE 5. Spatial distributions of the probabilities (a) of false positives (rCd) and (b) of false negatives (βCd) when delineating hazardous areas for soil Cd concentrations. The blank spaces denote the areas for which there is no data for probability estimation.

FIGURE 7. (a) Delineating map for soil Cd concentrations based on the minimum values of rCd and βCd and (b) associated probability of incorrect delineation. The blank spaces are areas for which there is no data for probability estimation.

FIGURE 6. Spatial distributions of the probabilities (a) of false positives (rPb) and (b) of false negatives (βPb) when delineating hazardous areas for soil Pb concentrations. The blank spaces denote the areas for which there is no data for probability estimation.

FIGURE 8. (a) Delineating map for soil Pb concentrations based on the minimum values of rPb and βPb and (b) associated probability of incorrect delineation. The blank spaces are areas for which there is no data for probability estimation.

false negatives (βCd and βPb) when delineating hazardous areas for soil Cd and Pb concentrations in the contaminated site, respectively. The probability of false positives (R) indicates

the probability of declaring a location hazardous when it is not. In this case, a false positive decision will result in waste of resources used in remediation. Therefore, if the budget VOL. 32, NO. 17, 1998 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

2491

FIGURE 9. Spatial distributions of the composite probabilities (a) of false positives (rCP) and (b) of false negatives (βCP) when delineating hazardous areas using simultaneous consideration of the soil Cd and Pb concentrations. The blank spaces denote the areas for which there is no data for probability estimation. for remediation is limited, the spatial distributions of the probabilities of false positives (RCd and RPb) can be used to delineate hazardous areas in need of remediation based on a tolerable probability level of incorrect delineation. The probability of false negatives (β) indicates the probability of declaring a location safe when it is not. In this case, a false negative decision will result in hazards to human health and the environment. If heavy metal hazards to human health and the environment are of much greater concern than the cost of remediation, the spatial distribution of the probability of false negatives (βCd and βPb) should be used to delineate hazardous areas based on a tolerable probability level of incorrect delineation. It is not appropriate to only consider the cost of remediation or only hazards of heavy metal to the environment and human health. As a rule, delineation based on both economic and safety considerations is most satisfactory. Thus, a compromise method, which is based on the minimum values of R and β as shown in isopleth maps of the conditional probability (Figures 5 and 6) for delineation is proposed. That is, at an unsampled location, if R is less than β, then the possibility of making a wrong decision of “hazardous” is smaller than that of making a wrong decision of “safe”; thus, the unsampled location is declared hazardous, and the associated probability of incorrect delineation is R. On the other hand, if R is greater than β, then the possibility of making a wrong decision of hazardous is larger than that of making a wrong decision of safe; thus, the unsampled location is declared safe, and the associated probability of incorrect delineation is β. Figures 7a and 8a are maps of the delineation of the soil Cd and Pb concentrations of the contaminated site obtained using the proposed method, respectively. The associated probability of incorrect delineation is also shown in Figures 7b and 8b. 2492

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 32, NO. 17, 1998

FIGURE 10. (a) Delineating map for considering soil Cd and Pb concentrations simultaneously based on the minimum values of rCP and βCP and (b) associated probability of incorrect delineation. The blank spaces are areas for which there is no data for probability estimation. When considering soil Cd and Pb concentrations simultaneously, the spatial distributions of the composite probabilities of false positives (RCP) and of false negatives (βCP) when delineating hazardous areas are those shown in Figure 9, panels a and b. In addition, the map of delineation and the associated probability of incorrect delineation based on the minimum values of RCP and βCP for considering soil Cd and Pb concentrations simultaneously are shown in Figure 10. In practice, actions taken by management at a contaminated site are usually applied to deal with multiple pollutants. Thus, the composite probability of incorrect delineation is more useful than the probability of incorrect delineation when only one kind of pollutant is considered. For example, the hazardous area of the delineating map as shown in Figure 10a indicates the prime candidates for cleanup. The map of the probabilities of incorrect delineation, as shown in Figure 10b, can be used to determine the area requiring cleanup based on a tolerable probability level of incorrect delineation. In addition, it is noted that the area of high probability of incorrect delineation is located on the border between the hazardous and safe areas (see Figures 7, 8, and 10). Thus, to increase the precision of delineation, unsampled locations with high probability of incorrect delineation are prime candidates for additional sampling.

Acknowledgments This research was sponsored by the National Science Council, Republic of China, under Grants NSC-86-2621-P-002-005 and NSC-87-2621-P-002-013.

Literature Cited (1) Murray, M. R.; Baker, D. E. In Engineering Aspects of MetalWaste Management; Iskandar, I. K., Selim, H. M., Eds.; Lewis Publishers: Boca Raton, FL, 1992; pp 25-47. (2) Litaor, M. I. J. Environ. Qual. 1995, 24, 506-516. (3) Samra, J. S.; Gill, H. S. Soil Sci. 1993, 155, 148-153.

(4) Arrouays, D.; Mench, M.; Amans, V.; Gomez, A. Can. J. Soil Sci. 1996, 76, 73-81. (5) Stein, A.; Varekamp, C.; van Egmond, C.; van Zoest, R. J. Environ. Qual. 1995, 24, 1205-1214. (6) Zhu, H. C.; Charlet, J. M.; Doremus, P. Environmetrics 1996, 7, 513-523. (7) Journel, A. G. In Principles of Environmental Sampling; Keith, L. H., Ed.; American Chemical Society: Washington, DC, 1988; pp 45-72. (8) Journel, A. G. Math. Geol. 1983, 15, 445-468. (9) Bierkens, M. F. P.; Burrough, P. A. J. Soil Sci. 1993, 44, 361-368. (10) Bierkens, M. F. P.; Burrough, P. A. J. Soil Sci. 1993, 44, 369-381. (11) Deutsch, C. V.; Journel, A. G. GSLIB: Geostatistical Software Library and User’s Guide; Oxford University Press: New York, 1992. (12) Chen, Z. S. Water Air Soil Pollut. 1991, 57-58, 745-754. (13) Juang, K. W.; Lee, D. Y. J. Environ. Qual. 1998, 27, 355-363. (14) EPA-ROC. The standard methods for determination of heavy

(15) (16) (17) (18)

metals in soils and plants; National Institute of Environmental Analysis of EPA-ROC: Taipei, Taiwan, 1994. EPA-ROC. Handbook of soil heavy metal survey in Taiwan area, 1987-1990; Environmental Protection Administration (EPAROC): Taipei, Taiwan, 1991. Gamma Design. GS+: Geostatistics for the agronomic and biological sciences, Version 2.3; Gamma Design: Plainwell, MI, 1994. Englund, E.; Sparks. A. GEO-EAS: geostatistical environmental assessment software, User’s Guide; U.S. EPA: Las Vegas, NV, 1988. Chien, Y. J.; Lee, D. Y.; Guo, H. Y.; Houng, K. H. Soil Sci. 1997, 162, 291-298.

Received for review July 9, 1997. Revised manuscript received June 11, 1998. Accepted June 16, 1998. ES9706007

VOL. 32, NO. 17, 1998 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

2493