Monitoring statistics. An important tool for groundwater and soil studies

Glenn E. Schweitzer
National Research Council
Washington, D.C. 20418

Stuart C. Black
Environmental Monitoring Systems Laboratory
Environmental Protection Agency
Las Vegas, Nev. 89114

Statistics increasingly is recognized as an important component of soil and groundwater monitoring programs. In the design of these programs, reliance on subjective professional judgments unaccompanied by statistically based objective information has become less acceptable to enforcement agencies and the scientific community. The use of statistics is now considered necessary to determining the location of sampling sites, the frequency of sampling, and the representativeness of individual samples. Statistical analysis is also used in quality assurance procedures for sampling in the field and for analysis in the laboratory, as well as for interpreting monitoring data. Standardized approaches for determining minimum detection levels, precision, and bias are evolving rapidly. EPA recently prepared detailed guidance on the analytical and statistical methods to be used in its programs for calculating these parameters (1). Useful statistical procedures have
1026 Environ. Sci. Technol., Vol. 19, No. 11, 1985

long been available for examining anomalies, differences, and trends in data. The development of these procedures intensified during the 1950s with the increase in interest in monitoring programs designed to measure radioactivity in environmental samples. The need to distinguish between changes in radioactivity caused by human activities and those ascribable to natural background variations, as well as recognition of the value of statistics in predicting radioactive decay, led to equations for combining errors of analysis and for determining the confidence level of the result. Additional procedures were developed by the early 1960s that led to the issuance of guidelines to cover many aspects of quality control for analytical techniques. The use of control charts and methods for determining precision, accuracy, and minimum detection levels also were included. These guidelines have been updated and expanded recently (2-5). Also during the 1960s, similar approaches for use in chemical monitoring were published in a handbook by the National Bureau of Standards (6). This handbook includes many of the radiation concepts (such as detection limits and control charts) and methods of testing hypotheses of distributions and concentrations of chemical contaminants. Statistical procedures often were based on the premise that the data contained negligible measurement errors, a questionable assumption in many environmental monitoring programs. Newer techniques do not rely as heavily on such an assumption. Recently, the field of chemometrics has attracted considerable attention. This field includes computer programs that analyze large sets of data through pattern recognition and related methods (7). Until recently, only limited effort has been directed toward the statistical aspects of designing procedures for sampling under field conditions. Too often, unwarranted quantitative extrapolations have been made from small sets of data, or excessive numbers of samples have been taken to ensure the adequacy

0013-936X/85/0919-1026$01.50/0 © 1985 American Chemical Society

of monitoring coverage. Now, the costs of cleaning up contaminated areas and of laboratory analysis have become so large that neither of these sampling approaches is acceptable. What is necessary, rather, is an optimum number of samples that furnishes a sufficient but not excessive number of data points to characterize a contamination problem and to provide the degree of confidence in the quality of the data necessary to support their intended use. The American Chemical Society guidelines in "Principles of Environmental Analysis" touch on this topic (8). The guidelines provide general sampling rules and simple equations for determining sample size if the required precision and allowable error are specified in advance or can be estimated. The principles pertain mainly to laboratory analyses; a companion set of ACS guidelines for environmental sampling is in preparation. Recently, a government contractor prepared a series of bulletins discussing various statistical designs for monitoring programs, such as random, stratified, and grid designs and designs based on professional judgment (9). These bulletins are essentially a condensed presentation of techniques found in many textbooks on statistics, and they can be used readily by field personnel. A major area of concern is the assurance that an individual sample is representative of the environmental condition it is intended to define. The representativeness of the analytical data developed is directly affected by the exact location and timing of sampling, the methods of preparing composite samples, and the techniques of preparing samples for chemical analysis. Soil and groundwater contamination are three-dimensional problems; the size of the individual sample and the number and location of the samples are critical in any attempt to portray environmental conditions. Sampling for trend analysis adds temporal variation as another dimension of the problem.
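Sample-size equations of the kind referred to above typically take the form n = (Zσ/e)^2, where Z is the standard normal variate for the desired confidence, σ the assumed standard deviation of the population, and e the acceptable error of the mean (see also Figure 1). A minimal sketch in Python; the function name is ours, not from the ACS guidelines:

```python
import math

def n_samples(z, sigma, e):
    """Number of samples n = (Z * sigma / e)^2, rounded up to a whole sample.

    z     -- standard normal variate for the desired confidence (1.96 for 95%)
    sigma -- assumed standard deviation of the population
    e     -- acceptable error of the mean of the samples
    """
    return math.ceil((z * sigma / e) ** 2)
```

For example, a 95% confidence requirement (Z = 1.96), an assumed standard deviation of 2.0 ppm, and an acceptable error of 1.0 ppm call for 16 samples.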
The use of statistics now plays a greater role in the design of sampling programs because of groundwater regulations promulgated by EPA under the Resource Conservation and Recovery Act (RCRA) (10). As applied to interim status sites (disposal sites operating under general regulations pending individual permitting), RCRA regulations call for a minimum of one upgradient and three downgradient monitoring wells. Total organic carbon (TOC), total organic halogen (TOX), pH, and specific conductance, the indicator parameters, must be measured on a site-by-site basis. A statistical technique, known as the Student's t test, is used to

compare upgradient and downgradient measurements (Figure 1). One problem with such sampling arises because samples often are taken from a small number of wells; therefore, they may not be representative of actual upgradient or downgradient conditions. Also, the direction of movement of groundwater contaminants may not be in total conformity with gradient conditions, and the direction of contaminant flow may vary over time. Another problem is that it is difficult to determine the extent to which the indicator parameters represent contaminant patterns involving a large number of chemicals. Researchers within and outside of EPA are working toward improving statistical procedures for characterizing contamination problems without incurring unacceptable costs for monitoring at waste disposal sites. The report of an ACS-EPA workshop on environmental sampling for hazardous wastes provides an overview of current designs of field sampling programs (11). Case studies underscore the continuing importance of professional judgments in addressing field problems when work must be done quickly. The report highlights the importance of the statistician working with other scientists in designing field programs and selecting models for data interpretation. It presents recently developed mathematical techniques for addressing spatial correlations among data points and for comparing data from contaminated areas with data from background or control areas. Two of the major questions that arise when risk assessments are conducted at hazardous waste sites relate to the level and extent of contamination and to the way in which contamination compares with background levels. Two recent EPA monitoring programs illustrate how statistics have been used to assess monitoring data in addressing these

Sample size estimates

Number of samples = (Zσp/e)^2
Number of replicates = (Zσm/E)^2

Z = standard normal variate (t from Student's t test may be used instead of Z if the number of samples or replicates is less than seven)
σp = assumed standard deviation of the population
σm = standard deviation of the measurement
e = acceptable error of the mean of the samples
E = acceptable confidence interval

Source: Reference

FIGURE 1

Student's t test

[Figure: distribution curve of the downgradient mean. The upgradient mean would be represented by the same curve displaced to the left; it was omitted for the sake of clarity.]

If y = upgradient mean value, x = downgradient mean value, and s = standard error of x, then t is the value of the t test. Using t, which depends on the number of samples used to determine x, test whether x > y + ts. If so, then x is statistically greater than y with a probability P < 0.05 (the probability is less than 0.05 that the means were derived from the same sample). The term t is Student's t from a table at the 95% confidence level.

questions. They provide some guidance as to how statistics might be used more effectively in the future. These programs, the 1980 Love Canal Study and the 1982 Dallas Lead Study, have attracted considerable attention from the scientific community and the public. In a sense, they were bellwether cases for subsequent monitoring efforts.
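The Figure 1 comparison can be sketched in a few lines of Python. This is a minimal illustration of the one-tailed test described in the figure caption, not EPA's prescribed procedure; the function name and the example well readings are hypothetical:

```python
import math
from statistics import mean, stdev

def downgradient_exceeds(down, up_mean, t_crit):
    """One-tailed test per Figure 1: is the downgradient mean x greater than
    the upgradient mean y by more than t times the standard error of x?"""
    xbar = mean(down)
    s = stdev(down) / math.sqrt(len(down))  # standard error of the downgradient mean
    return xbar > up_mean + t_crit * s

# Hypothetical specific-conductance readings from downgradient wells;
# t_crit = 2.132 is Student's t for 4 degrees of freedom, one-tailed, 95%.
readings = [410.0, 435.0, 420.0, 445.0, 430.0]
print(downgradient_exceeds(readings, up_mean=300.0, t_crit=2.132))  # prints True
```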

Contamination at Love Canal
In 1980, EPA began a program to obtain environmental data that would assist researchers in determining the habitability of homes near Love Canal, N.Y. The project was allotted limited time and money, and access to some of the desired sampling sites also was limited. Thus, from the outset of the program, it was recognized that efforts to conduct rigorous statistical analyses of the monitoring data would not be possible, either as a basis for quantitative risk assessment or as a means of comparing contamination near the canal with background contamination. The primary interpretation of the data for characterizing contamination was based on descriptive and graphic presentations that would be helpful in determining health risks to the population. Despite known shortcomings in the data base, particularly in the limited number of samples from the control areas, limited statistical analyses also were carried out to identify questionable aspects of the descriptive presentations (12). For each medium (soil, groundwater, sumps, indoor and outdoor air, drinking water, food, and biota), monitoring data were aggregated for three

geographical areas: the canal area, an unoccupied residential area immediately adjacent to the canal; the declaration area, an area within 1-2 mi of the canal that included occupied residences; and control areas several miles from the canal. The statistical tabulations and analyses consisted of substance-by-substance comparisons of frequencies of detection and median concentration levels for up to 150 analytes in each area. The extent of contamination in an area was defined as the percentage of samples that included a chemical contaminant at a trace or greater concentration level. A difference of percentages test, using Fisher's exact test to compute probability values, was used to determine whether statistically significant differences in the extent of chemical contamination existed among the canal, declaration, and control areas. An example of the results of applying this test is shown in Table 1.

Fisher's exact test

The difference of percentages test determines whether contamination in a test area is more extensive than contamination in a control area. With A and B the numbers of positive and negative analyses in the test area, C and D the corresponding numbers in the control area, and N the total number of analyses, the probability of the observed 2 x 2 table with fixed margin totals is

P0 = [(A+B)! (C+D)! (A+C)! (B+D)!] / [N! A! B! C! D!]

Here is how the test is applied: If A or C is not 0, decrease the smaller value by 1 and change the other three values so that the margin totals remain constant, that is, so that the numerator of the equation remains constant. Calculate a new probability, P1, using the new values of A, B, C, and D in the denominator (N will not change). Continue to decrease the smaller value by 1 until A or C is 0. Add the results, P0 + P1 + . . . , to calculate the exact probability of the null hypothesis, namely, that contamination in the test area is no more extensive than that in the control area.
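The one-tailed procedure just described can be sketched in Python. The implementation follows the boxed description, using binomial coefficients rather than raw factorials for numerical convenience; the function names are ours:

```python
from math import comb

def table_prob(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margin totals."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_one_tailed(a, b, c, d):
    """P0 + P1 + ...: sum table probabilities, decreasing the smaller of
    a and c by 1 at each step (margin totals held constant) until it is 0."""
    total = table_prob(a, b, c, d)
    while min(a, c) > 0:
        if a <= c:
            a, b, c, d = a - 1, b + 1, c + 1, d - 1
        else:
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
        total += table_prob(a, b, c, d)
    return total
```

For example, 3 detections in 4 test-area samples against 1 detection in 4 control samples gives a one-tailed probability of 17/70, about 0.24, far too large to call the difference significant.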

TABLE 1

Shallow groundwater contamination at Love Canal

[The data columns of this table (percentage detected in the canal, declaration, and control areas, and the significance of the differences) were not recoverable. The analytes listed included 2,4-dichlorophenol; 2,4,6-trichlorophenol; 1,4-dichlorobenzene; 1,2-dichlorobenzene; 1,2,4-trichlorobenzene; 1,2,3,4-tetrachlorobenzene; acenaphthalene; chloroethylenes; chlorotoluenes; chlorobenzene; chromium; and lead (some entries not fully recoverable).]

a Significant differences observed in extent of contamination.
b Percentage detected is the number of analyses showing trace or quantifiable amounts of a contaminant divided by the total number of analyses.
c The canal area is compared with the declaration area, and the declaration area is compared with the control area based on the one-tailed difference-of-proportions test, used to determine statistically whether values of a parameter in one domain will be greater than the values of the same parameter in another domain. This test is based on Fisher's exact test.
d "Yes" means that the level of contaminants in one area is significantly greater than the level in another area; "no" means that there is no significant difference.
e α = 0.104, where α is the probability of being wrong. When α = 0.104 there is a 10.4% probability of a false positive or negative. For the difference-of-proportions test described above, α = 0.10.


FIGURE 2

Relationships between sample observations

[Semivariogram: the difference between the values of pairs of samples is plotted against the distance between pairs of sample points; the curve rises with distance and levels off where sample values become independent.]

The degree of contamination was defined as the median concentration of all sample measurements for a chemical contaminant in the area of interest. A difference of medians test, again using Fisher's exact test to compute probability values, was used to determine whether statistically significant differences in the degree of chemical contamination levels existed among the three areas. Other statistical procedures used to summarize the data consisted of grouping the data into frequency distributions in which intervals were defined according to concentration levels, computing various percentiles of interest, reporting finite (quantified) minimum and maximum observed concentrations, and computing the mean (arithmetic average) value of the observed finite concentrations. The statistical analyses were consistent with the descriptive and graphical interpretations of the data. They indicated substantial contamination in the canal area; there was no evidence of such contamination in the declaration area. There was concern about the adequacy of the power of the statistical analyses to detect contamination differences of possible interest between the declaration and the control areas, given the limited number of samples from the control area (13). For this reason, the statistical analyses were not a pivotal factor in the determination of the habitability of the area, but they were helpful in confirming at least an upper limit to possible differences in contamination levels. The approach of comparing contaminant levels in contaminated and control areas, as attempted in the Love Canal study, is now used frequently. Even with small amounts of data, statistical interpretations can help to point out possible problem areas, although such interpretations are seldom completely reliable. Moreover, practical considerations often limit the number of samples.
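A difference-of-medians test of this kind can be carried out by dichotomizing each measurement at the pooled median and applying Fisher's exact test to the resulting 2 x 2 counts. The sketch below is one common formulation (the classic median test); the report does not spell out its exact variant, and the function names and data are ours:

```python
from math import comb
from statistics import median

def median_test_table(test_area, control_area):
    """Counts above the pooled median in each area -> 2x2 table (a, b, c, d)."""
    m = median(test_area + control_area)
    a = sum(x > m for x in test_area)      # test-area values above pooled median
    b = len(test_area) - a
    c = sum(x > m for x in control_area)   # control values above pooled median
    d = len(control_area) - c
    return a, b, c, d

def one_tailed_p(a, b, c, d):
    """One-tailed Fisher probability, summing toward tables more extreme in a."""
    def p(a, b, c, d):
        n = a + b + c + d
        return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)
    total = p(a, b, c, d)
    while min(a, c) > 0:
        if a <= c:
            a, b, c, d = a - 1, b + 1, c + 1, d - 1
        else:
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
        total += p(a, b, c, d)
    return total

# Hypothetical concentrations (ppb) in a test area and a control area
table = median_test_table([5, 7, 9, 11], [1, 2, 3, 4])
p_value = one_tailed_p(*table)
```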

Contamination in Dallas
In 1982, EPA conducted a soil monitoring program around two lead smelter sites in Dallas, Tex. The study made extensive use of geostatistics, the application of statistics to geological problems, to determine the location of sampling sites and to interpret the monitoring data (14). Geostatistics recognizes the spatial dependence within a sampling pattern, as in the case of the deposition of materials in an air contaminant plume from a lead smelter (Figure 2). Figure 2 shows that as the distance between two samples increases and their correlation weakens, the difference in their values also increases, as represented by a rising curve. When this difference becomes great enough, the sample values become independent of one another, and the curve becomes a horizontal line. The distance along the X-axis, through which the semivariogram curve rises, represents the range of correlation, or the distance within which samples may be correlated. This range is used to determine the grid design for sampling. A grid spacing of two-thirds of the range of correlation usually ensures that the sampling points are close enough to one another to have correlated values. To sample at closer distances would provide little new information; sampling at greater distances could miss a change in pollutant levels. A technique called kriging interpolates pollutant levels at points between the sampling sites so that the isopleths of pollutant levels can be mapped. The kriging estimate of the pollutant level at any particular point is the weighted average of the values of the nearest neighboring samples. The size of the neighborhood is determined by the range of correlation. By kriging, one can compute the standard errors of estimation for sample values when the range of correlation is used. Error estimates also can be mapped. On the basis of data from preliminary sampling, which indicate the degree of spatial correlation, geostatistics can be used to determine the appropriate spacing between sampling sites. The use of geostatistics also permits easy and objective interpolations of values between data points and makes it possible to estimate errors that could be associated with the interpolations and with the data points themselves. Figure 3 shows one technique for displaying the results from the Dallas study. The interpolated concentration levels and the associated error estimates were used to develop maps that showed which areas could be identified with a specified degree of confidence as being contaminated at levels above or below an action level, that is, the level of contamination that triggers remedial action. The soil-monitoring program in Dallas, which measured only one pollutant, was much less complicated than most monitoring programs. Still, the principles used can be applied to other situations, particularly to the study of contaminant plumes from waste sites, because spatial dependence among samples is probably the rule rather than the exception.

FIGURE 3

Lead contamination in Dallas

[Map of the Dallas study area showing the contour for 250 ppm lead in soil and an 80% confidence band on that contour; detail not recoverable.]
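The preliminary-sampling step, in which an empirical semivariogram is estimated, the range of correlation is read off, and the grid spacing is set to two-thirds of that range, can be sketched as follows. This is an illustrative simplification: the distance binning and the criterion for locating the range are our choices, not the study's:

```python
from itertools import combinations

def empirical_semivariogram(points, values, bin_width):
    """gamma(h) = average of (z_i - z_j)^2 / 2 over all sample pairs whose
    separation distance falls in each bin of width bin_width."""
    sums, counts = {}, {}
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        h = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        b = int(h // bin_width)
        sums[b] = sums.get(b, 0.0) + 0.5 * (values[i] - values[j]) ** 2
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

def grid_spacing(gamma, bin_width, sill_fraction=0.95):
    """Take the range of correlation as the distance where the semivariogram
    first reaches sill_fraction of its maximum, then use two-thirds of it."""
    sill = max(gamma.values())
    for b in sorted(gamma):
        if gamma[b] >= sill_fraction * sill:
            range_of_correlation = (b + 0.5) * bin_width
            return (2.0 / 3.0) * range_of_correlation
```

A kriging step would then weight the neighboring sample values within that range to interpolate between sites; a full kriging solver is beyond this sketch.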

Improving data interpretation
Research is under way to improve the statistical techniques for guiding the design of monitoring programs and for improving the interpretation of monitoring data. Methods are being developed for combining the use of statistics, especially geostatistics, with the application of hydrogeological and other models. Although such research efforts are helpful, the most immediate need is in the application of elementary statistical techniques to operational monitoring programs. A clear understanding of the purposes of the monitoring effort and of the quality of the data required to achieve these purposes is essential to the selection of statistical techniques. Many monitoring programs, however, are undertaken with neither a proper understanding of their purposes nor an appreciation of the need for data quality assurance. Experience has underscored the importance of many characteristics of successful sampling programs. These include the need for preliminary sampling, for adequate numbers of samples from control areas, and for using quality assurance procedures in providing data that will meet specific criteria of acceptability.

Improving statistical procedures
Experience with the application of statistical techniques to monitoring programs has highlighted several areas that need more attention.
The first problem area involves combining monitoring data with factors that are based on professional judgment about the likely subsurface behavior of contaminants. When monitoring data are inadequate in one aspect or another, such factors often help regulators to reach decisions. Guidelines and appropriate models should be developed that will permit assessment of the advantages and limitations of combining such factors with data.
Another problem area involves statistics for small sample sets. In many cases, the number of samples available is limited, and the variance of the results is so large that regulatory decisions cannot be reached, even when professional judgments are made. It would be helpful if methods could be developed to use such limited information with greater confidence. For example, the "bootstrap" technique, which is used to manipulate small sample sets into large numbers of different combinations, might be useful (15).
Statistical analyses of long-term monitoring under varying background conditions also should be examined. Trends in long-term monitoring results can be detected readily when the background data remain constant over time, but problems increase when the background changes because of variability attributable to natural causes.
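The bootstrap idea, resampling a small data set with replacement many times to gauge the variability of a statistic, can be sketched as follows. The percentile interval shown is one common variant; the function name and the example concentrations are ours:

```python
import random
from statistics import mean

def bootstrap_ci(data, n_boot=10_000, alpha=0.10, seed=1):
    """Percentile bootstrap confidence interval for the mean of a small sample:
    resample the data with replacement n_boot times, then take the alpha/2 and
    1 - alpha/2 percentiles of the resampled means."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical lead concentrations (ppm) from seven control-area samples
low, high = bootstrap_ci([48.0, 55.0, 61.0, 42.0, 50.0, 58.0, 46.0])
```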

Acknowledgment
George Flatman, Kenneth Brown, Robert Snelling, and Leslie McMillion of EPA's Environmental Monitoring Systems Laboratory in Las Vegas, Nev., contributed to the development and review of this article. Before publication, this article was reviewed for suitability as an ES&T feature by John M. Hosenfeld, Midwest Research Institute, Kansas City, Mo. 64110; and Hubert A. Scoble, Massachusetts Institute of Technology, Cambridge, Mass. 02139.

References
(1) "Calculation of Precision, Bias and Method Detection Limit for Chemical and Physical Measurements," Quality Assurance Management Staff Guidance Document; Office of Research and Development; EPA: Washington, D.C., 1984.
(2) Loevinger, R.; Berman, M. Nucleonics 1951, 9, 26.
(3) Altshuler, B.; Pasternack, B. Health Phys. 1963, 9, 293-98.
(4) Rosenstein, M.; Goldin, A. S. Health Lab. Sci. 1965, 2, 93-102.
(5) "Upgrading Environmental Radiation Data," EPA 520/1-80-012; Watson, J. E., Ed.; Office of Radiation Programs; EPA: Washington, D.C., 1980.
(6) Natrella, M. G. "Experimental Statistics," NBS Handbook 91; National Bureau of Standards: Washington, D.C., 1966.
(7) "Environmental Applications of Chemometrics"; Breen, J. J.; Robinson, P. E., Eds.; ACS Symposium Series No. 292; American Chemical Society: Washington, D.C., in press.
(8) "Principles of Environmental Analysis," Anal. Chem. 1983, 55, 2210-18.
(9) "TRANS-STAT (Statistics for Environmental Studies)," Reports PNL-SA-11551, PNL-SA-12180; Gilbert, R. O., Ed.; Battelle Pacific Northwest Laboratory: Richland, Wash., 1984.
(10) Code of Federal Regulations, 40 CFR 265, Subpart F, 1983; pp. 506-10.
(11) "Environmental Sampling for Hazardous Wastes"; Schweitzer, G. E.; Santolucito, J. A., Eds.; ACS Symposium Series No. 267; American Chemical Society: Washington, D.C., 1984.
(12) "Environmental Monitoring at Love Canal," EPA 600/4-82-030a; Office of Research and Development; EPA: Washington, D.C., 1982; Vol. 1.
(13) "Habitability of the Love Canal Area," Technical Memorandum; Office of Technology Assessment: Washington, D.C., 1983.
(14) Brown, K. W. et al. "Documentation of EMSL-LV Contribution to the Dallas Lead Study," Report 600/X-83-007; Environmental Monitoring Systems Laboratory; EPA: Las Vegas, Nev., 1983.
(15) Diaconis, P.; Efron, B. Sci. Am. 1983, 248, 116-30.

Glenn E. Schweitzer (l.) recently was appointed director of Soviet and East European Affairs for the National Research Council. Until April 1985, he was director of EPA's Environmental Monitoring Systems Laboratory in Las Vegas, Nev., where he directed a variety of research and development activities, including innovative field sampling programs at a number of locations throughout the country. He also served as director of EPA's Office of Toxic Substances. Schweitzer has an M.S. in mechanical engineering from the California Institute of Technology.

Stuart C. Black (r.) is chief of the Dose Assessment Branch, Nuclear Radiation Assessment Division at EPA's Environmental Monitoring Systems Laboratory. He has worked in the fields of radiation biology, radiation monitoring, and analytical quality assurance for EPA and the U.S. Public Health Service. Most recently, Black has been responsible for the statistical aspects of field monitoring programs, including EPA's monitoring efforts at Love Canal and in Dallas. He has a Ph.D. in biophysics from the University of Rochester.