Data acquisition: Cost-effective methods for obtaining data on water quality
Brian W. Mar, Richard R. Horner, Joanna S. Richey, Richard N. Palmer
University of Washington
Seattle, Wash. 98195

Dennis P. Lettenmaier
U.S. Geological Survey
Reston, Va. 22092

Environmental monitoring is becoming a larger and more important part of environmental
impact assessment, the development and management of natural resources, and the regulation of environmental quality (1). Despite this, until recently little work has been done on the design of cost-effective monitoring methodology. We recently concluded a three-year study, for the Electric Power Research Institute, on aquatic ecosystem monitoring programs in the electric utility industry. The study included an extensive survey of sampling program design concepts and data acquisition practices (2-4). We developed a
formal, systematic procedure based on extensive interviews and consultation with industry experts and information published in the literature. The procedure involves four general tasks, which accomplish the following:
• identification of environmental changes of interest and the effects that would most likely manifest these changes, and formulation of potential changes into conventional scientific hypotheses;
• definition of variables to be sampled, the technique to be used, and any alternative hypotheses that may cause the same type of change, and determination of whether complementary hypotheses provide the same information for decision making at lower cost;
• design of a cost-effective monitoring program to test each hypothesis by considering trade-offs between the improvement in statistical power (that is, the probability of discriminating between the various hypotheses) and the added cost of data acquisition; and
• integration of the individual monitoring programs by multiobjective ranking of all monitoring designs with respect to overall program goals.
The general ideas underlying this procedure have been suggested by Platt (5), Munn (6), Holling (7), and Rago et al. (8), but none of these authors implemented the concepts in a comprehensive operational framework.
In this paper we explore certain aspects of the third task described above in greater detail. Specifically, we focus on three sampling program design issues that are essential to cost-effective hypothesis testing and hence to the successful implementation of environmental monitoring programs. The three issues are determining the best strategy to sample the quantity of interest, determining the statistical basis for the experimental design, and estimating the cost of such observations.
Design concepts
Designs traditionally used for monitoring programs can yield data with high natural and experimental variances. Our examination of the literature describing the collection of watershed monitoring data suggested that typical measurement errors for physical and chemical characteristics are usually well within 25% of observed mean values, whereas measurement errors associated with observations of biological entities are 50% or more of mean observed values (2). Natural variations that result from temporal or spatial factors appear to be much greater than those that arise from collection or analytical errors. Typical values of natural variations range from 100% to 400% of observed mean values for physical, chemical, and biotic characteristics. It may be possible to employ new sampling concepts or to alter sampling programs to reduce these high variances, for example, by properly stratifying samples or by analyzing samples in a composite form.
Stratified samples
Knowledge of the natural variations in space and time of each area of interest can assist the designers of a sampling program. The common practice of allowing equal intervals between sampling should be seriously reevaluated. In most cases, the phenomenon of interest will not vary randomly during the year. In the case of precipitation, there are usually wet and dry periods. With climate, there can be warmer and cooler years. Most biological organisms have periods of rapid growth and periods of dormancy or slower growth. Within a watershed, the kinds of soil and vegetation may not be uniformly distributed. Within water bodies, stratification and patchiness can occur. All of this argues against uniform sampling in space and time.
Large observed standard deviations can be traced to the mixing of data from several different populations. A simple example of this problem is the average weight of a group of 10 animals, five that weigh 10 kg and five that weigh 5 kg. Even if each animal were weighed precisely, the mean weight of an animal in this group would be 7.5 kg with a standard deviation of 2.6 kg. By dividing
the group of animals into the 5-kg and 10-kg subsets, the standard deviation of each group is reduced to zero. Stratifying a sampling program into different time periods or spatial patterns to decrease the variance of the data may, however, increase the cost per sample. Concentration of sampling into short periods can cause overtime workloads some of the time and slack periods in others.
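As a quick check of the arithmetic in this example, the short sketch below (an illustration only, not part of the original study) computes the pooled and per-stratum standard deviations for the hypothetical group of ten animals described above.

```python
import statistics

# Hypothetical group from the text: five 10-kg animals and five 5-kg animals.
weights = [10.0] * 5 + [5.0] * 5

# Pooled statistics: mean 7.5 kg, sample standard deviation about 2.6 kg.
pooled_mean = statistics.mean(weights)
pooled_sd = statistics.stdev(weights)

# Stratified by weight class, each stratum has zero standard deviation.
strata = {"10-kg": weights[:5], "5-kg": weights[5:]}
stratum_sd = {name: statistics.stdev(group) for name, group in strata.items()}

print(f"pooled: mean = {pooled_mean:.1f} kg, sd = {pooled_sd:.2f} kg")
print("per-stratum sd:", stratum_sd)
```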
Composite samples
Another method that can be used to reduce natural variations and experimental error is to increase the time of observation, the volume of samples, or the area sampled. This produces an integrated observation over time or space that eliminates much of the localized variation contained in a point or grab sample. An example of this averaging of information is the measurement of storm runoff quality. If a flow-splitting device is constructed to allow the accumulation of a fixed fraction of the storm runoff, analysis of a composite sample will smooth out many of the short-term
variations in samples collected at discrete intervals. Of course, in addition to smoothing out sampling variability, this procedure also averages the time-varying characteristics of the process. For instance, it is not possible to determine how much of the pollutant load comes from a “first flush” effect associated with the rising limb of the hydrograph and how much is associated with the receding limb. This trade-off between what is often referred to as bias and variability is at the heart of all space-time aggregation methods.

Fundamentals of design
A recent article on monitoring statistics reviewed statistical concepts and discussed their application to groundwater and soil studies (9). The classic analysis of data presented in any introductory text on probability and statistics can be used to estimate the added confidence in statistical estimates that results from taking a large number of samples (10). In most cases the data are assumed to be normally and independently distributed with known means and standard deviations, or it is assumed that the data can be transformed to a normal distribution. Care needs to be exercised to avoid spatial and temporal correlation when many samples are taken within a short time or from sampling stations that are close together. In cases in which correlated data are unavoidable, more sophisticated statistical techniques are required (11, 12).
One major consideration in the design of any method of data collection is to distinguish real information from measurement errors and natural variations. The fundamental approach to this problem is replication and averaging. By taking enough replicate samples where only one effect is allowed to vary and by averaging over replicates, the error due to that source of variation can in theory be reduced to any desired level. This approach, called the analysis of variance, or ANOVA, is described in most basic statistical texts. It is essentially a generalization of the basic theory for determining the standard error of the mean. One major difficulty in both approaches is in estimating the variance of the different sources of variation that will enter into any design decisions.
If experimental errors are negligible, then no replicate observations would be made at the same place or the same time. If sampling errors are believed to be large but analytical errors are known to be small, then replicate samples, but not replicate analyses, would be needed for the same sample. If analytical errors
are suspected to be large, but the sampling errors are known to be small, then aliquot parts of a single sample could provide replicates to define the analytical errors. The precision-of-the-mean sampling criterion and the past performance of field and laboratory crews should provide an adequate basis for these estimates.
Next, we will discuss the problem of estimating the mean of a normally distributed population, and we will examine the detection of a change in the mean of a population. The first problem is typically the underlying goal of a reconnaissance or baseline study; the other is in the realm of a traditional environmental monitoring or environmental impact assessment program.

Estimation of means
Reconnaissance and baseline studies attempt to define current conditions. Baseline studies tend to provide much more quantitative information, whereas reconnaissance studies are generally more descriptive. The results of such efforts are estimates of the means of the populations or characteristics selected for study. For independent, normally distributed samples (and, in fact, regardless of the distribution) the precision of the estimate of the mean improves with the square root of the number of observations.
This square root dependence holds true, at least approximately, for most statistical variables that have the same dimensions as the data (for example, for the estimated standard deviation) as long as the data are independently distributed. Of greater importance is that similar although more complicated sample size dependencies can be shown when the sources of variation (that is, the variance components) are isolated. If observation costs are equal, the strategy should be to allocate more observations to periods or zones in which the estimated natural variation and measurement errors are the largest. If there are differences in cost, then the lowest cost and highest variation or highest error observations should be the most frequently measured. The key to such an allocation is the ability to recognize and estimate the natural variances and experimental errors of the desired observations.
Our research and surveys of environmental monitoring experts suggest that estimates of natural variation and of experimental errors associated with the observation of most phenomena are sufficient to design cost-effective monitoring programs (12, 13). Program design usually can be based on this prior information, rather than on an effort to
reestablish these estimates of experimental errors. This is especially true if reasonable estimates of the ratios of the error variances can be obtained, because it is the ratios that govern sample allocation decisions and not the variances themselves. We have found that designs for data collection programs are in many cases sufficiently insensitive to changes in input that the use of expert knowledge, augmented by limited reconnaissance data, forms a basis for estimating the statistical parameters on which the designs are based. Information obtained during data collection can be used to verify the initial expert judgment, and any differences can be evaluated.
The t-distribution defines the confidence interval for a population mean estimated from a small sample. Given the assumption that the population of interest is normally distributed, and knowing the mean and standard deviation of this population of interest, the t-distribution can be used to define the number of samples that is needed to estimate a mean for the desired level of significance. Figure 1 presents results of such an analysis for the 80%, 90%, and 95% confidence levels. The curves show the number of samples required as a function of precision (the ratio of the estimated mean less the real mean to the standard deviation). Precision is a dimensionless indicator of the acceptable amount of error (the difference between the real and estimated mean) divided by the standard deviation. These results suggest that even for an 80% confidence level, four samples will still be required to estimate the mean to a precision of unity. When the precision must be as small as 0.1, hundreds of samples are required to obtain a good estimate of the mean. A program can be written on a personal computer to perform the trial-and-error calculations needed to generate the data shown in Figure 1. Programs that perform the sample generation and the t-distribution analyses are available from the authors.
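The authors' programs are not reproduced here; the following is only a minimal sketch of the kind of trial-and-error calculation behind Figure 1. It assumes the SciPy t distribution and the definition of precision given above (acceptable error divided by the standard deviation).

```python
from scipy import stats

def samples_for_precision(precision: float, confidence: float) -> int:
    """Smallest n such that the confidence-interval half-width for the mean,
    in standard-deviation units (t / sqrt(n)), is no larger than the precision."""
    alpha = 1.0 - confidence
    n = 2  # at least two samples are needed to estimate a standard deviation
    while True:
        t = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)  # two-sided critical value
        if t / n ** 0.5 <= precision:
            return n
        n += 1

# Roughly reproduces the shape of the curves in Figure 1.
for conf in (0.80, 0.90, 0.95):
    print(conf, [samples_for_precision(p, conf) for p in (1.0, 0.5, 0.1)])
```

With these assumptions the 80% curve requires four samples at a precision of unity and several hundred samples at a precision of 0.1, consistent with the figures quoted in the text.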
Measurement of change
Monitoring programs that are designed to detect change require statistics different from those required by programs that simply identify population means. Figure 2 illustrates the variables used for testing whether change has occurred. In this example the test determines the probability of a sample observation belonging to the distribution associated with conditions before a human action occurred or to a distribution associated with altered conditions. Most standard texts on probability and statistics present this example as
the test of a null hypothesis (in which “before” and “after” observations are from the same distribution), H(1), against an alternative hypothesis (in which “before” observations belong to a distribution different from the “after” observations), H(2). The shaded area in Figure 2a represents a Type I error, in which H(1) is rejected when H(1) is true. Figure 2b shows both distributions (original and changed). The shaded area in Figure 2b represents the probability of a Type II error, in which H(1) is accepted when in fact H(2) is true. The power of a test is defined, as in statistics texts, as 1 - β, where β is the probability of a Type II error. Power thus represents the probability of rejecting H(1) when in fact it is false. It is obviously desirable for any given test to have the quantity 1 - β be as large as possible. Figure 2c illustrates the effect of the
value of the standard deviation on the power of any given test. For a given change, δ, the power increases as the standard deviation decreases. Figure 3 illustrates, for three levels of power, the relationship between the number of samples needed and the ratio of the change to be detected (μ1 - μ2) to the standard deviation. When the change is small relative to the uncertainty, the number of samples required can escalate substantially if a relatively high power of the test is desired or needed.
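As a hedged illustration (not the calculation used to produce Figure 3), the familiar normal-approximation formula relating the detectable change, the significance level, and the power can be sketched as follows; the one-sample form and the default values are assumptions made here.

```python
import math
from scipy import stats

def samples_to_detect_change(delta_over_sigma: float,
                             power: float = 0.80,
                             alpha: float = 0.05) -> int:
    """Normal-approximation sample size for detecting a shift in the mean
    of delta_over_sigma standard deviations with a two-sided test."""
    z_alpha = stats.norm.ppf(1.0 - alpha / 2.0)  # Type I error criterion
    z_beta = stats.norm.ppf(power)               # set by the desired power
    n = ((z_alpha + z_beta) / delta_over_sigma) ** 2
    return math.ceil(n)

# Sample counts grow rapidly as the change shrinks relative to the noise.
for ratio in (2.0, 1.0, 0.5, 0.25):
    print(ratio, samples_to_detect_change(ratio, power=0.90))
```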
Data acquisition costs
Defining a statistic to measure the value of added information is only the first step in the evaluation of a cost-effective data acquisition program. The other essential information is the cost of data collection. Given both the value and the cost of added data, a trade-off can be made to identify cost-effective
program designs. (A design will specify where, when, and how often to sample and how many replicates are needed to test each hypothesis.)
There are two varieties of cost data that may be useful in determining the trade-off between added sampling costs and added information. It is useful to have cost-per-sample information for making allocations of money among variables to be sampled. It is also helpful to have information about the cost of sampling a given variable. For example, the cost of sending a crew to the field for each sampling occasion, the cost of moving the crew to a sampling station to collect samples, or the cost of collecting and analyzing a sample are different components. Because the number of replicates, the number of stations, and the number of sampling occasions can be sampling design parameters, the separate costs of these components are needed to optimize sampling programs. We have used a cost survey to acquire such information (2, 14), and we have proposed a model that separates the costs associated with obtaining replicate data, data at another station, and data on another occasion (2). Such an equation suggests that if there is a fixed number of total samples, then the number of replicates, stations, and occasions can be varied as long as their products remain the same. If the measurement error is larger than the natural variation, then using more replicates would be a better strategy than using more stations or occasions. If spatial variation is the greatest, then more stations would be needed.
We discovered in analyzing our cost survey data that ratios of the various cost components are more uniform from respondent to respondent than are the absolute values (2). Table 1 shows typical ratios of cost for water quality and abundance measures. Reported water quality monitoring costs typically were $1000 for overhead, $200 per occasion, $10 per station, and $10 per replicate. All costs except those for replicates had standard deviations that were less than 100% of the reported mean values. The replicate costs were highly variable, depending on the analyses performed. Routine field measurements were relatively inexpensive; complex analyses for trace organics cost much more. Automatic and continuous sampling of water quality introduced a problem in the definition of a sample that needs to be resolved, but the analytical approach can still be used.
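The cost model itself is given in reference (2); the sketch below is only one plausible reading of the decomposition described above, with overhead, per-occasion, per-station, and per-replicate components, using the typical survey values quoted in the text.

```python
def sampling_cost(occasions: int, stations: int, replicates: int,
                  overhead: float = 1000.0, per_occasion: float = 200.0,
                  per_station: float = 10.0, per_replicate: float = 10.0) -> float:
    """Total program cost: overhead, plus a cost for each sampling occasion,
    each station visited on an occasion, and each replicate taken at a station."""
    return (overhead
            + per_occasion * occasions
            + per_station * occasions * stations
            + per_replicate * occasions * stations * replicates)

# Designs with the same total number of samples can differ widely in cost:
# 4 occasions x 5 stations x 2 replicates and 2 x 5 x 4 both give 40 samples.
print(sampling_cost(4, 5, 2))   # occasion-heavy design
print(sampling_cost(2, 5, 4))   # replicate-heavy design, cheaper with these values
```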
Cost-effective designs
Little attention has been given to optimizing environmental sampling program design to maximize the power of a test given a budget constraint or to minimize the cost of a sampling program to achieve a given power. Bernstein and Zalinski were among the earliest researchers to address these problems (15). We have extended these concepts to situations in which the anticipated change is an immediate change to a new level (a step change), a steady change at a constant rate over time (a ramp change), or a combination of step and ramp changes. These analyses offer the option of assuming independent observations or correlated residual errors.
Table 2 demonstrates the importance of considering cost in the design of a baseline sampling program. It is worth noting that collecting equal numbers of samples improves the estimate of the mean for each variable, but that variable B's confidence interval is larger than the others. By allocating more samples to the variable with the least precision, and by allocating more samples to the variable for which the greatest reduction per dollar allocated is achieved, a more cost-effective design can be devised. In most cases this simple analysis is insufficient because uncertainties can be traced to measurement errors as well as spatial or temporal variations. Table
3 shows the results of an optimization of a design to detect change using the following data:
overhead cost = $1000
cost per occasion = $200
cost per station = $10
cost per replicate = $10
change to be detected = 2.5 (250% of mean)
level of significance = 0.05
measurement variance = 0.25
natural variance = 1.00
[Table 3 (not reproduced): sensitivity analysis of the sampling design, giving the computed numbers of seasons and occasions for conditions such as δ = 1.0, δ = 0.5, and natural variation of 0.25 with seasonal variation of 4.0.]
Notice the sensitivity of the computed design to the input values. These examples are illustrations of the trade-offs that could be made in the design of environmental monitoring programs. The simple models can suggest where trade-offs are needed; the detailed models can provide specific design configurations. The bases for such trade-offs are reliable cost information and an understanding of the statistical nature of the data.
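A rough sketch of how such an optimization might be set up is shown below. The two-component variance model and the normal power approximation are simplifications assumed here for illustration (they are not the detailed models of references (2) or (15)); the costs, change, significance level, and variance components are those listed above.

```python
import math
from itertools import product
from scipy import stats

def power_of_design(occasions, stations, replicates,
                    delta=2.5, alpha=0.05,
                    measurement_var=0.25, natural_var=1.0):
    """Approximate power to detect a step change of size delta, using a
    simplified two-component variance model: natural variation averages over
    occasions and stations, measurement error also averages over replicates."""
    n_units = occasions * stations
    se = math.sqrt(natural_var / n_units
                   + measurement_var / (n_units * replicates))
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)
    return stats.norm.cdf(delta / se - z_crit)

def cheapest_design(target_power=0.90, overhead=1000.0, per_occasion=200.0,
                    per_station=10.0, per_replicate=10.0, search_limit=20):
    """Brute-force search for the least-cost design meeting the power target."""
    best = None
    for occ, sta, rep in product(range(1, search_limit + 1), repeat=3):
        if power_of_design(occ, sta, rep) < target_power:
            continue
        cost = (overhead + per_occasion * occ
                + per_station * occ * sta + per_replicate * occ * sta * rep)
        if best is None or cost < best[0]:
            best = (cost, occ, sta, rep)
    return best

print(cheapest_design())  # (cost, occasions, stations, replicates)
```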
We have only briefly presented our approach to the cost-effective design of monitoring programs. Only the third of four basic steps has been discussed. The final and most difficult step is to decide how to allocate the available resources among the optimized set of experiments. Environmental monitoring programs will encounter a multitude of formal and unwritten agendas. Not all goals will be formally defined, and not all criteria can be articulated. The inability to make multiple criteria decisions can be a major pitfall in environmental monitoring program management. We suggest that the pairwise comparison ranking process of Saaty be used for this allocation, but this is an issue that requires more study (16).
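Saaty's procedure is developed in reference (16); as a minimal sketch, priority weights for competing monitoring programs can be derived from a reciprocal pairwise comparison matrix, here by the common geometric-mean approximation to the principal eigenvector. The three programs and the comparison values are hypothetical.

```python
import math

# Hypothetical pairwise comparisons among three candidate monitoring programs:
# entry [i][j] says how strongly program i is preferred to program j (1-9 scale).
comparisons = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
]

# Geometric mean of each row, normalized, approximates the priority weights.
row_gm = [math.prod(row) ** (1.0 / len(row)) for row in comparisons]
total = sum(row_gm)
weights = [g / total for g in row_gm]

print([round(w, 3) for w in weights])   # roughly 0.65, 0.23, 0.12
```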
References
(2) Mar, B. W. et al. "Sampling Design for Aquatic Ecological Monitoring," Final Report on Electric Power Research Institute Project RP1729-1; Department of Civil Engineering, University of Washington: Seattle, 1985; Vols. 1-5.
(3) Horner, R. R.; Richey, J. S.; Thomas, G. L. "A Conceptual Framework to Guide Aquatic Monitoring Program Design for Thermal Electric Power Plants"; Special Technical Publications Series; American Society for Testing and Materials: Philadelphia, in press.
(4) Horner, R. R.; Richey, J. S.; Mar, B. W. In Proceedings of the Pacific Section of the American Association for the Advancement of Science Symposium on Biomonitors, Bioindicators, and Bioassays of Environmental Quality; American Association for the Advancement of Science: Washington, D.C., in press.
(5) Platt, J. R. Science 1964, 146, 347-53.
(6) Environmental Impact Assessment: Principles and Procedures; Munn, R. E., Ed.; SCOPE Workshop on Impact Studies in the Environment (WISE), United Nations Environment Program (UNEP), Environment Canada, and UNESCO; SCOPE Report No. 5; Scientific Committee on Problems of the Environment: Toronto, Ont., 1975.
(7) "Adaptive Environmental Assessment and Management," Wiley International Series on Applied Systems Analysis, No. 3; International Institute for Applied Systems Analysis; Wiley: New York, 1978.
(8) Rago, P. J.; Fritz, E. S.; Murarka, I. P. Environ. Monit. Assess. 1983, 3, 185-201.
(9) Schweitzer, G. E.; Black, S. C. Environ. Sci. Technol. 1985, 19, 1026-30.
(10) Devore, J. L. Probability and Statistics for Engineering and the Sciences; Brooks/Cole Publishing: Monterey, Calif., 1982.
(11) Millard, S. P.; Yearsley, J. R.; Lettenmaier, D. P. Can. J. Fish. Aquat. Sci. 1985, 42, 1391-1400.
(12) Millard, S. P.; Lettenmaier, D. P. Est. Coast. Mar. Sci., in press.
(13) Palmer, R. N.; MacKenzie, M. C. J. Water Res. Plan. Manage. 1985, 111, 478-93.
(14) Mar, B. W.; Miner, W. "Ecological Survey and Monitoring Program: Data Summary"; East-West Center, East-West Environment and Policy Institute: Honolulu.
(15) Bernstein, B. B.; Zalinski, J. J. Environ. Manage. 1983, 16, 35-43.
(16) Saaty, T. L. J. Math. Psychol. 1977, 15, 234-81.
Brian W. Mar holds a Ph.D. from the University of Washington in chemical and nuclear engineering, has master's degrees in civil and chemical engineering, and has done graduate study in economics. He has been associate dean of the university's College of Engineering and director of the environmental engineering and science program. His research and teaching for the past 15 years have focused on system engineering and environmental management.

Richard R. Horner (l.) holds a Ph.D. from the University of Washington in environmental engineering and science. He also has an M.S. and a B.S. in mechanical engineering from the University of Pennsylvania. He is a research assistant professor and has been a principal investigator on many projects concerning nonpoint source monitoring, control, and management.

Joanna S. Richey (r.) has a Ph.D. in environmental engineering and science from the University of Washington and an M.S. in ecology from Cornell University. She is an independent consultant in environmental management and applied ecology. Her research has been in applied ecology of aquatic systems and in the development of inputs for ecological expert systems.

Richard N. Palmer (l.) has a Ph.D. from Johns Hopkins University, an M.S. from Stanford University, and a B.S. from Lamar University. His teaching and research are on water resource management and systems engineering. His recent publications focus on optimization techniques and expert systems. He is a registered professional engineer.

Dennis P. Lettenmaier (r.), currently on leave to the U.S. Geological Survey, has a Ph.D. in civil engineering and holds a B.S. from the University of Washington and an M.S. from George Washington University. He is a research professor at the University of Washington, focusing on stochastic and urban hydrology and on statistical design of environmental monitoring networks. He is a registered professional engineer.