
Critical Review

Assessing the Concentrations of Polar Organic Microcontaminants from Point Sources in the Aquatic Environment: Measure or Model?

ANDREW C. JOHNSON,*,† THOMAS TERNES,‡ RICHARD J. WILLIAMS,† AND JOHN P. SUMPTER§

Centre for Ecology and Hydrology, Wallingford, Oxfordshire, OX10 8BB, U.K., Federal Institute of Hydrology, BFG, Mainzer Tor 1, D-56068 Koblenz, Germany, Institute for the Environment, Brunel University, Uxbridge, Middlesex UB8 3PH, U.K.

Received August 8, 2007. Revised manuscript received April 30, 2008. Accepted May 7, 2008.

To carry out meaningful ecotoxicity studies on novel polar organic microcontaminants, it is essential to know what concentrations wildlife may be exposed to. Traditionally these values were obtained by analytical chemistry, but in recent years GIS water quality models have been developed which may offer a quick and reliable way of getting the same information. Thus, two ways of obtaining basically the same information now exist, and an issue, therefore, arises as to which method is the most appropriate to use in which situation. To address this issue we have critically reviewed and compared measuring and modeling approaches for the determination of sewage effluent and river water concentrations of organic microcontaminants. Where model predictions and chemical measurements can be directly compared in sewage effluents, receiving waters, and across catchments, reported model mean values have all been within 1 order of magnitude of the measured values, with typically no more than a 3- or 4-fold difference. Interlaboratory chemical analysis of some organic microcontaminants in effluents in the challenging ng/L range has provided results which have varied from one another by a similar margin. No such comparison has been carried out yet for GIS water quality models to determine variation in predicted concentrations. As the level of ecotoxicological effects of many chemicals is often considerably higher than the reported measured or modeled values, such errors as might occur will often be of no consequence. But due to their extraordinary potency, much more accuracy is required with some natural and synthetic hormones. Significantly, modeling is no more complex to conduct when dealing with contaminants at ng/L compared with mg/L concentrations, but the same cannot be said for chemical analysis. A combination of modeling and measuring techniques will give the greatest confidence in risk assessment.

* Corresponding author phone: +44 1491 692367; fax +44 1491 692424; email: [email protected]. † Centre for Ecology and Hydrology. ‡ Federal Institute of Hydrology. § Brunel University.

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 42, NO. 15, 2008

Introduction

Contamination of the freshwater aquatic environment by chemicals remains a potent issue both for professionals concerned about protecting our environment and also for members of the public for whom freshwaters are a source of food, drinking water, and recreation. Very few risk assessments of organic microcontaminants (present at trace, ng/L levels), such as pharmaceuticals, have been carried out, or exist in the public domain. To assess the ecotoxicological impact of a chemical it is essential to know the range of concentrations likely to be found in the aquatic environment. For scientists concerned with protecting wildlife the lack of such environmental data on chemicals is a problem. Recently, the development of geographic referenced (GIS) water quality models has opened up the question of whether such models, or traditional chemical measurements alone, will provide the most helpful data on environmental concentrations which are needed for meaningful ecotoxicity studies. This critical review will attempt to evaluate whether one approach provides more accurate and useful information than the other, and if so, under what circumstances. Traditional risk assessment methods for chemicals, with their focus on acute toxicity to a limited number of species, have not been able to completely predict, or protect aquatic wildlife from all eventualities. Examples include intersex in fish and water snails that were related to endocrine disrupting chemicals and the widespread loss of fish due to aluminum poisoning in acidic streams (1–3). There are now issues of concern with other substances and chemicals present at trace levels which may have subtle but harmful effects on wildlife such as some brominated flame retardants, pharmaceuticals and nanoparticles (4–6). In these circumstances, not only do scientists and regulators need to examine and understand how wildlife may be harmed by the chemicals, but also to assess the extent of the problem.
It is in assessing the overall risk to the aquatic environment that the concentration of the chemical becomes the key question, specifically: (a) Is the chemical present in the aquatic environment? (b) Where and when will the concentration of the chemical be such that it might harm aquatic organisms? Clearly, only reliable measurements can unequivocally answer question (a). However, guidance from a hydrologist on when and where to measure would increase the chances of detecting the chemical, as would knowledge of the elimination efficiency of different sewage treatment plants in the case of point sources. Answering question (b) is a point where the approaches of scientists often diverge, with some choosing to rely only on a measured environmental concentration (MEC) and others on a predicted environmental concentration (PEC) following a modeling exercise. The chemical believed to be causing a problem may have a diffuse or a point source, or both, and these different sources require very different approaches whether measuring or modeling studies are undertaken. This review will focus on comparing the measuring with the modeling approach for chemicals which have a point source. It will examine how modeling, or measuring, activities are undertaken, their strengths and weaknesses, their track records, and recommendations on how the two techniques may be combined.

10.1021/es703091r CCC: $40.75 © 2008 American Chemical Society. Published on Web 07/02/2008.

A Specific Example of Where the Concentration of a Contaminant in Water Could Be Critical. The question of whether measuring or modeling is the best initial approach for a “new” organic microcontaminant thought to be present in the aquatic environment is exemplified by hormonally active organic microcontaminants. The case of the contraceptive ethinylestradiol (EE2) is well-known, but what about the synthetic progestogens, several of which (e.g., levonorgestrel, desogestrel, gestodene) are currently in use? All bind to the human progesterone receptor (PR), and their ability to bind to the fish PR, which is known to have a similar specificity, has also been shown (7, 8). Endogenous progesterones play pivotal roles in reproduction of both male and female fish. Progesterones also function as pheromones in many species of fish. It is thus possible, perhaps even probable, that synthetic progestogens will affect reproduction in fish at low, or sub-ng/L concentrations, just as EE2 does (9, 10), and that these concentrations may be present in the environment. However, even if synthetic progestogens can disrupt reproduction in fish, to do so they will need to be present at, or above, a certain threshold concentration.
What this concentration is, and how close it is to existing environmental (river water) concentrations, are unknowns. Yet if meaningful ecotoxicology with synthetic progestogens, or other potentially potent organic microcontaminants, is to be conducted, it needs to be done at concentrations close to those present in the aquatic environment, and not at utterly unrealistic (high) concentrations. Hence the need to determine the concentrations, which in turn, raises the issue of whether measuring, or modeling, would be the best approach.

The Modeling Approach: A Short History of the Development and Use of Models in Predicting Chemical Concentrations in Water from Point Sources

Models have played, and continue to play, an important role in predicting the fate of existing and novel chemicals in the environment based on their unique physicochemical characteristics. Such “multimedia” models predict the fate of chemicals in a generic environment which includes air, soil, water, and sediment (11), and this approach is now used in chemical risk assessment, such as EUSES in the EU (12). Here they assist in the first tiers of a risk assessment, typically setting the discharge at 200 L/day/capita with an in-stream dilution of 10 (2 m3/day/capita) and comparing these concentrations with a predicted no-effect concentration (PNEC) (13). The greatest value of such models is in assessing the fate of chemicals with respect to one another. They can also help by predicting what proportion of the amount consumed may partition into what compartment. However, they are much less suitable for predicting water concentrations for real world situations, such as the concentration of the chemical that will be found in the water downstream of Oxford, Düsseldorf, Milan, or Las Vegas (14, 15). This is because they do not capture site specific information such as the contributing human population sizes, their locations in a catchment, and available dilutions, which can differ by several orders of magnitude. It is to address this need to predict concentrations in the real world, with its often dramatic spatial and temporal variety, that GIS water quality models have been developed. For example, in the West of England the Tamar catchment offers 140 m3/day natural flow to dilute each local person’s waste products, whereas the Aire and Calder catchment in the North East of England offers only around 1.6 m3/day, a difference of 2 orders of magnitude (based on mean annual flows (16)). Similarly, in North America, the Schuylkill catchment offers only 4 m3/day per capita dilution, whereas the Columbia catchment provides over 100 m3/day per capita (17). Of course, this dilution is not evenly distributed across these catchments, but it does give an impression of the importance geography/hydrology will have on the risk of exposure to a chemical in water.

Geographic Referenced Point Source Water Quality Models. The PhATE (17), GREAT-ER (18), and LF2000-WQX (16, 19) models have been developed relatively recently to predict concentrations of “down the drain chemicals” in catchments, starting from the moment they leave the home until they reach the tidal limit, or until the end of the catchment as defined by the model. They require an individual loading input per capita for the chemical of interest. This may include either an assessment of consumption of a product, or pharmaceutical (17, 20), but often also an assessment of human excretion, which can vary between different members of the population (21). They can incorporate sewer transformation as well as removal associated with different types of sewage treatment plant (STP), and attenuation in the river (22). They use a digitized river network incorporating the STP discharge points. The often complex nature of the river network, including confluences, bifurcations, impoundments, and abstraction points must all be accommodated. A critical factor in these models is how they account for flow.
Clearly, the natural flow and velocity in a river varies with the season, but even within a season a great variation can occur. PhATE calculates PEC values through the catchment at two reference flows: mean flow and the 7 day 10 year minimum flow (a low flow statistic). This variability is managed in the other two models through a statistical probability approach. A Monte-Carlo simulation is used to generate different scenarios from distributions describing the river and effluent flow rates, and the final model results (PECs) are expressed as distributions. Thus, many decades of flow, or rainfall, data are collected and different values assigned a different probability. For example, a Q95 flow refers to an extremely low flow exceeded 95% of the time, which could be used to predict a worst case (i.e., highest) concentration in a reach or catchment. The latest development is to model multiple, similarly acting, chemicals in a catchment and then convert these concentrations to a summated biological effect across the different parts of a catchment (23)
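As a rough illustration of the statistical approach described above, the following Python sketch draws river flows from a distribution and expresses the resulting PEC as a distribution rather than a single value. It is a minimal sketch under stated assumptions: the function names, the lognormal flow model, and every numerical input (population, per capita load, removal, flow parameters) are hypothetical and are not taken from GREAT-ER, LF2000-WQX, or PhATE. In-river attenuation and the effluent's own volume are ignored.

```python
import math
import random


def simulate_pec(population, load_ng_per_capita_day, removal_fraction,
                 median_flow_m3_day, flow_sigma, n_runs=10000, seed=1):
    """Return an ascending list of simulated PECs (ng/L) below one point source.

    River flow is assumed to be lognormally distributed (illustrative choice).
    """
    rng = random.Random(seed)
    # Mass of chemical leaving the STP each day (ng/day).
    discharged_ng_day = (population * load_ng_per_capita_day
                         * (1.0 - removal_fraction))
    mu = math.log(median_flow_m3_day)
    pecs = []
    for _ in range(n_runs):
        flow_m3_day = rng.lognormvariate(mu, flow_sigma)
        pecs.append(discharged_ng_day / (flow_m3_day * 1000.0))  # 1 m3 = 1000 L
    pecs.sort()
    return pecs


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, with p in [0, 100]."""
    rank = math.ceil(p / 100.0 * len(sorted_values))
    index = min(len(sorted_values) - 1, max(0, rank - 1))
    return sorted_values[index]


# Hypothetical scenario: 10,000 people, 1000 ng/capita/day, 50% STP removal,
# median river flow of 50,000 m3/day.
pecs = simulate_pec(10000, 1000.0, 0.5, 50000.0, 0.8)
print("median PEC (ng/L):", percentile(pecs, 50))
print("95th percentile PEC (ng/L):", percentile(pecs, 95))
```

Because concentration falls as flow rises, the upper percentiles of the PEC distribution correspond to low-flow conditions such as Q95, which is why they serve as the worst-case estimate.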

Major Complicating Factors Which Hinder the Accuracy of Point Source GIS Water Quality Models

Assessing the Consumption and Discharge of a Chemical. The point source delivery part of a model starts with assumptions on the amount of a chemical, or drug, consumed (with natural hormones it is simply what is excreted). Unfortunately, consumption values are often hard to obtain, and using the consumption value of one country and assuming it is the same in another country would be inappropriate. For example, it appears that Germany consumes almost double the quantity per capita of diclofenac (an anti-inflammatory drug) that Britain does (16), whereas it is hardly used at all in Canada (24). Even where the annual drug consumption in a country is known, there can be further prediction problems. Drug consumption for some compounds, such as anti-inflammatory drugs, can be considerably higher for the 75 and older age group than for the rest of the population (25), thus towns with a high proportion of elderly people may discharge more of these compounds than might have been expected. Consumption of personal care products can fluctuate with the season, such as the use of insect repellents and sun creams (26). Some chemicals have both a domestic (per capita) origin as well as an industrial one, thus, the model could underestimate the input if it considered only the human source, such as with EDTA (20, 27). Similarly, some pharmaceuticals, such as antibiotics, will also have an agricultural (diffuse) source. With pharmaceuticals and hormones it is necessary to review the medical literature to assess what proportion of free parent and conjugated parent are excreted. It is considered that the proportion of free parent, and parent conjugated with the glucuronide molecule, are most relevant as they are likely to remain, or become active in the environment (21). Unfortunately with some pharmaceuticals it is difficult to find information on the form in which they are excreted, or released. Nevertheless, as interest in this area of environmental research grows, so more and more information on consumption is becoming publicly available to allow conservative model predictions to begin.

Assessing Removal Rates in the Environment. Assessing removal rates for organic microcontaminants in sewage treatment (meaning here removal from the water phase) can be surprisingly difficult. For example, diclofenac removal in sewage treatment has been reported as varying from 0 to 75% (16), and for estrone in sewage from 0 to 98% (21). These wide variations in reported removal rates could be due to natural variations in sewage treatment performance, or practical problems (discussed later), or to analytical reasons.
Some types of sewage treatment, such as trickling (biological) filters, are frequently less efficient at removing organic microcontaminants than activated sludge plants, which are in turn less efficient than plants with tertiary treatment (nitrification, denitrification) (21, 28–30). Does the modeler know what type of sewage treatment plant he, or she, is dealing with? The removal rate for sewage treatment may be considerably less in winter, particularly in cold climates, than in summer (31), and even one STP can, from day to day, have very different removal performances for the same chemical (32). Once the chemical has entered the receiving water, its biodegradation rate can also vary with location and season (33, 34). But the model can be set up with a wide probability distribution to accommodate such uncertainty. Also, one of the advantages of models is that they can be run with different selected biodegradation rates and the effect on the model predictions assessed (sensitivity analysis). It should be noted that some polar organic microcontaminants can occur in different species depending on their pKa and the pH of the water (i.e., protonated and/or deprotonated species) and so sorption by hydrogen bonding, or electrostatic interaction may be a more important process than had been expected. But overall, hydrology (dilution) is often seen to be the major determining factor.

Data Input Errors. Aside from the hard scientific problems that need to be overcome, GIS catchment models require a large amount of data input on the location, size, human population, and flow for each STP. Questions can arise over whether the data were entered into the original database correctly, and whether they were transferred without further errors into the model database. Similarly, to get the hydrology right, considerable research may need to be undertaken on abstractions and industrial discharges, and these may be overlooked (35).
Also, some rivers may have managed sections (weirs), or links to canals, where flows have an artificial influence. Where a rainfall-runoff model is used to generate the natural flows, problems may occur with estimations of evaporation in the warmest parts of the country where this can be a significant factor (36). But such errors that could affect the hydrology simulation can be checked by modeling a conservative determinand such as boron or chloride for which monitoring data already exist. This is a good example of measurements and modeling working together.
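The sensitivity analysis mentioned in this section can be sketched by rerunning a simple effluent calculation across a range of removal rates. This is an illustrative sketch only: the function name and the per capita load of 1000 ng/day are hypothetical, the 200 L/day/capita wastewater volume is the generic first-tier value quoted earlier, and the 0 to 75% removal range is the one reported above for diclofenac.

```python
def effluent_concentration(load_ng_per_capita_day, removal_fraction,
                           wastewater_l_per_capita_day=200.0):
    """Predicted effluent concentration (ng/L) for a given per capita load."""
    if not 0.0 <= removal_fraction <= 1.0:
        raise ValueError("removal_fraction must be between 0 and 1")
    remaining_ng = load_ng_per_capita_day * (1.0 - removal_fraction)
    return remaining_ng / wastewater_l_per_capita_day


HYPOTHETICAL_LOAD = 1000.0  # ng/capita/day, illustrative value only

# Rerun the same calculation for each assumed STP removal rate.
removal_scenarios = [0.0, 0.25, 0.50, 0.75]
sensitivity = {r: effluent_concentration(HYPOTHETICAL_LOAD, r)
               for r in removal_scenarios}

for removal, conc in sensitivity.items():
    print(f"removal {removal:.0%}: {conc:.2f} ng/L")
```

With these inputs the predicted effluent concentration spans a 4-fold range, from the uncertainty in removal alone.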

Short Introduction to the Measurement of Organic Microcontaminants in Water

The design of a monitoring program in the aquatic environment for organic microcontaminants should be based on the physicochemical properties of the target contaminants, their routes and load variation entering the environment (e.g., diffuse or point sources), and the monitored systems (e.g., large rivers, small streams, lakes). Relevant properties of the analyte are chemical and microbiological stability (DT50), volatility, sorption affinity to sediments (Kd), water solubility, and pH dependency for (de)protonated forms (pKa). It has to be identified whether the target compounds are predominantly dissolved in the water phase, sorbed onto suspended matter, or if both compartments play an important role in fate. Appropriate analytical methods then have to be developed for the water phase and, if needed, for the solid phase. To date, the conventional analytical chemistry methods for organic microcontaminants are mainly based on liquid or gas chromatography (LC or GC) with detection by single or tandem mass spectrometry (MS or MS/MS) after appropriate extraction, clean up, and derivatization procedures. For most of the organic microcontaminants these detection instruments enable limits of quantification (LOQ) down to 1-10 ng/L for the aqueous phase (1 L) and down to 1-10 ng/g for solid matrices (1 g). Large sample volumes help to lower LOQs. In a few cases the water can be directly injected into an LC/MS/MS system, avoiding any losses during sample preparation, but with the disadvantage of higher LOQs (37). Another method of analysis for monitoring organic microcontaminants in the environment is to use the science of immunology, such as in enzyme linked immunosorbent assays (ELISA) (38, 39). These hold the promise of extraordinary specificity and accuracy at low concentrations and can be performed by nonchemists, although they have their problems too, as will be alluded to later.
Samples of rivers and streams should be collected at an appropriate frequency, taking into account the spatial distribution and the time resolution. The occurrence of human-derived organic contaminants in rivers and streams can vary at different time scales, i.e., seasonal (summer vs. winter), weekly (school holiday), daily (working days vs. weekends), or even hourly and can be driven by special events (e.g., rain events or spills). Therefore, integrated samples over time (e.g., a series of 24 h composite samples: time or flow proportional) are recommended (37). The installation of passive samplers can provide additional information (40).
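To illustrate why the choice between time-proportional and flow-proportional composites matters, the sketch below computes both averages from the same set of sub-samples. The function names and the example concentrations and flows are hypothetical, chosen only to show the arithmetic.

```python
def time_proportional_mean(concentrations):
    """Composite of equal-volume sub-samples taken at fixed time intervals."""
    return sum(concentrations) / len(concentrations)


def flow_proportional_mean(concentrations, flows):
    """Composite in which each sub-sample volume scales with the flow."""
    if len(concentrations) != len(flows):
        raise ValueError("need one flow value per concentration")
    total_flow = sum(flows)
    weighted = sum(c * q for c, q in zip(concentrations, flows))
    return weighted / total_flow


# Hypothetical sub-samples over one day.
conc = [10.0, 40.0, 20.0, 10.0]  # ng/L
flow = [1.0, 3.0, 2.0, 1.0]      # m3/s at the time of each sub-sample

time_mean = time_proportional_mean(conc)
flow_mean = flow_proportional_mean(conc, flow)
print("time-proportional composite (ng/L):", time_mean)
print("flow-proportional composite (ng/L):", round(flow_mean, 2))
```

When the highest concentrations coincide with the highest flows, as in this made-up example, the flow-proportional composite is higher than the time-proportional one, and it is the flow-proportional value that reflects the load actually carried.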

Major Complicating Factors Which Can Hinder Accurate Measurement

Where, When, and How to Take the Sample. Appropriate and representative sampling is a big, but far from insurmountable, challenge for measuring organic microcontaminants. The sampling location in rivers needs to be carefully considered, since any effluent entering the river will not initially be well mixed within the receiving waters. For lakes, the seasonal stratification has to be taken into account when sampling the water. For example, different concentration profiles of pharmaceuticals such as diclofenac and carbamazepine were detected in the stratified layers of Lake Greifensee (41). The errors based on inappropriate sampling may exceed the confidence interval based on measurements manyfold (42). For instance, the Rhine River at Mainz in Germany consists of two streams without lateral mixing, with a preponderance of Rhine water nearest one bank and the water of its tributary, the Main, nearest the other bank. Therefore, to measure the concentrations of pharmaceuticals in the Rhine River at Mainz, time proportional composite samples were taken at three locations which spanned this variation (43).

Preserving the Sample Prior to Analysis. Analyte stability has to be ensured during sampling and storage prior to analysis to obtain representative and reliable results. Refrigeration on its own is often inadequate (44). In general, adding preservatives such as sodium azide, or adjusting to pH 2 with HCl or H2SO4, efficiently inhibits microbial processes (e.g., prevents the oxidation of 17β-estradiol in water samples). However, frequently the optimum procedure is to perform the extraction immediately after sampling (37).

Problems of Measurement in a Complex Matrix. Matrix effects are the main disadvantage of LC/MS detection associated with ionization interfaces such as electrospray ionization MS, where other nontarget organic molecules frequently cause reduced ionization of the target analytes (ion suppression) or elevated ionization (ion enhancement). Furthermore, matrix effects are known to reduce the recoveries during sample preparation. Both may result in incorrect quantitative results. Losses caused by matrix effects can reach very high values, up to 90% (45). Matrix influences cannot be compensated by “off-line” calibrations over the whole method, since the matrix composition and matrix quantity differ from sample to sample. However, there are several common options for obtaining correct quantitative results for samples with difficult matrices. First, ion suppression and ion enhancement may be evaluated by dividing a sample in two and spiking one of the duplicates with a surrogate standard (deuterated or 13C-labeled compounds). Comparing the measurements between the spiked and nonspiked samples will inform the chemist if compensations are needed.
Second, effective cleanup steps to remove matrix components can be taken; third, an alternative interface (e.g., atmospheric pressure chemical ionization) which causes less matrix suppression can be used; or fourth, smaller sample volumes or injection volumes can be used (46, 47). Matrix effects can also affect the accuracy of the ELISA but for different reasons, such as binding with humic acids (38, 39) and cross reacting with very similar molecules such as the estrogen conjugates (38).

Detection Limits Which Are Not Low Enough. Insufficient sensitivity may result in LOQs which are higher than the concentrations derived from ecotoxicology that cause effects for some very potent pollutants. For example, where there is a need to confirm whether an environmental quality standard (EQS) (e.g., 0.1 ng/L as required for EE2) is exceeded, the respective analytical method needs to be able to accurately measure concentrations significantly lower than the EQS (down to 0.01-0.03 ng/L for EE2), in order to attain a sufficient statistical certainty. Based on the current GC/MS/MS and LC/MS/MS technology, it is currently not feasible to quantify 0.01 ng/L EE2 in 1 L of (waste)water with the appropriate quality assurance. In these circumstances it may be necessary to increase the sample volume by a factor of 10-100. This requires the enrichment of larger volumes, up to 10s of liters (e.g., by SPE disks). However, the time associated with the individual samples becomes much higher and, hence, the analytical capacity will be reduced.

Interlaboratory Comparisons. When different laboratories measure subsamples of an original sample, it can give valuable information on the overall confidence which can be placed in measurements of that specific determinand. Van Leeuwen et al. (48) reported on an interlaboratory comparison of measurements of perfluorinated contaminants in different environmental matrices by 38 separate laboratories using largely state of the art equipment. In a brackish water sample spiked at 20 ng/L, the average reported values were 30-40 ng/L but the range was between 4 and 190 ng/L, almost 2 orders of magnitude difference. The authors concluded that neither the experience of the participating laboratory, nor the level of its equipment, was a guarantee of accurate results with these compounds. De Boer and Cofino (49) reported on an interlaboratory comparison of prepared tissue and sediments spiked with polybrominated diphenylethers by 18 laboratories. Values for one of the congeners (BDE209) reported by some of the laboratories differed by an order of magnitude from the mean and target values. When a fish blood sample was divided up between eight participating laboratories, who all used radioimmunoassay techniques to measure the 17β-estradiol (E2), values were between 0.08 and 1.8 ng/L, a 23-fold variation (50). In an interlaboratory comparison carried out by two laboratories in a survey of steroid estrogens in effluent (51), for the same sample, the two laboratories were within a 1.1- to 3.1-fold difference of each other for estrone (E1). In a larger study by 14 laboratories, groundwater spiked at 5 ng/L EE2 yielded results of 4-16 ng/L by the participants using different chromatography mass spectrometry combinations (52). In an interlaboratory study carried out on several pharmaceuticals spiked into river water, or sewage effluent, at 100-300 ng/L concentrations by four to nine laboratories (depending on analyte), 2-fold differences and relative standard deviations ranging up to 60% were reported (53). Overall, it is encouraging that when several laboratories measure the same sample, the mean of all their results is frequently close to the spiking concentration. But in an environmental study it is usually only one laboratory, and not several, doing the analysis.
So we should be wary of placing undue reliance on chemical data emanating from a single laboratory using a single technique, particularly where the target analyte is present at low ng/L, or sub-ng/L concentrations in a complex matrix. It would be fair to also compare different water quality models and their predictions for the same catchment. This has been done for pesticide leaching from soil, where three models given the same driving data sets gave predictions within 1 order of magnitude of each other (54), but not yet for GIS water quality models.

EE2 As a Case Study of Problems in Analysis. It has been established that EE2 can have damaging endocrine disruption effects on fish at very low concentrations. The lowest concentration shown to cause a significant effect is less than 1 ng/L (55), with sex reversal possible in male fish at 3 ng/L (56). This is a good example of a chemical where it is vital that environmentalists and regulators get high quality information to assess the danger to their fish populations. Just to set the scene, if the assumptions of Johnson and Williams (21) that 17% of women are taking 26 µg EE2/day and excreting 40% of it are accepted, this would suggest a per capita normalized discharge of 890 ng/day. Of course this percentage does vary a little between countries, as suggested by WHO reviews of contraceptive use, but there is a continuing trend to lower the EE2 dose in many contraceptive products. A typical wastewater discharge per capita value is 200 L/day (although often more, particularly where industrial waste streams also enter the sewage), which would suggest 4 ng/L EE2 would be found in a raw sewage influent stream, and 0.8 ng/L EE2 in the effluent if 80% removal takes place in sewage treatment (21). So if the receiving stream was composed entirely of effluent this would give us only 0.8 ng/L EE2 in the receiving water, or 4 ng/L EE2, if sewage treatment had failed to remove any EE2 from the waste stream.
Therefore, it is surprising to find reports of EE2 being detected at concentrations up to 178 ng/L in sewage effluent (57), up to 273 ng/L in some U.S. streams (58), at 5 ng/L in a river estuary (59), and even 2 ng/L in drinking water (60), by researchers using GC/MS equipment. However, both Ternes et al. (61) and Huang and Sedlak (38) warned of a co-elutant in sewage that could cause overestimation of EE2 concentrations with the use of GC with single MS. Braun (62) elucidated that silylated lignoceric acid, a common fatty acid, has a similar retention time and the same mass as silylated EE2. This indicates that only GC tandem MS or GC ion trap MS in at least the MS2 mode are suitable. Indeed, with the interlaboratory comparison of Esperanza et al. (52) the EE2 measurements furthest from the mean (1.4 ng/L) in raw sewage were from participating laboratories using GC, or LC with single MS (4-50 ng/L). An analytical method was applied for the analysis of natural estrogens and EE2 down to a LOQ of 0.10 ng/L and a LOD of 0.050 ng/L in aqueous matrices after solid phase extraction of 10 L samples, silica gel cleanup and derivatization by MSTFA using GC/ion trap MS detection. With a detection level as low as 0.05 ng/L no EE2 was found in samples from the chosen surface waters receiving significant proportions of STP effluents (63). Using LC tandem MS, many nondetects in effluent for EE2 are reported (22, 63–67). Given the sub-ng/L EE2 concentrations predicted for most receiving waters this would be expected. But even so, use of LC tandem MS has not meant the end of occasional surprisingly high EE2 measurements being reported, such as 2-18 ng EE2/L reported in coastal areas of the Baltic Sea (68, 69). However, Beck et al. (69) noted that the same water samples with high apparent EE2 concentrations did not correspond with an estrogenic response with the YES test. Schlüsener and Bester (70) reported, also using LC tandem MS, that false positive results might occur close to 1 ng/L if only one precursor to product ion transition is used. The false positive results may be caused by co-extractants which are frequently unknown. Even standard addition would not help in those cases. Another problem seems to be the definition of the LOQ.
It has to be confirmed that the analytical method is really appropriate to quantify close to the LOQ with reasonable errors. Otherwise an error of a factor of 2 or even more can easily occur and might lead to an overdetermination. Therefore, although both GC tandem MS and LC tandem MS can be used for EE2 analysis down to the 1 ng/L range and in a very few cases even below, the results have to be thoroughly confirmed in order to exclude false positive results. It might have been hoped that the high specificity of ELISA could have resolved many of these problems. Schneider et al. (39) compared ELISA with LC tandem MS and found results from the two methods differed by factors of 1.2-4 for the same sewage effluent samples. They had previously warned that ELISA had cross-reaction problems with EE2 conjugates and also with humics, leading to potential overestimations.
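The back-of-envelope EE2 estimate in the case study above can be reproduced in a few lines. This is a sketch rather than the original authors' calculation: the assumption that women make up half of the population is added here to convert the 17% usage figure to a per capita value, and the function names are hypothetical. The remaining inputs (26 µg/day dose, 40% excretion, 200 L/day/capita wastewater, 80% removal) are the values quoted in the text.

```python
def ee2_per_capita_load_ng_day(fraction_women_using=0.17,
                               female_fraction_of_population=0.5,  # assumption
                               dose_ug_day=26.0,
                               fraction_excreted=0.40):
    """Per capita EE2 discharge to sewer (ng/day), averaged over everyone."""
    dose_ng_day = dose_ug_day * 1000.0  # 1 ug = 1000 ng
    return (fraction_women_using * female_fraction_of_population
            * dose_ng_day * fraction_excreted)


def concentration_ng_l(load_ng_day, wastewater_l_day=200.0, removal=0.0):
    """Concentration in sewage (ng/L) after an optional removal step."""
    return load_ng_day * (1.0 - removal) / wastewater_l_day


load = ee2_per_capita_load_ng_day()
influent = concentration_ng_l(load)
effluent = concentration_ng_l(load, removal=0.80)

print(f"per capita load: {load:.0f} ng/day")
print(f"influent: {influent:.2f} ng/L")
print(f"effluent at 80% removal: {effluent:.2f} ng/L")
```

Under these assumptions the load comes to about 884 ng/day, close to the 890 ng/day quoted, with roughly 4 ng/L in influent and a little under 1 ng/L in effluent, which is why reported detections of tens or hundreds of ng/L warrant scrutiny.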

Attempts to Corroborate Point Source Model Predictions with Measurements; How Well Do They Agree?

Unfortunately there is something of a philosophical problem here, since testing a model against measured data assumes that the measured data are correct, which they may not be, particularly when the chemical is present at a very low concentration (the focus of this review). Nevertheless, it is important to examine just how close the two approaches come to agreement when tested in the field.

Predictions for Individual Point Sources. The first step in predicting concentrations throughout a catchment is to make an accurate prediction of the effluent, or immediate receiving water, concentration (PECinitial) directly downstream of the point source. To compare observed against predicted values, the latter can be divided by the former to give a ratio, or accuracy factor. A value above one represents a situation where the model values were above the observed, and a value below one, where the model values were less than the observed. Vermeirssen et al. (71) found that an estrogen



TABLE 1. Proximity of Model Predictions of Effluent Concentrations to Measured Values

| reference | determinands | number of STPs | mean predictive accuracy factorᵇ |
|---|---|---|---|
| 73 | carbamazepine and diclofenac | 1ᵃ | 1-1.5 |
| 71 | estrogen equivalents | 1 | 2 |
| 44 | E1, E2 and EE2 | 1 | 0.4-1.3 |
| 22 | E1 and E2 | 22 | 3.9 and 4.3 |
| 16 | diclofenac and propranolol | 9 | 2.1 and 4.6 |
| 20 | LAS, EDTA, triclosan | 1 | 1.7, 1.1, 1.4 |

ᵃ Comparison here was with the sewage influent, not effluent. ᵇ Assumes no error associated with the measured data.
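As a minimal sketch of the accuracy factor reported in Table 1, the ratio of predicted to observed concentration can be computed as below. The function names and the example concentrations are hypothetical and chosen only for illustration. The ratios 0.4 and 4.6 are taken from the table, and, like the table, the sketch assumes no error in the measured value.

```python
def accuracy_factor(predicted, observed):
    """Return predicted / observed.

    A value above 1 means the model overpredicts; below 1, it underpredicts.
    """
    if observed <= 0:
        raise ValueError("observed concentration must be positive")
    return predicted / observed


def within_factor(ratio, factor):
    """Return True if ratio lies between 1/factor and factor (inclusive)."""
    return 1.0 / factor <= ratio <= factor


# Hypothetical concentrations in ng/L, for illustration only.
predicted_ng_per_l = 3.0
observed_ng_per_l = 1.5
ratio = accuracy_factor(predicted_ng_per_l, observed_ng_per_l)
print(ratio)  # 2.0, i.e. a 2-fold overprediction

# Ratios taken from Table 1, checked against two tolerance factors.
for table_ratio in (0.4, 4.6):
    print(table_ratio, within_factor(table_ratio, 2), within_factor(table_ratio, 5))
```

The symmetric test in `within_factor` treats a 2-fold underprediction (ratio 0.5) and a 2-fold overprediction (ratio 2) as equally distant from agreement, which is one reasonable reading of "within a factor of 2".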

excretion and effluent model (21) routinely overestimated effluent concentrations in one Swiss STP by a factor of 2 over a 48 day period. Using the same model, Johnson et al. (22) compared modeled effluent E1 concentrations with observed values for over 20 STPs from a large scale England and Wales Environment Agency monitoring exercise. The original sampling data were from grab, rather than composite, samples, so were not ideal for this sort of comparison. Where a real measurement was recorded (39 data points from 22 STPs), the mean predictive accuracy was within a factor of 3.9 (median 2.1) for E1 and 4.3 (median 3.2) for E2, with standard deviations of 4.9-5. Most recently, Huo and Hickey (44) reported a series of estrogen sewage influent and effluent values for a single STP. When compared to the model (21), the mean predictive accuracy was 0.8 for E1 and E2, and 1.3 for EE2 in the influent, and 0.7 for E1, 1.4 for E2, and 0.4 for EE2 in the effluent. Using consumption, excretion, and fate and behavior information, Johnson et al. (16) developed a model to predict the concentrations of the pharmaceuticals diclofenac and propranolol in STP effluent and compared this with effluent monitoring data found in the literature for nine STPs. In this case the predicted mean values were within factors of 2.1 and 4.6 of the observed effluent values, respectively. Alder et al. (72) compared predicted influent concentrations of several pharmaceuticals for German and Swiss STPs with measured values from the literature. In general, the predicted influent concentrations were in good agreement with the measured values, with a deviation of no more than a factor of 2 for most compounds. Heberer and Feldmann (73) used very precise consumption figures for two pharmaceuticals in a locality in Berlin, and found that their predicted sewage influent loads were no more than a factor of 1.5 different from the measured values. Wind et al. (20) compared their predictions for linear alkylbenzene sulfonate (LAS), ethylenediaminetetraacetic acid (EDTA), and triclosan for one STP effluent with measured values, and found that they were all within a factor of 2. Thus, models for a range of human-excreted and down-the-drain chemicals routinely predict concentrations within a factor of 5 of the measured values in the effluent, or immediate receiving waters (Table 1).

Predictions at the Catchment Scale. The target, or benchmark, stated by the authors of GREAT-ER is that its predictions should be at least within a factor of 3 of the observed values. A benchmark determinand frequently used to test the ability of GIS hydrological models to predict discharges from point sources is boron. Boron is used in detergents in the form of sodium perborate tetrahydrate as a bleaching agent. It has the advantage of being relatively simple to analyze, although some natural sources can occur. Therefore, an input value is derived from an assessment of the consumption of detergent products. Schulze and Matthies

TABLE 2. Maximum Deviation of Any Catchment PEC from the Observed Measured Values reference (74) (27) (20) (17) a

location Aire/Calder, UK Rur, Germany Itter, Germany Several U.S. catchments

catchment area (km2)

boron

LAS

EDTA

triclosan

caffeine

1100 2500 40 50,000 to over 100 000