Subscriber access provided by UNIV OF CALIFORNIA SAN DIEGO LIBRARIES
Article
Acute toxicity prediction to threatened and endangered species using Interspecies Correlation Estimation (ICE) models Morgan M Willming, Crystal R Lilavois, Mace G. Barron, and Sandy Raimondo Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b03009 • Publication Date (Web): 01 Sep 2016 Downloaded from http://pubs.acs.org on September 12, 2016
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 29
Environmental Science & Technology
1
Acute toxicity prediction to threatened and endangered species using Interspecies Correlation Estimation (ICE) models
2 3 4
Morgan M. Willming*†, Crystal R. Lilavois‡, Mace G. Barron‡, Sandy Raimondo‡
5 6 †
7 8 9 10
‡
Oak Ridge Institute for Science and Education, U.S. Environmental Protection Agency, Gulf Ecology Division, 1 Sabine Island Drive, Gulf Breeze, FL USA 32561
U.S. Environmental Protection Agency, National Health and Environmental Effects Laboratory, Gulf Ecology Division, 1 Sabine Island Drive, Gulf Breeze, FL USA 32561
11 12
*Corresponding author:
[email protected]; Phone: 850-934-9297; Fax: 851-934-2406
1 ACS Paragon Plus Environment
Environmental Science & Technology
13 14
Page 2 of 29
Abstract Evaluating contaminant sensitivity of threatened and endangered (listed) species and
15
protectiveness of chemical regulations often depends on toxicity data for commonly tested
16
surrogate species. The U.S. EPA’s internet application Web-ICE is a suite of Interspecies
17
Correlation Estimation (ICE) models that can extrapolate species sensitivity to listed taxa using
18
least squares regressions of the sensitivity of a surrogate species and a predicted taxon (species,
19
genus, or family). Web-ICE was expanded with new models that can predict toxicity to over 250
20
listed species. A case study was used to assess protectiveness of genus and family model
21
estimates derived from either geometric mean or minimum taxa toxicity values for listed species.
22
Models developed from the most sensitive value for each chemical were generally protective of
23
the most sensitive species within predicted taxa, including listed species, and were more
24
protective than geometric means models. ICE model estimates were compared to HC5 values
25
derived from Species Sensitivity Distributions for the case study chemicals to assess
26
protectiveness of the two approaches. ICE models provide robust toxicity predictions and can
27
generate protective toxicity estimates for assessing contaminant risk to listed species.
28
2 ACS Paragon Plus Environment
Page 3 of 29
29 30
Environmental Science & Technology
Introduction The U.S. Endangered Species Act (ESA) requires EPA to determine risks of pesticides
31
and other chemicals to federally endangered and threatened (listed) species to ensure that
32
chemical registration and water quality criteria are protective of listed taxa. A significant
33
challenge in this process is determining the sensitivity of the diversity of listed species to
34
chemicals that have only been evaluated using common test species or have limited data
35
available. Historically, toxicity data for the majority of listed species or closely related
36
representatives has been unavailable because of a lack of standardized culture and test methods
37
and limited organism availability. However, previous research suggests that listed species are not
38
consistently more sensitive than more commonly tested species1, 2. In the absence of species-
39
specific toxicity data, conservative approaches such as applying generic safety factors to toxicity
40
values of surrogate species have been used to develop hazard levels assumed to be protective of
41
listed species. Recently, the U.S. National Research Council3 recommended the use of
42
Interspecies Correlation Estimation (ICE) models in pesticide risk assessments as an alternative
43
to generic safety factors.
44
ICE models estimate acute toxicity to aquatic or terrestrial organisms using the known
45
toxicity data of a chemical for a surrogate species4, 5. The models are log-linear least squares
46
regressions of the sensitivity of a predicted taxon (species, genus, or family) and a surrogate
47
species. They are constructed from existing acute toxicity values determined in each pair of taxa
48
across a number of chemicals. ICE models for aquatic species contain acute toxicity values
49
(median lethal or median effect concentrations; LC50/EC50) for a minimum of any three
50
different chemicals and have been demonstrated to be robust, accurate estimators of toxicity
51
when the surrogate and predicted taxa are within the same taxonomic Order4. Species, genus, and
3 ACS Paragon Plus Environment
Environmental Science & Technology
52
family level ICE models are publically available on the U.S. EPA internet application Web-ICE
53
(www3.epa.gov/webice).
54
Page 4 of 29
ICE model predictions are intended to supplement toxicity databases where species of
55
concern or a required diversity of species have not been or cannot be adequately tested6. Previous
56
studies evaluating the robustness and application of ICE model predictions have focused
57
primarily on species level models and their use in either direct toxicity estimation or in
58
developing species sensitivity distributions (SSDs)4-8. Less work has explored the protectiveness
59
of ICE predictions in risk assessments for listed species or the use of genus and family models.
60
Few species-specific ICE models can be developed for listed species due to limited existing
61
toxicity data, therefore genus or family models may be required to predict toxicity to the higher
62
taxonomic level. Genus and family models have historically been developed using geometric
63
means of toxicity values from multiple species, which are useful in generating toxicity estimates
64
in cases such as water quality criteria development. However, the protectiveness of these
65
predictions across the range of species sensitivity within the predicted taxon has not been
66
investigated, especially for listed or sensitive species. Demonstrating protectiveness of model
67
estimates is a critical step for their application in risk assessment of listed species.
68
The present study evaluates the use of ICE models for listed species and compares the
69
protectiveness of genus and family level models developed from either geometric means or
70
minimum toxicity values. Models used in this evaluation were developed from an acute toxicity
71
database that was substantially expanded from the previous version of Web-ICE (v3.2, release
72
April 2013) and contains 314 species and 1501 chemicals, including new toxicity records for
73
listed species. Because there are 87 federally listed species of freshwater unionid mussels, which
74
are often the most sensitive taxa to some contaminants9, development of models for this taxon
4 ACS Paragon Plus Environment
Page 5 of 29
Environmental Science & Technology
75
was a focus for the present study. We compare the development of genus and family models
76
using the most sensitive toxicity value for each chemical within the predicted taxon (minimum
77
models) to models built with geometric means in order to evaluate an approach for generating
78
protective estimates of toxicity. Models developed with minimum toxicity values were expected
79
to result in toxicity predictions that are protective of the most sensitive species within a taxon.
80
We also revised model selection guidelines to reduce reliance on professional judgement and
81
increase reproducibility of their application. Lastly, we demonstrate the application of models in
82
a case study of 11 federally listed species exposed to a diversity of contaminants with a range of
83
aquatic toxicity modes of action (MOA) and compared ICE predictions to hazard concentrations
84
derived from species sensitivity distributions (SSDs).
85 86
Methods
87
Database development
88
Standardization of toxicity data for inclusion in ICE models followed the approach and
89
selection criteria described in Raimondo et al.4 This included confirming chemical and species
90
identity; compiling acute toxicity values as 48 hour EC50/LC50 for specific invertebrate taxa
91
(e.g., daphnids, fairy shrimp) or 96 hour EC50/LC50 for fish, amphibians and other aquatic
92
invertebrates (e.g., insects); and determining whether each data record met standardization
93
criteria for life stage, test conditions, and water quality parameters (e.g., temperature, dissolved
94
oxygen, salinity)4. The database included records from the previous Web-ICE database appended
95
with new records from ECOTOX (www.epa.gov/ecotox/; downloaded September 2014), and
96
recently collected primary data on freshwater mussels9 and fairy shrimp (unpublished data).
97
Acute toxicity data, which can represent intrinsic species sensitivity, were used because of the
98
large diversity of chemicals and taxa available. Data were standardized for life stage by using 5 ACS Paragon Plus Environment
Environmental Science & Technology
Page 6 of 29
99
only juveniles for fish and decapods, immature aquatic life stages for amphibians and insects,
100
and juveniles and spat for molluscs. All life stages were included for other taxa groups except
101
egg or embryo stages. Specific aspects of the mussel toxicity dataset are detailed in Raimondo et
102
al.9. All chemicals in the ICE database were curated using the distributed structure-searchable
103 104
toxicity database (DSSTox; www.epa.gov/ncct/dsstox/). A single name and the confirmed
105
chemical abstract services registry number from the source material were checked against
106
DSSTox to validate their consistency. Names that were not contained within the DSSTox list of
107
synonyms for a particular chemical were manually checked to validate the agreement between
108
the chemical identifiers and confirm the chemical-data linkage. The ICE database toxicity value
109
was specified as the compound tested, except for some metal salts (recorded as the element or
110
hardness normalized element), pentachlorophenol (pH normalized) and ammonia (pH and
111
temperature normalized). Only records for chemicals with an active ingredient purity of ≥ 90%
112
were accepted. Open ended (e.g., LC50 > 100 mg/L) or unconfirmed toxicity values were not
113
used.
114 115 116
ICE Model Development Models were developed as least squares log-linear regressions by pairing all possible
117
surrogate species toxicity values with all possible predicted taxa toxicity values (i.e., a species,
118
genus, or family) by common chemical. Each ICE model describes the relationship of sensitivity
119
of the two species as Log10(Predicted Toxicity) = a*Log10(Surrogate Toxicity)+b.
120
Here, Predicted Toxicity is the EC/LC50 value of the predicted taxa, Surrogate Toxicity is the
121
EC/LC50 value of the surrogate species, a is the slope of the regression line and b is the
122
intercept. A slope of 1 describes two species whose relative sensitivity is consistent across a wide 6 ACS Paragon Plus Environment
Page 7 of 29
Environmental Science & Technology
123
range of chemicals. A slope of 1 and an intercept of 0 describes two species with identical
124
sensitivity. Each ICE model is also described by an R2 and a Mean Square Error (MSE). The R2
125
is a measure of goodness of fit of the regression, where a value closer to one represents a model
126
where data are tightly fit around the line and variability in the data are well represented by the
127
model. Model MSE is an estimate of average error contained within the model. Small MSE
128
indicates that the model is robust, or contains high probability of good performance for data
129
drawn from a wide range of samples.
130
Each model required toxicity values from at least three different chemicals available in
131
the database for both the surrogate species and predicted taxon. Therefore, each model data point
132
represents a unique chemical. For species level models, when multiple database records occurred
133
for the same species and chemical, the geometric mean of toxicity values was used. For genus
134
and family level models, two sets of models were developed: geometric mean and minimum
135
toxicity models. First, all species available in the database within a genus or family were
136
identified by common chemical. For geometric mean models, the species mean acute value
137
(SMAV; geometric mean for each chemical) was calculated. For genus geometric mean models,
138
the genus mean acute value was then calculated for each chemical as the geometric mean of the
139
SMAVs. For family geometric mean models, the family mean acute value was calculated for
140
each chemical as the geometric mean of the SMAV. For minimum toxicity models predicting to
141
both genus and family, the minimum toxicity value of all species within each predicted taxa was
142
used for each chemical. This second approach using the most sensitive toxicity value was
143
intended to generate toxicity estimations protective of more species within that genus or family
144
compared to geometric mean models. For both sets of genus and family level models, the
7 ACS Paragon Plus Environment
Environmental Science & Technology
145
predicted taxon toxicity value was paired with the SMAV of the respective chemical for the
146
surrogate species.
147
Page 8 of 29
For ICE models with four or more chemicals, model prediction accuracy was determined
148
using leave-one-out cross-validation4. In this process, each data point within a model (e.g.,
149
chemical pair of acute values for surrogate and predicted taxon) was systematically removed
150
from the original model and a new model was rebuilt with the remaining data. The new model
151
was used to estimate the toxicity value of the removed predicted taxa from the removed
152
surrogate species toxicity value. For each removed data point, the N-fold difference was
153
calculated as the greater value of the estimated/actual or actual/estimated for non-transformed
154
values. For each model, the number of removed data points predicted within 5-fold of the actual
155
value was determined and a cross-validation (prediction) success rate was calculated as the
156
percentage of removed values predicted within 5-fold of the actual value 4. The 5-fold range
157
represents standard inter-laboratory variation of a toxicity value tested for the same species and
158
chemical as demonstrated in previous studies4, 5, 9-11.
159 160 161
Model Selection Guidelines Model selection guidelines were developed to reduce the amount of professional
162
judgement needed to select the best available model when multiple ICE models (i.e., multiple
163
surrogate species) are available for predicting to one taxon. Details of guidance development are
164
provided in Supplementary Information (SI 001). Briefly, selection guidance was determined
165
from the combination of regression model attributes (MSE, R2, and slope) that corresponded to
166
the highest percent of cross-validated data points predicted within 5-fold of the measured value
167
using an iterative approach. The highest MSE, lowest R2 and lowest slope that corresponded to
168
the highest percent accuracy were then applied as the model selection guidelines. 8 ACS Paragon Plus Environment
Page 9 of 29
Environmental Science & Technology
169 170 171
Evaluation of Web-ICE for Listed Species A case study was used to assess the protectiveness of genus and family models applied to
172
listed species. The 11 selected species represented a diverse taxonomic range including fish,
173
amphibians, and fairy shrimp from the Sacramento California Valley12, and freshwater unionid
174
mussels from the Ohio River watershed and southeastern United States (Table 1). The 16
175
chemicals chosen for the case study represented priority chemicals encompassing a broad range
176
of MOAs (primarily pesticides from the Sacramento Valley) and contaminants of concern for
177
unionid mussels (Table 1). First, all minimum and geometric mean ICE models available for
178
predicting to the genera and families of the case study species were identified. For each of these
179
models, toxicity values in the Web-ICE database were extracted for each surrogate species and
180
case study chemical. When multiple toxicity records were available for the same surrogate
181
species and chemical, the SMAV was used as the model input. Toxicity estimates and 95%
182
confidence intervals (CI) were calculated from available surrogate species toxicity and models
183
for each chemical and taxa (genus, family). Where the taxa of a listed species could be predicted
184
from multiple surrogates, we applied the model selection guidance determined in the methods
185
outlined above to identify the best model (Fig. 1). According to the guidance, model selection
186
was first based on the taxonomic distance of the surrogate and predicted taxon, because previous
187
analyses indicated higher prediction accuracy for models from closely related taxa. Models were
188
then chosen with a low MSE (≤ 0.95), and a narrow 95% CI. If models had similar MSE, the
189
model with the greater N was selected.
190
Protectiveness of the toxicity estimates and their lower 95% CI predicted from the
191
selected model for the genus and/or family of each listed species was determined by comparing
192
these values to the measured minimum toxicity value for that taxon available in the ICE 9 ACS Paragon Plus Environment
Environmental Science & Technology
Page 10 of 29
193
database. For example, the model prediction to the family Salmonidae for malathion was
194
compared to the minimum Salmonidae LC50 (i.e., the most sensitive measured species value for
195
all salmonids) for malathion available in the ICE database. If the model prediction was greater
196
than the minimum toxicity value in the database, then the lower 95% CI of the prediction was
197
compared to the minimum taxa toxicity value. For each listed taxa and chemical, it was
198
determined if the prediction was protective, the lower 95% CI was protective, or models were
199
not protective at all. This was performed for genus and family models developed using both
200
minimum toxicity values and geometric means.
201
Species sensitivity distributions (SSDs) were developed for each chemical to compare the
202
5th percentile hazard concentrations (HC5) values to ICE generated predictions and sensitive taxa
203
values for the case study species. SSDs were developed from freshwater species data available in
204
the ICE database for each case study chemical2, 6. Toxicity data selection followed the same
205
standardization criteria as ICE database development, using the SMAV for each chemical to
206
build the SSD. SSDs were fit using maximum likelihood estimation in R statistical software13 to
207
a log-logistic cumulative distribution so that Y = 1/{1 + exp[(α- X)/β]}
208 209
where Y is the cumulative proportion, X is the log-transformed toxicity, α is the intercept, and β
210
is the scale parameter. The HC5 values were derived from this distribution for each chemical and
211
were compared to the most sensitive taxa values and ICE minimum model predictions to assess
212
protectiveness of each approach.
213 214
Results
215
Database and model development
10 ACS Paragon Plus Environment
Page 11 of 29
216
Environmental Science & Technology
The Web-ICE database (v. 3.3) has an expanded number of toxicity records, species, and
217
chemicals, and yielded considerably more models, including those predicting to listed species,
218
compared to the previous version (Table 2). This suite of ICE models can predict toxicity values
219
to 25 listed species and 20 genera and 20 families containing U.S. federally listed species (Table
220
3). Overall, species model prediction accuracy and cross-validation success were consistent with
221
previous results for the previous dataset 4. Models developed with predicted and surrogate
222
species within the same family estimated toxicity within 5- and 10-fold of the actual value for 92
223
and 98% of data points, respectively (Table S1). All models are available on the EPA Web-ICE
224
application webpage (www3.epa.gov/webice).
225
226 227
Model selection & prediction guidance Analysis of model parameters (MSE, R2, and slope) indicated that models with MSE
0.6, and slope > 0.6 will have the highest prediction accuracy (Table S2). For models
229
with a taxonomic distance of 6 (same Kingdom) that have N > 10, models with MSE < 0.55 will
230
have improved prediction accuracy because of the increased variation in toxicity data associated
231
with less closely related taxa (Table S2). Distributions of each model parameter by taxonomic
232
distance are available in the Supporting Information (Fig. S1).
233
Figure 1 summarizes the procedures for model selection based on the model guidelines
234
when models for multiple surrogates are available. Web-ICE users should first select models
235
with the closest taxonomic distance and then apply the model parameter guidelines to identify
236
the best model. The analysis also indicated selection of models with confidence intervals within
237
5-fold of the predicted value, although using 3-fold slightly improves prediction accuracy, but
238
not substantially. 11 ACS Paragon Plus Environment
Environmental Science & Technology
Page 12 of 29
239
240 241
Case study analysis of model protectiveness For the case study of 16 chemicals and 11 listed species, there were 787 minimum
242
toxicity models and 807 geometric mean models from which predictions could be made from
243
available surrogate toxicity values and ICE models for the predicted taxa. Applying model
244
selection guidance resulted in 50 ICE predictions for the listed species for each set of models
245
(Supporting Information SI 002; Table S3). For both minimum and geometric mean models, 25
246
of the 50 predicted values were from family models and 25 were genus models. The Supporting
247
Information (SI 002) provides all available minimum models predicting to these taxa with the
248
best selected model identified. When model predictions were available for both a family and a
249
genus within that same family (e.g., Oncorhynchus and Salmonidae), we selected the family
250
model if both models had similar parameters (Table S4). Because family models are comprised
251
of greater taxa diversity, estimates derived from these models can be considered to be protective
252
of more species. In some cases, genus models may generate more conservative estimates, but
253
model selection should ultimately be guided by the goal of the particular analysis; if the goal is
254
specific to a species and location, a genus level model will more accurately reflect the sensitivity
255
of the species, if the goal is comprehensive protection of listed species, the family model will be
256
more inclusive.
257
The predicted values for genus level minimum models were protective of the most
258
sensitive species within the genus in 84% of cases and for 100% of instances when using the
259
lower 95% CI (Fig. 2). For family level minimum models, predictions were protective in 72% of
260
cases and the lower 95% CI was protective in 88% of instances (Fig. 2). Of the family level
261
models where the toxicity of the most sensitive species was less than the lower 95% CI of the
12 ACS Paragon Plus Environment
Page 13 of 29
Environmental Science & Technology
262
prediction (12%, N = 3 models), two of these models were at a taxonomic distance of greater
263
than 5 because no models with a closer taxonomic distance were available. For models
264
developed with geometric means, model predictions were protective of the most sensitive species
265
in 28% of cases for both genus and family models (Fig. 2). The lower 95% CI was protective in
266
50% of genus models and 48% of family models built with geometric means.
267
SSDs were developed for 13 chemicals that had sufficient data. Fipronil, glyphosate, and
268
imidacloprid each only had 4 standardized toxicity values, so SSDs were not constructed. The
269
number of species in the SSDs ranged from 7 (thiobencarb) to 66 (copper), and HC5 values
270
ranged from 0.08 to 1,389,897 µg/L (Table 4). SSDs with large N contained diverse taxa while
271
those with small N ( 0.6 and slope > 0.6
Fig. 1. ICE model selection guidance steps based on model parameters used to identify the best model when predictions from multiple surrogates are available. TD = Taxonomic distance of surrogate and predicted taxon; MSE = Mean square error; N = Number of data points in the model (df + 2); CI = Confidence interval.
27 ACS Paragon Plus Environment
Environmental Science & Technology
Page 28 of 29
Figure 2
Fig. 2. Comparison of model protectiveness for geometric mean and minimum genus and family models used with priority chemicals and taxa containing listed species in the Sacramento Valley and listed mussels. There were 25 genus model and 25 family model predictions for each set of models (geometric mean or minimum models).
28 ACS Paragon Plus Environment
Page 29 of 29
Environmental Science & Technology
TOC/Abstract Art
29 ACS Paragon Plus Environment