Practical Considerations on the Use of Predictive Models for

Environ. Sci. Technol. 2005, 39 (7), 2188-2199

Practical Considerations on the Use of Predictive Models for Regulatory Purposes

JAY TUNKEL,*,† KELLY MAYO,‡ CARLYE AUSTIN,‡ AMY HICKERSON,‡ AND PHILIP HOWARD†

Syracuse Research Corporation, Environmental Science Center, 301 Plainfield Road, Suite 350, Syracuse, New York 13212, and Syracuse Research Corporation, Environmental Science Center, Crystal Gateway, Suite 405, 1215 South Clark Street, Arlington, Virginia 22202

Interest in the use of quantitative structure-activity relationships (QSARs) for regulatory purposes has been growing steadily over the years, and many models have been evaluated under the guidance and acceptability criteria defined at the Setubal workshop held in March 2002. This work explores some of the practical issues related to the use of QSARs for regulatory purposes using results obtained from rat oral lethality and fish acute toxicity estimates generated from computational models (including TOPKAT, MCASE, OASIS, and ECOSAR). Using data submitted under the Environmental Protection Agency’s (EPA’s) High Production Volume (HPV) Challenge Program, the results on the quality of the estimations are compared using a standard statistical review and an additional classification approach in which the hazard predictions were grouped using well-defined regulatory criteria (those used in EPA’s New Chemical Program). Our results indicate that an evaluation of a model’s regulatory applicability and predictive power is ultimately dependent on the specific criteria used in the assessment process. This work also discusses the practical difficulties associated with defining the domain of a predictive model using the estimates of four different ready biodegradation models and experimental data submitted under the EPA’s New Chemical program. Our results suggest that the method a model employs for its predictions is as important as the training set in determining its domain of applicability. Together, these results highlight the challenges associated with developing reliable and easily applied acceptability criteria for the regulatory use of QSAR models.

* Corresponding author phone: (315)452-8436; fax: (315)452-8440; e-mail: [email protected].
† Syracuse Research Corp., Environmental Science Center, NY.
‡ Syracuse Research Corp., Environmental Science Center, VA.

Introduction

Predictive models in the form of quantitative structure-activity relationships (QSARs) provide estimates for physical/chemical properties, reaction rates, and biological activity of chemical substances. They are based on the concept that the activity of a substance is a function of its structure and that the activity can be determined through mathematical relationships developed from architecturally similar compounds. Traditionally, these activity data have been generated
by experimental testing of the chemical substances using a number of well-defined protocols. With laboratory costs and animal testing on the rise and the need to evaluate many more chemicals, regulatory agencies and other decision makers have become increasingly interested in new methods for determining the potential fate and effects of chemical substances. This is especially true for the toxicity and environmental fate assessments, where the appropriate experimental studies can be very expensive and span many months. In addition, the use of QSARs to predict the toxicity of chemical compounds has the potential to reduce the use of animal testing. A workshop held in Setubal, Portugal in March of 2002 (1) was designed to develop more definitive guidance on the development and use of QSARs (2). The regulatory use of QSARs includes the screening of existing chemical inventories and the assessment of new chemical substances (3, 4), although they also hold great promise in the identification of safer materials through the application of the principles of green chemistry (5) and green engineering (6), especially at the research and development phases of new product design (7). The consensus of the Setubal workshop resulted in the development of six basic principles on the use of QSARs in regulatory and other decision-making frameworks. The predictive model should: (1) be associated with a defined endpoint that it serves to predict; (2) take the form of an unambiguous and easily applicable algorithm for predicting a pharmacotoxic endpoint; (3) ideally have a clear mechanistic basis; (4) be accompanied by a definition of the domain of its applicability; (5) be associated with a measure of its goodness of fit and internal goodness of prediction estimated with cross validation or a method similar to a training set of data; and (6) be assessed in terms of its predictive power by using data sets that were not used in the development of the model. 
10.1021/es049220t CCC: $30.25 © 2005 American Chemical Society. Published on Web 02/19/2005

Within this broad outline, QSARs may be used in a much broader scope than is currently practiced (2), including risk assessment, chemical screening, and priority setting. The steady growth in the development and use of QSARs is also demonstrated in a recent edition of Environmental Toxicology and Chemistry (Volume 22, Number 8, 2003) that contains 23 review articles organized to cover estimation methods for physical/chemical properties, environmental fate, ecological effects, and biological activity as well as other important aspects of their development and use (8). These reviews provide an excellent discussion on many of the issues brought forward during the Setubal workshop, including those that focus on the construction of QSARs (principles 1, 2, 3, and 5). A series of monographs arising from the Setubal workshop also discuss the mathematical and statistical aspects of building QSARs (9) and describe how the models used by international regulatory bodies were developed (3, 4). Unlike the body of work that has appeared on the construction of predictive models, there has been a scarcity of reports on methods that can be used to evaluate the predictive accuracy of QSARs for regulatory purposes using external data sets (sixth Setubal principle). One of the biggest challenges currently within the QSAR community is to develop methodology to quantify the quality of predictive models (10). In this work, we performed a comparative evaluation of a number of general predictive models using experimental data submitted under the U.S. Environmental Protection Agency's (EPA) High Production Volume (HPV) Challenge Program (11). Fish acute toxicity estimates obtained from TOPKAT, MCASE, OASIS, and ECOSAR (12-15) were compared to data for HPV chemicals not present in each model's training set. A unique aspect of this work is that the results were assessed from two different perspectives. The first utilized a standard statistical approach based on the coefficient of determination for the experimental and estimated values. The second method employed a classification approach based on the regulatory criteria currently used in the EPA's New Chemical Program (16) to assign the hazard concern level for substances entering the U.S. commercial market. Of the approximately 2000 chemicals that are evaluated for their potential hazard each year by the EPA, the vast majority are assessed in the absence of experimental toxicity data (17-19). The criteria used in the hazard assessments represent the current practices, policies, and precedents of the EPA that have been developed through over 20 years of regulating new chemical substances submitted under Section 5 of the Toxic Substances Control Act (TSCA). Assessing the predictive accuracy of the models against these criteria provides important insight into the practical aspects of defining applicability within a defined regulatory context. We believe this work is the first comparative assessment of the predictive accuracy of different continuous models employing both statistical and regulatory criteria. Our results indicate that determining a model's predictive power and its potential for regulatory acceptance is ultimately dependent on the specific criteria used in the assessment process. Rat oral lethality estimates from TOPKAT and MCASE were also evaluated against a data set of experimental values obtained from HPV submissions.
Unlike the fish toxicity values, assessing the predictive accuracy of this endpoint is not amenable to standard statistical correlation given that current testing procedures limit the maximum dose given to test animals if lethality is not observed. The practical aspects of assessing the regulatory acceptability of predictive methods for rat oral lethality are addressed in this work using a classification approach based on criteria used in established decision-making frameworks. The fourth Setubal principle indicates that QSAR models should be accompanied by a definition of the domain of applicability. The assumption is that reliable estimates are only provided for chemicals within the domain of a model. A model’s domain arises from the training set used in its development (20), although no guidance on how to accomplish the task of defining a domain has yet been presented. In this work, we compare the results of four ready biodegradation models using an independent validation set of 370 structurally diverse chemicals. These results combined with the high degree of overlap in the training sets used to develop these biodegradation models provides a unique opportunity to address the practical aspects of defining a model’s domain of applicability.

Methods

Data Collection under the HPV Challenge Program. Experimentally measured values for toxicity endpoints were obtained from the Robust Summaries and Test Plans submitted to the EPA under the HPV Challenge Program through December 31, 2002 (21). The Challenge Program is a voluntary initiative aimed at making uniform health and environmental screening information on HPV chemicals publicly available. Experimental data contained in HPV submissions are provided to characterize a chemical using the Organization for Economic Cooperation and Development (OECD) Screening Information Data Set (SIDS) endpoints (22). Submissions undergo a two-tier review process by the Agency to determine if the published or unpublished data meet a minimum standard of acceptability for the
program. The Tier I screening is used to identify the potential fitness of the data, while a Tier II analysis determines the pertinence of the data in describing the endpoint being measured. Extensive guidance on both the preparation and the review of the Test Plans and Robust Summaries is provided on the EPA’s website (21). Data collection was limited to those chemicals (and chemical classes) that had undergone the complete review process by the Agency. Test results included data obtained using either EPA or OECD methods as well as nonstandardized studies that, upon expert review, were determined to be of sufficient quality such that further testing would not be required for a given endpoint. Experimentally measured data were collected only from those studies reported to be adequate to characterize the endpoint of interest as published in EPA comments for each HPV submission (23). Studies reported in the HPV submissions were also subjected to further review before being included in this investigation. Each experimental value was reviewed by experts in the field specifically for their adequacy to evaluate the results obtained by predictive models. This included a determination of the chemical identity of each substance. Given that QSAR models perform their estimates on a discrete chemical structure, each experimental study was reviewed to establish that it was conducted on a single chemical substance (and not on a representative component of a mixture) and that the test material was of sufficient purity to eliminate confounding factors resulting from the presence of other substances. Therefore, data for polymers and mixtures were not included in this investigation. As a result, a number of studies found to be adequate for determining the testing needs of the commercial HPV substances were not adequate for the purposes of evaluating QSARs. 
Although data were not collected for inorganic compounds, as none of the QSAR methods were developed for this class of chemicals, data were collected for organometallics (excluding coordination compounds).

Collection of Experimental Toxicity Data. Chemical Abstracts Service (CAS) Registry numbers and experimentally measured toxicity values were abstracted from the Robust Summaries and Test Plans submitted through December 31, 2002 (21). Fish 96-h LC50 values in mg/L were collected for studies on fathead minnows (Pimephales promelas), the species-specific endpoint estimated by the QSAR models. A total of 90 experimental values were collected. When more than one experimental study was available for Pimephales promelas, the average value was used; for the six chemicals that had more than one experimental value, there was no significant difference when the geometric mean was used. One chemical, an isocyanate, was removed from the assessment because it was expected to hydrolyze significantly during the test period. A final data set of 32 discrete organic chemicals was derived after removing metal-containing compounds such as salts and complex mixtures. Salts were removed because some of the models will not provide estimates for these materials. The 32 chemicals span a broad range of substances covering 23 different HPV chemical categories (Table 1). Rat oral LD50 values were collected for different strains of rats, and a total of 132 studies were identified. When more than one experimental study was available, the average LD50 value was used for the chemical. Many of the LD50 values obtained from the summaries were greater than the limit dose for the experimental study. Normally, for nontoxic chemicals, values were only reported up to the limit dose of 2000 mg/kg. However, some older studies used a value of 5000 mg/kg (24). Therefore, many of the chemicals did not have single quantitative values for the endpoint and were simply reported as being greater than the highest dose tested.
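One way to carry limit-dose results through a data set is to store each value together with a censoring flag, so that a report of ">2000" is never mistaken for a measured 2000 mg/kg. The sketch below is illustrative only. The names `Measurement`, `parse_measurement`, and `summarize` are hypothetical, and the rule applied when any study is censored is an assumption, not a procedure stated in this work.

```python
from dataclasses import dataclass
from statistics import geometric_mean, mean
from typing import List


@dataclass(frozen=True)
class Measurement:
    value: float    # mg/kg for rat LD50, mg/L for fish LC50
    censored: bool  # True when reported as "greater than" the highest dose


def parse_measurement(text: str) -> Measurement:
    """Parse strings such as '1250' or '>2000' into a Measurement."""
    text = text.strip()
    if text.startswith(">"):
        return Measurement(float(text[1:]), censored=True)
    return Measurement(float(text), censored=False)


def summarize(studies: List[str]) -> Measurement:
    """Collapse replicate studies for one chemical into a single entry.

    Assumption (not from the paper): if any study is censored, the chemical
    is reported as greater than the highest censored dose; otherwise the
    arithmetic mean of the measured values is used.
    """
    parsed = [parse_measurement(s) for s in studies]
    limits = [m.value for m in parsed if m.censored]
    if limits:
        return Measurement(max(limits), censored=True)
    return Measurement(mean(m.value for m in parsed), censored=False)


# Made-up replicate values, for illustration only.
print(summarize(["1200", "1800"]))
print(summarize([">2000", ">5000"]))
# Arithmetic versus geometric mean for two hypothetical fish LC50 values.
print(mean([4.0, 9.0]), geometric_mean([4.0, 9.0]))
```

Keeping the flag separate from the number makes it explicit which entries can enter a correlation analysis and which can only be classified.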

TABLE 1. HPV Chemical Categories with Experimental Fish LC50 Data and Rat Oral LD50 Data

Columns: HPV chemical or category; no. of chemicals with fish LC50 data; no. of chemicals with rat oral LD50 data

HPV chemical or category:
1,2,3-trinitroglycerin
2-hydroxy-4-n-octoxybenzophenone
3-ethoxypropionic acid ethyl ester
acetoacet-o-anisidide
alkylphenols
anethole
aromatic terpene hydrocarbons
benzothiazole-based thiazoles
C6-C10 aliphatic aldehydes and carboxylic acids
cinnamyl derivatives
cyclohexanol
dicamba and acifluorfen intermediates
dicarboxylic acids
dinitriles
diethylene glycol dibenzoate
diisopropylbenzenes
dipropylene glycol dibenzoate
fatty nitrogen-derived amides
hindered phenols
isodecyl/phenyl phosphate
methanol
methyl isoamyl ketone
methyl n-amyl ketone
m-nitrotoluene
monoterpene hydrocarbons
neoacids C5-C28
N-methyl acrylamides
phosphoric acid derivatives
phthalate esters
substituted diphenylamines
substituted p-phenylenediamines
sulfenamide accelerators
terpenoid tertiary alcohols and related esters
thiodipropionates
tricresyl phosphate
trimellitates
trixylenyl phosphate

Counts (order as extracted; row and column alignment not recoverable):
1 1 2 1
1 1 1 1 5
2 3
2 3 4
1 1
4 1 2
1 1 1 1 1 1 1 1 2 1 1 1 5 1 1
1 2 1 3 5 4 1 1 1 1 2 1 3 2 3 5 6 2 1 2 1

These chemicals were included in the data set, but this prevented statistical analysis of the coefficient of determination (r²) values for predictive accuracy as discussed later. After removing salts, complexes, and chemicals with representative structures, a final data set of 73 discrete organic chemicals covering 32 HPV chemical categories was derived (Table 1). The rat lethality and fish acute data sets are available from us upon request; an appropriate clearance is required for access to the biodegradation data set because it contains TSCA confidential business information.

Ready Biodegradation Data. Experimental ready biodegradation data were collected from approximately 16,000 Premanufacture Notice (PMN) submissions received by the EPA from fiscal year (FY) 1995 through FY2002 (25). After removing mixtures, complexes, and chemicals containing salts, a resultant data set of 370 structurally diverse PMN substances was derived. The PMN data set contained 94 ready biodegradable (RB) and 276 not ready biodegradable (NRB) substances with at least one test result from a ready biodegradability test (25). The PMN biodegradation database was restricted to structures with molecular weight [...] 0.5 were considered as RB, and those of [...] 0.7 were interpreted as RB, and those of [...] 50% BOD was considered RB (25).

MCASE v.MC7-12-2002 (Multicase Inc., Beachwood, OH) - MCASE does not contain a module for predicting ready biodegradability. The MCASE program has a feature to automatically build an internal model when the system is provided with a data set of experimental values and chemical structures (SMILES notation) for an endpoint (36). Using the model building capabilities of MCASE, a new ready biodegradation model was built with the training set used to develop the EPISuite MITI model (39). MCASE does not accept organometallics, and, therefore, 16 chemicals from the BIOWIN MITI training set could not be used to build the new MCASE model. Once the model was built, predictions on the PMN data set were obtained and chemicals with a resulting estimate of [...] 0.5 were counted as RB. A similar model created with MCASE using a training set of 200 chemicals that contain at least one benzene ring has been reported (40).
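The cutoff rule that turns a model's numerical output into an RB or NRB call can be sketched as follows. The comparison operator is not legible in this copy of the text, so treating estimates of 0.5 or above as RB is an assumption. The helper names `call_rb` and `class_accuracy` are hypothetical, and the scores in the usage lines are made up.

```python
def call_rb(score, cutoff=0.5):
    """Return True (RB) when the model score reaches the cutoff.

    Assumption: the cutoff is inclusive (score >= cutoff).
    """
    return score >= cutoff


def class_accuracy(scores, observed_rb, cutoff=0.5):
    """Fraction of correct calls, reported separately for RB and NRB chemicals.

    scores      -- model outputs, one per chemical
    observed_rb -- experimental outcome, True for RB and False for NRB
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for score, observed in zip(scores, observed_rb):
        totals[observed] += 1
        if call_rb(score, cutoff) == observed:
            hits[observed] += 1
    return {
        "RB": hits[True] / totals[True] if totals[True] else None,
        "NRB": hits[False] / totals[False] if totals[False] else None,
    }


# Made-up scores for four chemicals, two RB and two NRB.
print(class_accuracy([0.9, 0.5, 0.2, 0.7], [True, True, False, False]))
```

Reporting the two classes separately matters for a data set like this one: with 94 RB and 276 NRB substances, a model that called every chemical NRB would still be right for 276 of 370 chemicals, or about 75%.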

Results

Acute Toxicity Results. External validation of QSARs requires that the model estimates be compared to measured toxicity values for chemicals that were not included in the training set (10). Therefore, chemicals from the validation set that were identified to be in the training sets for the individual
models were removed before an evaluation of the predictive accuracy was completed. For some of the chemical substances in the experimental data sets, the models were not capable of providing an estimate. This generally occurred when the computer model could not recognize an unusual functional group. These chemicals were not included in the evaluation of each model’s predictive accuracy, although an important aspect of this evaluation was to determine what percentage of the chemicals could be run through each estimation method. Chemicals flagged by MCASE as containing either a structural component that could not be recognized or a fragment that was not statistically significant were counted as chemicals that the model could not provide a reliable estimate for and were not included in the subsequent analysis of the predictive accuracy. Similarly, for TOPKAT, chemicals that were flagged as being outside of the optimal prediction space (OPS) of the model were removed before analysis of the predictive accuracy was performed. Both TOPKAT and OASIS provided a message to the user if the program was not capable of performing an estimate on a given chemical (or functional group), and these compounds were removed from the analysis of those models. The TOPKAT OPS is a mathematical description that determines if a chemical falls within its predictive space. The ECOSAR results were scrutinized to identify those chemicals that did not have a class-specific 96-h equation for fish acute toxicity. Given that the toxicity estimation methods were developed using different training sets, and that some chemicals produced a result in one or more models and were not accepted by others, the numbers of chemicals ultimately used to evaluate the predictive accuracy of each program were not the same (Figures 1 and 2). Fish LC50 Estimates. Two separate types of assessments were done to evaluate the predictive accuracy for each of the four individual QSAR models. 
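The screening steps described above, in which training-set chemicals and chemicals a model cannot handle are set aside before accuracy is scored, can be sketched as below. The record fields (`cas`, `estimate`, `flagged`) and the function name `screen` are hypothetical, the programs themselves report these conditions through their own messages, and using the external chemicals as the denominator for coverage is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    cas: str                   # CAS Registry number of the chemical
    estimate: Optional[float]  # None when the program returned no estimate
    flagged: bool = False      # e.g. outside the TOPKAT OPS, or an MCASE warning


def screen(predictions, training_set_cas):
    """Split one model's output into scorable predictions and a coverage figure.

    Chemicals found in the model's training set are dropped first, since they
    cannot serve as external validation. Coverage is the share of the
    remaining chemicals that received an unflagged estimate.
    """
    external = [p for p in predictions if p.cas not in training_set_cas]
    usable = [p for p in external if p.estimate is not None and not p.flagged]
    coverage = len(usable) / len(external) if external else 0.0
    return usable, coverage


# Placeholder identifiers and values, for illustration only.
preds = [
    Prediction("chem-1", 12.0),
    Prediction("chem-2", None),
    Prediction("chem-3", 5.0, flagged=True),
    Prediction("chem-4", 80.0),
]
usable, coverage = screen(preds, training_set_cas={"chem-4"})
print([p.cas for p in usable], round(coverage, 2))
```

Because each program has its own training set and its own rejection rules, running the same screen per model naturally yields a different number of scorable chemicals for each one.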
The first approach was based solely on a statistical analysis of the data set comparing experimental values for the chemicals to the estimates provided by the programs. See Figure 1 for r² values for ECOSAR, TOPKAT, OASIS, and MCASE. Both ECOSAR and OASIS had r² values of 0.74, although OASIS was only able to provide estimates for fewer than one-half of the chemicals in the data set. ECOSAR determined estimates for all but two chemicals. For one of these chemicals, the program did not contain an appropriate class-specific 96-h SAR, and the prediction for the other was determined to be well above the water solubility of the material. TOPKAT and MCASE did not perform as well, with r² values of 0.14 and 0.53, respectively.

A second assessment of the predictive accuracy was then performed using a classification approach by assigning high, moderate, and low concern levels to the quantitative estimates using criteria set forth by the EPA in the New Chemical Program for acute aquatic toxicity (41). A low concern level is identified for chemicals with an LC50 value >100 mg/L, a moderate concern level for chemicals with an LC50 value between 1 and 100 mg/L, and a high concern level for those chemicals with an LC50 value [...]

[...] 2000 mg/kg), a statistical assessment of the models could not be undertaken. Therefore, the predictive accuracy was subsequently assessed using only a classification approach. The estimates were assigned to one of five categories that the EPA would use in the New Chemical Program to assign health hazard concern levels, and the number of chemicals correctly classified using these [...] 2000 mg/kg are assigned a level 1 (low) concern, 500-2000 mg/kg are level 1-2 (low to moderate) concern, 50-500 mg/kg are level 2 (moderate) concern, 15-50 mg/kg are level 2-3 (moderate to high) concern, and chemicals with values [...] 2000 or >5000 mg/kg). However, the predictive models do not truncate their estimates at this limit dose, and, therefore, assessing the accuracy of the predicted value using a purely statistical analysis would be, at best, problematic. That is, how does one obtain a meaningful correlation coefficient when plotting a measured experimental value of >2000 mg/kg against an estimated value of 5900 mg/kg? Similarly, it is not possible to determine if an estimated value is within a factor of 2, or a factor of 10, of an experimental value reported as >2000 mg/kg because the true value was not determined experimentally. Therefore, assessments of predictions for this endpoint based on a statistical analysis cannot adequately account for experimental values in the low toxicity range.

In contrast, the predictive accuracy can be determined effectively using the classification approach, as demonstrated using the rat oral LD50 results. Table 3 provides the appropriate hazard categories obtained from MCASE and TOPKAT using the five-tier criteria used in the EPA's New Chemical Program (42). Overall, TOPKAT returned a correct hazard concern for 67% of the chemicals, while MCASE returned a correct call for 70% of the materials. However, it should be noted that the HPV data set for rat oral LD50 values is heavily skewed toward low concern chemicals, which both models predicted correctly with a high degree of accuracy (82% and 100% correct, respectively). The models did not perform as well on the moderate and high hazard chemicals. TOPKAT predicted 33% of the six moderate hazard materials correctly, but did not provide a correct result for the one high hazard compound in the data set. MCASE returned an incorrect prediction for all 12 moderate and high hazard compounds. Further inspection reveals that MCASE predicted that all 43 compounds assessed would be inactive (low concern). The high degree of false negatives for moderate and high concern HPV chemicals (TOPKAT, 72%; MCASE, 100%) suggests that these programs might not be useful in a regulatory decision-making process designed to be protective of human health. As work on the collection of additional rat LD50 values from HPV submissions continues, a less biased data set will likely result. Nevertheless, the HPV data set used in this investigation comprises materials that possess a wide variety of chemical structures and functional groups (Table 1) representing 32 different HPV categories.

FIGURE 2. Model estimates for the rat LD50 endpoint for the HPV data set.

FIGURE 3. Graph of the ECOSAR model results for the fish LC50 endpoint with new chemical program criteria for high, moderate, and low hazard concerns.

FIGURE 4. Graph of the TOPKAT model results for the fish LC50 endpoint with new chemical program criteria for high, moderate, and low hazard concerns.

FIGURE 5. Graph of the OASIS model results for the fish LC50 endpoint with new chemical program criteria for high, moderate, and low hazard concerns.

FIGURE 6. Graph of the MCASE model results for the fish LC50 endpoint with new chemical program criteria for high, moderate, and low hazard concerns.

TABLE 2. Model Accuracy for the Fish LC50 Endpoint Using the Classification Approach on the HPV Data Set
Rows: ECOSAR (r² = 0.74); TOPKAT (r² = 0.14); OASIS (r² = 0.74); MCASE (r² = 0.53)
Columns (correct predictions): low concern, >100 mg/L; moderate concern, 1-100 mg/L; high concern [...]
[...]

TABLE 3. [...]
Columns (correct predictions): low concern, >2000 mg/kg; 500-2000 mg/kg; moderate concern, 50-500 mg/kg; 15-50 mg/kg; high concern [...]
low concern: TOPKAT 18/22 (82%); MCASE 30/30 (100%)
moderate concern: TOPKAT 2/6 (33%); MCASE 0/11
high concern: TOPKAT 0/1; MCASE 0/1
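The classification approach for the rat oral LD50 endpoint can be sketched as a lookup over concern bands. This is an illustrative sketch, not the procedure used by the EPA or by the authors. The four upper bands follow the values given in the text. The lowest band (below 15 mg/kg, labeled high concern), the handling of values that fall exactly on a boundary, and the definition of a false negative as a hazardous chemical predicted to be low concern are all assumptions. The function names are hypothetical.

```python
# Lower bounds in mg/kg, least toxic band first. The final band and its
# label are assumptions, because those words are missing from this copy.
RAT_BANDS = [
    (2000.0, "1 (low)"),
    (500.0, "1-2 (low to moderate)"),
    (50.0, "2 (moderate)"),
    (15.0, "2-3 (moderate to high)"),
    (0.0, "3 (high)"),
]
LOW = RAT_BANDS[0][1]


def concern_level(ld50, censored=False):
    """Map an LD50 (mg/kg) to a concern level.

    A censored value, reported as "greater than" a limit dose, is placed in
    the band that starts at that dose. An uncensored value must strictly
    exceed a band's lower bound to fall in it (boundary rule is an assumption).
    """
    for lower, label in RAT_BANDS:
        if ld50 > lower or (censored and ld50 >= lower):
            return label
    return RAT_BANDS[-1][1]


def false_negative_rate(pairs):
    """Share of non-low-concern chemicals that a model predicted as low concern.

    pairs -- iterable of (experimental_level, predicted_level) labels
    """
    hazardous = [(obs, pred) for obs, pred in pairs if obs != LOW]
    if not hazardous:
        return None
    missed = sum(1 for _, pred in hazardous if pred == LOW)
    return missed / len(hazardous)


# The example from the text: measured ">2000 mg/kg" versus an estimate of 5900 mg/kg.
print(concern_level(2000, censored=True), "|", concern_level(5900))
```

Under these assumptions the measured and estimated values in the example land in the same band, so the prediction counts as correct even though no correlation coefficient can be computed for the pair.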

The OECD has also published a set of criteria for the rat LD50 endpoint (46) that are similar to those used by the EPA. The OECD used a five-tier scheme, [...]

[...] 99% (the training set used in the development of TOPKAT has not been published). Therefore, for all practical purposes, the training sets for these two programs are identical. The BIOWIN MITI linear and nonlinear models were developed on a randomly selected subset of 564 chemicals from the MITI data set, and, after 16 metal-containing compounds were removed, this subset was also used to train the new model within MCASE. For the evaluation of discrete organic chemicals, therefore, the BIOWIN MITI and the new MCASE ready biodegradation models were built on the exact same training set. We believe there are no other instances where different predictive models for physical/chemical properties, environmental fate parameters, or toxicity endpoints were built on training sets that overlap to such a high degree. There are numerous instances where different class-specific models have been developed on the same training set, for example, the genotoxicity of aromatic amines (50), although these models are typically not distributed in a computerized form. This high degree of overlap in the training sets of these four ready biodegradation models provides a unique opportunity to address the model domain issue set forth in the Setubal principles. Given that the BIOWIN MITI and MCASE models were built on the exact same training set (for discrete organics) and that the domain is based on the training set, it follows that these models should have the exact same domain. The results provided in Table 4 indicate that the predictive accuracies determined for these two models vary widely. Additionally, the two BIOWIN MITI models provided predictions for all 370 chemicals (100%) in the PMN data set, while the new MCASE model provided conclusive results for only 56%.
It has been argued that inconclusive model results occur when the structure or the functional groups present in the test molecule are not represented in compounds of the training set (domain) of the model. This argument is clearly not sufficient to explain the above results. This suggests the method a model employs for its predictions is as important as the training set in determining its domain. Similar observations were made for the TOPKAT and CATABOL models, which were developed on essentially identical training sets. CATABOL provided results for 90% of the PMN chemicals, while TOPKAT only provided results for 31%. The primary reason that TOPKAT provided so few results is that 257 chemicals were outside of its OPS, a mathematical surrogate for the domain of the model (TOPKAT User's Guide). As with the previous results for BIOWIN MITI and the new MCASE model, the differences in the apparent domain of TOPKAT and CATABOL cannot be explained by differences in their training sets. It appears that the domain of a model is linked in some way to the methodology it employs. Of the 257 chemicals that TOPKAT flagged as being outside of its OPS (domain), the BIOWIN 5 model predicted 213 (83%) correctly. Given that the 564 chemicals in the training set used to develop the BIOWIN 5 model are a subset of the 894 chemicals used to train TOPKAT, both models were trained on compounds with similar functional groups and structural architectures. This result is even more striking if one considers that the new MCASE model and the BIOWIN 5 model were trained on the exact same set of discrete organic chemicals. Of the 165 chemicals for which the new MCASE model could not provide conclusive estimates, the MITI linear method provided correct predictions for 127 (77%). It is not clear how the inner workings of a general model can be used to define its domain of applicability. Of the four ready biodegradation models evaluated in this work, the two BIOWIN MITI models provided results for the highest percentage of chemicals, produced the most consistent results for RB and NRB chemicals, and demonstrated the highest overall predictive accuracy for the PMN data set.
Yet, the BIOWIN MITI models can be considered the simplest of the four models investigated. The 46 fragments that they use represent only a tiny fraction of those that could be defined for the 564 chemicals in their training set. Different techniques would be required to define the domain of a model built primarily on molecular indices (TOPKAT) as compared to one built on expert elucidation of transformation pathways (CATABOL). This discussion suggests that defining a model's domain to determine the regulatory acceptability of QSARs (fourth Setubal principle) may be a highly resource-intensive exercise. Rigorously defining a model's domain may also be a continual process as new chemicals are introduced into the marketplace from a seemingly inexhaustible variety of known organic compounds containing functional groups or structural features not previously considered in the domain analysis.
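The cross-model comparison used in this discussion, in which chemicals that one model declines to predict are scored with a second model, reduces to a simple calculation. The sketch below is illustrative; the function and argument names are hypothetical, and counting chemicals that the second model also cannot predict as incorrect is an assumption. The final lines only re-derive the percentages quoted in the text from the stated counts.

```python
def accuracy_outside_domain(declined_by_a, calls_by_b, observed):
    """Share of chemicals declined by model A that model B classifies correctly.

    declined_by_a -- set of chemical identifiers that model A flagged or skipped
    calls_by_b    -- dict mapping identifier to model B's call (True for RB)
    observed      -- dict mapping identifier to the experimental result
    Chemicals missing from calls_by_b are counted as incorrect (assumption).
    """
    if not declined_by_a:
        return None
    correct = sum(
        1
        for chem in declined_by_a
        if chem in calls_by_b and calls_by_b[chem] == observed[chem]
    )
    return correct / len(declined_by_a)


# Counts quoted in the text.
print(round(100 * 213 / 257))  # BIOWIN 5 on chemicals outside the TOPKAT OPS
print(round(100 * 127 / 165))  # MITI linear on MCASE-inconclusive chemicals
```

If a training set alone fixed a model's domain, a second model built on the same or a smaller training set would be expected to do poorly on the declined chemicals; the high values reported here are what point to methodology as part of the domain.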

Acknowledgments

We gratefully acknowledge the assistance of Joanne White in the preparation of this manuscript.

Literature Cited

(1) Workshop on Regulatory Use of (Q)SARs for Human Health and Environmental Endpoints; European Centre for Ecotoxicology and Toxicology of Chemicals; Setubal, Portugal, March, 2002. Available online at: http://www.cefic-lri.org/files/EventDocs/ICCA_LRI_QSARSWS2002_171002_Web.pdf.
(2) Jaworska, J. S.; Comber, M.; Auer, C.; Van Leeuwen, C. J. Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ. Health Perspect. 2003, 111, 1358-1360. Available online at: http://www.jrc.cec.eu.int/more_information/download/qsarminimonogr.pdf.
(3) Cronin, M. T. D.; Jaworska, J. S.; Walker, J. D.; Comber, M. H. I.; Watts, C. D.; Worth, A. P. Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Perspect. 2003, 111, 1391-1401. Available online at: http://ehp.niehs.nih.gov/members/2003/5760/5760.pdf.
(4) Cronin, M. T. D.; Walker, J. D.; Jaworska, J. S.; Comber, M. H. I.; Watts, C. D.; Worth, A. P. Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Perspect. 2003, 111, 1376-1390. Available online at: http://ehp.niehs.nih.gov/members/2003/5759/5759.pdf.
(5) Anastas, P. T.; Williamson, T. C. Green Chemistry: An Overview. In Green Chemistry; Anastas, P. T., Williamson, T. C., Eds.; Oxford University Press: New York, 1998; pp 1-26.
(6) Shonnard, D. R. An Introduction to Environmental Issues. In Green Engineering; Allen, D. T., Shonnard, D. R., Eds.; Prentice Hall: Upper Saddle River, NJ, 2002; pp 3-33.
(7) USEPA (Environmental Protection Agency). New Chemicals Program - Sustainable Futures; Washington, DC, 2004. Available online at: http://www.epa.gov/opptintr/newchems/sustainablefutures.htm.
(8) Walker, J. D. QSARS promote more efficient use of chemical testing resources-carpe diem. Environ. Toxicol. Chem. 2003, 22, 1651-1652. Available online at: http://entc.allenpress.com/pdfserv/10.1897%2F03-189.
(9) Eriksson, L.; Jaworska, J. S.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ. Health Perspect. 2003, 111, 1361-1375. Available online at: http://ehp.niehs.nih.gov/members/2003/5758/5758.pdf.
(10) Perkins, R.; Fang, H.; Tong, W.; Welsh, W. J. Quantitative structure-activity relationship methods: Perspectives on drug discovery and toxicology. Environ. Toxicol. Chem. 2003, 22, 1666-1679.
(11) USEPA (Environmental Protection Agency). High Production Volume (HPV) Challenge Program; Washington, DC, 2004. Available online at: http://www.epa.gov/chemrtk/volchall.htm.
(12) TOPKAT v.6.1; v.3.2 module; Accelrys Inc.: San Diego, CA, 2004. Product information available online at: http://www.accelrys.com/products/topkat/.
(13) MCASE v.MC7-12-2002; Multicase Inc.: Beachwood, OH, 2004. Product information available online at: http://www.multicase.com/products/prod09.htm.
(14) OASIS, packaged with CATABOL 12/18/2001; Ovanes Mekenyan, Laboratory of Mathematical Chemistry; Bourgas, Bulgaria, 2004. Product information available online at: http://omega.btu.bg/showsoft.php?item=CATABOL.
(15) ECOSAR. Ecological Structure Activity Relationships, v.0.99g; U.S. Environmental Protection Agency; Washington, DC, January, 2004. Product information available online at: http://www.epa.gov/opptintr/newchems/21ecosar.htm.
(16) USEPA (Environmental Protection Agency). New Chemicals Program; Washington, DC, 2004. Available online at: http://www.epa.gov/oppt/newchems/index.htm.
(17) Boethling, R. S.; Nabholz, J. V. Environmental Assessment of Polymers under the U.S. Toxic Substances Control Act. In Ecological Assessment of Polymers: Strategies for Product Stewardship and Regulatory Programs; Hamilton, J. D., Sutcliffe, R., Eds.; Van Nostrand Reinhold: New York, 1996; pp 187-234.
(18) Nabholz, J. V.; Miller, P.; Zeeman, M. Environmental Risk Assessment of New Chemicals under the Toxic Substances Control Act (TSCA) Section Five. In Environmental Toxicology and Risk Assessment; Landis, W. G., Hughes, J. S., Lewis, M. A., Eds.; ASTM STP 1179, American Society for Testing and Materials: Philadelphia, PA, 1993; pp 571-590.
(19) Wagner, P. M.; Nabholz, J. V.; Kent, R. J. The new chemicals process at the Environmental Protection Agency (EPA): Structure-activity relationships for hazard identification and risk assessment. Toxicol. Lett. 1995, 79, 67-73.
(20) Walker, J. D.; Carlsen, L.; Jaworska, J. Improving opportunities for regulatory acceptance of QSARs: The importance of model domain, uncertainty, validity and predictability. QSAR Comb. Sci. 2003, 22, 346-350.
(21) USEPA (Environmental Protection Agency). High Production Volume (HPV) Challenge Program, Robust Summaries and Test Plans; Washington, DC, 2004. Available online at: http://www.epa.gov/chemrtk/viewsrch.htm.
(22) OECD (Organization for Economic Cooperation and Development). Screening Information Data Set Manual of the OECD Programme on the Cooperative Investigation of High Production Volume Chemicals, 3rd revision; Updated SIDS Manual; OECD: Paris, France, July, 1997. Additional information available online at: http://www.oecd.org/document/21/0,2340,en_2649_201185_1939669_1_1_1_1,00.html.
(23) USEPA (Environmental Protection Agency). High Production Volume (HPV) Challenge Program, Guidance Documents; Washington, DC, 2004. Available online at: http://www.epa.gov/chemrtk/guidocs.htm.
(24) USEPA (Environmental Protection Agency). Health Effect Test Guidelines: Acute Oral Toxicity; OPPTS 870.1100; Washington, DC, 1998. Available online at: http://www.epa.gov/opptsfrs/OPPTS_Harmonized/870_Health_Effects_Test_Guidelines/Series/870-1100.pdf.
(25) Boethling, R. S.; Lynch, D. G.; Jaworska, J. S.; Tunkel, J. L.; Thom, G. C.; Webb, S. Using BIOWIN, Bayes, and Batteries to predict ready biodegradability. Environ. Toxicol. Chem. 2004, 23, 911-920.
(26) OECD (Organization for Economic Cooperation and Development). OECD Guidelines for the Testing of Chemicals. Guideline 301: Ready Biodegradability; Updated guideline; OECD: Paris, France, 1992. Available online at: http://www.oecd.org/dataoecd/17/16/1948209.pdf.
(27) Bealing, D. Thoughts on Biodegradability Testing; Document ISO/TC 147/SC 5/WG 4, N 311, International Organization for Standardization: Berlin, Germany, 2002.
(28) Painter, H. A. Detailed Review Paper on Biodegradability Testing; OECD Series on the Test Guidelines Programme No. 2; Environment Monograph No. 98; Environment Directorate, Organization for Economic Cooperation and Development: Paris, France, 1995. Available online at: http://www.olis.oecd.org/olis/1995doc.nsf/LinkTo/ocde-gd(95)43.
(29) Painter, H. A.; King, E. F. A respirometric method for the assessment of ready biodegradability: results of a ring test. Ecotoxicol. Environ. Saf. 1985, 9, 6-16.
(30) Painter, H. A.; Reynolds, P.; Comber, S. Application of the headspace CO2 method (ISO 14 593) to the assessment of the ultimate biodegradability of surfactants: results of a calibration exercise. Chemosphere 2003, 50, 29-38.
(31) Weininger, D. SMILES 1. Introduction and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
(32) BIOWIN v.4.01; Syracuse Research Corp.: Syracuse, NY, 2004. Available online at: http://www.syrres.com/esc/biowin.htm.
(33) CATABOL 9/9/2002; Ovanes Mekenyan, Laboratory of Mathematical Chemistry: Bourgas, Bulgaria, 2004. Product information available online at: http://omega.btu.bg/showsoft.php?item=CATABOL.
(34) Dimitrov, S. D.; Mekenyan, O. G.; Walker, J. D. Nonlinear modeling of bioconcentration using partition coefficients for narcotic chemicals. SAR QSAR Environ. Res. 2002, 13, 177-184.
(35) Jaworska, J.; Dimitrov, S.; Nikolova, N.; Mekenyan, O. Probabilistic assessment of biodegradability based on metabolic pathways: CATABOL System. SAR QSAR Environ. Res. 2002, 13, 307-323.
(36) Klopman, G. J. The multiCASE program II. Baseline activity identification algorithm (BAIA). J. Chem. Inf. Comput. Sci. 1998, 38, 78-87.
(37) Moore, D. R. J.; Breton, R.; MacDonald, D. A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ. Toxicol. Chem. 2003, 22, 1799-1809.
(38) Rorije, E.; Loonen, H.; Muller, M.; Klopman, G.; Peijnenburg, W. J. G. M. Evaluation and application of models for the prediction of ready biodegradability in the MITI-I test. Chemosphere 1999, 38, 1409-1417. Available online at: http://www.terra.es/personal/emiel.rorije/pdf-files/1998-2.pdf.
(39) Tunkel, J.; Howard, P.; Boethling, R.; Stiteler, W.; Loonen, H. Predicting ready biodegradability in the MITI test. Environ. Toxicol. Chem. 2000, 19, 2478-2485.
(40) Klopman, G.; Tu, M. Structure-biodegradability study and computer-automated prediction of aerobic biodegradation of chemicals. Environ. Toxicol. Chem. 1997, 16, 1829-1835.
(41) USEPA (Environmental Protection Agency). Pollution Prevention (P2) Framework; EPA-748-B-03-001; Washington, DC, 2003. Available online at: http://www.epa.gov/p2/p2policy/framework.htm.
(42) Jones, R. Risk Assessment Division, Office of Pollution Prevention and Toxics, USEPA; April 22, 2003; personal communication.
(43) Clements, R. G.; Nabholz, J. V.; Johnson, D. W.; Zeeman, M. The Use and Application of QSARs in the Office of Toxic Substances for Ecological Hazard Assessment of New Chemicals. In Environmental Toxicology and Risk Assessment; Landis, W. G., Hughes, J. S., Lewis, M. A., Eds.; ASTM STP 1179, American Society for Testing and Materials: Philadelphia, PA, 1993; pp 56-64.
(44) OECD (Organization for Economic Cooperation and Development). Guidance Document on the Use of the Harmonized System for the Classification of Chemicals which are Hazardous for the Aquatic Environment. Guideline; OECD: Paris, France, July, 2001. Available online at: http://www.olis.oecd.org/olis/2001doc.nsf/43bb6130e5e86e5fc12569fa005d004c/c1256985004c66e3c1256a9200400c84/$FILE/JT00111073.PDF.
(45) Hulzebos, E. M.; Posthumus, R. (Q)SARs: gatekeepers against risk on chemicals? SAR QSAR Environ. Res. 2003, 14, 285-316.
(46) OECD (Organization for Economic Cooperation and Development). Harmonized Integrated Classification System for the Human Health and Environmental Hazards of Chemical Substances and Mixtures. Guideline; OECD: Paris, France, 2001. Available online at: http://puck.sourceoecd.org/vl=1799611/cl=138/nw=1/rpsv/ij/oecdjournals/1607310x/v1n5/s30/p1.
(47) Walker, J. D.; Jaworska, J.; Comber, M. H. I.; Schultz, T. W.; Dearden, J. C. Guidelines for developing and using quantitative structure activity relationships. Environ. Toxicol. Chem. 2003, 22, 1653-1665.
(48) Lindgren, F.; Nouwen, J.; Loonen, H.; Worth, A.; Hansen, B.; Karcher, W. Environmental modeling based on a structural fragments approach. Indoor+Built. Environ. 1996, 5, 334-340.
(49) Eriksson, L.; Johansson, E.; Wold, S. Quantitative Structure-Activity Relationship Model Validation. In Quantitative Structure-Activity Relationships in Environmental Sciences - 7; Chen, F., Schuurmann, G., Eds.; SETAC: Pensacola, FL, 1997; pp 381-397.
(50) Cash, G. G. Prediction of the genotoxicity of aromatic and heteroaromatic amines using electrotopological state indices. Mutat. Res. 2001, 491, 31-37.

Received for review May 26, 2004. Revised manuscript received January 4, 2005. Accepted January 6, 2005. ES049220T
