Chem. Res. Toxicol. 2004, 17, 545-554
Assessment and Modeling of the Toxicity of Organic Chemicals to Chlorella vulgaris: Development of a Novel Database Mark T. D. Cronin,* Tatiana I. Netzeva, John C. Dearden, Robert Edwards,† and Andrew D. P. Worgan‡ School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England Received December 4, 2003
This study reports a database of toxicity values for 91 compounds assessed in a novel, rapid, and economical 15 min algal toxicity test. The toxicity data were measured using the unicellular green alga Chlorella vulgaris in an assay that determined the disappearance of fluorescein diacetate. The chemicals tested covered a wide range of physicochemical properties and mechanisms of action. Quantitative activity-activity relationships with the toxicity of the chemicals to other species (Tetrahymena pyriformis, Vibrio fischeri, and Pimephales promelas) showed strong relationships, although some differences resulting from different protocols were established. Quantitative structure-activity relationships (QSARs) were determined using linear [multiple linear regression (MLR)] and nonlinear [k-nearest neighbors (KNN)] methods. Three descriptors, accounting for hydrophobicity, electrophilicity, and a function of molecular size corrected for the presence of heteroatoms, were found to be important to model toxicity. The predictivity of MLR was compared to KNN using leave-one-out cross-validation and the simulation of an external test set. MLR demonstrated greater stability in validation. The results of this study showed that method selection in QSAR is task-dependent and it is inappropriate to resort to more complicated but less transparent methods, unless there are clear indications (e.g., inability of MLR to deal with the data set) for the need of such methods.
Introduction To evaluate the risk to the environment of specific chemicals, information regarding toxicity is required for various trophic levels. Within the European Union (EU), toxicity data are normally required as part of a base set for primary producers (i.e., algae), invertebrates (e.g., Daphnia magna), and vertebrates (i.e., fish) (1). It is also envisaged that data from properly evaluated predictive (computational) models may also be used to provide toxicological information for regulatory purposes (2-5). These predictive models normally take the form of quantitative structure-activity relationships (QSARs). In addition to the EU, QSARs are commonly used to provide environmental information in the United States of America, for instance by the U.S. Environmental Protection Agency as part of the premanufactory notification assessment (6). In addition to their requirement for risk assessment, toxicity data are also required as the inputs for QSARs. It has long been recognized that there is a paucity of toxicity data for the development of toxicological QSARs (2, 3) and that the quality of the data plays a crucial role in the quality of the resultant model (7, 8). There have been a number of attempts to describe what is meant by the quality of toxicological data (for use in QSAR modeling) and to determine how it may be defined (7-10). * To whom correspondence should be addressed. Tel: +44(0)151 231 2066. Fax: +44(0)151 231 2170. E-mail:
[email protected]. † Current address: Hawkins & Associates, 18-20 Manchester Road, Wilmslow, Cheshire, SK9 1BG, England. ‡ Current address: Centre of Ecology and Hydrology, CEH Dorset, Winfrith Technology Centre, Winfrith Newburgh, Dorchester, Dorset DT2 8ZD, England.
However, the key requirements for toxicological data to be considered of high quality for the development of QSARs are that they should come from a well-standardized and validated assay with a clearly defined endpoint and that, ideally, they should all be measured by the same protocol, preferably in the same laboratory and by the same workers. Currently, of the three most studied trophic levels (primary producer, invertebrate, and vertebrate), relatively large databases of toxicity values have been developed for fish [e.g., the fathead minnow (11)] and invertebrates [e.g., D. magna (12)]. Fewer, if any, consistent data are available for algal endpoints, and hence, fewer QSARs are available for algal toxicity. To help address the shortfall in algal toxicity data, the authors have developed a short-term algal toxicity assay using Chlorella vulgaris (13). The test is based upon the premise that all living organisms, including algae, contain nonspecific esterases, the activity of which can be assessed by measuring the disappearance of an ester or the appearance of the product. Fluorescein diacetate (FDA) is used as the esterase substrate in this assay, and the fluorescence of the product (fluorescein) is measured spectrofluorimetrically. The measured toxicities of small sets of chemicals, together with aquatic quantitative activity-activity relationships (QAARs) and mechanism-based QSARs, have already been published (13-16). There are a large number of methods that may be used to develop QSARs, and there has been considerable debate as to the relative merit of each (17). The debate often centers on whether mechanistic transparency should
10.1021/tx0342518 CCC: $27.50 © 2004 American Chemical Society Published on Web 03/20/2004
Chem. Res. Toxicol., Vol. 17, No. 4, 2004
be sacrificed for predictive power. For regulatory purposes, it is often considered that QSARs are required that are transparent and for which an applicability domain can be easily defined (2, 3). In fact, the more mechanistically transparent the QSAR, the more reliably the applicability domain can be defined. The comparison of methods is often complicated by the use of different descriptors and/or number of compounds in the models. Further difficulties arise from the lack of an agreed set of criteria for high quality QSARs. The choice of the methods for comparison is restricted because not all available methods may be appropriate for a particular purpose, and application of standard approaches to different data sets could go against the principle of Occam’s Razor, i.e., to maintain and utilize the simplest model. The objective of this study, therefore, was to describe a novel algal toxicity database that has not previously been published in its entirety and to attempt QAAR and QSAR analyses upon it. The toxicity data were all measured in the same laboratory according to the same protocol, so they meet many of the criteria for high quality data. In addition, they provide a suitable basis upon which to explore the comparative development of linear and nonlinear QSAR methods for predicting toxicity.
Materials and Methods Chemicals Tested. A total of 91 chemicals were selected for toxicological assessment, and these are listed in Table 1. An effort was made to select diverse chemical structures incorporating narcosis, as well as other more specific mechanisms of toxic action. A number of other criteria were applied to select the chemicals: the chemicals were required to span a sufficient range of hydrophobicity to make the QSAR meaningful; there should be comparative toxicological information available for the majority of chemicals tested; and finally, they should be easily and cheaply available. All chemicals were purchased from Aldrich Chemical Co., Poole, Dorset, England, with chemical purity more than 95%. Chemicals were not repurified prior to use. SMILES strings for the 91 compounds are available, upon request, from the corresponding author. Fifteen Minute C. vulgaris Toxicity Assay. Toxicity data [log(1/EC50)] were determined in a biochemical assay utilizing the unicellular alga C. vulgaris. Algae in the logarithmic phase of their growth cycle were used. All toxicological analyses were performed in a buffer solution with a pH of 6.9 and a temperature between 25 and 30 °C. Assays were conducted following the protocol described by Worgan et al. (13) with a 15 min static design. The disappearance of FDA was accounted for by spectrofluorimetric measurement of fluorescein (the product of hydrolysis) (18) at an excitation wavelength of 465 nm and an emission wavelength of 515 nm. Range-finding experiments were performed in order to determine the highest and lowest concentrations required to produce a dose-response relationship ranging from 100% inhibition of enzyme activity to no observed toxicological effect. Blank buffer solution was utilized as a control, and the relative responses to it were used to generate the dose-response curve. The 50% effective concentration was estimated by Probit analysis using the SPSS ver. 10.0 (SPSS Inc., Chicago, IL) software. 
The average EC50 was taken from a minimum of three analyses. Toxicity Assays to Other Species. Toxicity data to Tetrahymena pyriformis [log(1/IGC50)] were obtained from literature sources (19-21). They were determined in a population growth impairment assay. Assays were conducted following the protocol described by Schultz (22) with a 40 h static design and population density measured spectrophotometrically as the endpoint. Large data sets of industrial organic chemicals tested for acute toxicity to T. pyriformis can be found in the papers of Schultz (19) and Schultz et al. (20, 21).
Toxicity data to Vibrio fischeri [log(1/EC50)] were retrieved from the literature (23). They were determined in a static assay using the luminescent bacterium V. fischeri. A detailed description of the protocol can be found in refs 24 and 25. The endpoint measured is the concentration causing a 50% decrease in bacterial bioluminescence as compared to a control. Toxicity data from 15 min assays were preferred. If a compound had more than one measured toxicity value, a mean value was calculated for the purposes of this study. When 15 min data were not available, but the compound had been tested in a 5 or 30 min assay, the toxicity value from that exposure time was used. Acute toxicity data to Pimephales promelas [log(1/LC50)] were taken from Russom et al. (11) and had been measured following the protocol described there. The assay had a flow-through design and a duration of 96 h. The endpoint was measured as a median lethal concentration.
Calculation of Physicochemical Properties and QSAR Analysis. A total of 110 molecular descriptors were calculated. A full list of descriptors is given in Table 2. Hydrophobicity was quantified by the logarithm of the 1-octanol/water partition coefficient (log Kow). The hydrophobicity values were measured or estimated (the measured value was used when available) by the ClogP for Windows version 1.0.0 software (BIOBYTE Corp., Claremont, CA). Nine quantum chemical descriptors were obtained from MOPAC93 [Stewart, J. J. P. (1993) Windows 95/98/NT/2k Adaptation and MO Indices, Fujitsu Ltd.; Kaneti, J. (1988-1994) MO-QC] using the AM1 Hamiltonian. Initially, SMILES strings were converted into three-dimensional (3D) structures by the CORINA conformation analysis software, as implemented in the TSAR version 3.3 molecular spreadsheet (Accelrys Ltd, Oxford, England). The 3D structures underwent energy minimization utilizing full geometry optimization with the AM1 Hamiltonian in the VAMP module of TSAR. The optimized structures were exported to MOPAC93 for the calculation of quantum chemical indices. The remaining 100 descriptors were calculated using TSAR and QSARis version 1.1 (SciVision-Academic Press, San Diego, CA) software. Descriptors containing more than 85 zero values (∼92%) were excluded from the analysis.
QSAR Development. QSARs were developed using multiple linear regression (MLR) as implemented in MINITAB version 13.1 (Minitab Inc., State College, PA) and k-nearest neighbor (KNN) analysis, the algorithm for which was written in-house using the Excel 2000 (Microsoft Corp.) spreadsheet. Initially, an empirical approach for the selection of variables was applied. This was to select descriptors known to be successful in the modeling of acute aquatic toxicity endpoints, i.e., log Kow to account for hydrophobicity and the energy of the lowest unoccupied molecular orbital (LUMO) to account for reactivity. To improve the quality of the two-descriptor model, a third descriptor was selected using both forward and backward stepwise regression analysis in MINITAB version 13.1 (α to enter and remove equal to 0.15). Further development, using KNN, employed the best three descriptors selected for MLR analysis. The descriptors were scaled to the range of 0 to 1. The nearest neighbors of a compound were defined in terms of standard Euclidean geometry (distances between objects in n-dimensional space). The distance between two objects [d(a,b)] is given by the following equation:
$$d(a,b) = \sqrt{\sum_{i=1}^{n} \left(x_{a,i} - x_{b,i}\right)^{2}} \qquad (1)$$
where $x_{a,i}$ and $x_{b,i}$ are the values of descriptor i for compounds a and b, and n is the number of descriptors (i.e., the dimensionality of the space). The nearest neighbor was identified as the compound with the shortest Euclidean distance to the chemical of interest. Up to 20 nearest neighbors were considered in turn (k = 1-10, 12, 15, and 20). The predicted toxicity was calculated as the mean of the toxicities of the k nearest neighbors of the compound considered.
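The nearest-neighbor prediction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' in-house Excel implementation: the function names (`min_max_scale`, `euclidean`, `knn_predict`) are hypothetical, and six compounds from Table 1 are used only to keep the example small, with descriptors given in the order log Kow, LUMO, ∆1χv.

```python
import math

def min_max_scale(rows):
    """Scale every descriptor column to the range 0-1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(a, b):
    """Euclidean distance d(a, b) between two descriptor vectors (eq 1)."""
    return math.sqrt(sum((xa - xb) ** 2 for xa, xb in zip(a, b)))

def knn_predict(query, descriptors, toxicities, k):
    """Mean toxicity of the k compounds nearest to `query`."""
    order = sorted(range(len(descriptors)),
                   key=lambda j: euclidean(query, descriptors[j]))
    nearest = order[:k]
    return sum(toxicities[j] for j in nearest) / k

# (log Kow, LUMO in eV, delta-1-chi-v) and log(1/EC50) taken from Table 1.
names = ["methanol", "ethanol", "butan-2-ol", "phenol", "aniline", "nitrobenzene"]
raw = [(-0.77, 3.778, 0.051), (-0.31, 3.565, -0.130), (0.61, 3.554, -0.500),
       (1.47, 0.398, -1.477), (0.90, 0.639, -1.260), (1.85, -1.068, -2.293)]
tox = [-4.06, -3.32, -2.98, -1.46, -1.34, -0.78]

scaled = min_max_scale(raw)

# Predict ethanol (index 1) from the other five compounds with k = 2.
others = [i for i in range(len(names)) if i != 1]
prediction = knn_predict(scaled[1],
                         [scaled[i] for i in others],
                         [tox[i] for i in others],
                         k=2)
print(round(prediction, 2))
```

In this toy subset the two nearest neighbors of ethanol are methanol and butan-2-ol, so the predicted value is the mean of their toxicities (about -3.52, against an observed value of -3.32).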
Table 1. Chemical Abstracts Service (CAS) Number, Chemical Name, Acute Algal and Other Toxicities, and Calculated Descriptors (Compounds Listed in Increasing Order of Toxicity to C. vulgaris)

| CAS | name (a) | C. vulgaris log(1/EC50) (mM) | log Kow | LUMO (eV) | ∆1χv |
|---|---|---|---|---|---|
| 67-56-1 | methanol | -4.06 | -0.77 (m) | 3.778 | 0.051 |
| 64-17-5 | ethanol | -3.32 | -0.31 (m) | 3.565 | -0.130 |
| 75-65-0 | 2-methyl-propan-2-ol (b) | -3.16 | 0.35 (m) | 3.438 | -0.728 |
| 78-92-2 | butan-2-ol | -2.98 | 0.61 (m) | 3.554 | -0.500 |
| 868-77-9 | 2-hydroxyethyl methacrylate | -2.82 | 0.47 (m) | -0.074 | -2.275 |
| 818-61-1 | 2-hydroxyethyl acrylate | -2.79 | -0.21 (m) | -0.102 | -2.044 |
| 96-33-3 | methyl acrylate | -2.75 | 0.80 (m) | 0.001 | -1.509 |
| 71-36-3 | butan-1-ol (b) | -2.73 | 0.88 (m) | 3.425 | -0.428 |
| 78-93-3 | butanone | -2.51 | 0.29 (m) | 0.882 | -0.686 |
| 80-62-6 | methyl methacrylate | -2.24 | 1.38 (m) | 0.055 | -1.736 |
| 96-22-0 | pentan-3-one | -2.23 | 0.99 (m) | 0.910 | -0.717 |
| 4170-30-3 | crotonaldehyde | -1.98 | 0.52 (c) | -0.141 | -0.972 |
| 6728-26-3 | trans-2-hexenal (b) | -1.94 | 1.58 (c) | -0.115 | -1.094 |
| 1576-87-0 | trans-2-pentenal | -1.88 | 1.05 (c) | -0.115 | -1.024 |
| 108-95-2 | phenol | -1.46 | 1.47 (m) | 0.398 | -1.477 |
| 96-05-9 | allyl methacrylate | -1.42 | 1.68 (c) | 0.045 | -2.240 |
| 62-53-3 | aniline | -1.34 | 0.90 (m) | 0.639 | -1.260 |
| 110-43-0 | 2-heptanone (b) | -1.18 | 1.98 (m) | 0.879 | -0.902 |
| 100-66-3 | anisole | -1.09 | 2.11 (m) | 0.483 | -1.644 |
| 367-12-4 | 2-fluorophenol | -1.08 | 1.71 (m) | 0.013 | -2.166 |
| 348-54-9 | 2-fluoroaniline | -1.05 | 1.26 (m) | 0.266 | -1.998 |
| 108-39-4 | 3-cresol | -1.01 | 1.96 (m) | 0.396 | -1.622 |
| 150-76-5 | 4-methoxyphenol (b) | -0.97 | 1.34 (m) | 0.313 | -2.200 |
| 95-55-6 | 2-hydroxyaniline | -0.91 | 0.62 (m) | 0.474 | -1.806 |
| 90-05-1 | 2-methoxyphenol | -0.88 | 1.32 (m) | 0.392 | -2.194 |
| 87-62-7 | 2,6-dimethylaniline | -0.87 | 1.84 (m) | 0.595 | -1.476 |
| 100-52-7 | benzaldehyde | -0.81 | 1.47 (m) | -0.435 | -1.732 |
| 95-48-7 | 2-cresol (b) | -0.81 | 1.95 (m) | 0.396 | -1.616 |
| 90-02-8 | 2-hydroxybenzaldehyde | -0.80 | 1.81 (m) | -0.434 | -2.282 |
| 98-95-3 | nitrobenzene | -0.78 | 1.85 (m) | -1.068 | -2.293 |
| 950-37-8 | methidathion | -0.73 | 2.42 (m) | -2.550 | -1.436 |
| 106-44-5 | 4-cresol | -0.66 | 1.94 (m) | 0.429 | -1.622 |
| 95-65-8 | 3,4-dimethylphenol (b) | -0.65 | 2.23 (m) | 0.436 | -1.750 |
| 104-87-0 | 4-tolualdehyde | -0.65 | 1.99 (c) | -0.430 | -1.866 |
| 94-71-3 | 2-ethoxyphenol | -0.62 | 1.85 (m) | 0.422 | -2.184 |
| 24964-64-5 | 3-cyanobenzaldehyde | -0.57 | 1.18 (m) | -0.917 | -2.452 |
| 99-08-1 | 3-nitrotoluene | -0.50 | 2.42 (m) | -1.017 | -2.481 |
| 106-48-9 | 4-chlorophenol (b) | -0.42 | 2.39 (m) | 0.095 | -1.603 |
| 97-02-9 | 2,4-dinitroaniline | -0.36 | 1.72 (c) | -1.475 | -3.840 |
| 106-41-2 | 4-bromophenol | -0.35 | 2.59 (m) | 0.020 | -1.578 |
| 106-40-1 | 4-bromoaniline | -0.33 | 2.26 (m) | 0.218 | -1.289 |
| 108-42-9 | 3-chloroaniline | -0.31 | 1.88 (m) | 0.263 | -1.350 |
| 2495-37-6 | benzyl methacrylate (b) | -0.21 | 2.53 (m) | 0.079 | -3.044 |
| 618-87-1 | 3,5-dinitroaniline | 0.03 | 1.89 (m) | -1.780 | -3.846 |
| 89-98-5 | 2-chlorobenzaldehyde | 0.06 | 2.33 (m) | -0.683 | -1.838 |
| 540-38-5 | 4-iodophenol | 0.16 | 2.91 (m) | 0.024 | -1.590 |
| 4748-78-1 | 4-ethylbenzaldehyde | 0.16 | 2.52 (c) | -0.423 | -1.842 |
| 58-27-5 | 2-methyl-1,4-naphthoquinone (b) | 0.16 | 2.20 (m) | -1.493 | -3.045 |
| 88-69-7 | 2-isopropylphenol | 0.17 | 2.88 (m) | 0.408 | -1.754 |
| 626-43-7 | 3,5-dichloroaniline | 0.24 | 2.90 (m) | -0.042 | -1.428 |
| 603-71-4 | 1,3,5-trimethyl-2-nitrobenzene | 0.25 | 3.22 (c) | -0.857 | -2.808 |
| 608-31-1 | 2,6-dichloroaniline | 0.26 | 2.82 (m) | -0.006 | -1.416 |
| 88-18-6 | 2-tert-butyl phenol (b) | 0.29 | 3.27 (m) | 0.407 | -1.747 |
| 95-50-1 | 1,2-dichlorobenzene | 0.37 | 3.43 (m) | -0.142 | -1.011 |
| 99-65-0 | 1,3-dinitrobenzene | 0.38 | 1.49 (m) | -1.911 | -3.532 |
| 51-28-5 | 2,4-dinitrophenol | 0.40 | 1.67 (m) | -1.807 | -3.976 |
| 100-25-4 | 1,4-dinitrobenzene | 0.41 | 1.47 (m) | -2.208 | -3.532 |
| 99-61-6 | 3-nitrobenzaldehyde (b) | 0.45 | 1.47 (m) | -1.404 | -3.102 |
| 732-11-6 | phosmet | 0.47 | 2.78 (m) | -2.349 | -2.322 |
| 298-00-0 | methylparathion | 0.60 | 2.86 (m) | -2.068 | -3.077 |
| 121-75-5 | malathion | 0.64 | 2.36 (m) | -2.658 | -2.457 |
| 99-30-9 | 2,6-dichloro-4-nitroaniline | 0.64 | 2.80 (m) | -1.096 | -2.970 |
| 86-50-0 | methyl azinphos (b) | 0.69 | 2.75 (m) | -2.494 | -2.186 |
| 121-14-2 | 2,4-dinitrotoluene | 0.70 | 1.98 (m) | -1.841 | -3.760 |
| 2636-26-2 | cyanophos | 0.79 | 2.75 (m) | -1.832 | -2.349 |
| 3531-19-9 | 6-chloro-2,4-dinitroaniline | 0.80 | 2.46 (c) | -1.667 | -4.080 |
| 99-28-5 | 2,6-dibromo-4-nitrophenol | 0.81 | 3.57 (m) | -1.452 | -3.315 |

Values of the T. pyriformis log(1/IGC50), V. fischeri log(1/EC50), and P. promelas log(1/LC50) columns (mM), in extracted order; blank cells in these columns prevent assignment to rows:
-2.67 -1.99 -1.79 -1.54 -1.08 0.69 0.55 -1.43 -1.75 -1.22 -1.46 0.76 0.66 -0.21 -0.68 -0.23 -0.49 -0.22 0.28 -0.37 -0.06 -0.14 0.94 1.34 -0.43 -0.20 -0.27 0.42 0.14 -0.19 0.12 -0.06 -0.34 -0.02 0.05 0.55 0.53 0.68 1.01 0.22 0.65 0.80 0.49 0.85 0.29 1.54 0.80 0.71 0.86 0.33 1.30 0.53 0.76 1.08 1.30 0.11
-3.21 -2.70 -2.96 -2.49 -1.94 -1.69 -0.24 1.38 -1.64 -1.76 -1.37 -1.65 -0.41 -1.25 -0.99 0.50 -0.30 0.51 2.11 -0.16 -0.06 0.76 1.16 1.46 -0.28 0.29 0.05 1.65 1.32 0.74 0.93 0.84 1.14 0.89 1.73 0.01 1.67 2.40 1.38 0.82 0.94 1.54 1.15 0.56 2.63 0.73 1.32 1.07 0.56 0.98 1.58 0.91 2.94 2.44 2.81 3.19
1.18 1.98 1.56 0.69 1.28 3.07 1.19 1.05 1.80 1.37 0.87 0.55 1.12 1.36 1.04 1.14 2.37 3.70 0.87

QSAR Validation. Validation of the QSARs was required to test the predictivity of the methods as well as to enable comparison between them. Initially, both the MLR and the KNN models underwent a leave-one-out (LOO) procedure. Every compound was omitted once, and its toxicity was predicted using all remaining chemicals (in MLR) or the k nearest neighbors selected from all remaining chemicals (in KNN). The relationship between predicted and observed values for toxicity was assessed. This procedure enabled a comparison to be made of the overall predictivity of the methods.
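The leave-one-out procedure can be illustrated with a one-descriptor regression. The sketch below is an example under stated assumptions, not the MINITAB workflow used in the study: `fit_line`, `loo_predictions`, and `q_squared` are hypothetical helper names, the six (log Kow, toxicity) pairs are taken from Table 1, and the cross-validated coefficient is computed as 1 - PRESS/SS_total, a common convention that the text does not spell out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_predictions(xs, ys):
    """Predict each compound from a model fitted to all the others."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

def q_squared(observed, predicted):
    """Cross-validated R2, assumed here to be 1 - PRESS / SS_total."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss_total

# log Kow and log(1/EC50) for methanol, ethanol, butan-2-ol, butan-1-ol,
# phenol, and 4-chlorophenol (Table 1).
log_kow = [-0.77, -0.31, 0.61, 0.88, 1.47, 2.39]
toxicity = [-4.06, -3.32, -2.98, -2.73, -1.46, -0.42]

predicted = loo_predictions(log_kow, toxicity)
print(round(q_squared(toxicity, predicted), 3))
```

For KNN the same loop applies, except that no model is refitted: the omitted compound is simply predicted from its nearest neighbors among the remaining compounds.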
Table 1. Continued

| CAS | name (a) | C. vulgaris log(1/EC50) (mM) | log Kow | LUMO (eV) | ∆1χv |
|---|---|---|---|---|---|
| 640-15-3 | thiometon (b) | 0.94 | 3.15 (m) | -2.632 | 1.134 |
| 89-61-2 | 2,5-dichloronitrobenzene | 0.97 | 3.03 (m) | -1.296 | -2.633 |
| 94-62-2 | piperine | 0.97 | 2.70 (c) | -0.767 | -3.985 |
| 939-97-9 | 4-tert-butylbenzaldehyde | 1.00 | 3.32 (c) | -0.391 | -2.210 |
| 634-93-5 | 2,4,6-trichloroaniline | 1.11 | 3.69 (m) | -0.240 | -1.485 |
| 83-42-1 | 2-chloro-6-nitrotoluene (b) | 1.17 | 3.09 (m) | -1.219 | -2.636 |
| 5388-62-5 | 4-chloro-2,6-dinitroaniline | 1.19 | 2.46 (c) | -1.895 | -4.080 |
| 528-29-0 | 1,2-dinitrobenzene | 1.23 | 1.69 (m) | -1.840 | -3.526 |
| 100-00-5 | 1-chloro-4-nitrobenzene | 1.25 | 2.39 (m) | -1.344 | -2.476 |
| 2463-84-5 | dicapthon | 1.36 | 3.58 (m) | -2.124 | -3.256 |
| 128-37-0 | 2,6-di-tert-butyl-4-methyl phenol (b) | 1.45 | 5.89 (c) | 0.464 | -2.040 |
| 3481-20-7 | 2,3,5,6-tetrachloroaniline | 1.48 | 4.47 (m) | -0.560 | -1.535 |
| 609-89-2 | 2,4-dichloro-6-nitrophenol | 1.50 | 3.07 (c) | -1.431 | -3.174 |
| 83-38-5 | 2,6-dichlorobenzaldehyde | 1.50 | 3.08 (c) | -0.473 | -1.931 |
| 55-38-9 | fenthion | 1.56 | 4.09 (m) | -1.628 | -1.486 |
| 96-76-4 | 2,4-di-tert-butylphenol (b) | 1.60 | 4.36 (m) | 0.431 | -1.936 |
| 87-86-5 | pentachlorophenol | 1.69 | 5.12 (m) | -0.978 | -1.931 |
| 122-14-5 | fenitrothion | 1.71 | 3.30 (m) | -2.027 | -3.256 |
| 89-69-0 | 1,2,4-trichloro-5-nitrobenzene | 1.88 | 3.47 (m) | -1.536 | -2.774 |
| 6284-83-9 | 1,3,5-trichloro-2,4-dinitrobenzene | 1.89 | 2.97 (c) | -2.037 | -4.179 |
| 1689-82-3 | phenylazophenol (b) | 2.16 | 3.96 (c) | -0.768 | -3.417 |
| | 4-(dibutylamino)benzaldehyde | 2.18 | 5.06 (c) | -0.097 | -2.392 |
| 117-18-0 | 2,3,5,6-tetrachloronitrobenzene | 2.34 | 4.38 (m) | -1.419 | -2.895 |
| 608-71-9 | pentabromophenol | 3.10 | 4.85 (c) | -1.193 | -1.459 |

Values of the T. pyriformis log(1/IGC50), V. fischeri log(1/EC50), and P. promelas log(1/LC50) columns (mM), in extracted order; blank cells in these columns prevent assignment to rows:
1.13 1.36 1.56 1.01 0.68 1.25 0.43 1.65 2.29 1.73 1.32 0.77 1.80 1.76 1.75 1.41 2.15 2.78 2.93 2.05 2.45 3.08 1.53 2.19 1.66 1.51 2.41 1.82 2.66 1.57 2.74 3.09 2.23 3.72

(a) The SMILES strings are available from the corresponding author. (b) Used in the test set. (m) Measured. (c) Calculated.
Table 2. Descriptors Calculated for the Compounds Listed in Table 1

| software | descriptors calculated |
|---|---|
| ClogP | logarithm of the octanol-water partition coefficient (log Kow) |
| MOPAC | energy of the highest occupied molecular orbital (HOMO), energy of the LUMO, electronegativity (EN), absolute hardness (AH), maximum acceptor (Amax) and donor (Dmax) superdelocalizability, maximum positive partial charge (Qmax), maximum positive partial charge on a hydrogen atom (QHmax), and maximum negative partial charge (Qmin) |
| TSAR | dipole moment (Dip), polarizability (Pol), molecular volume (MV), molecular surface area (MSA), inertia moments (IM1, IM2, and IM3 size and IM1, IM2, and IM3 length), Wiener and Balaban topological indices, ellipsoidal volume (MVelipp), molecular refraction (MR), and number of hydrogen bond donors (NHdon) and acceptors (NHacc) |
| QSARis | Kier simple, valence, and ∆ molecular connectivity indices (0χ, 1χ, 2χ, 3χp, 4χp, 5χp, 6χp, 7χp, 8χp, 9χp, 10χp, 3χc, 4χc, 4χpc, 6χch, 0χv, 1χv, 2χv, 3χpv, 4χpv, 5χpv, 6χpv, 7χpv, 8χpv, 9χpv, 10χpv, 3χcv, 4χcv, 4χpcv, 6χchv, ∆0χ, ∆1χ, ∆2χ, ∆3χp, ∆4χp, ∆5χp, ∆6χp, ∆7χp, ∆8χp, ∆9χp, ∆10χp, ∆0χv, ∆1χv, ∆2χv, ∆3χpv, ∆4χpv, ∆5χpv, ∆6χpv, ∆7χpv, ∆8χpv, ∆9χpv, ∆10χpv), E state indices (SsCH3, SdCH2, SssCH2, SdsCH, SsssCH, SdssC, SaasC, SsNH2, StN, SddsN, SsOH, SdO, SssO, SdsssP, SdS, SssS, SsCl, SHHBd, and SHBa), κ and κα shape indices (0κ, 1κα, 2κ, 3κ, 1κα, 2κα, and 3κα), the sum of absolute values of the charges on each atom (ABSQ), the sum of absolute values of the charges on the nitrogen and oxygen atoms in a molecule (ABSQon), ovality, and dipolar descriptors (Qs, Qv, and Qsv) |
A second validation procedure simulating external validation was introduced to assess the stability of the predictions. The whole data set was split into training and test sets in a ratio of 80%:20%. For this purpose, all compounds were sorted in order of increasing toxicity and every fifth compound, starting with 2-methyl-propan-2-ol, was selected for the test set. Both MLR and KNN methods used the same training set for model development/calibration and test set for external prediction. The predicted toxicity values were compared with the observed toxicity by regression with, and without, the intercept.
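The systematic split and the two ways of scoring external predictions can be sketched as follows. The helper names are hypothetical, and the formula used for R2 without intercept (a regression through the origin, scored against the centered total sum of squares) is one common convention assumed here; the text does not state which definition was used.

```python
def split_every_fifth(items, first=2, step=5):
    """Return (training, test). The test set takes every `step`-th item,
    starting at zero-based index `first` (i.e., the third compound)."""
    test_index = set(range(first, len(items), step))
    training = [x for i, x in enumerate(items) if i not in test_index]
    test = [x for i, x in enumerate(items) if i in test_index]
    return training, test

def r2_with_intercept(predicted, observed):
    """Squared Pearson correlation between predicted and observed values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    return s_po ** 2 / (s_pp * s_oo)

def r2_through_origin(predicted, observed):
    """R2 of observed = slope * predicted (no intercept); assumed convention."""
    slope = (sum(p * o for p, o in zip(predicted, observed))
             / sum(p * p for p in predicted))
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - slope * p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Ranks 1..91 stand in for the compounds sorted by increasing toxicity.
ranked = list(range(1, 92))
train, test = split_every_fifth(ranked)
print(len(train), len(test))  # 73 18
```

Starting from the third compound and taking every fifth one yields 18 test compounds and leaves 73 for training, which corresponds to the 80%:20% ratio described above.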
Results Toxicity data to C. vulgaris for a total of 91 industrial organic chemicals, including aliphatic and aromatic compounds as well as a small subset of pesticides, are presented in Table 1. Compounds exhibited a wide range of algal toxicity from -4.06 to 3.10 [on a log(mM) scale]. Hydrophobicity, modeled by log Kow, ranged from -0.77 to 5.89 and the reactivity, modeled by LUMO, ranged from -2.658 to 3.778 eV. To assess the reliability of the novel algal toxicity test, the measured toxicity values were correlated with the acute toxicity to other aquatic species such as the bacterium V. fischeri, ciliate T. pyriformis, and fish P. promelas to yield QAARs. Although there are some gaps in the toxicity values, a relatively large number of data for comparison were collected. A plot of the correlations between toxicity to the four species is presented in Figure 1. Figure 1 shows that the acute algal toxicity measured in the 15 min algal assay correlates reasonably well with that to T. pyriformis (n = 73, $R^2$ = 0.729) and P. promelas. However, some outliers were detected in these relationships. It was found that methyl acrylate, 2-hydroxyethyl acrylate, trans-2-pentenal, and trans-2-hexenal are more toxic to T. pyriformis [(1/IGC50)T.p.] as compared to C. vulgaris [(1/EC50)C.v.]. Their exclusion resulted in a much more robust model:
$$\log(1/\mathrm{IGC}_{50})_{T.p.} = 0.696\,(0.034)\,\log(1/\mathrm{EC}_{50})_{C.v.} + 0.551\,(0.049) \qquad (2)$$
n = 69, $R^2$ = 0.861, $R^2_{\mathrm{CV}}$ = 0.854, s = 0.402, and F = 417
where n is the number of compounds; $R^2$ is the coefficient
Figure 1. Interspecies correlations between toxicity to C. vulgaris, T. pyriformis, P. promelas, and V. fischeri. The outliers are indicated by empty circles.
of determination; $R^2_{\mathrm{CV}}$ is the cross-validated (LOO procedure) coefficient of determination; s is the standard error of the estimate; and F is Fisher’s criterion. The numbers in parentheses are the standard errors on the coefficients. Two outliers, both more toxic to P. promelas than to C. vulgaris, were found in the correlation (n = 42, $R^2$ = 0.745) between fish and algal toxicities. These were allyl methacrylate and 2-hydroxyethyl acrylate. They were excluded, and the following equation was obtained:
$$\log(1/\mathrm{EC}_{50})_{P.p.} = 0.934\,(0.067)\,\log(1/\mathrm{EC}_{50})_{C.v.} + 1.35\,(0.116) \qquad (3)$$
n = 40, $R^2$ = 0.835, $R^2_{\mathrm{CV}}$ = 0.818, s = 0.688, and F = 193
Statistical outliers were not detected in the correlation between V. fischeri and C. vulgaris toxicities. However, the correlation coefficient was relatively low:
$$\log(1/\mathrm{EC}_{50})_{V.f.} = 0.709\,(0.087)\,\log(1/\mathrm{EC}_{50})_{C.v.} + 1.15\,(0.123) \qquad (4)$$
n = 50, $R^2$ = 0.577, $R^2_{\mathrm{CV}}$ = 0.530, s = 0.875, and F = 65.6
Furthermore, correlations between algal toxicity and calculated physicochemical descriptors were attempted. The relationship between toxicity and log Kow is illustrated in Figure 2. As expected, there is a strong trend between toxicity and hydrophobicity, although log Kow alone fails to describe the toxicity of all chemicals. Figure 2 also illustrates the previously developed QSAR for nonpolar narcosis (13), which appears as the characteristic baseline. The analysis of Figure 2 revealed that many of the compounds with toxicity above that predicted by the baseline effect are bioreactive in nature (26). The relationship between toxicity and log Kow is described by the following equation:
$$\log(1/\mathrm{EC}_{50}) = 1.03\,(0.060)\,\log K_{\mathrm{ow}} - 2.51\,(0.158) \qquad (5)$$
n = 91, $R^2$ = 0.761, $R^2_{\mathrm{CV}}$ = 0.747, s = 0.728, and F = 290
Figure 2. Plot of algal toxicity [log(1/EC50)] against hydrophobicity (log Kow) for all compounds considered in the study. Also shown is the QSAR [log(1/EC50) = 1.04 (0.07) log Kow - 3.28 (0.11); n = 10, r2 = 0.96] for nonpolar narcosis (baseline toxicity) developed by Worgan et al. (13).
There also appears to be a trend that chemicals with relatively low and high log Kow values are below the baseline. Therefore, some nonlinearity in the relationship between toxicity and hydrophobicity was suggested. However, the inclusion of a quadratic term in eq 5 did not significantly improve the statistics of the model (n = 91, $R^2$ = 0.794, $R^2_{\mathrm{CV}}$ = 0.778, s = 0.673, and F = 169). While eq 5 and its parabolic counterpart describe a clear trend between toxicity and hydrophobicity for this heterogeneous group of compounds, they lack the statistical power to be predictive for practical purposes. Therefore, to increase statistical fit, and hence predictive power, additional descriptors were included in an attempt to model the data more successfully. The response-surface, or two-parameter, QSAR analysis provides a mechanistic method to analyze toxicity data and develop models. It is based on the premise that much of toxicity is driven by the ability of a xenobiotic
Figure 3. Plot of the observed toxicity to the alga C. vulgaris against that predicted by eq 6.
Figure 4. Plot of the observed toxicity to the alga C. vulgaris against that predicted by eq 7.
to reach the active site (distribution, normally modeled by hydrophobicity), and its intrinsic reactivity (that may lead to covalent interactions with biological macromolecules, normally modeled by electrophilicity) (26, 27). Accordingly, the relationship of toxicity with hydrophobicity (expressed by log Kow) and electrophilicity (expressed by LUMO) for all chemicals was
$$\log(1/\mathrm{EC}_{50}) = 0.829\,(0.052)\,\log K_{\mathrm{ow}} - 0.405\,(0.047)\,\mathrm{LUMO} - 2.21\,(0.124) \qquad (6)$$
n = 91, $R^2$ = 0.868, $R^2_{\mathrm{CV}}$ = 0.857, s = 0.538, and F = 290
The plot of observed algal toxicity against that predicted by eq 6 is shown in Figure 3. Inclusion of a quadratic term [(log Kow)²] did not improve the correlation, since it had low statistical significance (p = 0.427). The presence of a number of compounds with relatively large residuals, but the absence of statistical outliers with standardized residuals greater than three, stimulated the search for a third descriptor to complement log Kow and LUMO. The use of the first-order ∆ valence connectivity index (∆1χv) improved the statistical fit and predictivity of the two-descriptor model and helped to minimize the error:
$$\log(1/\mathrm{EC}_{50}) = 0.838\,(0.047)\,\log K_{\mathrm{ow}} - 0.268\,(0.054)\,\mathrm{LUMO} - 0.278\,(0.066)\,\Delta{}^{1}\chi^{v} - 2.76\,(0.174) \qquad (7)$$
n = 91, $R^2$ = 0.890, $R^2_{\mathrm{CV}}$ = 0.875, s = 0.494, and F = 235
The intercorrelations between the descriptors are low (log Kow and LUMO r = -0.456, log Kow and ∆1χv r = -0.261, LUMO and ∆1χv r = 0.639). The plot of observed algal toxicity against that predicted by eq 7 is shown in Figure 4. To test further the hypothesis for the presence of nonlinearity in the relationship between the toxicity and the calculated descriptors and to compare the predictive performance of the linear and nonlinear methods for QSAR, KNN analysis was performed. The same three descriptors as in eq 7 were utilized. Initially, each compound was omitted once and its toxicity was predicted using the mean toxicity of the k nearest neighbors, which were selected
Figure 5. Plot of the observed algal toxicity against that predicted in the LOO procedure using the KNN method. All 91 compounds were used and k = 7 was specified for prediction. The outliers are indicated by empty circles.
from the remaining 90 compounds. Because a model is not available in the KNN methodology, the LOO procedure gives an indication of the predictivity of the method. Consequently, the $R^2$ between the observed and the predicted toxicity from LOO can be used to compare MLR and KNN. As described below, a range of values of k was investigated. The best correlation, with $R^2$ = 0.855 and s = 0.560, between observed and predicted toxicity from LOO for all 91 compounds was obtained at k = 7. It is illustrated in Figure 5. Two compounds were systematically poorly predicted: methidathion and pentabromophenol with negative and positive residuals, respectively. These are indicated by empty circles in Figure 5. For the purpose of comparison, the whole data set was split into training and test sets. The compounds in the subsets were the same for both the MLR and the KNN studies. The chemicals from the test set were used only for validation and not for model development. Thus, a scenario of external validation was simulated. The redevelopment of eq 7 on the reduced training set did not affect the statistical performance of the model significantly. The only exception was a slight decrease in Fisher’s criterion due to the lower number of compounds as compared to eq 7:
$$\log(1/\mathrm{EC}_{50}) = 0.900\,(0.047)\,\log K_{\mathrm{ow}} - 0.156\,(0.079)\,\mathrm{LUMO} - 0.369\,(0.092)\,\Delta{}^{1}\chi^{v} - 3.03\,(0.231) \qquad (8)$$
n = 73, $R^2$ = 0.892, $R^2_{\mathrm{CV}}$ = 0.878, s = 0.496, and F = 189
Equation 8 was used to predict the toxicity of the compounds from the external test set. The following equation between the observed and the predicted toxicity was obtained:
$$\log(1/\mathrm{EC}_{50})_{\mathrm{observed}} = 0.961\,\log(1/\mathrm{EC}_{50})_{\mathrm{predicted}} - 0.030 \qquad (9)$$
n = 18, $R^2$ = 0.901, $R^2$ (without intercept) = 0.860, and s = 0.476
where $R^2$ (without intercept) is the coefficient of determination in a regression through the origin. The plot of observed algal toxicity against that predicted by eq 8 for the external test set is shown in Figure 6. The KNN methodology requires preliminary investigation to determine the number of nearest neighbors that gives the most accurate prediction of toxicity. This was determined by calculating the individual toxicities by LOO for the compounds in the training set for k = 1-10, 12, 15, and 20. It was assumed that the optimal number of neighbors would be the same for both the training and the test set. The results showed that k in the range of 2-10 yielded $R^2$ > 0.8. An optimum, however, was detected at k = 5-7 ($R^2$ = 0.822, 0.822, and 0.818, respectively). Methanol (with a negative residual), in addition to the previously mentioned methidathion and pentabromophenol, was outside the 95% prediction interval. The correlation between the mean toxicities, predicted by the LOO procedure, for k = 5-7 of the compounds in the training set and the observed toxicity resulted in $R^2$ = 0.824 and s = 0.623. Despite the relatively disappointing predictivity of the KNN method in the LOO procedure for the training set, an excellent fit between observed and predicted (mean k = 5-7 neighbors) toxicity was obtained for the compounds in the external test set:
Figure 6. Plot of the observed algal toxicity against that predicted by eq 8 for the external test set.
Figure 7. Plot of the observed algal toxicity against that predicted using the KNN method for the external test set. The training set of 73 compounds was used. A mean toxicity, predicted from k = 5-7, was plotted vs observed toxicity.
$$\log(1/\mathrm{EC}_{50})_{\mathrm{observed}} = 1.12\,\log(1/\mathrm{EC}_{50})_{\mathrm{predicted}} - 0.057 \qquad (10)$$
n = 18, R2 = 0.941, R2 (without intercept) = 0.937, and s = 0.377
It should be noted that the predictions for the test set were performed using the training set of only 73 compounds. The averaging of the predicted toxicities for k = 5-7 displayed slightly better correlations as compared to the individual predictions (R2 = 0.926, 0.937, and 0.938 for k = 5, 6, and 7, respectively). The exclusion of predicted toxicities using k = 5 did not improve the overall R2. The relationship described by eq 10 is illustrated in Figure 7. Figure 8 summarizes the changes in R2 and s for the different number of neighbors from LOO calculations of the training set and the external test set.
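To illustrate the two R2 values quoted for eqs 9 and 10, the following minimal Python sketch regresses observed on predicted toxicity with and without an intercept. The data values and function names are hypothetical. The through-origin R2 is computed here against the mean-centered total sum of squares, which is an assumption, since the convention used in the study is not stated in this section.

```python
from statistics import mean

def fit_with_intercept(pred, obs):
    """Ordinary least squares of obs on pred: obs = slope * pred + intercept."""
    mp, mo = mean(pred), mean(obs)
    sxx = sum((p - mp) ** 2 for p in pred)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    slope = sxy / sxx
    return slope, mo - slope * mp

def fit_through_origin(pred, obs):
    """Least-squares slope for obs = slope * pred (no intercept term)."""
    return sum(p * o for p, o in zip(pred, obs)) / sum(p * p for p in pred)

def r_squared(obs, fitted):
    """1 - SS_res / SS_tot, with SS_tot centered on the mean of obs (assumed convention)."""
    mo = mean(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fitted))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical log(1/EC50) values for a small test set; NOT data from the study.
predicted = [-1.2, -0.5, 0.1, 0.8, 1.5, 2.3]
observed = [-1.0, -0.7, 0.3, 0.9, 1.3, 2.6]

slope, intercept = fit_with_intercept(predicted, observed)
r2 = r_squared(observed, [slope * p + intercept for p in predicted])

slope0 = fit_through_origin(predicted, observed)
r2_origin = r_squared(observed, [slope0 * p for p in predicted])

print(f"with intercept: slope={slope:.3f}, intercept={intercept:.3f}, R2={r2:.3f}")
print(f"through origin: slope={slope0:.3f}, R2={r2_origin:.3f}")
```

Under this convention the through-origin R2 cannot exceed the value obtained with an intercept, which matches the ordering of the values reported for eqs 9 and 10.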
Discussion Reliable toxicity data are required for all trophic levels of the environment to ensure appropriate risk assessment
Figure 8. Coefficient of determination (R2) and standard error (s) in the correlation between observed and predicted toxicity using the KNN method for the training and external test set. R2 and s were calculated at k = 1-10, 12, 15, and 20.
of chemicals. It is recognized that there are currently insufficient publicly available toxicity data to meet the needs of regulation and QSAR model building (7, 10). This study reports for the first time a comprehensive database of toxicity values to the alga C. vulgaris. The toxicity data set meets many of the criteria for high
quality data, i.e., it has been produced to a standard protocol, in a single laboratory by a single worker. The toxicological data have previously been evaluated (and undergone a process of prevalidation) by the development of QSARs for nonpolar and polar narcosis (13) and investigation of QAARs with other species such as V. fischeri and T. pyriformis (15, 16). The data set is comprehensive in terms of covering a number of mechanisms of action and as such is broadly comparable to the larger fathead minnow data set (11). The algal toxicity database in this study contains a number of nonpolar narcotics such as aliphatic alcohols and ketones, as well as a number of polar narcotics, such as aniline, phenol, and other benzene derivatives such as cresols, monohalogenated and mononitro-substituted benzenes, anilines, and phenols. The reactive nature of the chemicals in the algal data set was extended further by the inclusion of a number of more electrophilic chemicals such as acrylates, methacrylates, and α,β-unsaturated aldehydes, as well as compounds that could act as oxidative uncouplers including benzaldehydes, dinitroanilines, and mononitropolyhalogenated anilines and phenols. The data set also contains toxicity values for 10 pesticides. The diversity in chemical structure ensured a large variability in the physicochemical properties, reactivity indices, and toxicity, which is considered a prerequisite for the development of meaningful QAARs and QSARs.
The interspecies QAARs are useful for several reasons. They can be exploited to predict unknown toxicity and to verify the validity of other toxicity tests. In addition, analysis of the slopes and intercepts of the QAARs can be used to indicate the sensitivity of the biological species and protocols to the tested chemicals. In this study, the toxicity values to C. vulgaris correlate well with those to T. pyriformis and P. promelas and can be used for predictive purposes.
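The use of a QAAR to fill a data gap can be sketched in a few lines of Python. The toxicity values below are hypothetical and are not taken from the study, and the choice of which assay is treated as the dependent variable is arbitrary here; the sketch only shows how a slope and intercept are fitted and then used to estimate a missing value.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical log(1/concentration) values for the same six compounds
# measured in two assays; these are NOT data from the study.
tetrahymena = [-1.5, -0.8, 0.0, 0.6, 1.2, 2.0]
chlorella = [-1.9, -1.3, -0.6, -0.2, 0.4, 1.0]

slope, intercept = fit_line(tetrahymena, chlorella)

def predict_chlorella(tetrahymena_value):
    """Estimate the algal toxicity of a compound tested only in the other assay."""
    return slope * tetrahymena_value + intercept

estimate = predict_chlorella(1.0)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.3f}")
```

With these illustrative numbers the slope is below unity, meaning that the response on the y axis rises more slowly than the response on the x axis across the range covered. Any such estimate is only meaningful for compounds that fall inside the chemical domain used to fit the line.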
The outliers to both eqs 2 and 3 are electrophilic in nature, and more specifically, they are expected to act as Michael type acceptors (28). This observation shows that either the alga is less sensitive to electrophiles or the protocol utilized to measure algal toxicity does not offer the necessary conditions for the more reactive chemicals to demonstrate their toxicity, possibly as a result of the short duration of the test (12). Another possible reason for the behavior of the α,β-unsaturated compounds could be the greater error in the toxicity measurement for the reactive compounds as compared to the narcotics (29). However, the outliers observed show that caution is needed when using QAARs for predictive purposes and that their domain of application should be restricted to exclude potential Michael type acceptors. The correlation between C. vulgaris and V. fischeri toxicities has a relatively low coefficient of determination, which can be explained either by the quality of the bacterial toxicity data themselves (7, 30) or by the way they were collected and presented in this study. It is interesting to note that the V. fischeri toxicity correlates only poorly with the toxicity to T. pyriformis. The relatively good correlation with P. promelas toxicity is probably due to the smaller number of compounds (n = 30) as compared to the other QAARs (n = 50 with C. vulgaris and n = 47 with T. pyriformis). The slopes in eqs 2-4 are less than unity, which indicates relatively lower toxicity for the more toxic compounds in the algal test as compared to the other
tests. One possible explanation for the lower susceptibility of algal cells to the more toxic compounds could be the short duration of the algal test. The deviations of the intercept from zero could be associated with the differences in the test protocol (e.g., duration, medium, design, etc.) (12). The positive intercepts in eqs 2-4 show that the Chlorella test is the least sensitive. Comparison of the intercepts resulted in the following order of the tests in terms of sensitivity: P. promelas > V. fischeri > T. pyriformis > C. vulgaris. This result is in agreement with the observation made earlier by Cronin and Schultz (31) and Netzeva et al. (15, 16). The QSARs have been developed for this novel algal toxicity data set not only to provide a useful means of predicting toxicity but also to compare the predictivity of two widely used methods to develop QSARs. The choice of method to develop a QSAR is not always easy to make due to the large variety available and the restrictions imposed by the job in hand. To select an appropriate method, the QSAR modeler most often juggles the transparency and robustness of the methods. Requirements for interpretability and portability also play an important role in the selection process (5). Regression analysis is usually a first choice since it offers greatest transparency, portability, and interpretability (in terms of a mathematical model), although its robustness has been questioned in more complex studies (8). Very often, regression analysis is the first method attempted, and if it performs satisfactorily, no further methods are utilized. It is outside the scope of this study to review all available methods for QSAR, but it should be noted that other regression-based approaches such as principal component regression and partial least squares analysis are more powerful tools when an acceptable result cannot be obtained by regression analysis. 
However, their transparency and interpretability are considered lower as compared to regression analysis (32). The robustness of the methods generally increases going from linear to nonlinear methods (8). In this study, a KNN algorithm was used to compare with MLR since it possesses most of the advantages of the nonlinear methods, plus the simplicity of its philosophy (the active analogue principle). KNN has been used on a number of occasions in QSAR studies for predictive purposes (33-36) although its performance has not been compared critically to MLR previously. In the QSAR analysis, the first two descriptors (log Kow and LUMO) were empirically selected as a result of extensive experience in modeling of acute aquatic toxicity (26, 27). A strong trend of increasing toxicity with increasing hydrophobicity underpins this data set. Indeed, log Kow alone accounts for approximately 76% of the variance in the toxicity data (eq 5). As such, it seems logical that log Kow should appear as an integral part of any models of aquatic toxicity data with the exception, probably, of some specific reactivity domains. The inclusion of LUMO accounts for an additional 11% of the variability in toxicity (eq 6). It is interesting to note that the third descriptor (∆1χv) demonstrates the same importance (measured as coefficient/error ratio) as the reactivity term LUMO (eq 7). It is a common opinion that the interpretation of connectivity indices is difficult (37). ∆1χv is the difference between 1χv for the compound in question and the straight chain alkane with the same molecular formula. The lower order connectivity indices are generally considered to
account for molecular size (38). Furthermore, the valence connectivity indices encode electronic information in addition to that described by the simple connectivity indices (39). ∆1χv correlates modestly with descriptors such as the electrotopological indices for double-bonded oxygen atom (SdO) and hydrogen bond acceptors (SHBa), which is a reflection of the valence nature of the descriptor. The analysis of the residuals from eqs 6 and 7 illustrated that ∆1χv contributed to the minimization of the error associated with compounds that contain the fragment (=O). These included several pesticides (e.g., methidathion, azinphos-methyl, phosmet, and malathion), dinitrobenzenes, which could be either soft electrophiles (e.g., 1,3,5-trichloro-2,4-dinitrobenzene and 2,4-dinitrotoluene) or uncouplers of oxidative phosphorylation (e.g., 4-chloro-2,4-dinitroaniline and 2,4-dinitrophenol), and α,β-unsaturated aldehydes (e.g., trans-2-pentenal and trans-2-hexenal). The chemical diversity of these chemicals, and their different mechanisms of toxic action, led to the conclusion that ∆1χv does not encode any particular property or process. Rather, it appeared as a suitable statistical supplement to the other two descriptors (log Kow and LUMO) in eq 6. Nevertheless, the three descriptor model (eq 7) had a satisfying fit and predictivity and it was accepted as a final model to be compared with the results of the KNN analysis.
To maximize objectivity in the comparison between the methods, both MLR and KNN used the same descriptors and were attempted on the same data set. Because KNN does not offer a model, the performance of the methods was compared by LOO and external validation. Both methodologies provided similar results from LOO cross-validation. The nonlinear KNN method had lower errors for the predictions than MLR. However, a difference of 2% in the R2 between the methods is probably not sufficiently significant to conclude that one method provides a superior model to another.
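The LOO cross-validation used to judge the MLR model can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the descriptor values and toxicities are invented, the function names are hypothetical, and the regression is solved through the normal equations rather than with the statistical software used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def r_squared(obs, pred):
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - ss_res / sum((o - mo) ** 2 for o in obs)

def loo_predictions(rows, y):
    """Refit the model n times, each time predicting the one compound left out."""
    out = []
    for i in range(len(y)):
        coef_i = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        out.append(predict(coef_i, rows[i]))
    return out

# Hypothetical (log Kow, LUMO, delta-1-chi-v) descriptors and toxicities;
# these are NOT values from the study.
descriptors = [(0.5, 1.0, 0.0), (1.2, 0.8, -0.3), (2.0, 0.2, -0.5), (2.7, -0.5, -0.2),
               (3.1, 0.5, -0.8), (3.8, -1.0, -0.4), (4.5, -0.3, -1.0), (1.6, -0.8, -0.1),
               (0.9, 0.1, -0.6), (5.0, 0.6, -0.7)]
toxicity = [-2.65, -2.00, -1.02, -0.46, 0.06, 0.66, 1.48, -1.39, -2.04, 1.64]

coef = fit_mlr(descriptors, toxicity)
r2_fit = r_squared(toxicity, [predict(coef, r) for r in descriptors])
q2 = r_squared(toxicity, loo_predictions(descriptors, toxicity))
print(f"R2 (fit) = {r2_fit:.3f}, R2 (LOO) = {q2:.3f}")
```

Because each left-out compound plays no part in fitting the model that predicts it, the LOO statistic is never higher than the fitted R2, and the gap between the two gives a rough indication of how much the model depends on individual compounds.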
Two outliers were apparent from the KNN modeling, which were not observed in MLR. This could be as a result of the lack of neighboring molecules or that the nearest neighbors, as a group, had toxicity different from these two outliers. The first hypothesis is easily rejected since there were many compounds with well-predicted toxicity, which have neighbors at a greater distance than methidathion and pentabromophenol. Additionally, the two compounds have no extreme values for the physicochemical descriptors considered. The second hypothesis suggests that the principle of active analogues is violated for some reason. This could be due either to differences in mechanisms of action or inaccuracy in the toxicity measurement. Pentabromophenol has a unique structure as compared to the other benzene derivatives included in the study and is highly hydrophobic (which will limit its solubility). Methidathion is the only pesticide studied with a sulfur-containing aromatic heterocycle. However, it is difficult to judge whether this structural feature is responsible for the slight deviation of the observed from the predicted toxicity. The simulation of an external test set was performed to validate the predictive ability of the methods as well as to investigate some practicalities in the use of KNN for predictive purposes. The results provide an additional basis for the comparison of MLR and KNN. MLR demonstrates remarkable stability in this form of validation, with a R2 in the regression between observed and predicted toxicity (eq 9), very similar to the statistical
fit of the model developed on the complementary subset (eq 8). Conversely, the KNN has a large difference in R2 values (0.824 for the training set vs 0.941 for the test set). Although the observed vs predicted toxicity of the external set has a higher R2, it is a consequence of the fact that, by chance, the outliers remained in the training set and one additional outlier (methanol) also appeared in the training set. This result indicates the greater sensitivity of KNN for compounds with known activity and those for which activity is to be predicted. Additionally, the final result depends on the choice of the correct numbers of neighbors. As this is not definable before modeling, it contributes to the ambiguity of the method. To assess the significance of the number of nearest neighbors, R2 and s obtained from the correlation between observed and predicted toxicity for the training set (in LOO procedure) and the test set (simulation of external prediction) were plotted against the number of neighbors (Figure 8). Although it is difficult to determine positively which is the best number of neighbors to be used for prediction (i.e., there were several maxima in R2 and, respectively, minima in s with very small differences between them), a good correspondence between training and test sets was observed. It was encouraging to register that the optimal solutions (with high R2 and low s) appeared at the same number of neighbors in both the training and the test sets. This optimization is enhanced slightly if a mean predicted value from several nearest neighbors, with similar performance, is considered instead of a single prediction.
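The search for a suitable number of neighbors, and the averaging of predictions over several similar choices of k, can be sketched as below. This is an illustrative Python sketch only: it assumes Euclidean distance on descriptors that are already on comparable scales and an unweighted mean of the neighbors' toxicities, neither of which is specified in this section, and all data values and function names are hypothetical.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn_predict(train_x, train_y, query, k):
    """Unweighted mean toxicity of the k training compounds nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def loo_knn(x, y, k):
    """Predict each compound from all the others (leave-one-out)."""
    return [knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
            for i in range(len(y))]

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop * sop / (soo * spp)

# Hypothetical two-descriptor data set; NOT values from the study.
descriptors = [(0.5, 1.0), (1.2, 0.8), (2.0, 0.2), (2.7, -0.5), (3.1, 0.5),
               (3.8, -1.0), (4.5, -0.3), (1.6, -0.8), (0.9, 0.1), (5.0, 0.6)]
toxicity = [-2.65, -2.00, -1.02, -0.46, 0.06, 0.66, 1.48, -1.39, -2.04, 1.64]

# Scan several values of k and record the LOO R2 for each.
k_values = (1, 2, 3, 4, 5)
loo_by_k = {k: loo_knn(descriptors, toxicity, k) for k in k_values}
r2_by_k = {k: r_squared(toxicity, loo_by_k[k]) for k in k_values}

# Average the predictions from the three best-performing values of k.
chosen = sorted(k_values, key=r2_by_k.get, reverse=True)[:3]
mean_pred = [sum(loo_by_k[k][i] for k in chosen) / len(chosen)
             for i in range(len(toxicity))]
r2_mean = r_squared(toxicity, mean_pred)

for k in k_values:
    print(f"k={k}: LOO R2={r2_by_k[k]:.3f}")
print(f"mean over k={sorted(chosen)}: LOO R2={r2_mean:.3f}")
```

In practice the same scan would be run on the training set only, with the selected values of k then applied unchanged to the external test set, mirroring the assumption stated in the text that the optimal number of neighbors is the same for both.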
Conclusions A rapid and economical 15 min algal toxicity test has been developed. It was utilized to measure the toxicity of 91 chemically and mechanistically heterogeneous compounds to the alga C. vulgaris. The test was validated by investigating the relationship with toxicity data to other species (T. pyriformis, V. fischeri, and P. promelas). It is concluded that the QAARs between Chlorella, Tetrahymena, and Pimephales can be used to fill toxicity data gaps, albeit with some restrictions in the applicability domain. A successful three descriptor QSAR model, accounting for hydrophobicity, electrophilicity, and a function of molecular size corrected for the presence of heteroatoms, was developed using MLR. Its predictivity was compared to that of a KNN algorithm developed on the same descriptors and toxicological data. Predictivity of the models was assessed by both LOO cross-validation and simulation of an external test set. MLR demonstrated greater stability in validation. Bearing in mind its transparency, portability, and ease of interpretation, it should be preferred, where possible, to the more robust but less transparent methods. The results of this study showed that method selection in QSAR is task-dependent and that there should be clear indications (e.g., inability of MLR to deal with the data set) supporting the need for more complicated, but less comprehensible, methods when such are favored for QSAR.
Acknowledgment. This work was supported in part by the European Union IMAGETOX Research Training Network (HPRN-CT-1999-00015) (T.I.N.) and by the School of Pharmacy and Chemistry, Liverpool John Moores University (A.D.P.W.).
References
(1) European Economic Community (1993) Commission Directive 93/67/EEC of 20 July 1993. Laying down the principles for assessment of risks to man and the environment of substances notified in accordance with Council Directive 67/548/EEC. Off. J. Eur. Communities L227, 1-9.
(2) Cronin, M. T. D., Jaworska, J. S., Walker, J. D., Comber, M. H. I., Watts, C. D., and Worth, A. P. (2003) Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Perspect. 111, 1391-1401.
(3) Cronin, M. T. D., Walker, J. D., Jaworska, J. S., Comber, M. H. I., Watts, C. D., and Worth, A. P. (2003) Use of QSARs in international decision-making frameworks to predict ecologic effects and environmental fate of chemical substances. Environ. Health Perspect. 111, 1376-1390.
(4) Jaworska, J. S., Comber, M., Auer, C., and van Leeuwen, C. J. (2003) Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ. Health Perspect. 111, 1358-1360.
(5) Worth, A. P., Cronin, M. T. D., and van Leeuwen, C. J. (2004) A framework for promoting the acceptance and regulatory use of (quantitative) structure-activity relationships. In Predicting Chemical Fate and Toxicity (Cronin, M. T. D., and Livingstone, D. J., Eds.) pp 429-440, CRC Press, Boca Raton, FL.
(6) Walker, J. D. (2003) Applications of QSARs in toxicology: a U.S. Government perspective. J. Mol. Struct. (THEOCHEM) 622, 167-184.
(7) Cronin, M. T. D., and Schultz, T. W. (2003) Pitfalls in QSAR. J. Mol. Struct. (THEOCHEM) 622, 39-51.
(8) Schultz, T. W., and Cronin, M. T. D. (2003) Essential and desirable characteristics of ecotoxicity quantitative structure-activity relationships. Environ. Toxicol. Chem. 22, 599-607.
(9) Klimisch, H.-J., Andreae, M., and Tillmann, U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol. 25, 1-5.
(10) Cronin, M. T. D. (2004) Toxicological information for use in predictive modelling: quality, sources, and databases. In Predictive Toxicology (Helma, C., Ed.) Marcel Dekker, New York.
(11) Russom, C. L., Bradbury, S. P., Broderius, S. J., Hammermeister, D. E., and Drummond, R. A. (1997) Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ. Toxicol. Chem. 16, 948-967.
(12) Cronin, M. T. D., Dearden, J. C., and Dobbs, A. J. (1991) QSAR studies of comparative toxicity in aquatic organisms. Sci. Total Environ. 109/110, 431-439.
(13) Worgan, A. D. P., Dearden, J. C., Edwards, R., Netzeva, T. I., and Cronin, M. T. D. (2003) Evaluation of a novel short-term algal toxicity assay by the development of QSARs and inter-species relationships for narcotic chemicals. QSAR Comb. Sci. 22, 204-209.
(14) Cronin, M. T. D., Dearden, J. C., Duffy, J. C., Edwards, R., Manga, N., Worth, A. P., and Worgan, A. D. P. (2002) The importance of hydrophobicity and electrophilicity descriptors in mechanistically based QSARs for toxicological endpoints. SAR QSAR Environ. Res. 13, 167-176.
(15) Netzeva, T. I., Worgan, A. D. P., Dearden, J. C., Edwards, R., and Cronin, M. T. D. (2004) Toxicological evaluation and QSAR modelling of aromatic amines to Chlorella vulgaris. Bull. Environ. Contam. Toxicol. Submitted for publication.
(16) Netzeva, T. I., Dearden, J. C., Edwards, R., Worgan, A. D. P., and Cronin, M. T. D. (2004) QSAR analysis of the toxicity of aromatic compounds to Chlorella vulgaris in a novel short-term assay. J. Chem. Inf. Comput. Sci. 44, 258-265.
(17) Cronin, M. T. D., Aptula, A. O., Duffy, J. C., Netzeva, T. I., Rowe, P. H., Valkova, I. V., and Schultz, T. W. (2002) Comparative assessment of methods to develop QSARs for the prediction of the toxicity of phenols to Tetrahymena pyriformis. Chemosphere 49, 1201-1221.
(18) Leszczynska, M., and Oleszkiewicz, J. A. (1996) Application of fluorescein diacetate hydrolysis as an acute toxicity test. Environ. Technol. 17, 79-85.
(19) Schultz, T. W. (1999) Structure-toxicity relationships for benzenes evaluated with Tetrahymena pyriformis. Chem. Res. Toxicol. 12, 1262-1267.
(20) Schultz, T. W., Cronin, M. T. D., Netzeva, T. I., and Aptula, A. O. (2002) Structure-toxicity relationships for aliphatic chemicals evaluated with Tetrahymena pyriformis. Chem. Res. Toxicol. 15, 1602-1609.
(21) Schultz, T. W., Netzeva, T. I., and Cronin, M. T. D. (2003) Selection of data sets for QSARs: analyses of Tetrahymena toxicity from aromatic compounds. SAR QSAR Environ. Res. 14, 59-81.
(22) Schultz, T. W. (1997) TETRATOX: The Tetrahymena pyriformis population growth impairment endpoint. A surrogate for fish lethality. Toxicol. Methods 7, 289-309.
(23) Kaiser, K. L. E., and Palabrica, V. S. (1991) Photobacterium phosphoreum toxicity data index. Water Pollut. Res. J. Can. 26, 361-431.
(24) Ribo, J. M., and Kaiser, K. L. E. (1987) Photobacterium phosphoreum toxicity bioassay, I. Test procedures and applications. Toxicol. Assess. 2, 305-323.
(25) Kaiser, K. L. E., and Ribo, J. M. (1988) Photobacterium phosphoreum toxicity bioassay, II. Toxicity data compilation. Toxicol. Assess. 3, 195-237.
(26) Cronin, M. T. D., Manga, N., Seward, J. R., Sinks, G. D., and Schultz, T. W. (2001) Parametrization of electrophilicity for the prediction of the toxicity of aromatic compounds. Chem. Res. Toxicol. 14, 1498-1505.
(27) Dimitrov, S. D., Mekenyan, O. G., Sinks, G. D., and Schultz, T. W. (2003) Global modelling of narcotic chemicals: ciliate and fish toxicity. J. Mol. Struct. (THEOCHEM) 622, 63-70.
(28) Lipnick, R. L. (1991) Outliers: their origin and use in the classification of molecular mechanisms of toxicity. Sci. Total Environ. 109, 131-153.
(29) Seward, J. R., Sinks, G. D., and Schultz, T. W. (2001) Reproducibility of toxicity across mode of toxic action in the Tetrahymena population growth impairment assay. Aquat. Toxicol. 53, 33-47.
(30) Cronin, M. T. D., and Schultz, T. W. (1997) Validation of Vibrio fisheri acute toxicity data: Mechanism of action-based QSARs for nonpolar narcotics and polar narcotic phenols. Sci. Total Environ. 204, 75-88.
(31) Cronin, M. T. D., and Schultz, T. W. (1998) Structure-toxicity relationships for three mechanisms of action of toxicity to Vibrio fischeri. Ecotoxicol. Environ. Saf. 39, 65-69.
(32) Cronin, M. T. D., and Schultz, T. W. (2001) Development of quantitative structure-activity relationships for the toxicity of aromatic compounds to Tetrahymena pyriformis: comparative assessment of the methodologies. Chem. Res. Toxicol. 14, 1284-1295.
(33) Basak, S. C., Gute, B. D., and Mills, D. (2002) Quantitative molecular similarity analysis (QMSA) methods for property estimation: a comparison of property-based, arbitrary, and tailored similarity spaces. SAR QSAR Environ. Res. 13, 727-742.
(34) Basak, S. C., Gute, B. D., Mills, D., and Hawkins, D. M. (2003) Quantitative molecular similarity methods in the property/toxicity estimation of chemicals: a comparison of arbitrary versus tailored similarity spaces. J. Mol. Struct. (THEOCHEM) 622, 127-145.
(35) Shen, M., LeTiran, A., Xiao, Y., Golbraikh, A., Kohn, H., and Tropsha, A. (2002) Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods. J. Med. Chem. 45, 2811-2823.
(36) Shen, M., Xiao, Y., Golbraikh, A., Gombar, V. K., and Tropsha, A. (2003) Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J. Med. Chem. 46, 3013-3020.
(37) Netzeva, T. I. (2004) Whole molecule, and atom based, topological descriptors. In Predicting Chemical Fate and Toxicity (Cronin, M. T. D., and Livingstone, D. J., Eds.) pp 61-83, CRC Press, Boca Raton, FL.
(38) Dearden, J. C., Bradburne, S. J. A., Cronin, M. T. D., and Solanki, P. (1988) The physical significance of molecular connectivity. In QSAR 88 - Proceedings of the Third International Workshop on Quantitative Structure-Activity Relationships in Environmental Toxicology (Turner, J. E., England, M. W., Schultz, T. W., and Kwaak, N. J., Eds.) pp 43-50, USDOE, Oak Ridge, TN.
(39) Hall, L. H., and Kier, L. B. (2001) Issues in representation of molecular structure. The development of molecular connectivity. J. Mol. Graphics Modell. 20, 4-18.