
Chem. Res. Toxicol. 1999, 12, 670-678

Prediction of Fathead Minnow Acute Toxicity of Organic Compounds from Molecular Structure

Donald V. Eldred,† Cara L. Weikel,† Peter C. Jurs,*,† and Klaus L. E. Kaiser‡

Department of Chemistry, 152 Davey Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802, and National Water Research Institute, Burlington, Ontario L7R 4A6, Canada

Received December 28, 1998

Interest in the prediction of toxicity without the use of experimental data is growing, and quantitative structure-activity relationship (QSAR) methods are valuable for such predictions. A QSAR study of acute aqueous toxicity of 375 diverse organic compounds has been developed using only calculated structural features as independent variables. Toxicity is expressed as -log(LD50) with the units -log(millimoles per liter) and ranges from -3 to 6. Multiple linear regression and computational neural networks (CNNs) are utilized for model building. The best model is a nonlinear CNN model based on eight calculated molecular structure descriptors. The root-mean-square log(LD50) errors for the training, cross-validation, and prediction sets of this CNN model are 0.71, 0.77, and 0.74 -log(mmol/L), respectively. These results are compared to a previous study with the same data set which included many more descriptors and used experimental data in the descriptor pool.

* To whom correspondence should be addressed. Telephone: (814) 865-3739. E-mail: [email protected].
† The Pennsylvania State University.
‡ National Water Research Institute.
1 Abbreviations: QSAR, quantitative structure-activity relationships; CNN, computational neural network; CPSA, charged partial surface area; GA, genetic algorithm; HOMO, highest occupied molecular orbital; LD50, lethal dose for 50% of the population; LUMO, lowest unoccupied molecular orbital; MOPAC, molecular orbital package; rms, root-mean-square.

Introduction

Each year, many novel products are synthesized for commercial and industrial use. These new products must not only succeed in accomplishing their designed task but also meet a battery of physical and activity requirements. In particular, such compounds should have acceptable toxicity values. However, experimental determination of toxicity can be expensive and time-consuming, and is frequently poorly received by the public. Given the quantity of substances to be characterized and the difficulties encountered in testing, there is considerable interest in the prediction of toxicological activities from molecular structure without the use of measured experimental variables.

Quantitative structure-activity relationships (QSARs)1 are useful for such predictions. QSARs seek connections directly linking molecular structure and biological activity. The relationships are established by numerically encoding structural information for a group of molecules with known activity values (the training set) and then manipulating this information to determine a relationship to the activity of interest. The method is inductive, elucidating relationships embedded within a training set of compounds. A successful QSAR correlates structural features to the target activity value with minimal error and is statistically reviewed and validated. After a relationship is refined, interpolation can be used to predict the activities of compounds not included in the training set. These methods have provided excellent statistical correlations in the case of physicochemical properties such as aqueous solubility (1), autoignition temperatures (2), boiling points (3, 4), ion mobility constants (5), vapor pressure (6), and other properties. Studies have also appeared that deal with biological activities, including acute toxicity (7, 8) and human intestinal absorption (9).

Successful QSAR models for toxicity have been developed which include correlations with octanol-water partition coefficients (log P) (10-16). The most obvious successes in correlating log P to toxicities have come from studies of aquatic toxicity. In such studies, log P operates as an indicator for how toxin-saturated an aqueous environment can become. Other studies have successfully linked toxicity to both structural and experimental information (8, 11, 17-19). While these models provide good estimates of toxicity, they require experimentally measured variables with their corresponding time and expense.

This paper describes a QSAR study for predicting acute fathead minnow toxicity values of a diverse set of simple organic compounds using only calculated structural features. Acute aquatic toxicity is presented here as the concentration in millimoles per liter at which half of the population of fathead minnows died after exposure for 96 h (96 h LD50). Other published work with this database provides points of comparison for this work (11, 19, 20). For example, Kaiser and co-workers (19) developed a regression model that used as independent variables two experimental parameters, log(MW), 17 formula descriptors, and 31 functional group descriptors. Their model had a standard error of 0.701 log unit. By contrast, the work presented here develops models based on whole-molecule calculated descriptors and no experimental variables.
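The toxicity scale used throughout (the negative logarithm of the 96 h lethal concentration in millimoles per liter, labeled pT in Table 1) can be illustrated with a minimal sketch. The function name `to_pT` and the example numbers are hypothetical and are not taken from the paper; the sketch only assumes that a measured concentration is available in mg/L together with a molecular weight in g/mol.

```python
import math


def to_pT(lc50_mg_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert a lethal concentration in mg/L to -log10(mmol/L).

    Hypothetical helper for illustration; not part of the ADAPT software.
    """
    if lc50_mg_per_l <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("concentration and molecular weight must be positive")
    # mg/L divided by g/mol gives mmol/L
    mmol_per_l = lc50_mg_per_l / mol_weight_g_per_mol
    return -math.log10(mmol_per_l)


# Hypothetical compound: 1000 mg/L at 100 g/mol is 10 mmol/L, i.e. pT = -1
print(to_pT(1000.0, 100.0))
```

On this scale a larger value means a more toxic compound, which is why weakly toxic solvents sit at the negative end of the range.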

10.1021/tx980273w CCC: $18.00 © 1999 American Chemical Society. Published on Web 06/17/1999

Experimental Section

The data for this study came from the COMPUTOX (1995) toxicity database (21). The original data set contained 419 compounds. A subset of 375 compounds was selected which met the requirements of the ADAPT software package for this study. Toxicity values were translated to -log(millimoles per liter) to reduce the range of the data. The least toxicologically active compound was methanol with an activity of -2.95 log units, and the most toxicologically active compound was fenvalerate with an activity of 6.00 log units.

From the 375-member data set, 88 compounds were randomly selected and placed into two separate groups. For multiple linear regression modeling, each group was treated as an external prediction set. Later, for CNN modeling, one group served as the cross-validation set and the other as the prediction set. The remaining 287 compounds were assigned as training set members. All feature selection, regression, and CNN training were performed exclusively on the training set. A list of the 375 compounds used in this study is presented in Table 1. These compounds were studied as a whole; no subsetting of the data set was necessary to achieve good toxicity models.

The ADAPT software package used for this work has been developed by the Jurs research group and is available to other researchers. The system was designed and implemented to handle ordinary organic compounds, and it supports atom types C, H, O, N, and P, halogens, and some sulfur compounds. The system handles only neutral, complete molecules; it does not handle charged species, salts, or disconnected partial structures. Each compound can have, at maximum, 46 non-H atoms, 100 H atoms, and 51 bonds between non-H atoms. This work was carried out on a DEC Alpha workstation running under Unix.

Structure Entry and Modeling. The compounds were entered by sketching their structures using Hyperchem Lite (Hypercube, Inc., Waterloo, ON). The resulting connection tables were then submitted to MOPAC for geometric optimization using the PM3 Hamiltonian (22, 23).
The resulting geometries were examined to ensure that reasonable conformations were developed. The resulting geometries were then used to calculate geometrical descriptors. The structures were submitted to MOPAC for optimization once again using the AM1 Hamiltonian, which is known to give better electronic values (23). Once again, each structure was examined to ensure that a reasonable conformation had been achieved. From the new configurations, electronic descriptors were then calculated and stored.

Descriptor Generation. Topological descriptors were then generated from connection table information. These descriptors ranged from atom counts to electrotopological state indices (24). To address spatial and steric features of the molecules, geometric information was encoded using descriptors such as surface area, moments of inertia, and cross-sectional areas resulting from planar profiles of the molecules (25). The surface area used is the solvent-accessible surface area. Electronic features for each molecule were captured by calculating descriptors such as the charge on the most negative or positive atoms and HOMO and LUMO energies. Finally, hybrids of the electronic and geometric information were calculated as charged partial surface area (CPSA) descriptors (26). Examples of CPSA descriptors range from the partial negative surface area to the difference between partial positive and negative surface areas. All the descriptors were calculated with descriptor development routines that are part of the ADAPT software system. In all, 242 descriptors were calculated for each compound in the data set.

Descriptor Reduction. The informational diversity captured by these descriptors was then maximized by objective feature selection. Featureless descriptors, or descriptors which exhibited >90% redundancy in their respective response values, were removed from the descriptor pool.
Further reduction of the descriptor pool was attained by removing descriptors which were pairwise correlated with an r of >0.90. These reductions resulted in a reduced pool of 123 descriptors. To minimize the risk of chance correlations, Topliss and Edwards have suggested that the ratio of descriptors to observations be less than 60% (27). Here, the 123-member descriptor pool with 287 observations in the training set easily met this criterion. Subjective feature selection was then pursued by using algorithms combining simulated annealing or the genetic algorithm with fitness evaluation by regression or computational neural networks, as described in the next section.

Chem. Res. Toxicol., Vol. 12, No. 7, 1999

Model Building. With the reduced pool established, model generation was then performed. Subsets of the descriptor space were explored using a simulated annealing (SA) directed search algorithm for potential models (28). The quality of each subset of descriptors selected by the SA algorithm was judged by the root-mean-square (rms) error of an ordinary multiple linear regression model developed for that subset. The best potential models resulting from this directed search were validated by evaluating the descriptor T values and the model's R value and by checking for multicollinearities. Finally, the two external subsets of observations were submitted to the final regression equation, and their resulting rms errors were analyzed. The subset of descriptors which yielded both good statistical validation and good external prediction validation was retained as the type I model. Type I linear models have the form

$$-\log(\text{millimoles per liter}) = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$$

where $a_j$ is an adjustable parameter and $X_j$ is a descriptor value.

The descriptors from the type I model were then submitted to a three-layer, fully connected, feed-forward computational neural network (CNN). For QSAR studies, CNNs can be thought of as highly nonlinear fitting routines. The application of the CNN to the descriptors found in the type I case resulted in a new nonlinear model (type II). Generally, it has been observed that this application of a CNN to type I descriptors yields an improvement in model precision. Improvements gained in this fashion are mainly attributed to the CNN's ability to address nonlinear relationships and also to an increase in the number of adjustable parameters. For a three-layer, feed-forward, fully connected CNN with i input neurons, h hidden-layer neurons, and one output neuron, the number of interconnections (adjustable weights), P, is given by the equation $P = ih + 2h + 1$. Each interconnection has an associated weighting factor that is altered during training of the CNN. The CNN used in this work utilizes a quasi-Newton (BFGS) (29-33) method to optimize the weights (34).

To avoid overtraining of the CNN, one of the two external subsets was selected as a cross-validation set and its rms error was periodically monitored during training. The point at which the external cross-validation set error reaches its minimum is used to identify the point at which the CNN has completed its learning of general characteristics. Overtraining results from the learning of idiosyncratic information of the training set compounds by the CNN, that is, memorizing the training set compounds. Once network training was completed, the remaining external subset, the prediction set, was then submitted to the final CNN model. Agreement in rms error between the training, cross-validation, and prediction sets is used as one indicator of model validation.
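As a quick check on the parameter count P = ih + 2h + 1 given above, the following sketch tallies the adjustable parameters of a three-layer network term by term. The breakdown into connection weights and bias terms is the usual reading of this formula and is an assumption here, as are the function name and the hidden-layer sizes tried; the text above does not state the value of h that was used.

```python
def cnn_adjustable_parameters(n_inputs: int, n_hidden: int) -> int:
    """Count adjustable parameters of an i-h-1 fully connected network.

    Assumed breakdown (not spelled out in the text):
      i*h  input-to-hidden weights
      h    hidden-layer bias terms
      h    hidden-to-output weights
      1    output bias term
    which sums to i*h + 2*h + 1.
    """
    if n_inputs < 1 or n_hidden < 1:
        raise ValueError("layer sizes must be positive")
    input_to_hidden = n_inputs * n_hidden
    hidden_biases = n_hidden
    hidden_to_output = n_hidden
    output_bias = 1
    return input_to_hidden + hidden_biases + hidden_to_output + output_bias


# Eight input descriptors, 287 training compounds; hidden sizes are hypothetical.
for h in (2, 3, 4, 5, 6):
    p = cnn_adjustable_parameters(8, h)
    print(f"h={h}: P={p}, training observations per parameter={287 / p:.1f}")
```

Watching the ratio of training observations to P is one simple way to judge how much freedom a given architecture has relative to the size of the training set.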
Finally, using CNN network features (e.g., architecture) optimized during type II modeling, a genetic algorithm (GA) was employed to search the reduced descriptor pool. The cost function of the GA is established by evaluating the rms error resulting from CNN training for each of the members of the parent populations. The best potential models resulting from this directed search were then examined more closely and again validated through application of the external prediction set in the same fashion as before. Models found utilizing this nonlinear feature selection, coupled with the nonlinear CNNs, are labeled as type III models. Such models typically provide the best performance in QSAR studies such as this one.
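The objective feature selection step described under Descriptor Reduction can be sketched in a few lines. This is an illustrative reading, not the ADAPT implementation: the function names are hypothetical, "redundancy" is interpreted as the fraction of compounds sharing the most common descriptor value, and the rule for which member of a correlated pair is kept (the one encountered first) is an assumption, since the text does not specify it.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Plain Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def objective_feature_selection(descriptors, max_identical=0.90, max_abs_r=0.90):
    """Return the names of descriptors that survive both filters.

    descriptors: dict mapping descriptor name -> list of values (one per compound).
    """
    kept = []
    for name, values in descriptors.items():
        # Filter 1: drop descriptors whose most common value covers >90% of compounds
        most_common_count = Counter(values).most_common(1)[0][1]
        if most_common_count / len(values) > max_identical:
            continue
        # Filter 2: drop descriptors correlated (|r| > 0.90) with one already kept
        if any(abs(pearson_r(values, descriptors[k])) > max_abs_r for k in kept):
            continue
        kept.append(name)
    return kept


# Toy pool with made-up values: "b" duplicates "a", "c" is nearly constant.
toy_pool = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "b": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
    "c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "d": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8],
}
print(objective_feature_selection(toy_pool))
```

Because neither filter looks at the toxicity values, this step can be run once on the training set before any model fitting.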

Results and Discussion

Type I Model Construction. The descriptor space was thoroughly explored by multiple searches using simulated annealing with multiple linear regression modeling. Models with varying numbers of descriptors


Table 1. Common Name, CAS Registry Number, Experimental Toxicity, and Calculated Toxicity from the Type III Model

no. | compound | CAS-RN | pT expt | pT calc^a
1 | 2,4,5-trichlorophenol | 95-95-4 | 1.86 | 1.58
2 | benzyl chloride | 100-44-7 | 1.40 | 0.86
3 | N,N-dimethylaniline^b | 121-69-7 | 0.19 | 0.39
4 | N,N-dimethylbenzylamine | 103-83-3 | 0.55 | 0.28
5 | Benomyl | 17804-35-2 | 2.12 | 1.85
6 | N-allylaniline | 589-09-3 | 0.57 | 1.20
7 | pentachloroethane | 76-01-7 | 1.44 | 1.57
8 | 2-dodecanone^b | 6175-49-1 | 2.19 | 1.61
9 | cyclohexanone | 108-94-1 | -0.81 | -0.12
10 | oleic acid^b | 112-80-1 | 0.14 | 1.82
11 | n-heptylamine | 111-68-2 | 0.72 | 0.89
12 | salicylanilide^c | 87-17-2 | 1.73 | 0.97
13 | acetone | 67-64-1 | -2.15 | -0.45
14 | 3,5-dibromo-4-hydroxybenzonitrile | 1689-84-5 | 1.35 | 0.60
15 | Chlordane | 57-74-9 | 3.55 | 4.67
16 | resorcinol | 108-46-3 | 0.04 | 1.40
17 | 1,4-dichlorobenzene | 106-46-7 | 1.62 | 1.56
18 | N-phenyldiethanolamine | 120-07-0 | -0.61 | 0.69
19 | n-butyl acetate | 123-86-4 | 0.81 | 0.03
20 | 1-bromooctane | 111-83-1 | 2.36 | 2.47
21 | phenyl salicylate | 118-55-8 | 2.26 | 1.76
22 | 1,3-dichloro-1-propene | 542-75-6 | 2.67 | 1.06
23 | dodecylamine^c | 124-22-1 | 3.26 | 2.25
24 | 3,6-dithiaoctane | 5395-75-5 | 0.40 | 0.74
25 | di-n-butyl ether^b | 142-96-1 | 0.60 | 0.53
26 | diethanolamine^c | 111-42-2 | -1.65 | -0.98
27 | Amytal | 57-43-2 | 0.42 | 1.06
28 | 3,4-dimethylphenol | 95-65-8 | 0.94 | 0.78
29 | Methomyl | 16752-77-5 | 1.89 | 1.41
30 | 3-methyl-2-butanone^c | 563-80-4 | -1.00 | -0.31
31 | 1,2,3,4-tetrachlorobenzene^c | 634-66-2 | 2.43 | 2.59
32 | catechol | 120-80-9 | 1.08 | 1.28
33 | 1,5-dichloropentane | 628-76-2 | 0.75 | 1.45
34 | p,p′-DDT | 50-29-3 | 4.46 | 4.68
35 | 2,4-dinitroaniline | 97-02-9 | 1.07 | 1.58
36 | 2-chloroaniline | 95-51-2 | 1.34 | 0.14
37 | 1-heptanol^c | 111-70-6 | 0.53 | 0.88
38 | 4-nitrophenol | 100-02-7 | 0.57 | 0.37
39 | 1-chloro-2-nitrobenzene | 88-73-3 | 0.73 | 0.69
40 | 2-chloro-4-nitroaniline | 121-87-9 | 0.96 | 0.97
41 | m-cresol | 108-39-4 | 0.29 | 0.46
42 | 2-(dimethylamino)pyridine | 5683-33-0 | -0.02 | -0.62
43 | tert-butylstyrene | 1746-23-2 | 2.51 | 1.94
44 | 4,6-dimethoxy-2-hydroxybenzaldehyde^b | 708-76-9 | 1.83 | 1.17
45 | hexyl acetate^b | 142-92-7 | 1.52 | 1.05
46 | propionitrile^b | 107-12-0 | -1.44 | -0.33
47 | 1,2-diaminopropane | 78-90-0 | -1.13 | -0.51
48 | styrene | 100-42-5 | 1.41 | 0.75
49 | 2,4-dichlorotoluene | 95-73-8 | 1.54 | 1.55
50 | benzaldehyde^c | 100-52-7 | 1.14 | 0.55
51 | Diuron | 330-54-1 | 1.22 | 1.35
52 | 2,2,2-trifluoroethanol^c | 75-89-8 | -0.08 | -1.07
53 | 1,2-dichloroethane^c | 107-06-2 | -0.14 | 0.46
54 | ethylenediamine | 107-15-3 | -0.28 | -0.53
55 | Diflubenzuron | 35367-38-5 | 2.86 | 3.16
56 | 1,4-dicyanobutane | 111-69-3 | -1.25 | 0.52
57 | 2,6-dimethylphenol^b | 576-26-1 | 0.74 | 0.61
58 | n-octyl cyanide | 2243-27-8 | 1.42 | 1.37
59 | Trifluralin | 1582-09-8 | 3.50 | 3.28
60 | naphthalene | 91-20-3 | 1.32 | 1.27
61 | xanthone | 90-47-1 | 1.73 | 2.16
62 | trichloroacetic acid (TCA)^b | 76-03-9 | -1.09 | 0.72
63 | 1,4-bis(chloromethyl)benzene | 623-25-6 | 2.65 | 2.24
64 | methyl methacrylate | 80-62-6 | -0.41 | 0.44
65 | 2-chloro-4-methylaniline | 615-65-6 | 0.60 | 0.34
66 | 2,2,2-trichloroethanol | 115-20-8 | -0.30 | 0.12
67 | Carbaryl^b | 63-25-2 | 1.36 | 1.50
68 | 2,3,5,6-tetrachloroaniline | 3481-20-7 | 2.92 | 2.52
69 | 3-chlorotoluene | 108-41-8 | 0.84 | 0.96
70 | 2,4-D | 94-75-7 | -0.08 | 2.49
71 | 3,4-dichlorotoluene | 95-75-0 | 1.74 | 1.43
72 | n-butyl phenyl ether | 1126-79-0 | 1.42 | 0.94
73 | 3-bromo-6-hydroxybenzaldehyde | 1761-61-1 | 2.19 | 1.45
74 | allyl cyanide | 109-75-1 | -0.43 | -0.18
75 | 1,4-dioxane^c | 123-91-1 | -2.05 | -2.78
76 | 1,3-dichloro-4,6-dinitrobenzene | 3698-83-7 | 3.77 | 2.51
77 | p-cresol | 106-44-5 | 0.58 | 0.60
78 | 1,1,2,2-tetrachloroethane | 79-34-5 | 0.92 | 1.15
79 | diethyl adipate | 141-28-6 | 1.05 | 2.03
80 | 2-methyl-1-propanol | 78-83-1 | -1.30 | -1.69
81 | methanol | 67-56-1 | -2.95 | -1.86
82 | Resmethrin | 10453-86-8 | 4.74 | 4.44
83 | 2-chloroethanol | 107-07-3 | 0.34 | -0.41
84 | 4,4′-isopropylidenebis(2,6-dichlorophenol) | 79-95-8 | 2.44 | 3.49
85 | tridecylamine^b | 2869-34-3 | 3.48 | 2.12
86 | 1,3-dichlorobenzene | 541-73-1 | 1.30 | 1.46
87 | malononitrile | 109-77-3 | 2.07 | 0.10
88 | 2-allylphenol | 1745-81-9 | 0.95 | 0.54
89 | 1-nitronaphthalene | 86-57-7 | 1.28 | 1.28
90 | pyrrole | 109-97-7 | -0.50 | 0.35
91 | caffeine | 58-08-2 | 0.11 | 1.04
92 | 4-ethoxybenzaldehyde | 10031-82-0 | 0.73 | 1.15
93 | 2,3-dibromopropanol | 96-13-9 | 0.49 | 0.64
94 | n-decylamine | 2016-57-1 | 2.18 | 2.20
95 | 3-pentanone | 96-22-0 | -1.25 | -0.19
96 | ethyl acetate^b | 141-78-6 | -0.42 | -0.77
97 | 2-methoxyethylamine | 109-85-3 | -0.84 | -1.38
98 | 4-chlorophenol | 106-48-9 | 1.32 | 1.04
99 | 4-picoline | 108-89-4 | -0.64 | -0.37
100 | 2,4-pentanedione | 123-54-6 | 0.02 | 0.73
101 | 4-hexyloxyaniline^c | 39905-57-2 | 1.78 | 2.49
102 | 1,3-dichloropropane | 142-28-9 | -0.06 | 0.65
103 | 3,5-dinitroaniline | 618-87-1 | 0.93 | 1.66
104 | trichloroethylene | 79-01-6 | 0.47 | 1.08
105 | bromobenzene | 108-86-1 | 0.89 | 1.07
106 | benzene | 71-43-2 | 0.62 | 0.54
107 | acridine | 260-94-6 | 1.89 | 2.06
108 | 4-dimethylaminocinnamaldehyde | 6203-18-5 | 1.42 | 1.50
109 | tert-butyl acetate | 540-88-5 | -0.45 | -0.66
110 | DL-1-aminopropan-2-ol | 78-96-6 | -0.64 | -1.39
111 | benzyl alcohol | 100-51-6 | -0.63 | 0.53
112 | 2-decanone | 693-54-9 | 1.44 | 1.40
113 | ethylene glycol^c | 107-21-1 | -2.93 | -2.85
114 | ethyl hexanoate | 123-66-0 | 1.21 | 0.88
115 | 4-nitrobenzaldehyde | 555-16-8 | 1.18 | 1.13
116 | N,N-dimethylformamide | 68-12-2 | -2.16 | -1.11
117 | 4-chloro-3-methylphenol | 59-50-7 | 1.40 | 0.95
118 | N,N′-diphenylthiourea | 102-08-9 | 1.08 | 2.72
119 | 2-octanone | 111-13-7 | 0.54 | 0.82
120 | acetonitrile | 75-05-8 | -1.61 | -0.52
121 | 4-nitrobenzamide | 619-80-7 | 0.10 | 1.10
122 | 1-dodecanol^c | 112-53-8 | 2.27 | 2.95
123 | phthalazine | 253-52-1 | 0.11 | 0.67
124 | 5-ethyl-2-methylpyridine | 104-90-5 | 0.17 | 0.01
125 | Dicofol (Kelthane) | 115-32-2 | 2.79 | 3.15
126 | 4-cyanonitrobenzene | 619-72-7 | 0.78 | 0.64
127 | 1,3-dinitrobenzene^b | 99-65-0 | 1.38 | 1.72
128 | 4-phenylazophenol | 1689-82-3 | 2.26 | 2.57
129 | 1-methylnaphthalene^b | 90-12-0 | 1.20 | 1.20
130 | aniline | 62-53-3 | 0.00 | -0.18
131 | ethyl carbamate | 51-79-6 | -1.77 | -0.87
132 | 2,3-dinitrotoluene^c | 602-01-7 | 2.01 | 1.54
133 | butyl butyrate | 109-21-7 | 1.10 | 0.84
134 | 2-fluorobenzaldehyde^c | 446-52-6 | 1.96 | 0.53
135 | 4-methyl-2-pentanone | 108-10-1 | -0.71 | -0.08
136 | 2,6-di-tert-butyl-4-methylphenol^b | 128-37-0 | 2.78 | 1.99
137 | isopimaric acid | 5835-26-7 | 2.54 | 1.97
138 | Fenvalerate | 51630-58-1 | 6.00 | 4.19
139 | benzonitrile | 100-47-0 | 0.21 | 0.77
140 | bis(2-hydroxyethyl)ether | 111-46-6 | -2.85 | -2.95


Table 1 (Continued)

no. | compound | CAS-RN | pT expt | pT calc^a
141 | 1,1,1-trichloro-2-methyl-2-propanol^c | 57-15-8 | 0.12 | 0.26
142 | chlorobenzene^c | 108-90-7 | 0.70 | 0.89
143 | n-octylamine | 111-86-4 | 1.40 | 1.43
144 | 1,4-dinitrobenzene | 100-25-4 | 2.22 | 1.77
145 | 4-chlorotoluene | 106-43-4 | 1.33 | 0.98
146 | Bromacil^b | 314-40-9 | 0.15 | 0.90
147 | 4-tert-butylphenol | 98-54-4 | 1.47 | 1.02
148 | Alachlor (Lasso) | 15972-60-8 | 1.73 | 2.03
149 | 2-picoline^b | 109-06-8 | -0.98 | -0.36
150 | 2,4,6-trichlorophenol | 88-06-2 | 1.85 | 1.48
151 | 1-propanol | 71-23-8 | -1.88 | -1.73
152 | 4-toluidine | 106-49-0 | -0.14 | -0.07
153 | 2-methyl-2,4-pentanediol | 107-41-5 | -1.96 | -2.24
154 | tert-butyl methyl ether (MTBE) | 1634-04-4 | -0.88 | -1.10
155 | 1-fluoro-4-nitrobenzene^c | 350-46-9 | 0.70 | 0.63
156 | o-xylene | 95-47-6 | 0.81 | 0.50
157 | 2,3-benzofuran | 271-89-6 | 0.93 | 1.15
158 | 4-amino-2-nitrophenol | 119-34-6 | 0.65 | 0.35
159 | N-vinylcarbazole | 1484-13-5 | 4.78 | 4.32
160 | N,N-dibutylformamide | 761-65-9 | 0.25 | -0.03
161 | allyl methacrylate | 96-05-9 | 2.11 | 1.44
162 | 2-butanone oxime | 96-29-7 | -0.99 | -0.58
163 | 2,3,6-trimethylphenol | 2416-94-6 | 1.22 | 1.01
164 | glyoxal^b | 107-22-2 | -0.57 | 0.67
165 | 3-pyridinecarboxaldehyde | 500-22-1 | 0.81 | 0.25
166 | acrylonitrile | 107-13-1 | 0.47 | 0.69
167 | 4-fluoroaniline^c | 371-40-4 | 0.82 | -0.07
168 | acenaphthene | 83-32-9 | 1.95 | 1.61
169 | 2-methyl-2-propanol | 75-65-0 | -1.94 | -1.65
170 | 4-acetamidophenol | 103-90-2 | -0.73 | 0.66
171 | 3,4-dichloro-1-butene | 760-23-6 | 1.25 | 1.00
172 | cyclohexanol | 108-93-0 | -0.85 | -1.30
173 | diphenyl phthalate | 84-62-8 | 3.60 | 3.48
174 | 2-butanol | 15892-23-6 | -1.41 | -1.56
175 | 4-acetylpyridine^b | 1122-54-9 | -0.14 | 0.14
176 | 3,4-dichloroaniline^b | 95-76-1 | 1.36 | 0.85
177 | 1,2-dinitrobenzene | 528-29-0 | 2.45 | 1.46
178 | 3-pyridinepropanol | 2859-67-8 | -0.04 | 0.30
179 | 2,4-dinitrotoluene^c | 121-14-2 | 0.75 | 1.73
180 | chloroacetonitrile | 107-14-2 | 1.75 | 0.07
181 | n-butylamine^c | 109-73-9 | -0.56 | -0.36
182 | 1,2,3-trichlorobenzene^c | 87-61-6 | 1.89 | 1.80
183 | 4-isopropylbenzaldehyde | 122-03-2 | 1.35 | 0.73
184 | m-xylene | 108-38-3 | 0.82 | 0.51
185 | dimethyl phthalate (DMP)^c | 131-11-3 | 0.21 | 1.43
186 | dimethyl sulfoxide | 67-68-5 | -2.64 | -0.54
187 | isopropylbenzene (cumene) | 98-82-8 | 1.28 | 0.46
188 | Carbofuran | 1563-66-2 | 2.42 | 1.78
189 | 2,6-dinitrophenol | 573-56-8 | 0.67 | 1.64
190 | dichloromethane | 75-09-2 | -0.56 | 0.33
191 | di-n-butyl phthalate^b | 84-74-2 | 2.40 | 2.26
192 | diethylamine | 109-89-7 | -1.07 | -0.56
193 | 4-phenylpyridine | 939-23-1 | 0.98 | 0.89
194 | hexanal | 66-25-1 | 0.75 | 0.32
195 | diethyl sebacate^b | 110-40-7 | 1.98 | 2.32
196 | methyl 4-nitrobenzoate^b | 619-50-1 | 0.89 | 1.45
197 | 5-methyl-2-hexanone | 110-12-3 | -0.14 | 0.21
198 | ethylbenzene^c | 100-41-4 | 0.40 | 0.42
199 | benzoyl chloride | 98-88-4 | 0.62 | 0.87
200 | hydrogen cyanide^b | 74-90-8 | 2.30 | 0.60
201 | epichlorohydrin^b | 106-89-8 | 0.86 | -0.40
202 | 2-tolunitrile | 529-19-1 | 0.42 | 0.37
203 | 1,2,3-trichloropropane | 96-18-4 | 0.35 | 0.97
204 | 2-methyl-3-nitroaniline^c | 603-83-8 | 0.48 | 0.25
205 | Dieldrin | 60-57-1 | 4.33 | 3.49
206 | Hexachlorophene | 70-30-4 | 4.29 | 3.58
207 | Permethrin^b | 52645-53-1 | 4.39 | 4.57
208 | 2,3,4-trichloroaniline^b | 634-67-3 | 1.74 | 1.51
209 | p-xylene | 106-42-3 | 1.21 | 0.54
210 | 2-sec-butyl-4,6-dinitrophenol | 88-85-7 | 2.54 | 2.00
211 | Aldicarb | 116-06-3 | 3.83 | 1.13
212 | 4-benzoylpyridine | 14548-46-0 | 0.25 | 1.25
213 | formaldehyde | 50-00-0 | 0.10 | 0.13
214 | benzamide | 55-21-0 | -0.74 | 0.04
215 | 5-nonanone^c | 502-56-7 | 0.66 | 1.10
216 | 1-pentanol | 71-41-0 | -0.73 | -0.80
217 | dibutyl adipate | 105-99-7 | 1.85 | 2.23
218 | 2-methyl-6-nitroaniline^b | 570-24-1 | 0.80 | 0.06
219 | dehydroabietic acid | 1740-19-8 | 2.16 | 2.55
220 | 3,4-dinitrotoluene | 610-39-9 | 2.08 | 1.52
221 | dimethyl malonate | 108-59-8 | 1.03 | 0.79
222 | nitrobenzene | 98-95-3 | 0.01 | 0.10
223 | benzothiazole | 95-16-9 | 0.33 | 1.20
224 | ethanol^b | 64-17-5 | -2.51 | -1.94
225 | 2,3,4,5-tetrachlorophenol | 4901-51-3 | 2.72 | 1.90
226 | chloroform^c | 67-66-3 | 0.23 | 0.67
227 | 2-nitrophenol | 88-75-5 | 0.00 | 0.06
228 | 1-octanol^c | 111-87-5 | 1.00 | 1.76
229 | 1-ethynyl-1-cyclohexanol | 78-27-3 | -0.31 | 0.43
230 | 1,2,4,5-tetrachlorobenzene | 95-94-3 | 2.85 | 2.89
231 | 2,6-dinitrotoluene | 606-20-2 | 0.99 | 1.61
232 | 5-chloro-2-pyridinol | 4214-79-3 | -0.94 | 0.51
233 | 3-(trifluoromethyl)benzonitrile | 368-77-4 | 0.56 | 0.65
234 | methyl 4-cyanobenzoate | 1129-35-7 | 0.54 | 0.69
235 | abietic acid | 514-10-3 | 2.10 | 2.00
236 | 1,2-dichloropropane | 78-87-5 | -0.05 | 0.41
237 | 4-chlorobenzaldehyde | 104-88-1 | 1.81 | 1.03
238 | N-methylaniline^c | 100-61-8 | 0.03 | -0.06
239 | 2-butanone | 78-93-3 | -1.65 | -0.34
240 | pentabromophenol^c | 608-71-9 | 3.72 | 2.95
241 | acetic acid | 64-19-7 | -0.12 | -0.44
242 | 2-propanol | 67-63-0 | -2.21 | -1.84
243 | 2-ethoxyethyl acetate | 111-15-9 | 0.50 | 0.09
244 | 1-decanol | 112-30-1 | 1.84 | 2.86
245 | 1-chloro-2-propanol | 127-00-4 | -0.41 | -0.34
246 | dimethoxymethane^b | 109-87-5 | -1.96 | -2.22
247 | n-hexane^b | 110-54-3 | 1.54 | -0.10
248 | toluene | 108-88-3 | 0.43 | 0.39
249 | acrolein | 107-02-8 | 3.53 | 0.70
250 | 1-naphthol | 90-15-3 | 1.53 | 1.33
251 | γ-hexachlorocyclohexane | 58-89-9 | 3.52 | 3.54
252 | 2,4-dimethylphenol | 105-67-9 | 0.86 | 0.78
253 | 1,9-decadiene | 1647-16-1 | 2.68 | 2.51
254 | 2-bromo-2′,5′-dimethoxyacetophenone | 1204-21-3 | 3.69 | 2.35
255 | 2-cyanopyridine | 100-70-9 | -0.84 | -0.60
256 | 1-octyn-3-ol^c | 818-72-4 | 2.49 | 1.01
257 | 2,5-dimethyl-2,4-hexadiene | 764-13-6 | 1.47 | 0.88
258 | 3-(trifluoromethyl)-4-nitrophenol | 88-30-2 | 1.36 | 1.33
259 | diphenyl ether | 101-84-8 | 1.63 | 1.62
260 | 1,1,3,3-tetramethylbutylamine | 107-45-9 | 0.72 | 0.04
261 | 2-chloro-6-fluorobenzaldehyde | 387-45-1 | 1.23 | 1.16
262 | 1,2-dichlorobenzene | 95-50-1 | 1.40 | 1.26
263 | 2,4,6-trinitrotoluene^b | 118-96-7 | 1.88 | 2.39
264 | 2-acetamidophenol | 614-80-2 | 0.73 | 0.44
265 | phenol | 108-95-2 | 0.51 | 0.88
266 | N-ethyl-m-toluidine | 102-27-2 | 0.44 | 0.39
267 | 2,4,6-tribromophenol | 118-79-6 | 1.70 | 2.04
268 | diethyl phthalate (DEP) | 84-66-2 | 0.84 | 1.76
269 | 3-acetamidophenol | 621-42-1 | -0.87 | 0.62
270 | ethyl benzoate | 93-89-0 | 1.14 | 0.82
271 | n-propylamine | 107-10-8 | -0.72 | -0.59
272 | 2,6-diisopropylaniline | 24544-04-5 | 1.03 | 0.67
273 | carbon tetrachloride | 56-23-5 | 0.55 | 1.06
274 | 6-methyl-5-hepten-2-one^b | 110-93-0 | 0.17 | 0.94
275 | 1,2,4-trichlorobenzene | 120-82-1 | 1.78 | 2.11
276 | N,N-diethylacetamide | 685-91-6 | -1.11 | -0.83
277 | (R)-(+)-limonene^b | 5989-27-5 | 2.28 | 1.38
278 | n-hexylamine | 111-26-2 | 0.25 | 0.39
279 | ethyl acrylate | 140-88-5 | 1.60 | 0.54
280 | benzophenone | 119-61-9 | 1.08 | 0.93


Table 1 (Continued)

no. | compound | CAS-RN | pT expt | pT calc^a
281 | 2-hydroxybenzaldehyde^b | 90-02-8 | 1.73 | 1.08
282 | 4-phenoxyphenol | 831-82-3 | 1.58 | 1.76
283 | allyl alcohol | 107-18-6 | 2.26 | 0.15
284 | 4,6-dinitro-2-cresol | 534-52-1 | 2.05 | 1.75
285 | furan^b | 110-00-9 | 0.05 | 0.32
286 | 4-fluoro-N-methylaniline | 459-59-6 | 0.51 | 0.43
287 | methyl acetate | 79-20-9 | -0.64 | -0.91
288 | acetophenone | 98-86-2 | -0.13 | 0.39
289 | 4-nitroaniline | 100-01-6 | 0.04 | 0.30
290 | triethylene glycol | 112-27-6 | -2.67 | -2.08
291 | 2-naphthol^b | 135-19-3 | 1.62 | 1.48
292 | 2,4-dichlorophenol | 120-83-2 | 1.32 | 1.17
293 | diethyl ether | 60-29-7 | -1.54 | -1.21
294 | 1,4-benzoquinone^b | 106-51-4 | 3.38 | 2.48
295 | undecylamine | 7307-55-3 | 2.91 | 2.31
296 | cyclohexanone oxime | 100-64-1 | -0.26 | -0.27
297 | 1-butanol | 71-36-3 | -1.37 | -1.38
298 | 4-(dimethylamino)benzaldehyde | 100-10-7 | 0.51 | 0.86
299 | 1,1,2-trichloroethane | 79-00-5 | 0.21 | 0.78
300 | acetaldehyde^c | 75-07-0 | 0.15 | -0.48
301 | Endosulfan^c | 115-29-7 | 5.43 | 4.34
302 | 1-hexanol | 111-27-3 | 0.02 | -0.05
303 | 2,6-dinitro-4-methylaniline | 6393-42-6 | 1.18 | 1.48
304 | dibenzofuran | 132-64-9 | 1.97 | 2.64
305 | di-n-hexylamine | 143-16-8 | 2.38 | 2.52
306 | [(1R)-endo]-(+)-3-bromocamphor | 10293-06-8 | 0.53 | 2.72
307 | ethanolamine | 141-43-5 | -1.53 | -1.50
308 | pyridine | 110-86-1 | -0.10 | -0.19
309 | 1,1,1-trichloroethane | 71-55-6 | 0.45 | 0.71
310 | 2,4-diaminotoluene | 95-80-7 | -1.07 | 0.11
311 | 1,4-dimethoxybenzene | 150-78-7 | 0.07 | 0.45
312 | quinoline | 91-22-5 | 0.22 | 0.48
313 | 3-butyn-1-ol | 927-74-2 | 0.29 | -0.25
314 | 2-ethyl-1-hexanol^c | 104-76-7 | 0.66 | 0.09
315 | 3-methoxyphenol | 150-19-6 | 0.21 | -0.01
316 | 4-methoxyphenol^c | 150-76-5 | 0.05 | 0.12
317 | 2-butyn-1-ol | 764-01-2 | 0.84 | -0.38
318 | ethyl 4-aminobenzoate | 94-09-7 | 0.66 | 1.03
319 | 4-nitrotoluene | 99-99-0 | 0.76 | 0.42
320 | 3-chloro-2-(chloromethyl)-1-propene^c | 1871-57-4 | 2.82 | 1.02
321 | 1-chloro-3-nitrobenzene | 121-73-3 | 0.92 | 1.05
322 | 4-ethylaniline | 589-16-2 | 0.22 | 0.10
323 | benzoylacetone | 93-91-4 | 2.21 | 1.27
324 | 2,4,6-trimethylphenol | 527-60-6 | 1.02 | 0.98
325 | 2-chlorophenol | 95-57-8 | 1.02 | 0.87
326 | 2-nitrotoluene | 88-72-2 | 0.57 | 0.15
327 | 4-methyl-3-nitroaniline | 119-32-4 | 0.77 | 0.25
328 | 2-methyl-4-nitroaniline^c | 99-52-5 | 0.24 | 0.27
329 | 3,3-dimethylbutan-2-one^b | 75-97-8 | 0.07 | -0.23
330 | biphenyl (PCB-0) | 92-52-4 | 1.90 | 1.64
331 | butanal | 123-72-8 | 0.65 | -0.20
332 | triethanolamine | 102-71-6 | -1.90 | -2.00
333 | methyl 4-chlorobenzoate | 1126-46-1 | 1.20 | 1.44
334 | tetrachloroethylene | 127-18-4 | 1.09 | 1.66
335 | vanillin^b | 121-33-5 | 0.23 | 1.02
336 | 1,6-dicyanohexane | 629-40-3 | -0.59 | 0.53
337 | diethyl L-tartrate | 87-91-2 | -0.50 | -0.30
338 | hexyl acrylate | 2499-95-8 | 2.16 | 1.21
339 | 4-nitrophenyl phenyl ether^c | 620-88-2 | 1.91 | 2.15
340 | 2,3,4,6-tetrachlorophenol | 58-90-2 | 2.35 | 1.96
341 | isophorone^c | 78-59-1 | -0.22 | 0.54
342 | 4-chloroaniline^c | 106-47-8 | 0.62 | 0.41
343 | Propanil | 709-98-8 | 1.40 | 1.76
344 | benzylamine | 100-46-9 | 0.02 | -0.05
345 | diphenylamine^c | 122-39-4 | 1.65 | 1.67
346 | 2,4-dichlorobenzaldehyde | 874-42-0 | 1.99 | 1.38
347 | 2-(2-ethoxyethoxy)ethanol | 111-90-0 | -1.94 | -2.00
348 | N-nitrosodiethylamine | 55-18-5 | -0.88 | -0.62
349 | 3-picoline | 108-99-6 | -0.19 | -0.37
350 | 4-methyl-2-nitroaniline^c | 89-62-3 | 0.79 | 0.08
351 | 4-(dimethylamino)-3-methyl-2-butanone | 22104-62-7 | 1.19 | -0.09
352 | hydroquinone | 123-31-9 | 3.40 | 1.51
353 | 2-phenoxyethanol | 122-99-6 | -0.40 | -0.03
354 | o-cresol | 95-48-7 | 0.77 | 0.34
355 | 2-methyl-5-nitroaniline^c | 99-55-8 | 0.35 | 0.37
356 | 2,4-dinitrophenol | 51-28-5 | 1.29 | 1.64
357 | diisopropyl ether | 108-20-3 | -0.89 | -0.99
358 | 2-(diisopropylamino)ethanol | 96-80-0 | -0.14 | -0.87
359 | 2-phenylphenol^c | 90-43-7 | 1.45 | 1.38
360 | L-arabinose | 87-72-9 | -2.40 | -3.39
361 | 3-nitrotoluene | 99-08-1 | 0.63 | 0.35
362 | 2-chlorotoluene | 95-49-8 | 1.23 | 0.82
363 | tetrahydrofuran | 109-99-9 | -1.48 | -1.42
364 | pentachlorophenol | 87-86-5 | 3.06 | 2.37
365 | 5-methyl-2-nitroaniline^b | 578-46-1 | 0.80 | 0.14
366 | 2,6-difluorobenzoic acid | 385-00-2 | 0.36 | 1.00
367 | N,N-diethylaniline | 91-66-7 | 0.96 | 0.51
368 | 1,3,5-trichlorobenzene^b | 108-70-3 | 1.74 | 2.01
369 | hexachloroethane | 67-72-1 | 2.20 | 2.07
370 | propyl acetate | 109-60-4 | 0.23 | -0.42
371 | 2,4,6-tri-tert-butylphenol | 732-26-3 | 3.63 | 2.07
372 | acrylamide | 79-06-1 | -0.19 | 0.09
373 | 2-methyl-1,4-naphthoquinone^b | 58-27-5 | 3.20 | 2.09
374 | diethylchloromalonate | 14064-10-9 | 2.31 | 1.66
375 | 4-ethylphenol^b | 123-07-9 | 1.07 | 0.72

^a Type III model calculated pT values. ^b Member of the cross-validation set. ^c Member of the prediction set.
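The rms errors quoted for the models are computed from paired experimental and calculated pT values such as those in Table 1. The sketch below applies the usual root-mean-square definition to compounds 1-5 of the table purely as an illustration; the function name is hypothetical, and dividing by the number of compounds (rather than by degrees of freedom) is an assumption, since the text does not state which denominator was used.

```python
import math


def rms_error(observed, calculated):
    """Root-mean-square difference between paired observed and calculated values."""
    if len(observed) != len(calculated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - c) ** 2 for o, c in zip(observed, calculated)]
    return math.sqrt(sum(squared) / len(squared))


# pT expt and pT calc for compounds 1-5 of Table 1
pT_expt = [1.86, 1.40, 0.19, 0.55, 2.12]
pT_calc = [1.58, 0.86, 0.39, 0.28, 1.85]
print(f"rms error for these five compounds: {rms_error(pT_expt, pT_calc):.2f} log unit")
```

Five compounds are far too few to say anything about the model; the same function applied separately to the training, cross-validation, and prediction subsets would give the three figures that the paper compares.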

were examined. An eight-descriptor model was found which exhibited the best T values, multiple correlation coefficient (R), and rms error. The descriptors forming this model are presented and described in Table 2.

The eight descriptors selected for the type I model focus on three major features of the molecules. Four of the descriptors (FPSA 1, FNSA 2, RPCS 1, and CHAA 1) encode hybrid electronic and geometric information. It seems reasonable that these descriptors capture the molecule's ability to hydrogen bond. The three topological and geometrical descriptors (MOMH 5, MOLC 9, and WTPT 1) encode the degree of branching of the molecules. WTPT 1 shows a moderate correlation (r = 0.64) with acute aquatic toxicity. These seven descriptors also encode, to some degree, the molecule's ability to partition into water. Because the toxicity measured here is expressed as a millimolar aqueous concentration, this water-partitioning ability is worth noting.

The final descriptor (NDB 13) exhibits a very weak correlation (r = 0.25) with acute aquatic toxicity. However, because the toxicity data span nearly 9 orders of magnitude while the number of double bonds is limited here to five integer values, a poor correlation is to be expected. A closer comparison of this descriptor with the dependent variable shows a stepping pattern from zero to three double bonds: with the addition of each double bond, the minimum toxicological activity increases.

The greatest pairwise correlation among these eight descriptors was between FPSA 1 and FNSA 2, with an r of 0.62. The mean absolute value of all the pairwise correlations is only 0.23, showing that the descriptors are quite independent. Descriptors which had been previously removed during objective feature selection were systematically substituted for their correlated counterparts, but no significant improvements were noted through these substitutions. The training set was checked for outliers, revealing three compounds as potential outliers. These three compounds


Table 2. Descriptors Chosen for the Type I Linear Model for the Prediction of Fathead Minnow Acute Toxicity

label     coeff             SE coeff          range                description^a
MOMH 5    4.0902 × 10^-2    7.0020 × 10^-3    1.00-62.91^b         moment of inertia ratio
FPSA 1    -2.718            0.3628            0.00-0.9607          fractional positive SA
FNSA 2    1.054             0.2273            -2.543 to -0.0332    fractional negative SA
RPCS 1    5.7228 × 10^-2    1.6713 × 10^-2    0.00-24.79^b         relative positive SA
CHAA 1    1.692             0.1927            -1.761 to 0.1219     sum charge acceptor atoms
MOLC 9    0.5591            0.1256            1.00-4.156           average distance sum conn.
NDB 13    0.3550            7.7713 × 10^-2    0-4                  no. of double bonds
WTPT 1    0.1443            8.4520 × 10^-3    3.00-59.95           molecular ID
CONS      -1.435            0.5128

^a MOMH 5, ratio of the first major moment of inertia of the molecule to the third major moment of inertia; FPSA 1, fractional positively charged partial surface area = (sum of positively charged partial surface area)/(total molecular surface area) (26); FNSA 2, fractional negatively charged partial surface area = (sum of negatively charged partial surface area × sum total of negative charges for the molecule)/(total molecular surface area) (26); RPCS 1, relative positive charge = (surface area of positively charged atoms) × [(charge of the most positive atom)/(sum total of positive charge)] (26); CHAA 1, sum of charges on hydrogen-bond acceptor atoms; MOLC 9, topological index J of Balaban, the sum of weighted distances on a molecular graph, the average distance-sum connectivity (36, 37); NDB 13, the number of double bonds in the molecule (aromatic rings do not contribute to this count); WTPT 1, molecular ID for the molecule, which equals the sum over all unique paths with each term being the path weight (38).
^b Does not include values for HCN.
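Footnote b excludes HCN from the MOMH 5 and RPCS 1 ranges. A minimal sketch, assuming MOMH 5 is the largest principal moment of inertia divided by the smallest (this ordering is an assumption, chosen because the tabulated range starts at 1.00), illustrates why a linear molecule is a degenerate case for such a ratio: its smallest principal moment vanishes. The function names, the tolerance, and the approximate HCN and water geometries are illustrative only and are not taken from the descriptor software used in the study.

```python
import numpy as np

def principal_moments(masses, coords):
    """Principal moments of inertia of point masses, largest first."""
    m = np.asarray(masses, dtype=float)
    r = np.asarray(coords, dtype=float)
    r = r - np.average(r, axis=0, weights=m)          # shift to the centre of mass
    sq = np.sum(r * r, axis=1)                        # squared distance of each atom
    # Inertia tensor: sum_i m_i * (|r_i|^2 * identity - outer(r_i, r_i))
    tensor = np.dot(m, sq) * np.eye(3) - np.einsum("i,ij,ik->jk", m, r, r)
    return np.sort(np.linalg.eigvalsh(tensor))[::-1]

def moment_ratio(masses, coords, tol=1e-8):
    """Largest/smallest principal moment, or None when the smallest vanishes."""
    first, _, third = principal_moments(masses, coords)
    if third <= tol * first:                          # linear molecule: ratio undefined
        return None
    return first / third

# Approximate geometries in angstroms (illustrative values, not from the paper).
HCN_MASSES = [1.008, 12.011, 14.007]
HCN_COORDS = [[0.0, 0.0, 0.0], [1.06, 0.0, 0.0], [2.22, 0.0, 0.0]]
WATER_MASSES = [15.999, 1.008, 1.008]
WATER_COORDS = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]

print(moment_ratio(HCN_MASSES, HCN_COORDS))      # undefined for the linear molecule
print(moment_ratio(WATER_MASSES, WATER_COORDS))  # finite for a bent molecule
```

Returning `None` rather than a sentinel number is a design choice for this sketch: it forces downstream code to handle the undefined case explicitly instead of passing an out-of-range value into a model.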

Figure 1. Calculated/predicted -log(millimoles per liter) vs observed -log(millimoles per liter) for the type I multiple linear regression model.
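Each calculated value in Figure 1 comes from evaluating the type I regression equation. A minimal sketch, using the coefficients and constant (CONS) as read from Table 2, is given here; the function name and the descriptor values in the usage example are hypothetical and do not correspond to any compound in the data set.

```python
# Coefficients of the type I multiple linear regression model, as read from Table 2.
TYPE_I_COEFFS = {
    "MOMH 5": 4.0902e-2,
    "FPSA 1": -2.718,
    "FNSA 2": 1.054,
    "RPCS 1": 5.7228e-2,
    "CHAA 1": 1.692,
    "MOLC 9": 0.5591,
    "NDB 13": 0.3550,
    "WTPT 1": 0.1443,
}
TYPE_I_CONST = -1.435  # the CONS row of Table 2

def predict_type_i(descriptors):
    """Return the calculated -log(mmol/L) for one compound.

    `descriptors` maps each Table 2 label to that compound's descriptor value.
    """
    missing = set(TYPE_I_COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return TYPE_I_CONST + sum(
        coeff * descriptors[label] for label, coeff in TYPE_I_COEFFS.items()
    )

# Hypothetical descriptor values, chosen only to lie inside the Table 2 ranges.
example = {
    "MOMH 5": 5.0, "FPSA 1": 0.60, "FNSA 2": -0.50, "RPCS 1": 2.0,
    "CHAA 1": -0.30, "MOLC 9": 2.5, "NDB 13": 1, "WTPT 1": 12.0,
}
print(predict_type_i(example))
```

The model is linear in the descriptors, so each additional double bond (NDB 13) raises the calculated value by the NDB 13 coefficient, 0.3550 log unit, when all other descriptors are held fixed.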

were removed individually from the training set, and regression was performed for each case. None of the three potential outliers significantly affected the regression coefficients, and therefore, they were all retained. This type I model had an rms error of 0.83 log unit and r = 0.82 for the training set. The first prediction set had an rms error of 0.90 log unit, and the second prediction set had an rms error of 0.76 log unit. (It should be noted that HCN was not included in the calculation of the rms error for the second prediction set, for reasons which will be discussed later.) A plot of the calculated/predicted -log(millimoles per liter) versus observed -log(millimoles per liter) for the training set and both of the external prediction sets is shown in Figure 1. Clearly, for such a diverse set of compounds, no one specific toxicological mechanism can be expected. Given this diversity and the complex nature of toxicity, the relatively large rms errors reported here are understandable. Previously published work with these data identified a 51-descriptor model with a standard error of 0.70 log unit (11). The QSAR modeling techniques applied in this research have resulted in a new model that exhibits a slight improvement in standard error to

0.68 log unit. While this reduction in standard error is modest at best, it is accompanied by a dramatic reduction in the number of independent variables (from 51 to eight). Furthermore, unlike the previous model, none of the descriptors used in this research depend upon experimental data.

It should be noted that the second prediction set contained hydrogen cyanide. The residual associated with HCN was very large, and this warranted closer examination. An out-of-bounds value had been assigned to HCN for the MOMH 5 descriptor. (The largest true value for the descriptor was 63, as shown in Table 2.) This is the default value given when the descriptor generation algorithm is unable to calculate a value for a compound. As the MOMH 5 descriptor is a ratio between the moments of inertia in the x and y directions, it is probable that this error is a result of the linear geometry of this three-atom compound. Neither the regression algorithm nor the CNNs can recognize such error outputs; hence, the large resultant residual is not surprising. In light of these facts, this model should not be applied to linear (one-dimensional) organic molecules. This limitation does not significantly degrade the overall value of the model, as very few compounds fall into this category.

Type II Model Construction. The eight descriptors identified in the type I model were then submitted to a CNN to generate a nonlinear model. Overfitting of the data was avoided by keeping the ratio between the number of observations and the number of adjustable parameters greater than 2. The large number of observations in the training set permitted a wide range of CNN architectures to be explored. In addition to varying the number of hidden neurons, both three- and four-layer CNNs were extensively explored.
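The overfitting guard described above can be checked by counting adjustable parameters. A minimal sketch, assuming a fully connected feed-forward network in which every hidden and output neuron has one bias term (an assumption that reproduces the 61 parameters reported for the 8-6-1 architecture); the function names and the training-set size of 300 in the usage example are hypothetical.

```python
def count_parameters(layers):
    """Weights plus biases of a fully connected feed-forward network.

    `layers` lists the neurons per layer, e.g. [8, 6, 1] for an 8-6-1 network.
    Each neuron after the input layer has one weight per incoming neuron
    plus one bias.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

def max_hidden_neurons(n_obs, n_inputs, n_outputs=1, min_ratio=2.0):
    """Largest single hidden layer with n_obs / parameters still above min_ratio."""
    hidden = 0
    while n_obs / count_parameters([n_inputs, hidden + 1, n_outputs]) > min_ratio:
        hidden += 1
    return hidden

print(count_parameters([8, 6, 1]))                 # 8-6-1 architecture
print(max_hidden_neurons(n_obs=300, n_inputs=8))   # 300 is a hypothetical size
```

For an 8-h-1 network the count is 10h + 1, so the parameter budget grows by ten with every added hidden neuron; this is why the observation-to-parameter ratio limits the architectures that can be explored.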
In these explorations, no significant improvements were observed between the three- and four-layer network architecture types. This is in good agreement with other studies of the number of layers appropriate for such fitting using CNNs (35). The best CNN found through this search had three layers, with the eight descriptors fed to the input layer, six hidden neurons in the second layer, and one output neuron in the final layer (8-6-1 architecture). The CNN has 61 adjustable parameters. The errors for this model demonstrate no real improvement over the type I model. Specifically, the training set had an rms error of 0.81 log


Table 3. Descriptors Chosen for the Type III Nonlinear Computational Neural Network Model for the Prediction of Fathead Minnow Acute Toxicity

Figure 2. Calculated/predicted -log(millimoles per liter) vs observed -log(millimoles per liter) for the type II computational neural network model.

unit and a standard error of 0.65 log unit. The rms error of the cross-validation set was 0.88 log unit, and the rms error of the prediction set was 0.84 log unit. Figure 2 plots the calculated/predicted -log(millimoles per liter) versus observed -log(millimoles per liter) for the training, cross-validation, and prediction sets. It is interesting to observe that the residual of HCN was reduced to 1.45 log units for this model, despite the difficulties with the MOMH 5 descriptor. The CNN was able to minimize the effect of the extreme descriptor value on the overall fit.

Type III Model Construction. To fully exploit the nonlinear nature of the CNN, the reduced pool of descriptors was then submitted to a GA feature selection routine which incorporated a CNN for fitness evaluation. Combining GA feature selection with CNN fitness evaluation allows the direct generation of a model which fully exploits nonlinear relationships between the descriptors and the dependent variable. The 10 best models found by the GA/CNN routine were reviewed and submitted to external prediction where appropriate. The descriptor set supporting the best nonlinear type III CNN model is presented in Table 3. The general characteristics of the descriptors chosen for the type III model are very similar to those previously discussed for the type I model. Three descriptors (QNEG 1, WNSA 1, and CHAA 2) encode electronic and hybrid information and can be ascribed to capturing hydrogen-bonding functionality. Four descriptors (N7CH, WTPT 4, GEOH 5, and MOMH 2) encode the degree of branching of each molecule. The final descriptor, NDB 13, reappears from the type I model. This further supports the case made previously that the number of double bonds has a positive relation with the toxicity of the molecules. The greatest pairwise correlation among these eight descriptors was between MOMH 2 and WNSA 1, with an r of 0.64.
The mean absolute value for all of the pairwise correlations is 0.21. Visual inspection of this model was performed by plotting the calculated/predicted -log(millimoles per liter) versus observed -log(millimoles per liter), as

label     range                description^a
N7CH      0-106                count of 7χch for seventh-order chains
NDB 13    0-4                  no. of double bonds
WTPT 4    0.00-15.49           sum of weighted paths from oxygens
GEOH 5    1.001-3776           geometry index ratio
QNEG 1    -0.556 to -0.0812    most negatively charged atom
MOMH 2    11.42-7797           second major moment of inertia
WNSA 1    4.751-224.0          weighted negative surface area
CHAA 2    -0.4899 to 0.122     relative sum of charge on acceptor atoms

rms error of 0.71 log unit; standard error of 0.62 log unit

^a N7CH, count of the number of chains of length 7 in the molecule (39); NDB 13, number of double bonds (aromatic rings do not contribute to this count); WTPT 4, sum of weighted paths originating from oxygen atoms (38); GEOH 5, ratio of intermediate to shortest geometric axis of molecule including hydrogen atoms; QNEG 1, the charge of the most negatively charged atom in the molecule; MOMH 2, second major moment of inertia; WNSA 1, (sum of negatively charged partial surface area) × (total molecular surface area) (26); CHAA 2, (sum of charges on acceptor atoms)/(number of acceptor atoms).
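The descriptor-independence check reported for both descriptor sets (the largest pairwise correlation and the mean absolute pairwise correlation) can be sketched as follows. This is a plain Pearson-correlation calculation; the function names and the toy descriptor columns labeled A, B, and C are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_summary(columns):
    """Summarize pairwise correlations among descriptor columns.

    `columns` maps a descriptor label to its values (one per compound).
    Returns (most correlated pair, its r, mean absolute r over all pairs).
    """
    pairs = {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    strongest = max(pairs, key=lambda pair: abs(pairs[pair]))
    mean_abs = sum(abs(r) for r in pairs.values()) / len(pairs)
    return strongest, pairs[strongest], mean_abs

# Toy columns for illustration only.
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 0, 1, 0]}
strongest_pair, strongest_r, mean_abs_r = correlation_summary(toy)
print(strongest_pair, strongest_r, mean_abs_r)
```

With eight descriptors there are 28 distinct pairs, so the mean absolute value quoted in the text is an average over 28 correlation coefficients.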

Figure 3. Calculated/predicted -log(millimoles per liter) vs observed -log(millimoles per liter) for the type III nonlinear computational neural network model developed with genetic algorithm feature selection.

presented in Figure 3. The plot is noticeably better than the plots for the previous two models, and fewer outliers are observed. The rms error of the training set was 0.71 log unit, with an improved standard error of 0.62 log unit. The rms errors were 0.77 log unit for the cross-validation set and 0.74 log unit for the prediction set. Not only do the rms errors decrease for this fully nonlinear type III model, but perhaps more significantly there is better agreement between the three subsets of compounds. Specifically, the range of error has been decreased from 0.15 for the type I model to 0.06 for the type III model.
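The range of error quoted above is the difference between the largest and smallest rms errors across the subsets of one model. A short sketch of that arithmetic follows, using the type I values as listed in the Conclusions (0.83, 0.90, and 0.75 log unit) and the type III values given above; the function name is hypothetical. With the 0.76 log unit quoted elsewhere in the text for the second type I prediction set, the type I range would instead be 0.14.

```python
def error_range(rms_errors):
    """Spread between the largest and smallest rms error across subsets."""
    values = list(rms_errors.values())
    return max(values) - min(values)

# rms errors in log units, as quoted in the text.
type_i = {"training": 0.83, "prediction set 1": 0.90, "prediction set 2": 0.75}
type_iii = {"training": 0.71, "cross-validation": 0.77, "prediction": 0.74}

print(round(error_range(type_i), 2))    # 0.15
print(round(error_range(type_iii), 2))  # 0.06
```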


To summarize, the best model found for these data is a nonlinear CNN model based on the eight calculated molecular structure descriptors identified in Table 3. The values of these eight descriptors are all the information the CNN needs to predict the -log(LD50) value of a compound.

Monte Carlo Experiments. To further validate the models presented here, the dependent variable was randomly scrambled and new models were generated. The same training, cross-validation, and prediction sets were used. The feature selection, regression, and computational neural network studies were all repeated. The computational sequence was exactly the same as for the real experiments; only the order of the dependent variable values was scrambled. This randomization permits identification of the limits of the methods used to seek true correlations. The resulting type I Monte Carlo model had rms errors of 1.32, 1.62, and 1.63 log units for the training, cross-validation, and prediction sets, respectively. The type II Monte Carlo model had rms errors of 1.37, 1.43, and 1.53 log units for the training, cross-validation, and prediction sets, respectively. Finally, the type III Monte Carlo model had rms errors of 1.37, 1.41, and 1.53 log units for the training, cross-validation, and prediction sets, respectively. The much larger errors associated with the Monte Carlo models, as well as the large differences between the training, cross-validation, and prediction set errors, both indicate that the models presented earlier result from true correlations and are not due to chance.

Conclusions. Three QSAR models were developed to address acute aquatic toxicity for a highly diverse set of 375 small organic compounds. Each of the models presented here used eight calculated molecular structure descriptors.
The type I model had an rms error of 0.83 log unit for the training set, 0.90 log unit for the first prediction set, and 0.75 log unit for the second prediction set (not including HCN). Comparison of these results with previous modeling of the data shows a modest improvement in error. An additional advantage of this new model is the reduction of independent variables from 51 descriptors (which included two experimental parameters) to eight descriptors which are all computed directly from the molecular structure. Application of these eight descriptors to CNNs (type II) showed no significant improvement. However, new descriptors were selected by a GA with a CNN included to determine the cost of potential models. The best resulting type III model had an rms error of 0.71 log unit for the training set, 0.77 log unit for the cross-validation set, and 0.74 log unit for the prediction set. Gains in the new model may be attributed in part to an increased number of adjustable parameters but primarily to the CNN’s ability to implement a nonlinear relationship between the descriptors and the toxicity. This study shows that a well-behaved statistical correlation between a small number of molecular descriptors and the acute aquatic toxicity can be developed. Models found in this study underline the importance of hydrogen bonding, molecular branching, and a measure of a molecule’s activity through the number of double bonds, and their relationship to acute aquatic toxicity. The study also demonstrates the ability of the methods used to search through complicated descriptor spaces and develop mathematical relationships directly linking molecular structure and toxicological activity.
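The scrambling test described under Monte Carlo Experiments can be sketched as follows. This is a simplified stand-in, not the procedure used in the study: it repeats only a least-squares regression step (no feature selection, no CNN), it assumes the activities are permuted within each subset, and it runs on synthetic data generated in the example. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    diff = np.asarray(observed) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

def fit_linear(X, y):
    """Least-squares coefficients; the last entry is the intercept."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on descriptor matrix X."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def scrambling_experiment(X_train, y_train, X_pred, y_pred, rng):
    """Fit once on the real activities and once on scrambled activities.

    Returns {"real": (train rms, prediction rms), "scrambled": (...)}.
    """
    results = {}
    for label, scramble in (("real", False), ("scrambled", True)):
        yt = rng.permutation(y_train) if scramble else y_train
        yp = rng.permutation(y_pred) if scramble else y_pred
        coef = fit_linear(X_train, yt)
        results[label] = (
            rms_error(yt, predict_linear(coef, X_train)),
            rms_error(yp, predict_linear(coef, X_pred)),
        )
    return results

# Synthetic data: 200 "compounds", 8 "descriptors", a true linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
weights = rng.normal(size=8)
y = X @ weights + rng.normal(scale=0.1, size=200)

results = scrambling_experiment(X[:150], y[:150], X[150:], y[150:], rng)
print(results)
```

When a real structure-activity relationship is present, the scrambled fit should show clearly larger errors than the real fit on both subsets, which is the pattern the text reports for all three model types.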


Acknowledgment. The National Science Foundation Research Experience for Undergraduates (REU) Program is acknowledged for financial support for C.L.W.

References

(1) Sutter, J. M., and Jurs, P. C. (1996) Prediction of Aqueous Solubility for a Diverse Set of Heteroatom-Containing Organic Compounds Using a Quantitative Structure-Property Relationship. J. Chem. Inf. Comput. Sci. 36, 100-107.
(2) Mitchell, B. E., and Jurs, P. C. (1997) Prediction of Autoignition Temperatures of Organic Compounds from Molecular Structure. J. Chem. Inf. Comput. Sci. 37, 538-547.
(3) Wessel, M. D., and Jurs, P. C. (1995) Prediction of Normal Boiling Points for a Diverse Set of Industrially Important Organic Compounds. J. Chem. Inf. Comput. Sci. 35, 841-850.
(4) Katritzky, A. R., Lobanov, V. S., and Karelson, M. (1998) Normal Boiling Points for Organic Compounds: Correlation and Prediction by a Quantitative Structure-Property Relationship. J. Chem. Inf. Comput. Sci. 38, 28-41.
(5) Wessel, M. D., and Jurs, P. C. (1994) Prediction of Reduced Ion Mobility Constants from Structural Information using Multiple Linear Regression Analysis and Computational Neural Networks. Anal. Chem. 66, 2480-2487.
(6) Liang, C., and Gallagher, D. A. (1998) QSPR Prediction of Vapor Pressure from Solely Theoretically-Derived Descriptors. J. Chem. Inf. Comput. Sci. 38, 321-324.
(7) Johnson, S. R., and Jurs, P. C. (1997) Prediction of Acute Mammalian Toxicity from Molecular Structure for a Diverse Set of Substituted Anilines Using Regression Analysis and Computational Neural Networks. In Computer-Assisted Lead Finding and Optimization (van de Waterbeemd, H., Testa, B., Folkers, G., Eds.) pp 29-48, Verlag Helvetica Chimica Acta, Basel, Switzerland.
(8) Eldred, D. V., and Jurs, P. C. (1999) Prediction of Acute Mammalian Toxicity of Organophosphorus Pesticide Compounds from Molecular Structure. SAR QSAR Environ. Res. (in press).
(9) Wessel, M. D., Jurs, P. C., Tolan, J. W., and Muskal, S. M. (1998) Prediction of Human Intestinal Absorption of Drug Compounds from Molecular Structure. J. Chem. Inf. Comput. Sci. 38, 726-735.
(10) Schultz, T. W., Lin, D. T., and Arnold, L. M. (1992) QSARs for Monosubstituted Anilines Eliciting the Polar Narcosis Mechanism of Action. Sci. Total Environ. 109/110, 569-580.
(11) Harada, A., Hanazawa, M., Saito, J., and Hasimoto, K. (1992) Quantitative Analysis of Structure-Toxicity Relationships of Substituted Anilines by Use of Balb/3T3 Cells. Environ. Contam. Toxicol. 11, 973-980.
(12) Nendza, M., and Russom, C. L. (1991) QSAR Modelling of the ERL-D Fathead Minnow Acute Toxicity Database. Xenobiotica 21, 147-170.
(13) Schultz, T. W., Cajina-Quezada, M., and Wesley, S. K. (1989) Structure-toxicity Relationships for Mono Alkyl- or Halogen-substituted Anilines. Bull. Environ. Contam. Toxicol. 43, 564-569.
(14) Furay, V. J., and Smith, S. (1995) Toxicity and QSAR of Chlorobenzenes in Two Species of Benthic Flatfish, Flounder (Platichthys flesus L.) and Sole (Solea solea L.). Bull. Environ. Contam. Toxicol. 54, 36-42.
(15) Newsom, L. D., Johnson, D. E., Lipnick, R. L., Broderius, S. J., and Russom, C. L. (1991) A QSAR Study of the Toxicity of Amines to the Fathead Minnow. Sci. Total Environ. 109/110, 537-551.
(16) Arnold, L. M., Lin, D. T., and Schultz, T. W. (1991) QSAR for Methyl- and/or Chloro-substituted Anilines and the Polar Narcosis Mechanism of Toxicity. Chemosphere 21, 183-191.
(17) Cronin, M. T. D., and Dearden, J. C. (1995) QSAR in Toxicology. 2. Prediction of Acute Mammalian Toxicity and Interspecies Correlations. Quant. Struct.-Act. Relat. 14, 117-120.
(18) Sun, K., Krause, G. F., Mayer, F. L., Jr., and Ellersieck, M. B. (1995) Predicting Chronic Lethality of Chemicals to Fishes from Acute Toxicity Test Data: Theory of Accelerated Life Testing. Environ. Toxicol. Chem. 14, 1745-1752.
(19) Kaiser, K. L. E., Niculescu, S. P., and McKinnon, M. B. (1998) On Simple Linear Regression, Multiple Linear Regression, and Elementary Probabilistic Neural Network with Gaussian Kernel’s Performance in Modeling Toxicity Values to Fathead Minnow. Quantitative Structure-Activity Relationships in Environmental Sciences, Vol. 7, pp 285-297, SETAC Press, Pensacola, FL.
(20) Cronin, M. T. D., and Dearden, J. C. (1995) QSAR in Toxicology. 1. Prediction of Aquatic Toxicity. Quant. Struct.-Act. Relat. 14, 1-7.
(21) COMPUTOX Toxicity Database [CD-ROM], version 5.0 (1995) National Water Research Institute, Burlington, ON.
(22) Stewart, J. P. P. (1990) MOPAC 6.0, Quantum Chemistry Program Exchange, Indiana University, Bloomington, IN (Program 445).
(23) Stewart, J. P. P. (1990) MOPAC: A Semiempirical Molecular Orbital Package. J. Comput.-Aided Mol. Des. 4, 1-103.
(24) Hall, L. H. (1991) The Electrotopological State: An Atom Index for QSAR. Quant. Struct.-Act. Relat. 10, 43-51.
(25) Stouch, T. R., and Jurs, P. C. (1986) A Simple Method for the Representation, Quantification, and Comparison of Volumes and Shapes of Chemical Compounds. J. Chem. Inf. Comput. Sci. 26, 4-12.
(26) Stanton, D. T., and Jurs, P. C. (1990) Development and Use of Charged Partial Surface Area Structural Descriptors for Quantitative Structure-Property Relationship Studies. Anal. Chem. 62, 2323-2329.
(27) Topliss, J. G., and Edwards, R. P. (1979) Chance Factors in Studies of Quantitative Structure-Activity Relationships. J. Med. Chem. 22, 1238-1244.
(28) Sutter, J. M., and Jurs, P. C. (1995) Selection of Molecular Descriptors for Quantitative Structure-Activity Relationships. In Adaption of Simulated Annealing to Chemical Optimization Problems (Kalivas, J., Ed.) Elsevier Science Publishing Co., Amsterdam.
(29) Broyden, C. G. (1970) The Convergence of a Class of Double-Rank Minimization Algorithms. J. Inst. Math. Appl. 6, 76-90.
(30) Fletcher, R. (1970) A New Approach to Variable Metric Algorithms. Comput. J. 13, 317-322.
(31) Goldfarb, D. (1970) A Family of Variable-Metric Methods Derived by Variational Means. Math. Comput. 24, 23-26.
(32) Shanno, D. F. (1970) Conditioning of quasi-Newton Methods for Function Minimization. Math. Comput. 24, 647-656.
(33) Fletcher, R. (1980) Practical Methods of Optimization, Vol. 1, John Wiley & Sons, New York.
(34) Xu, L., Ball, J. W., Dixon, S. L., and Jurs, P. C. (1994) Quantitative Structure-Activity Relationships for Toxicity of Phenols Using Regression Analysis and Computational Neural Networks. Environ. Toxicol. Chem. 13, 841-851.
(35) Brown, C. W., and Lo, S. C. (1998) Chemical Information Based on Neural Network Processing of Near-IR Spectra. Anal. Chem. 70, 2983-2990.
(36) Balaban, A. T. (1982) Highly Discriminating Distance Based Topological Index. Chem. Phys. Lett. 89, 399-404.
(37) Mihalic, Z., and Trinajstic, N. (1992) A Graph-Theoretical Approach to Structure-Property Relationships. J. Chem. Educ. 69, 701-712.
(38) Randic, M. (1984) On Molecular Identification Numbers. J. Chem. Inf. Comput. Sci. 24, 164-175.
(39) Kier, L. B., and Hall, L. H. (1986) Molecular Connectivity in Structure-Activity Analysis, John Wiley & Sons, New York.
