
Chem. Res. Toxicol. 2005, 18, 740-746

An Automated Group Contribution Method in Predicting Aquatic Toxicity: The Diatomic Fragment Approach

Mosè Casalegno,*,† Emilio Benfenati,† and Guido Sello‡

Institute for Pharmacological Research Mario Negri, IRFMN, via Eritrea 62, 20157 Milano, Italy, and Dipartimento di Chimica Organica e Industriale, Università degli Studi di Milano, via Venezian 21, 20133 Milano, Italy

Received December 2, 2004

We developed a group contribution method (GCM) to correlate the acute toxicity (96 h LC50) to the fathead minnow (Pimephales promelas) of 607 organic chemicals. Unlike most of the existing methods, the new one makes no use of predefined groups as descriptors. A simple general rule is proposed to break down any molecule into diatomic fragments. The entire data set was partitioned three times. Each time, a training set and a test set were obtained with a ratio of 2:1. For each partition, quantitative structure-activity relationship models were developed using Powell's minimization method, multilinear regression, neural networks, and partial least squares. The GCM achieved a good correlation of the data for both training and test sets, regardless of the partition considered. The method is therefore robust and can be generally applied. Further model improvements are described.

Introduction

Over the past decade, quantitative structure-activity relationship (QSAR) development has become vital in the field of aquatic toxicity assessment (1-3). Every year, a large number of new chemicals are synthesized. Environmental safety regulations require them to be carefully tested before entering the market, to check for any negative effects on the environment. The aquatic environment is highly vulnerable to pollutants. Contamination of water systems not only affects aquatic organisms but also has dramatic consequences on water for drinking, household needs, fishing, transport, and commerce. Experimental toxicity tests are the obvious way to collect water quality data. Unfortunately, performing such analyses is often costly and time consuming, and almost impractical considering the large number of compounds manufactured. To meet its regulatory mandate, in 1979, the U.S. Environmental Protection Agency (EPA) began using QSAR in the absence of test data. Since then, QSAR methods have been widely used to meet the challenge of water quality assessment.

To fulfill toxicologists' requirements, a QSAR method should be accurate, general, and easy to implement. Most of all, a QSAR should be robust, i.e., applicable to a wide range of chemicals with no loss of performance. In traditional QSAR practice, the octanol/water partition coefficient, molecular weight, highest occupied molecular orbital/lowest unoccupied molecular orbital energy gap, polar surface area, etc. have been the attributes used by most investigators to correlate structure and toxic effect (4-6). Although these holistic descriptors (i.e., quantities referring to the entire molecular structure) have so far been effective in predicting toxicity, they cannot describe important chemical features. For example, the presence of internal hydrogen bonds, ionizable acid-base couples, or electrophilic/nucleophilic functional groups might make substances more toxic than analogues of equal hydrophobicity. Clearly, because most of these chemical features are substructure specific, holistic descriptors cannot help to identify them. This considerably limits their predictive capability. To overcome this limitation, a number of different solutions based on chemical class assignment have been proposed (7, 8). However, the definition of a chemical class can be ambiguous, and many new compounds defy clear-cut classification, so new chemical classes have to be defined, and additional predictive models must be developed.

These drawbacks, and the demand for more robust methods, have stimulated the development of tools to investigate molecular toxicity at a substructural level, such as group contribution methods (GCMs). GCMs use molecular fragments (single or grouped atoms), usually referred to as groups, as descriptors. A molecule is dissected into basic fragments, and its toxicity value is obtained by summing the contribution of each fragment occurring in the molecule (when an additive equation is used in modeling). Fragment activity coefficients are usually calculated by statistical techniques, such as multilinear regression (MLR), neural networks (NN), and partial least squares (PLS). Simplicity and ease of implementation have made GCMs very popular for estimating the properties of organic molecules (9-12), and in the field of aquatic toxicity, meaningful GCM applications have been reported (13, 15). The success of these applications is due to an appropriate choice of the size and chemical composition of the groups used in model building. In modeling the 96 h LC50 toxicity for the fathead minnow, Gao, Tabak, and Govind (13) achieved an excellent correlation between observed and predicted values, using small groups as basic parameters. These groups, containing 1-4 atoms, were similar to those defined within the UNIFAC model (14). To further improve this model, Martin and Young (15) added more complex groups accounting for chemical features whose excess toxicity had already been experimentally confirmed. Although their presence was important to enhance the model's performance, their number was small compared with that of the simpler ones.

At first sight, the efficacy of such simple models looks surprising. In principle, the use of small constructs would be expected to enhance general applicability at the expense of accuracy. However, as shown by the studies mentioned (13, 15), small fragments suit the commonly employed statistical approaches, such as MLR or NN, better than larger, more specific ones. This can reasonably be explained by considering that the loss in specificity due to the use of small constructs is counterbalanced by the gain in statistical significance, thereby increasing the final accuracy of the results. Given the possibility of combining small fragments successfully with statistical approaches, one might ask whether the use of small fragments in GCMs could be developed more generally. In both of the studies cited, the developers address the choice of the groups suitable for modeling. In contrast, in the present work, we developed a new GCM requiring no predefined or UNIFAC-type groups to be effective. Following a simple and straightforward rule, a molecule is broken down into diatomic fragments (DFGs). Use of these fragments guarantees high statistical significance and a wide range of applicability at the same time. To properly test the model's performance and to investigate its limits and advantages, we took a very heterogeneous data set containing 607 chemicals. Acute toxicity data (fathead minnow 96 h LC50) were obtained from the EPA Duluth database (16-19). QSARs were developed using MLR, NN, and PLS.

* To whom correspondence should be addressed. Tel: +39-0239014586. Fax: +39-023546277. E-mail: [email protected]. † Institute for Pharmacological Research Mario Negri. ‡ Università degli Studi di Milano.

10.1021/tx049665v CCC: $30.25 © 2005 American Chemical Society Published on Web 03/12/2005
We directly compared the models' performance to measure their robustness. As mentioned above, this parameter is important in model validation, but as far as we know, it has never been concretely tested in this field of research. To put our work in the correct perspective, we must add some comments on the paper by Russom et al. (1). In that paper, the authors performed an in-depth analysis of aquatic toxicity in relation to the mode of action of the chemical compounds. Using experimental observations of fish syndromes, they classified toxicants into eight modes of action: baseline narcotics, polar narcotics, ester narcotics, oxidative phosphorylation uncouplers, respiratory inhibitors, electrophiles/proelectrophiles, acetylcholinesterase inhibitors, and affectors of central nervous system seizure response. Looking at the structures of the compounds listed in the syndrome classes, a set of rules was then developed; these rules can be used for an a priori classification of the compounds whose toxicity is to be predicted. The result is certainly interesting, but the knowledge gained is even more important. The use of a preclassification scheme can be of great help in the prediction of aquatic toxicity. However, our aim is different. We would like to develop a completely automated system that can predict toxicity using theoretical rules, rules that can be discussed in the light of experimental data but that must originate from the information directly available from the chemical structures, with no preclassification.


Figure 1. Breakdown of N,N-diethyl-m-toluamide into DFGs. DFG hydrogens are displayed. Dashed double bonds are aromatic.

Materials and Methods

The database provided by the U.S. EPA covers a collection of 607 industrial organic compounds for which the median lethal concentrations (LC50) in 96 h flow-through exposures, referring to the juvenile stage of the fathead minnow, are reported (16-18). LC50 values are expressed in mmol/L. For modeling purposes, log(1/LC50) was used instead of the LC50. Two-dimensional (2D) chemical structures were drawn for all compounds as text files in MDL MOL format. In a preliminary stage, the MDL MOL files were entered into our algorithm and processed. Atomic reference indexes, atom types, connectivity matrices, and bond orders were extracted and revised so as to optimally restructure the chemical information at our disposal. All hydrogen atoms were deleted from the original structures, to permit the distinction between different atomic saturation patterns. For example, for the carbon atom, C, we obtained the following atom types: C, CH, CH2, and CH3, with 4, 3, 2, and 1 binding positions, respectively. The program was instructed to detect six-membered aromatic rings in the molecules. To all bonds belonging to an aromatic ring, a bond order of 4 was assigned instead of the original value of 1 or 2. This expedient was introduced to equalize aromatic bond orders and to distinguish their aromatic character from aliphatic ones. Condensed aromatic systems were also taken into account, by increasing the bond order of shared bonds by 1. For example, a bond order of 5 would be obtained for the bond common to both naphthalene aromatic rings. Distinguishing these chemical features enhanced the diversification among atom types, thus widening the DFG variety. The molecular representation described so far was used as the starting point to generate DFGs. A simple, straightforward rule was adopted to break down each molecule: one atom was selected, in turn. Then, each of its nearest neighbors was considered, one at a time.
Each time, the resulting atomic pair was isolated from the surrounding atoms by cutting all bonds connecting it to other atoms, providing a DFG. The procedure applied to generate DFGs can be easily visualized in Figure 1, where N,N-diethyl-m-toluamide is provided as an example (hydrogens are not shown). Atomic pair indexes are reported to make the correspondence between atomic pairs and DFGs easier to identify; atomic pair indexes providing equivalent DFGs are grouped in the same cell. In this case, molecular breakdown yielded 14 atom pairs, reduced to eight different DFGs. It was thus possible to detect different saturation patterns and aromatic rings. DFGs 6 and 7 differ from each other by a single hydrogen atom, revealing different saturation patterns. DFG 5 is not symmetrical, because of the dashed double bonds, which have an aromatic character. The rule was applied to all linear and branched molecules, as well as to molecules containing rings.

DFGs from the breakdown of the data set molecules were afterward collected and compared with the aim of eliminating redundancies and accounting for multiple fragment occurrences. Three features, summarizing the definition of a DFG, were used to univocally identify each DFG: (i) the main bond order, i.e., the order of the bond joining the DFG constituent atoms; (ii) the elements the constituent atoms are made of, such as C, N, P, O, etc.; and (iii) the nearest neighbor bond orders, i.e., the orders of all bonds connecting the constituent atoms with their nearest neighbors. This scheme gave Ntot = 147 different DFGs, describing the entire data set. To each DFG, we assigned an integer from 1 to Ntot. For each molecule, a fingerprint array was then filled with integers accounting for fragment occurrences. For QSAR purposes, we created a matrix containing all descriptors and acute toxicity values log(1/LC50). The matrix listing all molecules and their DFG representations was then partitioned three times. Each time, a training and a test set were obtained, with a ratio of 2:1. All 607 toxicity end point values log(1/LC50) were preliminarily ordered in monotonically decreasing fashion. Then, beginning from the highest value, one out of every three molecules was picked out and placed in the test set, leaving the other two in the training set. This procedure was repeated three times, picking out a different molecule each time. This partitioning guarantees that both the training and the test sets contain a representative amount of compounds spanning different toxicities; in addition, all of the compounds pass through the test phase. About 200 molecules were thus placed in each test set. The resulting partitions will be denoted A, B, and C. Training and test set molecules served different purposes in model development.
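As an illustration only (not the authors' code), the fragment-generation and identification scheme described above can be sketched in Python. The dictionary-based molecule encoding, the function names, and the toy molecules below are our own assumptions; hydrogens are folded into the atom-type labels as in the text:

```python
from collections import Counter

def dfg_key(mol, bond):
    """Canonical identifier for the DFG defined by one bond, built from the
    three features given in the text: (i) the main bond order, (ii) the
    constituent atom types (element plus attached hydrogens), and (iii) the
    orders of the bonds joining the pair to its nearest neighbors."""
    i, j, main_order = bond
    halves = []
    for a, other in ((i, j), (j, i)):
        # orders of all bonds from atom `a` to atoms outside the pair
        ext = sorted(o for (p, q, o) in mol["bonds"]
                     if a in (p, q) and other not in (p, q))
        halves.append((mol["atoms"][a], tuple(ext)))
    # sorting makes the key independent of the bond's direction
    return (main_order, tuple(sorted(halves)))

def fingerprint(mol):
    """Break a molecule into one DFG per bond and count equivalent DFGs."""
    return Counter(dfg_key(mol, b) for b in mol["bonds"])

# Toy molecules: propene CH2=CH-CH3 and n-butane CH3-CH2-CH2-CH3
propene = {"atoms": {0: "CH2", 1: "CH", 2: "CH3"},
           "bonds": [(0, 1, 2), (1, 2, 1)]}        # (atom, atom, order)
butane = {"atoms": {0: "CH3", 1: "CH2", 2: "CH2", 3: "CH3"},
          "bonds": [(0, 1, 1), (1, 2, 1), (2, 3, 1)]}

fp_propene = fingerprint(propene)   # 2 bonds give 2 distinct DFGs
fp_butane = fingerprint(butane)     # 3 bonds reduce to 2 distinct DFGs
```

In the butane example the two terminal CH3-CH2 pairs collapse onto the same DFG, illustrating how redundancies are eliminated while their multiplicity is retained in the fingerprint.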
Training set molecules provided the necessary input for computing DFG toxicity coefficients, while test set molecules were used only for validation. This gave us the opportunity to carry out a stringent test to quantify the robustness and consistency of our method. Each time, only two-thirds of the entire data set was used for model building, leaving the remaining third for testing. For each partition, four different models were developed using Powell's minimization method (PMM), MLR, NN, and PLS. In modeling with PMM and MLR, we employed the following linear QSAR equation:

Tpred = log(1/LC50) = Σ(k=1,Ntr) nk · fk    (1)

where nk is the number of groups of the k-th type present in the molecule, fk is the toxicity coefficient of the k-th group, and Ntr is the number of DFGs describing the training set. This number is generally smaller than Ntot: partitioning may confine some fragments to the test set alone. For such missing fragments, no toxicity value could be computed, so their coefficients were set to zero; this is the well-known missing fragment problem. Fortunately, our partitioning produced a limited number of compounds with missing fragments (12, 10, and 7 in test sets A, B, and C, respectively), so the missing fragment problem did not significantly influence the results. Nevertheless, we also prepared a fourth experiment in which we randomly put 75% of the compounds in the training set and the remaining 25%, purged of the molecules showing missing fragments, in the test set. The resulting test set contains 142 compounds (23%) out of 154 (25%). To predict the toxicity values, we used the PLS and NN approaches only. The results are in complete agreement with those obtained using the original partitions (PLS r2 = 0.83, 0.62; NN r2 = 0.89, 0.72, for training and test sets, respectively). From eq 1, it is easy to see that no group-group interaction terms were introduced in the model. Including all possible group-group interactions would have prohibitively increased the number of free parameters to be optimized. On the other hand, in the absence of objective criteria, selecting the main

Table 1. Number of DFGs for All Data Sets

data set    Ntr    Nte    Nco    Rte (%)
A           139    120    112    93.4
B           135    121    109    90.1
C           137    116    106    91.4

group-group interactions would have spoiled any impartial investigation of the model's capabilities. Our research group developed the algorithm implementing the PMM. A complete description of this method, applied to the minimization of multidimensional functions, can be found in ref 20. The direction set (Powell's) method is widely used to minimize multidimensional functions, giving quadratic convergence with little computational effort. In the present work, we used Powell's method to optimize the DFG coefficients fk in eq 1. As the cost function, i.e., the function to be minimized, we chose the following:

cost = Σ(i=1,Mtr) [exp(|Tipred - Tiobs| + 1) - e]    (2)

where Tipred and Tiobs are the predicted and observed toxicity end point values for the i-th molecule and Mtr is the number of training set molecules. This functional form was devised to minimize the differences between observed and predicted toxicity values. The exponential term was introduced to make the deviations from the observed values as uniform as possible: large differences are exponentially magnified, forcing the algorithm to smooth them. The additional term -exp(1) was added to correct the asymptotic behavior of the cost function: setting Tipred = Tiobs (for all i) zeroes the cost function, as would reasonably be expected for an optimal fit. For the other three models, MLR, PLS, and NN, both model building and statistical analysis were done using the Statistica software (StatSoft, release 6.0). For MLR, the default σ-restricted parametrization was adopted, with a sweep of 1.E-7 and an inverse matrix threshold of 1.E-12. To preserve eq 1, no intercept was used. PLS parameter values were also left at their defaults, and the automatic scaling option was enabled. The number of components resulting from the fitting procedure was 7, 4, and 6 for data sets A, B, and C, respectively. In building the NNs, we chose a multilayer perceptron architecture with one hidden layer and 13 units within the layer for all data sets. The logistic interval was set at 0.5. NNs were first trained with backpropagation (100 epochs) and then with conjugate gradient (500 epochs) algorithms. For each data set, 60 cycles of cross-validated resampling were carried out. In all approaches, cross-validation was done to simultaneously manage training and test molecules. Results obtained by the different approaches were then reported and compared. For the sake of clarity, squared correlation coefficients for the training and test sets will be indicated as r2 and rcv2, respectively. Accordingly, root mean square errors (RMSEs) will be abbreviated as rm and rmcv.
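For concreteness, the ordered 2:1 partitioning, the additive model of eq 1, and the exponential cost function of eq 2 can be sketched as follows. This is our own minimal illustration, not the authors' implementation; the function names and toy data are hypothetical, and the optimizers themselves (PMM, MLR, NN, PLS) are not reproduced:

```python
import math

def partition(toxicities, offset=0):
    """Ordered 2:1 split: sort end points in decreasing order, then send
    every third molecule (starting at `offset`) to the test set."""
    order = sorted(range(len(toxicities)),
                   key=lambda i: toxicities[i], reverse=True)
    test = [i for pos, i in enumerate(order) if pos % 3 == offset]
    train = [i for pos, i in enumerate(order) if pos % 3 != offset]
    return train, test

def predict(counts, f):
    """Eq 1: Tpred = sum over k of n_k * f_k for the DFGs in one molecule."""
    return sum(n * f[k] for k, n in counts.items())

def cost(pred, obs):
    """Eq 2: sum over i of exp(|Tpred_i - Tobs_i| + 1) - e.
    Zero when every prediction matches its observation."""
    return sum(math.exp(abs(p - o) + 1) - math.e
               for p, o in zip(pred, obs))

# Toy usage: two molecules described by three hypothetical DFGs
f = {0: 0.5, 1: -0.2, 2: 1.0}          # fragment toxicity coefficients
mols = [{0: 2, 1: 1}, {1: 3, 2: 1}]    # DFG occurrence counts per molecule
preds = [predict(m, f) for m in mols]  # approximately [0.8, 0.4]
```

A perfect fit zeroes the cost, mirroring the role of the -exp(1) correction term; any residual inflates the cost exponentially, which is what drives the smoothing behavior described above.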

Results and Discussion

During the preliminary stages of our work, we computed the effective number of fragments resulting from the breakdown of the training set molecules for each data set considered (see Ntr in Table 1). As mentioned earlier, this number was expected to be smaller than Ntot (147), in view of the partition scheme applied, but large enough to express the overall chemical diversity of the data set and thus ensure reliable model building. From the first column in Table 1, one can see that Ntr and Ntot differ very little in all partitions considered: only about 10 fragments were not available in the training set after partitioning. This missing fraction (about 7% of Ntot) is small enough not to hinder model building. To make a similar reliability assessment for the test set, we considered the


Table 2. r2 and rcv2 for All Models

            PLS             NN              PMM             MLR
data set   r2     rcv2     r2     rcv2     r2     rcv2     r2     rcv2
A          0.818  0.671    0.893  0.702    0.788  0.587    0.830  0.465
B          0.814  0.587    0.906  0.617    0.821  0.481    0.869  0.544
C          0.832  0.660    0.915  0.670    0.825  0.601    0.845  0.487

Table 3. rm and rmcv for All Models

            PLS             NN              PMM             MLR
data set   rm     rmcv     rm     rmcv     rm     rmcv     rm     rmcv
A          0.585  0.821    0.502  0.792    0.634  0.911    0.560  1.167
B          0.555  0.900    0.427  0.859    0.566  1.052    0.504  1.006
C          0.571  0.818    0.527  0.761    0.584  0.902    0.548  1.098

number of DFGs needed to fully describe the test set (Nte) and the number of DFGs common to the training and test sets (Nco). Both are reported in Table 1, together with the percentage Rte = Nco/Nte × 100. This ratio accounts for the fraction of DFGs actually employed in estimating the test molecules' LC50 and also provides a quantitative measure of the method's predictive capacity toward unknown species. Indeed, the larger the Rte, the better the test molecules are represented by the training set and the more accurate the final result. As Table 1 shows, the proportion of "useful" fragments recovered was large, more than 90%, for all partitions. Of course, some fragments belonging to the test set were missing in the training set and vice versa. We found the missing fragment problem to arise in all data sets, despite the small size of the fragments employed, indicating that the whole data set is chemically very heterogeneous. Having established the reliability of the proposed representation, we applied the approach described in the previous section. For each data set, four models were finally obtained. Tables 2 and 3 summarize our results in terms of squared correlation coefficients (r2 and rcv2) and RMSEs (rm and rmcv) for each of the resulting 12 models. These results refer to all molecules, since no outliers were discarded. Training set modeling provided excellent squared correlation coefficients (around 0.8, regardless of the method considered) within a narrow range of values (from 0.788 for data set A with PMM to 0.915 for data set C with NN). NN yielded the best results in terms of r2 (from 0.893 to 0.915). Quite a large gap was found in passing from NN to the other statistical techniques. Squared correlation coefficients for MLR, PLS, and PMM are comparable, although PMM applied to data set A gave the lowest value (0.788). RMSE estimates showed the superiority of NN and MLR in fitting, although the magnitude of the PLS error makes it comparable with the others.
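The fragment coverage statistics of Table 1 reduce to simple set arithmetic over fragment indexes. A hypothetical sketch (the sets below are illustrative stand-ins chosen to give counts like those of partition A, not the actual fragments):

```python
def coverage(train_frags, test_frags):
    """Ntr, Nte, Nco, and Rte = Nco/Nte * 100 as defined in the text."""
    ntr, nte = len(train_frags), len(test_frags)
    nco = len(train_frags & test_frags)
    return ntr, nte, nco, 100.0 * nco / nte

# Hypothetical fragment-index sets yielding Ntr = 139, Nte = 120, Nco = 112
train = set(range(139))
test = set(range(27, 147))
ntr, nte, nco, rte = coverage(train, test)
```

Fragments in `test - train` are exactly the "missing fragments" discussed above, whose coefficients default to zero in eq 1.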
The situation was completely different in test set modeling, where the results varied much more widely, depending strongly on the statistical approach used. Model performance generally decreased from NN to MLR. Application of NN yielded the best rcv2 for all data sets. This result, comparable to that of the most recent QSAR aquatic toxicity estimates based on the use of LogP in combination with other 2D/3D descriptors (21), was achieved without adding any external or human-based information to the model or discarding outliers. These features are increasingly important in the light of the growing need for automated, general predictive tools in water quality assessment. PLS was also successful, showing a good compromise between the r2 and rcv2 responses. PLS, however, was more sensitive to the partition choice than NN, as the PLS rcv2 broke down to 0.587 for data set B. PMM and MLR were much less effective in correlating the LC50 toxicity values than the nonlinear approaches. The difference between NN and the other methods becomes even more evident in Table 3. With no exceptions, NN gave the lowest RMSE estimates, confirming this approach as the most suitable for the current DFG representation. PMM and MLR were less effective in test modeling than NN and PLS, indicating that the linear model of eq 1 performed poorly in prediction, despite its efficacy in fitting training set toxicity values. A detailed inspection of the squared correlation coefficients indicated that PMM and MLR gave very different values, although the same modeling equation was used for both. The reason is that Powell's minimization algorithm converges to the minimum quadratically, whereas multilinear regression does so only linearly. The huge size of the parameter space to be handled (about 150 parameters) suggests that there are several saddle points, which might erroneously drive a linear minimization process toward local minima. For such a complex parameter space, Powell's method is better suited to the goal of minimization: its estimate of the Hessian matrix makes it better able to avoid falling into local minima (20). Comparing the accuracy of the results for the different test sets immediately yielded the following hierarchy: data set C > A > B. Data sets C and A provided rcv2 values of comparable magnitude when the same method was used. For both of these data sets, NN and PLS were the most effective approaches. The cross-validated squared correlation coefficients became systematically worse in passing to data set B, with the sole exception of MLR. A possible explanation of this rcv2 worsening comes directly from Table 1.
Among all data sets, B possesses the lowest descriptive capability for both the training and test sets (135 DFGs out of 147 and 109 DFGs out of 121, respectively). Following this line of reasoning, one might anticipate the result of test set modeling by looking at the Rte value. Of course, quantitatively predicting rcv2 from Rte is far from accurate, because of the multitude of factors (e.g., DFG frequency of occurrence and the modeling algorithm) that contribute to the overall goodness of the model. Nonetheless, for a given modeling algorithm, this estimate can provide a qualitative indication of the final accuracy in prediction. So far, these results confirm what we anticipated in the Introduction: the use of small constructs in combination with statistical approaches provides good correlations between aquatic toxicity and 2D chemical structure for a large number of chemicals, especially when nonlinear models are built. There are, of course, some compounds whose predicted LC50 deviates significantly from the observed value regardless of the statistical approach used. Investigating these outliers, and their influence on model performance, is fundamental for two reasons. First, it allows us to delineate the limits and advantages of the present DFG representation. Second, it provides useful indications for further improvements. To carry out this investigation, we first modeled the entire data set using no partitions. Fully exploiting the DFG modeling potential meant that we could promptly recognize poorly predicted molecules. The following squared correlation coefficients were obtained for the whole data

Table 4. Outliers and Mean Residuals for the Entire Data Set and All Partitions(a)

(a) The superscript T indicates test set molecules.

set: 0.826 (PLS), 0.848 (NN), 0.804 (PMM), and 0.809 (MLR). For each molecule, the mean residual, i.e., the average difference between observed and predicted log(1/LC50) over all models developed, was computed. Table 4 reports the 20 worst predicted molecules, the first 10 underestimated and the last 10 overestimated. Besides the names and chemical structures, we have listed the mean residuals for the entire data set (whole) and its partitions. Values labeled with a superscript T refer to test set molecules. A preliminary inspection revealed that the outliers recognized within the entire data set were often outliers in all of the partitions too. Partitioning had little effect on the mean residuals: the deviations from the observed values were fairly close to those for the whole data set, regardless of whether the molecules belonged to the training or the test set. This points to the existence of structural features that cannot be detected at the DFG level. Looking at the compounds listed in Table 4, we note that, excluding cyclohexanone, tetrahydrofuran, acetone, and ethanol, most of the molecules contain special groups, such as activated allylic and benzylic groups, nitriles, or α-halogenated carbonyl groups. Some of these groups, such as the allylic and benzylic ones, have already been identified as excessively toxic in previous studies (1, 18). Thus, it is clear that the problem is the representation of compounds whose toxicity might be influenced by fragments longer than two atoms. This problem has been solved in other GCMs by introducing ad hoc fragments. For example, allylic and benzylic groups can be regarded as 1-3 interactions, so the introduction of appropriate three-atom fragments should correct most of the deviations. In this regard, a comparison with the most recent and most relevant work (15) can help put our contribution into the correct perspective.
Martin and Young used a smaller data set (397 compounds) from the same source and applied a GCM to predict aquatic toxicity. Their results are very good (r2 > 0.9 and > 0.85 for the training and prediction sets, respectively). The significant difference is in the selection of the fragments that contribute to the calculation of the toxicity values. In fact, they selected some fragments on the basis of current knowledge, accurately solving all of the critical situations by choosing ad hoc groups (e.g., one group is selected to represent propargyl alcohol toxicity and a different one to account for tertiary propargyl alcohols). Knowledge-based models are without doubt very effective, because they contain all of the current information, and good predictions can consequently be expected. We explicitly excluded this a priori guided selection because, with a view to developing automated algorithms for predictive toxicology, it is preferable to extract such fragments automatically rather than to include them on purpose. Another possible solution would be to add descriptors explicitly addressing the issue of unusually toxic molecules. However, this solution is outside our current approach. We are currently working in the direction of the automatic extraction of longer fragments to improve our model. A further possible solution is the use of a preclassification scheme with rules derived from experimental knowledge, as suggested by Russom et al. (1). This approach certainly gives good results but, like all expert systems, it is limited by the current level of knowledge, and its extension to currently unknown cases can be very difficult. In contrast, we would prefer a completely automated system that can predict toxicity using theoretical rules without a priori handling of special cases. The system must work on the information directly present in the chemical structures, following paths developed using general and flexible rules.
This fact is also commented on by Russom et al.: they note that classification using the observed mode of action can be imprecise; there are examples where the classification is not in agreement with the mode of action. In addition, when several substructures, coding for


different modes of action, are present in a compound, it is always difficult to predict the mode of action.

Conclusions

A new automated GCM was developed, exploiting the DFG representation in combination with statistical approaches to correlate acute toxicity data on the fathead minnow for 607 chemicals. Among all of the models developed, NN gave the best results in terms of squared correlation coefficients for both the test and training sets. Investigation of the method's descriptive capacity showed its general applicability toward unknown chemicals and its consistency. The analysis of outliers in the final section provided indications about the method's robustness and useful insights for further improvements. The method is a first step toward a general and robust fragment-based model for aquatic toxicity prediction. At this stage, the method has already demonstrated its power in predicting the toxicity of those compounds whose action can be described at the two-atom level. This is an interesting result, considering that it was reached without using any information other than the molecular structure. The addition of more complex substructures (three- or four-atom fragments) should certainly improve the model; we will direct our further efforts this way, developing a completely automated approach in which the knowledge is extracted from the molecular structures rather than from the experimental data. The latter should be used only as a reference, not to correct potential outliers a priori.

Acknowledgment. We acknowledge the EU projects IMAGETOX (Contract HPRN-CT-1999-00015) and DEMETRA (Contract QLK5-CT-2002-00691) for partially supporting and funding this study. We thank the reviewers for their helpful suggestions.

References

(1) Russom, C. L., Bradbury, S. P., Broderius, S. J., Hammermeister, D. E., and Drummond, R. A. (1997) Predicting modes of toxic action from chemical structure: Acute toxicity in the fathead minnow (Pimephales promelas). Environ. Toxicol. Chem. 16, 948-967.
(2) Noegroathi, S., and Hammers, W. E. (1991) Regression models for octanol-water partition coefficients, and for bioconcentration in fish. Toxicol. Environ. Chem. 34, 155-173.
(3) Basak, S. C., and Gute, B. D. (1997) Predicting acute toxicity (LC50) of benzene derivatives using theoretical molecular descriptors: A hierarchical QSAR approach. SAR QSAR Environ. Res. 7, 117-131.
(4) Konemann, H. (1981) Quantitative structure-activity relationships in fish toxicity studies. Toxicology 19, 209-221.
(5) Nendza, M., and Russom, C. L. (1991) QSAR modeling of the ERL-D fathead minnow acute toxicity database. Xenobiotica 12, 147-170.
(6) Shultz, T. W., Wilke, T. S., Bryant, S. E., and Hosein, L. M. (1991) QSARs for selected aliphatic and aromatic amines. Sci. Total Environ. 109/110, 581-587.
(7) Bradbury, S. P., and Lipnick, R. L. (1990) Introduction: Structural properties for determining mechanisms of toxic action. Environ. Health Perspect. 87, 181-182.
(8) Verhaar, J. M., Leeuwen, C. J., and Hermens, J. L. M. (1992) Classifying environmental pollutants. 1: Structure-activity relationships for prediction of aquatic toxicity. Chemosphere 25, 471-491.
(9) Lydersen, A. L. (1955) Estimation of Critical Properties of Organic Compounds by the Method of Group Contributions, Eng. Exp. Stn. Report 3, University of Wisconsin, Chemical Engineering Department.
(10) Constantinou, L., and Gani, R. (1994) New group contribution method for estimating properties of pure compounds. AIChE J. 40 (10).
(11) Constantinou, L., Gani, R., and O'Connell, J. P. (1995) Estimation of the acentric factor and the liquid molar volume at 298 K using a new group contribution method. Fluid Phase Equilib. 103, 11-22.
(12) Klincewicz, K. M., and Reid, R. C. (1984) Estimation of critical properties with group contribution methods. AIChE J. 30 (1), 137.
(13) Gao, C., Govind, R., and Tabak, H. H. (1992) Application of group contribution method for predicting the toxicity of organic compounds. Environ. Toxicol. Chem. 11, 631-636.
(14) Fredenslund, A., Jones, R. L., and Prausnitz, J. M. (1975) Group-contribution estimation of activity coefficients in nonideal liquid mixtures. AIChE J. 21, 1086-1099.
(15) Martin, T. M., and Young, D. M. (2001) Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem. Res. Toxicol. 14, 1378-1385.
(16) OAO Corporation (2000) ECOTOX, ECOTOXicology Database System, Code List, U.S. Environmental Protection Agency, Office of Research, Laboratory Mid-Continent Division (MED), Duluth, Minnesota.
(17) OAO Corporation (2000) ECOTOX, ECOTOXicology Database System, Data Field Definition, U.S. Environmental Protection Agency, Office of Research, Laboratory Mid-Continent Division (MED), Duluth, Minnesota.
(18) OAO Corporation (2000) ECOTOX, ECOTOXicology Database System, User Guide, U.S. Environmental Protection Agency, Office of Research, Laboratory Mid-Continent Division (MED), Duluth, Minnesota.
(19) Selection of toxicity values from the Duluth database was performed by selecting the lowest toxic concentration (worst case scenario), in agreement with EU directives [Ruden, C., and Hansson, S. O. (2003) How accurate are the European Union's classifications of chemical substances? Toxicol. Lett. 144, 159-172].
(20) Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1986-1992) Numerical Recipes in Fortran 77: The Art of Scientific Computing, pp 406-412, Cambridge University Press, New York.
(21) Gini, G., Craciun, M. V., König, C., and Benfenati, E. (2004) Combining unsupervised and supervised artificial neural networks to predict aquatic toxicity. J. Chem. Inf. Comput. Sci. 44, 1897-1902.

TX049665V