
Chem. Res. Toxicol. 2007, 20 (4), 662-676

Identification of the Structural Requirements for Mutagenicity by Incorporating Molecular Flexibility and Metabolic Activation of Chemicals. II. General Ames Mutagenicity Model

R. Serafimova,† M. Todorov,† T. Pavlov,† S. Kotov,† E. Jacob,‡ A. Aptula,§ and O. Mekenyan*,†

Laboratory of Mathematical Chemistry, University “Prof. As. Zlatarov”, 8000 Bourgas, Bulgaria; Department of Product Safety, Regulations, Toxicology and Ecology, BASF Aktiengesellschaft, D-67056 Ludwigshafen, Germany; and SEAC, Unilever Colworth, Sharnbrook, Bedford, MK44 1LQ, England

Received November 23, 2006

The tissue metabolic simulator (TIMES) modeling approach is a hybrid expert system that couples a metabolic simulator together with structure toxicity rules, underpinned by structural alerts, to predict interaction of chemicals or their metabolites with target macromolecules. Some of the structural alerts representing the reactivity pattern-causing effect could interact directly with the target whereas others necessitated a combination with two- or three-dimensional quantitative structure-activity relationship models describing the firing of the alerts from the rest of the molecules. Recently, TIMES has been used to model bacterial mutagenicity [Mekenyan, O., Dimitrov, S., Serafimova, R., Thompson, E., Kotov, S., Dimitrova, N., and Walker, J. (2004) Identification of the structural requirements for mutagenicity by incorporating molecular flexibility and metabolic activation of chemicals I: TA100 model. Chem. Res. Toxicol. 17 (6), 753-766]. The original model was derived for a single tester strain, Salmonella typhimurium (TA100), using the Ames test by the National Toxicology Program (NTP). The model correctly identified 82% of the primary acting mutagens, 94% of the nonmutagens, and 77% of the metabolically activated chemicals in a training set. The identified high correlation between activities across different strains changed the initial strategic direction to look at the other strains in the next modeling developments. In this respect, the focus of the present work was to build a general mutagenicity model predicting mutagenicity with respect to any of the Ames tester strains. The use of all reactivity alerts in the model was justified by their interaction mechanisms with DNA, found in the literature. The alerts identified for the current model were analyzed by comparison with other established alerts derived from human experts. In the new model, the original NTP training set with 1341 structures was expanded by 1626 proprietary chemicals provided by BASF AG. 
Eventually, the training set consisted of 435 chemicals that are mutagenic as parents, 397 chemicals that are mutagenic after S9 metabolic activation, and 2012 nonmutagenic chemicals. The general mutagenicity model was found to have 82% sensitivity, 89% specificity, and 88% concordance for training set chemicals. The model applicability domain was introduced accounting for similarity (structural, mechanistic, etc.) between predicted chemicals and training set chemicals for which the model performs correctly.

Introduction

The focus of the latest legislative and governmental efforts is to establish simple screening tools for identifying those chemicals most likely to cause adverse effects without experimental testing of all chemicals of regulatory concern. The use of quantitative structure-activity relationship (QSAR)¹ models is a powerful in silico technique that should be considered for prioritizing chemicals for subsequent experimental verification. This is also supported by the five-level conceptual framework of the Organisation for Economic Co-operation and Development (OECD) (2), according to which the use of in vitro tests and QSARs is foreseen before using in vivo tests. Thus, QSAR can be used as a supporting piece of information in a weight of evidence approach helping in the replacement or minimization of animal testing, i.e., integrated testing strategies. Another motivation for developing QSAR models is their use in the safety assessments that are conducted within industry research and development (R&D) cycles.

* To whom correspondence should be addressed. Tel: ++359 56 880230. Fax: ++359 56 880230. E-mail: [email protected]. † University “Prof. As. Zlatarov”. ‡ BASF Aktiengesellschaft. § Unilever Colworth.

¹ Abbreviations: TIMES, tissue metabolic simulator; NTP, National Toxicology Program; COREPA, common reactivity pattern; QSAR, quantitative structure-activity relationship; OECD, Organisation for Economic Co-operation and Development; R&D, research and development; DEREK, a computer-based expert system (derived from the LHASA chemical synthesis design program); OncoLogic, a computer system to evaluate the carcinogenic potential of chemicals; CASE, computer automated structure evaluation; MULTICASE, multiple computer automated structure evaluation; TOPKAT, toxicity prediction by computer-assisted technology; COMPACT, computer optimised molecular parametric analysis of chemical toxicity; CAS, Chemical Abstracts Service; SMILES, simplified molecular input line entry specification; GA, genetic algorithm; ELUMO, energy of the lowest unoccupied molecular orbital (eV); EHOMO, energy of the highest occupied molecular orbital (eV); Egap, EHOMO - ELUMO (eV); EN, electronegativity - 1/2(EHOMO + ELUMO) (eV); µ, dipole moment (D); VolP, volume polarizability (a.u./eV); GW, Geom. Wiener (Å); Lmax, greatest interatomic distance (Å); qi, atomic charges; fHOMOi, frontier atomic charge of highest occupied molecular orbital (a.u.); fLUMOi, frontier atomic charge of lowest unoccupied molecular orbital (a.u.); SEi, donor superdelocalizabilities (a.u./eV); SNi, acceptor superdelocalizabilities (a.u./eV); CPSAs, charged partial surface areas; PPSA, partial positive surface area; PNSA, partial negative surface area; logKow, the octanol/water partition coefficient; mol. weight, molecular weight; QC, quantum chemical.

10.1021/tx6003369 CCC: $37.00 © 2007 American Chemical Society. Published on Web 03/24/2007

Efforts in QSAR modeling of mutagenicity and carcinogenicity are much more pronounced than for any of the other human health end points (3). Major difficulties in modeling the carcinogenicity end point are principally due to the diversity of carcinogenicity pathways and the nonavailability or subcritical
number of reliable data sets, which have hampered the applicability of the QSAR approach. Because of this, efforts have been focused on the modeling of in vitro genotoxicity, which is better defined by underlying molecular interaction mechanisms. Generally, genotoxic chemicals cause a broader spectrum of adverse effects including interactions with DNA, altering its structure and information content; segregation of DNA in chromosome structure; damaging normal replication processes, etc. The mutagenic chemicals are more restricted in their mode of action, increasing the occurrence of mutations in populations of cells and/or organisms only on account of interaction with DNA. To detect and measure this potency, a short-term, simple, and inexpensive in vitro assay was designed, namely, the Ames test, which identifies genetic damage caused by chemicals on bacterial cells. As shown by Ariens (4) and discussed in ref 5, Ames mutagenicity has been modeled by two well-known QSAR approaches: (i) the functional (alerting) group approach and (ii) a correlative approach. The first approach is associated with the local reactivity of chemicals, i.e., reactivity of functional groups also called alerts, pharmacophores, or toxicophores. This approach handles chemicals that have well-defined reactive groups and that cause their effect by interacting covalently with biological macromolecules. With respect to mutagenicity, these reactive substructures are either electrophiles per se or can be metabolically activated to reactive intermediates. Ashby and Tennant (6-9) introduced a list of structural alerts for DNA reactivity that could be used for identifying potentially mutagenic chemicals. An excellent demonstration of the capabilities of the alerting group approach is the recent work of Kazius et al. (10), where the 2401 mutagens in the data set were separated from 1936 nonmutagens with 18% error. 
The two well-recognized knowledge-base systems for predicting mutagenicity and carcinogenicity, DEREK (11, 12) and OncoLogic (13, 14), are based on a logical combination of expert rules. They codified existing knowledge derived from human experts into generalized rules to use for predicting mutagenicity and carcinogenicity. Both systems provide structural reasoning for eliciting the effects considering general reactivity features, structural modulators, and associated interaction mechanisms. The CASE (computer-automated structure evaluation)/MULTICASE (multiple computer-automated structure evaluation) system is a statistically driven fragment-based approach (15, 16). It does not use prior knowledge of the mechanisms of action or structural alerts but develops its own rules during model development relating test chemical structures with the end point under investigation. In CASE, each chemical from the training set is decomposed into all possible constituent fragments of lengths up to 10 atoms. Then, MULTICASE partitions the chemicals in the training set into groups by selecting the most statistically important fragments, called biophores, assuming that they are responsible for the appearance of the effect for the respective group containing it. Subsequently a CASE analysis is applied to each biophore class to determine substructural modulators to biophore activity. Knowledge-based and statistically driven methods have been comparatively analyzed in a number of reviews (5, 10, 17-23). Irrespective of the encouraging results of the toxicophore approach for mutagenicity predictions, the methods developed so far are unable to account for metabolic activation and detoxification of chemicals. The correlative QSAR approaches are attractive for application within a congeneric series of chemicals; however, they
should have limited applicability for heterogeneous databases. These limitations could stem from the difficulties in modeling multiple or overlapping mechanisms by a single relationship, as well as in defining the model applicability domains when more than one toxic pathway or interaction mechanism holds (24, 25). Of the correlative systems for predicting mutagenicity and carcinogenicity, the most developed are TOPKAT (toxicity prediction by computer-assisted technology) (26, 27) and CASE (28, 29). The correlative models, however, could be very useful when applied in combination with the alerting group approach. Thus, within the structural space defined by a single alerting group, believed to define a single interaction mechanism, such models could explain the variation in the reactivity of the alert conditioned by the rest of the molecular structure. In these cases, structural parameters describing reactivity of the entire molecule or associated with the specified alert should be related to the studied effect. Purdy (30) has applied such an approach to predict carcinogenicity. The proposed model is in the form of a decision tree with individual rules or QSARs for different chemical classes. A similar scheme is used in COMPACT (computer-optimized molecular parametric analysis of chemical toxicity) (31). The advantage of such models is their ability to handle end points conditioned by different mechanisms of action. Similarly, in our preceding work on modeling bacterial mutagenicity (TA100) of chemicals, we combined the alerting group approach with a pattern recognition type of model to delineate reactivity of chemicals toward DNA within a given interaction mechanism (1).
The collection of alerts defined the multiplicity of interaction mechanisms for training set chemicals [National Toxicology Program (NTP) database], whereas the COREPA (common reactivity pattern) models (32, 33) generalized the effect of modulating factors on the reactivity of each alert. The COREPA models defined the structural requirements in terms of two- (2D) and three-dimensional (3D) parameter ranges providing firing of alerting groups and subsequent mutagenic effects. This was the first attempt to combine, on the same modeling platform called TIMES (tissue metabolic simulator), both a toxicodynamics model (describing the reactivity toward DNA) and a toxicokinetics model explicitly simulating S9 metabolism (1, 34). Our recent analysis showed that the mutagenicity effects associated with different strains strongly intercorrelate; for example, the concordance between TA100 and TA98 for NTP database chemicals was about 90% (concordance between active chemicals only was about 70%). This fact changed the strategic direction of the investigation. From a mechanistic model focusing on only one strain, TA100 (1), we moved in the new formulation to a general solution. Instead of analyzing the other strains one by one, the TIMES methodology was applied to predict the mutagenicity of training set chemicals that are mutagenic to any of the Ames tester strains. Interaction mechanisms with DNA, associated with each individual toxicophore, were identified using literature and expert sources. According to these mechanisms, the alerts were divided into direct-acting alerts and alerts resulting from metabolic activation initiated by bacterial strains. Our experience in computer modeling of metabolism (35-37) was implemented to improve TIMES, which was trained to mimic liver metabolism on the basis of collected documented metabolic pathways for 332 chemicals.
The explicit generation of metabolites allowed the DNA reactivity model to be applied not only to parent chemicals but also to their stable metabolites. The model for general mutagenicity was initially developed for the NTP database. The model was externally validated by 1626 BASF noncongener (proprietary) chemicals, showing a reasonable balance between the fitting accuracy and the prospective predictive power. These chemicals were subsequently used to expand the training set. The model applicability domain was defined. The derived model for general mutagenicity was combined in a battery with two other models for predicting the major mechanisms of mutagenicity: frameshift and base pair substitution.

Figure 1. Illustration of the conformer distributions of two chemicals, 4-(2,4-dinitroanilino)phenol (CAS 119-15-3) and 2-nitrodiphenylamine (CAS 119-75-5), across Egap. The overlap between conformer distributions is used in the COREPA method to evaluate the similarity between chemicals with respect to Egap.
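The overlap idea illustrated in Figure 1 can be made concrete with a small sketch. The code below is not the COREPA formalism itself (which is Bayesian and is described in the cited references); it only assumes a simple histogram-intersection measure, and the function names, bin settings, and Egap values are hypothetical.

```python
def histogram(values, lo, hi, n_bins):
    """Normalized histogram of descriptor values over the interval [lo, hi]."""
    if not values:
        raise ValueError("at least one conformer value is required")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that out-of-range values fall into the edge bins.
        idx = int((v - lo) / width)
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def distribution_overlap(conformers_a, conformers_b, lo, hi, n_bins=8):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    ha = histogram(conformers_a, lo, hi, n_bins)
    hb = histogram(conformers_b, lo, hi, n_bins)
    return sum(min(x, y) for x, y in zip(ha, hb))


# Hypothetical Egap values (eV), one per conformer of each chemical.
egap_chemical_a = [7.1, 7.2, 7.4, 7.6]
egap_chemical_b = [7.3, 7.7, 8.1, 8.3]
print(distribution_overlap(egap_chemical_a, egap_chemical_b, lo=6.0, hi=10.0))
```

Comparing distributions rather than single conformers is what lets the similarity assessment account for molecular flexibility; the choice of bin width is an assumption of this sketch and would affect the result.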

Materials and Methods

Mutagenicity Database. The NTP database with 1341 chemicals tested in Salmonella typhimurium was used for deriving the general mutagenicity model; 252 of these chemicals were positive in any of the strains studied without S9. According to the adopted semiquantitative activity scale, the chemicals were classified as - (negative), ? and W (questionable and weak positive response, respectively), or 1+, 2+, 3+, 4+, etc., which correspond to 2n-fold increased activity as compared to the controls. In this study, questionable mutagens were treated as nonactive, whereas weak positive responses (1+) were treated as actives. Another training set of 289 chemicals was also prepared, including chemicals that elicited mutagenic activity only after activation by the S9 enzymatic system. Eight hundred (60%) of the 1341 chemicals were found to be nonmutagenic with and without S9. The potency values were important to distinguish frameshift mutagens from base pair substitution (electrophilic) type mutagens, as the strain distinction was not absolute. Eighty-eight chemicals with a predominantly electrophilic mechanism of mutagenicity (without S9) were specified as having an activity against TA100 at least two-fold (2+) higher than that against TA98. Ninety-six chemicals were specified with a predominant frameshift mechanism of mutagenicity (without S9); their TA98 activity exceeded the activity against TA100 by at least 2+. A set of 1626 proprietary chemicals provided by BASF was used first to validate the model and subsequently to expand its training set. The resulting training set consisted of 435 chemicals that were mutagenic as parents, 397 chemicals that were mutagenic after S9 metabolic activation, and 2012 nonmutagenic chemicals.
Chemical Abstracts Service (CAS) numbers, names, simplified molecular input line entry specifications (SMILES), and mutagenic activities of all nonproprietary chemicals used in this study can be found in the Supporting Information (Table S1). A specific training set with documented metabolic pathways produced by liver enzymes, predominantly in humans, rats, and dogs, was collected for 332 chemicals from the literature (38-47) and Internet sources. These data were used to train the metabolism simulator to reproduce enzymatic reactions in the liver.
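The activity-scale conventions described in this section can be sketched in code. The mapping below is an illustrative reading of the text, not the authors' implementation: the function names are hypothetical, and treating W as equivalent to the lowest active level is an assumption based on the phrase "weak positive responses (1+)".

```python
def potency_level(score):
    """Map a semiquantitative Ames score to an integer level (assumed mapping)."""
    score = score.strip()
    if score in ("-", "?"):
        return 0  # negative and questionable responses are treated as nonactive
    if score == "W":
        return 1  # assumption: a weak positive counts as the lowest active level
    if score.endswith("+") and score[:-1].isdigit():
        return int(score[:-1])  # "3+" -> 3
    raise ValueError(f"unrecognized activity score: {score!r}")


def is_active(score):
    return potency_level(score) > 0


def predominant_mechanism(ta100_score, ta98_score, min_difference=2):
    """Assign a mechanism when one strain exceeds the other by at least 2+."""
    difference = potency_level(ta100_score) - potency_level(ta98_score)
    if difference >= min_difference:
        return "base pair substitution"
    if -difference >= min_difference:
        return "frameshift"
    return "undetermined"


print(predominant_mechanism("3+", "1+"))
print(predominant_mechanism("-", "2+"))

# The three NTP subsets reported in the text add up to the full database.
assert 252 + 289 + 800 == 1341
```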

Modeling Methodology

1. Conformational Flexibility of Chemicals. On the basis of thermodynamic and kinetic considerations (48), it was shown that at the macromolecular binding sites, conformational states can be populated that are substantially different from those of isolated, lowest energy or crystal phase states. Conformers of an individual chemical that have a free energy within a range of approximately 20 kcal/mol from the lowest energy structure (usually accepted as the threshold) often exhibited significant variation in potentially relevant electronic descriptors. The observation that relatively small energy differences between conformers can result in significant variations in electronic structure highlighted the necessity of including all energetically reasonable conformers when defining COREPAs (49, 50). The recently developed nondeterministic method based on a genetic algorithm (GA) for coverage of the conformational space by a limited number of conformers was applied to generate conformers for the QSAR analysis (51). To minimize the effects of the nondeterministic character of GAs on the reproducibility of conformers and their distribution in structural space, a new procedure for saturation of the conformational space was developed (52). The goal of the saturation was to represent the conformational space of chemicals with an optimal number of conformers providing stable conformational distributions across selected molecular descriptors, which were no longer perturbed by the addition of new conformers. Such conformer distributions were expected to eventually provide reliable COREPA patterns. Each of the generated conformers was submitted to a geometry optimization procedure by quantum chemical (QC) methods. Usually, MOPAC 93 (53, 54) was employed, making use of the AM1 Hamiltonian.
Next, the conformers were screened to eliminate those whose heat of formation, ∆Hf°, exceeded that of the conformer at the absolute energy minimum by more than a user-defined threshold (20 kcal/mol). Subsequently, conformational degeneracy, due to molecular symmetry and geometry convergence, was detected within a user-defined torsion angle resolution.

2. Molecular Descriptors. A variety of mechanistically sound molecular descriptors have been used in the OASIS software to assess receptor binding interactions (49, 50). In previous work, we have also tried to classify these descriptors based on their ability to describe toxic end points with different specificities (55). The effects mediated by DNA interaction could be characterized with lower specificity. Consequently, mutagenicity could be modeled by a set of traditionally used reactivity parameters. These include the energies of frontier orbitals, ELUMO and EHOMO (eV), assessing global electrophilicity and nucleophilicity of molecules, respectively; the difference between these energies, Egap (eV), as a measure of molecular reactivity; electronegativity, EN (eV); dipole moment, µ (D); volume polarizability, VolP (a.u./eV), as an averaged ability of a chemical to change electron density at its atoms during chemical interactions; degree of stretching or compactness [quantified as the sum of interatomic steric distances, Geom. Wiener (GW)]; greatest interatomic distance (Lmax); planarity (normalized sum of torsion angles in a molecule); van der Waals surface; and solvent accessible surface calculated by making use of the Connolly algorithm (56). The local specificity of molecular structure was also described by making use of atomic charges [qi (a.u.)], frontier atomic charges [fHOMOi and fLUMOi (a.u.)], donor and acceptor superdelocalizabilities [SEi and SNi (a.u./eV)], and charged partial surface areas (CPSAs) as introduced by Stanton and Jurs (57), etc. Among the CPSAs, one could distinguish partial positive surface area (PPSA), partial negative surface area (PNSA), etc. Less specific molecular descriptors, such as water solubility, logKow, etc., have also been used in describing receptor-mediated effects; these are important for nonreactive components of the effect (penetration, diffusion, etc.).

Table 1. Identified Alerting Groups and Associated Parametric Boundaries Used for Building a Bacterial Mutagenicity Model Based on the NTP Database

3. Basic Principles of the COREPA Method. The COREPA is a probabilistic classification scheme identifying criteria that will classify an unknown object into predefined classes using a training set of objects from multiple classes. The COREPA formalism uses a Bayesian probabilistic method to identify common structural characteristics among chemicals that elicit similar biological activities. Instead of comparing single conformational representations of the chemicals, their probabilistic conformational distributions in the molecular descriptor space are analyzed and compared, thus accounting for molecular flexibility of the chemicals. The COREPA is developed through seeking overlap between conformer distributions of biologically similar chemicals in the specific structural space (Figure 1). Thus, the problem of structure alignment traditionally used for similarity assessments is circumvented in COREPA by overlapping and comparing conformational distributions of chemicals across the descriptor axis. Although the latter does not require the alignment of structures, it does enable identification of the COREPA. The COREPA consists of a structural subspace populated mainly by the conformers of biologically similar chemicals. For a mathematical formalism of the current algorithm, the reader is encouraged to consult refs 48, 58, and 59.

4. OASIS Model Applicability Domain. The reliability of the predictions made with the mutagenicity model was evaluated by a recently developed stepwise approach for determining the model applicability domain (60). Four stages were applied to account for the diversity and complexity of QSAR models, reflecting both their mechanistic rationale (including metabolic activation of chemicals) and transparency. General parametric requirements were imposed in the first stage, specifying the domain for only those chemicals that fall in the range of variation of selected physicochemical parameters of chemicals in the training set. The second stage defined the structural similarity between chemicals that were correctly predicted by the model. The structural neighborhood of atom-centered fragments was used to determine this similarity. The training set chemicals for which the QSAR model provides correct predictions (within user-defined accuracy thresholds) were used for extracting atom-centered fragments. They formed the list of “good fragments”, which could be used to assess external chemicals. If the atom-centered fragments for each atom constituting an external chemical were determined to be elements of this list, then that chemical belonged to the structural domain of the model. The third stage in defining the domain was based on a mechanistic understanding of the modeled phenomenon, i.e., the domain of the mechanistic hypothesis.
Here, the model domain combined the reliability of specific reactive groups, hypothesized to cause the effect, and the domain of explanatory variables, which determined the parametric requirements for functional groups to elicit their reactivity. Finally, the reliability of simulated metabolism (metabolites, pathways, and maps) was taken into account in assessing the reliability of predictions, if metabolic activation of chemicals was a part of the (Q)SAR model. Some of the stages of the proposed approach for defining the model domain were skipped depending on the availability and quality of the experimental data used to derive the model, the specificity of (Q)SARs, and the goals of their ultimate application.

5. TIMES. The metabolic simulator is based on a probabilistic approach. It consists of a list of hierarchically ordered transformations and a substructure-matching engine for their implementation. According to the probabilistic approach that we have developed, the hierarchy of transformations is defined by the probabilities of transformations, determined in a way to reproduce a database of documented metabolic transformations or rate of disappearance data. The transformation probabilities are related to rate constants associated with the feasibility of occurrence of reactions within the time frame of metabolism tests. It is assumed that the transformations are independent and performed sequentially. Each molecular transformation consists of parent submolecular fragments, transformation products, and inhibiting masks. The latter play the role of a reaction inhibitor. If the fragment assigned as a mask is attached to the target subfragment, the execution of the transformation on the parent chemical is prevented. The presence of groups that can promote or inhibit metabolic reactions significantly increases the number of principal transformations.
Although the number of organic functional groups known in intermediary and specialized metabolism is less than 60, the reactions that may occur for polyfunctional compounds can be uncountable (42). Positional isomerism also adds to the combinatorial explosion. Currently, 343 principal transformations are used to model metabolism in the liver. The transformations are separated into two major classes: non-rate-determining and rate-determining reactions. The first class includes 41 abiotic and enzyme-controlled reactions, which occur at a very high rate as compared to the duration of the tests. Transformations of highly reactive groups and intermediates are included here. Various chemical equilibrium processes like tautomerism are also included in this class of transformations. The second class includes 302 metabolic transformations such as oxidative, redox, reductive, hydrolytic, and synthetic reactions. The simulator starts by matching the parent molecule with the reaction fragment associated with the transformation having the highest probability of occurrence. When a match is identified, the molecule is metabolized and the transformation products are treated as parent molecules for the next degradation step. The procedure is repeated for the newly formed chemicals until the product of probabilities of consecutively performed transformations reaches a user-defined threshold. In the multipathway formulation of the metabolic simulator, it is assumed that, in addition to the most probable pathway, chemicals could be metabolized by a number of less probable pathways (1, 34, 61). Initially, the parent chemical is submitted to the list of transformations, and all transformations whose associated substructures are matched are implemented on the parent, producing the list of first level metabolites. Each of the generated metabolites is then submitted to the same list of transformations to produce the second level of metabolites, etc. The mathematical formalism is based on the assumption that transformations occur sequentially; i.e., the most probable transformation is applied first to the parent chemical; then, the remainder of nonmetabolized parent molecules undergoes the second transformation with lower probability, etc. The mathematical expressions defining the amount of metabolite, the probabilities to be obtained and metabolized, etc. are given in our recent publications (1, 34, 61-81). The reaction probabilities of the metabolic simulator were adjusted to reproduce a database with 332 documented maps for rat (mammalian) liver metabolism (1, 39).
The degree of reproducibility of the training set with documented maps defined the performance of the simulator. Similarly, assessments evaluating the reliability of generated metabolites and metabolic maps were introduced based on the rate of reproduction of documented metabolites by the individual transformations of the simulator (1, 34).
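The level-by-level generation of metabolites described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: molecules are plain strings, substructure matching is a substring test, and the two rules and their probabilities are invented. The bookkeeping of metabolite amounts (the "remainder of nonmetabolized parent") is deliberately omitted, since the exact expressions are given only in the cited publications.

```python
from typing import NamedTuple, Optional


class Transformation(NamedTuple):
    name: str
    probability: float          # hypothetical value, expected to lie strictly between 0 and 1
    fragment: str               # reaction fragment (toy stand-in: a substring)
    product: str                # what the matched fragment is rewritten to
    mask: Optional[str] = None  # inhibiting mask: blocks the reaction when present


def applicable(t, molecule):
    """A transformation applies if its fragment matches and no mask is present."""
    if t.fragment not in molecule:
        return False
    return t.mask is None or t.mask not in molecule


def simulate(parent, transformations, threshold=0.1, max_level=5):
    """Generate metabolites breadth-first, pruning pathways below the threshold."""
    ordered = sorted(transformations, key=lambda t: t.probability, reverse=True)
    metabolites = []              # (structure, pathway probability, level)
    frontier = [(parent, 1.0, 0)]
    while frontier:
        molecule, prob, level = frontier.pop(0)
        if level >= max_level:    # safety stop for this sketch
            continue
        for t in ordered:
            if not applicable(t, molecule):
                continue
            pathway_prob = prob * t.probability  # product of consecutive probabilities
            if pathway_prob < threshold:
                continue
            product = molecule.replace(t.fragment, t.product, 1)
            metabolites.append((product, pathway_prob, level + 1))
            frontier.append((product, pathway_prob, level + 1))
    return metabolites


# Hypothetical rules and a hypothetical parent "structure".
RULES = [
    Transformation("nitro reduction", 0.6, "NO2", "NH2"),
    Transformation("ester hydrolysis", 0.3, "COOMe", "COOH"),
]
for structure, prob, level in simulate("R-NO2-COOMe", RULES):
    print(level, structure, round(prob, 2))
```

In this sketch the same metabolite reached by two different pathways is listed twice, once per pathway; a fuller implementation would presumably merge such entries and track quantities.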

Results and Discussion

Model Based on the NTP Data Set

1. DNA Reactivity Model. The first part of the modeling exercise encompassed the development of a model for reactivity of chemicals toward DNA. The model was derived by differentiating the 252 chemicals that were positive in any of the Ames tester strains without S9 metabolic activation from all of the other 1089 chemicals, including the negatives and the positives with S9. At this stage, no simulation of metabolism was performed. The consideration of structural alerts as necessary attributes for eliciting a single molecular interaction mechanism allowed the training data set to be segmented according to independent interaction mechanisms that eventually cause a mutagenic effect. Subsequently, the COREPA approach was applied to each subset of chemicals associated with specific alerts to identify concomitant molecular parameters defining the degree of activation of the alert by the rest of the molecule. The COREPA model represented the COREPAs associated with mutagenic chemicals having the specified alert in their structure. The structural alerts identified in the NTP training set are listed in Table 1 together with the ranges for each associated molecular 2D and 3D feature and physicochemical parameter activating the alerting groups. The present collection of alerts is much smaller than the list of alerts used in the work of Kazius et al. (10). However, in ref 10, no attempt was made to review or assess the mechanism of mutagenicity of all individual toxicophores. In the present work, all individual toxicophores included were found to have a clear mechanism resulting in mutagenic effects. No toxicophores were added to the model without a mechanistic explanation of how they elicited their effect at the molecular level. The interaction mechanisms are summarized in Table 2 with their supporting literature sources. As can be seen, some of the alerts interact directly with DNA, whereas others cause their effect after initial metabolic activation in the bacterial growth media.

Table 2. Interaction Mechanisms of Alerting Groups Used in This Work: Alerting Groups Directly Acting with DNA (a) and Alerts Acting after Bacterial Enzymatic Activation (b). The reference sources supporting the mechanisms are also listed.

Some of the alerting groups used in the present work were additionally found to be slightly different in terms of their structures from the alerts in the lists of Ashby (6-9) and Kazius (10). For example, an aromatic nitro alert is defined in refs 5 and 10, whereas only a nitro group (not necessarily an aromatic one) is defined in our model. This difference might be explained by the way in which the alerts are defined herein. As shown in Table 1, a nitro group is combined with a global reactivity requirement in terms of Egap. As defined (see subsection Molecular Descriptors), the smaller the value of this parameter is, the more reactive the chemicals are. Two ranges are defined for this molecular descriptor. The lower range, 6.0-8.5 eV, in fact encompasses the majority of aromatic nitro compounds. The range of Egap above 8.5 eV, corresponding to a lower reactivity of chemicals, could, however, explain the mutagenicity of aliphatic nitro compounds, such as 3-nitropropionic acid. Aromatic nitro compounds could also be identified in this range, but the requirement for high hydrophilicity (logKow < 2.3) corresponds to the presence of a polar group, which could either additionally activate the molecules (and which is not detected by Egap) or facilitate the transport of the chemicals in the aqueous phase. This example demonstrates the more general character of the modulating structural requirements combining structural alerts with specific parameter ranges.
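The nitro example can be expressed as a simple rule, sketched below. It reflects only one reading of the text: that the logKow requirement applies to the upper Egap range alone, and that the range boundaries are inclusive as coded. The function name and the input values are hypothetical, and a single Egap value stands in for what the model evaluates over conformer distributions.

```python
def nitro_alert_fires(has_nitro_group, egap_ev, log_kow):
    """Toy version of the nitro alert with its modulating parameter ranges."""
    if not has_nitro_group:
        return False  # the structural alert itself is a necessary condition
    if 6.0 <= egap_ev <= 8.5:
        return True   # lower Egap range: covers most aromatic nitro compounds
    if egap_ev > 8.5 and log_kow < 2.3:
        return True   # upper Egap range: additionally requires high hydrophilicity
    return False


# Hypothetical descriptor values, for illustration only.
print(nitro_alert_fires(True, egap_ev=7.2, log_kow=3.5))
print(nitro_alert_fires(True, egap_ev=9.4, log_kow=0.1))
print(nitro_alert_fires(True, egap_ev=9.4, log_kow=3.0))
```

Writing the alert this way shows why it is more general than a purely substructural "aromatic nitro" alert: the same nitro fragment fires or stays silent depending on the parameter ranges attached to it.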
The analysis of the modulating 2D and 3D structural factors revealed that two of the alerts, propiolactone and hydroxylamines, caused effects without the need for modulating factors. In eight cases, the alerts were combined with global physicochemical (2D) parameters such as logKow and molecular weight. In only five cases were QC requirements imposed as modulating factors. These were the reactivity parameters Egap and VolP; the latter correlated with logKow for certain classes of chemicals (lower VolP values reflected higher charge localizations and more polarizable, i.e., less lipophilic, molecules). For two alerts (#12 and #13 in Table 1), two-parameter COREPA models were derived as modulating components, explaining the multiplicity of parameter ranges corresponding to complex COREPA patterns. Single-parameter requirements were derived using one-parameter COREPA models for the remaining alerts. The parametric ranges of alerting groups should be further studied to gain deeper mechanistic insights into ligand-DNA interactions at the molecular level. The use of ranges of physicochemical parameters and molecular descriptors as modulating factors of alerting groups appears to be an additional generalization of the approach based on collections of discrete substructural fragments (10). This generalization comes at the cost of slowing down the screening process, on account of the QC calculations required to derive reactivity parameters. Fortunately, only five out of a total of 15 cases necessitated the use of QC parameters (see Table 1).

2. Performance and Validation of the Derived Reactivity Model. The derived model based on alerting groups combined with COREPA-type modulating factors correctly predicted the mutagenic potential of 210 out of 252 chemicals eliciting mutagenicity without S9 metabolic activation. This corresponds to a sensitivity of 83% (results can be found in the Supporting Information, Table S2). Better results were obtained for the nonmutagenic chemicals. Of 1089 nonmutagenic chemicals, 931 were classified correctly, i.e., with a specificity of 88% (36 chemicals were unclassified by the model in that they did not meet the probabilistic level for reliable predictions) (results can be found in the Supporting Information, Table S3). The higher proportion of nonmutagens in the training set produced a concordance of 87%, which is closer to the specificity value. As can be seen, the retrospective predictability of our model does not exceed the statistics produced by other models, such as MULTICASE or TOPKAT.
In this respect, we should mention that when the prediction accuracy of mutagenicity models exceeds 80%, their error is already within the range of experimental variation. Thus, as shown in Table 3, the interlaboratory variation of mutagenicity data across different inventories was found to be about 20 and 35% without and with S9 metabolic activation, respectively. According to the OECD model validation principles, once the experimental error of the end point under investigation is reached by the models, the latter should be evaluated by the causal association between modeling parameters and activity mechanisms. Providing such an association is a principal goal of the present work.
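The statistics quoted for the reactivity model can be reproduced from the reported counts. The short Python sketch below does this under one assumption that is inferred rather than stated in the text: the 36 unclassified nonmutagens are excluded from the denominators, since this is what yields the reported specificity of 88%. The function name is hypothetical.

```python
def percent(correct: int, total: int, unclassified: int = 0) -> float:
    """Share of correctly predicted chemicals among those the model classified."""
    return 100.0 * correct / (total - unclassified)


# Counts reported for chemicals tested without S9 metabolic activation.
mutagens_total, mutagens_correct = 252, 210
nonmutagens_total, nonmutagens_correct, nonmutagens_unclassified = 1089, 931, 36

sensitivity = percent(mutagens_correct, mutagens_total)
specificity = percent(nonmutagens_correct, nonmutagens_total, nonmutagens_unclassified)
concordance = percent(
    mutagens_correct + nonmutagens_correct,
    mutagens_total + nonmutagens_total,
    nonmutagens_unclassified,
)

print(f"sensitivity {sensitivity:.1f}%")
print(f"specificity {specificity:.1f}%")
print(f"concordance {concordance:.1f}%")
```

Because nonmutagens make up most of the classified chemicals, the concordance is dominated by the specificity term, which is why it lies closer to 88% than to 83%.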

Structural Requirements for Mutagenicity

Table 3. Interlaboratory Reproducibility of Mutagenicity Data across the Inventories of the Gene-Tox Program (GTP), National Cancer Institute (NCI), and NTP without S9 and with S9

database intersection   without S9                          with S9
                        no. of chemicals   concordance (%)  no. of chemicals   concordance (%)
GTP/NCI; TA100          20                 85               15                 80
GTP/NTP; TA100          39                 79               14                 21
GTP/NCI; TA98           18                 88               13                 90
GTP/NTP; TA98           21                 92               23                 65

Simulating Metabolic Activation. The second stage for deriving the TIMES mutagenicity model was to build a metabolic simulator that could predict the metabolism of chemicals that are positive with S9 only. In this respect, the TIMES approach requires the training set chemicals to be separated into two sets: mutagenic as parents and mutagenic after metabolic activation. The generated metabolites were submitted to the DNA reactivity model to identify mutagenic metabolites. To validate the modeling concept for metabolic activation of chemicals based on combined application of the reactivity model and the metabolic simulator, the last two stages of the scheme were applied to 1089 chemicals from the training set that were observed to be nonmutagenic with or without S9.

In the ideal case, all of these chemicals and their metabolites should have been predicted as negatives. The hierarchy of the transformations in the TIMES metabolic simulator was adjusted to reproduce the experimental metabolism data in mammalian (mainly rat) liver documented for 332 chemicals. Different levels of reproducibility were reached depending on the settings of the simulator. The highest sensitivity, 79% (in terms of reproducing documented metabolites for all 332 maps in the documented database), was reached when the thresholds for the probability of obtaining metabolites (i.e., the product of the probabilities of obtaining the metabolite across the metabolic pathway) and for their hydrophobicity were set to 0.01 and -4.0, respectively. A maximum of five rival transformations at each level was applied. The propagation of the metabolic tree was also interrupted if phase II reactions, such as glucuronidation and sulfate binding, occurred. Note that this high sensitivity was obtained at the cost of generating a large number of undocumented metabolites, about 17500 (for 1819 documented metabolites representing the 332 training maps). We should emphasize that it is unknown whether these metabolites were all “false positives” or what proportion represented real but undocumented structures. When the simulator was working in a mode applying the concept of sequential implementation of enzymatic transformations (enzyme channeling; that is, sequences of enzymatic action are applied to biotransform a given functionality in the molecule, whereas the transformation pathways associated with different functional groups in the structure are considered to be independent), the sensitivity of the simulator dropped to 76%, with the added benefit of a strong reduction in the number of “false positives”, up to 7000. In view of the relatively low interlaboratory reproducibility of metabolism data, we decided to use the channeling mode of the simulator in the present study. This mode is illustrated in Figure 2, where the predicted metabolism for o-nitrotoluene (CAS 88-72-2) and the documented metabolites for this chemical are compared.

Figure 2. Comparison between predicted metabolism by TIMES and documented metabolites for o-nitrotoluene (CAS 88-72-2) (gray line, generated and documented metabolites; bold line, generated but not documented metabolites; and dashed line, documented but not generated metabolites).

Table 4. Identified Alerting Groups and Associated Parametric Boundaries Used for Building a Bacterial Mutagenicity Model Based on the NTP and BASF Databases

Figure 3. Changes in the number of chemicals (a) and concordance (b) across alerts.

We would like to emphasize that the simulator used for predicting metabolic activation of chemicals with the S9 mix in the present work in fact predicts mammalian liver metabolism, with the settings of the simulator (such as the number of rival pathways, level of metabolism trees, solubility, stability, etc.) adjusted so as to avoid missing mutagenic metabolites, which are usually highly reactive species with extremely short half-lives (and are therefore rarely observed experimentally). Another assumption of this work is that metabolism does not introduce new mechanisms of mutagenicity; that is, the collection of alerts and associated interaction mechanisms is derived from the training set of the parent chemicals only. The metabolic activation of parent molecules yields metabolites possessing alerts that can cause the effect; however, these alerts are detected from the training set of parent chemicals only.

Of the 289 chemicals that were positive in any of the bacterial strains after S9 metabolic activation, TIMES failed to generate active metabolites from 67 structures, which were therefore incorrectly identified as nonmutagenic. The overall sensitivity for chemicals requiring S9 activation was 76% (results can be found in the Supporting Information, Table S4).
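The simulator settings described in this section (a cumulative pathway probability threshold of 0.01, at most five rival transformations per level, and termination of a branch at phase II reactions) can be illustrated with a minimal Python sketch of a pruned metabolic tree. This is not the TIMES implementation: all names, the depth limit, and the toy transformation table are hypothetical, and the hydrophobicity threshold is omitted because the text does not say how it is applied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (product name, probability of the transformation, is it a phase II reaction?)
Transformation = Tuple[str, float, bool]


@dataclass
class Node:
    name: str
    cumulative_p: float          # product of probabilities along the pathway
    is_phase_ii: bool = False
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, transform: Callable[[str], List[Transformation]],
           p_min: float = 0.01, max_rivals: int = 5,
           max_depth: int = 4, depth: int = 0) -> Node:
    """Grow the tree below `node`, pruning improbable or terminated branches."""
    if node.is_phase_ii or depth >= max_depth:
        return node  # phase II products are not propagated further
    # Keep only the most probable rival transformations at this level.
    rivals = sorted(transform(node.name), key=lambda t: t[1], reverse=True)
    for product, p, phase_ii in rivals[:max_rivals]:
        cumulative = node.cumulative_p * p
        if cumulative < p_min:
            continue  # pathway too improbable to report
        child = Node(product, cumulative, phase_ii)
        node.children.append(child)
        expand(child, transform, p_min, max_rivals, max_depth, depth + 1)
    return node


def collect(node: Node) -> List[str]:
    """List all generated metabolites in depth-first order."""
    names: List[str] = []
    for child in node.children:
        names.append(child.name)
        names.extend(collect(child))
    return names


# Invented toy rules, for illustration only.
TOY_RULES = {
    "parent": [("M1", 0.6, False), ("M2", 0.3, False), ("M1-glucuronide", 0.1, True)],
    "M1": [("M3", 0.5, False), ("M4", 0.01, False)],
    "M1-glucuronide": [("M5", 0.9, False)],
}

tree = expand(Node("parent", 1.0), lambda name: TOY_RULES.get(name, []))
metabolites = collect(tree)
print(metabolites)
```

In this toy example, M4 is dropped because its cumulative probability (0.6 × 0.01) falls below the threshold, and M5 is never generated because its precursor is a phase II product. Lowering the threshold or raising the number of rivals enlarges the tree, which is the trade-off between sensitivity and the number of undocumented metabolites discussed in the text.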
At the same time, 581 of the 800 chemicals that were negative (with or without S9) were predicted correctly as nonactive. This corresponded to a specificity of about 74% (16 of the 800 chemicals did not meet the probabilistic level for reliable predictions). In total, TIMES correctly classified active and nonactive chemicals with an accuracy of about 74%. This result can be considered adequate because the modeling error corresponds to the experimental variation. Moreover, the goodness of the model was derived through use of a heterogeneous training set that integrated three classes of chemicals: mutagenic as parent, mutagenic after metabolic activation, and nonmutagenic. In traditional QSAR studies, chemicals that are mutagenic as parent and those that are metabolically activated are combined in the same training set. These subsets were modeled separately by the TIMES approach.

In our previous work (1), the majority of the identified false positives (with respect to TA100) were found to be positive in one of the other strains (TA98, TA97, TA1535, TA1537, or TA1538). An analysis of the chemicals that were negative in TA100 but activated by the TIMES simulator showed that a large number of these “false positives” were positive in other genotoxicity or carcinogenicity assays. The TIMES metabolism simulator combined with a mutagenicity model was thought to actually produce true mutagenic metabolites that were not positive in TA100 but were detected in other short-term assays. It was concluded that the TIMES liver metabolism simulator behaved as a battery of genotoxicity assays, providing predictions for mutagenicity and carcinogenicity to a greater extent than through the application of the Ames S. typhimurium test alone. A similar analysis of the “false positives” produced by the current general mutagenicity model will be performed in subsequent work.

To demonstrate the importance of the mammalian liver metabolism simulator in predicting the mutagenicity of chemicals that require metabolic activation, the derived mutagenicity model was applied to the 289 chemicals that elicited an effect after S9 activation.
The model was applied without including the metabolic simulator. The sensitivity of the predictions was only 22%, which is significantly lower than the sensitivity of 76% obtained when the metabolic simulator was coupled with the model.

Validation of the Model and Expanding Its Applicability Domain. The prospective predictivity of the model was assessed by screening 1626 proprietary chemicals that had been previously tested by BASF. For confidentiality reasons, the detailed results of this external validation are not provided here. The concordance of these screening results was 80% when the model applicability domain was not taken into consideration. When the model applicability domain was applied to the validation set, 990 chemicals were outside of the model domain. For the 636 chemicals within the model applicability domain, the concordance was 86%. Subsequently, the NTP training set was expanded with the BASF data, and the mutagenicity models were redeveloped using 2844 chemicals: 435 (15%) mutagenic without S9 metabolic activation, 397 (14%) mutagenic after metabolic activation, and 2012 (71%) nonmutagenic chemicals with and without S9. As a result of this expansion, three new alerts were identified (#15, #16, and #17 in Table 4, as compared to Table 1), whereas one of the predefined alerts was merged into the collection of new alerts. This might be explained by the expanded structural diversity of the training set, which allowed a justification and generalization of interaction mechanisms. New parametric requirements for some of the alerts were also added after the expansion (e.g., for alert #2 in Table 4 as compared with Table 1). The number of chemicals across alerting groups after expanding the training set with the BASF data is shown in Figure 3a; changes in concordance across alerts are presented in Figure 3b. Summarizing the model statistics across all alerts, the sensitivity (82%), specificity (89%), and concordance (88%) of the global model were not found to be significantly altered when compared with the goodness of the estimate with the NTP data-based model (cf. 83, 89, and 88%, respectively). Nonetheless, the upgraded model could be characterized by a much broader applicability domain given the structural diversity of the newly added chemicals.

Table 5. Comparison between Alerts Used in This Work and the Collections of Kazius et al. (10), Benigni and Bossa (5), and Ashby (6)

In Table 5, the final list of alerts used in this work is compared with the collections of Kazius et al. (10), Benigni and Bossa (5), and Ashby (6). As can be seen, the alerts used in the present work have also been identified in other collections with one exception, 1,4-benzoquinone, which was not found in any of the other lists. Slight differences were found between the definitions of alerts across the lists. The relative simplicity of the alerts used in the present work (lack of detoxifying fragments, etc.) is due to concomitant COREPA models, which describe the activation of the alerts by the rest of the molecules. The comparative analysis requires further discussion; this will be in the scope of another paper. Our analysis showed that a number of alerts used in the other lists were not used in the present work. There are two reasons for this. First, this could be justified by the structural limitation of the current training set of chemicals. Second, in the current formulation of the model, only alerts with a clear mechanistic understanding of the causative effect at the molecular level were applied. These alerts as well as the other new functionalities found in the literature will be further studied as potential toxicophores and will be used in our subsequent modeling exercises. The mechanisms of mutagenicity were also analyzed with a view to modeling them.
These models were combined with the general mutagenicity model. Frameshift and base pair substitution mutagenicity are associated with the TA98 and TA100 strains, respectively, which are considered as references for the two mechanisms. Two models were derived using training sets containing chemicals with predominant activity toward one of the strains, as defined in the Materials and Methods section. Like the models of general mutagenicity, the models for mutagenicity mechanisms were based on alerting groups combined with appropriate ranges of physicochemical parameters and 2D and 3D molecular descriptors. By and large, it was found that the direct acting alerts (see Table 2) were associated with the frameshift (electrophilic) mechanism, whereas the alerts that acted after bacterial metabolism (Table 2) were associated with base pair substitution (point) mutagenicity. Hence, the functionalities of direct acting alerts could be anticipated as the structural components needed for DNA insertion (i.e., as a prerequisite for the frameshift mechanism). The second group of functionalities leads to binding of the DNA backbone; the formed complexes may be considered as modified bases, which affect the correct replication of the macromolecules. This finding needs to be further analyzed. Details on both models will be presented in another paper.

As mentioned previously, TIMES integrates, on the same modeling platform, a DNA reactivity (COREPA) model and a simulator of metabolic activation. When a new chemical is submitted for predicting mutagenicity, the COREPA model is first used to screen the parent structure. Independent of the prediction for the parent chemical, liver metabolism is also simulated, and the generated metabolites are submitted for screening using the same COREPA model. The user is then informed about the mutagenicity of the parent structure and its metabolites. Mutagenicity could be due to the parent chemical only or be the result of metabolic activation (i.e., the parent is inactive but it is transformed to a mutagenic metabolite), or both the parent structure and its metabolites could be mutagenic. Mutagenic parents and/or metabolites are then submitted to the models for frameshift and base pair substitution mechanisms. The system thus aims to predict the mechanism of the effect while answering the general question of whether the parent structure or its metabolites are mutagenic.
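The screening sequence described above (parent first, then every simulated metabolite, with the same reactivity model) can be summarized in a few lines of Python. This is only a schematic sketch: the function names are hypothetical, the reactivity model is replaced by a stand-in predicate, and the structure names and the set of "active" structures are invented for illustration.

```python
from typing import Callable, Iterable


def classify(parent: str, metabolites: Iterable[str],
             is_mutagenic: Callable[[str], bool]) -> str:
    """Report whether mutagenicity stems from the parent, its metabolites, or both."""
    parent_active = is_mutagenic(parent)
    # Metabolites are screened regardless of the result for the parent.
    active_metabolites = [m for m in metabolites if is_mutagenic(m)]
    if parent_active and active_metabolites:
        return "parent and metabolites"
    if parent_active:
        return "parent only"
    if active_metabolites:
        return "metabolites only"
    return "nonmutagenic"


# Stand-in for the reactivity model: a lookup in an invented set of actives.
ACTIVE = {"A", "A-metabolite", "B-metabolite"}


def toy_model(structure: str) -> bool:
    return structure in ACTIVE


print(classify("A", ["A-metabolite"], toy_model))
print(classify("B", ["B-metabolite", "B-other"], toy_model))
print(classify("C", ["C-metabolite"], toy_model))
```

In the workflow described in the text, the structures found active at this step would then be passed on to the frameshift and base pair substitution models.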

Conclusions

The TIMES system combines, in the same platform, the metabolic activation of chemicals and their interaction with target macromolecules. In the present work, the approach was used to predict bacterial mutagenicity of chemicals. The NTP training set with 1341 chemicals was used to derive the first formulation of the model. Subsequently, this training set was expanded by 1626 proprietary chemicals provided by BASF AG (the two data sets partly overlap). The resulting training set consisted of 435 chemicals that were mutagenic as parents, 397 chemicals mutagenic after S9 metabolic activation, and 2012 nonmutagenic chemicals. These three classes of chemicals were considered as biologically dissimilar in the modeling process; i.e., chemicals mutagenic as parents were distinguished from chemicals that were metabolically activated. These subsets were subsequently modeled separately by the TIMES approach, unlike in traditional QSAR studies, where chemicals that are mutagenic as parents and those that are metabolically activated are combined in a single training set.

The reactivity model describing interactions of chemicals with DNA was based on an alerting group approach. Only those toxicophores having a clear interpretation for the molecular mechanism causing the ultimate effect were included in the model. In this respect, the alerts were classified as direct acting and metabolically activated. As a comparative exercise, the alerts used in the present work were compared with the three alert lists of Ashby, Kazius, and Benigni. The detailed comparative analysis between the alerts in these lists will be presented in a subsequent paper. The modulating effect of the rest of the chemical structure on the alerts is described by 2D and 3D (COREPA) models, accounting for molecular flexibility. Most of the alerts in the present model are combined with such concomitant models. The mechanistic interrelation between alerts and related parametric ranges generalizing the effect of the rest of the molecules on the alert is illustrated; however, a detailed analysis of this relation across all alerts will be the subject of another paper.

The model was coupled with a metabolic simulator, which was trained to reproduce documented maps for mammalian (mainly rat) liver metabolism for 332 chemicals. The ultimate model was found to have a sensitivity of 82%, a specificity of 89%, and a concordance of 88% for the training set chemicals. The model applicability domain was then introduced to account for similarity (structural, electronic, etc.) between the chemicals being predicted and the training set chemicals for which the model performed correctly. Similar models were derived for frameshift and base pair substitution mechanisms of mutagenicity based on training sets with chemicals eliciting predominant activity against TA98 and TA100, respectively. In the final model construction, the parent chemicals and each of the generated metabolites were submitted to a battery of mutagenicity models to screen for a general effect and mutagenicity mechanisms. Thus, chemicals were predicted to be mutagenic as parents only, as parents and metabolites, or as metabolites only.

Acknowledgment. Research associated with this paper was funded in part through Research Agreements with Unilever and Contract Agreements with BASF AG. Gratitude is expressed to Drs. Grace Patlewicz, JRC; Carl Westmoreland from Unilever; and Joanna Jaworska from P&G for the discussions that improved the quality of the paper.

Supporting Information Available: Table of CAS number, chemical names, SMILES, and mutagenic activity of all nonproprietary chemicals used as a training set; table of CAS number, chemical names, SMILES, and model predictions for mutagenic chemicals (without S9); table of CAS number, chemical names, SMILES, and model predictions for nonmutagenic chemicals (without S9); and table of CAS number, chemical names, SMILES, and model predictions for mutagenic chemicals after metabolic activation. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Mekenyan, O., Dimitrov, S., Serafimova, R., Thompson, E., Kotov, S., Dimitrova, N., and Walker, J. (2004) Identification of the structural requirements for mutagenicity by incorporating molecular flexibility and metabolic activation of chemicals I: TA100 model. Chem. Res. Toxicol. 17 (6), 753-766.
(2) OECD (2003) Final Report of the Sixth Meeting of the Task Force on Endocrine Disrupters Testing and Assessment (EDTA 6), ENV/JM/TG/EDTA/M, (2003) 4, OECD.
(3) Schultz, T., Cronin, M., Walker, J., and Aptula, A. (2003) Quantitative structure-activity relationships (QSARs) in toxicology: A historical perspective. J. Mol. Struct. (THEOCHEM) 622, 1-22.
(4) Ariens, E. J. (1984) Domestication of chemistry by design of safer chemicals: Structure-activity relationships. Drug Metab. Rev. 15 (3), 425-504.
(5) Benigni, R., and Bossa, C. (2006) Structural alerts of mutagens and carcinogens. Curr. Comput.-Aided Drug Des. 2, 1-19.
(6) Ashby, J. (1985) Fundamental structural alerts to potential carcinogenicity and noncarcinogenicity. Environ. Mutagen. 7, 919-921.
(7) Ashby, J., and Tennant, R. (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP. Mutat. Res. 204 (1), 17-115.
(8) Tennant, R., and Ashby, J. (1991) Classification according to chemical structure, mutagenicity to Salmonella and level of carcinogenicity of a further 39 chemicals tested for carcinogenicity by the U.S. national toxicology program. Mutat. Res./Rev. Mutat. Res. 257 (3), 209-227.
(9) Ashby, J., and Tennant, R. (1991) Definitive relationships among chemical structure, carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutat. Res./Rev. Mutat. Res. 257 (3), 229-306.
(10) Kazius, J., McGuire, R., and Bursi, R. (2005) Derivation and validation of toxicophores for mutagenicity prediction. J. Med. Chem. 48 (1), 312-320.
(11) Sanderson, D., and Earnshaw, C. (1991) Computer prediction of possible toxic action from chemical structure; The DEREK system. Hum. Exp. Toxicol. 10 (4), 261-273.
(12) Ridings, J. E., Barratt, M. D., Cary, R., Earnshaw, C. G., Eggington, C. E., Ellis, M. K., and Judson, P. N., et al. (1996) Computer prediction of possible toxic action from chemical structure: An update on the DEREK system. Toxicology 106 (1-3), 267-279.
(13) Woo, Y.-T., Lai, D. Y., Argus, M. F., and Arcos, J. C. (1995) Development of structure-activity relationship rules for predicting carcinogenic potential of chemicals. Toxicol. Lett. 79 (1-3), 219-228.
(14) Singer, B., and Kusmierek, J. (1982) Chemical mutagenesis. Annu. Rev. Biochem. 51, 655-693.
(15) Klopman, G., Frierson, M., and Rosenkranz, H. (1990) The structural basis of the mutagenicity of chemicals in Salmonella typhimurium: The Gene-Tox Data Base. Mutat. Res. 228, 1-50.
(16) Rosenkranz, H., and Klopman, G. (1990) The structural basis of the mutagenicity of chemicals in Salmonella typhimurium: The National Toxicological Program Data Base. Mutat. Res. 228, 51-80.
(17) Dearden, J., Barratt, M., Benigni, R., Bristol, D., and Combes, R., et al. (1997) The development and validation of expert systems for predicting toxicity. ATLA, 223-252; available at http://altweb.jhsph.edu/publications/ECVAM/ecvam24.htm.
(18) Greene, N. (2002) Computer systems for the prediction of toxicity: An update. Adv. Drug Delivery Rev. 54, 417-431.
(19) Pearl, G., Livingston-Carr, S., and Durham, S. (2001) Integration of computational analysis as a sentinel tool in toxicological assessments. Curr. Top. Med. Chem. 1, 247-255.
(20) Johnson, D., and Wolfgang, G. (2000) Predicting human safety: Screening and computational approaches. Drug Discovery Today 5, 445-454.
(21) Benfenati, E., and Gini, G. (1997) Computational predictive programs (expert systems) in toxicology. Toxicology 119, 213-225.
(22) Cronin, M., Jaworska, J., Walker, J., Comber, M., and Watts, C., et al. (2003) Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Perspect. 111, 1391-1401.
(23) Benigni, R., and Richard, A. (1998) Quantitative structural-based modeling applied to characterization and prediction of chemical toxicity. Methods 14, 264-276.
(24) Benigni, R., and Giuliani, A. (1992) QSAR studies in genetic toxicology: Congeneric and non-congeneric chemicals. Arch. Toxicol. Suppl. 15, 228-237.
(25) Richard, A. (1994) Application of SAR methods to non-congeneric data bases associated with carcinogenicity and mutagenicity: Issues and approaches. Mutat. Res. 305, 73-97.
(26) Enslein, K., Borgstedt, H., Tomb, M., Blake, B., and Hart, H. (1987) A structure-activity prediction model of carcinogenicity based on NCI/NTP assays and food additives. Toxicol. Ind. Health 3, 267-287.
(27) Enslein, K. (1988) An overview of structure-activity relationships as an alternative to testing in animals for carcinogenicity, mutagenicity, dermal and eye irritation, and acute oral toxicity. Toxicity and Industrial Health (4), 479-498.
(28) Klopman, G. (1992) MULTICASE. 1. A hierarchical computer automated structure evaluation program. Quant. Struct.-Act. Relat. 11 (2), 176-184.
(29) Schultz, T. W., Cronin, M. T. D., and Netzeva, T. I. (2003) The present status of QSAR in toxicology. J. Mol. Struct. (THEOCHEM) 622, 23-38.
(30) Purdy, R. (1996) A mechanism-mediated model for carcinogenicity: Model content and prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 25 organic chemicals. Environ. Health Perspect. 104, 1085-1094.
(31) Lewis, D. F. V., Ioannides, C., and Parke, D. V. (1993) Validation of a novel molecular orbital approach (COMPACT) for the prospective safety evaluation of chemicals, by comparison with rodent carcinogenicity and Salmonella mutagenicity data evaluated by the US NCI/NTP. Mutat. Res. 291, 61-77.
(32) Mekenyan, O. G., Ivanov, J. M., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Karcher, W. (1997) COREPA: A new approach for the evaluation of common reactivity pattern of chemicals. I. Stereoelectronic requirements for androgen receptor binding. Environ. Sci. Technol. 31, 3702-3711.
(33) Mekenyan, O., Nikolova, N., Schmieder, P., and Veith, G. (2004) COREPA-M: A multi-dimensional formulation of COREPA. QSAR Comb. Sci. 23 (1), 5-18.
(34) Mekenyan, O., Dimitrov, S., Pavlov, T., and Veith, G. (2004) A systematic approach to simulating metabolism in computational toxicology. I. The TIMES heuristic modelling framework. Curr. Pharm. Des. 10 (11), 1273-1293.
(35) Jaworska, J., Dimitrov, S., Nikolova, N., and Mekenyan, O. (2001) Probabilistic assessment of biodegradability based on metabolic pathways: CATABOL system. SAR QSAR Environ. Res. 13, 307-323.
(36) Dimitrov, S., Breton, R., MacDonald, D., Walker, J., and Mekenyan, O. (2001) Quantitative prediction of biodegradability, metabolite distribution and toxicity of stable metabolites. SAR QSAR Environ. Res. 13, 445-455.
(37) Dimitrov, S., Dimitrova, N., Walker, J., Veith, G., and Mekenyan, O. (2003) Bioconcentration potential predictions based on molecular attributes: an early warning approach for chemicals found in humans, birds, fish and wildlife. QSAR Comb. Sci. 22, 58-68.
(38) Kroger-Koepke, M. B., Koepke, S. R., McClusky, G. A., Magee, P. N., and Michejda, C. J. (1981) α-Hydroxylation pathway in the in vitro metabolism of carcinogenic nitrosamines: N-nitrosodimethylamine and N-nitroso-N-methylaniline. Proc. Natl. Acad. Sci. U.S.A. 78 (10), 6489-6493.
(39) Lenk, W., and Rosenhauer-Thilmann, R. (1993) Metabolism of 2-acetylaminofluorene. I. Metabolism in vitro of 2-acetylaminofluorene and 2-acetylaminofluoren-9-one by hepatic enzymes. Xenobiotica 23, 241-257.
(40) Lynn, R. K., Gould, C. C., Milam, D. F., Eastman, C. L., and Rodgers, R. M. (1983) Metabolism of the human carcinogen, benzidine, in the isolated perfused rat liver. Drug Metab. Dispos. 11, 109-114.
(41) McKay, S., Farmer, P. B., Cary, P. D., and Grover, P. L. (1987) The metabolism of 7-ethylbenz[a]anthracene by rat liver microsomal preparations. Drug Metab. Dispos. 15, 682-694.
(42) Williams, J. A., and Phillips, D. H. (2000) Mammary expression of xenobiotic metabolizing enzymes and their potential role in breast cancer. Cancer Res. 60, 4667-4677.
(43) Lu, O. G., and Guenthner, T. M. (1994) Detoxication of the 2′,3′-epoxide metabolites of allylbenzene and estragole. Conjugation with glutathione. Drug Metab. Dispos. 22, 731-737.
(44) Nakajima, M., Iwata, K., Yamamoto, T., Funae, Y., Yoshida, T., and Kuroiwa, Y. (1998) Nicotine metabolism in liver microsomes from rats with acute hepatitis or cirrhosis. Drug Metab. Dispos. 26 (1), 36-41.

(45) Low, L. K. (1998) Metabolic changes of drugs and related compounds. In Textbook of Organic Medicinal and Pharmaceutical Chemistry (Delgado, J. N., and Remers, W. A., Eds.) pp 43-109, Lippincott-Raven Publishers, Philadelphia, New York.
(46) Mays, D. C., Hecht, S. G., Unger, S. E., Pacula, C. M., Climie, J. M., Sharp, D. E., and Gerber, N. (1987) Disposition of 8-methoxypsoralen in the rat. Induction of metabolism in vivo and in vitro and identification of urinary metabolites by thermospray mass spectrometry. Drug Metab. Dispos. 15, 318-328.
(47) Noguchi, M., Nitoh, S., Mabuchi, M., and Kawai, Y. (1996) Effects of phenobarbital on aniline metabolism in primary liver cell culture of rats with ethionine-induced liver disorder. Exp. Anim. 45 (2), 161-170.
(48) Mekenyan, O., Ivanov, J., Karabunarliev, S., Bradbury, S., Ankley, G., and Karcher, W. (1997) A computationally-based hazard identification algorithm that incorporates ligand flexibility. 1. Identification of potential androgen receptor ligands. Environ. Sci. Technol. 31, 3702-3711.
(49) Bradbury, S., Kamenska, V., Schmieder, P., Ankley, G., and Mekenyan, O. (2000) A computationally-based identification algorithm for estrogen receptor ligands, part I. Predicting hER binding affinity. Toxicol. Sci. 58, 253-269.
(50) Mekenyan, O., Kamenska, V., Schmieder, P., Ankley, G., and Bradbury, S. (2000) A computationally based identification algorithm for estrogen receptor ligands: Part 2. Evaluation of a hERα binding affinity model. Toxicol. Sci. 58, 270-281.
(51) Mekenyan, O., Dimitrov, D., Nikolova, N., and Karabunarliev, S. (1999) Conformational coverage by a genetic algorithm. J. Chem. Inf. Comput. Sci. 39, 997-1016.
(52) Todorov, M., Pavlov, T., Serafimova, R., Aladjov, H., and Mekenyan, O. Conformational coverage by genetic algorithm: Saturation of conformational space. J. Chem. Inf. Model. In press.
(53) Stewart, J. (1990) MOPAC: A semiempirical molecular orbital program. J. Comput.-Aided Mol. Des. 4, 1-105.
(54) Stewart, J. (1993) MOPAC 93, Fujitsu Limited, Chiba-City, Japan, and Stewart Computational Chemistry, Colorado Springs, CO.
(55) Mekenyan, O., Nikolova, N., and Schmieder, P. (2003) Dynamic 3D QSAR techniques: Applications in toxicology. J. Mol. Struct. (THEOCHEM) 622, 147-165.
(56) Connolly, M. (1983) J. Appl. Crystallogr. 16, 548.
(57) Stanton, D. T., and Jurs, P. C. (1990) Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure-property relationship studies. Anal. Chem. 62 (21), 2323-2329.
(58) Mekenyan, O., Nikolova, N., Schmieder, P., and Veith, G. (2004) COREPA-M: A multi-dimensional formulation of COREPA. QSAR Comb. Sci. 23 (1), 5-18.
(59) Serafimova, R., Todorov, M., Nedelcheva, D., Pavlov, T., Akahori, Y., Nakai, M., and Mekenyan, O. (2006) QSAR and mechanistic interpretation of estrogen receptor binding. SAR QSAR Environ. Res. In press.
(60) Dimitrov, S., Dimitrova, G., Pavlov, T., Dimitrova, N., Patlewicz, G., Niemela, J., and Mekenyan, O. (2005) A stepwise approach for defining the applicability domain of SAR and QSAR models. J. Chem. Inf. Model. 45 (4), 839-849.
(61) Dimitrov, S., Pavlov, T., Vasilev, R., and Mekenyan, O. (2005) Simulation of abiotic molecular transformations by CATABOL, poster presented at SETAC Europe 15th Annual Meeting, Lille, France, May 21-26.
(62) IARC (1987) Evaluation of the Carcinogenic Risk of Chemicals to Humans: Views and Expert Opinions of an IARC Working Group, IARC, Lyon, France (Russian ed.).
(63) Hemminki, K. (1981) Reactions of β-propiolactone, β-butyrolactone and γ-butyrolactone with nucleic acids. Chem.-Biol. Interact. 34 (3), 323-331.
(64) http://potency.berkeley.edu/chempages/beta-BUTYROLACTONE.html.
(65) Sawatari, K., Nakanishi, Y., and Matsushima, T. (2001) Relationships between chemical structures and mutagenicity: A preliminary survey for a database of mutagenicity test results of new work place chemicals. Ind. Health 39 (4), 341-345.
(66) http://www.oehha.ca.gov/prop65/pdf/BCMEEf.pdf.
(67) http://www.biolab.co.uk/dna.html.
(68) Couch, D., Forbes, N., and Hsie, A. (1978) Comparative mutagenicity of alkylsulfate and alkanesulfonate derivatives in Chinese hamster ovary cells. Mutat. Res. 57 (2), 217-224.
(69) www.uic.edu/classes/pcol/pcol425/restricted/Tiruppathi/Carcinogenst.PDF.
(70) Kovacic, P., and Jacintho, J. (2001) Reproductive toxins: Pervasive theme of oxidative stress and electron transfer. Curr. Med. Chem. 8 (7), 863-892.

(71) Wiaderkiewicz, R., Walter, Z., and Reimschussel, W. (1986) Sites of methylation of DNA bases by the action of organophosphorus insecticides in vitro. Acta Biochim. Pol. 33 (2), 73-85.
(72) Hecht, S., McIntee, E., and Wang, M. (2001) New DNA adducts of crotonaldehyde and acetaldehyde. Toxicology 166 (1-2), 31-36.
(73) http://monographs.iarc.fr/htdocs/monographs/vol60-11.htm.
(74) http://www.inchem.org/documents/cicads/cicads/cicad_19.htm#Part Number:.
(75) Sabbioni, G. (1994) Hemoglobin binding of arylamines and nitroarenes: Molecular dosimetry and quantitative structure-activity relationships. Environ. Health Perspect. 102 (6), 61-67.
(76) Bartsch, H. (1981) Metabolic activation of aromatic amines and azo dyes. IARC Sci. Publ. 40, 13-30.
(77) http://linkage.garvan.unsw.edu.au//public/gerham/macsos/public/thesis/NitreniumIons.html.
(78) Glatt, H., and Meinl, W. (2004) Use of genetically manipulated Salmonella typhimurium strains to evaluate the role of sulfotransferases and acetyltransferases in nitrofen mutagenicity. Cancer Causes Control 15 (3), 225-236.
(79) Westwood, I., Holton, S., Rodrigues-Lima, F., Dupret, J., Bhakta, S., Noble, M., and Sim, E. (2005) Expression, purification, characterization and structure of Pseudomonas aeruginosa arylamine N-acetyltransferase. Biochem. J. 385 (2), 605-612.
(80) Guengerich, F., Parikh, A., Johnson, E., Richardson, T., von Wachenfeldt, C., Cosme, J., Jung, Fr., Strassburg, C., Mannis, M., Tukey, R., Prichard, M., Fournel-Gigleux, S., and Burchell, Br. (1997) Heterologous expression of human drug-metabolizing enzymes. Drug Metab. Dispos. 25 (11), 1234-1241.
(81) Kiese, M., and Renner, G. (1966) The hydrolysis of acetanilide and some of its derivatives by enzymes in the microsomal and soluble fraction prepared from livers of various species. Naunyn-Schmiedeberg’s Arch. Pathol. Pharmacol. 252 (5), 480-500.
