Chem. Res. Toxicol. 2004, 17, 753-766
Identification of the Structural Requirements for Mutagenicity by Incorporating Molecular Flexibility and Metabolic Activation of Chemicals I: TA100 Model
Ovanes Mekenyan,*,† Sabcho Dimitrov,† Rossitsa Serafimova,† Ed Thompson,‡ Stefan Kotov,† Nadezhda Dimitrova,† and John D. Walker§
Laboratory of Mathematical Chemistry, University “Prof. As. Zlatarov”, 8010 Bourgas, Bulgaria, Human & Environmental Safety, The Procter & Gamble Company, MVL, Cincinnati, Ohio 45239-8707, and TSCA Interagency Testing Committee (ITC), Office of Pollution Prevention and Toxics (7401), U.S. Environmental Protection Agency, Washington, D.C. 20460
Received October 15, 2003
Traditional attempts to model genotoxicity data have been limited to congeneric data sets, primarily because the mechanism of action was ignored, and frequently, the chemicals required metabolism to the active species. In this exercise, the COmmon REactivity PAtterns (COREPA) approach was used to delineate the structural requirements for eliciting mutagenicity in terms of ranges of descriptors associated with three-dimensional molecular structures. The database used to build the mutagenicity model includes 1196 structurally diverse chemicals tested in the Ames assay by the National Toxicology Program. This manuscript describes the development of the TA100 model that predicts the results of mutagenicity testing using only the Ames TA100 strain. The TA100 model was developed using 148 chemicals that tested positive in TA100 strain without rat liver enzymes (S-9) and 188 chemicals that tested positive in TA100 strain with rat liver enzymes. A decision tree was developed by first comparing the reactivity profile of chemicals that were positive in TA100 without rat liver enzymes to the reactivity profile of the remaining 1048 chemicals. This approach correctly identified 82% of the primary acting mutagens and 94% of the nonmutagens in the training set. The 188 chemicals in the training set that are positive only in the presence of metabolic activation would pass through the decision tree as negative. The next step was to identify the chemicals that are positive only in the presence of metabolic activation. To accomplish this, a series of hierarchically ordered metabolic transformations were used to develop an S-9 metabolism simulator that was applied to each of the 1048 chemicals. The potential metabolites were then screened through the decision tree to identify reactive mutagens. This model correctly identified 77% of the metabolically activated chemicals in a training set. 
A computer system that applies the COREPA models and predicts mutagenicity of chemicals, including their metabolic activation, was developed. Each prediction is accompanied by a probabilistic estimate of the chemical being in the structural domain covered by the training set.
* To whom correspondence should be addressed. Tel: ++359 56 880230. Fax: ++359 56 880230. E-mail: [email protected].
† University “Prof. As. Zlatarov”. ‡ The Procter & Gamble Company. § U.S. Environmental Protection Agency.
Introduction
The Salmonella reverse mutation test (1, 2) has been used for several decades as a useful tool for detection of potentially mutagenic chemicals. Genetically different strains of Salmonella typhimurium, such as TA100, TA1535, TA1537, TA97, and TA98, are used for testing. They all carry some type of defective (mutant) gene that prevents them from synthesizing the amino acid histidine in minimal bacterial culture medium. In the presence of mutagenic chemicals, the defective gene may be mutated back to the functional state, allowing the bacterium to grow on the minimal medium. The different genetic constructs of the Salmonella strains are specifically designed to detect different types of mutagens. Both TA1535 and TA100 are designed to detect primarily base pair substitution type mutagens, while TA1538 and TA98
detect primarily frameshift mutagens. This affords modelers an opportunity to develop mechanistic-based models, which tend to be more accurate and effective across diverse chemical domains. For instance, one would expect chemicals that are primarily positive in only TA100 to covalently bind to DNA, and so, chemical descriptors of electrophilic properties should characterize these chemicals. Likewise, descriptors that characterize the more planar frameshift type mutagens should be effective in identifying chemicals that are positive primarily in TA98. Importantly, these descriptors should differ sufficiently to allow separate models to be built for each strain. Because Salmonella lacks the mixed function oxidase systems mammals use to metabolize xenobiotics, extracts of rat liver enzymes (S-9) are added to “metabolically activate” xenobiotic chemicals. This procedure incorporates the important aspect of mammalian metabolic activation into the in vitro test. It also adds an additional level of complexity to developing a model to predict Salmonella positives; that is, the chemical itself will lack the characteristics of both base pair substitutions and
10.1021/tx030049t CCC: $27.50 © 2004 American Chemical Society Published on Web 04/30/2004
Chem. Res. Toxicol., Vol. 17, No. 6, 2004
frameshift mutagens and will be predicted to be negative. Only the metabolites of the chemical will have the characteristics to be positive in one of the strains. Thus, a system to identify metabolites needs to be part of an effective prediction system. The idea of developing a mechanistic-based model that sequentially analyzes data sets based just on physicochemical descriptors has limitations. Numerous traditional quantitative structure-activity relationship (QSAR) approaches that utilize quantitative descriptors for passive transport as well as reactivity with the molecular site of action have been developed. Some of these QSAR models showed that hydrophobicity has an important role in describing bacterial mutagenicity of aromatic nitro compounds (3, 4), aromatic amines (5), and quinolines (6). The quantum chemical descriptors, especially frontier orbital energies, are also used as global parameters in these models. The energy of the lowest unoccupied molecular orbital (ELUMO) has been used to model electrophilic interaction for aromatic amines and nitro compounds, benzidines (3, 7), and halogenated compounds (8). The energy of the highest occupied molecular orbital (EHOMO), specific partial charges, and various steric descriptors were also used to explain the mutagenicity of aromatic amines, triazenes, some nitrofurans, and quinolines (9, 10). The traditional QSARs are applicable for predicting the mutagenicity of small series of congeneric chemicals, but they are impractical for assessing the mutagenicity of structurally diverse data sets of chemicals. The development of expert systems represents an alternative approach for predicting mutagenicity of existing or new compounds acting by different modes of action. Systems such as DEREK (11), HazardExpert (12), and Oncologic (13) are knowledge-based systems, known also as rule-based systems, that combine mechanistic hypothesis, expert judgment, and empirical observation in an effort to rationalize toxicity data.
The structural alerts model (14-17) is a rule-based approach based on identification of the electrophilic features that will promote mutagenicity by reacting with the negatively charged groups on DNA. Many of these systems are unable to recognize nongenotoxic mutagens, which may not be associated directly with electrophilic fragments. The correlative structure-activity relationship (SAR) approach is another computerized and statistically driven method that minimizes human expert input and bias in the development of a predictive QSAR (18). Of the correlative systems for predicting mutagenicity and carcinogenicity, TOPKAT (TOxicity Prediction by Komputer-Assisted Technology) (19) and MULTI-CASE (Computer-Automated Structure Evolution) (20) are considered to be the most highly developed at this time. It was found that when applied to retrospective prediction within the training set, TOPKAT and MULTI-CASE significantly outperformed most of the rule-based models. However, large discrepancies were reported between retrospective (>90%) and prospective (50-70%) predictability (18). Thus, an external validation (21) of TOPKAT for over 100 chemicals not used in the model building met difficulties in providing accurate predictions for about 50% of chemicals. The development of hybrid systems that combine the best features of the rule-based systems with more quantitative aspects of modeling by making use of molecular descriptors such as physicochemical properties is expected to improve prediction (22). Purdy (23) applied
such an approach to predict carcinogenicity. The proposed model is in the form of a decision tree with individual rules or QSARs for different chemical classes. A similar scheme is used in COMPACT (Computer-Optimized Molecular Parametric Analysis of Chemical Toxicity) (24). The hierarchical approach to decision trees when combined with an alerting group method can distinguish chemicals by their mechanisms of action and select the appropriate SAR or QSAR for toxicity prediction. The advantage of such models is their ability to handle end points conditioned by different mechanisms of action. In this respect, the decision tree approach can implement a number of fundamental truths about the modeled phenomena. For example, there are some instances where a presence of a structural fragment is sufficient to identify carcinogens, but on other occasions, alerting groups need to be fired by other steric and/or electronic requirements to achieve the effect. These requirements can be described by one or more physicochemical parameters or structural descriptors associated with the alerting groups. Finally, the decision tree is clearly mechanistic, easily interpretable, and easily updated when novel data are obtained. A further challenge of evaluating mutagenic and carcinogenic potential of chemicals is the necessity to effectively combine the decision tree modeling approach with simulators of metabolic activation. At this time, there are no QSAR systems incorporating in the same interface a model for predicting a hazard end point with the machinery that simulates tissue or organ specific metabolism of xenobiotic chemicals. The aim of the present study is to develop a sophisticated hybrid expert system for predicting the mutagenicity of chemicals to strain TA100 by using the decision tree modeling approach and accounting for metabolic activation of chemicals. 
The QSAR model is based on hierarchically ordered rules combining alerting groups, physicochemical properties, and structural requirements to activate the alerting groups to elicit the effect. The system was trained to predict mutagenicity in S. typhimurium TA100 with and without activation by the rat liver enzyme system (S-9) for 1196 structurally diverse chemicals. The pattern recognition techniques for identifying COmmon REactivity PAtterns (COREPA) (25-28) of structurally diverse chemicals, as well as expert knowledge, were used to develop the decision tree model. Our previous experience in computer modeling of metabolism (29-31) was instrumental in developing the TIssue MEtabolic Simulator (TIMES). The TIMES system was trained to mimic liver metabolism on the basis of collected documented metabolic pathways for 179 chemicals. The explicit generation of metabolites affords the opportunity for the decision tree model to be applied not only to parent chemicals but also to their reactive metabolites. The external validation of the derived expert system with 36 noncongeneric chemicals showed that the model provides a reasonable balance between fitting accuracy and predictive power.
Materials and Methods
Mutagenicity Database. The Procter & Gamble genotoxicity/carcinogenicity database has results for approximately 17 000 chemicals tested in assays such as the Ames Salmonella mutagenicity assay, in vitro and in vivo cytogenetics, mammalian cell mutation assays, and rodent carcinogenicity. A subset of 1196 chemicals tested in Salmonella by the National Toxicology Program (NTP) was selected from the larger database. Of these chemicals, 148 were positive in strain TA100 without S-9. A semiquantitative measure of potency was assigned to each chemical. The chemicals were classified as - (negative), ? and W (questionable or weak positive response, respectively), or 1+, 2+, 3+, and 4+, etc., which correspond to a 2^n-fold increase in activity as compared to the controls. The potency values were important to distinguish strong frameshift mutagens from base pair type mutagens, as the strain distinction is not absolute. In this study, questionable mutagens were treated as nonactive, whereas weak positive responses were treated as 1+ activity. Another training set of 188 chemicals was also prepared, including chemicals that possess mutagenic activity against strain TA100 only after activation by the S-9 enzymatic system. This training set was used to train the metabolic simulator to predict metabolic activation of chemicals. An additional set of 36 chemicals was included in the present investigation for external validation of the derived mutagenicity model. Chemical Abstracts Service (CAS) number, name, Simplified Molecular Input Line Entry System (SMILES) notations, and mutagenic activity of all chemicals used in this study can be found in Table S1 of the Supporting Information. A specific training set with documented metabolic pathways by liver enzymes, predominantly of humans, rats, and dogs, for 179 chemicals was collected from the literature (31-45) and the Internet (Tables S2-S5, Supporting Information). These metabolism data were used to train the simulator of metabolism to reproduce enzymatic reactions in liver.
Modeling Methodology. 1. Decision Tree Model. The decision tree model was constructed on the basis of expert knowledge and methodology for identifying chemicals with a COREPA. The detailed mathematical formulation of the COREPA method has been reported previously (25-28).
The COREPA method is a pattern recognition technique for identifying common stereoelectronic (reactivity) patterns of structurally diverse chemicals, which exert similar biological effects. The method takes into account conformational flexibility of chemicals. For example, conformers of the chemical HC_BLUE_1 (CAS Registry No. 2784-94-3) from the training set, within the formation enthalpy range of 20 kcal/mol from the lowest energy structure, have a range of 0.19 eV for ELUMO, 0.82 eV for EHOMO, 0.82 eV for EGAP (difference between the energies of the HOMO and the LUMO), and 3.74 D for dipole moment (µ). The observation that relatively small energy differences between conformers of a chemical can yield significant variations in its electronic structure highlights the necessity to associate a finite range of molecular parameter values to each chemical. All energetically reasonable conformers are used in COREPA to establish conformer distributions across the global and local stereoelectronic descriptors associated with the activity under study. Continuous approximations of the discrete conformer distributions are analyzed for each descriptor. These distributions are normalized by dividing the total distribution area by the number of conformers (i.e., normalizing the distribution area to unity for each chemical), thus providing a probabilistic characterization of the distributions. The COREPA algorithm consisted of three steps. First, two subsets of chemicals were selected as training sets (step 1). The first subset consisted of chemicals having activity (here, in terms of mutagenicity) above a user-defined high activity threshold. The second subset included chemicals having activity below a predetermined nonactive threshold. Next, a set of parameters, associated with biological activity, was established (step 2) by evaluating the degree of overlap (in %) between the distributions associated with those thresholds. 
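The conformer distributions and the overlap measure described above can be illustrated with a minimal sketch. It is not the published COREPA implementation: the Gaussian kernel, the bandwidth, the grid, and the ELUMO-like conformer values are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def conformer_distribution(values, grid, bandwidth=0.1):
    """Continuous approximation of a discrete conformer distribution.

    One Gaussian kernel (an assumed kernel shape) is centred on each
    conformer's descriptor value; dividing by the number of conformers
    normalizes the area under the curve to unity for the chemical.
    """
    norm = bandwidth * math.sqrt(2.0 * math.pi) * len(values)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]

def overlap_percent(dist_a, dist_b, step):
    """Overlap (in %) of two unit-area distributions sampled on the same grid."""
    return 100.0 * step * sum(min(a, b) for a, b in zip(dist_a, dist_b))

# Hypothetical descriptor values (eV) for conformers of two chemicals.
step = 0.002
grid = [-4.0 + i * step for i in range(3001)]
active = conformer_distribution([-1.60, -1.52, -1.45], grid)
inactive = conformer_distribution([0.10, 0.22, 0.31, 0.18], grid)

print(f"area of active distribution: {step * sum(active):.4f}")
print(f"overlap active/inactive: {overlap_percent(active, inactive, step):.4f} %")
```

A descriptor for which the active and inactive patterns overlap little, as in this toy case, is the kind of parameter that step 2 would retain.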
The stereoelectronic parameters that provided the maximal measure of similarity among chemicals in the training subsets of active and inactive chemicals and had the least overlap between overall patterns associated with those subsets (i.e., most distinct patterns) are assumed to be related to biological activity and are used in the subsequent step of the algorithm. Finally, common reactivity patterns for biologically similar molecules are obtained as products or sums of the probabilistic distributions for specific stereoelectronic parameters associated with chemicals in training sets of the active and inactive chemicals
(step 3). Ultimately, a common reactivity pattern is a collection of the specified ranges of each molecular descriptor determined to be associated with the biological activity of concern. The genetic algorithm for coverage of the conformational space of molecules by a limited number of conformers (46) was employed in the present study. The approach handles the following stereochemical and conformational degrees of freedom: rotation around acyclic single and double bonds, inversion of stereocenters, flip of free corners in saturated rings, and reflection of pyramids on the junction of two or three saturated rings. Conformer geometry was optimized by Molecular Orbital PACkage (MOPAC) 93 (47, 48) using the Austin Model 1 (AM1) Hamiltonian. For a given chemical, only conformers with a heat of formation (∆Hf°) within 20 kcal/mol of the ∆Hf° of the conformer with the absolute energy minimum were used. The conformers within this range of ∆Hf° were assumed to be energetically reasonable from a thermodynamic and kinetic perspective (25, 49, 50). Stereoelectronic parameters were calculated with MOPAC 93 augmented by a computing module that provides additional reactivity descriptors, using the AM1 all valence electron, semiempirical Hamiltonian. The logarithm of the 1-octanol/water partition coefficient (log Kow) was calculated by the atom/fragment contribution method of Meylan and Howard (54). 2. TIMES. A single pathway probabilistic scheme, used in CATABOL (29), was expanded to model liver multipathway metabolism. The multipathway scheme was conditioned by the fact that the chemicals could be metabolically activated not only across the most probable pathway but also across the less significant pathways. The enzyme-mediated reactions are imitated by principal molecular transformations. Each molecular transformation consists of a parent submolecular fragment, transformation products, and masks.
The latter play the role of a reaction inhibitor. If the fragment assigned as a mask is attached to the target subfragment, the execution of the transformation on the parent chemical is prevented. The presence of groups that can promote or inhibit catabolic reactions significantly increases the number of principal transformations. Although the number of organic functional groups known in intermediary and specialized metabolism is less than 60, the reactions that may occur for polyfunctional compounds can be uncountable (55). Positional isomerism also adds to the combinatorial explosion. Currently, 341 principal transformations are used to model metabolism in the liver. A probability of occurrence was ascribed to each principal transformation, which determines its hierarchy in the transformation list. The transformations were separated into two major classes: nonrate-determining and rate-determining reactions. The first class includes 40 abiotic and enzyme-controlled reactions, which occur at a very high rate as compared to the duration of the tests. Transformations of highly reactive groups and intermediates such as acyl halide dehalogenation, geminal thiol halide dehalogenation, geminal halohydrin dehalogenation, N-nitrosoamine oxidative N-dealkylation, imide hydrolysis, sulfinic acid S-oxidation, etc., are included here. Various chemical equilibrium processes such as tautomerism are also included in this class of transformations. The second class includes 301 metabolic transformations such as oxidative, redox, reductive, hydrolytic, and synthetic reactions. Table 1 illustrates some of the nonrate-determining and rate-determining reactions and their inhibiting masks. Initially, the parent chemical is submitted to the list of hierarchically ordered transformations. All transformations meeting the associated substructures are implemented on the parent, producing the list of first level metabolites.
Table 1. Different Types of Nonrate-Determining and Rate-Determining Principal Transformations and Their Inhibiting Masks
Figure 1. Illustration of the m-th level of metabolic pathway generation.
Each of the obtained metabolites is further submitted to the same list of hierarchically ordered transformations, thus producing the second level of metabolites. Figure 1 illustrates one level of metabolic pathway generation. The mathematical formulation is based on the assumption that transformations are performed sequentially; that is, the most probable transformation with probability Pn is applied first; then, the rest of the nonmetabolized parent molecules [mo](1 - Pn) undergo the second transformation with lower probability, Pk < Pn, and then, the third transformation is applied on the remaining molecules (with lower probability; Pc < Pk < Pn), etc. The amount of the nonmetabolized molecules obtained as a result of the application of all possible transformations is obtained by the following probability expression:

$$[m] = [m_o]\{1 - P_n - P_k(1 - P_n) - P_c[1 - P_n - P_k(1 - P_n)] - \dots\} = [m_o](1 - P_n - P_k - P_c - \dots + P_nP_k + P_nP_c + P_kP_c + \dots - P_nP_kP_c - \dots) \tag{1}$$

where [mo] and [m] are the initial and nonmetabolized number of moles of parent chemical, respectively. This equation is invariant to the order of the transformations; for example, if transformation to c is carried out first, then transformation to n, etc. In other words, the last two equations are symmetrical with respect to the transformation products. Subsequently, the amount of metabolized molecules can be expressed by the following term:

$$[m_o] - [m] = [m_o](P_n + P_k + P_c + \dots - P_nP_k - P_nP_c - P_kP_c - \dots + P_nP_kP_c + \dots) \tag{2}$$

If one assumes that this amount is transformed to products n, k, c, etc., proportionally to the probabilities of the corresponding reactions, then the quantity of products n, k, c, etc. can be described by the following equations:

$$[n] = [m_o](P_n + P_k + P_c + \dots - P_nP_k - P_nP_c - P_kP_c - \dots + P_nP_kP_c + \dots)\,\frac{P_n}{P_n + P_k + P_c + \dots} \tag{3}$$

$$[c] = [m_o](P_n + P_k + P_c + \dots - P_nP_k - P_nP_c - P_kP_c - \dots + P_nP_kP_c + \dots)\,\frac{P_c}{P_n + P_k + P_c + \dots} \tag{4}$$

$$[k] = [m_o](P_n + P_k + P_c + \dots - P_nP_k - P_nP_c - P_kP_c - \dots + P_nP_kP_c + \dots)\,\frac{P_k}{P_n + P_k + P_c + \dots} \tag{5}$$

The mass balance over all transformation products yields the equality:

$$[m_o] = [n] + [c] + [k] + \dots + [m] \tag{6}$$
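The bookkeeping in eqs 1-6 can be checked numerically with a short sketch. The function name and the three probabilities below are hypothetical and serve only as an illustration; the nonmetabolized amount is written in the compact product form, which expands to the sum in eq 1.

```python
from math import prod

def metabolite_split(m0, probs):
    """Distribute m0 moles of parent over products and unreacted parent.

    probs maps a product label to the probability of the transformation
    that forms it (hypothetical values; not taken from the paper).
    """
    # Eq 1: [m] = [mo] * (1 - Pn)(1 - Pk)(1 - Pc)...
    unmetabolized = m0 * prod(1.0 - p for p in probs.values())
    # Eq 2: the metabolized amount is the remainder.
    metabolized = m0 - unmetabolized
    # Eqs 3-5: split the metabolized amount proportionally to each probability.
    total_p = sum(probs.values())
    products = {name: metabolized * p / total_p for name, p in probs.items()}
    return unmetabolized, products

m, products = metabolite_split(1.0, {"n": 0.5, "k": 0.3, "c": 0.1})
print(f"[m] = {m:.4f}")
for name, amount in products.items():
    print(f"[{name}] = {amount:.4f}")
# Eq 6: mass balance over all products and the unreacted parent.
print(f"total = {m + sum(products.values()):.4f}")
```

Reordering the entries of `probs` leaves every amount unchanged, which is the order invariance noted after eq 1.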
The probability of obtaining n from the preceding metabolite m according to the generalized scheme of metabolic transformation (Figure 1) can be expressed as a product of the probabilities to obtain m and the probability of m to metabolize to n:

$$P(\to n) = P(\to m)\,P(m \to n) = P(\to m)\left[1 - \prod_{i=1}^{N}(1 - P_i)\right]\frac{P_n}{\sum_{i=1}^{N} P_i} \tag{7}$$

where P(→n) and P(→m) are the probabilities of obtaining n and m from the parent chemical, respectively, and

$$P(m \to n) = \left[1 - \prod_{i=1}^{N}(1 - P_i)\right]\frac{P_n}{\sum_{i=1}^{N} P_i} \tag{8}$$

is the probability of m to be metabolized to n. The established hierarchy between transformations based on their probabilities and hydrophobicity was used to control the propagation of the generated metabolic pathways. The process of metabolic tree generation continues until user defined thresholds for the probability of obtaining metabolites (eq 7) and their hydrophobicity (log Kow) are reached. In this study, the thresholds for probability of obtaining metabolites and their hydrophobicity were set at 0.01 and -4.0, respectively. A maximum of five rival transformations at each level was applied. The propagation of the metabolic tree was also interrupted if phase II reactions occurred, such as glucuronidation and sulfate binding. Model Adequacy. 1. Quality of Decision Tree Model. The quality of the mutagenicity model constructed by combining the decision tree model and simulator of S-9 metabolism was assessed on the basis of sensitivity, specificity, and concordance (56). The probability that the prediction is mutagenic, given that the chemical is truly mutagenic, commonly referred to as sensitivity, is determined by the equation:

$$\Pr(P^{+} \mid T^{+}) = \frac{N_{P^{+}}}{N_{T^{+}}} \tag{9}$$

The probability that the prediction is not mutagenic, given that the chemical is truly not mutagenic, usually known as specificity, is obtained by the following formula:

$$\Pr(P^{-} \mid T^{-}) = \frac{N_{P^{-}}}{N_{T^{-}}} \tag{10}$$

Concordance was calculated by dividing the sum of correctly classified active and inactive chemicals by the total number of chemicals in the training or external validation set:

$$\Pr(P^{+/-} \mid T^{+/-}) = \frac{N_{P^{+}} + N_{P^{-}}}{N_{T^{+}} + N_{T^{-}}} \tag{11}$$
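As a small worked example of the sensitivity, specificity, and concordance ratios, the sketch below counts correct predictions among observed mutagens and nonmutagens. The function name and the ten observed/predicted labels are hypothetical, and the sketch assumes at least one observed mutagen and one observed nonmutagen so that no denominator is zero.

```python
def classification_quality(observed, predicted):
    """Return (sensitivity, specificity, concordance).

    observed and predicted are equal-length sequences of booleans,
    with True meaning mutagenic.
    """
    n_t_pos = sum(1 for o in observed if o)        # observed mutagens
    n_t_neg = sum(1 for o in observed if not o)    # observed nonmutagens
    pairs = list(zip(observed, predicted))
    n_p_pos = sum(1 for o, p in pairs if o and p)              # correctly predicted mutagens
    n_p_neg = sum(1 for o, p in pairs if not o and not p)      # correctly predicted nonmutagens
    sensitivity = n_p_pos / n_t_pos
    specificity = n_p_neg / n_t_neg
    concordance = (n_p_pos + n_p_neg) / (n_t_pos + n_t_neg)
    return sensitivity, specificity, concordance

# Hypothetical labels for ten chemicals (not data from the study).
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]
sens, spec, conc = classification_quality(observed, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} concordance={conc:.3f}")
```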
In eqs 9-11, P and T are predicted and experimentally observed mutagenicity, “+” holds for mutagenic, “-” holds for nonmutagenic, NP+ and NP- are the number of correctly predicted mutagenic and nonmutagenic chemicals, respectively, and NT+ and NT- are the number of experimentally observed mutagenic and nonmutagenic chemicals, respectively. These goodness criteria were calculated both for model fitting and for external validation. 2. Similarity between Documented and Generated Metabolic Pathways. The information about the liver metabolism in mammals obtained from the literature was converted into a format suitable for computer treatment. The collected documented metabolic pathways reflect to some extent the specificity of the study for which they were developed; that is, only part of the maps are specifically derived for investigating metabolic activation of chemicals with respect to their mutagenicity. The presence of potential intermediates that were not proven experimentally significantly complicated the comparison between observed and TIMES-generated metabolic pathways. In some articles, metabolic routes were not proposed and only observed metabolites in urine or bile were listed without specifying whether they were conjugated or deconjugated. This uncertainty of the published routes of liver metabolism complicated the quantitative evaluation of the similarity between documented and generated metabolic pathways. To reflect the variability of published information and the uniqueness of the pathways generated by our software, the following complex similarity measure (SM) was defined:

$$SM = wM + (1 - w)Q \tag{12}$$

where w is a weight used to balance between the desire to reproduce observed metabolites and the desire to reduce the amount of additionally generated metabolites (i.e., to confine the propagation of the map). Parameter M evaluates the portion of documented pathways found in the generated one:

$$M = \frac{\text{number of observed nodes with the same compartments}}{\text{total number of observed nodes}} \tag{13}$$

In the pathways collected from the literature, intermediates may be missing for up to four consecutive nodes. Hence, such intermediates can be skipped in the generated maps before assessing similarity between the compared pathways. The second parameter Q evaluates the overpropagation of the generated tree:

$$Q = \frac{\text{amount of all counterpart node quantities}}{\text{amount of all generated node quantities}} \tag{14}$$
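A toy calculation may help make M, Q, and SM concrete. The sketch below is an interpretation under stated assumptions, not the TIMES code: pathways are reduced to flat sets of metabolite labels, a generated node counts as a counterpart when its label appears among the observed nodes, and the skipping of intermediates is ignored. All labels and quantities are hypothetical.

```python
def similarity_measure(observed, generated, w):
    """Return (SM, M, Q) for one documented and one generated pathway.

    observed:  set of metabolite labels reported in the literature.
    generated: dict mapping each generated metabolite label to its quantity.
    w:         weight balancing reproduction (M) against overpropagation (Q).
    """
    matched = observed & set(generated)
    # M (eq 13): share of documented nodes found in the generated map.
    m = len(matched) / len(observed)
    # Q (eq 14): share of generated quantity that has an observed counterpart.
    q = sum(generated[node] for node in matched) / sum(generated.values())
    # SM (eq 12): weighted combination of the two parameters.
    return w * m + (1.0 - w) * q, m, q

observed = {"A", "B", "C", "D"}
generated = {"A": 0.4, "B": 0.2, "X": 0.3, "Y": 0.1}
sm, m_score, q_score = similarity_measure(observed, generated, w=0.5)
print(f"M={m_score:.2f} Q={q_score:.2f} SM={sm:.2f}")
```

With w close to 1 the measure rewards reproducing documented metabolites and barely penalizes additional generated products; with w close to 0 the emphasis shifts to confining the map.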
Skipped intermediates are not included in the calculation of Q. The SM was used as an objective function during the development of the principal transformations and establishment of their hierarchy. The latter significantly depends on the accepted balance between two opposite tendencies estimated by M and Q: reproduction of observed metabolites and generation of additional products (i.e., overpropagation) in the product map, respectively. Because most mutagenic metabolites are highly reactive species with extremely short half-lives, they are rarely observed experimentally and usually are not presented in the documented metabolic pathways. Hence, the desire to restrict the map propagation within the frame of the observed metabolites will deprive the simulator of the possibility to generate reactive intermediates. To avoid this problem, in the present study, the weighing parameter w (in eq 12) was set to 1. Evaluation of the Model Interpolation Domain. In the present work, all predictions are accompanied by an estimate of their belonging to the interpolation domain of the derived mutagenicity model, in terms of evaluating the training set coverage. A probabilistic approach for that purpose was developed by assessing population density of molecular descriptor space in the training set chemicals. The descriptor space is defined by structural parameters used to derive the model for mutagenicity. For each continuous parameter, the probability density function (pdf) is calculated by means of Gaussian kernel functions. The function pdfi provides the probability of a chemical structure i to have value x for the parameter under investigation. The maximal value of this probability (pdmax) is used to normalize the values of the pdf. The probability densities (pdi) are estimated for all parameter values associated with the conformer of a chemical. Subsequently, the degree of belonging of the conformer to the training set domain can be estimated by eq 15:

$$D_{\max} = \max_i \left\{ \frac{pd_i}{pd_{\max}} \right\} \tag{15}$$

The conformer with maximum coverage determines if a chemical belongs to the structural domain of the training set.
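For a single continuous descriptor, the domain estimate of eq 15 can be sketched as follows. The bandwidth, the evaluation grid used to locate pdmax, and the EGAP-like training values are assumptions made for illustration, and the function names are hypothetical. The multivariate case, in which single-parameter densities are multiplied, is not covered.

```python
import math

def kernel_density(train_values, bandwidth):
    """Gaussian-kernel probability density built from training-set values."""
    norm = len(train_values) * bandwidth * math.sqrt(2.0 * math.pi)

    def pdf(x):
        return sum(
            math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train_values
        ) / norm

    return pdf

def domain_coverage(conformer_values, pdf, grid):
    """Eq 15: largest normalized density over the conformers of one chemical."""
    pd_max = max(pdf(x) for x in grid)  # maximal density, approximated on a grid
    return max(pdf(v) / pd_max for v in conformer_values)

# Hypothetical EGAP values (eV) of training set chemicals.
train_egap = [8.1, 8.3, 8.4, 8.6, 8.9, 9.0, 9.2]
pdf = kernel_density(train_egap, bandwidth=0.3)
grid = [6.0 + 0.005 * i for i in range(1201)]

inside = domain_coverage([8.35, 8.50], pdf, grid)    # conformers near the training data
outside = domain_coverage([12.5, 12.8], pdf, grid)   # conformers far from the training data
print(f"inside: {inside:.3f}, outside: {outside:.3e}")
```

A value near 1 indicates that at least one conformer lies in a densely populated region of the training set, whereas a value near 0 flags an extrapolation.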
Figure 2. Probability density for the model interpolation domain of the two-dimensional descriptor space.
To calculate the multivariate probability density, the training set chemicals are represented in a multidimensional descriptor space defined by model parameters. The multivariate probability densities are estimated in this descriptor space by multiplying single parameter density functions. The question of interest is which multidimensional domain contains p percent of all data set points; this is known as the p highest density domain. The probability density over a two-dimensional descriptor space is exemplified in Figure 2. As seen, denser regions are shown with darker colors. In the case of chemical structures (points in the descriptor space) with two or more parameter values, the maximum p is assessed using the above-described algorithm. We define the coverage of the training set as regions with nonzero probabilities (i.e., nonzero values of density function). Multidimensional domains could be defined with a density level higher than a certain percent p of the density function (p highest density domains). The above probabilistic scheme is applicable to continuous parameters only. However, the modeling results in the present work are obtained as decision trees where discrete structural parameters, such as alerting groups, are combined with other structural requirements described as ranges of variation of continuous molecular parameters. Hence, for each alerting group, we build an applicability domain associated with the accompanying continuous parameters. Thus, eventually, we generate a set of such domains related to each of the identified structural alerts for mutagenicity. If a chemical is predicted as mutagenic by more than one combination of structural requirements, then the prediction with the highest domain probability is assigned.
We should mention that chemicals could belong to domains of active mutagens but still be predicted as nonmutagenic. Similarly, a chemical could be predicted as active but lie outside the training set coverage. Evaluating the training set domain also required determining the structural requirements for nonmutagenicity: if a chemical does not fall into any of the domains associated with mutagenicity, it is classified as nonmutagenic, but it should still be determined whether it belongs to the structural domain covered by nonmutagenic chemicals. In the present study, we have determined all discrete requirements for nonactivity in terms of the presence of structural fragments identified in nonmutagenic chemicals. If a chemical can be partitioned into fragments, all of which have been encountered in the list of “inactive” fragments, then it is classified as belonging (100%) to the domain of nonmutagens. The above probabilistic scheme for training set coverage could be applied to any QSAR model if the training set chemicals (used for deriving the model) and the selected modeling parameters are known.
Figure 3. Root decision tree (COREPA model) for mutagenicity.
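The fragment-based domain check for nonmutagens amounts to a subset test. The following is a minimal sketch, assuming the chemical has already been partitioned into fragments by some external tool; the fragment labels and the function name are hypothetical and are not taken from the paper.

```python
def in_nonmutagen_domain(fragments, inactive_fragments):
    """True when every fragment of the chemical is a known 'inactive' fragment."""
    fragments = set(fragments)
    # An empty fragment list carries no evidence, so it is treated as out of domain.
    return bool(fragments) and fragments <= set(inactive_fragments)

# Hypothetical fragment labels, for illustration only.
INACTIVE_FRAGMENTS = {"CH3", "CH2", "OH", "C(=O)O"}
```

A chemical containing even one fragment absent from the "inactive" list falls outside the nonmutagen domain under this sketch.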
Results and Discussion
Decision Tree Model. The modeling scheme consisted of three stages. In the first stage, the COREPA method was used to derive a QSAR model contrasting the 148 chemicals that are positive in TA100 without S-9 vs all of the other 1048 chemicals, including the negatives and the positives with S-9. At this stage, no simulation of metabolism was applied. Second, TIMES was used to predict the metabolism of chemicals that are positive in TA100 with S-9 only. Finally, the generated metabolites were submitted to the COREPA decision tree to identify mutagenic metabolites. To validate the modeling concept for metabolic activation of chemicals based on combined application of the COREPA model and TIMES, the last two stages of the scheme were applied to 820 chemicals from the training set that were observed to be nonmutagenic with or without S-9. It was expected that all of these chemicals and their metabolites would be predicted as negatives. The derived COREPA model for mutagenicity according to the first stage of the modeling scheme is given in Figure 3. First, the training set chemicals were partitioned by their global reactivity, as defined by EGAP, into three groups (the smaller the EGAP, the higher the reactivity of the chemicals). Accordingly, three branches can be recognized in the COREPA decision tree, determined by the global reactivity parameter, EGAP: (i) from 6.7 to 8.50 eV (highest reactivity range); (ii) from 8.50 to 9.54 eV; and (iii) from 9.54 to 14.00 eV (lowest reactivity range). Next, structural alerts were viewed as necessary
Chem. Res. Toxicol., Vol. 17, No. 6, 2004
Table 2. Alerting Groups Identified in the Training Set for Different Reactivity Windows along with Local and Global Molecular Parameters Describing Structural Features Activating These Alerts
attributes reflecting a single molecular interaction mechanism that eventually causes a mutagenic effect. The structural alerts provided a rational basis for segmenting the training data set according to independent interaction mechanisms and applying COREPA to identify common reactivity patterns associated with these mechanisms. The structural alerts identified in the training set are listed in Table 2 for each of the reactivity ranges. In a subsequent analysis, it was found that all of them belong to Ashby’s structural alerting features (14-17) except the second reactive group listed in Table 2, which could be considered as newly defined in the present work. The local and global molecular parameters describing structural features activating alerting groups are also listed in Table 2. As seen, stereoelectronic parameters (such as charges, Q) and physicochemical parameters (such as hydrophobicity, log Kow, and van der Waals surface, VdWsurface) are included here. The reactivity parameters, which appear in the logical boxes of the first two branches (reactivity windows), could be associated with the local electrophilicity of chemicals, i.e., the electrophilicity of alerting groups. Positive charge requirements for nitrogen atoms in nitroso groups, QN=O, or cyclic C-atoms in carbonyl groups, QC(cycle)=O, are in the first (most reactive) branch, presented in Figure 4a. Positive charge requirements can also be recognized for a C-atom connected to an N{H} or a C{H3} group and to an O-atom by a double bond, QC(=O)X, where X is N{H} or C{H3}. Steric and electronic parameters appeared in the second branch of the decision tree (8.50 < EGAP < 9.54 eV), presented in Figure 4b. Requirements for van der Waals surface (VdWsurface) and hydrophobicity (log Kow) were introduced in the logical boxes at the end of both decision tree branches. The electrophilic interaction mechanisms encoded within the first two branches of the decision tree could
Mekenyan et al.
be associated with E2 (bimolecular electrophilic reactions): SE2 (substituted bimolecular electrophilicity mechanism of mutagenicity) and AdE2 (additive bimolecular electrophilicity mechanism of mutagenicity). Chemicals acting by these mechanisms could be called “primary” electrophiles because their mutagenicity is conditioned by the interaction of the original structural alerts (primary structure), and there is no need for their transformation into other reactive groups causing the effect. Two-dimensional structural requirements associated with E1 (monomolecular electrophilic reactions), namely, SE1 (substituted monomolecular electrophilicity mechanism of mutagenicity) and AdE1 (additive monomolecular electrophilicity mechanism of mutagenicity), appear in the third branch of the decision tree, presented in Figure 4c. Here, one can see reactive fragments that are precursors of carbenium or nitrenium ions causing the ultimate effect by electrophilic mechanisms. The rate-controlling step of these reactions is the formation of these highly reactive intermediates, which in turn determines the monomolecular electrophilicity mechanism of mutagenicity (i.e., SE1 and AdE1). Because of that, chemicals acting by this interaction mechanism could be called “secondary” electrophiles. The fact that chemicals containing such reactive fragments are not identified as mutagenic across the least reactive branch of the decision tree (highest EGAP) could be associated with the insufficiency of EGAP to describe the inherent electrophilicity of these fragments when proceeding only from the parent chemicals. The COREPA mutagenicity model was found to be affected by the conformer flexibility of chemicals. For example, only one of two conformers of quercetin (CAS Registry No. 117-39-5; Figure 5) was identified as active because it meets the charge requirement of C{0.30 < Q < 0.32}=O imposed on the carbonyl carbon (see also Table 2).
The other conformer was predicted as inactive because the charge requirement was not met for this alerting group. The modeling results showed, however, that the electrophilic interactions causing mutagenicity are less specific than receptor-mediated interactions (28, 50-52). Performance and Validation of the Derived Models. The derived decision tree correctly predicts the mutagenic potential of 121 of the 148 chemicals that are mutagenic without the S-9 enzyme system, which corresponds to a sensitivity of Pr(P+|T+) = 82% (results can be found in Table S2 of the Supporting Information S1). Better results were obtained for nonmutagenic chemicals. Of the 1048 nonmutagenic chemicals, 983 were correctly classified, i.e., specificity Pr(P-|T-) = 94% (results can be found in Table S3 of the Supporting Information S1). The higher proportion of nonmutagens in the training set produced a concordance that is closer to the specificity, Pr(P±|T±) = 92%. The considerably lower sensitivity as compared to the specificity can be explained by examining the reliability of the experimental data associated with the categories of weak positive response (six chemicals) and low activity, denoted by 1+ (37 chemicals). This is confirmed by the improved sensitivity, Pr(P+|T+) = 86%, obtained for chemicals having an activity of 2+ or greater. As seen, the retrospective predictability of the COREPA decision tree does not improve on the statistics of the models produced by MULTI-CASE and TOPKAT. In this respect, we should mention, however, that when the prediction accuracy of mutagenicity models exceeds 80%, the
Structural Requirements for Mutagenicity
Figure 4. Different reactivity branches of the decision tree. (a) First reactivity branch, EGAP∈ [6.7, 8.50] eV. (b) Second reactivity branch, EGAP∈ (8.50, 9.54] eV. (c) Third reactivity branch, EGAP∈ (9.54, 14.00] eV.
models already explain the variation of experimental error, which is about 25% for Ames mutagenicity data. Moreover, the statistics are not enough for inferring model causality. According to the International Council of Chemical Associations (ICCA)/European Chemical Industry Council (CEFIC) Workshop (53) principles for model acceptance, once the experimental error of the end point under investigation is reached by the models, the latter should be evaluated by the causal association between modeling parameters and activity mechanisms. Such an association is provided by the COREPA decision tree.
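The training-set statistics quoted in this subsection follow directly from the reported counts (121 of 148 mutagens and 983 of 1048 nonmutagens correctly classified). A minimal sketch of the arithmetic, with function and variable names chosen here for illustration:

```python
def classification_stats(true_pos, n_pos, true_neg, n_neg):
    """Return (sensitivity, specificity, concordance) as fractions."""
    sensitivity = true_pos / n_pos                          # Pr(P+|T+)
    specificity = true_neg / n_neg                          # Pr(P-|T-)
    concordance = (true_pos + true_neg) / (n_pos + n_neg)   # overall agreement
    return sensitivity, specificity, concordance

# Counts reported for the TA100 model without S-9.
SENS, SPEC, CONC = classification_stats(121, 148, 983, 1048)
```

Because nonmutagens make up 1048 of the 1196 chemicals, the concordance is dominated by the specificity term, which is why it lies close to 94% rather than 82%.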
Figure 5. Two-dimensional structure of quercetin (CAS 117-39-5). Only one of its conformers is active, meeting the charge requirement of C{0.30 < Q < 0.32}=O imposed on the carbonyl carbon (Table 2).
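The conformer dependence illustrated by quercetin can be sketched as follows. This is a simplified illustration, not the COREPA implementation: it encodes only the three EGAP windows of Figure 4 and the carbonyl-carbon charge window 0.30 < Q < 0.32, and it omits the other conditions in the decision tree (for example, VdWsurface and log Kow). The charge values in the usage comment are hypothetical.

```python
def egap_branch(egap_ev):
    """Return the reactivity branch (1, 2, or 3) for an EGAP value in eV, else None."""
    if 6.7 <= egap_ev <= 8.50:
        return 1  # highest reactivity range
    if 8.50 < egap_ev <= 9.54:
        return 2
    if 9.54 < egap_ev <= 14.00:
        return 3  # lowest reactivity range
    return None   # outside the ranges covered by the decision tree

def meets_carbonyl_charge(charge, low=0.30, high=0.32):
    """Charge window imposed on the carbonyl carbon, C{0.30 < Q < 0.32}=O."""
    return low < charge < high

def any_conformer_active(carbonyl_charges):
    """A chemical passes the check if at least one conformer meets the requirement."""
    return any(meets_carbonyl_charge(q) for q in carbonyl_charges)

# Hypothetical charges for two conformers of one chemical:
# any_conformer_active([0.31, 0.28]) is True because the first conformer qualifies.
```

Treating a chemical as active when any one conformer qualifies is the reason a single low-energy geometry can be insufficient for screening.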
Table 3. External Validation of Decision Tree Model
no.  CAS        observed  coverage (%)  name
1    91598      +         100           2-naphthylamine
2    92933      2+/3+     80            4-nitro-biphenyl
3    99354      +++       10            benzene, 1,3,5-trinitro-
4    99650      +++       30            benzene, m-dinitro-
5    140954     +         0             1,3-bis-hydroxymethyl-urea
6    607578     +++       90            2-nitrofluorene
7    613478     +++       100           N-hydroxy-2-naphthylamine
8    618871     +++       90            aniline, 3,5-dinitro-
9    952216     +++       90            1,1′-biphenyl, 3-methyl-4′-nitro-
10   1694208    ++        50            stilbene, 4-nitro-, (E)-
11   2058028    +         70            1-amino-2-carboxylate-4-nitroanthraquinone
12   2143886    +         10            4-methyl-4′-nitrobiphenyl
13   3468631    +         100           pigment orange 5
14   3658773    ++        100           3(2H)-furanone, 2,5-dimethyl-4-hydroxy-
15   6610088    ++        100           naphthalene, 2-nitroso-
16   6810260    ++        100           N-hydroxy-4-aminobiphenyl
17   10125765   1+/2+     100           4-nitrosobiphenyl
18   24325700   +         80            trans-4-methyl-4′-nitrostilbene
19   69314472   +         80            biphenyl, 3-methyl-4-nitro-
20   70634285   +         0             1,1′-biphenyl, 3-(1,1-dimethylethyl)-4-nitro-
21   78491028   +         0             diazolidinylurea
22   92814283   +++       20            4-ethyl-3-nitrobiphenyl
23   108100283  +++       60            9H-fluorene, 2-methyl-7-nitro-
24   127502685  +++       0             2-isopropyl-4-phenylnitrobenzene
25   188107702  +         60            9-methyl-2-nitrocarbazole
26   275795125  +++       0             1,1′-biphenyl, 3,5-diethyl-4-nitro-
27   275795169  +++       40            1,1′-biphenyl, 3-(1,1-dimethylethyl)-4-nitroso-
28   279242090  +         80            1,1′-biphenyl, 4-ethyl-4′-nitro-
29   279242114  +         70            1,1′-biphenyl, 4-(1,1-dimethylethyl)-4′-nitro-
30   279242169  +         90            pyridine, 2-[4-(1,1-dimethylethyl)phenyl]-5-nitro-
31   279242170  ++        80            1,1′-biphenyl, 4-(1,1-dimethylethyl)-4′-nitroso-
32   345667018  +++       100           [1,1′-biphenyl]-4-amine, 3-(1,1-dimethylethyl)-N-hydroxy-
33   345667029  ++        100           [1,1′-biphenyl]-4-amine, 4′-(1,1-dimethylethyl)-N-hydroxy-
34   345667609  +         70            benzene, 1-ethyl-4-[(1E)-2-(4-nitrophenyl)ethenyl]-
35   389104567  +++       100           1-ethyl-2-aminofluorene
36   389104578  ++        100           1-isopropyl-2-aminofluorene
predicted
+ + + + + + + + + + + + + + + + + + + + + -
As an expert system (rule-based model), the derived decision tree is not subject to the standard statistical tests for model stability, such as cross-validation techniques. Instead, the prospective predictability of the developed decision tree was assessed by predicting the mutagenicity of 36 chemicals that are mutagenic against strain TA100 and were not used in the model-building process. These chemicals were selected from the P&G genotoxicity database independently of the model builders, thus providing a “blind evaluation”. The results of the external validation are presented in Table 3. As can be seen, 21 of the 36 active chemicals were correctly classified, which corresponds to a sensitivity of about 60%. The prospective predictability of the model, however, should be evaluated within the model’s structural space, because QSAR predictions should be confined to the model applicability domain only, as stated by the ICCA/CEFIC workshop principles (53). Consequently, it is important to define the training set coverage in order to identify which predictions are outside the model interpolation domain. The low sensitivity of our model was found to be due predominantly to predictions in the model extrapolation area. Five of the 15 incorrectly classified mutagens were outside the model interpolation domain, having a coverage of 0%. Without these chemicals, the predictive power of the decision tree is Pr(P±|T±) = 68%. Although this sensitivity is below the model fitting quality, the decision tree is able to identify correctly about 70% of mutagens within its domain of applicability. We should emphasize that this predictability is close to the experimental error of the modeled end point.
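The external validation figures can be re-derived from the coverage column of Table 3 and the reported count of 21 correct predictions. The sketch below assumes, as the text states, that all five chemicals with 0% coverage are among the misclassified ones; the variable names are illustrative.

```python
# Coverage (%) for the 36 external chemicals, in the order listed in Table 3.
COVERAGE = [
    100, 80, 10, 30, 0, 90, 100, 90, 90, 50, 70, 10,
    100, 100, 100, 100, 100, 80, 80, 0, 0, 20, 60, 0,
    60, 0, 40, 80, 70, 90, 80, 100, 100, 70, 100, 100,
]
N_CORRECT = 21  # correctly classified mutagens, as reported in the text

# Chemicals with 0% coverage lie outside the model interpolation domain.
OUT_OF_DOMAIN = sum(1 for c in COVERAGE if c == 0)

# Sensitivity over all 36 chemicals, and over the in-domain chemicals only.
OVERALL = N_CORRECT / len(COVERAGE)
IN_DOMAIN = N_CORRECT / (len(COVERAGE) - OUT_OF_DOMAIN)
```

Removing the five out-of-domain chemicals from the denominator (36 to 31) is what raises the figure from roughly 58% to roughly 68%.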
Simulation of Metabolic Activation. According to the modeling scheme adopted in this work, TIMES was used to predict the metabolism of chemicals that are positive in TA100 with S-9, and subsequently, the generated metabolites were submitted to the COREPA decision tree to identify the mutagenic metabolites. To validate the modeling scheme, the same procedure was applied to 820 negatives with or without S-9. In advance, TIMES was adjusted to reproduce 85% (SM = 85%) of the empirical metabolism data documented for mammalian liver for 179 chemicals. Figure 6 compares the predicted metabolism for nicotine with the documented metabolites for this chemical (shown as circles with continuous lines) (39). The total number of TIMES-generated nicotine metabolites was 26, but not all of them are shown in the figure. In the same figure, the experimentally observed blood and urinary metabolites in smokers are presented as reported by Byrd et al. and Tricker (57, 58) (circles with dotted lines). This information was not included in the training set. As can be seen, TIMES correctly generated the documented metabolites, with only one metabolite missing from the computer-simulated map: 5′-hydroxy-cotinine. Moreover, the predicted quantities in molar percentage of the total nicotine uptake were also close to the reported urinary excretion (58) with the exception of 3′-hydroxy-cotinine. We would like to emphasize that the simulator used for predicting metabolic activation of chemicals under S-9, in the present work, is in fact predicting mammalian liver metabolism, where the settings of the simulator (like the number of rival pathways, level of
Figure 6. Metabolism predicted by TIMES and documented metabolites for nicotine. Solid circles, metabolites documented by Nakajima et al. (39); dotted circles, metabolites reported by Byrd et al. and Tricker (57, 58); solid arrow, metabolic transformation simulated by TIMES; and dotted arrow, documented metabolic transformation.
metabolism trees, solubility, stability, etc.) are adjusted in a way to avoid missing mutagenic metabolites, which are usually highly reactive species with extremely short half-lives (and are otherwise rarely observed experimentally). The analysis of the literature and of the set of principal transformations showed that the precursors of mutagenic metabolites are either aromatic chemicals or heteroatom-containing systems. Chemicals lacking these attributes will not be metabolized to mutagenic species. These structural requirements were used to develop a simple prescreen identifying chemicals that could be enzymatically transformed to mutagenic intermediates (see Figure 7). Chemicals passing through this prescreen rule are submitted to TIMES to predict their metabolism by S-9. Of the 188 chemicals positive in TA100 with S-9, 144 were identified as potential precursors of mutagenic metabolites obtained as a result of the liver metabolism simulated by TIMES. All generated metabolites with a lipophilicity log Kow > -4 (about 98%) were screened for mutagenicity by the COREPA decision tree model. Positive activity was ascribed to those parent chemicals that have at least one mutagenic metabolite. The same procedure was applied to the 820 chemicals that are nonmutagenic against strain TA100 after metabolic activation by the S-9 enzyme system.
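The prescreen and metabolite screening steps can be sketched as a small pipeline. This is an illustration under assumptions, not the TIMES or COREPA code: `simulate_metabolism` and `is_mutagenic` are hypothetical stand-ins supplied by the caller, the `Structure` record is invented for the example, and the prescreen encodes only the textual rule (aromatic or heteroatom-containing) rather than the full scheme of Figure 7.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Structure:
    name: str
    is_aromatic: bool
    has_heteroatom: bool
    log_kow: float

def passes_prescreen(parent):
    """Only aromatic or heteroatom-containing chemicals can yield mutagenic metabolites."""
    return parent.is_aromatic or parent.has_heteroatom

def predict_with_s9(parent, simulate_metabolism, is_mutagenic, min_log_kow=-4.0):
    """Parent is positive if at least one screened metabolite is predicted mutagenic."""
    if not passes_prescreen(parent):
        return False
    metabolites = simulate_metabolism(parent)  # stand-in for the metabolism simulator
    # Only metabolites with log Kow > -4 are screened by the decision tree stand-in.
    return any(is_mutagenic(m) for m in metabolites if m.log_kow > min_log_kow)
```

Passing the simulator and the classifier in as functions keeps the sketch independent of any particular software interface.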
Of the 188 chemicals positive in TA100 with S-9, 30 chemicals were excluded from metabolic activation by the prescreen rule; TIMES did not generate active metabolites for another 14. Thus, these 44 chemicals were incorrectly identified as nonmutagenic. The overall sensitivity for chemicals requiring S-9 was 77%. On the other hand, 223 of the 820 chemicals that were negative in TA100 (with or without S-9) produced at least one metabolite in the TIMES simulator that was predicted to be positive in TA100. That corresponds to a specificity of about 74%. In total, TIMES correctly classified active and nonactive chemicals about 75% of the time. This result could be considered adequate from a statistical point of view for two reasons. First, the modeling error corresponds to the experimental one. Second, it evaluates the goodness of a model derived from a heterogeneous training set that integrates three classes of chemicals: mutagenic as parent, mutagenic after metabolic activation, and nonmutagenic. Thus, the model could be considered as a first attempt to evaluate mutagenicity that could result from metabolic activation of chemicals. An analysis of the 223 false positives could be used as an additional validation test for the present mutagenicity model. Twenty-six of these 223 chemicals were negative
Figure 7. Prescreen test for identifying potential precursors of mutagenic intermediates.
in TA100 but were found to be positive in one of the other strains TA98, TA97, TA1535, TA1537, or TA1538. An analysis of the remaining 197 chemicals that are negative in TA100 but activated by the TIMES simulator showed that 92 of these “false positives” were positive in other genotoxicity or carcinogenicity assays. There were no additional data available for another 67 chemicals, and only 38 were found to be negative in other genotoxicity assays. One could conclude that the TIMES simulator developed in the present work is actually producing true mutagenic metabolites that are not positive in TA100 but are detected in other short-term assays. It was concluded that the liver metabolism simulator used in the present work behaves as a battery of genotoxicity assays, providing predictions on a larger scale for mutagenicity and carcinogenicity than the Ames Salmonella test applied as the only assay. Presently, the COREPA models are integrated with the TIMES system. When a new chemical is submitted for predicting mutagenicity, the COREPA model is first used to screen the parent structure. Whatever the predicted activity for the parent chemical, the liver metabolism is simulated for this structure, and the generated metabolites are screened using the same COREPA model. Subsequently, the user is informed of the mutagenicity of the parent and the metabolites. Mutagenicity could be due to the parent chemical only, could be a result of metabolic activation (i.e., the parent is inactive but is transformed to a mutagenic metabolite), or could be due to both the parent and its metabolites.
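The integrated workflow, in which the parent is screened, its metabolites are screened regardless of the parent result, and one of three positive outcomes is reported, can be sketched as below. The function name, the outcome labels, and the `is_mutagenic` callback are assumptions made for illustration; they do not reflect the actual TIMES/COREPA interface.

```python
def report_mutagenicity(parent, metabolites, is_mutagenic):
    """Return (outcome label, list of mutagenic metabolites) for one chemical."""
    parent_active = is_mutagenic(parent)
    # Metabolites are always screened, whatever the parent prediction is.
    active_metabolites = [m for m in metabolites if is_mutagenic(m)]

    if parent_active and active_metabolites:
        outcome = "parent and metabolites mutagenic"
    elif parent_active:
        outcome = "parent mutagenic only"
    elif active_metabolites:
        outcome = "mutagenic after metabolic activation"
    else:
        outcome = "nonmutagenic"
    return outcome, active_metabolites
```

For example, with a hypothetical classifier that flags only the structure labeled "M1", a parent "P" with metabolites ["M1", "M2"] is reported as mutagenic after metabolic activation.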
Conclusions
The present work is an attempt to incorporate the metabolic activation of chemicals when their mutagenicity is modeled. The derived model is a result of integrated application of two computerized QSAR approaches. The
COREPA probabilistic scheme for classification of chemicals was used to derive a decision tree type model for mutagenicity, accounting for the conformational flexibility of molecules and proceeding from the S. typhimurium training set consisting of 1196 chemicals. To improve model reliability, the training set was confined to strain TA100 only. The COREPA model logically combined, in the decision tree, alerting groups and ranges of the physicochemical parameters required for activation of these alerting groups. The modeling approach allowed identification of molecular interaction mechanisms according to which chemicals were classified as primary and secondary electrophiles. The data fitting of the model is as follows: sensitivity, 82%; specificity, 94%; and concordance, 92%. The TIMES system for simulating tissue metabolism was used to predict metabolic activation of chemicals. Subsequently, the liver metabolism of chemicals was simulated by the TIMES system, mimicking their metabolic activation under S-9. The explicitly generated metabolites were submitted to screening for mutagenicity by the COREPA model. Positive activity was ascribed to those parent chemicals that have at least one active metabolite. For chemicals that are metabolically activated against TA100, the modeling statistics were obtained as follows: sensitivity, 77%; specificity, 74%; and concordance, 75%. The external validation of the derived model also showed acceptable predictability of about 70%. The statistics obtained by this complex modeling scheme do not exceed the goodness of fit obtained by other mutagenicity models; however, the results could be considered satisfactory for the following reasons. First, the derived model predicts mutagenicity with an accuracy corresponding to the experimental error, which is about 25% for Ames mutagenicity data. Models with higher accuracy already explain the variation of experimental error.
Second, the present model is consistent with the ICCA/CEFIC workshop principles for model acceptance, providing a causal association between the modeling parameters and the activity mechanisms. In this respect, two electrophilic interaction mechanisms were encoded in the model, associated with E2 and E1 reactions, respectively. Chemicals acting by E2 mechanisms (SE2 and AdE2) could be called primary electrophiles because their mutagenicity is conditioned by the interaction of the original structural alerts (primary structure), and there is no need for their transformation into other reactive groups causing the effect. The rate-controlling step for chemicals acting by E1 mechanisms is the formation of highly reactive intermediates, which in turn determines the monomolecular electrophilicity mechanism of mutagenicity (i.e., SE1 and AdE1). Because of that, chemicals acting by this interaction mechanism could be called secondary electrophiles. Finally, as an advantage of the derived modeling scheme, one could point out its ability to predict mutagenicity resulting from metabolic activation of chemicals. The mammalian liver metabolism is simulated. The metabolites are generated explicitly, under user-defined constraints, and each metabolite is screened by the COREPA mutagenicity model. An analysis of the false positives produced by the model showed that 26 of these 223 chemicals were negative in TA100 but were found to be positive in one of the other strains; for 92 of these chemicals, additional supporting data have been found for their mutagenicity or carcinogenicity; for 67 chemicals, no such data were available, and for 38 chemicals, there was no evidence for their
mutagenicity to any Salmonella strain or carcinogenicity. The S-9 metabolism simulator behaves as a mutagenicity battery, providing predictions on a larger scale for mutagenicity and carcinogenicity than the Ames Salmonella test applied as the only assay. The electrophilic interactions conditioning mutagenicity could be characterized as less specific than receptor-mediated effects. Consequently, it was found that the COREPA model for mutagenicity is affected by conformer flexibility but not as significantly as the models describing ligand-receptor interactions. The conformer multiplication in the integrated TIMES/COREPA system is a straightforward procedure, which is performed automatically during program execution. This paper describes our first attempt to model mutagenicity by analyzing not only parent chemicals but also their biotransformation products resulting from metabolism. Further work on improving the modeling tools will be focused on expanding the structural domain of the COREPA model and that of the metabolic simulator. The COREPA modeling of mutagenicity will also be repeated with the new multidimensional formulation of the method (26). The modeling tools should also be able to solve the reverse of what has been described in the present work, namely, the metabolic deactivation of chemicals, i.e., the fact that some molecules showing mutagenic properties are not seen as mutagenic in the presence of liver extract. This could explain the in vivo deactivation of chemicals eliciting an in vitro mutagenic effect. In fact, the capability of the system to account for metabolic deactivation of chemicals is already available, and an appropriate data set is being sought to evaluate its performance. Concerning the modeling end point, in the next part of this series, the structural and physicochemical requirements for different mechanisms of mutagenicity will be discerned.
Models will be developed on the basis of mutagenicity data for histidine auxotrophs of S. typhimurium, each strain bearing either a substitution-deletion (TA1535, TA100) or a frame shift mutation (TA1538, TA98, TA1537). The combination of both groups of models may improve the overall predictability of mutagenicity of chemicals.
Supporting Information Available: CAS numbers, names, SMILES, and mutagenic activities of all chemicals used in this study. Metabolism data used to train the simulator of metabolism to reproduce enzymatic reactions in liver. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Ames, B. N., McCann, J., and Yamasaki, E. (1975) Methods for detecting carcinogens and mutagens with the Salmonella/mammalian-microsome mutagenicity test. Mutat. Res. 31, 347-364.
(2) Maron, D. M., and Ames, B. N. (1983) Revised method for the Salmonella mutagenicity test. Mutat. Res. 113, 173-215.
(3) Compadre, R. L., Debnath, A. K., Shusterman, A. J., and Hansch, C. (1990) LUMO energies and hydrophobicity as determinants of mutagenicity by nitroaromatic compounds in Salmonella typhimurium. Environ. Mol. Mutagen. 15, 44-55.
(4) Debnath, A. K., Compadre, R. L., Shusterman, A. J., and Hansch, C. (1992) Quantitative structure-activity relationship investigation of role of hydrophobicity in regulating mutagenicity in the Ames test: 2. Mutagenicity of aromatic and heteroaromatic nitro compounds in Salmonella typhimurium TA100. Environ. Mol. Mutagen. 19, 53-70.
(5) Debnath, A. K., Shusterman, A. J., and Hansch, C. (1992) Quantitative structure-activity relationship investigation of role of hydrophobicity in regulating mutagenicity in the Ames test: 1. Mutagenicity of aromatic and heteroaromatic amines in Salmonella typhimurium TA98 and TA100. Environ. Mol. Mutagen. 19, 37-52.
(6) Smith, C. J., Hansch, C., and Mortan, M. J. (1997) QSAR treatment of multiple toxicities: the mutagenicity and cytotoxicity of quinolines. Mutat. Res. 379, 167-175.
(7) Chung, K. T., Chen, S. C., Wong, T. Y., Li, Y. S., Wei, C. I., and Chou, M. W. (2000) Mutagenicity studies of benzidine and its analogues: structure-activity relationships. Toxicol. Sci. 56, 351-356.
(8) Tuppurainen, K. (1994) QSAR approach to molecular mutagenicity. A survey and a case study: mx compounds. J. Mol. Struct. 306, 49-56.
(9) Shusterman, A. J., Debnath, A. K., Hansch, C., Horn, G. W., Fronczek, F. R., Greene, A. C., and Watkins, S. F. (1989) Mutagenicity of dimethyl heteroaromatic triazenes in the Ames test: The role of hydrophobicity and electronic effect. Mol. Pharmacol. 36, 939-944.
(10) Benigni, R., and Giuliani, A. (1996) Quantitative structure-activity relationship (QSAR) studies of mutagens and carcinogens. Med. Res. Rev. 16 (3), 267-284.
(11) Sanderson, D. M., and Earnshaw, C. G. (1991) Computer prediction of possible toxic action from chemical structure: the DEREK system. Hum. Exp. Toxicol. 10, 261-273.
(12) Smithing, M. P., and Darvas, F. (1992) HazardExpert: An expert system for predicting chemical toxicity. In SF Finlay (Robinson, S. F., and Armstrong, D. J., Eds.) pp 191-200, Food Safety Assessment, American Chemical Society, Washington.
(13) Woo, Y.-T., Lai, D., Argus, M., and Arcos, J. (1995) Development of structure activity relationship rules for predicting carcinogenic potential of chemicals. Toxicol. Lett. 79, 219-228.
(14) Ashby, J. (1985) Fundamental structural alerts to potential carcinogenicity or noncarcinogenicity. Environ. Mutagen. 7, 919-921.
(15) Ashby, J., and Tennant, R. W. (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested by the U.S. NTP. Mutat. Res. 204, 17-115.
(16) Ashby, J., and Tennant, R. W. (1991) Definitive relationships among chemicals structure, carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutat. Res. 257, 229-306.
(17) Ashby, J., and Paton, D. (1993) The influence of chemicals structure on the extent and sites of carcinogenesis for 522 rodent carcinogens and 55 different human carcinogen exposures. Mutat. Res. 286, 3-74.
(18) Richard, A. M. (1994) Application of SAR methods to noncongeneric databases associated with carcinogenicity and mutagenicity: Issues and approaches. Mutat. Res. 305, 73-97.
(19) Enslein, K. (1988) Something is wrong with this title? An overview of structure-activity relationships as an alternative to testing in animals for carcinogenicity, mutagenicity, dermal and eye irritation, and acute oral toxicity. Toxicol. Ind. Health 4, 479-498.
(20) Klopman, G. (1992) MultiCASE 1. A hierarchical computer automated structure evaluation program. Quant. Struct.-Act. Relat. 11, 176-184.
(21) King, R. D., and Srinivasan, A. (1996) Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environ. Health Perspect. 104 (S5), 1031-1040.
(22) Schultz, T. W., Cronin, M. T. D., and Netzeva, T. I. (2003) The present status of QSAR in toxicology. J. Mol. Struct. (THEOCHEM) 622, 23-38.
(23) Purdy, R. (1996) A mechanism-mediated model for carcinogenicity: Model content and prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 25 organic chemicals. Environ. Health Perspect. 104, 1085-1094.
(24) Lewis, D. F. V., Ioannides, C., and Parke, D. V. (1993) Validation of a novel molecular orbital approach (COMPACT) for the prospective safety evaluation of chemicals, by comparison with rodent carcinogenicity and Salmonella mutagenicity data evaluated by the US NCI/NTP. Mutat. Res. 291, 61-77.
(25) Mekenyan, O. G., Ivanov, J. M., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Karcher, W. (1997) A computationally based hazard identification algorithm that incorporates ligand flexibility. 1. Identification of potential androgen receptor ligands. Environ. Sci. Technol. 31, 3702-3711.
(26) Mekenyan, O. G., Nikolova, N., Schmieder, P., and Veith, G. D. (2004) COREPA-M: A Multi-dimensional formulation of COREPA. QSAR Comb. Sci. 23, 5-18.
(27) Mekenyan, O. G., Nikolova, N., Karabunarliev, S., Bradbury, S., Ankley, G., and Hansen, B. (1999) New developments in a hazard identification algorithm for hormone receptor ligands. Quant. Struct.-Act. Relat. 18, 139-153.
(28) Bradbury, S. P., Mekenyan, O. G., Kamenska, V. B., Schmieder, P., and Ankley, G. T. (2000) A computationally-based identification algorithm for estrogen receptor ligands. Part I. Predicting hER binding affinity. Toxicol. Sci. 58, 253-269.
(29) Jaworska, J., Dimitrov, S., Nikolova, N., and Mekenyan, O. (2001) Probabilistic assessment of biodegradability based on metabolic pathways: CATABOL system. SAR QSAR Environ. Res. 13, 307-323.
(30) Dimitrov, S., Breton, R., MacDonald, D., Walker, J. D., and Mekenyan, O. (2001) Quantitative prediction of biodegradability, metabolite distribution and toxicity of stable metabolites. SAR QSAR Environ. Res. 13, 445-455.
(31) Dimitrov, S. D., Dimitrova, N. C., Walker, J. D., Veith, G. D., and Mekenyan, O. G. (2003) Bioconcentration potential predictions based on molecular attributes: an early warning approach for chemicals found in humans, birds, fish and wildlife. QSAR Comb. Sci. 22, 58-68.
(32) Snyder, R. T., Chepiga, T., Yang, C. S., Tomas, H., Platt, K., and Oesch, F. (1993) Benzene metabolism by reconstituted cytochromes P450 2B1 and 2E1 and its modulation by cytochrome b5 microsomal epoxide hydrolase and glutathione transferases: Evidence for an important role of microsomal epoxide hydrolase in the formation of hydroquinone. Toxicol. Appl. Pharmacol. 122, 172-181.
(33) Liehr, J. G., Somasunderam, I. A., and Roy, D. (1998) Natural and anthropogenic environmental oestrogens: The scientific basis for risk assessment. Metabolism and fate of xeno-oestrogens in man. Pure Appl. Chem. 70 (9), 1747-1758.
(34) Scapellato, M. L., Marcuzzo, G., Mastrangelo, G., Sessa, G., Cellini, M., De Rosa, E., Saia, B., and Bartolucci, G. B. (1998) Environmental and biological monitoring of styrene exposure: Urinary excretion of D-glucaric acid compared with exposure indices. J. Occup. Health 40, 313-318.
(35) Park, B. K., Kitteringham, N. R., and O’Neill, P. M. (2001) Metabolism of fluorine-containing drugs. Annu. Rev. Pharmacol. Toxicol. 41, 443-470.
(36) Dring, L. G., Bye, A., Constable, D. A., Lang, J. C. T., Barsuhn, C., and Johnston, A. M. (1992) The metabolism of 14C-tazadolene succinate in the dog. Xenobiotica 22, 963.
(37) den Besten, C., Bennik, M. M., Van Tersel, M., Peters, M. A. W., Teunis, C., and van Bladeren, P. J. (1994) Comparison of the urinary metabolite profiles of hexachlorobenzene and pentachlorobenzene in the rat. Chem.-Biol. Interact. 90, 121-137.
(38) Tai, H. L., McReynolds, J. J., Goldstein, J. A., Eugster, H. P., Sengstag, C., Alworth, W. L., and Olson, J. R. (1993) Cytochrome P4501A1 mediates the metabolism of 2,3,7,8-tetrachlorodibenzofuran in the rat and human. Toxicol. Appl. Pharmacol. 123, 34-42.
(39) Nakajama, M., Iwata, K., Yamamoto, T., Funae, Y., Yoshida, T., and Kurowa, Y. (1998) Nicotine metabolism in liver microsomes from rats with acute hepatitis or cirrhosis. Drug Metab. Dispos. 26 (1), 36.
(40) Buckpitt, A. R., Castagnoli, N., Jr., Nelson, S. D., Jones, A. D., and Bahnson, L. S. (1987) Stereoselectivity of naphthalene epoxidation by mouse, rat and hamster pulmonary, hepatic and renal microsomal enzymes. Drug Metab. Dispos. 15, 491.
(41) Corbett, M. D., and Corbett, B. R. (1985) Enzymatic N-oxidation of 4-nitroaniline. Biochem. Arch. 1, 115-120.
(42) Nystrom, D. D., and Rickert, D. E. (1987) Metabolism and excretion of dinitrobenzenes by male Fischer-344 rats. Drug Metab. Dispos. 15, 821-825.
(43) Kedderis, G. L., Summer, S. C. J., Held, S. D., Batra, R., Turner, M. J., Robert, E., and Fennel, T. R. (1993) Dose-dependent urinary excretion of acrylonitrile metabolites by rats and mice. Toxicol. Appl. Pharmacol. 120, 288-287.
(44) Peterson, F. J., and Holtzman, J. Z. (1980) Drug metabolism in the liver. A perspective. In Extrahepatic Metabolism of Drugs and Other Foreign Compounds (Gran, T. E., Ed.) pp 1-121, MTP Press Ltd., Lancaster, England.
(45) Sabourin, P. J., Burka, L. T., Bechtold, W. E., Dahl, A. R., Hoover, M. D., Chang, L. Y., and Henderson, R. F. (1992) Species differences in urinary butadiene metabolites; identification of 1,2-dihydroxy-4-(N-acetylcysteinyl)butane, a novel metabolite of butadiene. Carcinogenesis 13, 1633-1638.
(46) Mekenyan, O. G., Dimitrov, D., Nikolova, N., and Karabunarliev, S. (1999b) Conformational coverage by a genetic algorithm. Chem. Inf. Comput. Sci. 39/6, 997-1016.
(47) Stewart, J. J. P. (1990) MOPAC: A semiempirical molecular orbital program. J. Comput.-Aided Mol. Des. 4, 1-105.
(48) Stewart, J. J. P. (1993) MOPAC 93, Fujitsu Limited, 9-3, Nakase 1-Chome, Mihama-ku, Chiba-city, Chiba 261, Japan, and Stewart Computational Chemistry, 15210 Paddington Circle, Colorado Springs, CO 80921.
(49) Ivanov, J. M., Mekenyan, O. G., Bradbury, S. P., and Schuurmann, G. (1998) A kinetic analysis of the conformational flexibility of steroids. Quant. Struct.-Act. Relat. 17, 437-449.
(50) Bradbury, S. P., Mekenyan, O. G., and Ankley, G. T. (1998) The role of ligand flexibility in predicting biological activity: Structure-activity relationships for aryl hydrocarbon, estrogen and androgen receptor binding affinity. Environ. Toxicol. Chem. 17, 15-25.
(51) Mekenyan, O. G., Kamenska, V., Schmieder, P., Ankley, G., and Bradbury, S.
(2000) A computationally-based identification algorithm for estrogen receptor ligands. Part II. Evaluation of a hER binding affinity model. Toxicol. Sci. 58, 270-281. (52) Mekenyan, O. (2002) Dynamic QSAR techniques: Application in drug design and toxicology. Curr. Pharm. Des. 8, 1605-1621. (53) CEFIC (European Chemical Industry Council). (2002) Report from the workshop on regulatory acceptance of QSARs for human health and environmental endpoints, pp 58, Setubal, Portugal, March 4-6, 2002, report available from
[email protected]. (54) Meylan, W. M., and Howard, P. H. (1995) Atom/fragment contribution method for estimating octanol-water partition coefficients. J. Pharm. Sci. 84, 84-92. (55) Wackett, L. P., and Hershberger, C. D. (2001) Biocatalysis and biodegradation: microbial transformation of organic compounds, ASM Press, Washington, DC. (56) Yeruhshalomy, J. (1947) Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Rep. 62, 1432-1449. (57) Byrd, G. D., Davis, R. A., and Kazemi, V. E. (1995) Comparison of methods for determining nicotine and its metabolites in urine. Conference of the Society for Research on Nicotine and Tobacco, poster no. 9, San Diego, CA, 24-25 March. (58) Tricker, A. R. (2003) Nicotine metabolism, human drug metabolism polymorphisms, and smoking behaviour. Toxicology 183, 151-173.
TX030049T