
Chem. Res. Toxicol. 2005, 18, 428-440

Probabilistic Neural Network Multiple Classifier System for Predicting the Genotoxicity of Quinolone and Quinoline Derivatives

Linnan He and Peter C. Jurs*
Department of Chemistry, Penn State University, University Park, Pennsylvania 16802

Constantine Kreatsoulas, Laura L. Custer, Stephen K. Durham, and Greg M. Pearl
Bristol-Myers Squibb Company, Princeton, New Jersey 08453

Received September 13, 2004

Quinolone and quinoline are known to be liver carcinogens in rodents, and a number of their derivatives have been shown to exhibit mutagenicity in the Ames test, using Salmonella typhimurium strain TA 100 in the presence of S9. Both the carcinogenicity and the mutagenicity of quinolone and quinoline derivatives, as determined by SAS, can be attributed to their genotoxicity potential. This potential, which is measured by genotoxicity tests, is a good indication of carcinogenicity and mutagenicity because compounds that are positive in these tests have the potential to be human carcinogens and/or mutagens. In this study, the carcinogenicity of a collection of quinolone and quinoline derivatives is assessed by qualitatively predicting their genotoxicity potential with probabilistic neural network (PNN) classification models. In addition, a multiple classifier system is developed to improve the predictability of genotoxicity. The multiple classifier system gives superior results to the individual PNN classification models. With the multiple classifier system, 89.4% of the quinolone derivatives were predicted correctly, and higher predictability is seen with the quinoline derivatives at 92.2% correct. The multiple classifier system not only predicts the genotoxicity accurately but also provides insight into the main determinants of genotoxicity of the quinolone and quinoline derivatives. Thus, the PNN multiple classifier system generated in this study is a beneficial contributor toward predictive toxicology in the design of less carcinogenic bioactive compounds.

Introduction

Quinolone and Quinoline Derivatives. Quinolone derivatives (Figure 1) are a class of synthetic antibiotics with broad-spectrum antibacterial activity (1-6). These compounds tend to interfere with bacterial replication by targeting the bacterial enzyme DNA gyrase in order to inhibit the coiling of bacterial DNA. The first quinolone derivative, nalidixic acid, was introduced for clinical use in 1962 (7). The discovery of norfloxacin followed, involving the fluorination of the quinolone molecule at C-6. Many current antimicrobial agents were developed following this discovery. Figure 2 shows the evolution of the quinolones that were marketed or were under clinical trial. The left side of the figure includes the naphthyridine series, which have a nitrogen atom at the C-8 position; they are represented by nalidixic acid, enoxacin, tosufloxacin, and trovafloxacin. The right side includes fluoroquinolone agents, with the C-7 position being the most adaptable site for chemical change.

Quinoline is a colorless, hygroscopic liquid with a pungent odor that darkens with age. It is most soluble in alcohol, ether, benzene, and carbon disulfide and less soluble in water. Quinoline is a weak tertiary base that forms salts in acids, exhibits reactions similar to those of benzene and pyridine, and is active in both electrophilic

Figure 1. Structures of quinolone and quinoline.

and nucleophilic substitutions. The quinoline derivatives are known to be most effective in the treatment of autoimmune conditions (such as rheumatoid arthritis) owing to their antiinflammatory effects. While the precise mechanism of their antiinflammatory effect is not understood, quinoline derivatives are known to interact with cellular components, such as nucleic acid and melanin (8, 9).

Toxicity of Quinolone and Quinoline Derivatives. Despite their effectiveness in disease treatment, laboratory studies and clinical trials have shown that both quinolone and quinoline derivatives have toxic effects. For example, a number of medical papers describe the side effects of the quinolone drugs (10). Numerous reports are also available regarding the in vitro mutagenicity of quinoline derivatives activated with S9 liver homogenate

10.1021/tx049742m CCC: $30.25 © 2005 American Chemical Society Published on Web 02/08/2005


Figure 2. Evolutionary process of quinolone.

in both mutation assays with strains of Salmonella typhimurium (11, 12). During therapy with second generation fluorinated quinolone derivatives, the most often observed adverse effects are reactions of the gastrointestinal tract (0.8-6.8%), the central nervous system (0.9-1.8%), and the skin (0.6-2.4%). In addition, some derivatives have been shown to have adverse drug interactions in patients taking theophylline for chronic respiratory tract diseases (13). These are examples of the neurotoxic effects of fluoroquinolones. Other toxic effects of quinolone derivatives include cardiotoxic potential, phototoxicity, chondrotoxicity, hepatotoxicity, genotoxicity, and carcinogenicity. Therefore, the specific toxic potential of each quinolone derivative has to be considered when it is chosen for treatment of bacterial infections.

Quinoline derivatives also show genotoxic effects. Studies have shown that the intermediate responsible for nucleic acid modification is a 2,3- or 3,4-epoxy derivative of quinoline. More recent studies involving fluorine and chlorine substitution at various positions on the quinoline rings supported this hypothesis further. 3-Fluoro- and 2- and 3-chloroquinolines were less mutagenic than all other fluoro- and chloro-substituted derivatives of quinoline (14, 15). Substitutions at other positions do not reduce quinoline’s mutagenicity and, in some cases, enhance it.

These toxic effects have raised concerns in the development of quinolone and quinoline drugs. Among all of the toxic effects that these drugs might cause, carcinogenicity draws particular attention. Currently, most carcinogenicity concerns in the early phase of drug design are addressed by carcinogenicity assessment, which provides information on the carcinogenic hazard potential of the substance in question and quantitatively estimates the risk from oral exposure and inhalation exposure. The information includes determination of the likelihood that the agent is a human carcinogen and the conditions under which the carcinogenic effects may be expressed. Clearly, carcinogenicity assessment is a necessary and important step in the drug development process for determining the carcinogenic potential of drug candidates. This assessment step, currently carried out with in vitro procedures, can be superseded by an in silico prediction method that uses a mathematical carcinogenicity prediction model to classify


a compound’s genotoxic potential. This alternative approach has the benefit of avoiding needless expenditures, reducing the number of animals sacrificed during testing, and saving time spent on synthesizing and marketing drug candidates that may later prove to be severely carcinogenic. In this study, a probabilistic neural network (PNN) classification method was implemented to assess the carcinogenicity of a collection of quinolone and quinoline derivatives. In addition, a multiple classifier system was developed for comparison with the individual PNN models, with the aim of finding a model or a system that provides superior prediction of the genotoxicity of the two groups of derivatives. The objective is to build classification models that will not only accurately predict the genotoxicity of quinolone and quinoline derivatives but also provide insight into the main determinants of their genotoxicity.

Experimental Procedures

Generating the predictive PNN classification models in this work involved the following major sequential steps: data set formation with collected relevant information, structure entry and molecular modeling, descriptor generation, training and prediction set formation, objective feature selection, model building (subjective feature selection), and model validation. All computations in this study were performed on a DEC 300 AXP model 500 workstation with the Automated Data Analysis and Pattern Recognition Toolkit (ADAPT) software package (16, 17), developed by the Jurs research group. ADAPT has been shown in various applications to predict physical properties and biological activities successfully (18-22). Evolutionary optimization algorithms and PNN routines developed by Jurs group members were used in classification model building.

Data Set Formation. The process began with the collection of compounds with the activity of interest for model building. The data set contained the compounds’ chemical information and the activity measurements under study. In particular, this study focused on a collection of commercially available quinolone and quinoline derivatives. The two data sets were provided by Bristol-Myers Squibb and contained compounds that were chosen based upon chemical diversity. Along with the structural information, the two data sets included the activity of interest, the genotoxic potential, which was determined by the SOS Chromotest for each compound in the data set. The SOS Chromotest is a genotoxicity test based on DNA damage (23). This assay measures induction of a lacZ reporter gene in response to DNA damage (24). The SOS pathway plays a leading role in the Escherichia coli response to genotoxic damage, responding to a broad spectrum of genotoxic substances. SOS induction can therefore be used as a monitor for DNA damage.

The SOS Chromotest was used in this study as an alternative to the well-known and well-accepted Ames test, the uniform standard test for genotoxicity that new drugs must meet under the ICH guidelines. The structure modeling technique proposed in this study predicts the outcome of the SOS Chromotest, that is, the chemicals’ genotoxicity potential. Even though the SOS Chromotest is not included in the series of tests that provide uniform standards for genotoxicity that new drugs must meet under ICH guidelines, it is a reliable alternative to these tests because studies published in the literature have shown that its outcome has a high percentage concordance with the Ames test. Some laboratories (25-27) have demonstrated that quinolones induce the SOS repair pathway. The assay has been


used extensively with many different chemical classes. A review of published data between 1982 and 1992 demonstrated that for 1776 compounds, the SOS Chromotest had 90% concordance with the Ames mutagenicity test (23). The SOS Chromotest is one of the most rapid and simple short-term tests for genotoxins and is easily adaptable to various conditions. Given these benefits and its high concordance with the Ames test, the SOS Chromotest is a reliable genotoxicity test for the purposes of this study.

Two data sets were formed, containing 85 quinolone and 115 quinoline derivatives, respectively. They are listed in Tables 1 and 2 with the corresponding IMAX values in the presence (+S9) and absence (-S9) of S9. The IMAX (maximal SOS induction factor) values measured by the SOS Chromotest in the presence or absence of S9 liver homogenate are an indicator of an analyte’s genotoxicity potential. Analytes with high IMAX values have a high potential to be genotoxic; in other words, they have the potential to be human carcinogens and/or mutagens. For the quinolone derivatives data set, the +S9 IMAX assay values ranged from 0.89 to 1.74 with a mean of 1.069. The IMAX values obtained from the -S9 assay ranged from 0.97 to 2.1 with a mean of 1.146. The IMAX values of the quinoline derivatives data set had a larger range than those of the quinolone derivatives data set. In the +S9 assay, IMAX values ranged from 0.95 to 6.74 with a mean of 1.23. For the -S9 assay, IMAX values ranged from 0.97 to 14.96 with a mean of 1.40. The class identity (genotoxic or nongenotoxic) of the quinolone derivatives was determined by a cutoff IMAX value of 1.20. Compounds with IMAX > 1.20 were labeled as genotoxic; compounds with IMAX ≤ 1.20 were labeled nongenotoxic. On this basis, the quinolone data set contained 62 nongenotoxic compounds (72.9%) and 23 genotoxic compounds (27.1%).
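The cutoff rule and the reported class percentages can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the example IMAX values are hypothetical, the 1/2 class coding is read off Table 1, and the text does not state how the +S9 and -S9 values are combined into one label, so the function below takes a single IMAX value.

```python
# Class codes as they appear in Table 1 (assumed: 1 = nongenotoxic, 2 = genotoxic).
NONGENOTOXIC, GENOTOXIC = 1, 2


def label_by_cutoff(imax, cutoff):
    """Label one compound: genotoxic if IMAX is strictly above the cutoff."""
    return GENOTOXIC if imax > cutoff else NONGENOTOXIC


def class_percentages(labels):
    """Percentage of compounds in each class, rounded to one decimal place."""
    total = len(labels)
    return {c: round(100.0 * labels.count(c) / total, 1) for c in sorted(set(labels))}


# Hypothetical IMAX values, chosen only to exercise the rule at and around the cutoff.
example_imax = [1.29, 1.55, 1.20, 1.02]
labels = [label_by_cutoff(v, cutoff=1.20) for v in example_imax]
print(labels)

# Reported quinolone split: 62 nongenotoxic and 23 genotoxic out of 85 compounds.
reported = [NONGENOTOXIC] * 62 + [GENOTOXIC] * 23
print(class_percentages(reported))
```

The same helper reproduces the quinoline percentages when given 71 and 44 compounds with the 1.25 cutoff applied upstream.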
The class identity of the quinolines was determined by the same assignment rule but with a slightly higher cutoff IMAX value of 1.25, which resulted in 71 of the 115 compounds (61.7%) being labeled nongenotoxic and 44 (38.3%) genotoxic. The cutoff values were chosen by examining the overall distribution of the data sets’ dependent variables. As there were no reported rules for defining the classes on these data sets, the cutoffs were set such that each of the two classes contains an adequate, representative number of compounds.

Structure Entry and Optimization. The next step was the entry of the structures of the 85 quinolone derivatives and 115 quinoline derivatives with their corresponding IMAX (±S9) values. The structures were entered with HyperChem (HyperCube, Inc., Waterloo, ON) on a PC. The two-dimensional sketched structures were represented by connection tables, which contain atom types and the connections between atoms. The structure information was then transferred to and stored in an ADAPT work area. After the transfer, the geometry of each compound was optimized with the PM3 Hamiltonian (28) using MOPAC (29). A single-point energy calculation was also performed using the AM1 Hamiltonian (30) on the PM3 geometry-optimized structures using MOPAC in order to generate accurate charge information. Information for both two-dimensional and three-dimensional representations was saved for descriptor generation.

Descriptor Generation. In developing predictive models that mathematically link structure to the activity of interest, the chemical structures of the organic compounds were numerically encoded by descriptors. Descriptors are simply numerical values that describe molecular structures and capture various molecular properties. These numerical representations were calculated with ADAPT descriptor generation routines.
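To make the idea of a descriptor concrete, the sketch below encodes a connection table (atom types plus atom-to-atom connections) and derives a few numbers from it. It is not the ADAPT implementation: the data layout, the function names, and the choice of the Randić connectivity index as the example are assumptions made for illustration, and the connection table is a hand-written heavy-atom skeleton of quinoline.

```python
import math
from collections import Counter

# Hypothetical connection table for the heavy-atom skeleton of quinoline.
# Atom order: N1, C2, C3, C4, C4a, C5, C6, C7, C8, C8a (hydrogens omitted).
ATOMS = ["N", "C", "C", "C", "C", "C", "C", "C", "C", "C"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (6, 7), (7, 8), (8, 9), (9, 0), (4, 9)]


def atom_counts(atoms):
    """Count atoms of each element type."""
    return Counter(atoms)


def vertex_degrees(n_atoms, bonds):
    """Number of heavy-atom neighbors of each atom."""
    degrees = [0] * n_atoms
    for a, b in bonds:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def randic_index(n_atoms, bonds):
    """Randic connectivity index: sum over bonds of 1/sqrt(deg_a * deg_b)."""
    deg = vertex_degrees(n_atoms, bonds)
    return sum(1.0 / math.sqrt(deg[a] * deg[b]) for a, b in bonds)


descriptors = {
    "n_carbon": atom_counts(ATOMS)["C"],
    "n_nitrogen": atom_counts(ATOMS)["N"],
    "n_bonds": len(BONDS),
    "randic_index": randic_index(len(ATOMS), BONDS),
}
print(descriptors)
```

Each compound would contribute one such row of numbers, and the rows together form the descriptor matrix used in the later steps.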
On the basis of their information content, these descriptors can be classified into four categories: topological, geometric, electronic, and hybrid descriptors. Topological descriptors (31-33) were calculated from the connection table and do not require three-dimensional (3D) optimized structural information. These descriptors reflect the types of atoms and bonds in a molecule, the molecular connectivity, and the branching of the molecule. They include atom, bond, functional group, path, and substructure counts and connectivity indices (34). Geometric descriptors (35) were calculated from the PM3 geometry-optimized 3D structures. They encode information about the 3D structural nature and shape of the molecule. Examples of these descriptors are principal moments of inertia (36), solvent accessible surface area (37), molecular volume, length-to-breadth ratio (35), shadow projection areas (38), and gravitational index G (39). Electronic descriptors include information such as partial atomic charges (40), dipole moment, and highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels (41) that quantify charge information of the molecule. In many cases, they are also related to the molecular topology and composition. The calculation of these descriptors requires not only accurate 3D geometry-optimized structural information but also the single-point AM1 charges calculated using MOPAC. The last category, hybrid descriptors, was calculated by combining two or more of the above descriptor types. Charged partial surface areas (CPSAs) (42) are examples of these descriptors. They combine geometric and electronic properties of the molecules, e.g., surface areas and fractional partial charges. These descriptors characterize the molecules’ ability to engage in polar interactions, using information on partial positive/negative charges relative to surface areas, total weighted partial charges relative to surface areas, and fractional partial charges relative to surface areas. Hydrogen bonding descriptors (43) were also calculated with selective CPSA descriptors to encode those molecular features that are important for intermolecular hydrogen bonding. Another group of descriptors, HSA descriptors (44), was also added to capture the hydrophilicity/hydrophobicity properties of the molecules.

Feature Selection.
Once approximately 200 descriptors had been generated by the descriptor generation routines, feature selection was done to reduce the number of descriptors per compound by removing descriptors that contained little or redundant information, leaving a pool of descriptors best suited to encoding the genotoxicity. Feature selection includes objective methods and subjective methods. Objective feature selection uses only the independent variables to filter out nonuseful descriptors, without using the dependent variable (here, class identity). It employs an identical test, a pairwise correlation test, and vector space descriptor analysis (VSDA) (45) to eliminate descriptors that contain little information or that are highly correlated with another descriptor. For example, descriptors with values that are identical for at least 90% of the compounds in the training set were removed. Correlated information was reduced by removing one of any two descriptors whose pairwise correlation coefficient exceeded 0.85 for the training set compounds.

The resulting pool of descriptors was then reduced by subjective feature selection, which searches for an information-rich subset of descriptors for model development. In subjective feature selection, the dependent variable (genotoxicity class) was considered in descriptor selection. The selected subsets of descriptors, usually ranging from three to 10 descriptors per model, were the ones that optimally mapped the genotoxicity to molecular structure. The feature selection process employed evolutionary optimization techniques (genetic algorithm or simulated annealing) (46, 47) to search the descriptor space for optimal subsets of descriptors and was coupled with a fitness evaluator that calculates a measure of the overall performance of the model for each subset of descriptors. For this study, the subsets of descriptors were found by simulated annealing (46), and a PNN (48) was chosen as the fitness evaluator.
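The two objective filters with stated thresholds (the identical test at 90% and the pairwise correlation test at 0.85) can be sketched with NumPy as follows. This is a minimal sketch, not the ADAPT routines: the function names and the toy descriptor matrix are hypothetical, VSDA is omitted, and because the text does not say which member of a correlated pair is removed, the sketch assumes the later column is dropped.

```python
import numpy as np


def identical_test(X, max_fraction=0.90):
    """Keep columns whose most common value covers less than max_fraction of the rows."""
    n_rows = X.shape[0]
    kept = []
    for j in range(X.shape[1]):
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.max() / n_rows < max_fraction:
            kept.append(j)
    return kept


def correlation_test(X, columns, r_max=0.85):
    """Greedily keep a column only if |r| <= r_max against every column kept so far.

    Assumption: of two correlated descriptors, the later one is discarded.
    """
    kept = []
    for j in columns:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max for k in kept):
            kept.append(j)
    return kept


# Hypothetical descriptor matrix: 40 compounds by 4 descriptors.
a = np.arange(40, dtype=float)
c = np.where(np.arange(40) % 2 == 0, 1.0, -1.0)  # alternating, nearly uncorrelated with a
b = 2.0 * a + 0.1 * c                            # almost a rescaled copy of a
mostly_zero = np.zeros(40)
mostly_zero[:2] = 1.0                            # 95% of the values are identical
X = np.column_stack([a, b, c, mostly_zero])

after_identical = identical_test(X)
after_correlation = correlation_test(X, after_identical)
print(after_identical, after_correlation)
```

Running the identical test first also removes near-constant columns, for which a correlation coefficient would be poorly defined.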
In all, the subjective feature selection generated subsets of descriptors, each corresponding to a top-performing model. A selected model’s predictive ability was considered best when a high training set classification rate was accompanied by a comparable prediction set classification rate.

PNN. The PNN implemented in this study was first introduced by Specht (48). A PNN is a three-layer, feed-forward neural network that is widely used in the area of pattern



Table 1. Data Information on the 85 Quinolone Derivatives

ID no. | name | measured IMAX+S9 | measured IMAX-S9 | actual class | predicted class
1 | 1-ethyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (2-trifluoromethylphenyl)amide | 1.25 | 1.17 | 2 | 1*
2 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid (5-methyl-thiazol-2-yl)amide | 1.29 | 1.34 | 2 | 1*
3 | 6-amino-3-methyl-3H-naphtho(1,2,3-diethyl)quinoline-2,7-dione | 1.55 | 2.1 | 2 | 1*
4 | 4-hydroxy-2-oxo-1-pentyl-1,2-2H-quinoline-3-carboxylic acid (4-ethoxy-phenyl)amide | 1.74 | 2.1 | 2 | 2
5 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid (2-cyano-phenyl)amide | 1.64 | 1.51 | 2 | 2
6 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid dodecylamide | 1.25 | 0.99 | 2 | 1*
7 | 2-((1-allyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid methyl ester | 1.08 | 1.35 | 2 | 1*
8 | 1-hexyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (2-bromo-phenyl)amide | 1.12 | 1.26 | 2 | 2
9 | 2-((4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carbonyl)amino)benzoic acid hexyl ester | 1.16 | 1.37 | 2 | 2
10 | 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid (2-cyano-phenyl)amide | 1.12 | 1.26 | 2 | 2
11 | 2-((4-hydroxy-2-oxo-1-propyl-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid methyl ester | 1.09 | 1.15 | 1 | 2*
12 | 1-ethyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (4(4-bromo-phenyl)thiazol-2-yl)amide | 1.02 | 1.09 | 1 | 1
13 | RCL R42,413-7 | 1.01 | 1.12 | 1 | 1
14 | 4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (2-trifluoromethyl-phenyl)amide | 1.02 | 1.07 | 1 | 1
15 | RCL R42,134-0 | 0.92 | 0.97 | 1 | 1
16 | 4-((4-hydroxy-2-oxo-1-propyl-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid butyl ester | 0.96 | 1.06 | 1 | 1
17 | RCL R36,488-6 | 1.04 | 1.24 | 2 | 2
18 | RCL R42,017-4 | 0.94 | 1.1 | 1 | 1
19 | RCL R42,018-2 | 0.97 | 1.13 | 1 | 1
20 | 1-ethyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (4-methyl-thiazol-2-yl)amide | 0.97 | 0.98 | 1 | 1
21 | 1-ethyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (5-chloro-2-methoxyphenyl)amide | 1.01 | 1.08 | 1 | 1
22 | 4-hydroxy-1-methyl-2-oxo-1,2-2H-quinoline-3-carboxylic acid (4H-furan-2-ylmethyl)amide | 0.99 | 1.11 | 1 | 1
23 | 1-allyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (pyridin-2-ylmethyl)amide | 1.04 | 1.1 | 1 | 1
24 | 4-((4-hydroxy-1-methyl-2-oxo-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid propyl ester | 0.93 | 0.97 | 1 | 1
25 | 4-((1-ethyl-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid methyl ester | 1.02 | 1.05 | 1 | 1
26 | 4-hydroxy-1-methyl-2-oxo-1,2-2H-quinoline-3-carboxylic acid (pyridin-3-ylmethyl)amide | 1.07 | 1.06 | 1 | 1
27 | 1-acetyl-4-methyl-6-bromoanthrapyridone | 1.02 | 1.16 | 1 | 1
28 | 6-chloro-4-phenyl-3-(3-phenyl-acryloyl)-1H-quinolin-2-one | 0.97 | 1.23 | 2 | 2
29 | 6-bromo-3-(1-(2-bromo-benzoyl)-5-(4-methoxy-phenyl)-4,5-2H-1H-pyrazol-3-yl)-4-phenyl-1H-quinolin-2-one | 1.06 | 1.23 | 2 | 1*
30 | 3-acetyl-6-bromo-4-phenyl-1H-quinolin-2-one | 1.06 | 1.16 | 1 | 1
31 | 6-bromo-3-(3-(3,4-dimethoxy-phenyl)acryloyl)-4-phenyl-1H-quinolin-2-one | 1.14 | 1.12 | 1 | 1
32 | 4-hydroxy-2-oxo-1-pentyl-1,2-2H-quinoline-3-carboxylic acid (4-phenyl-thiazol-2-yl)amide | 0.99 | 1.08 | 1 | 1
33 | 2-((4-hydroxy-2-oxo-1-pentyl-1,2-2H-quinoline-3-carbonyl)amino)benzoic acid methyl ester | 1.09 | 1.22 | 2 | 2
34 | 4-hydroxy-1-(3-methyl-butyl)-2-oxo-1,2-2H-quinoline-3-carboxylic acid (2-cyanophenyl)amide | 0.98 | 1 | 1 | 1
35 | 1-(2,2-dimethyl-propyl)-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid m-tolylamide | 1.04 | 1.14 | 1 | 1
36 | 1-(2,2-dimethyl-methyl-propyl)-4-hydroxy-2-oxo-1,2-2H-quinoline-3-carboxylic acid (4-iodo-phenyl)amide | 1.1 | 1.23 | 2 | 1*
37 | 4-(4-benzhydryl-piperazin-1-yl)-3-nitro-1-phenyl-1H-quinolin-2-one | 1.11 | 1.08 | 1 | 1
38 | (4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetic acid ethyl ester | 1 | 1.14 | 1 | 1
39 | N-decyl-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.01 | 1.16 | 1 | 1
40 | N-cycloheptyl-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.04 | 1.18 | 1 | 1
41 | N-cyclohexyl-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.05 | 1.13 | 1 | 1
42 | N-(3-fluoro-phenyl)-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 0.89 | 1.09 | 1 | 1
43 | 2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)-N-(4-methoxy-phenyl)acetamide | 0.99 | 1.05 | 1 | 1
44 | N-(4-fluoro-phenyl)-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 0.94 | 1.16 | 1 | 1
45 | N-(2,4-dimethyl-phenyl)-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.02 | 1.17 | 1 | 1
46 | 2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)-N-P-tolyl-acetamide | 0.99 | 1.03 | 1 | 1
47 | N-(3-chloro-2-methylphenyl)-2-(4-hydroxy-2-oxo-1,2-dihydro-3-quinolinyl)-2-oxoacetamide | 1.06 | 1.08 | 1 | 1
48 | 2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)-N-(1-phenyl-ethyl)acetamide | 0.98 | 1.16 | 1 | 1
49 | 6-bromo-4-phenyl-3-(5-phenyl-1-(pyridine-4-carbonyl)-4,5-2h-1h-pyrazol-3-yl)quinolin-one | 1.05 | 1.09 | 1 | 1
50 | 6-bromo-3-(1-(4-methoxy-benzoyl)-5-(4-methoxy-phenyl)-4,5-2H-1H-pyrazol-3-yl)-4-phenyl-1H-quinolin-2-one | 1.04 | 0.99 | 1 | 1
51 | N,N-dibutyl-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.03 | 1.07 | 1 | 1
52 | 2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)-N,N-BIS-(3-methyl-butyl)acetamide | 1.2 | 1.14 | 2 | 2
53 | N,N-dibenzyl-2-(4-hydroxy-2-oxo-1,2-dihydro-quinolin-3-yl)acetamide | 1.22 | 1.16 | 2 | 2
54 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid 2-chloro-benzylamide | 1.04 | 1.23 | 2 | 2
55 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid cyclooctylamide | 1.02 | 1.06 | 1 | 1
56 | 1-butyl-4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid 4-chloro-benzylamide | 1.01 | 1.07 | 1 | 1
57 | 4-hydroxy-2-oxo-1-propyl-1,2-dihydro-quinoline-3-carboxylic acid 4-chloro-benzylamide | 1.04 | 1.08 | 1 | 1
58 | 1-ethyl-4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid (2-cyano-phenyl)amide | 1.05 | 1.09 | 1 | 1
59 | 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid 4-chloro-benzylamide | 0.94 | 1.07 | 1 | 1
60 | 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid cyclopentylamide | 1.03 | 1.08 | 1 | 1
61 | 4-hydroxy-N-(2-hydroxyethyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.14 | 1.15 | 1 | 1
62 | 4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid nonylamide | 1 | 1.07 | 1 | 1
63 | 1-butyl-4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid methylamide | 0.98 | 1.12 | 1 | 1
64 | 4-hydroxy-2-oxo-1-propyl-1,2-dihydro-quinoline-3-carboxylic acid dodecylamide | 1.1 | 1.23 | 2 | 2
65 | S-4-hydroxy-2-oxo-1-propyl-1,2-dihydro-quinoline-3-carboxylic acid (1-phenyl-ethyl)amide | 1.06 | 1.18 | 1 | 1
66 | 1-ethyl-4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid isobutyl-amide | 1.01 | 1.05 | 1 | 1
67 | S-1-ethyl-4-hydroxy-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid (1-phenyl-ethyl)amide | 1.03 | 1.09 | 1 | 1
68 | ethyl 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-3-quinolinecarboxylate | 1.02 | 1.04 | 1 | 1



Table 1 (Continued)

ID no. | name | measured IMAX+S9 | measured IMAX-S9 | actual class | predicted class
69 | 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid dodecylamide | 1.05 | 1.07 | 1 | 2*
70 | 4-hydroxy-1-methyl-2-oxo-1,2-dihydro-quinoline-3-carboxylic acid pentylamide | 1.2 | 1.14 | 2 | 2
71 | N-(1,3-benzothiazol-2-yl)-4-hydroxy-2-oxo-1-pentyl-1,2-dihydro-3-quinolinecarboxamide | 1.11 | 1.11 | 1 | 1
72 | N-(2,4-dichlorophenyl)-4-hydroxy-2-oxo-1-pentyl-1,2-dihydro-3-quinolinecarboxamide | 0.98 | 1.07 | 1 | 1
73 | 4-hydroxy-N-(4-methylphenyl)-2-oxo-1-pentyl-1,2-dihydro-3-quinolinecarboxamide | 1.13 | 1.1 | 1 | 1
74 | N-(4-bromophenyl)-4-hydroxy-2-oxo-1-pentyl-1,2-dihydro-3-quinolinecarboxamide | 1.02 | 1.1 | 1 | 1
75 | 1-butyl-N-(4-chlorophenyl)-4-hydroxy-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 0.94 | 1.05 | 1 | 1
76 | N-(2,4-dichlorophenyl)-4-hydroxy-2-oxo-1-propyl-1,2-dihydro-3-quinolinecarboxamide | 0.99 | 1.02 | 1 | 1
77 | N-(4-chlorophenyl)-4-hydroxy-2-oxo-1-propyl-1,2-dihydro-3-quinolinecarboxamide | 1.06 | 1.02 | 1 | 1
78 | 1-allyl-4-hydroxy-N-(4-hydroxyphenyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.07 | 1.05 | 1 | 1
79 | N-(2-chlorobenzyl)-1-ethyl-4-hydroxy-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.08 | 1.01 | 1 | 1
80 | N-benzyl-1-ethyl-4-hydroxy-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.2 | 1.22 | 2 | 2
81 | 1-ethyl-4-hydroxy-N-(3-methoxyphenyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.04 | 1.06 | 1 | 1
82 | 1-ethyl-4-hydroxy-N-(4-methylphenyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.07 | 1.22 | 2 | 2
83 | 4-hydroxy-N-(3-methoxyphenyl)-1-methyl-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1 | 1.07 | 1 | 1
84 | N-(2,4-dimethylphenyl)-4-hydroxy-1-methyl-2-oxo-1,2-dihydro-3-quinolinecarboxamide | 1.11 | 1.22 | 2 | 2
85 | 9-chloro-1-methyl-1,10-phenanthrolin-2(1h)one | 1.11 | 1.08 | 1 | 1

recognition, nonlinear mapping, and estimation of probability of class membership and likelihood ratios. When the PNN is compared with back-propagation neural networks, both techniques are based on well-established statistical principles derived from the Bayes decision strategy. However, the PNN trains much faster with its one-pass training algorithm, and the addition of training data does not require retraining of the entire network. In addition, a PNN is guaranteed to converge to the Bayes optimal decision surface, unlike the back-propagation technique, which may terminate in a local minimum (49). In a PNN, unknown inputs similar to those in the training set can be classified correctly, within limits, because noise or errors in the training set do not easily affect classification accuracy.

As with other neural networks, the PNN training process is a learning process. The goal is to approximate the probability density function (pdf) of the training examples. More accurately, the PNN can be interpreted as a function that approximates the probability density of the underlying distribution of the examples. Finding the pdf of the training examples with a PNN relies on Parzen’s method (50). Parzen’s method synthesizes an estimate of a pdf by superposition of replicas of a function (the Gaussian function was used in this study). The classification decision is made after calculating the pdf of each class using the training examples according to

$$p_x f_x > p_y f_y \quad \text{for all } x \neq y$$

where $p_x$ is the prior probability of occurrence of examples from class x and $f_x$ is the estimated pdf of class x. The pdf is found by adding up the values of the d-dimensional Gaussian probability distribution centered at each training example and scaling the sum to produce the estimated probability density

$$f_i(x) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}\,N_i} \sum_{j=1}^{N_i} \exp\left[-\frac{(x - x_{ij})^{T}(x - x_{ij})}{2\sigma^{2}}\right]$$

where i is the class number, j is the pattern number, $x_{ij}$ is the jth training example from class i, x is the test (unknown) sample, $N_i$ is the number of training examples in class i, d is the dimension of vector x, σ is the smoothing factor (the standard deviation), and $f_i(x)$ is the sum of multivariate spherical Gaussians centered at each of the training examples $x_{ij}$ for the ith class pdf. This is also the function indicating the likelihood of an unknown sample belonging to a given class. Thus, given the pdf, generalization beyond the provided examples can be achieved. The classification decision is made according to the inequality decision rule, which is

$$d(x) = C_i, \quad \text{if } f_i(x) > f_j(x) \text{ for } i \neq j$$

and $C_i$ is the class i.

The pdf approximation described above is learned by a PNN that consists of nodes arranged in four layers: input layer, pattern layer, summation layer, and output layer. The structure is shown in Figure 3 (48).

Figure 3. Architecture of PNN.

The input layer contains neurons that accept feature vectors (descriptor values) and pass them to the pattern layer. Each neuron in the pattern layer corresponds to one training example. Each of them forms a dot product of the input vector X with the weight vector $W_i$

$$Z_i = X \cdot W_i$$

and then performs a nonlinear operation on $Z_i$ before proceeding to the summation layer. The function used for this nonlinear transformation is the Gaussian activation function

$$\exp\left[\frac{Z_i - 1}{\sigma^{2}}\right]$$

instead of the sigmoid function commonly used for back-propagation. Next, after each summation neuron receives the outputs from the pattern layer, the summation layer forms the weighted sum of the pattern layer outputs associated with a given class.

$$\sum_{i=1}^{N_k} \exp\left[\frac{Z_i - 1}{\sigma^{2}}\right]$$

Finally, the output binary neurons in the output layer produce the classification decision according to this inequality.

$$\sum_{i=1}^{N_k} \exp\left[\frac{Z_i - 1}{\sigma^{2}}\right] > \sum_{j=1}^{N_j} \exp\left[\frac{Z_j - 1}{\sigma^{2}}\right]$$
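The PNN decision described in this section (a Parzen estimate of each class pdf followed by the Bayes decision rule) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the Jurs group routines: the function names, the two-descriptor training data, the value of σ, and the default of equal priors are all assumptions. The last lines also check numerically that, for unit-length X and W, the Gaussian kernel and the pattern-layer activation coincide, since (X − W)ᵀ(X − W) = 2 − 2Z in that case.

```python
import numpy as np


def class_pdf(x, examples, sigma):
    """Parzen estimate f_i(x): scaled sum of spherical Gaussians centered on the examples."""
    n_i, d = examples.shape
    diff = examples - x
    sq_dist = np.sum(diff * diff, axis=1)
    norm = (2.0 * np.pi) ** (d / 2.0) * sigma ** d * n_i
    return float(np.sum(np.exp(-sq_dist / (2.0 * sigma ** 2))) / norm)


def pnn_classify(x, training_sets, sigma, priors=None):
    """Return the class with the largest p_i * f_i(x), together with all scores."""
    if priors is None:
        # Assumption: equal priors, so the rule reduces to comparing the pdfs.
        priors = {label: 1.0 for label in training_sets}
    scores = {label: priors[label] * class_pdf(x, ex, sigma)
              for label, ex in training_sets.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-descriptor training data for two classes.
training_sets = {
    1: np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1]]),
    2: np.array([[1.0, 1.0], [1.2, 0.9]]),
}
sigma = 0.3  # smoothing factor, picked arbitrarily for the illustration

print(pnn_classify(np.array([0.1, 0.0]), training_sets, sigma)[0])
print(pnn_classify(np.array([1.1, 1.0]), training_sets, sigma)[0])


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


# For unit-length vectors, exp(-|x - w|^2 / (2 sigma^2)) equals exp((z - 1) / sigma^2).
x_u, w_u = unit(np.array([0.6, 0.8])), unit(np.array([1.0, 0.2]))
z = float(x_u @ w_u)
kernel_form = np.exp(-np.sum((x_u - w_u) ** 2) / (2.0 * sigma ** 2))
activation_form = np.exp((z - 1.0) / sigma ** 2)
print(np.isclose(kernel_form, activation_form))
```

Because training only stores the examples, adding a compound to a class amounts to appending a row to that class's array, which is the one-pass property noted in the text.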

Multiple Classifier System. In this study, a multiple classifier system was built with the contribution of individual



Table 2. Data Information on the 115 Quinoline Derivatives

ID no. | name
1 | 2-biphenyl-4-yl-quinoline-4-carboxylic acid
2 | phenyl-quinolin-2-ylmethyl-phosphinic acid methyl ester
3 | N-(7-chloro-quinolin-4-yl)-N′-(4-fluoro-benzylidene)hydrazine
4 | 5-methyl-1,10-phenanthroline
5 | acridin-9-yl-methyl-phenyl-amine
6 | 8-chloro-2-methyl-4-quinolinol
7 | 2-phenyl-quinoline-4-carboxylic acid (1-methyl-2-oxo-1,2-2H-indol-3-ylidene)hydrazide
8 | N′-(3,4-dichlorobenzylidene)-2-phenyl-4-quinolinecarbohydrazide
9 | disperse yellow 64 (latyl yellow 3 gb)
10 | 2-phenyl-quinoline-4-carboxylic acid (4-chloro-benzylidene)hydrazide
11 | 2-phenyl-quinoline-4-carboxylic acid (4-bromo-benzylidene)hydrazide
12 | 2-methyl-quinoline-4-carboxylic acid (2-oxo-1,2-dihydro-indol-3-ylidene)hydrazide
13 | (3-chloro-4-methyl-phenyl)-(2-chloro-quinolin-3-ylmethylene)amine
14 | RCL R53,713-6
15 | 2-(4-chlorophenyl)-3-methyl-4-(4-morpholinyl)quinoline
16 | hydroquinidine 4-methyl-2-quinolyl ether
17 | (methoxy-quinolin-yl)-(vinyl-aza-bicyclo(2.2.2)oct-yl)methanol, diphenyl-butyric acid
18 | methyl 4-(4-chloro-6-fluorosulfonyl-2-quinolyl)benzoate
19 | 3-hydroxy-2-methyl-4-quinolinecarboxylic acid
20 | 2-(quinolin-8-ylsulfanyl)acetic acid (4-bromo-benzylidene)hydrazide
21 | N′-(bicyclo(2.2.1)hept-5-en-2-ylmethylene)-2-(4-methyl-phenyl)-4-quinolinecarbohydrazide
22 | N′-benzylidene-2-(4-methylphenyl)-4-quinolinecarbohydrazide
23 | quinoline-2-carboxylic acid (3-bromo-benzylidene)hydrazine
24 | N′-(2,5-dimethoxybenzylidene)-2-(4-methylphenyl)-4-quinolinecarbohydrazide
25 | 2-(quinolin-8-ylsulfanyl)acetic acid (4-fluoro-benzylidene)hydrazine
26 | N′-(1-(4-chlorophenyl)ethylidene)-2-(4-methylphenyl)-4-quinolinecarbohydrazide
27 | N′-(1-(4-methoxyphenyl)ethylidene)-2-(4-methylphenyl)-4-quinolinecarbohydrazide
28 | 2-phenyl-quinoline-4-carboxylic acid (4-hydroxy-3-methoxy-benzylidene)hydrazide
29 | 2-methyl-quinoline-4-carboxylic acid (2-chloro-benzylidene)hydrazide
30 | 2-methyl-quinoline-4-carboxylic acid benzylidene-hydrazide
31 | 2-methyl-quinoline-4-carboxylic acid (5-bromo-2-hydroxy-benzylidene)hydrazide
32 | 2-phenyl-quinoline-4-carboxylic acid (1-(4-hexyl-phenyl)ethylidene)hydrazide
33 | 2-pyridin-4-yl-quinoline-4-carboxylic acid
34 | 2-(3-methoxy-phenyl)quinoline-4-carboxylic acid
35 | 2-(3,4-dimethoxy-phenyl)quinoline-4-carboxylic acid
36 | 2-(2,4-dichloro-phenyl)quinoline-4-carboxylic acid
37 | 2-(4-methylphenyl)-4-quinolinecarboxamide
38 | 2-(3,4-dimethyl-phenyl)quinoline-4-carboxylic acid
39 | 2-oxo-2-phenylethyl 8-methyl-2-phenyl-4-quinolinecarboxylate
40 | pentyl 6-methyl-2-phenyl-4-quinolinecarboxylate
41 | 4-hydroxy-6,7-diisobutoxy-quinoline-3-carboxylic acid ethyl ester
42 | dimethyl 11-(4-methoxybenzoyl)pyrrolo(1,2-A)(1,10)phenanthroline-9,10-dicarboxylate
43 | 2-phenyl-quinolin-4-ylamine
44 | 3-amino-1-phenyl-1H-4-oxa-5-aza-phenanthrene-2-carbonitrile
45 | 4-(4-methyl-piperazin-1-yl)-2-naphthalen-2-yl-quinoline
46 | 4-((1-methyl-4-piperidinyl)thio)quinoline
47 | N,N-dimethyl-N′-(2-pyridin-3-yl-quinolin-4-yl)ethane-1,2-diamine
48 | N-acridin-9-ylmethylene-N′-(6-chloro-pyridazin-3-yl)hydrazine
49 | (2-chloro-8-methyl-quinolin-3-ylmethylene)-(4-ethoxy-phenyl)amine
50 | 2-(4-bromo-phenyl)-6-methyl-4-phenyl-quinoline
51 | (4-chloro-2,5-dimethoxy-phenyl)-(2-chloro-8-methyl-quinolin-3-ylmethylene)amine
52 | 4-morpholin-4-yl-2-pyridin-3-yl-quinoline
53 | (methoxy-quinolin-yl)-(vinyl-aza-bicyclo(2.2.2)oct-2-yl)methanol, methyl-phenyl-butyric acid
54 | N-(4-bromo-benzylidene)-N′-(6-bromo-4-phenyl-quinolin-2-yl)hydrazine
55 | 5,7-diiodo-8-hydroxyquinoline
56 | N-(4-hydroxy-2-oxo-1,2-dihydro-3-quinolinyl)dodecanamide
57 | 7,10-dimethylbenz(C)acridine
58 | carbostyril 151
59 | acridine orange base 3,6-Bis(dimethylamino)acridine
60 | chloroquine diphosphate
61 | tribromoquinaldine
62 | 8-quinolinesulfonic acid
63 | 8-quinolinyl trifluoromethanesulfonate
64 | 1-butyl-4-hydroxy-N-(4-iodophenyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide
65 | 1-ethyl-4-hydroxy-N-(4-iodophenyl)-2-oxo-1,2-dihydro-3-quinolinecarboxamide
66 | hydroquinine
67 | 4-chloro-2-phenyl-6-quinolinesulfonyl fluoride
68 | (2,2′)-biquinolinyl-4,4′-dicarboxylic acid dinonyl ester
69 | ethyl 4-hydroxy-6-methoxy-3-quinolinecarboxylate
70 | 8-hydroxyquinoline-β-D-glucopyranoside
71 | 4-hydroxy-1-methyl-3-phenyl-1,2-dihydroquinolin-2-one
72 | 5-chloro-2,8-dimethyl-4-[(3-nitro-2-pyridyl)oxy]quinoline
73 | 4-(4-chloro-3-methylphenoxy)-2-methylquinoline
74 | 4-(4-methoxyphenoxy)-2-methylquinoline
75 | N-(1,3-benzodioxol-4-yl)-2-methylquinolin-4-amine

measured values class I_MAX+S9 I_MAX-S9 actual predicted 1.31 1.32 1.26 1.39 1.44 1.11 1.12 1.07 1.08 1.12 1.19 1.22 1.14 1.13 1.19 1 1.05 1.01 1.08 1.04 1.01 1.08 1.05 1.07 1.14 1.06 1.13 1.15 1.09 1.12 0.99 1.06 1.23 1.09 0.98 1.01 1.1 1.18 1.15 1 0.97 1.01 1.02 1.01 1 1.06 0.95 1.03 1.03 1.19 1.04 1.1 1.05 1.01 1.07 1.28 1.28 1.48 1.72 2 6.74 1.12 0.99 1.08 1.2 1.13 1.09 0.97 1.1 1.01 1.2 1.24 1.03 1.02 0.95

1.19 1.39 1.44 1.15 1.54 1.3 1.33 1.27 1.32 1.33 1.25 1.36 1.3 1.49 1.66 1.51 1.31 1.21 1.05 1.19 1.13 1.17 1.07 1.09 1.17 1.19 1.2 1.2 1.14 1.23 1.14 1.24 1.2 1.19 1.18 1.04 1.13 1.17 1 1.08 1.06 1.05 1.19 1.24 1.17 1.18 0.97 1.18 1.07 1.16 1.09 1 1.08 1.24 0.98 1.26 1.28 1.65 1.59 1.74 14.96 1.34 1.26 1.26 1.4 1.61 1.13 0.97 1.27 0.99 1.38 1.15 1.1 1.15 1.19

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 2 1 1 1 1

2 2 1* 2 2 2 2 2 2 2 2 2 2 2 2 2 1* 1 1 2* 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 1* 1 1 1 2*



He et al. Table 2 (Continued) measured values class I_MAX+S9 I_MAX-S9 actual predicted

ID no.

name

76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115

2-isopropyl-4-phenylquinoline 2-phenyl[1,3]oxazolo[4,5-c]quinolin-4(5H)one 4-hydroxy-1-methyl-3-phenyl-1,2-dihydroquinolin-2-one 2-isopropyl-4-phenylquinoline N-(1,3-benzodioxol-4-yl)-2-methylquinolin-4-amine N-(tetrahydrofuran-2-ylmethyl)quinoline-8-sulfonamide N-cyclohexyl-2-[(2-methylquinolin-4-yl)oxy]-N-phenylacetamide N-(3-oxo-1,3-dihydro-2-benzofuran-5-yl)-8-quinolinesulfonamide 4-methyl-N-(2-quinolinyl)benzenesulfonamide 4-methyl-N-(2-quinolinyl)-1,2,3-thiadiazole-5-carboxamide 3-[1-(4-bromophenyl)-1H-1,2,3,4-tetraazol-5-yl]-4-methylquinoline 1,3,8-trimethyl-1,2,3,4-tetrahydropyrimido[4,5-b]quinoline-2,4-dione 3-ethyl-1-methyl-2-oxo-1,2-dihydroquinolin-4-yl 4-methylbenzene-1-sulfonate 2-oxo-1-phenyl-1,2-dihydroquinolin-4-yl benzoate 7-chloro-4-[4-(1H-pyrrol-1-yl)piperidino]quinoline N-phenethylquinoline-6-carboxamide 2-[(4-allyl-5-quinolin-6-yl-4H-1,2,4-triazol-3-yl)thio]acetonitrile 7-chloro-4-[4-(ethylsulfonyl)piperazino]quinoline 4-(7-chloroquinolin-4-yl)-N,N-diisopropyltetrahydropyrazine-1(2H)-carboxamide 1-[4-(7-chloroquinolin-4-yl)piperazino]-2-phenylethan-1-one N-(2-chlorophenyl)-2-methylquinolin-4-amine 1-[4-(7-chloroquinolin-4-yl)piperazino]propan-1-one N-[2-(dimethylamino)-4-quinolyl]-N′-phenylurea 6-benzyl-4-hydroxy-5,6-dihydro-2H-pyrano[3,2-c]quinoline-2,5-dione 3-butyl-4-hydroxy-1-phenyl-1,2-dihydroquinolin-2-one 1-ethyl-4-hydroxy-1,2-dihydroquinolin-2-one 7-hydroxy-6-[(phenylimino)methyl]-2,3-dihydro-1H,5H-pyrido[3,2,1-ij]quinolin-5-one 3-benzyl-4-hydroxy-1,2-dihydroquinolin-2-one 9-methyl-6H-chromeno[4,3-b]quinolin-6-one 5-benzyl-4-hydroxy-8,9,10,11-tetrahydro-6H-pyrido[3,2,1-jk]carbazol-6-one 3-benzyl-2-oxo-1,2-dihydroquinolin-4-yl 4-methylbenzene-1-sulfonate 4-(tert-butyl)-N-(2-methylquinolin-6-yl)benzamide N-(2-methylquinolin-6-yl)-2-furamide ethyl 7-hydroxy-5-oxo-2,3-dihydro-1H,5H-pyrido[3,2,1-ij]quinoline-6-carboxylate 4-hydroxy-1-methyl-3-(phenylthio)-1,2-dihydroquinolin-2-one 1-benzyl-3-chloro-4-hydroxy-1,2-dihydroquinolin-2-one 
3-[1-(4-methylphenyl)-1H-1,2,3,4-tetraazol-5-yl]quinolin-4-amine 7,8-dichloro-1,3-dimethylpyrimido[4,5-b]quinoline-2,4(1H,3H)dione ethyl 6-chloro-2-methyl-4-phenylquinoline-3-carboxylate 2-chloro-6H-[1]benzothiopyrano[3,4-b]quinoline-12-carboxylic acid

base component classifiers. The multiple classifier system, also known as ensemble classifier generation, is an effective way of improving classification performance (51). Much research has shown that the classification performance of a multiple classifier system is superior to that of individual classifiers (52-56). Accuracy is improved by combining the predictions of a set of classifiers. Given a single classifier, the base classifier, a set of classifiers can be generated and used to build the multiple classifier system. Thus, finding more efficient ways to combine multiple classifiers, in other words, ways to create multiple classifier systems, has drawn much attention from many scientific fields. Recently, new ensemble creation methods that generate an ensemble of classifiers from a single classifier have been proposed in the field of machine learning (52). Within this body of research, the main efforts have focused either on methods to partition the set of training samples [e.g., bagging and boosting (53)] or on methods for the selection of subsets of the original feature space [e.g., the random subspace method (56)]. In this study, we have chosen bagging, one of the most widely used methods for creating subsets of training samples, for constructing the multiple classifier system. This method, first proposed by Breiman (53), generates N individual component classifiers using N training subsets, each of which is drawn with replacement from the original training set. The final classifier is then formed by aggregating the component classifiers by majority voting. The bagging algorithm is represented in Figure 4. In step 2 of this algorithm, a subsetting procedure was applied (see Figure 5). The whole data set of compounds was first split into two classes: genotoxic and nongenotoxic. Within each class group, the compounds were pseudorandomly split into three groups, each containing equal

0.99 1.07 1.01 1.07 1.15 5.93 0.96 0.95 3.86 0.99 1.02 1.01 1 0.99 1 1.07 1.06 0.96 1.02 1.17 1.6 1.07 1.55 1.05 1.11 0.98 1.08 1.49 1.01 1.14 1.06 1.07 3.23 0.99 1.18 0.99 1.08 1.14 1.13 1.13

1.04 1.22 1.03 1.27 1.56 1.39 1.03 1.02 1.22 1.06 1 1.07 1.09 1.09 1.36 1.17 1.14 1.17 1.36 1.52 2.07 1.14 1.89 1.08 1.39 1.47 1.08 1.45 1.03 1.18 1.19 1.11 3 1.05 1.14 1.28 1.15 1.07 1.06 1.21

1 1 1 2 2 2 1 1 2 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 1 2 1 1 1 1 2 1 1 2 1 1 1 1

2* 1 2* 1* 1* 2 1 1 2 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 1 2 1 1 1 1 2 1 1 2 1 1 1 1

or approximately equal number of members. Finally, each data set distribution was formed by a leave one group out method; that is, two subgroups of compounds in each class group were combined to form the training set, and the third group was used as the prediction set. This procedure was repeated nine times, and nine data set distributions resulted for multiple classifier system development.

Figure 4. Procedure of the bagging algorithm.
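The subsetting procedure described above can be illustrated with a short Python sketch. It is not the authors' code: the function names, the round-robin dealing of shuffled compounds into groups, and the seeds are illustrative assumptions. The text does not state how the nine repetitions map onto the nine data set distributions; the sketch assumes one possible reading, namely three pseudorandom splits, each yielding three leave-one-group-out distributions.

```python
import random
from collections import defaultdict


def stratified_groups(labels, n_groups=3, seed=0):
    """Split compound indices into n_groups per class, with sizes as equal as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    groups = {}
    for label, members in by_class.items():
        rng.shuffle(members)
        # Deal the shuffled members round-robin so group sizes differ by at most one.
        groups[label] = [members[g::n_groups] for g in range(n_groups)]
    return groups


def leave_one_group_out(groups, n_groups=3):
    """Yield (training set, prediction set) index lists, one pair per held-out group."""
    for held_out in range(n_groups):
        tset, pset = [], []
        for class_groups in groups.values():
            for g, members in enumerate(class_groups):
                (pset if g == held_out else tset).extend(members)
        yield sorted(tset), sorted(pset)


def data_set_distributions(labels, n_splits=3, n_groups=3):
    """Assumed reading: n_splits pseudorandom splits x n_groups held-out groups."""
    distributions = []
    for seed in range(n_splits):
        groups = stratified_groups(labels, n_groups, seed)
        distributions.extend(leave_one_group_out(groups, n_groups))
    return distributions


# 85 quinolone derivatives: 62 nongenotoxic (class 1) and 23 genotoxic (class 2).
labels = [1] * 62 + [2] * 23
distributions = data_set_distributions(labels)
```

Because the split is made within each class, every training and prediction set keeps roughly the same ratio of genotoxic to nongenotoxic compounds as the whole data set.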

Figure 5. Data set distribution formation for multiple classifier system development.

Results and Discussion

Quinolone Derivatives. Individual PNN Classification Models. The genotoxicity of quinolone derivatives was well-predicted with individual PNN classification models and a multiple classifier system. Simulated annealing was coupled with the PNN fitness evaluator to select subsets of descriptors with model sizes ranging from three to 10 descriptors from the reduced pool. This model development process was iterated nine times, once for each data set distribution. Individual PNN classification models with 3-10 descriptors were thus developed for each of the nine data set distributions. Among all of the individual PNN models, a single best-performing PNN model was then selected for each data set distribution with the criteria of a high classification rate and a comparable prediction rate. Thus, a total of nine best individual PNN models were chosen from the pool of individual models for predicting the genotoxicity of the quinolone derivatives, and they composed the final pool of individual PNN models for subsequent studies. The classification and prediction rates of the nine best-performing individual PNN models are listed in Table 3. The average classification rate of the nine models was 89.2 ± 6.5%, and the average prediction rate was 65.2 ± 6.0%. Of the nine models, five had classification rates above 90%, and three additional models had classification rates above 85% and close to 90%. With only one exception, all models had a prediction rate ranging from the low 60s to 75%. Overall, the nine individual PNN classification models showed good predictability of the genotoxicity of the quinolone derivatives, and the results were consistent throughout the nine models.

Table 3. Classification Rate and Prediction Rate of the Nine Individual Models for Quinolone Derivatives

model no.   1      2      3      4      5      6      7      8      9      AVE    STD
TSET (%)    87.5   96.5   93.0   91.1   93.0   89.5   91.1   87.7   73.7   89.2   6.5
PSET (%)    55.2   64.3   67.9   62.0   71.4   75.0   62.1   60.7   67.9   65.2   6.0

Multiple Classifier System. As mentioned earlier, the nine best individual PNN classification models were then aggregated to build the multiple classifier system. The final ensemble model combined the predictions of the set of nine classifiers and provided the final classification decision by majority voting. The confusion matrix of this multiple classifier system is presented in Table 4.

Table 4. Confusion Matrix for the Quinolone Derivatives Using the Multiple Classifier System

whole data set (85 quinolone derivatives)

                     predicted class
actual class         nongenotoxic   genotoxic
nongenotoxic         60             2
genotoxic            7              16

overall % accuracy: 89.4%
overall % correct for genotoxic class: 69.6%
overall % correct for nongenotoxic class: 96.8%
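As an illustration of the aggregation step and of how the rates in Table 4 follow from its counts, a minimal Python sketch is given here. The function names and the example votes are hypothetical; only the four confusion-matrix counts are taken from Table 4.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class label predicted by most component classifiers."""
    return Counter(votes).most_common(1)[0][0]


def confusion_summary(matrix):
    """Summarize a confusion matrix given as {actual: {predicted: count}}.

    Returns the overall percent correct and the percent correct per actual class.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    per_class = {c: 100.0 * matrix[c][c] / sum(matrix[c].values()) for c in matrix}
    return 100.0 * correct / total, per_class


# Hypothetical votes of nine component models for one compound
# (1 = nongenotoxic, 2 = genotoxic).
votes = [2, 2, 1, 2, 1, 2, 2, 1, 2]
decision = majority_vote(votes)

# Counts reported in Table 4 for the 85 quinolone derivatives.
table4 = {
    "nongenotoxic": {"nongenotoxic": 60, "genotoxic": 2},
    "genotoxic": {"nongenotoxic": 7, "genotoxic": 16},
}
overall, per_class = confusion_summary(table4)
```

With nine component classifiers and two classes, a tie cannot occur, so the majority decision is always defined.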
The ensemble model was able to correctly predict 60 of 62 nongenotoxic compounds (96.8%) and 16 of 23 genotoxic compounds (69.6%). The overall classification rate was 89.4%; the model accurately predicted the true class identity of 76 of the 85 compounds. The results of the multiple classifier system showed that the prediction rate was indeed improved, especially for the genotoxic compounds. Of the nine individual models, the classification rates for the genotoxic class ranged from 56.25 to 87.50% (most values lay in the low 80s), and the prediction rates were in the upper 20s and low 30s, with a high of 57.14%. Clearly, the multiple classifier system showed a significant improvement in predicting the class identity of the genotoxic compounds. In predicting the genotoxic compounds, the multiple classifier system was almost three times more accurate than the individual PNN models.

The multiple classifier system not only provided satisfactory predictive ability for the genotoxicity of the quinolone derivatives but also gave some insights about the main determinants of genotoxicity of quinolone derivatives. The determinants were captured by examining the information carried by the descriptors contained in the ensemble model (that is, in the nine individual models). These descriptors are listed in Table 5 with brief definitions. The descriptor pool contained a total of 45 descriptors: 17 topological, eight geometric, four electronic, and 16 HPSA descriptors (describing a molecule’s hydrophilicity/hydrophobicity). Their appearance frequency in the nine individual models is presented in Figure 6. Among the topological descriptors, several provided information on the molecular connectivity and rigidity of a molecule. The electronic descriptor category contained the smallest number of descriptors; however, they suggested that the electronic environment of molecules is also important in the genotoxicity mechanism. The HPSA descriptors constituted the second largest category. Their frequent appearance indicated that the hydrophilic/hydrophobic character of a molecule was an important determinant of the genotoxicity, as well as the carcinogenicity, of quinolone derivatives.
The descriptors that had distinct values between the two classes provided a more in-depth view of the determinants of the genotoxicity. Twelve descriptors showed large differences between the two classes. They are listed in Table 6. ECCN-1 (57) is a topological descriptor that calculates the eccentric connectivity index of a molecule, the sum of the products of the eccentricity and degree of each vertex in a hydrogen-suppressed molecular graph having n total vertices. It encodes information on the size and degree of branching of the molecule. The next two descriptors in the list, PND-1 and PND-6 (58), encode topological information of a molecule. PND-1 calculates the superpendentic index of a molecule with all pendent vertices considered, whereas in PND-6 only pendent halogens are considered. This index measures the distances from only pendent vertices on a hydrogen-suppressed graph. A higher value of the index indicates a higher degree of branching and a larger number of terminal vertices, or longer paths between the terminal atoms, in a molecule. Thus, these types of descriptors provide useful


Table 5. Descriptors Appeared in the Nine Individual Models for the Quinolone Derivatives

descriptor name   descriptor meaning
KAPA-6            third-order κ-index corrected for no. of atoms
V5C-10            fifth-order valence cluster
V5PC-13           fifth-order valence path cluster
V6CH-18           sixth-order valence chain
S3C-8             third-order simple cluster
MOLC-9            average distance sum connectivity
NDB-13            no. of double bonds
WTPT-4            sum of path lengths starting from oxygens
3SP2-1            doubly bound carbon bound to three other carbons
1SP3-1            singly bound carbon bound to one other carbon
2SP3-1            singly bound carbon bound to two other carbons
SYMM              topological symmetry search descriptor
EMIN-1            minimum atomic E-state value
ELOW-1            through-space distance between EMIN and EMAX (geometry dependent)
ECCN-1            eccentric connectivity index
PND-1             superpendentic index-all pendent vertices considered
PND-6             superpendentic index-only pendent halogens considered
MOMI-2            Y-principal component of moment of inertia
MOMI-4            X-principal component of moment of inertia
MOMI-6            Y-principal component of moment of inertia/Z-principal component of moment of inertia
SHDW-1            shadow area projected onto the XY plane
SHDW-2            shadow area projected onto XZ plane
SHDW-6            standardized shadow area projected onto YZ plane
L/B-2             the length-to-breadth ratio of orientation, which defines a box of minimum area
QNEG              partial charge on the most negative atom
QPOS              partial charge on the most positive atom
DIPO              electric dipole moment
LUMO              energy of the lowest unoccupied molecular orbital
PPHS-1            partial hydrophobic surface area
PPHS-2            atomic logP constant weighted hydrophobic surface area
PNHS-1            partial hydrophilic surface area
PNHS-2            atomic logP constant weighted hydrophilic surface area
DPHS-1            partial hydrophobic surface area - partial hydrophilic surface area
DPHS-2            atomic logP constant weighted hydrophobic surface area - atomic logP constant weighted hydrophilic surface area
FPHS-1            partial hydrophobic surface area/total molecular surface area
FPHS-2            atomic logP constant weighted hydrophobic surface area/total molecular surface area
FNHS-2            atomic logP constant weighted hydrophilic surface area/total molecular surface area
WPHS-1            partial hydrophobic surface area × total molecular surface area/1000
WPHS-2            atomic logP constant weighted hydrophobic surface area × total molecular surface area/1000
WNHS-1            partial hydrophilic surface area × total molecular surface area/1000
WNHS-2            atomic logP constant weighted hydrophilic surface area × total molecular surface area
RPH-1             most positive logP constant/sum total hydrophilicity
RNHS-1            MAX SAMMNEG × most negative logP constant/sum total hydrophilicity
LOGP              calculated logarithm of n-octanol/water partition coefficient

information about the size and shape of a molecule relative to the number of pendent atoms. MOMI-2 and MOMI-3 (59) calculate the Y- and Z-principal components of moments of inertia, respectively. They are also a

Figure 6. Appearance frequency of descriptors in the nine individual models for the quinolone derivatives.

Figure 7. Appearance frequency of descriptors in the nine individual models for the quinoline derivatives.

Table 6. Descriptors that Show Big Value Difference between the Genotoxic and Nongenotoxic Classes for Quinolone Derivatives

descriptor   meaning
ECCN-1       eccentric connectivity index
PND-1        superpendentic index-all pendent vertices considered
PND-6        superpendentic index-only pendent halogens considered
MOMI-2       Y-principal component of moment of inertia
MOMI-3       Z-principal component of moment of inertia
SHDW-2       shadow area projected onto XZ plane
PPHS-1       partial hydrophobic surface area
PNHS-1       partial hydrophilic surface area
DPHS-1       partial hydrophobic surface area - partial hydrophilic surface area
DPHS-2       atomic logP constant weighted hydrophobic surface area - atomic logP constant weighted hydrophilic surface area
WPHS-1       partial hydrophobic surface area × total molecular surface area/1000
WPHS-2       atomic logP constant weighted hydrophobic surface area × total molecular surface area/1000

measure of the overall size of each molecule. The last topological descriptor is SHDW-2 (35, 38), a descriptor that calculates the shadow area projected onto the XZ plane. Like the other descriptors, it describes a molecule’s shape and size, which might be significant in steric interactions between the target compound and the active site. The last six descriptors in the list are HPSA descriptors. They encode information about a molecule’s hydrophobicity and hydrophilicity. PPHS-1 (44) calculates the partial hydrophobic surface area. PNHS-1 (44) calculates the partial hydrophilic surface area. The differences between the partial surface areas of the hydrophobic and hydrophilic regions are assessed by DPHS-1 and DPHS-2 (44). DPHS-1 finds the difference between the partial hydrophobic surface area and the partial hydrophilic surface area, whereas DPHS-2 calculates the difference between the atomic logP constant weighted surface areas of the two regions. The last two HPSA descriptors, WPHS-1 and WPHS-2 (44), encode information on surface-weighted hydrophilic and hydrophobic surface areas.
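To make the definition of the eccentric connectivity index (ECCN-1) given earlier in this discussion concrete, the following Python sketch sums the product of eccentricity and degree over all vertices of a hydrogen-suppressed molecular graph. It is an illustration only, not the descriptor routine used in this study; the adjacency-list input format and the function names are assumptions, and the graph is assumed to be connected.

```python
from collections import deque


def eccentricities(adjacency):
    """Eccentricity of each vertex: the largest shortest-path distance to any other vertex."""
    result = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source vertex
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = max(dist.values())
    return result


def eccentric_connectivity_index(adjacency):
    """Sum over all vertices of eccentricity x degree (connected graph assumed)."""
    ecc = eccentricities(adjacency)
    return sum(ecc[v] * len(adjacency[v]) for v in adjacency)


# Hydrogen-suppressed graphs of two small test molecules.
propane = {0: [1], 1: [0, 2], 2: [1]}
cyclohexane = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

propane_index = eccentric_connectivity_index(propane)
cyclohexane_index = eccentric_connectivity_index(cyclohexane)
```

For propane the terminal carbons contribute 2 × 1 each and the central carbon 1 × 2, giving 6; for cyclohexane each of the six ring atoms contributes 3 × 2, giving 36. Larger graphs give larger indices, which is consistent with the statement that the index encodes molecular size.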



Table 7. Classification Rate and Prediction Rate of the Nine Individual Models for Quinoline Derivatives

model no.   1      2      3      4      5      6      7      8      9      AVE    STD
TSET (%)    92.2   88.3   93.4   92.1   85.7   90.9   88.2   92.2   93.5   90.7   2.7
PSET (%)    60.5   60.5   61.5   61.5   65.8   63.2   69.2   57.9   52.6   61.4   4.7
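The AVE and STD columns of Table 7 can be checked from the nine per-model rates with a few lines of Python. The source does not say which form of the standard deviation was used; the sketch below assumes the sample (n - 1) form, which reproduces the tabulated values.

```python
from statistics import mean, stdev

# Per-model rates (%) for the quinoline derivatives, copied from Table 7.
tset = [92.2, 88.3, 93.4, 92.1, 85.7, 90.9, 88.2, 92.2, 93.5]
pset = [60.5, 60.5, 61.5, 61.5, 65.8, 63.2, 69.2, 57.9, 52.6]


def summarize(rates):
    """Return (average, sample standard deviation), each rounded to one decimal."""
    return round(mean(rates), 1), round(stdev(rates), 1)


tset_summary = summarize(tset)
pset_summary = summarize(pset)
```

Under this assumption the classification rates give 90.7 ± 2.7% and the prediction rates 61.4 ± 4.7%, matching the table.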

All of the above descriptors have a higher average descriptor value for the genotoxic quinolone derivatives. On the basis of the information that they encode, it can be concluded that, for quinolone derivatives, genotoxic compounds tend to be larger than nongenotoxic compounds and to have a more branched structure. Genotoxic compounds have more accessible halogen atoms, which implies that the surface accessibility of halogen atoms contributes to the resultant genotoxicity. The higher superpendentic index seen with genotoxic compounds indicates that a larger number of terminal vertices or a longer path length between the pendent atoms in the molecule tends to increase a molecule’s genotoxicity level. It also suggests that a primary halogen atom attached to an aromatic system can be a major contributor to the genotoxicity of a quinolone derivative. Hydrophobicity/hydrophilicity, a major molecular property, is another feature that distinguishes genotoxic from nongenotoxic quinolone derivatives. Genotoxic compounds have a higher partial hydrophobic surface area than nongenotoxic compounds. Thus, quinolone derivatives that are more hydrophobic are more genotoxic than the derivatives that are more hydrophilic. In conclusion, the size and shape of a molecule and its hydrophobicity are major contributors to the genotoxicity of quinolone derivatives.

Quinoline Derivatives. Individual PNN Classification Models. Nine individual best-performing PNN classification models were selected from all of the individual PNN models developed with each of the nine data set distributions. The descriptors composing each of the individual PNN classification models were selected with simulated annealing coupled to the PNN fitness evaluator. Model sizes ranged from three to 10 descriptors. The nine best individual PNN classification models were used as the base component classifiers for building the multiple classifier system. The classification decisions made by each of the nine models for each of the quinoline derivatives in the data set were aggregated by majority voting to produce the final prediction decision for each compound. The aggregated classifiers formed the multiple classifier system.

Multiple Classifier System. The predictive performance of the nine selected individual PNN classification models is presented in Table 7 as classification rates and prediction rates. The average classification rate was 90.7 ± 2.7%, and the average prediction rate was 61.4 ± 4.7%. The nine models all had a good classification rate; even the worst-performing model had a classification rate of 85.7%. The prediction rates, on the other hand, were much lower; they ranged from 52.6 to 69.2%. With such low prediction rates, improvement of the predictive ability for the genotoxicity of quinoline derivatives was desirable. The prediction results provided by the multiple classifier system showed that the new system contributed significantly to improving the model’s performance. The multiple classifier system had an overall classification rate of 92.2%. It correctly predicted 106 of 115 quinoline derivatives. For each class, the model was able to predict the true class identity very

Table 8. Confusion Matrix for the Quinoline Derivatives Using the Multiple Classifier System whole data set (115 quinoline derivatives) predicted class actual class

nongenotoxic

genotoxic



well. Thirty-nine of the 44 genotoxic compounds (88.6%) were correctly predicted (Table 8). Sixty-seven of 71 nongenotoxic compounds were predicted correctly (94.4%). The results achieved by the multiple classifier system clearly indicated that the combined classifier had superior predictability over the individual classifiers. It not only improved the overall classification rate but, most importantly, also significantly increased the ability of the model to predict the true class identity of genotoxic quinoline derivatives. True-positive prediction is highly desirable in the pharmaceutical industry. Given the high predictive ability of the multiple classifier system, it is of interest to know whether the system could provide more information about the determinants of the resultant genotoxicity of quinoline derivatives. The descriptors composing the nine individual PNN classification models were gathered and studied. These descriptors are listed in Table 9. This descriptor pool contained 73 descriptors. Their appearance frequency was calculated and categorized based on the information that they carry (Figure 7). The topological category contained 18 descriptors. The second largest category contained 10 hybrid descriptors. The last three categories contained six HPSA descriptors, six geometric descriptors, and three electronic descriptors. The descriptor meanings are shown in Table 9. These descriptors mainly encode information on the size, shape, electronic properties, and molecular features that are important for intermolecular hydrogen bonding for quinoline derivatives. They also characterize the ability of quinoline derivatives to engage in polar interactions and their hydrophobicity and hydrophilicity. For some descriptors, the averaged values for the genotoxic and nongenotoxic quinoline derivatives differed significantly. 
The descriptors listed in Table 10 are the ones that show large differences between the genotoxic and the nongenotoxic quinoline derivatives. N6C-18, NDB-13, NRA-18, ESUM-2, PND-2, PND-5, and PND-6 are all topological descriptors. Only one geometric and one hybrid descriptor are included: MOMI-1 and RNCS-1. The last three descriptors, PPHS-1, DPHS-2, and WPHS-1, are HPSA descriptors. N6C-18 calculates the number of sixth-order clusters in the molecule. NDB-13 and NRA-18 calculate the number of double bonds and ring atoms in a molecule. They encode information about the size and the degree of branching in the quinoline derivatives. ESUM-2 (60) gives the sum of electrotopological-state values of all heteroatoms. PND-2, PND-5, and PND-6 (58) are superpendentic indices with only pendent carbons, oxygens, and halogens considered, respectively. The genotoxic quinoline derivatives possessed higher values for these descriptors, indicating that compounds with a higher value of the index tend to be more genotoxic. MOMI-1 (59) calculates the X-principal component of moment of inertia. It is also a measure of the overall size of each molecule. RNCS-1 (42) is a hybrid descriptor; it combines the geometric and electronic properties of a molecule. It encodes information about the relative negatively charged surface area. PPHS-1 (44) calculates the partial hydrophobic surface area. DPHS-2 (44) calculates the difference between

Table 9. Descriptors Appeared in the Nine Individual Models for the Quinoline Derivatives

descriptor name   descriptor meaning
V6C               sixth-order valence cluster
S6C-11            sixth-order simple cluster
N6C-18            no. of sixth-order clusters
MOLC-7            path-3 cluster molecular connectivity
NDB-13            count of double bonds
NRA-18            count of ring atoms
WTPT-3            sum of path lengths starting from heteroatoms
WTPT-4            sum of path lengths starting from oxygens
WTPT-5            sum of path lengths starting from nitrogens
3SP2-1            count of doubly bound carbon bound to three other carbons
1SP3-1            count of singly bound carbon bound to one other carbon
2SP3-1            singly bound carbon bound to two other carbons
SYMM-15           topological symmetry index
EAVE-2            average E-state values over all heteroatoms
ESUM-2            sum of E-state values over all heteroatoms
PND-2             superpendentic index-only pendent carbons considered
PND-5             superpendentic index-only pendent oxygens considered
PND-6             superpendentic index over all pendant halogen atoms
MOMI-1            X-principal component of moment of inertia
MOMI-2            Y-principal component of moment of inertia
MOMI-4            X-principal component of moment of inertia/Y-principal component of moment of inertia
MOMI-5            Y-principal component of moment of inertia/Z-principal component of moment of inertia
SHDW-5            standardized shadow area projected onto the XZ plane
SHDW-6            standardized shadow area projected onto the YZ plane
DIPO              electric dipole moment
LUMO              energy of the lowest unoccupied molecular orbital
HARD              absolute hardness, 1/2(HOMO - LUMO)
PPSA-3            charge weighted partial positive surface area
FPSA-3            fractional positively charged partial surface area
FNSA-1            (partial negative surface area - sum of surface area on negative parts of molecule)/total molecular surface area
WNSA-3            (charge weighted partial negative surface area × total molecular surface area)/1000
RNCG-1            (relative negative charge - most negative charge)/total negative charge
RNCS-1            relative negatively charged surface area
SCDH-2            sum of (surface area × charges) on donatable hydrogen/no. of donatable hydrogens
SAAA-2            sum of surface area on acceptor atoms/number of acceptor atoms
CHAA-1            sum of charges on acceptor atoms
CHAA-2            sum of charges on acceptor atoms/number of acceptor atoms
PPHS-1            partial hydrophobic surface area
DPHS-2            atomic logP constant weighted hydrophobic surface area - atomic logP constant weighted hydrophilic surface area
FPHS-2            atomic logP constant weighted hydrophobic surface area/total molecular surface area
WPHS-1            partial hydrophobic surface area × total molecular surface area/1000
RPH-1             most positive logP constant/sum total hydrophilicity
RNH-1             most negative logP constant/sum total hydrophilicity
RNHS-1            max SAMNEG × most negative logP constant/sum total hydrophilicity

Table 10. Descriptors that Show Big Value Differences between the Genotoxic and the Nongenotoxic Classes for the Quinoline Derivatives

descriptor   meaning
N6C-18       no. of sixth-order clusters
NDB-13       count of double bonds
NRA-18       count of ring atoms
RNCS-1       relative negatively charged surface area
ESUM-2       sum of E-state values over all heteroatoms
PND-2        superpendentic index-only pendent carbons considered
PND-5        superpendentic index-only pendent oxygens considered
PND-6        superpendentic index over all pendant halogen atoms
MOMI-1       X-principal component of moment of inertia
RNCS-1       relative negatively charged surface area
PPHS-1       partial hydrophobic surface area
DPHS-2       atomic logP constant weighted hydrophobic surface area - atomic logP constant weighted hydrophilic surface area
WPHS-1       partial hydrophobic surface area × total molecular surface area/1000

Table 11. Quinolone Derivatives’ Classification and Prediction Rates of the Nine Individual Monte Carlo Models Developed to Test for Chance Correlations

model no.   1      2      3      4      5      6      7      8      9      AVE     STD
TSET (%)    46.4   54.4   61.4   39.3   63.2   57.9   76.8   57.9   70.2   58.6    11.4
PSET (%)    62.1   67.9   71.4   75.9   67.9   67.9   68.96  67.9   64.3   68.25   3.92

Table 12. Confusion Matrix for the Quinolone Derivatives of the Monte Carlo Multiple Classifier System whole data set (85 quinolone derivatives) predicted class actual class

nongenotoxic

genotoxic



atomic logP constant weighted hydrophobic surface area and hydrophilic surface area. WPHS-1 (44) calculates surface-weighted partial hydrophobic surface area. All three of these descriptors encode information about the hydrophobicity/hydrophilicity properties of quinoline derivatives. Surprisingly, unlike the situation with quinolone derivatives, in which the genotoxic compounds had higher average values for these descriptors, the nongenotoxic quinoline derivatives possessed higher values for these descriptors. A similar situation was seen with descriptors PND-2 and PND-5 as well.

Randomization Experiments. To ensure that chance correlation did not play a role in model construction, scrambling experiments were performed with the class identities randomly scrambled 10 times. Each compound was assigned a new class label that differed from its true class. The model building procedure was conducted in the same way as in the real experiment, except that each compound had a noncorresponding class label in these experiments. The experiments were performed with all nine data set distributions; thus, the class labels were randomly scrambled 10 times for all of the individual models. The assumption underlying the chance correlation test was the following: if the observed classification rates were not caused by chance correlations, then random scrambling of the class labels should yield models with low classification rates.
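The label-scrambling step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the scrambling is implemented as a seeded pseudorandom permutation of the class labels, which preserves the class counts but, unlike the description above, does not guarantee that every compound receives a label different from its true class. The function names and seeds are hypothetical, and the model-building step that would follow each scrambling is omitted.

```python
import random


def scramble_labels(labels, seed):
    """Return a pseudorandom permutation of the class labels (class counts preserved)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def n_changed(true_labels, scrambled):
    """Number of compounds whose scrambled label differs from the true label."""
    return sum(1 for t, s in zip(true_labels, scrambled) if t != s)


# 85 quinolone derivatives: 62 nongenotoxic (class 1) and 23 genotoxic (class 2).
labels = [1] * 62 + [2] * 23

# Ten scrambled label sets, one per repetition of the randomization experiment.
scrambled_sets = [scramble_labels(labels, seed) for seed in range(10)]
changed_counts = [n_changed(labels, s) for s in scrambled_sets]
```

Each scrambled label set would then be passed through the same descriptor selection and PNN model building as the real labels, and the resulting classification and prediction rates compared with those of the real models.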



Table 13. Quinoline Derivatives’ Classification and Prediction Rates of the Nine Individual Monte Carlo Models Developed to Test for Chance Correlations

model no.   1      2     3     4      5      6     7     8     9     AVE    STD
TSET (%)    37.66  61.0  38.2  30.3   42.9   61.0  36.8  54.5  59.7  46.9   12.1
PSET (%)    39.5   55.3  56.4  58.97  47.40  60.5  66.7  52.6  57.9  55.03  7.88

Table 14. Confusion Matrix for the Quinoline Derivatives of the Monte Carlo Multiple Classifier System

Whole data set (115 quinoline derivatives). Classes (actual and predicted): nongenotoxic, genotoxic.



Quinolone Derivatives. The classification rates (58.6 ± 11.4%) and prediction rates (68.25 ± 3.92%) of the nine random models are shown in Table 11 with their averages and standard deviations. The predictive performance of the multiple classifier system built on the nine random models is shown in Table 12. The overall classification rate of the random multiple classifier system was 64.7%, with an accuracy of 21.7% for the genotoxic class and 80.6% for the nongenotoxic class.

Quinoline Derivatives. The nine random individual PNN models had an average classification rate of 46.9 ± 12.1% and an average prediction rate of 55.03 ± 7.88%. The rates for each individual random model can be found in Table 13. Table 14 presents the confusion matrix of the random multiple classifier system. The overall classification rate of the system was 47.8%; it was able to predict only 15.9% of the genotoxic quinoline derivatives correctly. The nongenotoxic class classification was better, at an accuracy of 67.4%.

Overall, the extremely low overall prediction rates and the high genotoxic class misclassification rates given by the random multiple classifier systems for both the quinolone and quinoline derivatives indicate that chance correlation was unlikely to have influenced the classification rates during model development in the real experiment.
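The overall and class-wise rates quoted for a multiple classifier system follow directly from its confusion matrix. The Python sketch below shows the arithmetic. The cell counts are hypothetical: they were chosen only so that the computed rates agree with the quinolone figures quoted above (64.7%, 21.7%, and 80.6% for 85 compounds) and are not taken from Table 12.

```python
def class_rates(matrix):
    """matrix[actual][predicted] -> (overall rate, {class: rate}), in percent."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    per_class = {
        c: 100.0 * matrix[c][c] / sum(matrix[c].values()) for c in matrix
    }
    return 100.0 * correct / total, per_class


# Hypothetical counts (assumption), keyed as matrix[actual class][predicted class].
hypothetical = {
    "nongenotoxic": {"nongenotoxic": 50, "genotoxic": 12},
    "genotoxic": {"nongenotoxic": 18, "genotoxic": 5},
}

overall, per_class = class_rates(hypothetical)
print(f"overall: {overall:.1f}%")
for name, rate in per_class.items():
    print(f"{name}: {rate:.1f}%")
```

Reporting the class-wise rates alongside the overall rate matters here because the two classes are unequal in size, so a system that favors the larger class can still post a moderate overall rate.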

Conclusions

In this paper, we describe a process for predicting the genotoxicity of quinolone and quinoline derivatives by developing individual PNN classification models and then building multiple classifier systems. For each series of derivatives, the nine best-performing individual PNN classification models were chosen from all of the individual models developed with nine different data set distributions to link the derivatives’ genotoxicity with their molecular structure. The nine best individual PNN models selected for the quinolone derivatives provided an average classification rate of 89.2% and an average prediction rate of 65.2%. For the quinoline derivatives, the nine best models had an average classification rate of 90.7% and an average prediction rate of 61.4%. The results demonstrated that the genotoxicity of these derivatives could be reliably predicted; especially high accuracy was seen with nongenotoxic class predictions. The nine best individual PNN models also served as base components in constructing the multiple classifier system, which effectively alleviated the problem of low genotoxic class prediction accuracy associated with the individual models. The multiple classifier systems developed for the quinolone and quinoline derivatives had overall classification rates of 89.4 and 92.2%, respectively. The multiple classifier system not only produced a higher classification rate than the individual models but, most importantly, also gave higher prediction rates on genotoxic compounds for each series of derivatives (69.6 and 88.6%, respectively). The comparison of the results given by the individual PNN models and the multiple classifier system demonstrated that adopting the multiple classifier system approach of bagging could significantly improve the classification performance in this application. The information carried by the descriptors that appeared in the nine best-performing individual models supplied possible explanations for the linkage between the molecular structure and the resultant genotoxicity. The size and shape of the quinolone and quinoline derivatives and their hydrophobic/hydrophilic character were found to be the main determinants of the resultant genotoxicity. In conclusion, the PNN multiple classifier system developed in this study could be a beneficial contributor toward predictive toxicology in the design of less carcinogenic bioactive compounds.
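How nine base models can be combined into one multiple classifier system is sketched below in Python. The combination rule is not restated in this section, so simple majority voting is assumed here; the function names and the per-model predictions are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by most base classifiers."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine predictions; one list of labels per base model, aligned by compound."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of nine base models for three compounds
# ("G" = genotoxic, "N" = nongenotoxic). With nine voters and two classes,
# a tie cannot occur.
base_predictions = [
    ["G", "N", "N"],
    ["G", "N", "G"],
    ["N", "N", "G"],
    ["G", "N", "G"],
    ["G", "G", "N"],
    ["N", "N", "G"],
    ["G", "N", "N"],
    ["G", "N", "G"],
    ["N", "N", "N"],
]

ensemble = ensemble_predict(base_predictions)
print(ensemble)
```

In this toy case the third compound is called genotoxic by only five of the nine models, which shows how a vote can recover a genotoxic call that several individual models miss.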

Acknowledgment. This project was supported by Bristol-Myers Squibb.

References

(1) Lohray, B. B., Baskaran, S., Rao, B. S., Mallesham, B., Bharath, K. S. N., Yadi Reddy, B., Venkateswarlu, S., Ashok Sadhukhan, K., Sitaram Kumar, M., and Sarnaik, H. M. (1998) Novel quinolone derivatives as potent antibacterials. Bioorg. Med. Chem. Lett. 8 (5), 525-528.
(2) Cohen, M. A., Huband, M. D., Mailloux, G. B., Yoder, S. L., Roland, G. E., and Heife, C. L. (1991) In vitro antibacterial activities of the fluoroquinolones PD 117596, PD 124816, and PD 127391. Diagn. Microbiol. Infect. Dis. 14 (3), 245-258.
(3) Jefferson, E. A., Swayze, E. E., Osgood, S. A., Miyaji, A., Risen, L. M., and Blyn, L. B. (2003) Antibacterial activity of quinolone-macrocycle conjugates. Bioorg. Med. Chem. Lett. 13 (10), 1635-1638.
(4) Appelbaum, P. C., and Hunter, P. A. (2000) The fluoroquinolone antibacterials: past, present and future perspectives. Int. J. Antimicrob. Agents 16 (1), 5-15.
(5) Bush, K., and Macielag, M. (2000) New approaches in the treatment of bacterial infections. Curr. Opin. Chem. Biol. 4 (4), 433-439.
(6) Mäkinen, M., Forbes, P. D., and Stenbäck, F. (1997) Quinolone antibacterials: A new class of photochemical carcinogens. J. Photochem. Photobiol. B 37 (3), 182-187.
(7) Lescher, G. Y., Froelich, E. D., Gruet, M. D., Bailey, J. H., and Brundage, R. P. (1962) 1,8-Naphthyridine derivatives. A new class of chemotherapeutic agents. J. Med. Pharm. Chem. 5, 1063-1068.
(8) Aylward, J. M. (1993) Hydroxychloroquine and chloroquine: Assessing the risk of retinal toxicity. J. Am. Optom. Assoc. 64, 787-797.
(9) Bernstein, H. N., Zvaifler, N., Rubin, M., and Mansour, A. M. (1963) Ocular deposition of chloroquine. Invest. Ophthalmol. Vision Sci. 2 (4), 384-392.
(10) Bataille, L., Rahier, J., and Geubel, A. (2002) Delayed and prolonged cholestatic hepatitis with ductopenia after long-term ciprofloxacin therapy for Crohn’s disease. J. Hepatol. 37 (5), 696-699.
(11) U.S. Environmental Protection Agency (U.S. EPA) (1985) Health and Environmental Effects Profile for Quinoline, Environmental Criteria and Assessment Office, EPA, Cincinnati, OH.



(12) LaVoie, E. J., Defauw, J., and Fearly, M., et al. (1991) Genotoxicity of fluoroquinolines and methylquinolines. Carcinogenesis 12, 217-220.
(13) Siporin, C. (1989) The evolution of fluorinated quinolones: Pharmacology, microbiological activity, clinical uses, and toxicities. Annu. Rev. Microbiol. 43, 601-627.
(14) Takahashi, K., Kamiya, M., and Sengoku, Y., et al. (1988) Deprivation of the mutagenicity property of quinoline: Inhibition of mutagenic metabolism by fluorine substitution. Chem. Pharm. Bull. 36 (11), 4630-4633.
(15) Saeki, K., Takahashi, K., and Kawazoe, Y. (1993) Metabolism of mutagenicity deprived 3-fluoroquinoline: Comparison with mutagenic quinoline. Biol. Pharm. Bull. 16 (3), 232-234.
(16) Jurs, P. C., Chou, J. T., and Yuan, M. (1979) Computer-Assisted Drug Design (Olson, E. C., and Christofferson, R. E., Eds.) American Chemical Society, Washington, DC.
(17) Stuper, A. J., Brugger, W. E., and Jurs, P. C. (1979) Computer-Assisted Studies of Chemical Structure and Biological Function, Wiley, New York.
(18) He, L., Jurs, P. C., Custer, L. L., Durham, S. K., and Pearl, G. M. (2003) Predicting the genotoxicity of polycyclic aromatic compounds from molecular structure with different classifiers. Chem. Res. Toxicol. 16 (12), 1567-1580.
(19) Wessel, M. D., and Jurs, P. C. (1995) Prediction of normal boiling points for a diverse set of industrially important organic compounds from molecular structure. J. Chem. Inf. Comput. Sci. 15, 841-850.
(20) Mitchell, B. E., and Jurs, P. C. (1998) Prediction of aqueous solubility of organic compounds from molecular structure. J. Chem. Inf. Comput. Sci. 38, 489-496.
(21) Mattioni, B. E., and Jurs, P. C. (2000) Development of quantitative structure-activity relationship, classification models for a set of carbonic anhydrase inhibitors. J. Chem. Inf. Comput. Sci. 42 (1), 94-102.
(22) Patankar, S. J., and Jurs, P. C. (2002) Prediction of glycine/NMDA receptor antagonist inhibition from molecular structure. J. Chem. Inf. Comput. Sci. 42 (5), 1053-1068.
(23) Quillardet, P., and Hofnung, M. (1993) The SOS chromotest: A review. Mutat. Res. 297 (3), 235-279.
(24) Hofnung, M., and Quillardet, P. (1998) The SOS chromotest, a colorimetric assay based on the primary cellular responses to genotoxic agents. Ann. N. Y. Acad. Sci. 534, 817-825.
(25) Gudas, L. J., and Pardee, A. B. (1976) DNA synthesis inhibition, the induction of protein X in Escherichia coli. J. Mol. Biol. 101, 459-477.
(26) Engle, L. J., Manes, S. H., and Drlica, K. (1982) Differential effects of antibiotics inhibiting gyrase. J. Bacteriol. 149, 92-98.
(27) Benbrook, D. M., and Miller, R. V. (1986) Effects of norfloxacin on DNA metabolism in Pseudomonas aeruginosa. Agents Chemother. 29, 1-6.
(28) Stewart, J. P. P. (1990) MOPAC: A semiempirical molecular orbital program. J. Comput.-Aided Mol. Des. 4, 1-105.
(29) Stewart, J. P. P. MOPAC 6.0, Quantum Chemistry Program Exchange, Program 455, Indiana University, Bloomington.
(30) Dewar, M. J. S., Zoebisch, E. G., Healy, E. F., and Stewart, J. P. P. (1985) AM1: A new general purpose quantum mechanical molecular model. J. Am. Chem. Soc. 107, 3902-3909.
(31) Kier, L. B., and Hall, L. H. (1986) Molecular Connectivity in Structure-Activity Analysis, Research Studies Press Ltd., John Wiley & Sons, New York.
(32) Balaban, A. T. (1982) Highly discriminating distance-based topological index. Chem. Phys. Lett. 89, 399-404.
(33) Madan, A. K., Gupta, S., and Singh, M. (1999) Superpendentic index: A novel highly discriminating topological descriptor for predicting biological activity. J. Chem. Inf. Comput. Sci. 39, 9.
(34) Kier, L. B., and Hall, L. H. (2000) Intermolecular accessibility: The meaning of molecular connectivity. J. Chem. Inf. Comput. Sci. 40, 792-795.
(35) Rohrbaugh, R. H., and Jurs, P. C. (1987) Molecular shape, the prediction of high-performance liquid chromatographic retention indexes of polycyclic aromatic hydrocarbons. Anal. Chem. 59, 1048-1054.

(36) Goldstein, H. (1950) Classical Mechanics, pp 144-156, Addison-Wesley, Reading, MA.
(37) Pearlman, R. S. (1980) Molecular Surface Area and Volumes and Their Use in Structure/Activity Relationships (Yalkowsky, S. H., Sinkula, A. A., and Valvani, S. C., Eds.) Marcel Dekker, New York.
(38) Stouch, T. R., and Jurs, P. C. (1986) A simple method for the representation, quantification, and comparison of the volumes and shapes of chemical compounds. J. Chem. Inf. Comput. Sci. 26, 4-12.
(39) Katritzky, A. R., Mu, L., Lobanov, V. S., and Karelson, M. (1996) Correlation of boiling points with molecular structure. 1. A training set of 298 diverse organics and a test set of 9 simple organics. J. Phys. Chem. 100, 10400-10407.
(40) Dixon, S. L., and Jurs, P. C. (1992) Atomic charge calculations for quantitative structure-property relationships. J. Comput. Chem. 18, 492-504.
(41) Abraham, R. J., and Smith, P. E. (1987) Charge calculations in molecular mechanics IV: A general method for conjugated systems. J. Comput. Chem. 13, 288-297.
(42) Stanton, D. T., and Jurs, P. C. (1990) Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure-property relationship studies. Anal. Chem. 62, 2323-2329.
(43) Vinogradov, S. N., and Linnell, R. H. (1971) Hydrogen Bonding, Van Nostrand Reinhold, New York.
(44) Brian Mattioni’s Thesis. (2003) Pennsylvania State University.
(45) Russell, C. J., Dixon, S. L., and Jurs, P. C. (1992) Computer assisted study of the relationship between molecular structure, Henry’s law constant. Anal. Chem. 64, 1350.
(46) Sutter, J. M., Dixon, S. L., and Jurs, P. C. (1995) Automated descriptor selection for quantitative structure-activity relationships using generalized simulated annealing. J. Chem. Inf. Comput. Sci. 35, 77-84.
(47) Luke, B. T. (1994) Evolutionary programming applied to the development of quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 34, 1279-1287.
(48) Specht, D. F. (1990) Probabilistic neural networks. Neural Networks 3, 109-118.
(49) Wasserman, P. D. (1993) Advanced Methods in Neural Networks, Chapter 3, pp 35-55, Van Nostrand Reinhold, New York.
(50) Parzen, E. (1962) On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.
(51) Kittle, Roli, J. F., Eds. (2002) Third International Workshop on Multiple Classifier Systems, Cagliari, Italy, Springer, New York.
(52) Dietterich, T. (2000) Ensemble methods in machine learning. Proc. First Int. Workshop Multiple Classifier Systems, 1-15.
(53) Breiman, L. (1996) Bagging predictors. Machine Learning 24 (2), 123-140.
(54) Breiman, L. (1999) Pasting small votes for classification in large database and on-line. Machine Learning 36 (1-2), 85-103.
(55) Dietterich, T. (2000) An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40 (2), 139-158.
(56) Breiman, L. (2001) Random forests. Machine Learning 45 (1), 5-32.
(57) Sharma, V., Goswami, R., and Madan, A. K. (1997) Eccentric connectivity index: A novel highly discriminating topological descriptor for structure-property and structure-activity studies. J. Chem. Inf. Comput. Sci. 37, 273-282.
(58) Gupta, S., Singh, M., and Madan, A. K. (1999) Superpendentic index: A novel topological descriptor for predicting biological activity. J. Chem. Inf. Comput. Sci. 39, 272-277.
(59) Goldstein, H. (1950) Classical Mechanics, Addison-Wesley, Reading, MA.
(60) Kier, L. B., and Hall, L. H. (1990) An electrotopological-state index for atoms in molecules. Pharm. Res. 7, 801-807.

TX049742M
