Chem. Res. Toxicol. 1993, 6, 748-753


Applications of Computers to Toxicological Research

Shaomeng Wang and G. W. A. Milne*

Laboratory of Medicinal Chemistry, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892

Received June 30, 1993

Introduction

Toxicology is one of the older and better-established biological sciences, and it has perhaps been more insulated than other subjects from the computer revolution, which is now in its third decade. Disciplines such as analytical chemistry have been changed beyond recognition by computers, but toxicology, with its necessary dependence upon biological systems, has proved more difficult to manage in a digital environment. Nevertheless, in recent years, unforeseen events have combined to attract chemists and computer scientists into exploring the use of computers in toxicology. This has led to some interesting developments in computer technology.

One problem faced by the field of toxicology is that the toxicological requirements for approval of new drugs are stringent and extensive, with the result that the toxicological workup represents a substantial portion of the overall cost of new drug development. A second, quite different problem is that social and political pressures, particularly in Europe, have greatly increased the procedural difficulties for toxicologists who must work with live mammals. Computer science has attempted, with mixed success, to help in these areas by the use of improved statistical methods and estimation techniques, all designed to curtail the use of live animals. These computer techniques, the enthusiasm of animal rights groups notwithstanding, have so far proved unable to supplant the use of animals in determining, for example, acute toxicity. The research which has been reported, however, has led to various "spinoffs" which are of interest and possibly of practical value. The accuracy and reliability of the toxicity data predicted by computers are still regarded with circumspection, but the ability of the computer to predict the toxicity of large numbers of compounds, and so prioritize toxicology experiments, is well recognized and widely used as a cost-saving measure.

Likewise, the ability of computers to predict biological activities, albeit with little or no precision, has had a substantial impact upon large-scale screening programs which, for the same level of screening (and cost), can thus increase their yield of positives. Some recently published computer programs which have had this type of impact upon toxicity prediction are described in the subsequent sections of this paper. What follows is not intended as an exhaustive review. Many groups have published useful work in this area; we have deliberately selected a small number of papers in an effort to illustrate the general approaches in use.

* Corresponding author.

Computer Retrieval of Toxicological Data

A vast amount of toxicological data has been measured and recorded during the last 100 years or so, and in the U.S. there are two major publicly funded efforts to stay abreast of these data. The National Institute for Occupational Safety and Health (NIOSH),¹ an agency of the U.S. Department of Health and Human Services, has reviewed the international scientific literature for data on toxicities since 1970.² The Registry of Toxic Effects of Chemical Substances (RTECS) results from this review and at the beginning of 1993 contained 106 000 toxicity measurements plus a further 30 000 measurements of mutagenicity, carcinogenicity, teratogenicity, etc., on 119 000 chemicals. This database was first made available by remote dial-up in 1978 (1) and has since been installed on most of the major commercial time-sharing systems.³ Searching the database is straightforward; chemical substances can be retrieved by name or CAS Registry Number on all hosts and, in the Chemical Information System (1) and STN,⁴ by chemical structure or substructure. In RTECS, sources in the form of a literature citation are given for all data, and this offers a very effective means of locating and retrieving toxicological data.

The second government effort to manage published toxicological data is the maintenance by the National Library of Medicine of TOXLINE,⁵ a database of abstracts of papers that deal with different aspects of toxicology. This database currently contains 1 121 479 records from the literature since 1980, and a further 707 236 records from the literature between 1965 and 1980 are also online.

The basic literature of toxicology is covered adequately by these databases, and their use is routine. Some of the newer databases related to toxicology are interesting, although they are less widely used. A database on residues of heavy metals and xenobiotic organic chemicals in plants (2) contains some 37 000 records, including 900 organic chemicals and 21 heavy metals found in over 350 plant species. These data, derived from papers published since 1926, comprise a resource which could be useful in the toxicology context. In a slightly different area, two groups have built databases of metabolic processes.
The first of these, produced at the Sandoz Research Institute (3), treats metabolic pathways as a series of related reactions and manages a prototype database of 400 substances, together

¹ Abbreviations: ADAPT, Automated Data Analysis by Pattern Recognition Techniques; CASE, Computer Automated Structure Evaluation; COMPACT, Computer-Optimized Molecular Parametric Analysis of Chemical Toxicity; FDA, Food and Drug Administration; HOMO, highest occupied molecular orbital; LUMO, lowest unoccupied molecular orbital; MOLSTAC, Molecular Structure Analysis and Classification; NCI/NTP, National Cancer Institute-National Toxicology Program; NDA, New Drug Application; NIOSH, National Institute for Occupational Safety and Health; PAH, polyaromatic hydrocarbon(s); QSAR, quantitative structure-activity relationship(s); QSPR, quantitative structure-property relationship(s); RTECS, Registry of Toxic Effects of Chemical Substances; SMILES, Simplified Molecular Input Line Entry System; TOPKAT, Toxicity Prediction by Komputer-Assisted Technology.
² This activity is required by the Occupational Safety and Health Act of 1970 (P.L. 91-596).
³ Dialog Information Services Inc., 3460 Hillview Ave., Palo Alto, CA 94304. Orbit Search Service, InfoPro Technologies, 8000 Westpark Dr., McLean, VA 22102. National Library of Medicine, Bethesda, MD 20894.
⁴ The Scientific and Technical Network (STN) was developed by Chemical Abstracts Service of the American Chemical Society, 2640 Olentangy River Rd., Columbus, OH 43210.
⁵ TOXLINE is maintained and disseminated by the National Library of Medicine, Bethesda, MD 20894.

This article not subject to U.S. Copyright. Published 1993 by the American Chemical Society

Chem. Res. Toxicol., Vol. 6, No.6,1993 749

Forum: Frontiers in Molecular Toxicology

with their known metabolic and catabolic transformations. The valuable feature of this database is that it contains not only chemical compounds but also their biochemical context. The other database, developed at Kansas State University (4), is somewhat similar; it too is a prototype and uses a commercial database management system, allowing examination of metabolic pathways. Both of these programs are experimental, but suggestive perhaps of the logical successor to the now excessively large and complex wall charts that the Boehringer Co. has been painstakingly producing (5) for the last 20 years or so.

Computer Transmission of Data: Computerized NDAs

The New Drug Application (NDA) that the Food and Drug Administration (FDA) requires of pharmaceutical companies seeking to market new drugs can extend to 100 000 pages of data, much of it toxicological. Most of these data are handled by computers both at the company and at the FDA, but transmission of data between them still uses hard copy. The use of computer transmission of data has been identified (6) as an FDA goal, and the area has seen much activity in the 5 years since the FDA notice. This is a challenging problem, in which it is important to focus on the goal of increased efficiency, as opposed to simply achieving a computerized system. Many developed countries have the same need in this area, and serious efforts are under way in, for example, Canada and Germany, as well as the U.S., to use computers to aid in the data submission process (7). The problems include management of graphics and of handwritten or machine-collected data, development of software standards, and the legal validity of information in different forms. Any one of these can be formidable, but, given the economic rewards that would follow from, for example, a reduction in the review time, continued development of these systems is inevitable.

Quantitative Structure-Activity Relationship Studies

The related fields of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) studies have been actively exploited during the last 20 years because they promise shortcuts around hugely expensive tasks involving direct laboratory measurement. The QSAR method, in particular, was embraced with enthusiasm by the pharmaceutical industry during the 1980s, when a significant effort was made to identify mathematical relationships between chemical structure and biological properties, including toxicity. The goal of a QSAR study is to use a set of chemicals whose biological activity is known in an attempt to establish a mathematical relationship between the activity and the structure. This is typically done by postulating an equation which relates the structure, or surrogates for it, to the activity and then fitting the data to the equation by regression in order to determine the unknown coefficients in the equation. Numerous programs in this genre have been developed in the last decade or so and include Toxicity Prediction by Komputer-Assisted Technology (TOPKAT) by Enslein and co-workers (8, 23-34), the Computer Automated Structure Evaluation program (CASE) developed by Klopman and Rosenkranz (9-14), and the Automated Data Analysis by Pattern Recognition Techniques (ADAPT) program by Jurs et al. (15-21).
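The regression step described above can be illustrated with a minimal sketch. It assumes the simplest possible postulated equation, activity = slope x descriptor + intercept, with a single descriptor; the descriptor values (`log_p`), the activities, and the function names are all invented for illustration and do not come from any of the programs or data sets cited here.

```python
# Hypothetical learning set: one descriptor (log P) and one measured
# activity per compound. All numbers are invented for illustration.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]
activity = [1.1, 1.9, 3.2, 3.8, 5.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(log_p, activity)
fit_quality = r_squared(log_p, activity, slope, intercept)

# Estimate the activity of a compound that was not in the learning set.
estimate = slope * 1.8 + intercept
```

Real QSAR models use many descriptors and multiple regression or discriminant analysis, but the principle is the same: the coefficients are the unknowns, and the learning set determines them.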

[Figure 1 flowchart, Steps 1-8. Recoverable box labels: Simple Statistics; Stepwise Regression or Discriminant Analysis; Regression on All Possible Subsets; Identification and Removal of Influential Observations and Outliers; Final Stepwise Regression or Discriminant Analysis; Final Validation.]

Figure 1. Schematic of the SAR model development of TOPKAT. The first step is assembly of the database, i.e., entry of structures and data. The second step involves generation of the parameters necessary for model development. The core of the program consists of steps 3-7, in which statistical techniques are used to find a correlation between the toxicity and some or all of the parameters defined in step 2. The final step (no. 8) is to evaluate the predictive power of the model.

(a) TOPKAT. TOPKAT utilizes a "learning database", which contains the chemical structures and the toxicity of the compounds, to establish a QSAR model correlating the chemical structure fragments and structural parameters with the toxicity. Its model development involves 8 steps, as shown in Figure 1. Chemical structures for compounds with known properties are entered into the program in the Simplified Molecular Input Line Entry System (SMILES) notation (22), in which structures are represented by a simple linear string of characters, such as C1CC1C(=O)O for cyclopropanecarboxylic acid or c1ccccc1C(=O)O for benzoic acid. After assembly of this learning database, which comprises the chemical structures and the corresponding biological activities, descriptors are generated on the basis of the chemical structures of the compounds. In TOPKAT, a "MOLSTAC" key is used to produce substructural descriptors, and other descriptors, such as molecular charge descriptors, molecular connectivity indices, and molecular shape indices, are also generated and included in the database. These descriptors are used as parameters in the subsequent analysis. Stepwise regression and discriminant analysis are employed to find the relationships between the biological activity of interest and the generated substructural and other descriptors. An effort is made to identify unduly influential observations and outliers, which are then discarded from the database.
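To make the SMILES notation mentioned above more concrete, the following is a minimal sketch of how such a string can be read. It is not part of TOPKAT and covers only a small assumed subset of the notation (unbracketed atoms, single-digit ring closures, bond symbols, and parentheses for branches); the function name `summarize_smiles` is hypothetical.

```python
TWO_LETTER_ATOMS = ("Cl", "Br")
ONE_LETTER_ATOMS = set("BCNOPSFI")
AROMATIC_ATOMS = set("bcnops")
BOND_SYMBOLS = set("-=#:/\\")


def summarize_smiles(smiles):
    """Count heavy atoms and closed rings in a simple SMILES string.

    Raises ValueError on unsupported characters or on unbalanced
    ring-closure digits or branch parentheses.
    """
    heavy_atoms = 0
    rings = 0
    open_rings = set()   # ring-closure digits seen once so far
    depth = 0            # current branch nesting level
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_LETTER_ATOMS:
            heavy_atoms += 1
            i += 2
            continue
        ch = smiles[i]
        if ch in ONE_LETTER_ATOMS or ch in AROMATIC_ATOMS:
            heavy_atoms += 1
        elif ch.isdigit():
            if ch in open_rings:      # second occurrence closes the ring
                open_rings.remove(ch)
                rings += 1
            else:                     # first occurrence opens it
                open_rings.add(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced branch")
        elif ch not in BOND_SYMBOLS:
            raise ValueError(f"unsupported character {ch!r}")
        i += 1
    if open_rings or depth != 0:
        raise ValueError("unbalanced ring closure or branch")
    return {"heavy_atoms": heavy_atoms, "rings": rings}


cyclopropanecarboxylic_acid = summarize_smiles("C1CC1C(=O)O")
benzoic_acid = summarize_smiles("c1ccccc1C(=O)O")
```

The paired digits mark the two atoms that close a ring, and the parentheses enclose a branch, which is how a cyclic, branched structure fits into one line of text.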
Stepwise regression or discriminant analysis is performed to establish the final QSAR model. The last step, which is very important, is to validate the QSAR model by predicting the biological activity for standard compounds that were not in the original database and comparing the calculated biological activity in these cases with the experimental values from bioassays. Once the QSAR model has been established, it is used to assist in property prediction for new compounds, i.e., compounds that are not in the database. Descriptors for each new structure are generated and used in conjunction with the database to assemble a picture of the properties of the new molecule. TOPKAT has been used to study the structure-activity relationships for a variety of toxicities, including carcinogenicity (23-25), mutagenicity (Ames test) (24, 26, 28), teratogenicity (29), skin irritation (Draize test) (30), eye irritancy (Draize test) (18, 31), rat oral LD50 (24, 32),


Table II. Prediction of Carcinogenicity by TOPKATᵃ

                            predicted
actual             negative   positive   indeterminate
noncarcinogenic       44          1            3
carcinogenic           3         35            1
total                 47         36            4

ᵃ Indeterminates: 4/87 = 4.6%. False positives: 1/87 = 1.1%. False negatives: 3/87 = 3.4%. Overall accuracy: 79/87 = 90.8%. Adjusted accuracy: 79/83 = 95.2% (excludes indeterminates).
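The summary figures in the footnote to Table II follow directly from the counts in the table. The short sketch below reproduces that arithmetic; the dictionary layout and the function name `summarize` are illustrative choices, not anything defined by TOPKAT.

```python
# Counts from Table II: actual class -> predicted outcome -> number of compounds.
table_ii = {
    "inactive": {"negative": 44, "positive": 1, "indeterminate": 3},  # noncarcinogenic
    "active": {"negative": 3, "positive": 35, "indeterminate": 1},    # carcinogenic
}


def summarize(counts):
    """Derive accuracy figures from a 2 x 3 classification table."""
    total = sum(sum(row.values()) for row in counts.values())
    indeterminate = sum(row["indeterminate"] for row in counts.values())
    correct = counts["inactive"]["negative"] + counts["active"]["positive"]
    classified = total - indeterminate
    return {
        "total": total,
        "indeterminate_rate": indeterminate / total,
        "false_positive_rate": counts["inactive"]["positive"] / total,
        "false_negative_rate": counts["active"]["negative"] / total,
        "overall_accuracy": correct / total,
        # Adjusted accuracy ignores compounds the model declined to classify.
        "adjusted_accuracy": correct / classified,
    }


summary = summarize(table_ii)
```

The gap between the overall and adjusted accuracy shows how much of the apparent error is due to compounds the model left unclassified rather than to outright misclassification.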


Figure 2. Breakdown of the 222 compounds from the NCI/NTP database. The 222 compounds are divided into two subsets: (1) 116 carcinogens and (2) 106 noncarcinogens. Of the 116 carcinogens, 112 were evaluated in the Ames test, which gave 65 positives and 47 negatives. The TOPKAT prediction for the 112 compounds was 92% correct.

Table I. Prediction of Mutagenic Activity by TOPKATᵃ

                         predicted
actual          negative   positive   indeterminate
nonmutagenic       37          2            4
mutagenic           3         49            4
total              40         51            8

ᵃ Indeterminates: 8/99 = 8.1%. False positives: 2/99 = 2.0%. False negatives: 3/99 = 3.0%. Overall accuracy: 86/99 = 86.9%. Adjusted accuracy: 86/91 = 94.5% (excludes indeterminates).

Daphnia magna 48-h EC50 values (32, 33), and aerobic biodegradability (34).

The possibility that such computer methods could predict chronic toxicity measurements has been examined in considerable detail by a number of groups. The National Cancer Institute's National Toxicology Program (NCI/NTP) has systematically examined 222 significant organic compounds for carcinogenicity. Of these 222 compounds, as shown in Figure 2, 116 were carcinogens. Of these 116 compounds, 112 had been tested in the Salmonella histidine reversion mutagenicity assay (Ames test). Of these 112 compounds, 65 were found to be positive (genotoxic) in the Ames test while 47 were found to be nongenotoxic. TOPKAT was employed to classify these 112 Ames-tested compounds. Ten of the 112 compounds were not entered into the database for modeling because they were mixtures, inorganic, or organometallic, had no or an equivocal associated biological end point, or were associated with a contaminated bioassay. Of the remaining 102 compounds, 3 were removed at a later stage because they were found to be unduly influential observations or statistical outliers, and 99 compounds were used for the development of the final discriminant analysis model. The final model derived for this database used 8 σ molecular charge descriptors, 2 molecular connectivity indices, 2 κ molecular shape descriptors, and 1 MOLSTAC substructure descriptor to classify the genotoxic and nongenotoxic carcinogens. The results are shown in Table I. From this table, it can be seen that TOPKAT was able to correctly classify the genotoxic carcinogens and the nongenotoxic carcinogens with an overall accuracy of 86.9%.

Of the 222 compounds in the NCI/NTP database, 118 were found to be negative (nongenotoxic) in the Ames test. Of these 118 nongenotoxic compounds, 47 were found to be carcinogens, 57 were noncarcinogens, and for 14 the results were equivocal. After the removal of influential observations and statistical outliers, 93 compounds were included in the final discriminant analysis. Using 8 σ molecular charge descriptors, 3 molecular connectivity indices, 1 κ shape descriptor, and 12 substructural descriptors, TOPKAT was able to correctly classify nongenotoxic carcinogens and nongenotoxic noncarcinogens with an overall accuracy of 90.8%. The details of the results are shown in Table II.

Thus, in the first of these examples, TOPKAT classified 86 of 99 compounds correctly, producing the wrong classification for 5 compounds and failing to classify 8. In the second case, the program classified 79 of 87 compounds correctly, misclassifying 4 compounds and failing to classify 4. These two examples show that, in general, TOPKAT was able to classify the compounds in the database with an accuracy of approximately 87% to 91%. TOPKAT has also been used to study the structure-activity relationships for other toxic effects; a summary of these studies is shown in Table III.

(b) CASE. During the 1980s, Klopman and Rosenkranz (9-14) developed the Computer Automated Structure Evaluation program (CASE). This program has been used in attempts to determine the relationship between chemical structure and carcinogenicity and mutagenicity, as well as in studies of the quantitative structure-activity relationships operating in biodegradation and eye irritation. A general schematic of the CASE program is shown in Figure 3. Chemical structures can be entered into the database using any of four possible formats: (a) Klopman Line Notation (KLN), (b) Clark Still graphic input, (c) MolFile (Molecular Design) format, and (d) Chemical Abstracts Service (CAS) Registry Numbers. The program associates each compound to be analyzed with an index of biological activity.
A "learning set" of compounds of known activity is created, and each molecule in this learning set is broken into all possible substructural fragments, each with between 2 and 10 linearly connected non-hydrogen atoms. These fragments are all marked as active or inactive depending on whether their parent molecule is active or not, and they are then subjected to a series of statistical tests to determine which fragments have a distribution in the whole learning set that is, in a statistically significant sense, skewed toward either activity or inactivity. These statistically significant fragments are recorded as activating or inactivating, and the probability that a compound will be active depends upon the presence or absence in its structure of such activating or inactivating fragments. If the biological activity (e.g., an LD50) is continuous, a linear regression analysis is performed using these activating and inactivating fragments, as well as other physicochemical properties, such as the partition coefficient between n-octanol and water (the "log P" value) of the compound, its atomic charges, molecular orbital (HOMO and LUMO)


Table III. Summary of the TOPKAT Studies on Toxicity

compounds                                       n      toxicity                         parameters          accuracy
GeneTox database; miscellaneous compds (28)     805    mutagenicity (all strains)       75 substrucs        95%
GeneTox database; miscellaneous compds (28)     700    mutagenicity (BP subs strains)   84                  93%
miscellaneous chemicals (23)                    343    carcinogenicity                  76 substrucs, MW    91%
miscellaneous chemicals (acyclic) (30)          201    Draize skin                      22                  95%
miscellaneous chemicals (acyclic) (31)          588    Draize eye                       12                  90%
miscellaneous chemicals (acyclic) (31)          593    Draize eye                       28                  90%
miscellaneous chemicals (cyclic) (31)           542    Draize eye                       43                  92%
miscellaneous chemicals (cyclic) (31)           543    Draize eye                       36                  88%
miscellaneous chemicals (33)                    425    LD50                             29                  r2 = 0.493, SD = 0.702
miscellaneous chemicals (33)                    1000   LD50                             88                  r2 = 0.562, SD = 0.620
miscellaneous chemicals (33)                    1500   LD50                             17                  r2 = 0.523, SD = 0.620
miscellaneous chemicals (33)                    2000   LD50                             103                 r2 = 0.524, SD = 0.623

[Figure 4: structural fragment drawings with activity ratings from +++ to +.]

Figure 4. Fragments associated with carcinogenicity as identified by CASE. These structural fragments were found by CASE to be significant and highly correlated to the carcinogenicity of the compounds in the database. They are associated with different levels of activity: +++ = very active; + = weakly active.
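The fragment analysis that CASE performs on its learning set can be sketched in a few lines. This is only an illustration under stated assumptions: molecules are given as hypothetical lists of heavy-atom labels and bonds, bond orders are ignored, and the significance test is a one-sided binomial tail against the learning set's overall active fraction. The source says only that "a series of statistical tests" is applied, so the choice of test, the 0.05 threshold, and all function names here are assumptions.

```python
from math import comb


def linear_fragments(atoms, bonds, min_len=2, max_len=10):
    """Return all linear paths of min_len..max_len heavy atoms as label tuples."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    found = set()

    def extend(path):
        if len(path) >= min_len:
            labels = tuple(atoms[i] for i in path)
            # Store one orientation so A-B and B-A count as the same fragment.
            found.add(min(labels, labels[::-1]))
        if len(path) == max_len:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return found


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def activating_fragments(learning_set, alpha=0.05):
    """learning_set: list of (atoms, bonds, is_active) tuples.

    Returns {fragment: (active_count, total_count)} for fragments whose
    occurrence is skewed toward active molecules. Inactivating fragments
    would be found the same way using the lower tail.
    """
    base_rate = sum(1 for mol in learning_set if mol[2]) / len(learning_set)
    counts = {}
    for atoms, bonds, active in learning_set:
        for frag in linear_fragments(atoms, bonds):
            total, hits = counts.get(frag, (0, 0))
            counts[frag] = (total + 1, hits + int(active))
    return {
        frag: (hits, total)
        for frag, (total, hits) in counts.items()
        if upper_tail(hits, total, base_rate) < alpha
    }


# Toy learning set (invented): five active C-N compounds, five inactive C-O compounds.
toy_set = [(["C", "N"], [(0, 1)], True)] * 5 + [(["C", "O"], [(0, 1)], False)] * 5
flagged = activating_fragments(toy_set)
```

With so small a toy set the result is trivial, but it shows why the size and diversity of the learning set matter: a fragment can only be flagged if it occurs often enough for the skew to be distinguishable from chance.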

[Figure 3 flowchart. Recoverable box labels: Database; Molecular Fragment; Other Global; Statistical Evaluation, Discriminant Analysis; New Compounds; Predictions; Mode of Action.]

Figure 3. Schematic of the CASE program. This program operates in 5 steps. In the first step, the database is built by entering toxicities and structures. Then substructural and other parameters are developed in step 2, and in step 3, the statistical significance of each substructural fragment is determined. In step 4, a qualitative model is derived using the best descriptors, and a quantitative model is produced by multiple regression techniques. Finally, in step 5, the predictive power of the model is tested with new structures.

charges, and topological indices as the potential parameters for a QSAR equation.

An example of an application of CASE can be found in a study (13) of the structural basis of the mutagenicity of chemicals in Salmonella typhimurium. These data are found in the GeneTox database, which consists of 808 compounds. In the study, CASE identified 29 activating and 3 inactivating structural determinants which correctly predicted the mutagenicity of 93.7% of the known mutagens and nonmutagens in the database (sensitivity = 0.998, specificity = 0.704). Here, sensitivity is defined as the number of correctly predicted positives divided by the total number of positives, and specificity is the number of correctly predicted negatives divided by the total number of negatives. In one validation test, CASE was able to correctly classify the mutagenicity of 93 chemicals (39 mutagens, 52 nonmutagens, and 2 marginal) previously studied in the NTP with an accuracy of 86%. In another validation test, of 25 physiological substances which are not expected to have mutagenic properties, 24 were predicted to be nonmutagens.

CASE has also been employed (14) to elucidate the structural basis of carcinogenicity in rodents of genotoxicants and nongenotoxicants, using a database of 189 chemicals tested in the National Toxicology Program Cancer Bioassay. CASE identified 23 fragments which accounted for the carcinogenicity, or lack thereof, of most of the chemicals in the database. The sensitivity and specificity were 1.00 and 0.86, respectively. Some of these fragments, and their relative potencies, are shown in Figure 4. CASE has been applied to studies of the structure-activity relationships in other databases, and a summary of these studies, as well as the two studies above, is shown in Table IV.

In an example of CASE prediction concerning carcinogenicity, based on the results of analysis of the NTP database, 4-chloro-o-phenylenediamine was correctly classified as a carcinogen on the basis of its containing the substructural fragments A-C shown in Figure 5. The likelihood of carcinogenicity associated with any compound containing A is 79%; with B, it is also 79%; and with C, 66%. The probability of carcinogenicity in a compound which contains all three fragments, such as 4-chloro-o-phenylenediamine, is calculated by CASE to be 96.8%. This reveals several interesting points. The reinforcing effect of the amino groups upon each other is clear, and it is equally clear that the presence or absence of the chlorine has no effect upon the final outcome.

(c) ADAPT. The ADAPT program, developed by Jurs et al. (15-21), employs another device of statistical analysis, pattern recognition. ADAPT has been applied to studies of the structure-activity relationships in mutagenicity. A database consisting of 105 chemically diverse mutagens and nonmutagens was studied by ADAPT, using substructural features and other parameters representing the overall size and shape of molecules in pattern recognition or discriminant analysis. The derived models were not very useful because, when tested on new compounds, they gave almost the same estimates as a random choice. This was probably due to the small size of the database. When ADAPT was used, however, to analyze a small homogeneous database of 21 aliphatic nitrosamines, five models were derived by employing different statistical techniques in the analysis, and these models gave, on average, a 93% correct classification, with classifications ranging from 81% to 100% correct. ADAPT was also applied to study the structure-activity relationships in carcinogenicity. A number of models have been established based on different databases, such as polyaromatic


Table IV. Summary of the CASE Studies on Toxicity Databases

compounds                                      n     toxicity                  parameters     accuracy (%)
GeneTox database; miscellaneous compds (13)    808   mutagenicity              32 substrucs   93.7
NTP database; miscellaneous compds (14)        189   carcinogenicity           23 substrucs   95
PAHs, nitrated PAHs (44)                       56    repair in E. coli PQ37    9 substrucs    95
miscellaneous chemicals (45)                   236   micronuclei induction     30 substrucs   100
aromatic amines (12)                           65    Ames mutagenicity         4 substrucs    86
aromatic amines (12)                           107   Ames mutagenicity         4 substrucs    86
nitroarenes (10)                               53    mutagenicity              2 substrucs    89
PAHs (46)                                      37    mutagenicity              4 substrucs    94
miscellaneous chemicals (47)                   283   biodegradation            6 substrucs    74

[Figure 5: three structure drawings labeled A, B, and C.]

Figure 5. Carcinogenic substructural fragments in 4-chloro-o-phenylenediamine. These substructural fragments (in boldface) were determined by CASE to be responsible for the carcinogenicity of 4-chloro-o-phenylenediamine.

hydrocarbons (PAHs) (17), aromatic amines (18, 19), nitrosamines (20), and a heterogeneous set of chemicals (21). The accuracy of classification obtained by ADAPT analysis varied from model to model and also depended upon the statistical techniques used in the analysis. For example (20), for a database of 150 N-nitroso compounds, using 22 descriptors, ADAPT gave an average accuracy of 90.5% in property classification over 9 models, ranging from a low of 81% to a high of 97%. For another database (21), containing 209 heterogeneous chemicals, using 26 descriptors, ADAPT gave an average accuracy of 81.6% in classification over 5 models.

Other Structure-Toxicity Relationship Studies. Recently, Nakadate et al. developed an expert system, called BL-DB (35), to study the structure-activity relationships in toxicity. BL-DB uses Wiswesser line formula chemical notation (WLN) (36) to enter and retrieve the chemical structures. In studies of the structure-activity relationships, rules which correlate the chemical structures and toxicities were derived with the aid of experts. These rules were then used to predict chemical toxicity. In one example (35), eight kinds of rules were derived to predict the results of the Salmonella/microsome assay. It was found that, using these eight rules, the mutagenicity of aliphatic and heterocyclic compounds in the Salmonella/microsome assay can be predicted with an accuracy of 90-95%.

Lewis et al. developed the Computer-Optimized Molecular Parametric Analysis of Chemical Toxicity (COMPACT) procedure (37-40) to predict the carcinogenicity of chemicals. On the basis of mechanisms of chemical carcinogenicity, they proposed that if a chemical is a substrate for the cytochromes P450 I, or if it can interact with the Ah receptor, there will be a high probability that it is carcinogenic. Hence, the procedure used in developing COMPACT is to determine the optimal spatial and electronic requirements of chemicals which enable them to fit the active sites of the cytochromes P450 I or the binding sites of the Ah receptor. Using this approach, they were able to predict rodent life span carcinogenicity with an accuracy of 92% for a data set of 100 compounds (40).

Kier et al. (41) have successfully used connectivity indices to correlate the mutagenicity of 15 nitrosamines, while Hansch et al. (42) reported a correlation between the Ames test mutagenicity and an electron-withdrawing parameter for 15 organoplatinum compounds. Tinker (43) used a modified Hodes method (an approach based upon atom-centered fragments) to study mutagenesis data (Ames test) for more than 1000 diverse chemicals. The derived model was used to predict the mutagenicity of 34 new compounds not in the original database. Thirty of the 34 compounds (88%) were predicted to within ±1 category, and 76% were categorized correctly.

Summary

Computers are used in toxicology in two ways. They are able to manage and manipulate large amounts of data, and it is because of this that they are used quite commonly to search toxicity databases. The mechanical ability of computers has led a number of organizations to pursue their use in regulatory compliance. The cost-benefit aspect of this issue being what it is, much more effort can be expected in this area.

The other major use of computers has been to support efforts to predict or estimate toxicity properties. This task has proven to be very difficult, as was expected, and progress has been mixed. Developers of systems, testing their own development, report impressive accuracy, as has been seen. The "real world" view is less felicitous. In a highly publicized, head-to-head test of some of the computer methods against human experts (48), accurate prediction of carcinogenicity by computer was achieved for 49-59% of the compounds, depending upon the method used. The humans, on the other hand, scored between 65% and 84%. A conclusion that could be drawn from this experiment is that with compounds which "obviously" are or are not carcinogenic, both computers and humans score well. Once obviousness recedes, however, both are at a disadvantage, but humans can improvise more effectively. As research continues, the computer methods will develop better learning sets, and so there will be incremental improvements in their performance. It is not likely that they will ever achieve absolute correctness, but the cost savings that could be derived from a computer program that is, say, 85% correct are significant, and such economics guarantees a place for these programs in modern toxicology.

By way of completing a cycle, it is significant that there has been (7) some tentative use of estimates produced by TOPKAT in reviews by the Canadian Drug Directorate of data on new chemical entities. This suggests a role for estimated data in the regulatory process and a direction for the future, in which a balance might be struck between reliability and cost.

References

(1) McGill, J. R., Heller, S. R., and Milne, G. W. A. (1978) A Computer-Based Toxicology Search System. J. Environ. Pathol. Toxicol. 2, 539-551.
(2) Nellessen, J. E., and Fletcher, J. S. (1992) UTAB: A Computer Database on Residues of Xenobiotic Organic Chemicals and Heavy Metals in Plants. J. Chem. Inf. Comput. Sci. 32, 144-148.
(3) Barcza, S., Kelly, L. A., and Lenz, C. D. (1990) Computerized Retrieval of Information on Biosynthesis and Metabolic Pathways. J. Chem. Inf. Comput. Sci. 30, 243-251.
(4) Ochs, R. S., and Conrow, K. (1991) A Computerized Metabolic Map. J. Chem. Inf. Comput. Sci. 31, 132-137.
(5) Michal, G. (1982) Biochemical Pathways, Universitäts Druckerei, Würzburg, W. Germany.
(6) Young, F. E. (1988) Submission of Drug Applications to the Food and Drug Administration Using Computer Technology. Fed. Regist. 53, FR35912.
(7) Studebaker, J. F. (1993) Computers in the New Drug Application Process. J. Chem. Inf. Comput. Sci. 33, 86-94.
(8) Enslein, K., and Craig, P. N. (1978) A Toxicity Estimation Model. J. Environ. Pathol. Toxicol. 2, 115-121. Enslein, K. (1984) Estimation of Toxicological Endpoints by Structure-Activity Relationships. Pharmacol. Rev. 36, 131S-135S. Gombar, V. K., and Enslein, K. (1989) Topological Shape and Electronic Descriptors and their Correlation with Toxicity to Photobacterium phosphoreum. In Vitro Toxicol. 2, 117-127.
(9) Klopman, G. (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J. Am. Chem. Soc. 106, 7315-7321.
(10) Klopman, G., and Rosenkranz, H. S. (1984) Structural Requirement for Mutagenicity of Environmental Nitroarenes. Mutat. Res. 126, 227-238.
(11) Klopman, G., Contreras, R., Rosenkranz, H. S., and Waters, M. D. (1985) Structure-Genotoxic Activity Relationships of Pesticides: Comparison Between the Results of Several Short-Term Assays. Mutat. Res. 147, 343-356.
(12) Klopman, G., Frierson, M. R., and Rosenkranz, H. S. (1985) Computer Analysis of Toxicological Databases: Mutagenicity of Aromatic Amines in Salmonella Tester Strains. Environ. Mutagen. 7, 625-644.
(13) Klopman, G., Frierson, M. R., and Rosenkranz, H. S. (1990) The Structural Basis of the Mutagenicity of Chemicals in Salmonella typhimurium: The Gene-Tox Data Base. Mutat. Res. 228, 1-50.
(14) Rosenkranz, H. S., and Klopman, G. (1990) Structural Basis of Carcinogenicity in Rodents of Genotoxicants and non-Genotoxicants. Mutat. Res. 228, 105-124.
(15) Stouch, T. R., and Jurs, P. C. (1985) Computer-Assisted Studies of Molecular Structure and Genotoxic Activity by Pattern Recognition Techniques. Environ. Health Perspect. 61, 329-343.
(16) Nesnow, S., Langenbach, R., and Mass, M. J. (1985) Pattern Recognition Analysis of a Set of Mutagenic Aliphatic N-Nitrosamines. Environ. Health Perspect. 61, 345-349.
(17) Yuan, M., and Jurs, P. C. (1980) Computer-Assisted Structure-Activity Studies of Chemical Carcinogens: A Polycyclic Aromatic Hydrocarbon Data Set. Toxicol. Appl. Pharmacol. 52, 294-312.
(18) Yuta, K., and Jurs, P. C. (1981) Computer-Assisted Structure-Activity Studies of Chemical Carcinogens. Aromatic Amines. J. Med. Chem. 24, 241-251.
(19) Yuta, K., and Jurs, P. C. (1984) Computer-Assisted Structure-Activity Studies of Chemical Carcinogens: Aromatic Amines. II. Rat and Liver Data Set. Yakugaku Zasshi 104, 496-508.
(20) Rose, S. L., and Jurs, P. C. (1982) Computer-Assisted Studies of Structure-Activity Relationships of N-Nitroso Compounds Using Pattern Recognition. J. Med. Chem. 25, 769-776.
(21) Jurs, P. C., Chou, J. T., and Yuan, M. (1979) Computer-Assisted Structure-Activity Studies of Chemical Carcinogens. A Heterogeneous Data Set. J. Med. Chem. 22, 476-483.
(22) Weininger, D. (1988) SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 28, 31-36.
(23) Enslein, K., Borgstedt, H. H., Tomb, M. E., Blake, B. W., and Hart, J. B. (1987) A Structure-Activity Prediction Model of Carcinogenicity Based on NCI/NTP Assays and Food Additives. Toxicol. Ind. Health 3, 267-287.
(24) Enslein, K. (1988) An Overview of Structure-Activity Relationships as an Alternative to Testing in Animals for Carcinogenicity, Mutagenicity, Dermal and Eye Irritation, and Acute Oral Toxicity. Toxicol. Ind. Health 4, 479-498.
(25) Enslein, K., and Borgstedt, H. H. (1989) A QSAR Model for the Estimation of Carcinogenicity. Example of an Application to an Azo-Dye. Toxicol. Lett. 49, 107-121.
(26) Blake, B. W., Enslein, K., Gombar, V. K., and Borgstedt, H. H. (1990) Salmonella Mutagenicity and Rodent Carcinogenicity: Quan-

Chem. Res. Toxicol., Vol. 6, No.6,1993 763 titative Structure-Activity Relationships. Mutat. Res. 241, 261-271. (27) Enslein, K., Blake, B. W., and Borgstedt, H. H. (1990) Prediction of Probability of Carcinogenicityfor a Set of OngoingNTP Bioassays. Mutagenesis 5, 305-306. (28) Enslein, K., Blake, B. W., Tomb, M. E., and Borgstedt, H. H., (1986) Prediction of Ames Test Results by Structure-Activity Relationships. In Vitro Toxicol. 1, 33-44. (29) Gombar, V. K., Borgstedt, H. H., Enslein, K., Hart, J. B., and Blake B. W. (1991) A QSAR Model of Teratogenesis. Quant. Struct.-Act. Relat. 10, 306-332. (30) Enslein, K., Borgstedt, H. H, Blake, B. W., and Hart, J. B. (1987) Prediction of Rabbit Skin Irritation Severity by Structure-Activity Relationships. In Vitro Toxicol. 1, 129-147. (31) Enslein, K., Blake, B. W., Tuzzeo,T. M., Borgstedt, H. H., and Hart, J. B. (1988)Estimation of Rabbit Eye Irritation Scoresby StructureActivity Equations. In Vitro Toxicol. 2, 1-14. (32) Enslein, K., Tuzzeo, T.M., Borgstedt, H. H., Blake, B. W., and Hart, J. B. (1987) Prediction of Rat Oral LDm from Daphnia magna LCw and Structure. In Proceedings of the 2nd International Workshop on QSAR in Environmental Toxicology (Kaiser, K. L. E., Ed.) D. Reidel Publishing Co., Dordrecht, Holland. (33) Enslein, K., Tuzzeo, T. M., Blake, B. W., Hart, J. B., and Landis, W. G. Prediction of Daphnia magna EC50 Values from Rat Oral LDm and Structural Parameters. ASTM Special Technical Publication 1007,American Society for Testing & Materials, Philadelphia, PA 19103. (34) Gombar, V. K., and Enslein, K. A Structure-Biodegradability Relationship Model by Discriminant Analysis. In Applied Multivariate Analysis in SAR and Environmental Studies (Devillers,J., and Karchner, W., Eds.) pp 377-414, Kluwer Academic Publishers, Dordrecht, Holland. (35) Nakadate, M., Hayashi, M., Sofuni, T., Kamata, E., Aida, Y.,Osada, T., Ishibe, T., Sakamura, Y., and Ishidate, M., Jr. 
(1991)The Expert System for Toxicity Prediction of Chemicals Based on StructureActivity Relationship. Enuiron. Health Perspect. 96, 77-79. (36) Smith, E. G., and Baker, P. A. (1975) The Wiswesser Line-Formula Chemical Notation (WLN), 3rd. ed., Chemical Information Management, Cherry Hill, NJ. (37) Lewis, D. F. V., Ioannides, C., and Parke, D. V. (1989) Prediction of Chemical Carcinogenicity from Molecular and Electronic Structures: A Comparison of MINDO/3 and CND0/2 Molecular Orbital Methods. Toxicology Lett. 45, 1-13. (38) Lewis, D. F. V., Ioannides, C., and Parke, D. V. (1993) Validation of a Novel Molecular Orbital Approach (COMPACT) for the Prospective Safety Evaluation of Chemicals, by Comparison with Rodent Carcinogenicity and Salmonella Mutagenicity Data Evaluated by the US. NCI/NTP. Mutat. Res. 291,61-77. (39) Lewis,D. F. V., Ioannides, C.,andParke,D. V. (1990)Aretrospective study of the molecular toxicology of benoxaprofen. Toxicology 65, 33-47. (40) Lewis, D. F. V., Ioannides, C., and Parke, D. V. (1990)A prospective toxicity evaluation (COMPACT) on 40 chemicals currently testad by the National Toxicology Program. Mutagenesis 5,433-435. (41) Kier, L. B., Simons, R. L., and Hall, L. H. (1978)Structure-Activity Studies on Mutagenicity of Nitrosamines using Molecular Connectivity, J. Pharm. Sci. 67, 725-726. (42) Hansch, C., Venger, B. H., and Panthananickal, A. (1980) Mutagenicity of Substituted (0-Pheny1enediamine)platinum Dichloride in the Ames Test. A Quantitative Structure-Activity Analysis. J.Med. Chem. 23, 459-461. (43) Tinker, J. F. (1981)A Computerized Structure-Activity Correlation Program for Relating Bacterial Mutagenesis Activity to Chemical Structure. J. Comput. Chem. 2,231-243. (44) Mersh-Sundermann, V., Klopman, G., and Rosenkranz, H. S. (1992) StructuralRequirements for the Induction of SOS Repair in Bacteria by Nitrated Polycyclic Aromatic Hydrocarbons and Related Chemi d . Mutat. Res. 265,61-73. (45) Rosenkranz, H. S., and Klopman, G. 
(1990)The Structural Basis of the Mutagenicity of Chemicals in Salmonella typhimurium: The National Toxicology Program Database. Mutat. Res. 228, 51-80. (46) Mersh-Sundermann,V., Rosenkranz, H. S., and Klopman, G. (1992) The Structural Basis of the Genotoxicity of Polycyclic Aromatic Hydrocarbons. Mutagenesis 7, 211-218. (47) Klopman, G., Balthasar, D. M., and Rosenkranz, H. S. (1993) Application of the Computer-Automated Structure Evaluation (CASE) Program to the Study of Structure-Biodegradation Relationships of Miscellaneous Chemicals. Enuiron. Toxicol. Chem. 12, 231-240. (48) Hileman, B. (1993) "Expert Intuition" Tops in Tests of Carcinogenicity Prediction. Chem. Eng. News June 21, 35-37.