Computer simulated process of "lead optimization": A student

Bits and pieces, 43. In this paper the MLIGHT program is presented, which was created in order to introduce students to quantitative structure activit...
Computer Series, 112

edited by JAMES P. BIRK, Arizona State University, Tempe, AZ 85281

Bits and Pieces, 43

Guidelines for Authors of Bits and Pieces appeared in July 1986; the number of Bits and Pieces manuscripts is expected to decrease in the future (see the July 1988 and March 1989 issues). Bits and Pieces authors who describe programs will make available listings and/or machine-readable versions of their programs. Please read each description carefully to determine compatibility with your own computing environment before requesting materials from any of the authors.

Computer Simulated Process of "Lead Optimization": A Student-Interactive Program

Modesto Orozco and Rafael Franco
Departament de Bioquimica i Fisiologia, Facultat de Quimica, Universitat de Barcelona, Marti i Franques 1, 08028 Barcelona, Catalunya, Spain

The high cost of pharmacological research compels laboratories and companies to obtain the maximum information about a drug, its receptor, and its mechanism of action with the minimum of experimental work. This has driven the development of structure-activity relationship techniques (for a review see refs 1 and 2). These techniques are especially useful in the stage of "lead optimization," where the "lead compound" is modified in order to increase its activity.

One of the best known techniques for QSAR (quantitative structure-activity relationships) is that of Hansch (3). It is based on the existence of an equation that correlates the activity of a drug with its steric, hydrophobic, and electronic properties. A very widely used strategy in "lead optimization" is based on the study of series of homologues (drugs with a common core that differ only in one substituent). Under this strategy the activity of a drug can be represented as a function of the properties of the substituent. This simplification permits the use of large tables of parameters that describe the properties of the substituents (2, 4-6) in QSAR studies.

The great relevance of QSAR techniques in the development of a new drug recommends inclusion of their study in chemistry and pharmacy schools. Computer programs that simulate the process of rational drug design have been developed (7) to give the student an easy and pleasant view of QSAR techniques. In this paper we present the MLIGHT¹ program, which was created in order to introduce students to QSAR strategies by means of simulation of the "lead optimization" stage. MLIGHT is written in GW-BASIC and can be used on an IBM-PC or compatible computer.
The program is interactive and simulates all the steps of "lead optimization." Therefore the student can perform theoretical approaches to the problem, suggest new drugs to be analyzed in the laboratory, and finally suggest the considered optimum to be produced in a factory (see Fig. 1).

¹ Copies of the program are available upon request.

Journal of Chemical Education

Figure 1. Flow chart of the MLIGHT program

The program uses a data file of substituent properties (see Table 1), obtained from Hansch's computerized files (6) and Borth's data file (8). This table was created in order to represent hydrophobic, electronic, and steric characteristics of a substituent over a nonaromatic core. The program also includes one synthetic feasibility index (8), SF, which gives an idea of the difficulty of introducing a substituent onto a hypothetical "core." Hydrophobic characteristics are represented by means of the Rekker FR constant (9), electronic characteristics by means of the Lupton FI inductive constant (10), and steric characteristics by means of the molecular refractivity MR.

The relevance of outlier elimination in "series design" is well known (11, 12). Thus an outlier test (12, 13) was applied to the data file, and the feasibility index of the detected outliers was set equal to 10 (these modifications in the feasibility index are not included in the data file that the student owns). The user has to reject the substituents that he or she considers "anomalous" (which correspond to outliers) and those that he or she considers very difficult to synthesize. These substituents should not be suggested for synthesis and testing.

When the program starts, a random equation (which relates log activity with the properties of the substituents) is generated. It can be linear or parabolic (the two most common equations in QSAR), with the parameters and the coefficients of these parameters randomly selected. It should be noted that the coefficients of the parameters in the equation are weighted in order to avoid the problem derived from

Table 1. Data Base of the MLIGHT Program

Columns: FR, FI, MR, SF

Substituents: "Br", "Cl", "F", "I", "NO2", "H", "OH", "SH", "NH2", "CBr3", "CCl3", "CF3", "CN", "SCN", "CO2H", "CH2Br", "CH2Cl", "CH2I", "CONH2", "CH=NOH", "CH3", "NHCONH2", "OCH3"

the different scaling of the properties and that only two independent variables are considered; otherwise the searching process would be rather difficult and slow. By means of this equation, the program calculates the activity of all the substituents in the data file. Finally the program searches for the optimum and stores it in memory.

When the student suggests some substituent to be analyzed in the laboratory, the program, using the above-described equation and introducing a random error (between 0 and 10%), computes and displays the "experimental" activity of the corresponding drug. The student can use a regression subroutine implemented in the program to obtain Hansch equations with the analyzed substituents. It must be stressed that the kind of equation and the variables that are included in it must be selected by the student. The computer displays the Hansch equation, the regression coefficient, and the value of Snedecor's F in order to provide enough information to the user, who must decide the usefulness of the obtained equation. In this step the student has to select

Table 1 (continued). Columns: FR, FI, MR, SF

Substituents: "OCOC2H5", "CH2CH2CO2H", "NHCO2C2H5", "CONHC2H5", "NHCOC2H5", "CH(CH3)2", "C3H7", "OCH(CH3)2", "OC3H7", "CH2OC2H5", "FERROCENYL", "SOC3H7", "SC3H7", "NHC3H7", "Si(CH3)3", "2-THIENYL", "3-THIENYL", "CH=CHCOCH3", "CH=CHCO2CH3", "COC3H7", "OCOC3H7", "CO2C3H7", "(CH2)3CO2H", "NHCOC3H7", "CONHC3H7", "C(CH3)3", "C4H9", "OC4H9", "CH2OC3H7", "NHC4H9", "N(C2H5)2", "CH=CHCOC2H5"

the substituents to be included in the regression studies in order to increase the "validity" of the Hansch equation obtained.

When the student believes that he or she knows the optimum substituent, he or she can suggest that it be produced in the factory. The program then compares the suggested substituent with the optimum one (stored in memory). If both are identical, the program ends with the calculation of the hypothetical cost of the complete process of drug design. The cost is calculated following eq 1:

$$\text{cost (million \$)} = 0.25\,n + \sum_{i=1}^{n} 0.01\,\mathrm{fx}(i) + 8\,m \qquad (1)$$

where n = number of substituents suggested to the laboratory, m = number of substituents suggested to the factory, and fx(i) = feasibility index of the ith substituent. Otherwise the searching process continues, but each mistake (suggestion of a substituent that is not the optimum one) will be reflected in the global price of the final drug (see eq 1).

Volume 67, Number 3, March 1990

The program obviously does not include all the different factors that can have a role in the process of "lead optimization," but we believe that it allows the student to become familiar with some important topics in the "lead optimization" stage, such as (1) the importance of rational suggestion of representative substituents (series design methods), (2) the problems derived from the existence of anomalous or unsuitable substituents, (3) the importance of synthetic considerations in drug design, and (4) the high power of QSAR techniques in drug design.
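The "experimental" activity and the cost of eq 1 can be illustrated with a short Python sketch (MLIGHT itself is written in GW-BASIC, and its listing is not reproduced here). The function names, the random sign of the error, and the reading that the feasibility sum runs over the substituents sent to the laboratory are assumptions made for illustration only.

```python
import random


def experimental_activity(calculated, rng, max_error=0.10):
    """Perturb a calculated log activity by a random relative error.

    The article states only that the error lies between 0 and 10%;
    applying it with a random sign is an assumption of this sketch.
    """
    error = rng.uniform(0.0, max_error)
    sign = rng.choice((-1.0, 1.0))
    return calculated * (1.0 + sign * error)


def process_cost(lab_feasibilities, factory_suggestions):
    """Cost in million $ following eq 1.

    lab_feasibilities: feasibility index fx(i) of each substituent
        suggested to the laboratory (assumed range of the sum).
    factory_suggestions: number m of substituents suggested to the factory.
    """
    n = len(lab_feasibilities)
    return 0.25 * n + sum(0.01 * fx for fx in lab_feasibilities) + 8 * factory_suggestions


if __name__ == "__main__":
    rng = random.Random(0)
    print(experimental_activity(2.0, rng))
    # Three laboratory tests (one of them an outlier with index 10), one factory run.
    print(process_cost([1, 2, 10], 1))
```

In this reading a wrong factory suggestion costs far more than a laboratory test, which is why a careless final guess dominates the total.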

A PROLOG Program for the Generation of Molecular Formulas

B. Mendez and J. A. Moreno
Facultad de Ciencias, Universidad Central de Venezuela, Apartado Postal 47906, Caracas 1041A, Venezuela

When engaged in the interpretation of mass spectral data of organic compounds, it is essential that students be able to assign constitutional formulas to the molecular (parent) and fragment ion peaks of the spectrum under consideration. This is by no means an easy task, and it is generally carried out by either inspecting tables and compilations of mass spectral information (14) or by implementing numerical algorithms in calculators or computers (15, 16). Considering that the latter procedure is the most adequate for this problem, we have developed a computer program called FORMULAS that generates a list of all possible molecular formulas for compounds that may contain carbon, hydrogen, nitrogen, oxygen, sulfur, phosphorus, and halogens and that are consistent with certain restrictive conditions, which include the number and type of constituent atoms, mass value, state of ionization, and the degree of unsaturation.

The idea underlying the problem is that an organic molecule can be conveniently depicted as a topological plane connected graph (17, 18). A graph is a mathematical device consisting of a set of points (known as vertices) and a set of lines (known as edges) that connect pairs of the vertices. In such chemical graphs the vertices represent the average position of each of the atoms, and the edges represent the covalent bonds existing between pairs of atoms. In the program, a chemical graph is characterized by the number of vertices and edges it possesses and by its cyclomatic number. This is achieved by using the following relations: The number of edges $N_e$ in a graph can be calculated as:

$$N_e = \frac{1}{2}\sum_i N_v(i)\,D_i$$

where $N_v(i)$ represents the number of vertices of degree $D_i$. The cyclomatic number of a graph, $C$, can be defined as follows:

$$C = N_e - \sum_i N_v(i) + 1$$

with $C \ge 0$ and where the sum $\sum_i N_v(i)$ is the total number of vertices. It must be noted further that the degree of a vertex, corresponding to the number of edges converging at that vertex, and the cyclomatic number of a graph, which is the number of independent loops or cycles formed by the edges, both have a counterpart in the terminology of molecular chemistry. It is clear that the degree of a vertex corresponds to the valence of the corresponding atom, whereas the cyclomatic number is equivalent to the "degree of unsaturation" or of

"hydrogen deficiency" in the molecule. In the program a recursive routine generates sets of integer numbers corresponding to the different $N_v(i)$ (numbers of allowed isotopic species) whose nominal masses add up to the target molecular mass. The set of vertices is then tested for a graph condition by checking that relation 2 holds. This restriction is not enough, since many graph-forming combinations of atoms are meaningless from a chemical point of view. Therefore additional restrictive rules of heuristic character ("rules of thumb") are used in order to limit the formula production to only those that constitute plausible chemical graphs. The present version of FORMULAS uses the following heuristic rules:

(1) If the number of carbons is greater than zero, then the number of edges (bonds) is greater than or equal to four.
(2) If the number of atoms with odd valences is greater than zero, then this number must be even; otherwise the number of atoms with even valences, excluding the number of carbons, must be an even number.
(3) If the sum of the numbers of carbon, nitrogen, oxygen, sulfur, and phosphorus atoms is greater than three, then the degree of unsaturation is arbitrary, greater than or equal to zero; otherwise the degree of unsaturation is less than or equal to two.
(4) If the only atoms considered are sulfur and oxygen, then the sum of their numbers is less than or equal to three.
(5) If the only atoms considered are nitrogen and phosphorus, then the sum of their numbers is less than or equal to two.

When a set of $N_v(i)$ numbers is found fulfilling the above rules, the formula is printed in the formula window, together with its exact mass and its degree of unsaturation.

The program FORMULAS is coded in PROLOG using the Turbo PROLOG (19) syntax. Although PROLOG is a computer language not yet very extensively used in chemical applications (20), we have chosen it because it offers many advantages when compared to common procedural languages like BASIC and Pascal. In particular its high modularity allows the program to be structured as a collection of individual independent modules or procedures. These modules express facts about objects and relations between objects that describe the problem. In our case the different graph and heuristic rules are programmed in a declarative manner without having to bother too much about procedural details. The results follow logically from the rules and facts defining the problem so that, if they do not agree with reality, then the rules and facts must be revised. A final and crucial advantage of PROLOG is its built-in resolution mechanism. The resolution principle embodies pattern matching and an automatic backtracking strategy, forming in this way a powerful reasoning mechanism (inference engine). This feature is useful when solving problems with nondeterministic initial conditions that involve complicated search strategies, as in the present case.

The program FORMULAS has been designed to be simple to use. After booting the disk, the program displays the initial screen containing title and authorship information; after pressing any key, a second screen is displayed presenting a menu with the available options. In this menu the user chooses between two execution modes. In mode 1 the program produces formula listings of compounds or ions with a predefined molecular mass that the user can choose to be exact or nominal. The user is also prompted to specify the different atoms thought to make up the candidate molecular species, a specified number of free bonds defining its neutral or ionic character and, if known, its possible degree of unsaturation. In mode 2 the program generates a list of formulas for compounds or ions with nominal masses in a given range.
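The nominal-mass search and the graph test described earlier can be sketched in Python; this is not the authors' Turbo PROLOG code. The atom table is a reduced, illustrative subset. The sketch assumes a neutral species (no free bonds), takes the graph condition to be an even sum of vertex degrees together with a non-negative cyclomatic number, and omits heuristic rules (1) to (5). All names are hypothetical.

```python
# Nominal mass and valence (vertex degree) for a small illustrative atom set.
ATOMS = {"C": (12, 4), "H": (1, 1), "N": (14, 3), "O": (16, 2)}
SYMBOLS = list(ATOMS)


def unsaturation(counts):
    """Return the cyclomatic number for the given atom counts, or None.

    counts is aligned with SYMBOLS. The number of edges is half the sum
    of the vertex degrees, so an odd degree sum cannot form a graph.
    """
    degree_sum = sum(n * ATOMS[s][1] for s, n in zip(SYMBOLS, counts))
    if degree_sum % 2:
        return None
    edges = degree_sum // 2
    cycles = edges - sum(counts) + 1
    return cycles if cycles >= 0 else None


def formulas(target_mass):
    """List (formula, degree of unsaturation) pairs for a nominal mass."""
    found = []

    def search(index, remaining, counts):
        # Recursive enumeration: choose a count for each atom type in turn.
        if index == len(SYMBOLS):
            if remaining == 0:
                cycles = unsaturation(counts)
                if cycles is not None:
                    name = "".join(
                        s + (str(n) if n > 1 else "")
                        for s, n in zip(SYMBOLS, counts) if n
                    )
                    found.append((name, cycles))
            return
        mass = ATOMS[SYMBOLS[index]][0]
        for n in range(remaining // mass + 1):
            search(index + 1, remaining - n * mass, counts + [n])

    search(0, target_mass, [])
    return found


if __name__ == "__main__":
    for name, cycles in formulas(16):
        print(name, cycles)
```

For a nominal mass of 16 the sketch accepts CH4 and also a lone oxygen vertex carrying a loop, which passes the graph test but is chemically meaningless. Combinations of this kind are what the heuristic rules are meant to remove.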
As in mode 1, the user is prompted for the desired atomic constitution, the number of free bonds, and the degree of unsaturation. When the user presets an exact mass, the program requests the desired resolution for the exact mass constraint (this option is called mass tolerance). The mass tolerance sets the number of significant decimal places to be