1318
Chem. Res. Toxicol. 2003, 16, 1318-1327
3D-MEDNEs: An Alternative “In Silico” Technique for Chemical Research in Toxicology. 1. Prediction of Chemically Induced Agranulocytosis Humberto Gonza´lez Dı´az,*,† Yovani Marrero,‡ Ivan Herna´ndez,† Iyusmila Bastida,† Esvieta Tenorio,† Oslay Nasco,† Eugenio Uriarte,§ Nilo Castan˜edo,† Miguel A. Cabrera,† Edisleidy Aguila,† Osmani Marrero,† Armando Morales,† and Maikel Pe´rez† Chemical Bio-actives Center, Central University of “Las Villas” 54830, Cuba, Department of Pharmacy, Central University of “Las Villas” 54830, Cuba, and Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain Received October 9, 2002
A novel approach to molecular negentropy from the point of view of Markov models is introduced. Stochastic negentropies (MEDNEs) are used to develop a linear discriminant analysis. The discriminant analysis produced a set of two discriminant functions, which gave rise to a very good separation of 93.38% of 151 chemicals (training series) into two groups. The total predictability (86.67%, i.e., 52 compounds out of 60) was tested by means of an external validation set. Randic´’s orthogonalization procedures allowed interpretation of the model while avoiding collinearity descriptors. On the other hand, factor analysis was used to suggest the relation of MEDNEs with other molecular descriptors and properties into a property space. Three principal factors (related to three orthogonal MEDNEs) can be used to explain ∼90% of the variance of different molecular parameters of halobenzenes including bulk, energetic, dipolar, molecular surface-related, and hydrophobic parameters. Finally, preliminary experimental results coincide with a theoretical prediction when agranulocytosis induction by G-1, a novel microcidal that presents Z/E isomerism, is not detected.
Introduction Alternative techniques can help to reduce the number of animals required in medical and scientific research, thus precluding the suffering of any animals that would otherwise be used and, whenever possible, replacing animals by other techniques. This approach is termed the “three R’s” rule: reduction, refinement, and replacement (1, 2). In this sense, QSA(T)R methods (quantitative structure-activity (or toxicity) relationships) appear to be promising approaches (3-7). In this context, one important application of QSTR models is the detection of unreported negative and/or toxicological effects for drugs that are either marketed or under clinical development. Among the clinically significant side effects is so-called blood dyscracias: mild leukocytosis, leukopenia, eosinophilia, and, more specifically, agranulocytosis, which may occur with antipsychotic, antiviral, antithyroid, anticancer, and other medications (8). Chemically induced agranulocytosis is quite marked in antibacterial drug use (9). However, our knowledge of QSTRs is very limited in terms of blood dyscracias both for general and for antibacterial drugs. The problem with this technique becomes more challenging and rewarding when considering the existence of three-dimensional (3D) isomers in drugs (10). Current research in the Chemical Bio-active †
Chemical Bio-actives Center, Central University of “Las Villas”. Department of Pharmacy, Central University of “Las Villas”. Department of Organic Chemistry, University of Santiago de Compostela. ‡ §
Center has been focused on the microcidal potential of 2-furylethylenes. In particular, G-1 [E-1-bromo-1-nitro2-(5-bromo-fur-2-yl)ethylene] has shown a pronounced double effect (antifungicidal/antibacterial) (11). It was therefore of major interest for us to develop a general model for predicting the risk of agranulocytosis induction for a given molecular structure, with the major emphasis on the case of 2-furylethylenes. These compounds may present Z/E isomerism, and so, our QSTR study could not be based on classical topological indices (12). On the other hand, quantum mechanics-based descriptors are 3D specific but their use is also time-consuming (13). Moreover, comparative molecular field analysis-like methods are specific only for homologous series’ of compounds (14). These two techniques were therefore not considered for application in QSTR-based drugscreening programs that pretend to find drug hits from large and heterogeneous series of compounds. A very interesting development has recently been described in the use of 3D molecular similarity indices (15). Indeed, our efforts in this case were directed toward the use of 3D topological indices because these kinds of indices are timely and appear to be extremely versatile (16-18). In this sense, our research group has introduced the novel computer-aided drug design scheme MARCHINSIDE (Markovian chemicals in silico design). This approach is based on the theory of Markov chains (MCH) (19) and involves modeling the intramolecular movements of electrons with time after a specific fluctuation of their stationary distribution. The method essentially
10.1021/tx0256432 CCC: $25.00 © 2003 American Chemical Society Published on Web 09/11/2003
Prediction of Chemically Induced Agranulocytosis
codifies electronic structural information. This point of view was successfully applied to the experimental discovery of new fluckicidal drugs (20). The method is very flexible and makes possible the study of small molecules as well as macromolecules such as proteins (21). Interestingly, MARCH-INSIDE molecular descriptors can be generalized to allow the codification of 3D features (22). Another important point is the calculation of negentropies for the intramolecular movement of electrons. A number of successful indices, such as molecular negentropy, have previously been defined based on the concept of Shannon’s entropy from the point of view of information theory (23-26). In this context, the present paper deals with the formulation of novel 3D molecular descriptors in terms of a Shannon’s entropy-like approach using MCH for QSTR studies. Initially, it was necessary to define the novel molecular descriptors. A QSTR was then performed discriminating between agranulocytosis-causing chemicals and nontoxic ones. Second, the QSTR was validated by means of an external prediction series. Finally, the theoretical prediction of the harmless characteristics of G-1 in relation to chemically induced agranulocytosis was shown to agree with preliminary experimental assays.
Experimental Section Three-Dimensional Markovian Electron Delocalization Negentropies (MEDNEs). Katritzky and Douglas have recently published a summary of QSTR results (27) and the reader is referred to that article for an overview of this field. Our attention in this paper will be focused on describing the particular aspects of the present approach. The approach described here may be considered as a stochastic interpretation of the concept of molecular negentropy, as introduced by Kier for biological studies (25). The method is based on a simple model for the intramolecular movement of electrons from the point of view of MCH. Let us consider a hypothetical situation in which a set of atoms is free in space at an arbitrary initial time (t0). Alternatively, a more realistic situation can be imagined in which, after perturbation by some external factor, electrons reach a distribution that differs from that in the stationary state. One external factor that may be considered, for instance, is the perturbation of molecular electron “clouds” during the early stages of drug target docking. It certainly appears logical to accept that the return of electrons to their stationary position will depend on the specific molecular structure of each drug and, thus, may be related to biological effects. If we accept this situation, it would be of interest to develop a simple stochastic model for the return of electrons to the original position with time. It can also be supposed that after any of these two initial situations, electrons begin to distribute around atom cores in discrete intervals of time tk. Thus, by using MCH (19-22, 28) it is possible to calculate the probabilities with which electrons move around these atom cores in subsequent time intervals until they reach a stationary electron density distribution (see Figure 1). As depicted in Figure 1, such a model would deal with the calculation of the probabilities (kpij) with which electrons move from an arbitrary atom i at time t0 (in black) to other atoms j (in white) along discrete time periods tk (k ) 1, 2, 3...) and throughout the chemical-bonding system. Such a model is stochastic per se (probabilistic electron distribution in time) but in fact considers molecular connectivity (electron distribution in space throughout the chemical-bonding system). The selection of a Markov model is not arbitrary. It is wellknown from quantum physics that if electrons are labeled at an initial time, one cannot use these labels to distinguish
Chem. Res. Toxicol., Vol. 16, No. 10, 2003 1319
Figure 1. Diagrammatic representation of the stochastic electron distribution kinetic in a simple Markovian model for molecule formation. between them at subsequent times. This physical fact has been historically called the principle of indistinguishability of identical particles (29). Our MCH model will obey this principle because in MCH the occurrence of one event (in this case electron movement) does not depend on the history of the process. In other words, the type of model described here will not depend a priori on electron labeling. The present procedure considers as states of the MCH the external electron layers of any atom core in the molecule, i.e., the valence shell (30, 31). The method uses the matrix 1Π(ωk, Iχ ), which has the elements 1p . This matrix is built as a k ij squared table of order n, where n represents the number of atoms in the molecule. The numbers 1pij are the transition probabilities with which electrons move from atom i to j in the interval t1 ) 1 (considering t0 ) 0): I 1
p(ω)ij )
χj ‚ eωj
(1)
δ+1
∑
I
χk ‚ e
ωk
k)1
where Iχk is Pauling’s electronegativity of atom j, which is bonded to the atom i (30). The elements of the new matrix (1pij) are defined to codify information about the electronwithdrawing strength of atoms, i.e., the power to withdraw electrons from their neighbors in the molecule while considering 3D structural features. The 3D dummy variable ωk can take different values in order to codify specific stereochemical information such as chirality, Z/E isomerism, and so on (10). It is remarkable that 1pij is proportional to Iχj, the electronegativity of the atom that attracts the electrons. Conversely, the pij values are inversely related to the electronegativity of the atoms that “compete” with j to withdraw electrons from i. The selection of the Pauling scale (Iχj) or any other linearly related scale (IIχj ) a ‚ Iχj) such as Kier-Hall electronegativity (32) does not make any difference in this method. In fact, the present approach is invariant to the choice of electronegativity scale (22). It is also remarkable that in the approach described here, it is possible but not necessary to use electronegativity scales to distinguish between hybrid states of atoms in bonds (32). It can clearly be seen from Figure 2 that electrons will have a higher probability of returning to the sp carbon (0.313) than to the sp2 carbon (0.200) despite the use of the same electronegativity. As mentioned above, the present approach codifies 3D information by the introduction of a local exponential factor ωk (22):
1320 Chem. Res. Toxicol., Vol. 16, No. 10, 2003
Dı´az et al.
Figure 2. Definition and calculation of 1Π matrix for a specific case. The element symbol is used to denote the value of the elements electronegativity: F represents the electronegativity of fluorine χ(F).
{
wk ) wk ) 1
when k has R (rectus), E (entgegen), or a (axial) notation according to Cahn-Ingold-Prelog rules wk ) 0 if k does not have a 3D specific environment wk ) -1 when k has S (sinister), Z (zusammen), or e (equatorial) notation according to Cahn-Ingold-Prelog rules
Figure 3. Graphical representation of some hypothetical situations in order to facilitate the probabilistic interpretation of the Chiral MEDNEs approach.
(2)
The 3D factor exp(ωk) therefore takes values in the following order exp(1) > exp (0) > exp(-1) for atoms that have specific 3D environments. The chemical idea here is not that the attraction of electrons by an atom depends on their 3D environment but on their chirality. Experience shows that chirality does not change the electronegativities of atoms in the molecule in an isotropic environment in an observable way. The 1pij values are the conditional probabilities with which electrons move from atom i to atom j, given a priori that these atoms have specific 3D environments in the following order of preference: R ) E ) a > nonchiral ) no Z/E isomerism involved ) no a/e substitution > S ) Z ) e. This kind of interpretation is mainly probabilistic and must not be the source of any misunderstanding whatsoever (22). A very interesting point is that the present 3D descriptor reduces to simple MARCH-INSIDE ones for molecules without specific 3D characteristics because exp(0) ) 1 (22). The interpretation described above can be illustrated graphically using a hypothetical case of a chiral system (Figure 3). We will consider a specific atom class with three different labels: R (white), nonchiral (shaded), or S (black). As the atoms have the same electronegativity, we can state that in situations (a)-(f), electrons in atom 1 are attracted with the same strength by the cores 2, 3, and 4. We can then ask the question: what are the probabilities with which atoms 2, 3, and 4 are withdrawing electrons from atom 1? The answer is that the probabilities are identical. However, if the question concerns which electrons in atom 1 have a higher probability of moving to an atom with configuration R and lower probability to move to an atom with S configuration, a different answer is forthcoming. In this case, there is no doubt that these probabilities must be arranged in the following order: (a), (b) ... to (f), with situation (a) having the highest probability. Similar questions could be answered by employing 3D MARCH-INSIDE calculations, bear-
ing in mind the attraction of electrons by atoms at topological distances greater than 1 [as in (g), (h), and (i)]. The explanation outlined above is easily extended to include other cases such as a/e substitutions and Z/E or π-isomer differentiation. In addition, the stochastic nature of the present approach makes it possible to quantify molecular structure in terms of a new kind of molecular negentropy (MEDNEs) (24-26). Equation 1 represents the sum in the dominator over all of the atoms in the molecule (n) rather than the sum up to δ + 1 only. This new probability is called the 0 step absolute Markovian probability of electron delocalization for atom j (Aπ0(j)). In fact, these probabilities for each atom are normalized, i.e., Aπ0(j) values obey the normalization condition because their sum up to n is equal to 1 (19, 20-22, 28, 31).
χj ‚ eωj
A
π0(j) )
(3)
n
∑χ
k
‚e
ωk
k)1
By following the MCH theory formulation, the physical meaning of Aπ0(j) is very clear. Under the approximation that electronegativity describes the electron-withdrawing strength of atoms in the molecule, π0(j) represents the probabilities with which a specific atom j attracts any electrons in the molecule at t0. By analogy, Aπk(j) is the probability with which a specific atom j attracts any electrons in the molecule at any subsequent time tk or after travelling through all different paths composed of k bonds or less. The calculation of this value is simple from Chapman-Kolmogorov equations (19): A
Πk ) AΠ0 × (kΠ)
(4)
where AΠk are 1 × n vectors whose elements Aπk(j) are the probabilities found, AΠ0 is a 1 × n vector whose elements are the Aπ0 (j) probabilities for the n atoms in the molecule, and kΠ is the k-th natural power of the 1Π matrix. It is now very simple to calculate the k-th step MEDNEs, which represents the entropy involved in the attraction of electrons at least k steps
Prediction of Chemically Induced Agranulocytosis
Chem. Res. Toxicol., Vol. 16, No. 10, 2003 1321
Table 1. Calculation of kΠ, AΠk, and Θk(j) for Two Z/E Isomers When k Varies from 0 to 2 and j Is a Specific Atom or the Molecule as a Whole (SUM) (E)-HFCdN-Cl Xij
H
F
C
N
H F C N Cl
2.1 0 2.1 0 0
0 4 4 0 0
H F C N Cl
H 0.24 0 0.10 0 0
F 0 0.4 0.2 0 0
H F C N Cl
H 0.1 0.1 0.1 0.1 0
F 0.1 0.4 0.2 0.1 0
6.8 6.8 6.8 6.8 0 1Π(E) C 0.76 0.63 0.32 0.38 0 2Π(E) C 0.4 0.3 0.4 0.3 0.5
Xij
H
F
H F C N Cl
2.1 0 2.1 0 0
0 4 4 0 0
H F C N Cl
H 0.7 0 0.2 0 0
F 0 0.8 0.4 0 0
H F C N Cl
H 0.5 0 0.1 0 0
F 0.1 0.6 0.3 0 0
Cl
SUMj
0 0 8.2 8.2 8.2
0 0 0 3 3
8.9 10.8 21 17.9 11.2
N 0 0 0.39 0.45 0.73
Cl 0 0 0 0.17 0.27
SUM 1 1 1 1 1
N 0.3 0.1 0.3 0.3 0.4
Cl 0 0.1 0.1 0.2 0.1
SUM 1 1 1 1 1
C
N
Cl
0.9 0.9 0.9 0.9 0 1Π(Z) C 0.3 0.2 0.1 0.1 0 2Π(Z) C 0.2 0.3 0.4 0.4 0.5
0 0 2.8 2.8 2.8
0 0 0 3 3
3.02 4.92 9.82 6.72 5.8
N 0 0 0.3 0.4 0.5
Cl 0 0 0 0.4 0.5
SUM 1 1 1 1 1
N 0.1 0 0.2 0.3 0.3
Cl 0 0 0 0.3 0.3
SUM 1 1 1 1 1
H
F
C
N
Cl
SUM
AΠ 0
[0.09
0.17
0.28
0.34
0.12]
1
0.128
0.1787
Θ0(j) 0.21
0.22
0.156
Θ0 0.896
[0.049
0.115
0.391
0.355
0.09]
SUM 1
[0.088
0.1492
Θ1 0.22
0.22
0.13]
0.808
[0.071
0.1724
0.35
0.286
0.11]
SUM 1
[0.113
0.1817
Θ2 0.22
0.215
0.14]
0.879
Cl
SUM
AΠ 1
AΠ 2
(Z)-HFCdN-Cl SUMj
(bonds) away from any atom j in the molecule (Θk(j)). The sum of the Θk(j) values for all n atoms in the molecule gives the k-th MEDNEs used here as total molecular descriptors: n
Θk ) -
∑ Θ (j) ) -k ∑ [ π (j)] ‚ ln[ π (j)] k
B
F
C AΠ
0
[0.16
0.31
0.07
0.22
0.23]
1
[0.178
0.218
Θ0 0.11
0.199
0.204]
0.911
[0.13
0.28
AΠ 1 0.14
0.22
0.22]
SUM 1.00
[0.16
0.21
Θ1 0.17
0.20
0.20]
0.94
0.11
0.22
AΠ 2 0.37
0.16
0.14
SUM 1.00
0.15
0.20
Θ2 0.22
0.17
0.16
0.90
chemicals from nontoxic compounds. In this connection, we develop a discriminant function with the general formula:
AGRAN ) b + b0 Θ0 + b1 Θ1 + b2 Θ2 + ... + bkΘk
n
A
j)1
H
A
k
k
(6)
(5)
j)1
Boltzmann’s constant, kB, can be used, and this means that our molecular descriptors are measured in kJ/K. Table 1 illustrates the calculation of Θk for a pair of Z/E isomers, and these calculations can easily be extended to R/S or axial/ equatorial isomers. Computer Software. The calculation of Θk for any organic or inorganic molecule was carried out using the software MARCH-INSIDE (33). This software has a graphical interface that makes it user friendly for medicinal chemists. The chemical structure is input directly using the molecular graphics in the software draw mode. Clicking on an upper dynamic array, which contains the labels for the different groups of the Periodic table, enables selection of the active atom symbol. It is then possible to select the calculation option and perform the calculation of molecular indices. Statistical Analysis. As a continuation of the previous sections, we can attempt to develop a simple linear QSAR using the MEDNEs methodology. We selected linear discriminant analysis (LDA) (34, 35) to fit the classification functions. The model deals with the discrimination of agranulocytosis-causing
Here, Θk are the molecular descriptors codifying molecular structure. In total, we used 16 Θk (from Θ0 to Θ15) as input. The activity (output of the model) was codified by a dummy variable (AGRAN). This variable indicates either the presence (AGRAN ) 1) or the absence (AGRAN ) 0) of chemically induced agranulocytosis. In eq 6, bk represents the coefficients of the classification function, determined by the least-squares method as implemented in the LDA modulus of STATISTICA 6.0. Backward stepwise was fixed as the strategy for variable selection (35). Previously, we used tree joining cluster analysis (TJCA) (36) to design the training and predicting sets. Single distance was used as the clustering method, and Euclidean distance was selected as the linkage distance. The quality of the model was determined by examining Wilk’s λ statistic, Mahalanobis squared distance (D2), Fisher ratio (F), and the p level (p). We also inspected the percentage of good classification and the ratios between the cases and the variables in the equation and variables to be explored in order to avoid overfitting or chance correlation. Validation of the model was corroborated by means of an external prediction set; these compounds were never used to develop the classification function (35).
1322 Chem. Res. Toxicol., Vol. 16, No. 10, 2003
Dı´az et al.
Figure 4. Random, but not exhaustive, sample of the molecular families of compounds studied here.
Figure 5. TJCA used to design training and predicting series of agranulocytosis-causing (I) and innocuous (II) chemicals.
Results and Discussion
major types of drug-induced blood dyscracias (DIBD) as being hemolytic anemia, thrombocytopenia, aplastic anemia, and agranulocytosis/neutropenia. Agranulocytosis is defined as a severe form of neutropenia with a total granulocyte count