An Approach to Molecular Similarity using Density Functional Theory

Similarity Based on Atomic Electrostatic Potential in Heterocyclic Molecules: ... Identification of Active Molecular Sites Using Quantum-Self-Similari...
6 downloads 0 Views 406KB Size
J. Phys. Chem. 1994,98, 1135-1 138

1135

An Approach to Molecular Similarity Using Density Functional Theory Chengteh Lee' and Shep Smithline Cray Research, Inc., 655E Lone Oak Drive, Eagan, Minnesota 55121 Received: September 27, 1993"

This paper describes an approach to molecular similarity using a Gaussian expansion for the density which simplifies the calculation of the Carbo-type molecular similarity index. In addition, we introduce the notion of a local similarity index. The local index uses the fragment density of a molecule to compute the index and allows parts of molecules to be compared to one another. Our approach to similarity is tested on a variety of small test cases; benzene and substituted benzenes, water, the water dimer, and complexes such as HzO-F- and H20-Cl-. The results of these calculations are in qualitative agreement with previous calculations, and one's general intuition of chemical similarity. Finally, we point out that the results obtained here suggest that our approach can be applied to more complex systems to study questions involving biological activity and the binding of inhibitors to proteins.

I. Introduction

most of the computations using the Carbo index have been based on semiempirical calculations such as CNDO method^.^^^ The notion of chemical similarity1-the idea that compounds In this note we do not employ the CNDO method, but instead which share some macroscopic property have an underlying use the density functional program Dgauss, which is part of the microscopic similarity-is an old concept in chemistry. However, UniChem molecular modeling system?JO DGauss uses an it is only very recently that the concept has been put on theoretical auxiliary Gaussian basis to represent the chargedensity,*1-'3which standing,24 showing that it is possible to quantitatively define an reduces the computational effort of our similarity calculations, index, or more precisely a functional, of the electrostatic potential, allowingeq 1 to be written as a sum of simple products of Gaussian density, or wave function. Such an index can be used to predict functions. a variety of important properties, including biological a ~ t i v i t y , ~ , ~ In addition, we also use the fragment density to compute the chemical reactivity,' and the chemical properties we describe in similarity index of eq 1. The fragment density is the density this study. arising from the orbitals centered on a subset of atoms in the There are several definitions for molecular similarity have been molecule, and use of this density allows us to compare a fragment proposed in the literature. These include the following: NOEL, of parts of one molecule to one another. The paper is organized number of the overlapping electrons, which is defined using the as follows: section I1 describes the computational methods while difference between the first-order density matrix of two molecules? section 111tests the methods on a series of simple examples. We and the Carbo2 and Hodgkin4 indices, which are defined using conclude our paper in section IV. overlap of the electronic densities2J or electrostatic potentials6,* of molecules. II. Methods: Global Similarity and Local Similarity In this work we use the definition of overlapping the electron The molecular densities used in these chemical similarity densities for similarity index, which was proposed by Carbo, calculations are generated from DG~USS,~JO a Gaussian-based Leyda, and Arnau.2 density functional program. In this sort of density functional scheme, the molecular density is fitted with auxiliary basis J P A ( ~ ) P B ( ~ )dr functions in order to solve efficiently the electrostatic Coulomb S= (1) c o n t r i b u t i ~ n s . ~ ~Thus - l ~ by expanding the density in a linear [ J P A w d r J P B w drl combination of Gaussian functions where pA(r) and p&) are the molecular densities of molecule A and B. The index S of eq (1) is maximized with respect to the (3) relative molecular orientations of the two molecules, A and B. This definition is related to one's intuitive notion of chemical where gk(r) are Guassian functions and Ck are coefficients similarity. That is, two densities are most similar when their determined variationally in the KohnSham c a l c ~ l a t i o n ,the ~~ mean square difference is minimized: global similarity is computed as n

It is clear that the configuration that minimizes mean square difference is the same configuration that maximizes the Carbo index. Note that the denominator of the eq 1 is used only as a normalization factor which ensures that the Carbo index equals 1 when two molecules are identical. The computation of the Carbo index depends on the basis set used to construct the molecular densities, making the index itself more difficult to compute the larger the system. As a result, @

Abstract published in Aduance ACS Abstracts. January 1, 1994.

0022-365419412098-1 135$04.50/0

Here the index i indicates the Gaussian functions are taken from molecule A and j indicates that the Guassian functions are from molecule B. The similarity expression of eq 4 depends only on a two-center overlap of Gaussian functions and can be computed analytically. Having defined the similarity index in eq 4, one can maximize the index with respect to the six degrees of freedom corresponding to translation and rotation of one molecule relative to another. 0 1994 American Chemical Society

1136 The Journal of Physical Chemistry, Vol. 98, No. 4, 1994

Equation 4 was maximized using a simple bisection algorithm, the value at the maximum being defined as the Carbo index. The initial configuration for the bisection algorithm was chosen by minimizing the distance between the planes defined by three atoms on each molecule. Since the Carbo similarity exhibits many local maxima, the value of the index depends sensitively on the initial configuration. To address this issue, one needs a global maximization algorithm to determine the maximum of the similarity index. One such approach is simulated annealing, and work is underway to implement this for molecular ~imi1arity.I~ With a formulation similar to the index defined in eq 1, we propose a local similarity, SL, in terms of the fragment densities of the molecules, p&) and p&)

Lee and Smithline

: li'

0.6365

s,

dr JPgW dr

=

1: /

1

0.999

s 0.636

0.6355

JP&)

i

I

(5) [Jp,2(r) dr JP&)

drl 1'2

where the fragment density satisfies

0.635

0.996 1

2

3

4

R

a

where the superposition of the fragment densities is the molecular density. The definition of eq 5 is not restricted to comparing the similarity between two fragments from two different molecules; we also can measure the similarity between two regions in a single molecule. The construction of the fragment density is not unique. In these studies, we generated fragment densities by rewriting eq 6 as a sum of the Gaussian functions located on each atom in the molecule

where patom(r)is the atomic density obtained from the summation of the Gaussian functions located at the atom. Then the fragment density pa@), is constructed from the atomic density, patom(r),in the fragment a atom

As with the global similarity measure, the local similarity is maximized using a simple bisection algorithm. 111. Results and Discussion

The densities we used for the computation of the similarity index are obtained from local density functional approximation with the exchange-correlation functional of Vosko, Wilk, and Nusair.I6 We used double-{ plus polarized functions as the molecular basis set and DZVP or DZVP2 and A1 or A2 auxiliary functions for the fitting of the electron density in the KohnSham self-consistent calculation. The details of ~e basis set and auxiliary functions see ref 1 1. The similarity index is maximized with respect to the six degrees of freedom corresponding to translation and rotation of one molecule relative to another. Equation 4 is used for the calculation of the global similarity, and equ 5 for the local similarity index. The global and local similarity indices between benzene molecule and the substituted benzenes, aniline, nitrobenzene, and nitroaniline are given in Table 1. The DZVP basis set was used in the KohnSham self-consistent calculation, and A1 auxiliary functions were used for densuity fitting. The global similarity index is the index between the benzene molecule and other molecules, while the local similarity index is obtained from the overlap of the carbon ring charge densities in benzene and substituted benzene. The local similarity index allows us to see how external perturbations, such as the NO;! group in nitrobenzene or NH2 group in aniline, affect the electronic density arising

Figure 1. Global (solid line) and the local (dash line) similarity of the H20-F- as function of the distance R.

TABLE 1: Global and Local Similarities of the Benzene and Other Molecules benzene benzene (ring only) benzene 1.oooo 1.oooo aniline 0.8782 0.9898 nitrobenzene 0.6779 0.9981 p-nitroaniline 0.6422 0.9929 m-nitroaniline 0.6261 0.9673 o-nitroaniline 0.6291 0.9723 from the conjugated carbon atoms of the benzne ring. The symmetry of the perturbations plays an important role as the perturbation resulting from p-nitroaniline is different from m-nitroaniline. A number of important results are apparent from Table 1. First, note that the global similarity of benzene compared to benzene is 1, as it should be when identical molecules are compared. Second, the similarity index in Table 1 displays the same trend as the NOEL indices which was proposed by Cioslowski and Flei~chmann.~ We both observe the p-nitroaniline has the highest global similarity, followed by m-and o-nitroaniline. This is not surprising since in the para form the groups are separated by 180' whereas they are separated by a smaller angle in the meta and ortho structures. Since the perturbation of the groups is more nearly symmetrical in the para structure than in the other anilines, the resulting perturbation on the electronic density of its carbon atoms is smaller for the para form and consequently the similarity index is larger. Another important observation from Table 1 is that the local similarity indices of the substituted benzenes are nearly 1 in contrast to the global similarity indices. Since the localsimilarity is computed from the density on the atoms in the carbon ring, a local similarity of nearly 1 means that the chargedensity change in the carbon atoms of the substituted benzenes due to the substituent is small. We also used our similarity program to study water systems. Figure 1 shows the global and local similarity between H20 and H20-F- as a function of the distance between the water molecule and F-. This curve was determined by carrying out geometry optimization for an isolated water molecule and then performing a self-consistent energy calculation without geometry optimization for the system formed by placing F- at a distance R along the OH bond vector of the water molecule. Note that the global similarity index shows that a maximum similarity of the H20 and H20-F- occurs around at a distance of 0.9 A. As expected, both of the indices go to constant value at large distance R.

The Journal of Physical Chemistry, Vol. 98, No. 4, 1994 1137

An Approach to Molecular Similarity

/" -H ,- - -- - x

0

X = F, Ne, CI', Ar, H,O

R

0.1384

Figure 2. Similarity calculated in terms of the distance R. 0 . 0 6 8 ~I

I

1 1 ,

I

,

, ,

,

I

,

,

I

,

I

I

0

,

-0.005

-0.0675

I !

0.1383

1

/

-0.02

/

/ /

/

AS

PE 0.1382 -0.01

I

/

0.066

0.0655

-1

1

4

I,,,, ,

1

,

0.1381

-0.06 AE

-0.08

i

,

, I

, , , ,

2

I

3

, , , ,

I

, , , ,

4

1

I

-0.1

I I I

5

R

Figure 3. Similarity differenceAS (solid line) and the dissociationenergy (AE (dashed line) for the H20-F- complex molecule.

In addition, we also investigated the difference of similarity indices which, allows us to study the differences of the charge density in these systems. Figure 2 shows the systems studied. By computing the similarity between H2O and H20X, where X is an atom or molecule and substracting the similarity between H20 and HzOY where Y is a noble gas atom with the same number of electrons as X, the difference in the charge density can be quantified. Since the total number of electrons is the same in both systems and the degree of charge transfer is minimal in H20Y, the difference is a suggestive measure of charge transfer. Thus, we compute the difference,

0.0238 I

L

0.0236

I

1 1

(9)

x = Cl-

where S(H20X,H20) and S(H20Y,H20) are the similarity indices computed for H2O and H2OX and H20 and H20Y, respectively, such that X and Y are at a distance R along the OH bond vector of water. It is clear that by subtraction of the noble gas index, the difference of the similarity is due solely to the difference of the nuclear charge. The difference of the global similarity index is shown in Figures 3-5. For the calculations, we used DZVP2 as molecular basis function and A2 auxiliary functions for the electron density fitting in the Kohn-Sham calculations. The similarity index was calculated from the overlap of the densities of the H20 and H2OX(Y) molecules by eq 4;then the indices were used to calculate the difference of the similarity indices, AS, according to eq 9. Figure 3 shows that the difference of theglobal similarity index of H20-F and H20 is plotted as a function of the distance R. The dissociation energy of the complex, HzO-F-, AE = E(H20-F-) - E(H20) - E(F-), is also shown in the figure. The larger the difference of the similarity indices, the larger of the distortion of the charge density is from that of the isolated water molecule. This distortion of the charge density suggests that the electron transfer occurs in the complex. In present case, the maximum difference of the similarity indices occurs near the minimum of the dissociation energy curve, where the system is most stable. The difference of the global similarity indices for the water dimer case is shown in Figure 4. The results are similar to those

/

/ /

/

1

/

/ /

-0.02AE

I

i I

-0.03

1

AS = S ( H 2 0 X , H 2 0 )- S(H,OY,H,O) X=FY = Ne Y = Ne x = H20 Y = Ar

-0.01

/

/

I'

0.0232

,, ,,

0.023

\ \

I

2

/

4

3

\,

5

R

Figure 5. SimilaritydifferenceAS (solid line) and the dissociationenergy AE (dashed line) for the H20-CI- complex molecule. H

Water A

H' Water B

Figure 6. Notations for the water dimer in local similarity calculations.

for the H20-F- system as the maximum of the similarity index occurs approximately a t the minimum of the energy curve. Figure 5 shows that the similarity difference calculation of H2O-Cand H20 is not correlated with the minimum energy curve suggesting that the minimum energy configuration arises from interactions other than purely electrostatic terms. We also tested the local similarity index for the water dimer. The notation and structure for the water dimer are given in Figure 6. By using the local similarity index defined in eq 5 to calculate the similarity of an individual water in the water dimer with an isolated water molecule, we can determine which water of the dimer pair is most similar to the isolated water. The local similarity index of each of the A and B waters at different distances is plotted in Figure 7. We can see from Figure 7 that water B has a larger local similarity at all distances R . The larger local similarity of water

Lee and Smithline

1138 The Journal of Physical Chemistry, Vol. 98, No. 4, 1994 1

I

I

I

/

r

1

,

: -8-

--

-r

-

-I

I

i 0.99996

0.99994

1

1.5

2

2.5

3

R

Figure7. Local similarity of the water A (solid line) and water B (dashed line) in the water dimer.

B can be explained by recognizing that in the formation of a hydrogen bond an electron is transferred from the hydrogen atom in the water molecule A to the oxygen atom in the water molecule B. Clearly, the percentage of the charge density change in the oxygen atom in water B is relatively small compared to the percentage of the charge density change in the hydrogen atom in water A. Therefore, the electron density of water B in the water dimer of Figure 6 is more similar to the electron density of the isolated water than water A is to the isolated water, and consequently water molecule B has larger local similarity than water A. IV. Concluding Remarks We have calculated the global and local similarity index in terms using the Carbo index. The local similarity index provides us with a method to investigate the similarity between a chemically interesting region of a large molecule and other molecules or other fragments. This allows us to understand how the chemical environment effects a particular local region of interest. One

possible application of local similarity is to an inhibitor-protein systems. By computing the local similarity using the fragment densities of a series of inhibitors bound to a protein, one can investigate why one inhibitor might be a better inhibitor than another. In addition, the success of the difference similarity to qualitatively measure charge transfer suggests that the notion of subtracting to similarity indices can be used to study other properties besides charge transfer. By introduction of a reference system which does not have some property (e.g. charge transfer) and subtracting it from a target system which is believed to have it, the degree to which the target system has this property can be determined. In summary, the test calculations presented in this note indicate that global and local similarity should become a useful tool for a variety of problems in computational chemistry, and we hope to use our tool to study some of these molecular modeling problems in the future.

Acknowledgment. We would like to thank the management at Cray Research, Inc., for supporting this work. The authors thank Dr. Ilene Carpenter for helpful discussions. References and Notes (1) Heller, S. R. J . Chem. InJ Compur. Sci. 1993, 32, 578. (2) Carbo, R.; Leyda, L.; Arnau, M. Inr. J . Quantum Chem. 1980,17, 1185. (3) Cioslowski, J.; Fleischmann, E. D. J. Am. Chem. SOC.1991,112,64. (4) Hodgkin, E. E. Richards, W. G. Inr. J . Quantum Chem. Biol. Symp. 1987, 14, 105. (5) Carbo, R.; Calabuig, B. J . Chem. InJ Compur. Sci. 1993,32, 600. (6) Burt, C.; Richards, W. G. Huxley, P. J . Comput. Chem. 1990,11, 1139. (7) Ponec, R.; Stmad, M. J . Chem. InJ Compur. Sci. 1993, 32,693. (8) Richard, A. M.J . Compur. Chem. 1991, 12,959. (9) UniChem is a computational chemistry software package available from Cray Research. (10) DGauss is a software product available from Cray Research, Inc. For details of DGauss methodology, see ref 1 1 . (11) Andzelm, J.; Wimmer, E. J . Chem. Phys. 1992,96, 1280. (12) Sambe, H.; Felton, R. H. J . Chem. Phys. 1974, 61, 3862. (13) Dunlap, B. I.; Connolly, J. W. D.; Sabin, J. R. J. Chem. Phys. 1979, 71, 3396, 4993. (14) Kohn, W.; Sham, L. J. Phys. Rev. A 1965, 140, 1133. (15) Laird, B.; Lee, C.; Smithline, S. Work in progress. (16) Vosko, S. J.; Wilk, L.; Nusair, M. Can. J . Phys. 1980, 58, 1200.