A Gaussian Description of Molecular Shape - American Chemical

Dec 28, 1994 - Expressions for the volume and surface area of this Gaussian ..... also given the Brookhaven entry code and the number of .... 1993, 14...
1 downloads 0 Views 888KB Size
J. Phys. Chem. 1995, 99, 3503-3510

3503

A Gaussian Description of Molecular Shape J. A. Grant* Zeneca Pharmaceuticals, Mereside, Macclesfield, Cheshire SKlO 4TF, England

B. T. Pickup Centre for Molecular Materials, Department of Chemistry, The University of Shefield, Shefield S3 7HF, England Received: October 11, 1994; In Final Form: December 28, 1994@

A simple description of molecular shape is presented in which overlapping atom-centered Gaussians replace the more conventional intersecting hard spheres. Expressions for the volume and surface area of this Gaussian model are obtained, as are the gradient and Hessian of the nuclear coordinate derivatives. The quantities calculated using the Gaussian model are compared to their hard-sphere analogues, for simple model systems, small molecules, and some macromolecules (proteins).

1. Introduction The shape of a molecule and quantities derived from it can be used to interpret some features of their physical properties. For example, in the simplest continuum models of solute-solvent interactions are described in terms of atomic contributions that are proportional to either accessible surface areas or volumes. Simple regression models6-' make use of area and volume to obtain descriptions of solution properties such as solubility, partition coefficients, and rate constants. More sophisticated solvation models, based on solution of the Poisson-Boltzmann e q ~ a t i o n ,also ~ , ~ rely on molecular shape in order to describe the boundary between regions of different dielectric constant. The shape of a molecule partly dictates whether it is able to bind to the active site of a protein or an intercalative site of DNA. Shape comparisonlo can be helpful in identifying molecules with similar spatial (steric) properties but belonging to different structural classes. It is hoped that such comparisons can lead to alternative phannacophore models in the design of ligands to bind to receptor sites.11s12Mezey13 has recently presented a rigorous mathematical discussion of these and other concepts relating to molecular shape. This book offers a very broad viewpoint and contains innovative techniques which include a combination of quantum chemical information with topology. The most widely used method of describing molecular shape is to treat a molecule as a collection of atom-centered intersecting spheres of differing radii. Such an approach gives rise to three different kinds of surfaces. The van der Waals surface is simply the surface formed from the atom-centered intersecting spheres. In the case of large macromolecular structures such as globular proteins, a large fraction of the van der Waals surface belongs to the protein interior and is inaccessible to bulk solvent. To account for this, alternative definitions of molecular shape have been proposed that make use of a finite-sized spherical probe, to filter out area and volume contributions from the interior of macromolecules that would be inaccessible to bulk solvent. The solvent-accessible surface14 is defined as the surface traced by the locus of a sphere rolled over the van der Waals surface. The radius of the sphere is chosen to approximate that of a water molecule. This surface is displaced outward from the van der Waals surface by a distance equal to

* To whom correspondence should be addressed.

e Abstract published in Advance ACS Absrrucrs, February 15, 1995.

the radius of the probe sphere. Finally, the molecular surface15 is defined to be the smooth surface traced by the inward-facing part of the probe as it rolls over the set of atom-centered intersecting spheres. Unlike the van der Waals surface or the solvent-accessible surface, the molecular surface has a continuously defined surface normal. The convex faces of this surface, made when at any one time only one atom in the molecule touches the probe, correspond exactly with the van der Waals surface. However, when the probe is simultaneously in contact with more than one atom, tracing the path of the inward-facing surface of the probe gives rise to a smooth reentrant surface. There are many analytical and numerical algorithms for the calculation of the areas or volumes of these surfaces and their graphical representation. Perhaps the most important of these algorithms is due to Connolly. It gives rise to the analytic molecular surface area16J7and molecular volume computation1* and is sufficiently general to enable the easy computation of the van der Waals and solvent-accessible surfaces. This algorithm very accurately represents the molecular surface but does not account fully for situations in which the molecular surface intersects itself, leading to cusps, which result in small errors in computed volumes and areas. A similar third-order algorithm for calculating analytical areas and volumes of the solvent-accessible surface, which also presented the equations of the nuclear coordinate derivatives, has also been described by R i ~ h m 0 n d . l ~ The algorithm of Connolly has been used4 to calculate solvent-accessible atomic areas and their nuclear derivatives, as part of a solvation function that was incorporated into energy minimization calculations of peptides and small proteins. This work developed some important protein solvation models and demonstrated the value of using such models to distinguish native from near-native conformations. However, one difficulty encountered was the speed of computation of the surface and its nuclear gradient. An extremely elegant solution to this problem was the development by Perrot et a1.20*21 of a new analytical algorithm to compute solvent-accessible atomic areas and their nuclear coordinate derivatives. This very fast algorithm (which also scales almost linearly with respect to the number of atoms) recursively searches the exterior surface of a molecule to locate all vertices and their connecting edges. This is achieved without considering any of the large number of hardsphere intersections that are not part of the exterior surface, a

0022-3654/95/2099-3503$09.00/0 0 1995 American Chemical Society

Grant and Pickup

3504 J. Phys. Chem., Vol. 99, No. 11, I995 time-consuming component of the Connolly algorithm. There are a couple of difficulties with this algorithm. Firstly, the method can only obtain the exterior surface of a protein, though for the empirical solvation models frequently used it is uncertain whether this is a disadvantage because the way in which an interior protein surface is treated by such models is obscure. The algorithm will occasionally fail if the surface consists of domains connected by free saddles, that is, a piece of surface along which the probe can roll freely in contact around two atoms without bumping into another atom. Finally, some instabilities in the gradient of the algorithm have been observed that lead to difficulties during the search for local minima. A very detailed mathematical treatment and discussion of discontinuities in the nuclear gradient of the molecular surface area has been given by Wawak et a1.22 A solvent-accessible area solvation model has also been incorporated into energy minimization calculations using a corrected version of Richmond’s original m e t h ~ d . ~ ~ , ~ ~ The Connolly algorithm is the basis of an analytical technique to compare the shapes of molecules by computation of their intersection volumes.11 A similar method12 leads to a quantitative measure of similarity in molecular surfaces, by assigning such surfaces a finite thickness, a molecular skin, and computing the intersection volumes between these skins. The method expresses the intersection volume as a sum and difference of different union volumes and as such can only make use of purely convex surface pieces, namely, those of the van der Waals surface or the solvent-accessible surface. Although very accurate, the Connolly-based analytical algorithms for computing volumes and area are not exact solutions for describing the fused hard-sphere model, because only sphere intersections up to third order are explicitly treated. More rigorous methods of volume and surface area computation (for van der Waals and solvent-accessible surfaces) have also been d e ~ c r i b e d . ~These ~ . ~ ~ methods use the inclusion-exclusion principle, combined with analytical treatments of fourth-order intersections. Higher order intersections are included by using theorems to reexpress such contributions in terms of lower order terms, starting from fourth order. The more detailed method of Petitjean26includes a calculation in which all intersections up to order 13 have been explicitly calculated. Currently, analytical derivatives are not available for either of these methods, and it is not clear if they are fast enough for calculations involving macromolecular systems. The intersecting sphere model is not the only representation of molecular shape. It is also possible to extract some features of molecular shape from ab initio wave functions, for example, defining the surface to be an isoelectron density contour. Alternatively, it is possible to approximate simply the ab initio electron density function as a superposition of atom-centered Gaussians. This description of molecular shape has been used by Mezey et as part of a method to characterize the shapes of molecular charge distributions and to enable the comparison of shapes within a given series of molecules. A similar representation of molecular shape has also been used to evaluate the Carbo similarity A related a p p r o a ~ h ~ ~ isvto ~l construct ab initio quality charge distributions for large molecules from the charge densities of small molecular fragments. The densities constructed in this manner could then be used to analyze molecular shape, including computation of volume and area quantities. A two-parameter atomic Gaussian function has been used to describe the exposure of atoms and molecular fragments to solvent.32 A simple integral function of these (off-center) atomic Gaussians is used to define a Gaussian neighborhood which

behaves in a complementary fashion to the conventional definition of solvent accessibility, viz., the smaller the Gaussian neighborhood, the more exposed the atom and hence the larger its accessible area. This method was used to analyze elements of secondary structure in terms of solvation behavior, though it was not used as part of a potential energy minimization procedure. A very similar description of the shape of atoms in molecules, in terms of an occupancy function defined by multiplying a fragmental volume by a Gaussian envelope, has been used to incorporate a continuum solvation contribution into molecular dynamics simulations of a small protein C ~ n n o l l has y ~ ~used methods taken from algebraic topology to give a general technique for computing molecular volumes using the adjoint join formula. This technique provides a method for decomposing volumes into simpler component volumes. It does not provide a clear way to compute derivatives to arbitrarily high order. A hard-sphere methodology is not, therefore, definitive as an approach to molecular shape. The hard-sphere concept is probably best understood in terms of the space occupied by the electrons in the molecule. That is, an appropriate isosurface contour of the electron density can be represented by a set of intersecting atom-centered spheres. Given that the hard-sphere technology has many technical difficulties and is not theoretically well founded, it seems sensible to seek a more convenient representation of molecular shape. The method described in this paper uses the Gaussian technology adopted in quantum chemistry to represent atomic orbitals.35 We replace the hardsphere function

where r is a radial coordinate with a Gaussian

g(r) = exp(-a?) which has a rapid, smooth falloff with distance. The advantages of Gaussians lie in the (1) ease of integration, (2) coalescence properties of products of Gaussians, and (3) ease of finding derivatives with respect to nuclear position. Our motive for adopting Gaussian representations is to make use of these advantages, while obviating some of the problems encountered using hard spheres, for example the gradient discontinuities. Using the Gaussian methodology, it is hoped that it will be possible to introduce molecular shapes into molecular mechanics in a very efficient manner and provide a rapid technique for comparing the shapes of molecules. The next section presents a set of very simple formulas to compute areas and volumes using Gaussian functions and the first and second derivatives of these quantities with respect to the nuclear coordinates. The results of the calculations using these equations are discussed for some simple model systems, small molecules, and large macromolecules (proteins).

2. Volume of a Multicenter4 Molecule 2.1. Hard-Sphere Volumes. The volume and surface area of a collection of intersecting hard spheres (a “molecule”) can be calculated from the formula

where we follow the convention of Gibson and S ~ h e r a g asuch ,~~ that the i refers to individual spheres at points Ri = (Xi, Yi, Zi)

Gaussian Description of Molecular Shape

J. Phys. Chem., Vol. 99,No. 11, I995 3505

and mhs indicates a measure (volume, V, or surface area, A). The alternating series of positive and negative terms arises because of the need to take into account double, triple, and higher order intersections between the spheres. In principle, the hard-sphere volume can be obtained by integrating a “matter” density which for a single atom of radius ai is given by

&ri) = e(ai- ri>

(2)

where 8 is the unit step function: (3)

where ai is a “sphere radius” and Ki is a dimensionless constant. The Gaussian density

is such a representation, although it has to be acknowledged that the distance dependence is too strong in comparison with the true exponential variation of electron densities and a Gaussian has a flat top unlike the cusped electron density. Our philosophy in this work is merely that we are introducing a matter density with appropriate technical advantages which will lead to rapid accurate and analytical algorithms for working out molecular volumes, areas, and connected quantities. To this end, we would like to define an atomic Gaussian volume as

The local radial variable ri is defined with respect to the atom center as origin, viz.,

ri = lril = Ir - Ril

(4) We can ensure that

The hard-sphere volume of atom i is

F = f d r i ey(ri)= 4n -ai

3

3

provided we define

The hard-sphere density for the whole molecule is now defined so as to reproduce eq 1

K.

n =-

and set p h i = 4n -

3

and the molecular volume is The two parameters pi, and Ai, subject to condition 15, reproduce the hard-sphere volume independently of the sphere radius (ai). The use of a large pi gives a small Ai and hence a tight (rapidly decreasing) but tall Gaussian. The use of a small pi, on the other hand, gives a short diffuse (slowly decreasing) Gaussian. We can also enquire how much of the density lies inside the sphere radius. Hence,

The nth order intersection volume is defined as

The practical evaluation of higher order intersections in (8) is time-consuming and prone to numerical errors. The matter density referred to above in (2) gives the conventional hardsphere representation of molecular shape. A more reasonable definition would be based upon the distance variation of observed electron densities in atoms and molecules. The electron density decays exponentially with distance from the molecular nuclei. For this reason, it seems sensible to replace the hard-sphere model by a representation possessing such an appropriate distance dependence. 2.2. Gaussian Densities. We consider an atomic center i, with coordinates Ri, and define a spherical Gaussian of weight Pi as gi = p i exp( -airi2)

(9)

where ri is a local radial coordinate (eq 4) in terms of the global coordinate system. The constant pi is the value of the Gaussian at its maximum on center Ri. Our intention is to replace the concept of a “hard-sphere” volume by a soft Gaussian representation. To that end, it is convenient to replace the exponent ai controlling the rate of decay by a scaled exponent K:

ai= -2 ai

v‘“ = 2 n f d r i

rfgi(ri)

where the subsidiary function

The quality V“ is not used in what follows but serves to emphasize that the definition of the radius in a soft-sphere model is fundamentally different from that of the hard-sphere analogue. 2.3. Gaussian Volumes. We can now define a Gaussian version of (6) in which we replace the hard-sphere matter density by the Gaussian version based upon (11)

The Gaussian volume of the “molecule” is

v8 = j d r eg(r)

(19)

In order to calculate the quantities appearing in (18) and (19), we must be able to simplify the product terms representing

Grant and Pickup

3506 J. Phys. Chem., Vol. 99, No. 11, 1995 intersection volumes. This problem is solved using the Gaussian product theorem described in the next subsection. 2.4. Gaussian Product Theorem. We will consider a set of n Gaussians gi (i = 1, 2, ..., n) each with exponent ai and situated at points Ri as in (9). We now define a product coalescence center via

volume. Hence, taking the derivative of (27) with respect to the pth Cartesian coordinate of center i (in what follows, p and q index the components n,y,z), we find that

where the vectors

where the total exponent

define center i with respect to the coalescence center as origin. The second derivatives (the Hessian matrix) can be written as

We also define a product factor

-2aiq2,,,n{4,dpq - up:,:> i j c ( 1 2 ...n} otherwise (31) where where the inter-Gaussian distance R, = [(Xi- Xj)' -I- (Y;-

q)2-I- (2;- ZJ']"'

(23)

The Gaussian product theorem allows us to write a product of n Gaussians, each on different centers, as a single Gaussian localized at the coalescence center (eq 20). Hence,

n2)

5 2...

(24)

i= 1

The weight factor in (24) is defined as

and the radial coordinate for the product Gaussian is

The theorem was first introduced into quantum chemistry by Boys35for two Gaussians. We have introduced explicit formulas (eqs 20-22) for computing the generalized coalescence for n Gaussians (checked by the authors using mathematical induction). Since each intersection term in (18) can be written as a single Gaussian, it follows that the whole expression can be integrated exactly and that an analytical formula exists for the volume. For n Gaussians gl, g2, ..., gn, we can write the nth order intersection volume as

Formulas 28 and 31 give rise to an elegant and efficient algorithm for evaluation of volume gradients. This is because once the volume has been computed there are no new terms in the gradient expressions other than the computation of a simple vector difference involving the atom centers and the previously computed coalescence center (Rl2.-"). 2.6. Gaussian Areas. K r a t k ~observed ~~ that the surface area of the intersections of any number of spheres with equal radii u was related to the volume of intersection by

This formula was extended by Gibson and Scheraga to cover the case of N intersecting spheres of unequal radii. If the spheres have m different radii uk (1 Ik Im),then the area is given by (34) The total area S of N intersecting spheres can also be expressed in terms of contributions from the kth sphere according to N -

=

The simplicity of this formula leads to a very efficient computational algorithm. 2.5, Derivatives of the Gaussian Volume. For any practical optimization method in which molecular shape is part of the function being minimized or maximized, the gradient, and preferably the Hessian of the molecular shape, is required. The gradient referred to is the gradient with respect to nuclear coordinates. The simplicity of function 27 for the Gaussian volume ensures that the nuclear coordinate derivatives of (27) are almost as simple. In the following, we shall present equations for the derivatives of a general n-fold intersection

(35)

k=l

av

The latter two expressions describe a convenient partitioning of the total surface area into atom contributions. This is important for simple solvation models, in which the total solvation function depends linearly on the area of each atom. In order to obtain the surface area for a set of overlapping Gaussians, it is necessary to sum the radii derivatives of the Gaussian volume intersection, eq 7; that is,

Gaussian Description of Molecular Shape

J. Phys. Chem., Vol. 99, No. 11, 1995 3507

70

110

1

85

'

I

4.5

5

1W

Bo 90 55

80

50

7 2

2 5

1

F i

45

4

g

0

70

60

35 50

30 40 25

20 15

30

__I'

0.5

1

1.5

2 2.5 3 dimer separaHon (A)

3.5

4

4.5

5

20

0.5

1

1.5

2 2.5 3 dlmr separation (A)

3.5

4

Figure 1. Gaussian volume vs dimer separation (radius = 2.00 A). Each dotted line represents a different p J parameterization of the Gaussians (see text). For comparison, the analytical hard-sphere dimer volume is plotted as a diamond.

Figure 2. Gaussian area vs dimer separation (radius = 2.00 A), Each dotted line represents a differentp J parameterization of the Gaussians (see text). For comparison, the analytical hard-sphere dimer area is plotted as a diamond.

The radial derivatives of the Gaussian volume intersection (eq 27) can be obtained by straightforward differentiation as

center of the other (2.0 A), and slightly underestimates it until the point that the hard spheres no longer intersect. In all cases, the Gaussian volume in the region where the hard spheres do not intersect agrees to high accuracy with the analytical volume, a direct consequence of the constraint imposed by the choice of p,A. The Gaussian (p = 2.50, A = 1.6755) reproduces the hard-sphere volume almost exactly at most dimer separations, with only a small error in the region (2.0-3.0 A). Figure 2 is the analogous graph of surface area for this simple dimer system. Again it is the Gaussian (p = 2.50, 1 = 1.6755) that best reproduces the hard-sphere result, though the error in the region (2.0-3.0 A) is in this case slightly larger. These trends are also illustrated for a three-atom system, with each atom at the vertex of an equilateral triangle, a four-atom square-shaped system, and a six-atom hexagon-shaped system, in graphs provided in the supplementary material. The behavior of the hard-sphere analytical volume and area is a little more complex than for the dimer and trimer cases. The Gaussian description does not reproduce these volumes or areas perfectly, but the agreement is reasonable. A key feature of all of these calculations is that the volume is always better represented than the area when using the Gaussian description. In the supplementary figures, it sometimes happens that the analytical hardsphere volumes or areas do not vary smoothly around the region where the spheres just intersect. This behavior is a feature of the third-order algorithm we used to compute these hard-sphere volumes and areas. In contrast, the Gaussian description of volume and area always varies smoothly in this region. 3.2. Molecular Calculations. The Gaussian surface area and volume computed for the simple systems described in the previous section were exact in the sense that all terms up to order 6 and all possible products in each summation term were included when using the set formula for Gaussian measure, eq 18. In the case of real molecules with an arbitrary number of atoms N, this formula should be summed to order N, and each summation will comprise a combinatorial number of product terms, the evaluation of which is clearly impractical. A simple algorithm has been adopted to compute volumes and areas from eq 18 for an arbitrary number of atoms. First, an atom neighbor list is constructed in which atoms are defined to be neighbors if

The nuclear derivatives of (38) are derived trivially because

(39) and the nuclear derivative of the and 31.

f12,,,nterm is given by eqs 28

3. Calculations

3.1. Model Systems. To illustrate the use of Gaussian functions to compute the molecular volume and area, we have carried out calculations for model systems. These simple systems are chosen to be representative of some of the types of intersections and shapes found in real molecules. The simplest of such systems is a dimer, in which each atom is represented by a single Gaussian, and we have chosen a radius of 2.00 A. The volume expected for hard spheres of corresponding radius at different dimer separations is plotted in Figure 1 (diamond symbol). This volume varies smoothly between 33.5 A3 at zero separation (monomer volume) and 67.0 A3 at 4.00 A, at which point the two hard spheres exactly touch but do not intersect each other. Clearly, beyond 4.00 A the hard-sphere volume is constant. The set of broken lines represents the volume calculated from the overlapping Gaussian functions, each broken line representing a different p , A parametrization (see eq 15). The uppermost line is the Gaussian (p = 1.00, A = 4.188), and the subsequent broken lines represent values of p incremented by 0.5. The lowest line represents the Gaussian (p = 4.00, A = 1.0472). The Gaussian (p = 1.00, A = 4.188) overestimates the analytical hard-sphere volume at small dimer separations and slightly underestimates the total volume at the point at which the hardspheres just intersect. The lowermost curve (the Gaussian (p = 4.00, A = 1.0472)) badly underestimates the analytical volume at small dimer separations, overestimates it at the point at which the edge of one hard sphere touches the

IR,-Rjl S u i + u j + 6

(40)

where E is an arbitrary parameter called the Gaussian cutoff. Because of the form of eq 18, it is only necessary to save those

3508 J. Phys. Chem., Vol. 99, No. 11, 1995

Grant and Pickup

TABLE 1: Atomic Radii Used for the Molecule Calculations atom radius. A C 1.70 N 1.65 H 1.00 0 1.60 S 1.90 TABLE 2: Comparison between the Gaussian and Hard-Sphere Volumes for Some Blocked Amino Acids vol, A3

blocked residue Ala k g

Asn CYS Gln His

Ile Leu LYS Met Phe Pro

Ser Thr Trp

TYr Val

Gaussian 130.3 191.4 161.1 151.8 176.0 181.6 172.5 173.6 181.8 180.1 196.8 151.5 139.9 152.5 225.0 206.2 158.1

hard sphere 128.7 186.5 160.1 150.1 175.0 180.3 170.4 171.5 182.8 177.9 195.1 150.0 138.8 151.5 223.6 204.8 157.0

TABLE 3: Comparison between the Gaussian and Hard-Sphere Areas for a Few Blocked Amino Acids area, A2 blocked residue Gaussian hard sphere Ala 170.5 169.2 Arg 234.4 224.0 Asn 203.8 203.7 CYS 192.2 192.0 Gln 223.0 225.7 His 226.1 225.0 Ile 215.7 215.4 Leu 219.5 220.5 LYS 231.5 240.2 Met 223.6 223.5 Phe 241.1 240.5 Pro 191.3 191.2 Ser 181.0 180.6 Thr 190.0 190.6 Trp 267.1 270.0 TYr 250.2 252.4 Val 198.1 202.0 TABLE 4: Comparison between Gaussian and Hard-Sphere Volumes for a Few Protei& vol, A' protein no. of hard % cpu (Brookhaven entry) residues Gaussian sphere error time, s 1cm 46 3 735 3 737 0.0 0.32 2ins 100 8 779 8 801 0.3 0.79 5cyt 103 9070 9 060 0.1 0.67 2rhe 114 9 346 9 358 0.1 0.73 1121 130 11 638 11 628 0.1 0.94 3fxn 138 12 149 12180 0.3 0.99 26467 26449 0.1 2.63 3aPP 323 The cpu timings refer to the Gaussian method and include the computation of the first derivative.

neighbors of atom i for which k > i. Clearly the larger the value of E , the greater the number of neighbors each atom will have, such that in the limit of large E each atom will have N 1 neighbors. Once the neighbor list is constructed, each term in eq 18 is computed, but the product terms in a given sum are only evaluated for mutual neighbors. For example, in the second term in eq 18, only those products are evaluated for atom i that the Gaussian area and hard-sphere area is not as good as the involve the neighbors of atom i defined according to eq 40. agreement between the two volume terms. The Gaussian Similarly, the only products evaluated in the third term of eq parameterizations chosen for these calculations were (p = 2.70, 18 are those for which atom i is a neighbor to atom j and atom A = 1.5514) and (p = 2.60, A = 1.6111) for evaluation of the k, such that atom j is also a neighbor of atom k. All terms up Gaussian volume and Gaussian area, respectively. These values to order 6 are treated in a similar way. It is found that, for were chosen on the basis of the graphs obtained for the simple small E , the series (18) converges reasonably quickly and that, model systems. In principle, a more accurate method such as by using the neighbor list, to the number of product terms in a nonlinear least-squares procedure could be chosen for each summation is approximately linear with respect to the parameterizing the Gaussian basis to reproduce the hard-sphere number of atoms in the molecule, rather than combinatorial. results. Also, choosing a different parameterization of the The calculations that follow are intended to illustrate the Gaussian basis to compute area, from that used to compute the accuracy achieved using Gaussian functions with respect to the volume, makes only a small difference to the results, compared hard-sphere measure and the dependence of the result on to using the same Gaussian parameters for both the volume and different values of the Gaussian cutoff parameter ( E ) . area computations. These results were computed using the value A set of Gaussian volumes and areas has been calculated for E = 0.0 in eq 40 when assembling the neighbor lists, and all a number of amino acids, blocked with acetyl at the N terminus terms up to order 6 in eq 18 were evaluated. and N-methyl at the C terminus. Extended structures of each 3.3. Protein Calculations. The use of Gaussian functions of these blocked residues were generated (4 = ?/J = 180") using combined with the neighbor list algorithm described in the ECEPP/2.37,38The side-chain angles k's)were those of the previous section leads to the very efficient computation of most common rotamer state of each residue as given by Ponder and Richards.39 volume and area and associated nuclear coordinate derivatives. Tables 4 and 5 give the cpu times for an Indigo R3000 (using Tables 2 and 3 present the results for the Gaussian areas and volumes computed for these blocked residues and compare these the compilation directive f77-02) required to compute the quantities to their hard-sphere analogues. The hard-sphere areas Gaussian area and volume (and associated first derivative) for and volumes have been computed using our implementation of a number of proteins. The coordinates of these proteins were the Connolly analytical alg0rithm.'~9'~The radii used for these taken from the Brookhaven Protein Data Bank.40 These tables calculations are given in Table 1. To facilitate comparison, these also given the Brookhaven entry code and the number of radii were used for both the Gaussian and the hard-sphere residues in each protein. The calculations were carried out using models. These values are based on those used by C o n n 0 l l y , ~ ~ 3 ~ ~only the protein atoms, with all HETATM records ignored. The though the choice is arbitrary. The difference between the Gaussian parameterizations for volume and area and the value Gaussian and hard-sphere quantities is typically less than 1%, of E were those used for the blocked amino acids described and as found for the simpler models, the agreement between previously. (I

Gaussian Description of Molecular Shape

TABLE 5: Comparison between the Gaussian and Hard-Sphere Areas for a Few Protei& area, A3 protein no. of hard % cpu (Brookhavenentry) residues Gaussian sphere error time, s 1crn 46 4222 4288 1.5 0.33 2ins 100 9821 9907 0.9 0.79 10371 10528 1.5 0.67 103 5cYt 2rhe 10738 10858 1.1 0.74 114 llzl 13 194 13281 0.7 0.93 130 3fxn 13805 13957 1.1 0.98 138 323 30 112 30456 1.1 2.67 3aPP "The cpu timings refer to the Gaussian method and include the computation of the fist derivative. It can be seen that the calculations using the Gaussian method are all very fast; for example, the volume or area along with the associated analytical f i s t nuclear coordinate derivative of a 323 residue (2366 atom) protein, penicillopepsin (3app), takes less than 3 s using an Indigo R3000. Precise timings have not been given for our implementation of the Connolly analytical method because no effort has been made to optimize the performance of these codes and we have not coded all of the gradient terms for this algorithm. However, using our codes, the hard-sphere calculations of just the volume or area were all at least an order of magnitude slower than the Gaussian timings given, where the latter also include the computation of the first nuclear derivative. The Gaussian timings are comparable to the speed achieved when using the MSEED method2' to compute solvent-accessible areas. It can also be seen from the timings given in Tables 4 and 5 that, using the Gaussian description with E set to zero, there is no appreciable nonlinear degradation in performance as the number of atoms (N) is increased from 327 (crambidlcrn) to 2366 (penicillopepsid 3app). In contrast, the Connolly analytical algorithm scales approximately as N3. The convergence of the calculations with respect to the number of summations retained in the series used to compute the volume and area (eq 18) can be illustrated by considering the values of each of these terms. For example for penicillopepsin (3app), the first six terms in the volume summation are 46 304, 25 481, 6173, 543, 14, and 0.2 A3, when using E = 0.0. This demonstrates that the volume has converged to a more than satisfactory level at sixth order. This is found to be true for all of the calculations carried out for the blocked amino acids and proteins, for both the volume and the area. Clearly, the convergence depends on the choice of the 6 parameter, as does the speed of the calculation. For example, for penicillopepsin, the cpu times using cutoff parameters of 0.0, 0.5, 1.0, 1.5, and 2.0 are 2.7,8.3, 13.0, 120.0, and482.5 s, respectively. However, the results obtained for these protein calculations demonstrate that a rapid calculation of the Gaussian volume, area, and derivatives is possible using only a small neighbor list (small E parameter) that still retains much similarity to the analogous quantities obtained using hard-sphere atom representations. 4. Conclusions

The Gaussian description of molecular shape, using a generalized coalescence theorem, gives an elegant set of analytical formulas for molecular volumes, areas, and their nuclear coordinate derivatives. This formulation leads to an efficient algorithm for the simultaneous determination of all of these quantities. The Gaussian volumes and areas computed using our algorithm seem to be most closely related to the analogous quantities computed from the hard-sphere representation of the

J. Phys. Chem., Vol. 99, No. 11, I995 3509 van der Waals surface, discussed in the Introduction. In fact, it is rather obvious that there is no formal definition of a surface when using soft representations of matter distribution. Indeed, eq 16 emphasizes that only part of the volume is localized inside the formal sphere radii. On the other hand, the ease of differentiability of the Gaussian representation is matched by the ease of integration. We can use the Gaussian density (eq 18) to define the shape distribution quantities SI-; thus,

where 1, m,and n are integers. We can refer to these parameters as steric multipoles. The steric dipoles41 (l+m+n = 1) are the lowest order parameters, giving a means which describe the departure from sphericity of the shape distribution. It is rather obvious that the steric multipoles can be used to analyze the shapes of molecules, i.e., to answer such simple questions as how globular is a protein. They could also be used to evaluate the similarity between the shapes of related compounds. The single Gaussian representation of matter density gives a descriptor of shape which leads to accurate measures in the limit of weakly overlapping spheres. The results for strongly overlapping spheres are less good. This is a problem for the definition of the solvent-accessible surfaces described in the Introduction, because to define them in the present context one needs to increment the atomic radii by a factor representing the radius of water (typically 1.4 A). This leads to a collection of points with much larger radii but at the usual internuclear separations, Le., strongly overlapping spheres. The answer to the problem seems to lie in a multi-Gaussian representation of the single atom densities (eq 11); that is,

in which the coefficients P k are chosen to give improved results for strong overlap. A preliminary test of this method has led to such improvements for simple model systems. The soft Gaussian approach addresses many of the difficulties associated with hard-sphere methods. It is even possible to allow for the nonspherical nature of atoms inside molecules by using an elliptical Gaussian representation of the atom density, viz.,

This formulation retains all of the advantages of the spherical one, including a coalescence theorem and the analytical formulas which follow from it. This work has demonstrated that a description of molecular shape motivated by the technical advantages implicit in using Gaussian functions leads to a fast evaluation of area and volume quantities. For the purposes of this work, it has been shown that such area and volume quantities can reproduce those computed using a hard-sphere model. It should be emphasized that this is not necessarily the ultimate goal of a Gaussian description. Rather, it is hoped that the more realistic "softsphere" description of molecules will ultimately improve empirical models currently reliant on hard-sphere methodology (for example, molecular shape comparison or simple continuum solvent models). Simple expressions for the analytical nuclear coordinate derivatives have also been given and require only negligible computational overhead once the volume and area have been computed. The technical advantages of using Gaussian functions also suggest methods to characterize mo-

3510 J. Phys. Chem., Vol. 99, No. 11, 1995

lecular shape by the calculation of steric multipoles and account for anisotropic shapes of atoms in molecules. We are currently investigating both of these possibilities.

Acknowledgment. We thank Dr Jaroslaw Kostrowicki for very valuable discussions concerning the use of Gaussian functions to represent the unit step function. We also thank Brian Masek for useful and stimulating discussions concemhg the shape of molecules. Supplementary Material Available: Figures 3-7, showing Gaussian areas and volumes vs triangle, square, and hexagon edge lengths (7 pages). Ordering information is given on any current masthead page. References and Notes (1) Eisenberg, D.; McLachlan, A. D. Nature 1986,319,199. (2) Ooi, T.; Oobatake, M.; Nemethy, G.; Scheraga, H. A. Proc. Natl. Acad. Sci. U.SA. 1987,84, 3086. (3) Kang, Y. K.; Nemethy, G.; Scheraga, H. A. J. Phys. Chem. 1987, 91,4105. (4) Vila, J.; Williams, R. L.; Vasquez, M.; Scheraga, H. A. Proteins 1991,10, 199. (5) Wesson, L.; Eisenberg, D. Protein Sci. 1992,1, 227. (6) Abraham, M. H.; Grellier, P.L.;Abboud, J. M.; Doherty, R. M.; Taft, R. W. Can. J. Chem. 1988,66,2673. (7) Abraham, M. H. Chem. SOC.Rev. 1993,22,73. (8) Nicholls, A.; Honig, B. J. Comput. Chem. 1991,12,435. (9) Sharp, K.; Honig, B. Annu. Rev. Biophys. Chem. 1990,19,301. (10) Johnson, M. A.; Maggiora, G. M. Concepts and Applications of Molecular Similarity; Wiley: New York, 1990. (1 1) Masek, B. B.; Merchant, A.; Matthew, J. B. J. Med. Chem. 1993, 36, 1230. (12) Masek, B. B.; Merchant, A.; Matthew, J. B. Proreins 1993,17, 193. (13) Mezey, P. G. Shape in Chemistry; VCH: New York, 1993. (14) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379.

Grant and Pickup (15) Richards, F. M. Annu. Rev. Biophys. Bioeng. 1977,6, 151. (16) Connolly, M. L. Science 1983,221,709. (17) Connolly, M. L. J. Appl. Crystallogr. 1983,16,548. (18) Connolly, M. L. J. Am. Chem. SOC. 1985,107,1118. (19) Richmond, T. J. J. Mol. Biol. 1984,178,63. (20) Perrot, G.; Maigret, B. J. Mol. Graphics 1990,8, 141. (21) Perrot, G.; Cheng, B.; Gibson, K. D.; Vila, J.; Palmer, K. A.; Nayeem, A.; Maigret, B.; Scheraga, H. A. J. Comput. Chem. 1992,13,1. (22) Wawak, R. J.; Gibson, K. D.; Scheraga, H. A. J . Math. Chem, in press. (23) von Freyberg, B.; Richmond, T. J.; Braun, W. J. Comput. Chem. 1993,14,510. (24) von Freyberg, B.; Richmond, T.J.; Braun, W. J. Mol. Biol. 1993, 233. 275.

(25) Gibson, K. D.; Scheraga, H. A. Mol. Phys. 1987,62,1247. (26) Petitjean, M. J. Mol. Biol. 1994,15,507. (27) Duane, P.; Arteca, G. A.; Mezey, P. G. J. Comput. Chem. 1991, 12,220. (28) Good, A. C.; Hodgkin, E. E.; Richards, W. G. J. Chem.In5 Comput. Sci. 1992,32, 188. (29) Good, A. C.; Richards, W. G. J. Chem. Inf Comput. Sei. 1993,33, 112. (30) Walker, P. D.; Mezey, P. G. J. Am. Chem. SOC. 1993,115,12423. (31) Walker, P. D.; Mezey, P. G. J. Am. Chem. Soc., in press. (32) Nauchitel, V. V.; Somojai, R. L. Proteins 1993,15,50. (33) Stouten, P. F. W.; Frommel, C.; Nakamura, H.; Sander, C. Molec. Simul. 1993,10,97. (34) Connolly, M. L. J. Math. Chem. 1994,15,339. (35) Boys, S.F. Proc. R. SOC.London 1950,A200, 542. (36) Kratky, K. W. J. Statist. Phys. 1981,25,619. (37) Momany, F. A,; McGuire, R. F.; Burgess, A. W.; Scheraga, H. A. J . Phys. Chem. 1975,79,2361. (38) Nemethy, G.; Pottle, M. S.; Scheraga, H. A. J. Phys. Chem. 1983, 87, 1883. (39) Ponder, J. W.; Richards, F. M. J. Mol. Biol. 1987,193,775. (40) Bemstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F., Jr.; Brice, M. D.; Rodgers, J. R.; Kennard, 0.; Shimanouchi, T.; Tasumi, M. J. Mol. Biol. 1977,112,535. (41) Petrov, A. G.; Derzhanski, A. Mol. Cryst. Liquid Cryst. 1987,157, 303. Jp9427488