J . Phys. Chem. 1987, 91, 6341-6343 molecules which absorb a quantum of X < 392.8 nm, the energy equivalent to a vibrationally excited N02(v2,v=1) which absorbs a 404.7-nm quantum, dissociate with @ I > 0.95.4 From the estimates of the cross sections for the vibrationless and vibrationally excited NO2 molecules, we can estimate the fraction of the total 404.7-nm light-absorbing molecules which are in the vibrationally excited states at each temperature (&b) using the following relation: 4vib
= Uv(1 - XO)/[(l - XO)uv
+ XOJ'Ol
These fractions (I#J,& shown in column 4 of Table I, may in theory contribute directly to the observed 41 values. In addition, the remaining fraction of the light-absorbing molecules (1 - )& ,$ which have rotational energies greater than 422 cm-', given by may contribute (column 5, Table I). The theoretical values for (equal to 4vib &,J may be compared with the experimental values for 41in Table I, columns 6 and 7, respectively. It is clear that there is reasonable agreement between the theory and experiment for data over a wide temperature range. The results from the lowest (223 K) and highest temperatures (566 K) which show the largest differences are least reliable because of experimental difficulties. Davenport's measurements of $1 at 223 K8
+
6341
were very difficult and probably inaccurate; the results of Pitts et al.' at 566 K required a significant correction for the thermal decomposition of NO2.
Conclusion We conclude from the present considerations that vibrationally excited NO2 molecules absorb 404.7-nm light more strongly than vibrationally unexcited NO2, and the photodecomposition of these molecules can account for the "missing" fraction of 404.7-nm excited NO2 molecules which had previously appeared to possess insufficient electronic plus internal energy to decompose.
Acknowledgment. This work was supported in large part through project W16042 of the Upper Atmosphere Research Program, Earth Science and Applications Division, The National Aeronautics and Space Administration. Support for one of the authors (S.M.) was provided under interagency agreement DW930144-01-1 by the U S . Environmental Protection Agency and another (C.A.C.) under contract AP-13 from the Coordinating Research Council. The authors are grateful to Prof. Kyle Bayes for useful discussions related to this work. (8) Davenport, J. E. Final Report, FAA-Eq-78-14, High Altitude Program, Federal Aviation Administration, Office of Environmental Quality, Washington, DC, 1978.
Why Energy Embedding Works G . M. Crippen College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109 (Received: September 17, 1987)
Energy embedding is a numerical technique for locating conformations of molecules corresponding to very good local minima of the potential function used, which generally has involved only atom-pair interactions. Computational experience with various molecular force fields has shown it has a remarkable talent for locating either the global energy minimum or nearly equally favorable minima, thus bypassing numerous uninteresting high-energy minima. Here it is shown that the algorithm is indeed well-defined in general, that it will directly converge to the global minimum of energy given certain reasonable properties of the potential function, and that it can be successfully applied to any standard molecular mechanics force field, such as AMBER, which includes bond angle bending and intrinsic torsional terms in addition to atom-pair interactions.
Introduction The basic premise of molecular mechanics is that given an adequate approximation to the internal energy of a molecule as a function of conformation, the physically significant conformations are those which are near the lowest local minima of energy. The problem is that even for molecules having only a few rotatable bonds, there are numerous uninteresting minima lying well above the global minimum. One can either search exhaustively over all possible conformations or initiate local energy minimizations from many starting points. Neither approach is feasible for molecules having more than about six rotatable bonds.' The problem is particularly acute for predicting the conformation of globular proteins, due to their very many degrees of conformational freedom. A promising way around the problem has been "energy e m b e d d i ~ ~ g "an , ~outgrowth ,~ of the distance geometry approach to conformational a n a l y ~ i s . ~The energy embedding algorithm basically consists of locating a very low energy conformation for the molecule in a high-dimensional space (R",where each atom has n Cartesian coordinates) and then keeping the energy minimal subject to the gradually increasing constraint that the molecule (1) Crippen, G.M.J . Comput. Phys. 1975, 18, 224-231. (2) Crippen, G.M.J . Comput. Chem. 1982, 3, 471-476. (3) Crippen, G.M.Biopolymers 1982, 21, 1933-1943. (4) Crippen, G.M.Distance Geometry and Conformational Calculations; Wiley: New York, 1981;Chemometrics Res. Stud. Ser. No. 1.
0022-3654/87/2091-6341$01.50/0
lie in R3.To put it more picturesquely, the molecule first is allowed to pucker up into the additional dimensions to produce a generally rather compact ball having very low energy, since the atoms have more freedom to satisfy all the energy terms they are involved in. Then the molecule is gradually flattened back into three dimensions, allowing the atoms to reshuffle between the jaws of the vice (so as to maintain as low an energy as they can) after every turn of the vice handle. The exact details of how this is done are apparently rather unimportant, because variations on the original method also work ~ a t i s f a c t o r i l y . ~There ~ ~ are two questions to be addressed: why does this procedure work at all, and will it work on the sorts of force fields which are widely used in molecular mechanics programs? The answer to the first question can best be seen by examining the simple examples in the next section, and the subsequent section shows indeed how well it works with the AMBER molecular mechanisms program.
Simple Examples To begin with, consider a situation so simple that it can easily be displayed in two-dimensional diagrams. Suppose we have three atoms in the plane R2,with a l fixed at (O,O), u3 fixed at (2,0), ( 5 ) Crippen, G. M.; Ponnuswamy, P. K. J . Comput. Chem. 1987, 8, 972-981. (6) Purisima, E.0.;Scheraga, H. A. h o c . Natl. Acad. Sci. U.S.A. 1986, 83,2782-2786.
0 1987 American Chemical Society
6342
The Journal of Physical Chemistry, Vol. 91, No. 25, 1987
Figure 1. Energy surface diagram for the simple case of two fixed atoms, a, and u3 (open circles), and one free to move in the plane, u2 (not shown). They axis is marked in terms of the dimensionality parameter, s = y 2 . The maxima, minima, and saddle points are indicated as solid dots and labeled as explained in the text. The curved lines indicate channels and ridges in the surface.
and a movable atom a2 located at (x,y). Let the conformational energy function be where d,, is the ordinary Euclidean distance between atoms i and j . Clearly, E ( x , y ) = E(x,-y), so we shall discuss only the y I
0 case. There are four critical points: pI = (1,81/2)with E(pl) = 0, p2 = (3.80179,O) with E(p2) = 95.9, p3 = (-1.2288,O) with E(p3)= 60.2, and p4 = (1.4270,O) with E(p4) = 198.9. (See Figure 1.) In the half-plane, pl is the global minimum, where both bond stretching terms of eq 1 are completely satisfied. In either R2 or R' (restricting the conformation space t o y = 0), p4 is a maximum. Let s = 3 be a measure of how far the conformation is from being embeddable in R'. Then pz and p3 are local minima in R' (where s = 0) which are saddle points in R2 because the energy decreases as s increases. From the one-dimensional point of view, both the stretching terms of eq 1 cannot be satisfied simultaneously, and since the second one is weighted more heavily, p3 is the better compromise. As shown in Figure 1, energy embedding locates the global minimum in R1 not by searching along the x axis but by first miminmizing to the unique minimum in R2, namely pl, where s = 8. There the initial direction of "gentlest ascent" is (4.99,-0.13), corresponding to the eigenvector for the smallest (positive) eigenvalue of the Hessian at pl. Only after s C 5 is there a second channel leading toward pz. Another way to look at this is with cross sections of the energy surface as shown in Figure 2. Going in the opposite order from energy embedding, as s increases from 0 toward 8, the sections become simpler as the minimum corresponding to p2 approaches the maximum corresponding to p4, eventually annihilating each other to form a point of inflection at s = 5, which subsequently disappears. In general, conformational calculations become simpler in hyperspace. In the foregoing extremely simple example, energy embedding located the global minimum in R], but in practice, this does not always occur. One thing that can go wrong is that the high-dimensional minimum may not be unique, so that there will be various paths leading to different conformations in R3. If the potential function consists solely of terms depending on the distance between a pair of atoms and if for each atom pair there is a unique optimal separation, and if those optimal separations are embeddable in RN-' (for N atoms), then there will be a unique energy minimum in RN-'. The embeddability condition is actually rather mild and can be easily satisfied as long as all the optimal separations are of comparable magnitudeP A second source of difficulties is that numerical computer programs used to locate the high-dimensional minimum and follow the subsequent channel back to R 3 may not track accurately enough, so that inferior channels may be found. For example, in Figure 2, at s = 5 there is a rather small energy difference between the correct channel on the left and the beginnings of the incorrect one on the right. The third and most serious consideration, however, is that energy embedding can be fundamentally fooled by the potential function. Suppose that instead of eq 1 we use
Letters
As in Figure 1, pl has the same location, where the Hessian has an eigenvector (-0.998,0.057) corresponding to its smallest eigenvalue, which points the way to the left channel, eventually arriving at p3, now located at x = -1 and having E = -1. In fact, the global minimum in R' is p2, located at x = 3, having E = -1.43. Energy embedding initially favored the second term of eq 2, the sharper Gaussian, not realizing that in the long haul the first term is more important energetically. In general, energy embedding requires that each energy term have a (positive) second derivative at its optimum which has a positive correlation with its energetic depth. AMBER Energy embedding has been used with special purpose protein folding potential^^^^ and a slightly modified version of ECEPP (a force field for peptides assuming rigid valence geometry),6 but the question remained whether it would work with more general, flexible geometry force fields. AMBER version 3.0 is a popular potential function employing some united atoms, such as -CH3 and -CH2-, with the Cartesian coordinates of the atoms as variables.' First, the programs were converted to deal with more than three dimensions. Some technical details of that are covered in Appendix 2. The venerable example of N-acetyl-L-alanineN'-methylamide was chosen. There are only 12 united atoms in CH3CONHCaH(CH3)CONHCH3because only the amide hydrogens are represented explicitly. An exhaustive search over its +J. map in 10' intervals (holding bond lengths and angles at their equilibrium values, and peptide bonds in the trans conformation) revealed the usual features,* but energy minimization with respect to all atomic coordinates in R3 starting at the seven minima on the map led to only five local minima: (+,$,E) = 1 (-79.67', 70.53', -42.41 kcal) (extended conformation; global minimum in R3), 2 (-137.04', 140.92', -39.09 kcal) (extended), 3 (-72.84', -10.29', -38.34 kcal) (right-handed a helix), 4 (-160.32', -52.36', -36.40 kcal), and 5 (68.79', -58.62', -41.40 kcal). Subsequent energy minimization in R1l starting from each of these lead to precisely the same high-dimensional minimum, 6, where E = -43.69 kcal. Of course, the five different coordinate sets differ among themselves by rigid rotations and translations in R", but the corresponding interatomic distances are essentially identical. As long as the potential function consists solely of additive pairwise interactions, each term of which has a single minimum as a function of interatomic distance, and the optimal separations obey certain modest inequalities, then there must be a unique minimum in R". In this case, however, there are two- and threefold intrinsic torsional potentials, so it is especially gratifying to see a unique minimum nonetheless. Compared to conformer 1, the bond stretching, bond angle bending, intrinsic torsional, nonbonded, electrostatic, and hydrogen-bonding contributions of 6 are very similar; only the 1-4 nonbonded interactions are better by about 1 kcal. Local minimization from 5 leads to a set of coordinates for 6 which are particularly close in terms of s to the starting point in R3, yet energy embedding returns in this case from 6 to 1, the global minimum in R3. Conclusions Instead of being restricted to energy functions that include strictly atom-pair distance-dependent terms, energy embedding has been shown to work on much more general potential functions, such as AMBER, which include bond angle bending and intrinsic torsional terms. Although even the small peptide example involved manipulating 132 coordinates in 11-dimensional space, it behaved in exactly the same way as the two-variable case we were able to exhaustively examine. There is a unique high-dimensional (7) Weiner, S. J.; Kollman, P. A,; Case, D. A,; Singh, U. C.; Ghio, C.; Alagona, G.; Profeta, S.;Weiner, P. J . Am. Chem. SOC.1984, 106, 765. (8) Crippen, G. M.; Scheraga, H. A. Pror. Natl. Acad. Sci. U S A . 1969, 64, 42-49
The Journal of Physical Chemistry, Vol. 91, No. 25, 1987
Letters
6343
N
c x; i=3na-5
L(s) = {XI s=o
(5)
5 SJ
is also convex. To see this, suppose we choose any two conformations, x1 and x2, on the boundary of L(s), i.e. Exl? = Ex2? = s. Then by the definition of a convex set, we must show x3 = axl (1 - a)x2 E L(s) for any 0 5 a 5 1.
+
c N
i=3n,-5
= (1 - 2a
x3; =
C(ax1i
+ (1 - a ) X 2 i ) 2
+ 2a2)s + a( 1 - ( Y ) E X l i X 2 i I(1 - 2a + 2a2)s + a(1 - a ) [ Z x 1 ~ ] ~ / 2 [ C x 2 ~ ] ~ / 2 =(1-a+a2)s~sforOSa5 I
where the inequality in sums is from the Cauchy-Schwartz ineL(s* - 6s) quality. Since both K and L are convex, then K is also convex, where s* Ex*; and 6s > 0 is chosen small enough so that the intersection is nonempty. Now the minimization of the convex function E restricted to the convex region K L always leads to a unique minimum. Therefore, as s decreases from its initial value s*, there will be a unique path of minimal E. Of course, eventually as s 0 the argument breaks down as E ceases to be convex or K n L becomes empty.
n
n
Figure 2. One-dimensionalcross sections of the energy surface in Figure 1 taken at the s values indicated. Note how as s increases from 0 to 5
the Center maximum and the right minimum approach and finally cancel each other. 1
R
Figure 3. Definition of vectors used in calculating the dihedral angle @
using dot products.
-
Appendix 2. Torsions Generalizing a force field to an arbitrary number of dimensions presents no problem for simple distance-dependent terms, such as bond stretching, nonbonded interactions, and electrostatic interactions, because for n dimensions, the distance is simply
Bond angle bending can always be expressed in terms of cos 0, where 0 is the i-j-k angle defined by three atoms, and ++ n d i j d k j COS 6 = ji-jk = E ( X i i - X j r ) ( X k , - Xi,) (8) I= 1
minimum because the barriers and multiple local minima we generally struggle with are artifacts of being in three dimensions which erase each other as the dimensionality requirement is relaxed. Then gradually reimposing the requirement for three dimensions forces the molecule along a path of gently ascending energy, eventually leading to the global minimum in three dimensions for favorably constructed potential functions or otherwise to at least a very good energy minimum. Even though the parameters of AMBER had been adjusted to give good local agreement with experimentally determined structures in three dimensions, there is reason to hope that the more global viewpoint of energy embedding will show there are no radically different conformations preferred by the force field.
Acknowledgment. This work was supported by N I H Grant GM37123 and N S F Grants DMB-8596009 and DMB-8705006. Appendix 1. Unique Path One can rigorously argue that there must be a t least at first, a unique path that energy embedding follows as the energy rises from the initial high-dimensional minimum. Let the molecule have n, atoms and altogether N variable Cartesian coordinates arranged in a vector x, such that the desired final coordinates are components xi ( i = 1, ..., 3n, - 6). Sufficient atomic coordinates are excluded from consideration so that trivial rigid motions of the molecule are eliminated. As usual, we locate the unconstrained (probably unique), energy minimum, x* E RN. Since the energy E is at a minimum, we can approximate E(x+x*) = E(x*) Y2xTHx (3)
+
for llxll small, where H is the positive definite Hessian matrix. Therefore, E is a convex function near x*, and hence K = (xlE(x+x*) < E(x*) + 6EJ (4) is a convex set for small 6E > 0. The set of all conformations within s > 0 of being three-dimensional
However, the dihedral angle 4 for intrinsic torsional terms is generally calculated via cross products, which do not generalize very well to R”. Referring to Figure 3 for the dihedral angle 4 defined by the four atoms i-j-k-I, we can calculate ir
a = -kl
+
[
++
kbkj + ,]kj kj-kj
(9)
and
and then cos
=
a.b (Pa) lI2( b-b)l j 2
The torsional terms in AMBER 3.0 can always be expressed as a function of cos (n4 +), where the phase angle is either 0 or a and n = 1, 2, 3, 4, 6. With appropriate trigonometric identities, such expressions can always be written in terms of cos 4, so although eq 11 does not give 4 unambiguously, it is sufficient for our purposes. Calculating analytical first and second derivatives with respect to some coordinate x of one of the four atoms is actually not too hard because
+
+
~
--
-3
kj-kj and similarly for a-a and b-b. Then aa-b aa-a ab-b - (a-b)(b.b)- (a.b)(a.a)a--cos 4 - 2-(a-a)(b.b) ax ax ax ax 2(a-a)3/2(b.b)3/2 (13) Second-derivative expressions are not actually all that much worse.