Visualizing the Variation Principle
An Intuitive Approach to Interpreting the Theorem in Geometric Terms

Robert Parson
Department of Chemistry and Biochemistry, University of Colorado, and Joint Institute for Laboratory Astrophysics, University of Colorado and National Institute of Standards and Technology, Boulder, CO 80309-0440

One can hardly overstate the importance of the variation principle for quantum chemistry. Unlike many approximation methods, it does not rely on expansions in a small parameter, so it can be used to study strongly interacting systems, such as electrons in atoms and molecules. The variation principle provides a foundation for both ab initio (1) and semiempirical (2) electronic structure calculations, and it is becoming increasingly popular for studying strongly coupled vibrations and rotations in highly excited molecules (3).

The principle is usually at least mentioned in the typical one-year physical chemistry course (4, 5). Then it is usually covered in more detail in introductory quantum chemistry courses (6), such as those taught as part of a three-semester physical chemistry sequence at the University of Colorado. Specialized courses for advanced undergraduates or beginning graduate students (7, 8) also usually include a proof of the simplest version of the theorem: the minimum principle for the ground state. This result surprises some students when they first see it, but the proof is very appealing once they have become comfortable with the state-vector representation of quantum mechanics.

Extension of the Variation Principle to Excited States

Quantum chemistry textbooks usually mention, but rarely prove, the extension of the variation principle to excited states:

If one chooses a trial wavefunction that is a linear combination of basis functions, and minimizes the expectation value of the Hamiltonian with respect to the coefficients of the linear combination by diagonalizing the resulting secular matrix, then the lowest eigenvalue of this matrix is an upper bound to the ground-state energy, and the remaining eigenvalues provide upper bounds to the energies of the excited states.
If the eigenvalues of the matrix are put in increasing order, then the nth eigenvalue must lie at or above the exact energy of the nth excited state. This result allows one to apply the methods of computational quantum chemistry to excited electronic states; it is also the foundation for the variational approach to molecular vibrations and rotations.

Oddly enough, this "extended variation principle" is rarely discussed in quantum physics textbooks, which often leave the impression that one can only obtain an upper bound to the first excited-state eigenvalue by working with trial functions that are orthogonal to the exact ground state. Indeed, Peierls (9) includes the extended variation principle (for which he gives a particularly elegant proof) in his list of "surprises in theoretical physics".

The result has, of course, been known for many years. Chemists and physicists usually refer to Hylleraas and Undheim (10) or to MacDonald (11), although the theorem was first established by Courant (12, 13). An equivalent statement is referred to in the mathematics literature as the "Courant-Fischer Theorem" (14-15). The corresponding algebraic property of finite-dimensional matrices (15) is older yet, going back at least as far as Hermite (16), who recognized that eigenvalues of the minors of a hermitian matrix are interleaved between the exact eigenvalues.

Many of the older proofs are rather elaborate, but the result can be demonstrated very simply (1, 9, 17), particularly in the finite-dimensional case. Nevertheless, it still surprises most people, who feel that if the trial wavefunction is not orthogonal to the ground state, it can "take advantage" of this nonzero overlap to reduce its energy below that of the first excited state. It is disturbing to find that our intuition is so defective.

Interpreting Linear Algebra in Geometric Terms

In my first-year graduate course in mathematical methods for physical chemistry I have tried to promote an intuitive pictorial understanding of the variation principle. I use other fundamental theorems from linear algebra that emphasize the connection between symmetric matrices and homogeneous quadratic forms. The latter may be envisaged as ellipsoids (or hyperboloids) in N-dimensional Euclidean space. Finding the eigenvectors of a positive-definite symmetric matrix is equivalent to determining the principal axes of the corresponding ellipsoid.

This point of view, which brings out the maximum-minimum properties of eigenvalues very clearly, is of course very old. It was this geometric problem that originally led Euler (18) and Cauchy (19) to formulate the algebraic eigenvalue problem. (A useful historical survey and guide to the literature can be found in refs 20 and 21.)

Nevertheless, I have found that most chemistry majors and beginning graduate students are not accustomed to interpreting linear algebra in geometric terms. All too often, they remember their undergraduate linear algebra course as a collection of techniques for manipulating matrices and solving systems of linear equations.
Physics students are in a better position because their undergraduate courses introduce the concept of principal axes through such physical examples as the polarizability (22) and moment-of-inertia (23) tensors. Some physical chemists learn about these ideas in a graduate-level course on molecular spectroscopy, but this comes quite late in their education.
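The interleaving property attributed to Hermite above can be checked numerically. The following is a minimal sketch, not taken from the paper: a random real symmetric matrix stands in for a Hamiltonian matrix, keeping the leading m × m block stands in for choosing m basis functions, and the helper names `truncated_eigenvalues` and `check_interlacing` are hypothetical.

```python
import numpy as np

def truncated_eigenvalues(h, m):
    """Eigenvalues (ascending) of the leading m x m block of a symmetric matrix h."""
    return np.linalg.eigvalsh(h[:m, :m])

def check_interlacing(h, tol=1e-10):
    """Return True if every leading block of h obeys the interleaving bounds."""
    n = h.shape[0]
    exact = np.linalg.eigvalsh(h)  # ascending "exact" eigenvalues
    for m in range(1, n + 1):
        approx = truncated_eigenvalues(h, m)
        # k-th approximate eigenvalue is an upper bound to the k-th exact one
        if not np.all(approx >= exact[:m] - tol):
            return False
        # ... and a lower bound to the exact eigenvalue counted from the top
        if not np.all(approx <= exact[n - m:] + tol):
            return False
    return True

# Assumption for illustration: an 8 x 8 random symmetric matrix.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))
h = (a + a.T) / 2.0
print(check_interlacing(h))
```

The first comparison is the extended variation principle as stated earlier; the second is the mirror-image bound that follows from applying the same argument to -H.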
Translating the Variation Principle into Three-Dimensional Euclidean Geometry

In this paper I will show the appeal of the geometric interpretation by using it to demonstrate the simple and extended forms of the variation principle. Translated into three-dimensional Euclidean geometry, this principle states:

If one slices an ellipsoid whose principal axes have lengths A > B > C with a plane passing through the center, then the principal axes a and b of the resulting ellipse (where a > b) are bounded by the inequalities A ≥ a, B ≥ b.
This observation is not new; it can be found in the books by Courant and Hilbert (13) (as a footnote) and by Arnold (14).

Volume 70 Number 2 February 1993
However, few undergraduate and beginning graduate students in chemistry are likely to consult such texts. If they did, they would probably have trouble extracting the essential points from the unfamiliar mathematical language. Thus, I present these ideas in a less formal style, augmented with graphical illustrations, using the simplest algebraic treatments of the principle (1, 9).

Algebraic and Geometric Formulations of the Variation Principle

Algebraic Considerations
In its simplest form, the variation principle asserts that

The ground-state wave function of a quantum-mechanical system minimizes the expectation value of the system's Hamiltonian.

For any normalized trial wave function $\psi$, $\langle\psi|H|\psi\rangle$ can never be less than the true ground-state energy. If we construct a family of normalized trial wave functions depending upon N parameters $c_n$, then that member of the family which minimizes $\langle\psi|H|\psi\rangle$ provides the best estimate of the ground-state energy that can be obtained within this family. We usually expect that it also provides an optimal estimate of the ground-state wavefunction, but other wavefunctions are sometimes better when calculating specific molecular properties.

Estimating the Ground-State Energy and Wavefunction

A common way to implement this construction is to express the trial wavefunction as a linear combination of basis functions, which for convenience of illustration we take to be normalized:

$$|\psi\rangle = \sum_{n=1}^{N} c_n\,|n\rangle$$

In these terms the expectation value becomes a quadratic form in the parameters $c_n$,

$$E(\mathbf{c}) = \langle\psi|H|\psi\rangle = \sum_{n,n'} c_n^{*}\,H_{nn'}\,c_{n'} \qquad (1)$$
where $H_{nn'} = \langle n|H|n'\rangle$ is an N × N hermitian matrix. By minimizing E(c) subject to the constraint that the trial wave function be normalized, we are led to consider the eigenvalues and eigenvectors of H. The lowest eigenvalue is the optimal estimate of the ground-state energy, while the corresponding eigenvector provides us with an approximate ground-state wavefunction. Moreover, the extended variational principle tells us that the remaining eigenvalues provide upper bounds for the excited-state energies. This principle thus provides a foundation for the intuitive procedure in which one expresses operators and wavefunctions in a discrete basis and then truncates that basis.

Journal of Chemical Education

Visualizing the Algebra Using Euclidean Geometry

Let us now express these algebraic considerations in geometric language. For ease of visualization we assume that the eigenfunctions and basis functions are real. This is always possible in the absence of spin-orbit coupling or external magnetic fields. Then each normalized set of coefficients $c_n$ specifies a unit vector, that is, a direction in an N-dimensional real vector space, and eq 1 prescribes the expectation value E as a quadratic function of the direction of the vector c. The surfaces of constant E (the "level surfaces" of the quadratic form E(c)) are higher-dimensional analogs of conic sections. If we choose our energy zero so that all eigenvalues are positive, H becomes a positive definite matrix and the surfaces become N-dimensional ellipsoids.

With each ellipsoid is associated a set of N characteristic directions, the principal axes, whose orientations with respect to some space-fixed coordinate system are determined by the eigenvectors of the matrix H. The longest principal axis defines the direction in which E increases most slowly. Thus, it determines the minimum value of E that can be obtained by varying the direction of the vector c. Similarly, the shortest axis is determined by the eigenvector belonging to the highest-lying eigenvalue, while the intermediate axes and eigenvectors obey "minimax" principles (12-15):

The value of E is stationary when c points along one of these axes, but it increases as one moves further away in some directions, while decreasing in others.

Each eigenvalue is inversely proportional to the square of the length of the corresponding principal axis. (From this one may infer the well-known result that first-order errors in eigenvectors yield second-order errors in eigenvalues.) The unitary transformation that expresses the eigenvectors of H in the basis {|n⟩} corresponds to a rotation in N-dimensional Euclidean space from a space-fixed coordinate frame to a frame defined by the principal axes of the ellipsoid.

Assuming a Finite-Dimensional Space

This construction is based on a finite-dimensional space, but the dimension N may be arbitrarily large, so we expect the picture to be valid in the infinite-dimensional case so long as our basis {|n⟩} is complete. After all, the very definition of a complete basis relies upon convergence in the mean of vector norms, that is, upon the assumption that the infinite-dimensional space may be approximated by a finite-dimensional one (13). Naturally, systems with a continuous spectrum must be treated separately. In fact, Courant (12, 13) showed that minimax properties may be used to define eigenvalues and eigenfunctions: If one allows the space of "trial" wavefunctions to encompass all admissible ones, the variation principle may actually replace the differential or integral eigenvalue equation as a fundamental concept.

In the remainder of this paper we shall regard the full state space as having a finite dimension N. We rely on the mathematicians to make the technically difficult extrapolation N → ∞. This is consistent with the level of rigor usually adopted in teaching quantum chemistry at the first-year graduate level.

Dual Representation

Above we gave a geometric statement of the variation principle, which can be summarized as

Equation 1 associates with any positive definite symmetric matrix an ellipsoid EN whose principal axes are the eigenvectors of the matrix.
The longest axis of the ellipsoid lies in the direction of the eigenvector corresponding to the smallest eigenvalue, so this direction minimizes the quadratic form E for any normalized vector c. Put another way,

Among all vectors that reach from the origin to a surface of constant E, the vector along this axis (the longest principal axis of the ellipsoid) is the longest.
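The equivalence of these two statements can be made explicit in one short step. The block below is a sketch for the real, positive definite case; the symbols r, ℓ, E_0, λ_k, and e_k are introduced here for illustration and do not appear in the original text.

```latex
% Unit vector c with E(c) = c^T H c. Scale it until it touches the level surface E = E_0:
\mathbf{r} = \ell\,\mathbf{c}, \qquad
\mathbf{r}^{T} H\,\mathbf{r} = E_0
\;\Longrightarrow\;
\ell^{2}\,E(\mathbf{c}) = E_0
\;\Longrightarrow\;
\ell(\mathbf{c}) = \sqrt{\frac{E_0}{E(\mathbf{c})}} .

% Along an eigenvector e_k with eigenvalue \lambda_k > 0 the principal semi-axis is
\ell_k = \sqrt{\frac{E_0}{\lambda_k}}, \qquad \lambda_k \propto \frac{1}{\ell_k^{2}} .
```

Minimizing E over unit vectors is therefore the same as maximizing the length ℓ of a vector drawn to a fixed level surface, and the smallest eigenvalue belongs to the longest axis.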
In the remainder of this paper we shall adopt this "dual" representation, in which we draw vectors to a fixed E surface and ask for their lengths. This is more visually appealing than holding the length of the vector fixed and asking for the value of E. When we select a trial function depending linearly upon a set of M parameters, with M < N, we are exploring an M-dimensional subspace of the N-dimensional state space; in this subspace we have an M-dimensional ellipsoid EM whose principal axes are found by diagonalizing H restricted to the subspace. Because EM is a "slice" taken out of EN, the longest principal axis of EM cannot be longer than the longest principal axis of EN. Less obviously, the extended variation principle assures us that

The second-longest principal axis of EM cannot be longer than the corresponding axis of EN.

A similar statement can be made for the remaining axes.

Figure 1. Slicing a triaxial ellipsoid (axes in ratios 1:2:3) with a plane through the origin gives an elliptical cross section. The longest principal axis of the ellipse is shorter than the longest principal axis of the ellipsoid, while the shortest axis of the ellipse is shorter than the intermediate principal axis of the ellipsoid.

Figure 3. The elliptical cross sections that result when the ellipsoid in Figure 1 is sliced at various angles. In this figure, θ is held fixed at 0°. Moving outward from the center, the ellipses correspond to φ = 0°, 30°, 60°, and 90°.

Two- and Three-Dimensional Examples
Having posed the problem in N dimensions, let us now for ease of visualization retreat to two- and three-dimensional examples. This will, in particular, help us to visualize the extended variation principle. We suppose that our complete state space is three-dimensional (N = 3), and we approximate it by considering normalized vectors in a two-dimensional subspace. Geometrically, we are slicing a triaxial ellipsoid with a plane going through the center, obtaining an elliptical cross section (Figs. 1-3). Clearly, the longest principal axis of this ellipse (denoted here as the a axis) is closer in length to the longest axis (the A axis) of the ellipsoid than any other vector in the slicing plane.
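The slicing construction just described can be tried numerically. This is a minimal sketch under stated assumptions: the ellipsoid is the level surface E = 1 of H = diag(1/A², 1/B², 1/C²) with semi-axes in the ratios 1:2:3 of Figure 1, the slicing planes are drawn at random, and the function name `slice_axes` is hypothetical.

```python
import numpy as np

def slice_axes(semi_axes, plane_basis):
    """Semi-axes (longest first) of the central cross section of an ellipsoid.

    semi_axes   : principal semi-axis lengths, aligned with the coordinate axes
    plane_basis : (N, M) array whose columns span the slicing subspace
    """
    h = np.diag(1.0 / np.asarray(semi_axes, dtype=float) ** 2)
    q, _ = np.linalg.qr(np.asarray(plane_basis, dtype=float))  # orthonormal basis
    h_sub = q.T @ h @ q                  # H restricted to the subspace
    mu = np.linalg.eigvalsh(h_sub)       # ascending eigenvalues
    return 1.0 / np.sqrt(mu)             # smallest eigenvalue -> longest axis

A, B, C = 3.0, 2.0, 1.0
rng = np.random.default_rng(1)
tol = 1e-9
for _ in range(1000):
    a, b = slice_axes((A, B, C), rng.standard_normal((3, 2)))
    # simple bound (A >= a) and extended bound (B >= b), plus the lower bounds
    assert C - tol <= b <= B + tol
    assert B - tol <= a <= A + tol
```

Restricting H to the plane and diagonalizing the 2 × 2 block is the same operation as truncating a basis, which is why the cross-section axes obey the variational bounds.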
The Bound on the Length of the b Axis

The extended variational principle now tells us that the other axis of the ellipse (the b axis) is always shorter than the second principal axis of the ellipsoid (the B axis). Students whose three-dimensional spatial perception skills are exceptionally strong may be able to see this; for the rest of us a simple construction is helpful.

We first note that any vector that is orthogonal to the A axis must be no longer than the B axis; this provides the trivial upper bound that is mentioned in many books. Now all vectors that are orthogonal to the A axis lie in a plane passing through the B and C axes. This plane intersects the slicing plane, in which the ellipse lies, in a line. This line defines a vector that lies in the trial subspace and is orthogonal to the A axis. The trial subspace thus contains a vector that is no longer than B. However, the b axis of the ellipse is the shortest vector in the trial space, so it too must be no longer than B.

In essence, this construction is a geometric translation of the arguments given by Peierls (9) and Szabo and Ostlund (17). Once it has been grasped in the three-dimensional case, students can readily appreciate how its algebraic analog extends by induction to an arbitrary number of dimensions.

Approximations for Higher Excited States
Figure 2. Same as Figure 1, except that the front half of the ellipsoid has been peeled away to show the cross section more clearly.
This picture also helps us understand why the accuracy of variational approximations tends to degrade for higher excited states. The approximation to the longest principal axis is found by optimization: All vectors in the slicing plane are known to be no longer than the A axis, and the a axis provides the sharpest bound on the length of the A axis. Once the a axis has been found, however, the b axis is immediately determined by the requirement that it be orthogonal to a. There is no opportunity to sharpen the bound by searching a family of vectors. In fact, because we know that the slicing plane contains a vector that is orthogonal to A, there is a set of vectors in this plane that do provide sharper bounds. (They lie between the b axis and the axis that is orthogonal to A.) However, there is in general no way to find them without finding the exact A axis. The bound on the length of the b axis is thus expected to be weaker than that for the a axis.

This idea readily extends to higher dimensions: Each successive approximate eigenvector is determined by optimizing over a smaller set of parameters. Thus, a small error in the ground-state eigenvector translates into a substantial error in the excited states. For example, when the energies of highly excited vibrational states are calculated, the top 50% or more of the computed eigenvalues are usually discarded because the bounds on them are so poor as to be useless.
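The degradation of the bounds for higher states can be seen in a small numerical experiment. The model below is an assumption chosen for illustration and is not taken from the paper: a quartic anharmonic oscillator H = p²/2 + x²/2 + λx⁴ (with ħ = m = ω = 1) expanded in harmonic-oscillator states, with a large basis standing in for the "exact" answer. The names `quartic_oscillator` and `truncation_errors` are hypothetical.

```python
import numpy as np

def quartic_oscillator(n, lam):
    """Matrix of p^2/2 + x^2/2 + lam*x^4 in the first n harmonic-oscillator states."""
    k = np.arange(n)
    idx = k[:-1]
    x = np.zeros((n, n))
    off = np.sqrt((idx + 1) / 2.0)       # <k|x|k+1> in units hbar = m = omega = 1
    x[idx, idx + 1] = off
    x[idx + 1, idx] = off
    x2 = x @ x
    return np.diag(k + 0.5) + lam * (x2 @ x2)

def truncation_errors(n_ref=120, m_trial=20, lam=0.1):
    """Trial eigenvalues (leading m_trial block) minus reference eigenvalues."""
    h_ref = quartic_oscillator(n_ref, lam)
    reference = np.linalg.eigvalsh(h_ref)[:m_trial]
    trial = np.linalg.eigvalsh(h_ref[:m_trial, :m_trial])
    return trial - reference

errors = truncation_errors()
for level in (0, 5, 10, 15, 19):
    print(level, errors[level])
```

Because the trial matrix is a leading block of the reference matrix, every error is non-negative, which is the extended variation principle at work; the errors for the uppermost levels are far larger than the ground-state error.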
Graphical Illustration
Some care is needed to illustrate these results graphically. It is easy to set up intersecting ellipsoids and planes using the function ParametricPlot3D that is supplied with version 2 of Mathematica (24). By choosing various viewing orientations and lighting schemes, the student can develop some intuitive feeling for these geometric results. Unfortunately, it is difficult to orient the objects so that the lengths of the principal axes of both the ellipse and the ellipsoid are obvious. Moreover, much of the depth perception that a color monitor can provide is lost when these pictures are reproduced in black and white (Figs. 1 and 2).
Slicing the Ellipsoid at Various Angles

In Figures 3 and 4, therefore, we show the ellipses that are obtained when the ellipsoid is sliced at various angles. To specify the orientation of the slicing plane with respect to the principal axes of the ellipsoid, we construct a system of axes (x, y, z) that initially coincide with the A, B, and C axes, respectively. The slicing plane as defined is spanned by y and z and is thus orthogonal to x. We rotate the (x, y, z) system through an Euler angle φ around z (that is, around C) and then tilt it through an Euler angle θ around the new y axis that results from the first rotation. Equivalently, the spherical coordinate angles of the line normal to the slicing plane are (π/2 + θ, φ).

In Figure 3 we keep the z axis along C (θ = 0) while we increase φ from 0 (the BC plane) to π/2 (the AC plane). In Figure 4, in contrast, we keep φ fixed at 0 while we tilt the plane through various angles from θ = 0 (BC plane) to θ = π/2 (AB plane). Each elliptical cross section is displayed with the shorter principal axis vertical. The details of the calculations used to prepare these figures are given in the appendix.
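The two families of slices can be reproduced with a few lines of linear algebra. The appendix is not part of this section, so what follows is only a sketch under assumed conventions: right-handed rotations, semi-axes A, B, C = 3, 2, 1 along x, y, z, and hypothetical function names `slicing_plane` and `ellipse_axes`. The sign chosen for the tilt does not affect the axis lengths, because the ellipsoid is symmetric under reflection.

```python
import numpy as np

def slicing_plane(phi, theta):
    """Orthonormal columns spanning the slicing plane for Euler angles phi, theta."""
    x_new = np.array([np.cos(phi), np.sin(phi), 0.0])    # x after rotation about z
    y_new = np.array([-np.sin(phi), np.cos(phi), 0.0])   # y after rotation about z
    z_axis = np.array([0.0, 0.0, 1.0])
    # tilt about the new y axis: z moves toward the rotated x axis
    z_new = np.cos(theta) * z_axis + np.sin(theta) * x_new
    return np.column_stack([y_new, z_new])

def ellipse_axes(phi, theta, semi_axes=(3.0, 2.0, 1.0)):
    """Semi-axes (a, b) with a >= b of the elliptical cross section."""
    h = np.diag(1.0 / np.array(semi_axes) ** 2)
    u = slicing_plane(phi, theta)
    mu = np.linalg.eigvalsh(u.T @ h @ u)   # ascending eigenvalues of the 2 x 2 block
    a, b = 1.0 / np.sqrt(mu)
    return a, b

# Figure 3 family: theta = 0, phi varied.
for phi_deg in (0, 30, 60, 90):
    print("phi", phi_deg, ellipse_axes(np.radians(phi_deg), 0.0))

# Figure 4 family: phi = 0, theta varied.
for theta_deg in (0, 30, 60, 75, 90):
    print("theta", theta_deg, ellipse_axes(0.0, np.radians(theta_deg)))
```

At the end points the slices coincide with the BC, AC, and AB planes, so the computed axes reduce to pairs of the ellipsoid's own semi-axes.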
Applications and Variations of the Geometric Analogy

The geometric analogy provides a direct visual representation of a central result in quantum chemistry: the variation principle for ground and excited states. Many other theorems of linear algebra may be illustrated in this manner.

Depicting Degeneracies
For example, a geometric representation can be used to help students easily understand one peculiar result:

Any linear combination of the eigenvectors belonging to a degenerate eigenvalue is also an eigenvector.
Figure 4. Similar to Figure 3, but now φ is held fixed at 0° while θ takes on the values 0°, 30°, 60°, 75°, and 90°, again moving out from the center.

Consider what happens when an ellipse, which has two unique principal axes, is deformed into a circle for which all axes are equivalent. This picture is well-known in molecular spectroscopy, where