Newton-Raphson approaches and generalizations in

Danny L. Yeager, Diane Lynch, Jeffrey Nichols, Poul Joergensen, and Jeppe Olsen. J. Phys. Chem. , 1982, 86 (12), pp 2140–2153. DOI: 10.1021/j100209a...
2 downloads 0 Views 2MB Size
J. Phys. Chem. 1982, 86, 2140-2153

2140

TABLE VII: Total Energies (in hartree) for the C 3 x i State of He," R 2.9" 2.gb 3.2" 3.5b 3.86 4.1b 4.4b 4.7b 4.7c 5.OC

lo.oc a

E, -4.956 -4.967 -4.943 -4.955 -4.955 -4.955 -4.956 -4.957 -4.958 -4.959 -4.956

906 171 340 835 018 537 422 229 290 044 493

E, -5.080 578 -5.078 896 -5.077 050 -5.073 864 - 5.071 475 -5.071 899 - 5.073 656 -5.076 490 -5.073 301 - 5.074 640 -5.077 370

E3

- 5.066 652

-5.063 397 - 5.063 762 -5.058 174 -5.058 888 - 5.060 053 -5.061 511 - 5.062 853 - 5.063 995 - 5.065 376 -5.073 769

E*S - 5.064

574 -5.064 201 -5.061 286 -5.060 165 - 5.058 203 - 5.059 083 -5.061 274 - 5.064 475 -5.061 262 -5.063 7 7 1 -5.066 537

-5.071 727 - 5.067 353 -5.069 618 -5.063 560 -5.063 918 -5.064 882 -5.066 176 - 5.067 416 -5.068 890 - 5.071 499 -5.078 314

Model spaces are described in footnotes a-c in Table VI.

i.e., they cannot be separated into parts which are by themselves legitimate diagrams. The expansion is therefore size consistent. The method was applied to the c38,+ and C?8 states of Hez near an avoided crossing of the potentiaf curves. While nondegenerate MBPT diverges in this case and +

other degenerate theories require a 25-configuration model space with up to four-open-linediagrams, good convergence was obtained by using the present method in a threeconfiguration space with diagrams having no more than two open lines. Applications to other atomic and molecular systems are under way.

Newtondaphson Approaches and Generalizations in Multlconfigurational Self-Consistent Field Calculations Danny L. Yeager,' Dlane Lynch, Jenrey Nlchols, Department of Chemistry, Texas A&M Universw, College SiWion, Texas 77843

Poul Jerrgensen, and Jeppe Olsen Chemistry Department, Aarhus UniverMy, OK 8000, Aarhus C, Denmark (Received: July 13, 198 1)

We develop and review multiconfiguration Hartree-Fock procedures based on unitary exponential operators. When the energy functional is-expanded through second order in the orbital optimization operator, 2, and the state optimization operator, S, and the variational principle applied, the resulting set of coupled, linear inhomogeneous equations are known as the Newton-Raphson equations. These equations demonstrate quadratic convergence. Particularly when far from convergence, small eigenvalues of the Hessian may give extremely large step length amplitudes. We discuss two procedures, mode damping and mode controlling, which are used to reduce large step length amplitudes. With both procedures mode reversal is also used to assure that the proper number of negative eigenvalues is present for each iteration. New calculational results are given for the first excited lZg+ state of Cz and the 21A1 state of CH2. For an optimized MCSCF state the generalized Brillouin's theorem is satisfied. There may be other criteria which we want the MCSCF state to fullfil, e.g., the nth excited state of a certain symmetry have n negative eigenvalues of the Hessian. We discuss several such criteria, including that the MCTDHF not be unstable and that the MCTDHF should have n negative excitation energies for the nth excited state of a certain symmetry. Finally, generalizations of Newton-Raphson and the multiplicity independent Newton-Raphson (MINR) approaches are discussed. We show how fixed Hessian-type approaches will demonstrate quadratic, cubic, quartic, ... convergency. Calculations are presented for the E3Z; state of Oz. MINR approaches may be useful when small eigenvalues of the Hessian are present. Since the MINR formula involves the product of third-derivative matrices with vectors, explicit evaluation of the third-derivative matrices is avoided.

I. Introduction Recently, multiconfigurational self-consistent field (MCSCF) techniques have received considerable, renewed interest.'-14 This is in large part due to developments (1)A. C. Wahl and G. Das, "Modem Theoretical Chemistry",Vol. 3, H. F. Schaefer, 111, Ed., Plenum Press, New York, 1979, Chapter 3. (2)E. Dalgaard and P. Jsrgensen, J. Chem. Phys., 69,3833 (1978). (3)D. Yeager and P. Jsrgensen, J. Chem. Phys., 71,755 (1979). (4)E. Dalgaard, Chem. Phys. Lett., 65,559 (1979). (5)D. Yeager and P. Jsrgensen, Mol. Phys., 39,587 (1980). 0022-365418212086-2140$01.25/0

involving Newton-Raphson kchniques2-8and generalizations.13 These ~ r o c e d u r e s ~usually -'~ show superior con(6) D. Yeager, P. Albertsen, and P. Jsrgensen, J.Chem. Phys., 73,2811 (1980). (7)P.Jsrgensen, P. Albertsen, and D. Yeager, J. Chem. Phys., 72,6466 (1980). (8)C. C.J. Roothaan, J. Detrich, and D. G. Hopper, Int. J.Qwntum Chem. Symp., 13,93 (1979). (9)P.Siegbahn, A.Heiberg, B. Roos, and B. Levy, Phys. Scr., 21,323 (1980). (10)B. Lengsfield, 111, J. Chem. Phys., 73,2342 (1980).

0 1982 American Chemical Society

Multiconfigurational Self-Consistent Field Calculations

vergency over other techniques and will rapidly and reliably converge even to states for which other procedures would converge slowly or even diverge. It has been shown necessary in MCSCF in some cases to apply some type of constraint procedure, usually only when far from c o n v e r g e n ~ e . ~ ~Both J ~ - ~step ~ length and step direction may need to be monitored and controlled for critical modes. We will discuss some commonly used constraint procedures such as scaling, the use of an augmented Hessian,16and shifting denominators," as well as mode-damping5i6and mode-controllinglZtechniques. We report calculations and compare the latter two techniques. It has recently become a common practice to compare slightly different second-order and approximate-secondorder procedures with regard to efficiency and rate of ~onvergence.~*~,'J~J' Unfortunately, sometimes these comparisons are between two calculations which differ considerably with regard to basis set, starting point, etc.; e.g. Gaussians vs. STO basis sets, symmetry unrestricted Hartree-Fock vs. grand canonical Hartree-Fock starting point, and so 0n.17 Although qualitative properties of a certain technique can be learned by convergency studies, unless exactly the same starting point and basis set is used, detailed comparison of convergence studies can be somewhat misleading. This rather obvious point we explicitly demonstrate with MCSCF convergency studies of the first excited lZg+ state of Cz using a contracted Gaussian basis set. We demonstrate the efficiency of one-step complete second-order procedures and explicitly show again that convergency rate may often be improved slightly by not coupling with the CI states for an iteration or two when far from c~nvergence.~ Newton-Raphson approaches may be extended and generalized by examining higher order terms in the expansion of the energy.13J8 We show how quadratic, cubic, quartic, convergency may be obtained through fixed Hessian-type pro~edures.'~J~ These do not involve explicit calculation of higher derivative terms. Calculations are presented for convergence to the E32; state in 02.13Alternatively, other cubic approaches can be studied which involve direct calculation of the third-derivative supermatrix times a vector. The multiplicity independent Newton-Raphson approach exhibits quadratic convergence even when there are zero eigenvalues of the Hessian.13Jg Hence, the procedure may be particularly useful when there are small eigenvalues of the Hessian. (Small eigenvalues can exist in all iterations, even at convergence.) We derive formulas for the multiplicity independent Newton-Raphson approach and discuss explicitly how the effects of third-de-

...

(11)H.-J. Werner and W. Meyer, J . Chem. Phys., 73, 2342 (1980). (12)J. Olsen, P. Jsrgensen, and D. Yeager, J. Chem. Phys., 76,527 (1982). (13)P. Jsrgensen, J. Olsen, and D.Yeager, J . Chem. Phys., 75,5802 (1981). (14)R. Shepardand J. Simons, Znt.J.Quantum Chem. Symp., 14,211 (1980);R. Shepard and J. Simons, preprint; R. Shepard, I. Shavitt, and J. Simons, preprint. (15)V. A. Kuprievich and 0. V. Shramko,Znt. J. Quantum Chem., 9, 1009 (1975);6, 327 (1972). (16)(a) F. W. Bobrowicz and W. A. Goddard, 111, in "Methods of Electronic Structure",Vol. 3, H. F. Schaefer, 111, Ed., Plenum Press, New York, 1977,Chapter 4. (b) D.Yarkony, Chem. Phys. Lett.,77,634(1981). (c) R. Shepard in "Recent Developments and Applications of MCHF Methods", Proceedings of the National Resource for Computational Chemistry Workshop,Texas A&M University, College Station, TX, July 15-17, 1980. NRCC Proceeding, No. 10,Feb 1981,Lawrence Berkeley Laboratory. (17) H.-J. Werner and W. Meyer, J. Chem. Phys., 74,5794 (1981). (18)J. Olsen, P. Jsrgensen, and D.L. Yeager, J. Chem. Phys., submitted for publication. (19)C. E. Froberg, "Introduction to Numerical Analysis", 2nd ed, Addison-Wesley, Reading, MA, 1969.

The Journal of Physical Chemistry, Vol. 86, No. 12, 1982 2141

rivative terms may be incorporated. 11. T h e Newton-Raphson Multiconfigurational Hartree-Fock Procedure A. Unitary Transformations in the State and Orbital Space. The multiconfigurational Hartree-Fock (MCSCF) reference state 10) may be regarded as a member of the set of state

(u)}

10) = CI~g)C,

(1)

for which the coefficients C form a unitary matrix. The states I&,) are given as =

nrq

Il u,+Ivac) reg

(3)

where a: refer to an ordered product of creation operators. For convenience we consider real orbitals and expansion coefficients. A unitary transformation of the states b) may then be described as3y4 exp(i&P) = C ~ ' ) [ e x p ( - ~ ) ] = y jCtj')~yj

(4)

9 = ikC s k O ( ~ k ) ( -o lI O ) ( ~ I ) #O

(5)

j'

where

D = exp(-S) is a unitary matrix and S a real antisymmetric matrix with elements Skoand zero elsewhere. A unitary transformation of the orbitals may similarly be described 8,' = exp(iii)a,t exp(-iii)

(6)

ii = iCKrs(u,tas- astar)

(7)

where r>s

From eq 6 and 7 we get 8,t = Ca,+[exp(-~)],~ = Cas+Xsr S

(8)

S

where X = exp(-K) is unitary and K antisymmetric. The set of excitation operators (a,tus}and {lk)(Ol} may be linearly dependent. The elimination of redundant operators a',ta, for a specific reference state has been described in detail in ref 2 and 12. B. The Second-OrderMulticonfiguration Hurtree-Fock Approach. We will now determine stationary points on an energy hypersurface where variations in the orbitals and in the expansioz coefficients C are considered. A new reference state 10) may then be obtained by means of the unitary transformation

16)= exp(iii) exp(iS)IO)

(9)

The total energy then becomes E(?,@ = (01 exp(-iS) exp(-iii) H exp(iii) exp(iS)IO) = (OlHlO) - i(O[lj*+ii,H]10) + 1/2(0l[S,[H,S]]10) + ' / , ( O l [ ~ , [ ~ , ~ l l+~ O (ol[S,[H,alllo) ) + ... (10) A stationary point on the energy hypersurface is obtained when SE(K,S)= 0. We get SE(ii,S) = -i(Ol[S9+St,HllO) + (Ol[Ss,H,f?](O)+ (ol~?,H,nllo)+ ( O [ ~ , ~ H , ~ ~ + I l l(Ol[~~,[H,~IllO~ O) 3-

... = 0 (11)

where the double commutator is introduced. If third- and higher-order terms in eq 10 are neglected, eq 11 can be written in matrix notation as

2142

Yeager et al.

The Journal of Physical Chemistv, Vol. 86, No. 12, 1982

quantized form. The resulting matrix equations contain at most two-body density m a t r i c e ~ . ~ *Three~ ~ ~ ~and four-body density matrices do not appear and are never calculated either theoretically or computationally. This is well-known2*22and obvious provided one utilizes the anticommutation of the creation and anihilation operators

or

ai+aj + a,ai+ = aij

where /W\

/K

\

and where we have used eq 5 and 7 and introduced the notation Q,,' = a,ta,, r > s, and R,,+ = In)(Ol. y = (olEQallo) (13)

_v = (Ol[R,HllOO)

(14)

Equation 12 may be rewritten as

Since the series expansion, eq 10, has been truncated at second order, an iterative procedure is established to get 6E = 0. For a given set of orbitals and coefficients E,41, A, and B are calculated from eq 13-16. The vectors K and S are then evaluated from eq 12 and a new coefficient matrix and a new set of orbitals are determined from eq 4 and 8. The procedure is repeated until the numerical value of W and is smaller than a specific tolerance. The above procedure is a second-order procedure which performs the variation in the orbitals and in the coefficient in a single step. It will be denoted as the one-step second-order a p p r o a ~ h . ~ ? ~ A two-step second-order procedure may be described as f01lows.~JAfter an initial guess of orbitals the matrix D in eq 4 is determined from a configuration interaction calculation. We then have (mlHln) = E n a n z n (18)

v

and the vector 41 becomes equal to zero. The vector E that contains the effect of the variation in the coefficient matrix through second order may then be determined from a partitioned form of eq 12 -I(

= (Aoo- Boo- Y)-lW

(19)

where

(21)

This is a procedure commonly used when working with second quantization; and, of course, the resulting equations are considerably simplified since cancellations are explicitly included theoretically. C. Constraints i n the Initial Few MCSCF Iterations. Second-order procedures may have difficulty converging if the walk along the energy hypersurface is not constrained during the first one or two iterations where the orbitals may be far from the converged set. Small eigenvalues of the Hessian may cause step lengths which are much too large and even closer to convergence may considerably slow down convergence.12 Spurious negative eigenvalues of the Hessian may cause a calculation to proceed in the wrong direction and may lead to convergence to a different state. Several schemes have recently been proposed which would provide for a more controlled walk along the energy hyp e r ~ u r f a c e . ~ ~ JIn &the ' ~ following we discuss in detail two constraint procedures which include mode reversal. Mode damping was developed to include separate control for orbital and mixed or configurational modes. Mode controlling is a refinement of the mode-damping technique which takes into account the number of amplitudes that are changed in an MCSCF iteration. It should be particularly useful for difficult cases such as near-avoided curve crossings. In practice, we have never experienced any convergence difficulties with either procedure. With the mode-damping procedure>6in order to control the direction and magnitude of the step length taken in the one-step calculations, we obtain the spectral representation

(A - B)-' = Ut-' U+

(22)

Equation 12 is transformed

where

Since t is diagonal, eq 23 is uncoupled; and we can solve N one-dimensional problems instead of solving a set of N X N coupled, linear inhomogeneousequations. For the pth eigenvalue we obtain the step length amplitude 2Xp = -Fp/ep (no sum) (25)

Y= From eq 25 it is obvious that, for small tp, lXpl may be large. (Ol[ rqsrl,g+-~llO) (01[_R,H,_R+I Io)-' (Ol[R+-Iz,[H,&_+IllO) (20) (A, - B, - Y)is sometimes called the reduced Hessian. From eq 8 a new set of orbitals may be derived, a CI calculation performed, etc. This procedure will be denoted the two-step second-order approach. The above procedures for determining E, the first-derivative or generalized Brillouin's theorem (GBT) vector, and G,the second-derivative or Hessian matrix, involves the commutators of the excitation and deexcitation opertors Q+,Q and the Hamiltonian operator in second

Large amplitudes, if not reduced, may take a calculation far from the desired solution on the energy hypersurface. This may lead to slow convergence, convergence to an undesired state, or even to divergence. Most modes,

(20) P. Jergensen and J. Simons, "Quantum Chemistry Second Quantization Methods", Plenum Press, New York, 1981. (21) D. J. Rowe, "Nuclear Collective Motion: Models and Theory", Methuen and Co., London, 1970. (22) C. W. McCurdy, R. N. Rescigno, D. L. Yeager, and V. McKoy in "ModernTheoretical Chemistry",Vol. 3, H. F. Schaeffer,111, Ed.,Plenum Press, New York, 1977, and references therein.

Multiconfigurational Self-Consistent Field Calculations

however, are well-behaved (lXpl < 0.5 or so) and do not need to be constrained. In order to assure rapid convergence to the proper state, the magnitude of X p may need to be reduced. Step length restriction is advocated as a constraint procedure in several applied mathematics books, see, e.g. ref 23a, chapter 5, and ref 23b. For the nth state of a certain symmetry, we usually require n - 1negative eigenvalues of the full Hessian, G. If n - 1eigenvalues of the full Hessian are not present at each iteration, convergence may proceed to the wrong state. When we are converging to the lowest state of a particular symmetry, all negative eigenvalues are forced positive. Alternatively, if the reference state is an excited state of a particular symmetry, then only those modes with predominantly orbital character are forced positive. In this case orbital, configurational, and mixed-mode amplitudes may still be reduced. Most workers usually do not examine the eigenvalues of the full Hessian but instead examine only the ordering of the CI states from the MCSCF configurations.11J4J7 However, this latter approach may not be optimal. In many cases through a simple reparameterization based on redundant variables among orbital and state operators, two calculations may give MCSCF states with the same energy and properties. However, in one calculation the CI states may have the “incorrect” ordering by energy and in another have the “correct”ordering. This is discussed in detail in ref 12. It is usually best to constrain modes differently depending on the character of the mode. The character (orbital, mixed, or configurational) of a particular mode is determined by the 7, parameter for that mode. rp is the square root of the sum of the squares (Le., length) of the coefficients of the configurational transfer operators ((R,+,R,J)in the normalized eigenvector of the Hessian which corresponds to the 7, eigenvalue. A purely configurational mode has 7, = 1,while for a purely orbital mode rp = 0. We have given 0.33 as the cutoff value for orbital ( r 50.33) and configurational or mixed ( r p> 0.33) modes. T i e damping criteria is as follows. (zPis an orbital mode step length amplitude and Sp is a configurational or mixed-mode step length amplitude.) If an eigenvalue t p is smaller in magnitude than a specific tolerance-(O.l au) and the corresponding step length amplitude li#lS 1) is greater than Izol(lsOl)then IR I(lSpl)is replaced by laol(l&. We have = 0.1. The choice of different values set lRol = 0.25 and ~ ~ ~sign for lzOLand has been discussed e l ~ e w h e r e .The of zp(Sp)is maintained unless tp < 0.0 au and r p I 0.33, in which case, iip is replaced by -Ik0lii / l i i p l . It is important to monitor and possifdy change the step length amplitude ( X p ) and not the A - B eigenvalue ( E , ) . This is because X, 0.0 and Fp 0.0 at convergence while tp does not. Hence, no constraints should be applied when near convergence. Even when far from convergence, small Hessian eigenvalues usually give step length amplitudes which are not large. Of course, these should not be constrained. It is usually only when far from convergence that a f e w amplitudes need to be constrained. Recently, Olsen, Jerrgensen, and Yeager’, have developed an alternate procedure, mode controlling, in which the character of the mode is directly incorporated. With mode controlling, orbitals are allowed to change completely in nature in approximately three iterations, while the configuration amplitudes may change by at most 10% in two or three iterations. The step length is reduced in a two-

lsol

-

-

(23) (a) R. Fletcher, “Practical Methods of Optimization”, Vol. 1, Wiley, New York, 1980, (b) ‘Numerical Solution of Systemsof Nonlinear Algebraic Equations”, George D. Byrne and Charles A. Hall, Ed., Academic Press, New York, 1973.

The Journal of Physical Chemistry, Vol. 86, No. !2, 1982 2143

stage sequence. First if lXp\ > K 1 ,I&,\ is replaced by K1. Secondly if lip] > K2/rP1l2 then lXpl is replaced by K2/rp1J2. Note that K 2 < K 1 . By dividing K2 by r:l2 the character of the mode is taken into account. Since K 2 / r P 1 lis2 large for orbital modes ( r p s; R+ = (In)(Ol)

-0.018 124 691 8 1

-75.448 540 086 1 -0.007 242 441 6

2

-75.557 825 277 0 t 0.002 469 862 7

3

-75.453 312 665 0

E(ii,s) = (OIH]O)- i(Ol[s+ii,H]IO! - (01[9,sfi]lO) (01[$&")(01[ s , [ m I l10) + (i/6)(OI[~,~,SfiIlO) + (i/6)(01[ii,ii,ii,HlIO)+ (i/2)(Ol[s,~,[iifilIlO)+ (i/2)(Ol[s,[ii,WlI10) + ... (56)

0

both form column vectors

+0.001547 919 9 4

-75.451 764 745 1

0

h=

+O.OOO 004 831 3

5

-75.451 759 913 8

6

-75.451 759 913 9

(57)

0

(;)

-0.000 000 000 1 0

There is no CI coupling for the first iteration only. Constraint conditions: K,"= 0.426; K," = 0.113. See footnotes b-d in Table I11 for other calculational data.

T=(Z)

(59)

a

In Table VI we give the results of a calculation using the mode-controlling procedure12described above. This procedure was designed for particularly difficult MCSCF cases such as avoided curve crossings and has been applied to the avoided crossings of the B3Z; and E3Z; states in 02.12 The calculation converges in six iterations. The damping criterion for this case was not sufficiently loose and thus constraints were applied for the first three iterations. Although this requires two more iterations than with the mode-damping procedure, because of reasons discussed above and in ref 12, mode controlling is probably more reliable in general than mode damping for particularly difficult cases. We further note that using eiither mode damping or mode controlling with the one-step procedure, we never have had difficulty converging to any state. Werner and Meyer" report some problems using the mode-damping technique. However, they report no details and no numb e r ~ . ~We ' anticipate that a correct implementation of the mode-damping, mode-controlling, or similar procedures with the one-step procedure will alleviate any convergence problems.

111. Generalizations of Newton-Raphson Procedures A. Variations in the Total Energy. The total energy corresponding to the unitary transformation may be written as E(;$) = (blHlb) = (Olexp(-iS) exp(-iii)H exp(iii) exp(iQ0) (IO) Expanding the exponentials then allows us to write E(?,& = (OlHlO) - i (Ol[L?+ii,HllO! - (Ol[L?,[fi,Hl]lO)up)pl [;c[~rKIllO)- (01[s,[iimIlo) + (i/6) (01[S,[S,[S,HlllIO) + (i/6) (01[4[4 [W'lI1 10) + (i/2) ( O I [ ~ , [ ~ , [ W I+ I IG/2) I O ) ( o I [ ~ , [ ~ ~ , [ w I+I I...I o ) (54) where all terms through third order in ii and are written out explicity. We define the n-tupple symmetric commutator for the operators D1, D2, ..., D, as 1 [ D I P ~ , * * * P= ~-JW,2,...,n) H [Dl,[D,,...,[Dn,Hl...Il

(55) where P(1,2,...,n) is a permutation operator which contains the n! permutations of the indices 1,2,...n. Equation 54 then may be written as

allows us to write the total energy in eq 56 as E(K,S)= (OlHlO) + FiXi + (1/2)GijXiXj + (1/6)KijkXiXj&

+ ... (60)

where the matrices Fi, Gij, and Kijk are defined as (1/2)Fi = ( O l ~ + f i I i l O ) (1/2)Gij = (Ol[T'-T,T+,Hlijlo)

(1/2)Kijk =

(Ol[~'-T,~'-T,T'~lij~lO)

(61) (62) (63)

Note that here and for the remainder of this paper, we use the Einstein summation convention. All terms in the n-tupple commutator which couple the configuration and orbital space are defined such that the Hamiltonian first operates on the orbital space excitation operators and then on the configuration space excitation operators, e.g., all the six terms [_R+,R,$,Q+,HI = [Q,Q+,E+,3m = [_R,Q,Q+,l3+,HI = [R+,Q+,Q,5fiI = [3g+,Q,Q+,HI = [Q,$+,E,5+fiI are defined to equal

@+,E, [Q,Q+;HI1

(64)

The matrices and G represent the slope and curvature of the energy hypersurface, respectively. The MCSCF reference state represents a stationary point on the energy hypersurface. At such a point, thg first-order variation in the energy is zero, that is GE(ii,S) = 0. This condition implies that

(

(0 I[

(1/2)_F_=

@,HI IO) -

tOI[_R',H] IO)

)

=&

(65)

We use the condition that GE(K,S) = 0 to determine the step length to take on the energy hypersurface to reach a stationary point. Using eq 60, we evaluate the first-order variation in the energy and get Fi + GijXj + (1/2)Kij,XjX, ... = 0 (66)

+

If we could determine a set of parameters from eq 66 without truncating this equation in any order in 1,we would reach a stationary point on the energy hypersurface in one iteration. The iterative nature of MCSCF approaches originates by using truncated forms of eq 66 to determine the stationary points on the energy hypersurface. These truncated forms require successive applications to determine a stationary point.

The Journal of Physical Chemistry, Vol. 86, No. 12, 1982

2150

Yeager et al.

B. Generalized Newton-Raphson Approach. We initially derive a series of approximations to eq 66 where the Newton-Raphson formula may be considered to be the first member in the series. We rearrange eq 66 to be Xk

= -GkLIFi - (1/2)Gki1KimnXmXn ...

(67)

The simplest solution to eq 67 is obtained when 1is set equal to zero on the right-hand side of eq 67. We then obtain 'Xk

= -GkilFi

(68)

which corresponds to a so-called Newton-Raphson iterative ~ c h e m e . ~A- ~ second iteration may be carried out on eq 67 where the values of the 3 parameters on the righthand side of eq 67 are from eq 68. When these are inserted into eq 67, we obtain 'hk =

-

'Xk

(l/2)Gkl-'KimnGmi-'FiGnj-1Fj

(69)

where terms of higher powers than two in E are neglected. Equation 69 is known as the Chebyshev equation. A general series of formulas which is consistent through still higher orders in F may be derived through iterating with eq 68, 69, etc. keeping terms of the desired power in F. Yaffe and Goddard%have examined a Chebyshev method for orbital optimization in SCF and GVB calculations. No coupling with the CI states was included in their theory. C. Generalized Brillouin Theorem Procedures. So far, we have used knowledge of the energy surface to define iterative MCSCF approaches that will converge to stationary points on the energy hypersurface. A characteristic of a stationary point is that it fulfills the GBT theorem of eq 65. Stationary points on the energy hypersurface may be determined by constraining the error in the GBT vector in one iteration relative to the error in the GBT vector in the previous iteration. Denoting the operators in the (n + 1)st iteration with and those of nth with no -, we may write the error in the GBT vector in the (n + 1)st iteration as using eq 6 and 9

-

E=(

)=( (CI[&,H] 16

(Cl[

@,HI I5

-

t01exp(-i$)[Q,E71 exp(i9)Io)

(O ~ ~ + , e x p ( - i S h exp(i@] )ii IO)

)

(70)

The Ri+ component of Nijkbecomes similarly

.i

(0 I@+,[ Q-Q,Q-Q,H] - - - ] IO)t0 I[ R+,[R+-R,[ - Q-Q,H] - ] 3 IO)

(OI@+,@+-~,[Q-Q,H]] -] IO)tOI[_R+,@+-&&+-&,H] ] IO)

1

jjj

(77)

Equation 74 has the same structure as eq 66. An iterative procedure may thus easily be derived which is equivalent to the symmetric Newton-Raphson (eq 68) and the Chebyshev (eq 69) approaches. The equivalent of the symmetric Newton-Raphson approach is Xk

= -MkilFi

(78)

and the equivalent of the symmetric Chebyshev formula is Xk

= -Mkc1Fi - (1/2)Mki1Nim,Mmi-'FiMn~'Fj(79)

An iterative approach based on eq 78 will (as also will the symmetric Newton-Raphson approach) give errors which are quadratic in the error vector El. The result of using eq 79 gives errors which are of third order in E of the previous iteration. We should point out that iterative approaches which are based on eq 68 and 78 and on eq 69 and 79 even though they show the same convergency characteristics are not identical. The matrices G and M and the supermatrices K and N differ (see eq 62, 75,63, 76, and 77) since Gij and Kijk are symmetric with respect to permutation of indices i and j and i, j , and k respectively while Mij and Njjkare not. Equation 79 may be rewritten to a form which may be computationally more attractive. If we carry out an iteration using eq 78 the generalized Brillouin theorem matrix in the following iteration FBmmay be written as (insert 1of eq 78 in eq 74)

P?NR= Fi - MijMji'Fi + (1/2)NijkMji1FiMkp-1Fp+

where

R = exp(-ik)H

exp(iii) (71) Expanding the exponential then gives that the individual components in eq 70 become

@A3)

= (1/2)Ni,kM,l-'F~Mkp-1F, +

o(X3)(80)

which when inserted into eq 79 gives Xk

(bl[~+,HIlfi) = (ol[Q+,HIlO) = i ( o l [ Q + , [ ~ , ~ l l l-o ) i ( o 1 [ ~ , [ Q + ~ l I l -O (1/2) ) (01[ Q + ~ k [ ~ ~ l l l-l O ~ (01rs,rQ_.r;(alI1 10) - (1/2) (01[IS,~S,[g',HllllO~ +

= -MkL1(Fl

+ FpNR)

(81)

This form has a much simpler structure and may be a computationally more attractive form to use. In fact, if eq 81 is used third-derivative termspo not need to be o ( x 3 ) = o (72) calculated, but rather the GBT vector, EBNR,is constructed (~lr&srllb, = ( o l [ E + ~ l l o-) i(O[E+,[WI x from 1 of eq 78. 10)- ~ ( o I [ E + , [ ~ , K II I(~/~)(oI[R+,[~,[~,HIIII~) o) If the matrix M is kept fixed during two consecutive iterations, a cubically convergent scheme is thus obtained (Ol[_R+,[& [R,WI 110) - (1/2) (01[E+,[k,[WI1110) + (eq 79-81). With this technique no third-derivative suo(~3= ) o (73) permatrices are ever calculated or stored. With alternative Fi + MijXj + ( 1 / 2 ) N i j k x j X k + = O (74) cubic procedures, such as the Chebyshev procedure, the explicit calculation of third-derivative terms or the thirdwhere the Fi and X i are defined in eq 58 and 63. The derivative supermatrix times a vector may be required.36 matrix Mij is defined as This result may be generalized as follows. e..

(1/2)Mij=

(0 I[ Q,[Q-Q,H] - - - 3 IO) - ] IO) (0 I[R+,[Q+-Q,H]

(0 I [ _ R+-R,[ R+,[R+-RJI] - - 1 IO) - - Q,H] ] IO) (0 I[ -

(75)

(35) P. Siegbahn, J. Almldf, A. Heiberg,and B. Roos, J . Chem. Phys., 74, 2384 (1981). (36) J. Olaen, P. Jsrgensen, and D. Yeager, J.Chem. Phys., submitted

for publication.

The Journal of Physical Chemistry, Vol. 86, No. 12, 1982 2151

Multiconfigurational Self-consistent Field Calculations

TABLE VII: Convergence of h f o r Newton-Raphson vs. Fixed Hessian-Type Approaches'

n iteration (m)

e..)

8

16 32 64 128 256 a The numbers in this table show that h has errors proportional t o after m iterations. Here is the GBT vector for iteration point 0.

En

TABLE VIII: The 0, STO Basis Set Used in Table IX exponent

coeff

Is

7.657 81 2.688 01 1.675 43 0.9 3.694 45 1.658 64 0.9

2P 2P

'

The exponents for the first three s functions and first t w o 2 p functions are from E. Clementi and C. Roetti, A t . Data N u l . Data Tables, 14, 177 (1978).

is obtained

If an iterative scheme is applied where from eq 82 = -M

nAk

-1 F k l ( I

... + n - l P B1 N R

+ 2pBNR + 3pBNR + 1

1

+ npBNR) 1

(82) where iEBNR denotes the GBT matrix in the ith iteration, eq 82 contains errors of n + 1 power in F(0(Fn+')). Using induction, we will now prove this result. From eq 78 and 79 it is clear that, for n = 1, eq 82 contains errors of second power in E. From eq 81, it is further clear that, for n = 2, eq 82 contains errors of third power in E. To conclude the proof, we therefore assume that, when n = p - 1, eq 82 contains errors of pth power in E and we are thus left with proving that eq 82 when n = p contains errors of O ( P + l ) . Equation 82 may be rewritten P X = ~ pixk

-M ~ ~ ~ P ~ ~ B N R

TABLE IX: Convergency t o the E iteration point 0

1 2 3 4 5 6 7 8 9

energy,d au

-148.969 -148.971 -148.971 -148.971 -148.971

496 134 5 170 240 6 238 498 1 238 637 8 238 637 8

(84)

+

2 4

2s 2s 2s 2P

'FPNR= Fl + Mk['-lhl + (1/2)N1kmP-1AkP-1Am + ...

which when inserted into eq 83 gives 'Ab = '-lhk - '-'Ab - Mkrl(FI + (1/2)N[kmP1AkP-'Am ...) = -Mkr1(FL + (1/2)N1kmP-1AkP-1& + (85)

fixed Hessian tY Pe

NewtonRaphson

From eq 74 we further know that

3 ~ u State -

o f 02a-c

A E , au ~

-0.001 -0.000 -0.000 -0.000

(83)

Since the p l A k parameters contain errors of O ( P ) ,the term (1/2)N&,p1xkp1Amhas 811 error (1/2)Nikm0(P)l&which is of at least the order O(FP+l)because ' A , contains one order in E. Similarly, the terms in eq 85 which are not explicitly written out contain errors of at least O(EP+l). The errors introduced by applying eq 85 for n = p thus gives errors of at least O(P+'). By applying eq 82 we have thus been able to derive cubically, quartic, ... etc. convergent MCSCF schemes without ever explictly constructing the third, fourth, etc. derivatives of the total energy. iEBNRis constructed from the orbitals and CI coefficients obtained from i-lX (see eq 4 and 8). M is constructed one time and is not constructed in each iteration. Quadratic, cubic, quartic, ... convergence is also obtained when M is replaced with G . In Table VI1 we show how converges (in powers of E) after applying a series of Newton-Raphson steps vs. a series of these fixed Hessian-type approaches. The fixed Hessian-type approaches do not involve a recalculation of G or M for every iteration. It is estimated that it is three to five times faster to simply set up rather than to set up both and G.ll@ Hence, we anticipate that these fmed Hessian-type procedures should be useful for between two and five iterations. Subsequently, a Newton-Raphson step should be performed before further fixed Hessian-type iterations. D. Sample Calculation. Convergence to the E3Z; State in 02.As an example of these fixed Hessian-type procedures, calculations for the E32; state of O2 are presented in Tables VI11 and IX. It is easily seen that after two Newton-Raphson steps the energy and length of 131are approximately the same as after an initial Newton-Raphson and three fixed Hessian-type steps. We note that initially when far from convergence cubic and higher-order terms may cause Newton-Raphson approaches to appear to converge faster (or slower) than quadratic. After three Newton-Raphson steps the energy and are about the same as after an initial Newton-Raphson and seven fixed Hessian-type approaches. Hence, these fixed Hessian-type approaches are useful to replace 2-5 New-

674 106 1 068 257 5 000 139 7 000 000 0

h n o m d, e

2.19 X 10-l 7.13 X 1.62 X l o T 3 2.69 X lo-'

energy,f au

-148.969 -148.971 -148.971 -148.971 -148.971 -148.971 -148.971 -148.971 -148.971 -148.971

496 134 5 170 240 6 230 886 2 237 669 9 238 511 4 238 621 2 238 635 6 238 637 5 238 637 8 238 637 8

A E , au

-0.001 674 106 1 -0.000 060 645 6 -0.000 006 783 7 -0.000000 841 5 -0.000 000 109 8 -0.000 000 014 4 -0.000 000 001 9 -0.000 000 000 3 -0.000 000 000 0

AllOllIl

e, f

2.19 x 10-1 4.84 X 1.53 X 5.86 X 1.93 X 7.49 X 2.52 X 9.85 X lo-' 3.34 X 1.31 X lo-'

a Basis set of Table VIII. R = 2.20 au. Configurations: (core) 3og21nu31ng3, (core) 3ag1lnU4lng2 3uu1,(core) Starting orbitals were obtained from a 3og11n,21n,430,1, (core) 3 ~ , ~ 1 n , ~ l r , ~ 2 where r , ~ , (core) = log21au220220,2. Newton-Raphson calculation for E32;. The starting orbitals are the drst orbitals from the Newton-Raphson MCSCF seis the length quence where n o constraints were applied. d Unsymmetric Newton-Raphson approach, eq 78 in text. Fixed Hessian-type approach, eq 8 2 in of & Note that & i n the row marked iteration m - 1 is the &after m iterations.

text.

2152

The Journal of Physical Chemistry, Vol. 86, No. 12, 1982

ton-Raphson steps. Other fixed Hessian-type approaches which also demonstrate quadratic, cubic, quartic, ... convergence are discussed in more detail in ref 18. From Table VI1 it is clear that the total energy obtained after an initial Newton-Raphson and two fixed Hessian calculations are carried out approximately should correspond to the total energy obtained after two NewtonRaphson iterations. Similarly, the total energy obtained after three Newton-Raphson iterations should correspond approximately to the total energy obtained after an initial Newton-Raphson and six fixed Hessian steps have been carried out. The calculations of Table IX confirm these predictions. From the Newton-Raphson results reported in Table IX, we further note that the quadratic convergence near convergence is characterized by a proportionality constant (supermatrix) a,where a is defined from the equation fn+l = fn denotes the error vector of the nth iteration. The small discrepancies in the total energies of fixed Hessian and Newton-Raphson iterations which contain errors of the same order have its origin in the fact that the error terms in the fixed Hessian and in the Newton-Raphson approach contain different proportionality constant^.^'

IV. The Multiplicity Independent Newton-Raphson Procedure When there are small eigenvalues of the Hessian, there may occasionally be convergence difficulties with Newton-Raphson procedures. The reason for this is easily seen. We expand the gradient (the GBT matrix) in the ( n + 1)stiteration in the basis where the Hessian is diagonal in the nth iteration = Fp)

+ 2~ijX;+ (1/2)NijkXjXk + ...

(86)

t is the nth iteration diagonalized Hessian matrix. With a Newton-Raphson step length in the nth iteration

2T;j(n) = -E.-1j7(n) J J

(no sum)

(87)

we get the gradient in the ( n + 1)st iteration to be j7,(n+l) = I

(1/8)Nijkc;'F;(~)ek-lFk(n)+ ... (no sum for j and k) (88)

which does not contain linear terms in E("). In cases without small eigenvalues of the Hessian, the factors multiplying p) are usually of order unity and the new gradient possesses about double as many zeros as the preceding gradient. When very small t p eigenvalues (e.g., 0.001-0.0001) occur, the factors multiplying can be very large, and the behavior of the Newton-Raphson approach may be very different from what is usually considered quadratic convergency. This impediment to convergency increases the number of iterations necessary from typically 6 to as many as 10-12 iterations or even more in extreme cases. For examples of this behavior we refer to calculations for the low-lying states of BO as a function of internuclear separation given in ref 12. Obviously, small eigenvalues (the curvature) can exist throughout the iterative process. For the O2calculation on the E3Z; state at 2.1 au we previously reported,12at convergence we have one eigenvalue 0.000 68 au, one eigenvalue approximately 0.001 au, and five eigenvalues smaller than 0.01 au. During each iteration both far from and close to convergence we have small eigenvalues. Of course, very near convergence ~~

(37) J. F. Traub, 'Iterative Methods for the Solution of Equations", Prentice-Hall, Englewood Cliffs, NJ, 1964.

Yeager et al.

all E amplitudes will be much less than any of the e's and small eigenvalues will not cause serious convergence problems. A possible solution to problems caused by small eigenvalues is the use of

(62E)-16E= 0

(89)

rather than

6E = 0 (90) to obtain a stationary point on the energy hypersurface. Equation 89 will have the same zeroes as eq 90. The use of eq 89 is known as the multiplicity independent Newton-Raphson approach in the applied mathematics literature.lg This approach assures quadratic convergency even when there are zero eigenvalues of the Hessian.lg From eq 89 we obtain (G + K_X)-l@ + Gh) = 0 (91) where in eq 91 terms linear in 3 have been retained. Equation 91 is rewritten as -(G

+ K_X)-'E = (G + KA)-lGX= (1 - (G

+ K_X)-'K_X)X

(92) (93)

Hence

+ K_X)-'KA)-'(G + K_X)-lI (94) + K&)-'_F- (G + K_X)-'KA(G + K_X)-'F + ...

X = -(1 - (G

= -(G

(95)

-(G + Kb)-'_F (96) where in eq 96 only terms first order in (G KX)-'F are retained. We note that eq 96 is the same as the NewtonRaphson equation with G replaced by G + KX. This equation may be solved iteratively for A. It appears that an explicit calculation of the third-derivative matrix K is needed. Of course, this would require, in general, a large amount of computer storage, as well as computer time. However, in eq 96 it is actually KX which is required and not simply K. This fact can result with a considerable savings in computational time and storage. Details of this technique are reported el~ewher'e.'~We note that when this technique is used to calculate KX, only an additional one index transformation is required and that no density matrices of the form (nllm)are required, where In) and Im) are in the orthogonal complement space. Modifications of existing Newton-Raphson computer programs are not extensive. Also, storage requirements for KX are considerably reduced from the storage for K. Our initial estimates and preliminary calculations indicate that this method to calculate KA is only approximately twice as time consuming as calculating G.I3 Equation 96 is an iterative equation for 3. We note that an iterative cubic equation can be solved with this technique by replacing KX by (1/2)KX in eq 96 and that other optimization techniques involving third derivatives such as least squares23can similarly easily be developed and employed.

+

Summary We have discussed how Newton-Raphson approaches may be used in MCSCF wave function optimization procedures. In general, these procedures show superior convergence characteristics over other MCSCF techniques. Constraint procedures, which may be required when far from convergence, have been discussed and analyzed. We hae discussed further conditions which may be imposed

J. Phys. Chem. 1982, 86, 2153-2158

in an MCSCF wave function to assure it is a reasonable representation of a state of the system. For the nth state of a certain symmetry, these include the following: (i) the full Hessian has n - 1 negative eigenvalues, (ii) the MCTDHF is stable and has n - 1negative excitation energies, (iii) the state is the nth state in energy of a CI with MCSCF configurations. Ideally, all of these criteria should be fulfilled. However, there are cases where one or more of these are not fulfilled, e.g., near avoided curve crossings. Also new calculational results for the 21A1 state in CH2and the second lZg+state of Cz have been presented. Generalizations of Newton-Raphson and multiplicity independent Newton-Raphson procedures are presented and analyzed. We have shown how quadratic, cubic,

2153

quartic, ... convergence may be obtained with fixed Hessian-type approaches. Sample calculations are presented for convergence to the E3Z,-state of 02. Acknowledgment. We thank Dr. S. Huzinaga for providing w with basis set information and Dr. w. Meyer and Dr. H.-J. Werner for providing us with some of their unpublished results and basis set information. We are thankful for support by the Danish National Science Foundation. Danny L. Yeager, Diane Lynch, and Jeffrey A. Nichols acknowledge the Robert A. Welch Foundation (Grant A-770) and the National Science Foundation (Grant CHE-8023352) for research support. Danny L. Yeager expresses appreciation to NATO for travel funds.

Is Dynamical Chaos the Same Phenomenon in Classical and Quantum Mechanical Hamiltonian Systems? Stuart A. Rice" and Ronnie Kosioff The Department of Chemistry and The James Franck Institute, The University of Chicago, Chicago, Illinois 60637 (Received July 13, 1981)

A brief discussion of the Kolmogorov entropy interpretation of classical mechanical chaos and quantum mechanical chaos is given. It is pointed out that if a quantum mechanical Kolmogorov entropy is constructed to be analogous to a classical mechanical Kolmogorov entropy, then, for the case that the spectrum of the quantum mechanical system is discrete, the Kolmogorov entropy vanishes and there is no chaos, irrespective of the behavior of the corresponding classical mechanical system. A simple model which illustrates the deviation between classical and quantum dynamics with respect to chaos is also discussed.

I. Introduction Recent advances in the theory of classical dynamics, particularly those concerning the onset and nature of dynamical chaos in Hamiltonian systems,l-s have stimulated two very different approaches to understanding the corresponding quantum dynamics. One category of studies is concerned with the general relationship between classical and quantum mechanics: the nature of the transition between them, and the development of convenient and accurate algorithms for so-called semiclassical quantization."'-" The second category of studies is related to (1)See, for example, the set of papers of "Topics in Nonlinear Dynamics", AZP Coni.h o c . No.46,1-403 (1978). (2)V. I. Arnold and A. Avez, "Ergodic Problems of Classical Mechanics", W. A. Benjamin, New York, 1974. (3)B. V. Chirikov, Phys. Rep., 52,263 (1979). (4)G. Benettin, C. Froeschle, and J. P. Scheidecker, Phys. Rev. A , 19, 2454 (1979);G.Benettin, L.Galgani, and J. M. Steclyn, ibid., 14,2338 (1976). (5) A. Kolmogorov, Dokl. Akad. Nauk. SSSR, 124,774 (1959). (6)V. A. Rokhlen, Usp. Mat. Nauk., 22, l(1967) [Math. Surveys, 22, (1967)l;V.A. Rokhlin, Izu. Akad. Nauk. SSSR, Ser. Mat., 25,499(1961) [Am. Math. SOC.Transl., 39, (2),1 (1964)l. (7)Ya. Sinai, Zzv. Akad. Nauk. SSSR Serv., Mat., 25,899(1961); 30, 15 (1966)[Am. Math. SOC.Transl. 39 (2),83 (1964);68 (2),34 (1967)l. (8)Ya. B. Pesin. USD. Mat. Nauk.. 32 (4).55 (1977) [Math. Suruevs. . " , 32,'(4),55 (1977)l.' (9) See, for example, M. V. Berry, J. Phys. Chem., article in this issue, and ref 1~. ~.~ (10)M. C. Gutzwiller,J. Math. Phys., 8,1979(1967);10,1004(1969); 11, 1971 (1970);12,343 (1971). (11)C. Jaffe and W. Reinhardt, J. Chem. Phys., 71,1862 (1979). (12)I. C. Percival, J.Phys. A , 7,794(1974);Adu. Chem. Phys., 36,1 (1977). I. C. Percival and N. Pomphrey, J . Phys. B, 32, 97 (1976). (13)M.V. Berry, Phil. Trans. R. SOC.London, Ser. A , 287,237(1977); J.Phys. A , 10,2083(1977);M.V. Berry and M. Tabor, Tabor, Proc. R. SOC.London, Ser. A, 349,101 (1976). (14)S.Chapman,B. C. Garrett, and W. H. Miller, J. Chem. Phys., 64, 502 (1976);W.H.Miller, ibid., 63,936 (1975). I

.

I

,

I

.

attempts to improve understanding of intramolecular dynamics, in particular the relative rates of vibrational relaxation, isomerization, dissociation, and other processes.lgZo It is usually assumed that the onset of chaos in the classical mechanical model signals the onset of rapid intramolecular energy transfer in the corresponding quantum mechanical model. In turn this assumption presupposes that dynamical chaos in classical and quantum mechanical models of the same system occur under nearly the same conditions, e.g., nature of excitation and energy, and that the existence of chaos in a classical mechanical system implies the existence of chaos in the corresponding quantum mechanical system. In this brief note we will confine our comments to the second category of investigations. We will argue, referring the reader elsewhere for details,21that the interference effects inherent to quantum dynamics render it different from classical dynamics. Indeed, when a common quantitative measure of chaos is employed, namely, the Kolmogorov it is found that a bounded quantum mechanical system cannot exhibit chaos even when the corresponding classical mechanical system does. This result implies that we must interpret the dynamical predictions derived from classical mechanical models with considerable care, keeping in mind

A

0022-3654/82/2086-2153$0 1.25/0

(15)D. W.Noid and R. A. Marcus, J. Chem. Phys., 62,2119 (1975); W. Eastes and R. A. Marcus, ibid., 61,4301 (1974). (16)R. T.Seimm and J. B. Delos, J. Chem. Phys., 71, 1706 (1979). (17)G. C. Schatz and T. Mulloney, J. Phys. Chem., 83,989 (1979). (18)K. S.J. Nordholm and S. A. Rice, J. Chem. Phys., 61,203, 768 (1974);62,157 (1975). (19)E. J. Heller, J. Chem. Phys., 72,1337 (1980);Chem. Phys. Lett., 60,338 (1979). (20)K. G. Kay, J. Chem. Phys., 72,5995 (1980). (21)R. Kosloff and S. A. Rice, J . Chem. Phys., 74,1340 (1981).

0 1982 American Chemical Society