Parameterless stopping criteria for recursive density matrix expansions. Anastasia Kruchinina, Elias Rudberg, and Emanuel H. Rubensson. J. Chem. Theory Comput., Just Accepted Manuscript. DOI: 10.1021/acs.jctc.6b00626. Publication Date (Web): 26 Oct 2016.


Parameterless stopping criteria for recursive density matrix expansions

Anastasia Kruchinina, Elias Rudberg, and Emanuel H. Rubensson
Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden
E-mail: [email protected]; [email protected]; [email protected]

Abstract

Parameterless stopping criteria for recursive polynomial expansions to construct the density matrix in electronic structure calculations are proposed. Based on convergence order estimation, the new stopping criteria automatically and accurately detect when the calculation is dominated by numerical errors and continued iteration does not improve the result. Difficulties in selecting a stopping tolerance and appropriately balancing it in relation to parameters controlling the numerical accuracy are avoided. Our parameterless stopping criteria thus stand in contrast to the standard approach of stopping as soon as some error measure goes below a user-defined parameter or tolerance. We demonstrate that the stopping criteria work well both in dense and sparse matrix calculations and in large-scale self-consistent field calculations with the quantum chemistry program Ergo (www.ergoscf.org).

1 Introduction

An important computational task in electronic structure calculations based on, for example, Hartree–Fock 1 or Kohn–Sham density functional theory 2,3 is the computation of the one-electron density matrix D for a given Fock or Kohn–Sham matrix F. The density matrix is the matrix for orthogonal projection onto the subspace spanned by eigenvectors of F that correspond to occupied electron orbitals:

F x_i = \lambda_i x_i,   (1.1)

D := \sum_{i=1}^{n_{occ}} x_i x_i^T,   (1.2)
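For reference, the definition (1.1)–(1.2) can be evaluated directly by diagonalization. The following is a minimal dense-matrix sketch of our own (not part of any particular code); it is the cubically scaling approach that the recursive expansions discussed below are designed to avoid.

```python
import numpy as np

def density_from_eigendecomposition(F, nocc):
    """Assemble D = sum_{i=1}^{nocc} x_i x_i^T from the eigenvectors of F, eq. (1.2)."""
    lam, X = np.linalg.eigh(F)     # eigenvalues in ascending order, as in eq. (1.3)
    Xocc = X[:, :nocc]             # eigenvectors of the nocc lowest eigenvalues
    return Xocc @ Xocc.T
```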

where the eigenvalues of F are arranged in ascending order

\lambda_1 \le \lambda_2 \le \dots \le \lambda_{homo} < \lambda_{lumo} \le \dots \le \lambda_{N-1} \le \lambda_N,   (1.3)

n_occ is the number of occupied orbitals, \lambda_{homo} is the highest occupied molecular orbital (homo) eigenvalue, and \lambda_{lumo} is the lowest unoccupied molecular orbital (lumo) eigenvalue, and we assume that there is a gap

\xi := \lambda_{lumo} - \lambda_{homo} > 0   (1.4)

between eigenvalues corresponding to occupied and unoccupied orbitals. An essentially direct method to compute D is to compute an eigendecomposition of F and assemble D according to (1.2). Unfortunately, the computational cost of this approach increases cubically with system size, which limits applications to rather small systems. Alternative methods have therefore been developed with the aim of reducing the computational complexity. 4 One approach is to view the problem as a matrix function

D = θ(µI − F ),

(1.5)

where θ is the Heaviside function and µ is located between λhomo and λlumo , which makes (1.5) equivalent to the definition in (1.2). 5 A condition number for the problem of evaluating


(1.5) is given by

\kappa := \lim_{h \to 0} \; \sup_{A : \|A\| = \Delta\epsilon} \frac{\|\theta(\mu I - (F + hA)) - \theta(\mu I - F)\|}{h} = \frac{\Delta\epsilon}{\xi},   (1.6)

where \Delta\epsilon is the spectral width of F. 6–8 We let \|A\| = \Delta\epsilon to make the condition number invariant both to scaling and shift of the eigenspectrum of F. 7 When the homo-lumo gap \xi > 0, a function that varies smoothly between 0 and 1 in the gap can be used in place of (1.5). To construct such a function, recursive polynomial expansions or density matrix purification have proven to be particularly simple and efficient. 9 The regularized step function is built up by the recursive application of low-order polynomials f_0, f_1, \dots, see Algorithm 1.

Algorithm 1 Recursive polynomial expansion (general form)
1: X_0 = f_0(F)
2: \widetilde{X}_0 = X_0 + E_0
3: while stopping criterion not fulfilled, for i = 1, 2, \dots do
4:   X_i = f_i(\widetilde{X}_{i-1})
5:   \widetilde{X}_i = X_i + E_i
6: end while

With this approach, a linear scaling computational cost is achieved provided that the matrices in the recursive expansion are sufficiently sparse, which is usually ensured by removing small matrix elements during the course of the recursive expansion. 10 In Algorithm 1 the removal of matrix elements, also called truncation, is written as an explicit perturbation E_i added to the matrix in each iteration. Several recursive expansion algorithms fitting into the general form of Algorithm 1 have been proposed. Note that here we are considering methods that operate in an orthogonal basis. The function f_0 is usually a first order polynomial that moves all eigenvalues into the [0, 1] interval in reverse order. A natural choice for the iteration function f_i, i = 1, 2, \dots, is the McWeeny polynomial 3x^2 - 2x^3, 11,12 which makes Algorithm 1 essentially equivalent to the Newton–Schulz iteration for sign matrix evaluation. 6 Furthermore, algorithms have been developed that do not require beforehand knowledge of \mu.


Palser and Manolopoulos proposed a recursive expansion based on the McWeeny polynomial. 12 Niklasson proposed a simple and efficient algorithm based on the second order spectral projection polynomials x^2 and 2x - x^2. 13 We will refer to this algorithm as the SP2 algorithm. The recursive application of polynomials gives a rapid increase of the polynomial order, and the computational cost increases only with the logarithm of the condition number. 9,13 The computational cost can be further reduced by a scale-and-fold acceleration technique, giving an even weaker dependence on the condition number. 14 Recursive expansion algorithms are key components in a number of linear scaling electronic structure codes including CP2K, 15 Ergo, 16,17 FreeON, 18 Honpas, 19 and LATTE. 20 Since most of the computational work lies in matrix-matrix multiplications, recursive expansion algorithms are well suited for parallel implementations 21–24 and are a competitive alternative to diagonalization also in the dense matrix case. 22,23 Different ways to decide when to stop the iterations have been suggested. A common approach is to stop when some quantity, measuring how far the matrix is from idempotency, goes below a predetermined convergence threshold value. The deviation from idempotency has been measured by the trace 24,25 or some matrix norm 6,15,23,26–28 of X_i - X_i^2, sometimes scaled by, for example, the matrix dimension. However, since the recursive expansion is at least quadratically convergent, what one usually wants is to continue iterating until the idempotency error no longer decreases substantially. This happens when any further substantial decrease is prevented by rounding errors or errors due to removal of matrix elements. Finding a proper relation between matrix element removal and the parameter measuring idempotency can be a delicate task, often left to the user of the routine. However, a few attempts to automatically detect when numerical errors start to dominate exist in the literature. Palser and Manolopoulos noted that with their expansions the functional Tr[X_i F] decreases monotonically in exact arithmetic and suggested stopping on its first increase, which should be an indication of stagnation. 12 A similar criterion for the SP2 expansion was proposed by Cawkwell et al. 22 In this case, the iterations are stopped on an increase of the idempotency error measure |Tr[X_i - X_i^2]|.


However, the value of the functional or the idempotency error may continue to decrease without significant improvement of the accuracy. In such cases, the computational effort in the last iterations is no longer justified. In the present work, we propose new parameterless stopping criteria based on convergence order estimation. The stopping criteria are general and can be used both in the dense and sparse matrix cases, using different strategies for truncation and different choices of polynomials.

2 Parameterless stopping criteria

The iterations of density matrix expansions can be divided into three phases: 29,30 1) the conditioning phase, where the deviation from idempotency decreases less than quadratically or not at all, 2) the purification phase, where the idempotency error decreases at least quadratically, and 3) the stagnation phase, where the idempotency error again does not decrease significantly or at all; see the lower panels in Figure 2.1. Here, we propose new parameterless stopping criteria designed to automatically and accurately detect the transition between purification and stagnation, so that the procedure can be stopped without superfluous iterations in the stagnation phase. We measure the deviation from idempotency by

e_i := \|X_i - X_i^2\|_2.   (2.1)

We recall that an iterative method has asymptotic order of convergence q if it in exact arithmetic generates a sequence of errors e_1, e_2, \dots such that

\lim_{i \to \infty} \frac{e_i}{e_{i-1}^q} = C^\infty,   (2.2)

where C^\infty is an asymptotic constant. The order of convergence can also be observed numerically by

q \approx \frac{\log(e_i / C^\infty)}{\log(e_{i-1})} =: q_i.   (2.3)


Figure 2.1: Illustration of the three phases for a recursive expansion of order q = 2 based on the polynomials x2 and 2x − x2 (see Section 4). In the conditioning phase the matrix does not come closer to idempotency but the condition number, κ, is lowered. In the purification phase the condition number is close to 1 and idempotency is approached quadratically. In the stagnation phase numerical errors start to dominate and the matrix again does not come closer to idempotency. The upper panel shows what we call the observed orders of convergence qi and ri . Throughout the conditioning and purification phases ri ≥ 2 but in the stagnation phase ri < 2.


Our stopping criteria are based on the detection of a discrepancy between the asymptotic and observed orders of convergence. When the stagnation phase is entered, numerical errors start to dominate, leading to a fall in the observed order of convergence. Since the observed order can be significantly smaller than the asymptotic order also in the initial conditioning phase, an issue is how to determine when the purification phase has started and one can start to look for a drop in the order q_i. A similar problem of determining the iteration at which to start checking the stopping criterion appears in the method described by Cawkwell et al. 22 Our solution is to replace the asymptotic constant C^\infty in (2.3) with a larger value such that the observed order of convergence in exact arithmetic is always larger than or equal to the asymptotic order of convergence. In other words, we want to find C_q, as small as possible, such that in exact arithmetic

r_i := \frac{\log(e_i / C_q)}{\log(e_{i-1})} \ge q   (2.4)

for all i. One may let C_q vary over the iterations, but we will later see that it is usually sufficient to use a single value C_q for the whole expansion. In the presence of numerical errors, r_i is significantly smaller than q only in the stagnation phase. We may therefore start to look for a drop in r_i immediately. As soon as the observed order of convergence r_i goes significantly below q, the procedure should be stopped, since this indicates the transition between purification and stagnation. In this way we avoid the issue of detecting the transition between conditioning and purification. See the upper panel in Figure 2.1 for an illustration. We note that (2.4) is equivalent to

C_q \ge \frac{e_i}{e_{i-1}^q} = \frac{\|f_i(X_{i-1}) - f_i(X_{i-1})^2\|_2}{\|X_{i-1} - X_{i-1}^2\|_2^q}.   (2.5)
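As a concrete illustration of (2.4), the following sketch (our own; the error sequence, the asymptotic constant, and the noise floor are made up for demonstration) computes the observed order r_i for a synthetic sequence that converges quadratically until a roundoff-like floor is reached, and flags the first iteration where r_i drops below q, which is equivalent to e_i > C_q e_{i-1}^q.

```python
import numpy as np

def observed_orders(errors, Cq):
    """Observed convergence orders r_i = log(e_i / C_q) / log(e_{i-1}), eq. (2.4)."""
    e = np.asarray(errors, dtype=float)
    return np.log(e[1:] / Cq) / np.log(e[:-1])

# Synthetic idempotency errors: quadratic decrease with a 1e-12 stagnation floor.
C, noise = 2.0, 1e-12
e = [1e-1]
for _ in range(10):
    e.append(C * e[-1] ** 2 + noise)

Cq = 4.0                                  # e.g. the value derived below for the McWeeny polynomial
r = observed_orders(e, Cq)
stop = next((i + 1 for i, ri in enumerate(r) if ri < 2), None)
print("r_i:", np.round(r, 2), "-> stop at iteration", stop)
```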

For generality and simplicity we want to assume as little information as possible about the eigenspectra of X_i, i = 0, 1, \dots. We will use the following theorem to find the smallest possible C_q value fulfilling (2.5), with no or few assumptions about the location of eigenvalues, for several recursive expansion polynomials of interest and for appropriate choices of q.


Theorem 1. Let f be a continuous function from [0, 1] to [0, 1] and assume that the limits

\lim_{x \to a^+} \frac{f(x) - f(x)^2}{(x - x^2)^q} \quad \text{and} \quad \lim_{x \to (1-a)^-} \frac{f(x) - f(x)^2}{(x - x^2)^q}   (2.6)

exist for some q > 0, where a \in [0, 0.5]. Let H denote the set of Hermitian matrices with all eigenvalues in [0, 1], at least one eigenvalue in ]0, 1[, and at least one eigenvalue in [a, 1-a]. Then,

\sup_{X \in H} \frac{\|f(X) - f(X)^2\|_2}{\|X - X^2\|_2^q} = \max_{x \in [0,1]} g(x, a),   (2.7)

where

g(x, a) := \begin{cases} \dfrac{f(x) - f(x)^2}{(x - x^2)^q} & \text{if } a \le x \le 1 - a, \\ \dfrac{f(x) - f(x)^2}{(a - a^2)^q} & \text{otherwise}, \end{cases}   (2.8)

is extended by continuity at x = 0 and x = 1 for a = 0.

As suggested by Theorem 1, we will choose C_q := \max_{x \in [0,1]} g(x, a), thereby making sure that (2.5) is fulfilled. In principle, the value a should be chosen as large as possible to get the smallest possible C_q value, since a larger a gives a smaller set of matrices H in (2.7). If, for the given recursive expansion polynomials f_i, the limits (2.6) exist with a = 0, then we employ the theorem with a = 0. Only if the limits (2.6) do not exist with a = 0 will we use the theorem with a > 0 and in general get different values of C_q in every iteration. In such a case, some information about the eigenspectrum of X_i in each iteration i is needed so that a can be chosen appropriately. We will later see that for all recursive expansions considered in this article it is sufficient to use a single value C_q for the whole expansion. The theorem should be invoked with q equal to the order of convergence of the recursive expansion.


Proof of Theorem 1. Continuity of f(x) together with existence of the two limits (2.6) (needed in case a = 0) implies existence of \max_{x \in [0,1]} g(x, a). Let X be a matrix in H. Then,

\frac{\|f(X) - f(X)^2\|_2}{\|X - X^2\|_2^q} \le \max_{x \in [0,1]} \; \max_{\substack{y \in [a,1-a] \\ |y-0.5| \le |x-0.5|}} \frac{f(x) - f(x)^2}{(y - y^2)^q}   (2.9)

= \max\!\left( \max_{x \in [a,1-a]} \; \max_{\substack{y \in [a,1-a] \\ |y-0.5| \le |x-0.5|}} \frac{f(x) - f(x)^2}{(y - y^2)^q}, \;\; \max_{x \in [0,1] \setminus [a,1-a]} \; \max_{y \in [a,1-a]} \frac{f(x) - f(x)^2}{(y - y^2)^q} \right)   (2.10)

= \max\!\left( \max_{x \in [a,1-a]} \frac{f(x) - f(x)^2}{(x - x^2)^q}, \;\; \max_{x \in [0,1] \setminus [a,1-a]} \frac{f(x) - f(x)^2}{(a - a^2)^q} \right)   (2.11)

= \max_{x \in [0,1]} g(x, a).   (2.12)

To show that \max_{x \in [0,1]} g(x, a) is the supremum, it is therefore sufficient to show that (2.7) holds for some subset of H. Assume first that a > 0 and let X = diag(a, b) where b = \arg\max_{x \in [0,1]} g(x, a). Assume that f(a) - f(a)^2 > f(b) - f(b)^2. Then,

g(a, a) = \frac{f(a) - f(a)^2}{(a - a^2)^q} > \frac{f(b) - f(b)^2}{(a - a^2)^q} \ge g(b, a) = \max_{x \in [0,1]} g(x, a),   (2.13)

which is a contradiction. Thus, \|f(X) - f(X)^2\|_2 = f(b) - f(b)^2 and since

\|X - X^2\|_2^q = \begin{cases} (b - b^2)^q & \text{if } a \le b \le 1 - a, \\ (a - a^2)^q & \text{otherwise}, \end{cases}   (2.14)

we have that \|f(X) - f(X)^2\|_2 / \|X - X^2\|_2^q = g(b, a) = \max_{x \in [0,1]} g(x, a). Assume now that a = 0. Then, since g(x, 0) is a continuous function in [0, 1], we have that

\sup_{x \in ]0,1[} \frac{f(x) - f(x)^2}{(x - x^2)^q} = \max_{x \in [0,1]} g(x, 0).   (2.15)

Thus, \max_{x \in [0,1]} g(x, a) is the least upper bound of \|f(X) - f(X)^2\|_2 / \|X - X^2\|_2^q for X \in H.
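Numerically, the constant C_q delivered by Theorem 1 can be obtained by maximizing g(x, a) on a grid. The sketch below is our own illustration (the grid resolution and the example polynomial are assumptions); with the McWeeny polynomial 3x^2 - 2x^3, a = 0 and q = 2 it reproduces the value C_2 = 4 derived in Section 3.

```python
import numpy as np

def Cq_from_theorem1(f, q, a=0.0, n=200001):
    """Grid estimate of C_q = max_{x in [0,1]} g(x, a) from Theorem 1, eq. (2.8)."""
    x = np.linspace(0.0, 1.0, n)
    num = f(x) - f(x)**2
    denom = np.where((x >= a) & (x <= 1.0 - a), x - x**2, a - a**2)**q
    with np.errstate(divide='ignore', invalid='ignore'):
        g = num / denom
    return np.nanmax(g)          # the 0/0 endpoints for a = 0 are skipped (continuity extension)

mcweeny = lambda x: 3*x**2 - 2*x**3
print(Cq_from_theorem1(mcweeny, q=2))   # approx 4.0, attained at x = 0.5 (cf. eq. 3.5)
```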


Remark 2.1. Inequality (2.9) can be interpreted as follows. The arguments of the maxima, x and y, play the roles of the eigenvalues of X that define \|f(X) - f(X)^2\|_2 and \|X - X^2\|_2, respectively. While, for all we know, x can be any eigenvalue, y is necessarily the eigenvalue of X that is closest to 0.5. In other words,

y = \arg\min_{\lambda \in \{\lambda_i\}} |\lambda - 0.5|   (2.16)

and

\|X - X^2\|_2 = \max_{\lambda \in \{\lambda_i\}} (\lambda - \lambda^2) = y - y^2,   (2.17)

where \{\lambda_i\} are the eigenvalues of X. Since there is at least one eigenvalue at x and at least one eigenvalue in [a, 1-a], the constraints on y in (2.9) follow.

By construction, the observed convergence order r_i may in exact arithmetic end up exactly equal to q. This happens for matrices \widehat{X} such that

\|f(\widehat{X}) - f(\widehat{X})^2\|_2 = C_q \|\widehat{X} - \widehat{X}^2\|_2^q.   (2.18)

This means that even an infinitesimal deviation of the matrix f(\widehat{X}) may trigger the stopping criterion. To address this issue we will use a value \widetilde{C}_q > C_q in place of C_q, and our stopping criteria will be of the form: stop as soon as e_i > \widetilde{C}_q e_{i-1}^q. We choose \widetilde{C}_q such that

\|f(\widehat{X}) - f(\widehat{X})^2\|_2 + \tau \le \widetilde{C}_q \|\widehat{X} - \widehat{X}^2\|_2^q   (2.19)

holds for any matrix \widehat{X} fulfilling (2.18). Here, \tau is the deviation of the idempotency error caused by numerical errors (such as floating-point roundoff and truncation of small matrix elements). In other words, a deviation of the idempotency error smaller than or equal to \tau will not trigger the stopping criterion for matrices \widehat{X}. This gives a lower bound

for \widetilde{C}_q as a monotonically increasing function of \tau:

\widetilde{C}_q \ge C_q + \frac{\tau}{\|\widehat{X} - \widehat{X}^2\|_2^q}.   (2.20)

For simplicity, we will assume that for all matrices for which (2.18) holds, the value \|\widehat{X} - \widehat{X}^2\|_2 is unique. If (2.18) holds for several values of \|\widehat{X} - \widehat{X}^2\|_2, one has to take the most restrictive lower bound. At the same time, \widetilde{C}_q should not be too large, so that superfluous iterations in the stagnation phase or failure to stop can be avoided as far as possible. An upper bound for \widetilde{C}_q is given by the condition that in the stagnation phase the stopping criterion should be triggered by a deviation \tau of the idempotency error \|f(X) - f(X)^2\|_2 such that

\|f(X) - f(X)^2\|_2 + \tau \ge \|X - X^2\|_2,   (2.21)

i.e. when there is no reduction of the idempotency error. Assume that at stagnation

\|f(X) - f(X)^2\|_2 < \frac{1}{2}\|X - X^2\|_2;   (2.22)

then from (2.21) and (2.22) we have

\|f(X) - f(X)^2\|_2 < \tau.   (2.23)

Then, if q > 1, an upper bound for \widetilde{C}_q is given by

\widetilde{C}_q \le (2\tau)^{1-q},   (2.24)


which, whenever (2.21) and (2.23) are fulfilled, gives

\widetilde{C}_q \|X - X^2\|_2^q \le (2\tau)^{1-q}(\|f(X) - f(X)^2\|_2 + \tau)^q   (2.25)
< (\|f(X) - f(X)^2\|_2 + \tau)^{1-q}(\|f(X) - f(X)^2\|_2 + \tau)^q   (2.26)
= \|f(X) - f(X)^2\|_2 + \tau,   (2.27)

and a stop of the procedure. Note that (2.24) gives an upper bound for \widetilde{C}_q as a monotonically decreasing function of \tau. This suggests that \widetilde{C}_q should be chosen as the value at the intersection between (2.20) and (2.24). The intersection point \tilde{\tau}_q is a solution of the equation

\frac{1}{\|\widehat{X} - \widehat{X}^2\|_2^q}\,\tau^q + C_q \tau^{q-1} - 2^{1-q} = 0.   (2.28)

The intersection point that defines \widetilde{C}_q sets a limit on how inaccurate a calculation can be and still not stop early, but rather as soon as the calculation is dominated by numerical errors. The deviation \tau from the exact idempotency error \|f(X) - f(X)^2\|_2 must, throughout the whole expansion, be smaller than \tilde{\tau}_q. Another restriction on how inaccurate a calculation can be is given by (2.22). When q > 1 it is clear that (2.22) will be satisfied as soon as the idempotency error \|X - X^2\|_2 is small enough, the exact value being determined by f. We would like to stress that (2.22) only needs to be satisfied at stagnation. In this work we will mainly consider second order expansions with q = 2. In this case, the intersection point is (\tilde{\tau}_2, \widetilde{C}_2) with

\tilde{\tau}_2 = \frac{1}{2}\left(\sqrt{C_2^2\|\widehat{X} - \widehat{X}^2\|_2^4 + 2\|\widehat{X} - \widehat{X}^2\|_2^2} - C_2\|\widehat{X} - \widehat{X}^2\|_2^2\right)   (2.29)

and

\widetilde{C}_2 = \frac{1}{\sqrt{C_2^2\|\widehat{X} - \widehat{X}^2\|_2^4 + 2\|\widehat{X} - \widehat{X}^2\|_2^2} - C_2\|\widehat{X} - \widehat{X}^2\|_2^2}.   (2.30)
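The intersection point can also be obtained numerically. The sketch below is our own illustration (the root finder and the example values are assumptions; \|\widehat{X} - \widehat{X}^2\|_2 = 0.25 and C_2 = 4 correspond to the McWeeny case treated in Section 3). It solves (2.28) for \tilde{\tau}_q and compares \widetilde{C}_q = (2\tilde{\tau}_q)^{1-q} with the closed forms (2.29)–(2.30).

```python
import numpy as np
from scipy.optimize import brentq

def intersection(Cq, d, q=2):
    """Solve (2.28) for tau with d = ||Xhat - Xhat^2||_2; return (tau_q, Ctilde_q)."""
    h = lambda tau: tau**q / d**q + Cq * tau**(q - 1) - 2.0**(1 - q)
    tau = brentq(h, 1e-16, 0.5)
    return tau, (2.0 * tau)**(1 - q)        # Ctilde_q from (2.24) at the intersection

tau2, Ct2 = intersection(Cq=4.0, d=0.25, q=2)
print(tau2, Ct2)                            # approx 0.0915 and 5.4641
print((np.sqrt(3) - 1) / 8, 2*np.sqrt(3) + 2)   # closed forms (3.6)-(3.7) for comparison
```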


2.1 Alternatives to the spectral norm

The spectral norm is often expensive to compute, since in general iterative eigenvalue solvers require the computation of many matrix-vector products. Moreover, in the last iterations the eigenvalues of X_i - X_i^2 become clustered around 0, which may result in slow convergence of the iterative eigensolver. One may therefore want to use an estimate in place of the spectral norm. An alternative is the Frobenius norm

v_i := \|X_i - X_i^2\|_F = \sqrt{\sum_j (\lambda_j - \lambda_j^2)^2},   (2.31)

where \{\lambda_j\} are the eigenvalues of X_i. The calculation of the Frobenius norm requires just one pass over the elements of the matrix, and the computational cost is independent of the eigenvalue distribution. In our tests we note that the time needed to compute the spectral norm is two orders of magnitude larger than the time needed to compute the Frobenius norm. We expect the Frobenius norm to be a good estimate of the spectral norm close to convergence, since then many eigenvalues are clustered around 0 and 1 and do not contribute significantly to the sum in (2.31). Since \|X_i - X_i^2\|_2 \le \|X_i - X_i^2\|_F we have that

v_i = K_i e_i   (2.32)

(2.32)

for some K_i \ge 1. Assume that for a given q

K_i \le K_{i-1}^q.   (2.33)

Then,

v_i = K_i e_i \le K_{i-1}^q \widetilde{C}_q e_{i-1}^q = \widetilde{C}_q v_{i-1}^q,   (2.34)

which means that if the assumptions above hold we will not stop prematurely. In the following sections we will see under what conditions (2.33) holds for specific choices of polynomials for the recursive expansion.

Another concern when using an estimate in place of the spectral norm is whether the algorithm will stop at all. For the matrix norm used, the deviation \tau from the exact idempotency error \|f(X_i) - f(X_i)^2\| must, throughout the whole expansion, be smaller than \tilde{\tau}_q. This might be hard to satisfy in the case of the Frobenius norm because of the large number of eigenvalues that contribute to the sum in (2.31). For large systems one may instead make use of the so-called mixed norm. 31 Let the matrix be divided into square submatrices of equal size, padding the matrix with zeros if needed. One can define the mixed norm as the spectral norm of the matrix with elements equal to the Frobenius norms of the obtained submatrices. It can be shown that

ei ≤ mi ≤ vi

(2.35)

where mi := kXi − Xi2 kM is the mixed norm of Xi − Xi2 in iteration i. The result for the Frobenius norm above that we will not stop prematurely therefore holds also for the mixed norm under the corresponding assumptions, i.e. (2.32) and (2.33) with vi replaced by mi . However, with a fixed submatrix block size the asymptotic behavior of the mixed norm follows that of the spectral norm. Thus, the quality of the stopping criterion will not deteriorate with increasing system size. One may therefore consider using the mixed norm for large systems. We will in the following mainly focus on the regular and accelerated McWeeny and second order spectral projection polynomials.
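A minimal sketch of the mixed norm as described above (the block size, the zero-padding strategy, and the test matrix are our own choices for illustration): the matrix is partitioned into b x b blocks, each block is replaced by its Frobenius norm, and the spectral norm of the resulting small matrix is returned.

```python
import numpy as np

def mixed_norm(A, b):
    """Mixed norm ||A||_M: spectral norm of the matrix of blockwise Frobenius norms."""
    n = A.shape[0]
    npad = -n % b                              # pad with zeros so that b divides the size
    Ap = np.pad(A, ((0, npad), (0, npad)))
    m = Ap.shape[0] // b
    blocks = Ap.reshape(m, b, m, b)            # grid of b x b blocks
    B = np.sqrt((blocks**2).sum(axis=(1, 3)))  # blockwise Frobenius norms
    return np.linalg.norm(B, 2)

# Sanity check of e_i <= m_i <= v_i, eq. (2.35), on a random symmetric matrix:
X = np.random.rand(100, 100); X = 0.5 * (X + X.T)
E = X - X @ X
print(np.linalg.norm(E, 2), mixed_norm(E, 10), np.linalg.norm(E, 'fro'))
```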


3 McWeeny polynomial

We will first consider a recursive polynomial expansion based on the McWeeny polynomial p_{mw}(x) := 3x^2 - 2x^3, 11,12 i.e. Algorithm 1 with

f_0(x) = \frac{\mu - x}{2\max(\lambda_{max} - \mu, \mu - \lambda_{min})} + 0.5, \qquad f_i(x) = p_{mw}(x), \quad i = 1, 2, \dots.   (3.1)

Here, \lambda_{min} and \lambda_{max} are the extremal eigenvalues of F or bounds thereof, i.e.

\lambda_{min} \le \lambda_1 \quad \text{and} \quad \lambda_{max} \ge \lambda_N.   (3.2)

We want to find the smallest value C_2^{mw} such that

\frac{\log(e_i / C_2^{mw})}{\log(e_{i-1})} \ge 2   (3.3)

in exact arithmetic. Note that

\lim_{x \to 0^+} \frac{p_{mw}(x) - p_{mw}(x)^2}{(x - x^2)^2} = \lim_{x \to 1^-} \frac{p_{mw}(x) - p_{mw}(x)^2}{(x - x^2)^2} = 3.   (3.4)

Thus, we may invoke Theorem 1 with f = p_{mw}, a = 0 and q = 2, which gives

C_2^{mw} = \max_{x \in [0,1]} \frac{p_{mw}(x) - p_{mw}(x)^2}{(x - x^2)^2} = \max_{x \in [0,1]} (3 + 4x - 4x^2) = 4,   (3.5)

where the maximum value is attained at x_{mw} = 0.5. Using equation (2.30), where \widehat{X} is such that \|\widehat{X} - \widehat{X}^2\|_2 = x_{mw} - (x_{mw})^2 = 0.25, we have

1 √ τ˜2mw = ( 3 − 1) ≈ 0.0915 8 √ Ce2mw = 2 3 + 2 ≈ 5.4641, 15

ACS Paragon Plus Environment

(3.6) (3.7)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 48

and we therefore suggest to stop the expansion as soon as √ ei > (2 3 + 2)e2i−1 . The inequality (2.22) is satisfied for matrices X such that kX − X 2 k2 ≤

(3.8) √

17−3 8

≈ 0.1404.

We note that the McWeeny polynomial can be defined as the polynomial with fixed points at 0 and 1 and one vanishing derivative at 0 and 1. For polynomials with fixed points at 0 and 1 and q − 1 vanishing derivatives at 0 and 1, sometimes referred to as the Holas polynomials, 32 it can be shown that the smallest value Cqh such that log(ei /Cqh ) ≥q log(ei−1 )

(3.9)

is Cqh = 4q−1 . The equality in (3.9) is attained for xh = 0.5 for every q and thus Ceqh = (2τqh )1−q

where τqh is the real positive solution of the equation

(4τ )q + (4τ )q−1 − 21−q = 0.

3.1

(3.10)

Acceleration

Prior knowledge of the homo and lumo eigenvalues makes it possible to use a scale-andfold technique to accelerate convergence. 14 In each iteration the eigenspectrum is stretched out so that the projection polynomial folds the eigenspectrum over itself. This technique is quite general and may be applied to a number of recursive expansions making use of various polynomials. For the accelerated McWeeny polynomial the eigenspectrum is stretched out around 0.5, see Algorithm 2. The amount of stretching is determined by the parameter ξ˜ which is an estimate of the homo-lumo gap ξ such that at least one eigenvalue of F is in ˜ ˜ [µ − ξ/2, µ + ξ/2]. With α = 1 on line 9, Algorithm 2 reduces to the regular McWeeny expansion discussed above.

16

ACS Paragon Plus Environment

Page 17 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Algorithm 2 McW-ACC algorithm 1: input: F , λmin , λmax , µ, ξ˜ 2: γ = 2 max(λmax − µ, µ − λmin ) ˜ 3: β0 = 0.5(1 − ξ/γ) µI−F 4: X0 = γ + 0.5I 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21:

f =X +E X 0 0 0 f f e0 = kX0 − X02 k2 Ce2mwa = 4 for i = 1, 2, . . . do 2 α = 3/(12βi−1 − 18βi−1 + 9)1/2 f Xs = α(X i−1 − 0.5I) + 0.5I Xi = 3Xs2 − 2Xs3 βs = α(βi−1 − 0.5) + 0.5 βi = 3βs2 − 2βs3 f =X +E X i i i f f ei = kXi − Xi2 k2 if ei > Ce2mwa e2i−1 then n=i break end if end for f output: n, X n

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 48

As for the regular McWeeny expansion we want to find the smallest value C2mwa such that log(ei /C2mwa ) ≥ 2. log(ei−1 )

(3.11)

For the sake of analysis we introduce 



pmwa (x, β) := pmw 3(12β 2 − 18β + 9)−1/2 (x − 0.5) + 0.5 .

(3.12)

f , β ). Furthermore, let Note that line 11 in Algorithm 2 is equivalent to Xi = pmwa (X i−1 i−1

g(x, β) =

   g1 (x, β)   g (x, β) 2

if x ∈ [β, 1 − β],

(3.13)

otherwise,

where pmwa (x, β) − pmwa (x, β)2 , (x − x2 )2 pmwa (x, β) − pmwa (x, β)2 g2 (x, β) = , (β − β 2 )2

g1 (x, β) =

(3.14) (3.15)

which are plotted in Figure 3.1 for β = 0.3. 1

8 pmw (x) pmwa (x, 0.3)

0.8

g1 (x, 0.3) g2 (x, 0.3)

6

0.6 4 0.4 2

0.2 0

0

0.5

0

1

0

0.3

x

0.7

1

x

Figure 3.1: The left panel shows the regular pmw (x) and accelerated pmwa (x, β) McWeeny polynomials with β = 0.3. The right panel shows the functions g1 (x, β) and g2 (x, β) with β = 0.3. Black vertical dashed lines indicate β and 1 − β. 18

ACS Paragon Plus Environment

Page 19 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Note that when β > 0, the limits limx→0+ g1 (x, β) and limx→1− g1 (x, β) do not exist. However, we may use Theorem 1 with a > 0. We have that

lim+ g1 (x, β) =

x→β

lim

x→(1−β)

 16(β − 1.5)2 (β − 0.75)2     3 2 2

g1 (x, β) =  (β − 1) (4β − 6β + 3) −   3

if β > 0, (3.16) if β = 0.

Note that

lim lim g1 (x, β) 6= lim+ g1 (x, 0)

β→0+ x→β +

(3.17)

x→0

due to discontinuity of g1 (x, β) at x = 0 for β > 0. Since there is at least one eigenvalue in ˜ ˜ [µ − ξ/2, µ + ξ/2], at least one eigenvalue of Xi will be in the interval [βi , 1 − βi ] in each iteration i. Therefore, we may for each iteration i invoke Theorem 1 with f (x) = pmwa (x, βi ), a = βi and q = 2. We have that

max g(x, βi ) = max

x∈[0,1]

max

x∈[βi ,1−βi ]

g1 (x, βi ),

max

x∈[0,1]\[βi ,1−βi ]

!

g2 (x, βi )

= g1 (0.5, βi ) = 4,

(3.18) (3.19)

where we used that for β > 0 the function g2 (x, β) is convex on the intervals [0, β] and [1 − β, 1], and max

x∈[0,1]\[β,1−β]

g2 (x, β) = g2 (0, β) = g2 (1, β) = g2 (β, β) = g2 (1 − β, β) = g1 (β, β) ≤ max g1 (x, β). x∈[β,1−β]

(3.20) (3.21)

Therefore, C2mwa = 4. Since the maximum value in (3.18) is attained at xmwa = 0.5 we have √ Ce2mwa = 2 3 + 2, which gives the stopping criterion in Algorithm 2. The most restrictive

bound for kX − X 2 k2 satisfying (2.22) is given by the regular McWeeny polynomial, i.e. kX − X 2 k2 ≤



17−3 . 8

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

3.2

Page 20 of 48

Estimation of the order of convergence

As discussed earlier one may want to use the Frobenius norm in place of the spectral norm to measure the idempotency error. It is then desired that (2.33) is fulfilled, at least in iterations prior to the stagnation phase. Let X be a Hermitian matrix with eigenvalues {λj : 0 ≤ λj ≤ 1} and let y = arg min |λ − 0.5| > 0,

(3.22)

β = min(y, 1 − y).

(3.23)

λ∈{λj }

Then, kX − X 2 k2 = β − β 2

and

kpmw (X) − pmw (X)2 k2 = pmw (β) − pmw (β)2 .

(3.24) (3.25)

Consider the function pmw (x) − pmw (x)2 x − x2 − β − β 2 pmw (β) − pmw (β)2 x (x − 1) (4 β 4 − 8 β 3 + β 2 + 3 β − 4 x4 + 8 x3 − x2 − 3 x) =− . β 2 (β − 1)2 (−4 β 2 + 4 β + 3)

f (x) =

(3.26) (3.27)

The roots of this function on the interval [0, 1] are 0, 1, β, and 1 − β, and the function is non-negative on the intervals [0, β] and [1 − β, 1]. Therefore pmw (x) − pmw (x)2 x − x2 ≥ ≥0 β − β2 pmw (β) − pmw (β)2

20

ACS Paragon Plus Environment

(3.28)

Page 21 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

for x − x2 ≤ β − β 2 and kX − X 2 k2F kpmw (X) − pmw (X)2 k22 kX − X 2 k22 kpmw (X) − pmw (X)2 k2F =P Thus, we have that

P  j

j

2 Ki−1 Ki2

(3.29) 2

(λj − λ2j )/(β − β 2 )

((pmw (λj ) − pmw (λj )2 )/(pmw (β) − pmw (β)2 ))2

≥ 1.

≥ 1 for every i ≥ 1. Since Ki−1 ≥ 1 it follows that 2 Ki−1 ≥ Ki−1 ≥ Ki ,

(3.30)

i.e. (2.33) is fulfilled when using the regular McWeeny expansion. However, for the accelerated McWeeny expansion inequality (2.33) is not always satisfied. Consider for example the diagonal matrix

X = diag(0, 0, 0, λ, 1 − λ)

(3.31)

pmwa (0, λ) − pmwa (0, λ)2 = pmwa (λ, λ) − pmwa (λ, λ)2

(3.32)

for some λ ∈]0, 0.5[. Then,

and kX − X 2 k2F kpmwa (X, λ) − pmwa (X, λ)2 k2 kX − X 2 k22 kpmwa (X, λ) − pmwa (X, λ)2 kF

(3.33)

2

=

2 (λ − λ2 ) (pmwa (λ, λ) − pmwa (λ, λ)2 ) q

(λ − λ2 )2 5 (pmwa (λ, λ) − pmwa (λ, λ)2 )2 2 = √ ≈ 0.8944. 5

Thus, for the accelerated McWeeny expansion it may happen that

(3.34) 2 Ki−1 Ki

< 1 for some i.

To summarize, for the regular McWeeny scheme we will not stop prematurely when the 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Frobenius norm is used in place of the spectral norm. In the accelerated scheme we suggest to turn off the acceleration at the start of the purification phase when its effect anyway is small. After that the regular scheme is used and the Frobenius norm may be used. We will discuss how to detect the transition between the conditioning and purification phases in Section 4.

4 Second order spectral projection (SP2) expansion

In the remainder of this article we will focus on recursive polynomial expansions based on the polynomials p_{sp}^{(1)}(x) = x^2 and p_{sp}^{(2)}(x) = 2x - x^2, proposed by Mazziotti. 33 In an algorithm

proposed by Niklasson, the polynomials used in each iteration are chosen based on the trace of the matrix. 13 This algorithm, hereinafter the original SP2 algorithm, is given by Algorithm 3, including the new stopping criterion proposed here.

Algorithm 3 The original SP2 algorithm with new stopping criterion
1: input: F, \lambda_{min}, \lambda_{max}
2: X_0 = (\lambda_{max} I - F) / (\lambda_{max} - \lambda_{min})
3: \widetilde{X}_0 = X_0 + E_0
4: e_0 = \|\widetilde{X}_0 - \widetilde{X}_0^2\|_2
5: \widetilde{C}_2^{sp} = 6.8872
6: for i = 1, 2, \dots do
7:   if Tr[\widetilde{X}_{i-1}] > n_{occ} or (mod(i, 2) and Tr[\widetilde{X}_{i-1}] = n_{occ}) then
8:     X_i = \widetilde{X}_{i-1}^2
9:     p_i = 1
10:  else
11:    X_i = 2\widetilde{X}_{i-1} - \widetilde{X}_{i-1}^2
12:    p_i = 0
13:  end if
14:  \widetilde{X}_i = X_i + E_i
15:  e_i = \|\widetilde{X}_i - \widetilde{X}_i^2\|_2
16:  if i \ge 2 and p_i \ne p_{i-1} and e_i > \widetilde{C}_2^{sp} e_{i-2}^2 then
17:    n = i
18:    break
19:  end if
20: end for
21: output: n, \widetilde{X}_n
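The sketch below is our own dense-matrix translation of Algorithm 3 to Python/NumPy, intended only as an illustration: truncation is omitted (E_i = 0), so the numerical errors that eventually trigger the criterion come from floating-point roundoff alone, and the random test matrix at the end is an assumption made for demonstration.

```python
import numpy as np

def sp2_density(F, nocc, lmin, lmax, maxit=100):
    """Original SP2 expansion with the parameterless stopping criterion of Algorithm 3:
    stop when p_i != p_{i-1} and e_i > 6.8872 * e_{i-2}**2."""
    C2t = 6.8872
    I = np.eye(F.shape[0])
    X = (lmax * I - F) / (lmax - lmin)              # spectrum mapped into [0, 1], reversed
    e = [np.linalg.norm(X - X @ X, 2)]              # e_0
    p_prev = None
    for i in range(1, maxit + 1):
        X2 = X @ X
        tr = np.trace(X)
        if tr > nocc or (i % 2 == 1 and tr == nocc):
            X, p = X2, 1                            # x^2 decreases the trace
        else:
            X, p = 2.0 * X - X2, 0                  # 2x - x^2 increases the trace
        e.append(np.linalg.norm(X - X @ X, 2))
        if i >= 2 and p != p_prev and e[i] > C2t * e[i - 2] ** 2:
            break                                   # stagnation detected
        p_prev = p
    return X, i

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
lam = np.r_[np.linspace(-2.0, -0.6, 5), np.linspace(-0.4, 1.0, 5)]
F = Q @ np.diag(lam) @ Q.T
D, n = sp2_density(F, nocc=5, lmin=-2.0, lmax=1.0)
print(n, np.trace(D))                               # trace(D) should be close to nocc
```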


Knowledge about the homo and lumo eigenvalues also makes it possible to work with a predefined sequence of polynomials and employ a scale-and-fold acceleration technique 14 as described in Algorithms 4 and 5. In Algorithm 4, \lambda_{homo}^{out} and \lambda_{homo}^{in} are bounds of the homo eigenvalue such that

in λout homo ≤ λhomo ≤ λhomo ,

(4.1)

out λin lumo and λlumo are bounds of the lumo eigenvalue such that

out λin lumo ≤ λlumo ≤ λlumo ,

(4.2)

and εM denotes the machine epsilon. Using the scale-and-fold technique the eigenspectrum is stretched out outside the [0, 1] interval and folded back by applying the SP2 polynomials. The unoccupied part of the eigenspectrum is partially stretched out below 0 and folded back into [0, 1] using the polynomial ((1 − α)I + αx)2 , where α ≥ 1 determines the amount of stretching or acceleration. Similarly, the occupied part of the eigenspectrum is stretched out above 1 and folded back using 2αx − (αx)2 . In the following we will make use of Theorem 1 to derive the stopping criteria employed in Algorithms 3 and 5. We first note that neither of the limits

lim−

x→1

(1) 2 p(1) sp (x) − psp (x) (x − x2 )2

and

lim+

x→0

(2) 2 p(2) sp (x) − psp (x) (x − x2 )2

(4.3)

exist. The limits required by Theorem 1 do exist for compositions of alternating polynomials

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Algorithm 4 Determination of polynomials for SP2-ACC homo/lumo-based expansion in in out 1: input: λmin , λmax , λout homo , λhomo , λlumo , λlumo λmax −λin homo λmax −λmin out λ −λhomo βlo = 1 − λmax max −λmin λ −λin lumo γup = λmax max −λmin out λ −λlumo γlo = λmax max −λmin

2: βup = 1 − 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28:

δ = 0.01 i=0 2 2 while βup − βup > εM or γup − γup > εM or pi = pi−1 do i=i+1 if βlo < δ and γlo < δ then βlo = 0, γlo = 0 nmin = i + 1 δ=0 end if   if γup > βup or mod(i, 2) and γup = βup then pi = 1 αi = 2/(2 − γlo ) γlo = ((1 − αi ) + αi γlo )2 , γup = ((1 − αi ) + αi γup )2 βlo = 2αi βlo − (αi βlo )2 , βup = 2αi βup − (αi βup )2 else pi = 0 αi = 2/(2 − βlo ) γlo = 2αi γlo − (αi γlo )2 , γup = 2αi γup − (αi γup )2 βlo = ((1 − αi ) + αi βlo )2 , βup = ((1 − αi ) + αi βup )2 end if end while nmax = i output: nmin , nmax , pi , αi , i = 1, 2, . . . , nmax

24

ACS Paragon Plus Environment

Page 24 of 48

Page 25 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Algorithm 5 SP2-ACC algorithm 1: input: F , λmin , λmax , nmin , nmax , pi , αi , i = 1, 2, . . . , nmax I−F 2: X0 = λλmax−λ max min f 3: X0 = X0 + E0 f −X f2 k 4: e0 = kX 0 0 2 sp e = 6.8872 5: C 2 6: for i = 1, 2, . . . , nmax do 7: if pi = 1 then f )2 8: Xi = ((1 − αi )I + αi X i−1 9: else 2 f f 10: Xi = 2αi X i−1 − (αi Xi−1 ) 11: end if f =X +E 12: X i i i f −X f2 k 13: ei = kX i i 2 14: if i ≥ nmin and pi 6= pi−1 and ei > Ce2sp e2i−2 then 15: n=i 16: break 17: end if 18: end for f 19: output: n, X n (1) (2) (21) (2) (1) p(12) sp (x) = psp (psp (x)) and psp (x) = psp (psp (x)): (12) 2 (21) 2 p(12) p(21) sp (x) − psp (x) sp (x) − psp (x) = lim = 2, x→1− x→0 (x − x2 )2 (x − x2 )2 (12) 2 (21) 2 p(12) p(21) sp (x) − psp (x) sp (x) − psp (x) = lim+ = 4. lim x→1− x→0 (x − x2 )2 (x − x2 )2

lim+

(4.4) (4.5)

However, the limits

lim+

x→0

(22) 2 p(22) sp (x) − psp (x) (x − x2 )2

and

lim−

x→1

(11) 2 p(11) sp (x) − psp (x) (x − x2 )2

(4.6)

do not exist. We therefore want to find the smallest value C2sp such that log(ei /C2sp ) ≥ 2, log(ei−2 )

(4.7)

(21) provided that pi 6= pi−1 . Consequently, we invoke Theorem 1 with f = p(12) sp and f = psp ,

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 48

a = 0, and q = 2. We have that (21) 2 (12) 2 p(21) p(12) sp (x) − psp (x) sp (x) − psp (x) = max x∈[0,1] x∈[0,1] (x − x2 )2 (x − x2 )2 √  1  71 + 17 17 = 32

(4.8)

max

where the maximum value is attained at xsp 1 =

√ 5− 17 4

(4.9)

sp if p(12) sp is used and at x2 =



17−1 4

if

p(21) sp is used. Therefore, we get C2sp =

√  1  71 + 17 17 ≈ 4.40915. 32

c such that kX c− X c2 k = xsp − (xsp )2 = xsp − (xsp )2 = Using equation (2.30) with X 2 1 1 2 2

(4.10) √ 3 17−11 8



0.1712, we have

τ˜2sp ≈ 0.0726,

(4.11)

Ce2sp ≈ 6.8872,

(4.12)

which leads to the stopping criterion of Algorithm 3. The inequality (2.22) is satisfied for matrices X such that kX − X 2 k2 ≤ 0.1325. Theorem 1 can also be used to derive a stopping criterion for the accelerated algorithm. However, in order for the limits (2.6) to exist the parameter a should be chosen larger than 0 and vary throughout the iterations. Since the acceleration is effective only in the conditioning phase, we have found it easier to turn off the acceleration when entering the purification phase and then use the stopping criterion for the regular expansion. We turn off the acceleration as soon as the relevant homo and lumo eigenvalue bounds are close enough to 1 and 0 respectively, and start to check the stopping criterion in the next iteration nmin , see lines 10–14 of Algorithm 4. There is no need for precise detection of the transition between the conditioning and purification phases—the parameter δ in the algorithm is to some extent arbitrary. A larger 26

ACS Paragon Plus Environment

Page 27 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

25

n, σ = 10−9 n, σ = 10−3 nmin nmax

20 15 10 5 0 10 -10 10 -8 10 -6 10 -4 10 -2 10 0

δ Figure 4.1: The estimated number of iterations nmax , iteration nmin where the acceleration has been turned off, and total number of iterations n in the SP2-ACC expansion for various values of δ. The recursive expansion is applied to the matrix X0 = diag(0.48, 0.52) perturbed in each iteration by a diagonal matrix with random elements from a normal distribution and with spectral norm σ. δ-value results in less acceleration. A smaller value means that we will start to check the stopping criterion later possibly resulting in superfluous iterations, particularly in low accuracy calculations, see Figure 4.1. By numerical experiments we have found δ = 0.01 to be an appropriate value. For values smaller than 0.01 the effect of the acceleration is less than 1 percent compared to the regular iteration, see Figure 4.2.

4.1

Estimation of the order of convergence

As for the McWeeny expansion one may want to use the Frobenius norm instead of the spectral norm to measure the idempotency error. However, for the SP2 expansion, relation (2.33) translates to 2 Ki−2 /Ki ≥ 1,

if pi 6= pi−1

(4.13)

given second order convergence and the application of Theorem 1 for compositions of alternating polynomials from two iterations. We have not been able to prove that (4.13) always holds. However, we have also not been able to find any counterexample where (4.13) does not hold in exact arithmetics. 27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

200 100

Efficiency [%]

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

10 1

10 -2

10 0

λ Figure 4.2: Efficiency of the accelerated SP2-ACC scheme relative to the regular SP2 scheme. The figure shows how much more an unoccupied eigenvalue λ is reduced by the polynomial (1 − α + αλ)2 with α = 2/(2 − λ) compared to the polynomial λ2 , in the relative sense, i.e. |λ2 − (1 − α + αλ)2 |/|λ − λ2 |. The corresponding (identical) figure for an occupied eigenvalue can be constructed in the same way. Clearly the acceleration has almost no effect in the purification phase when all eigenvalues are close to their desired values of 0 or 1. The plot is in log-log scale. We have encountered cases when (4.13) does not hold due to numerical errors. However, the use of the Frobenius norm has not resulted in too early stops in those cases. An example is given in Figure 4.3 where we applied the recursive expansion to a random symmetric dense matrix. The occupied and unoccupied eigenvalues were distributed equidistantly in [0, 0.495] and [0.505, 1], respectively. The eigenvectors of the matrix were taken from a QR factorization of a matrix with random elements from a normal distribution. The use of the Frobenius norm in the stopping criterion results in a stop in iteration 25 while the use of the spectral norm results in a stop in iteration 27. However, being clear from figures 4.3b and 4.3c, the stagnation phase started already in iteration 25. Although (4.13) does not hold, the use of the Frobenius norm does not result in a too early stop in this case. In Section 2 we pointed out that the mixed norm may be a good alternative to the Frobenius norm. The ratio (4.13) corresponding to the mixed norm has no clear connection to the ratio (4.13) corresponding to the Frobenius norm. However, as for the Frobenius norm, in our numerical experiments we have not encountered any case where the stopping

28

ACS Paragon Plus Environment

Page 28 of 48

Page 29 of 48

8

pi−1 6= pi pi−1 = pi After stop

2 Ki−2 /Ki

6 4 2 1 3 5

10

15

20

25

30

Iteration 2 /K (a) Ki−2 i

10 0

10 0

kXi − Xi2 kF

kXi − Xi2 k2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

10 -2 x2 2x − x2 After stop

10 -4

1

5

10

15

20

25

10 -2 x2 2x − x2 After stop

10 -4

30

1

5

Iteration

10

15

20

25

30

Iteration

(b) Idempotency error using spectral norm

(c) Idempotency error using Frobenius norm

Figure 4.3: Example illustrating a special case when use of the Frobenius norm in the stopping criterion results in an earlier stop than use of the spectral norm. The regular SP2 expansion is applied to a random symmetric dense 100 × 100 matrix with all eigenvalues in [0, 1], homo-lumo gap 0.01 located symmetrically around 0.5, and occupation number 50. In each iteration i the matrix Xi is perturbed by a random symmetric matrix with elements from a normal distribution and with spectral norm 10−4 . criterion is triggered too early when the mixed norm is used. In Figure 4.4 we illustrate the ratio (4.13) corresponding to the Frobenius norm and the mixed norm with two different block sizes b for a diagonal matrix of size 105 with occupied and unoccupied eigenvalues distributed equidistantly in [0, 0.495] and [0.505, 1], respectively.

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

10 2 2 Ki−2 /Ki

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

b = 10000

10 1 b = 10

10

0

3 5

10

15

20

25

Iteration Figure 4.4: The ratio (4.13) corresponding to the Frobenius norm (blue solid line) and mixed norm (red dotted lines) in each iteration of the recursive expansion. The regular SP2 expansion is applied to a diagonal matrix of size 105 with all eigenvalues in [0, 1], homo-lumo gap 0.01 located symmetrically around 0.5, and occupation number 105 /2. In each iteration i the matrix Xi is perturbed by a random diagonal matrix with elements from a normal distribution and with spectral norm 10−4 . Two different block sizes were used for the mixed norm: b = 10 and b = 10000. We mark 6 additional iterations after the stopping criterion was triggered with ’∇’. The iterations i where pi−1 = pi we mark with ’∗’ and the iterations where pi−1 6= pi with ’o’.

4.2

Number of subsequent iterations with the same polynomial

The stopping criteria in Algorithms 3 and 5 include a condition of alternating polynomials, i.e. that pi 6= pi−1 in iteration i. When the polynomials in each iteration are chosen based on the trace of the matrix as in Algorithm 3, it is possible to construct examples where the same polynomial appears in an arbitrary number of subsequent iterations. In particular, you will get a large number of consecutive iterations with the same polynomial if there are many eigenvalues clustered very close to either the homo or the lumo eigenvalue. However, besides artificially constructed matrices we have not come across Fock or Kohn–Sham matrices that give more than a few subsequent iterations with the same polynomial. When the polynomials are chosen based on the location of the homo and lumo eigenvalues as in Algorithm 4, the number of subsequent iterations is strictly bounded. When Algorithm 4 is used without acceleration there can after an initial startup phase be at most two subsequent iterations with the same polynomial, as shown by the following theorem.

30

ACS Paragon Plus Environment

Page 30 of 48

Page 31 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Theorem 2. In Algorithm 4, let βup 6= 0, γup 6= 0, βlo = 0, and γlo = 0. Assume that pi 6= pi−1 . Then, if pi+1 = pi it follows that pi+2 6= pi+1 . Proof. We use here the notation

βi := βup

and γi := γup

(4.14)

in every iteration i of the recursive expansion. Assume that i is even. Without loss of generality we consider the case i = 2. Assume that p1 = 1. Then, β0 ≤ γ0 and the largest possible number of subsequent iterations with pk = 0, k ≥ 2 is obtained with β0 = γ0 . Then, following Algorithm 4 we have that γ1 = γ02 ,

(4.15)

β1 = 2β0 − β02 = 2γ0 − γ02 .

(4.16)

Since β1 > γ1 , p2 = 0 and γ2 = 2γ1 − γ12 = 2γ02 − γ04 ,

(4.17)

β2 = β12 = (2γ0 − γ02 )2 = 2γ02 − γ04 + 2γ02 (γ0 − 1)2 .

(4.18)

Then, since 2γ02 (γ0 − 1)2 > 0, we have that β2 > γ2 and p3 = 0. Therefore γ3 = 2γ2 − γ22 = 2(2γ02 − γ04 ) − (2γ02 − γ04 )2 = 4γ02 − 6γ04 + 4γ06 − γ08 ,

(4.19)

β3 = β22 = (2γ0 − γ02 )4 = 4γ02 − 6γ04 + 4γ06 − γ08 −4γ02 (2 − 11γ02 + 16γ03 − 10γ04 + 4γ05 − γ06 ) . |

{z γ3

}|

{z

Ck kf (Xk )k22

(A.9)

1 Ck ≥ kXk−1 k22 . 4

(A.10)

where

The value of Ck should be chosen the smallest possible, see the discussion in Section 2. If in addition we assume that the matrix A is normal, then for k ≥ 1 the absolute values of the eigenvalues of Xk are bounded from below by 1 and kXk−1 k22 ≤ 1. Thus for all k ≥ 1 we define 1 Ck := C = . 4

B

(A.11)

Stopping criteria for Newton’s method

Let f (x) be a twice continuously differentiable function, f (x∗ ) = 0, and f ′ (x∗ ) 6= 0. Then, Newton’s method xk+1 = xk + ∆xk ,

∆xk = −

f (xk ) f ′ (xk )

(B.1)

is locally quadratically convergent to x∗ . Assume that f ′ (xk ) 6= 0. Taylor expansion around xk with step ∆xk and using (B.1) and Lagrange’s form of the remainder gives

f (xk+1 ) =

f ′′ (ξk ) (f (xk ))2 ′ 2 2(f (xk ))

(B.2)

where ξk is some value between xk and xk+1 . Following the ideas of the present article, this

43

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 48

suggests the stopping criterion: stop as soon as |f (xk+1 )| > Ck |f (xk )|2

(B.3)

where Ck =

Fk ′ 2|f (x

2 k )|

,

Fk ≥

max

ξ between xk and xk+1

|f ′′ (ξ)|.

(B.4)

We note that if f ′ (xk ) comes close to zero Ck becomes very large and the stopping criterion is not triggered. Thus, there is no need for special treatment in such cases.
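A minimal scalar sketch of the criterion (B.3)–(B.4) (our own illustration; the test function f(x) = x^2 - 2 and the crude bound F_k = 2.5 >= |f''| are assumptions chosen for demonstration): the iteration is stopped as soon as the new residual exceeds the bound predicted by quadratic convergence, i.e. once roundoff dominates.

```python
import numpy as np

def newton_with_stop(f, fprime, Fk, x0, maxit=50):
    """Newton's method (B.1) with the stopping criterion (B.3):
    stop as soon as |f(x_{k+1})| > C_k |f(x_k)|^2, with C_k from (B.4)."""
    x, fx = x0, f(x0)
    for k in range(maxit):
        fp = fprime(x)
        x_new = x - fx / fp
        f_new = f(x_new)
        Ck = Fk / (2.0 * fp**2)            # Fk is an upper bound of |f''| near the iterates
        if abs(f_new) > Ck * fx**2:        # quadratic decrease has stagnated
            return x_new, k + 1
        x, fx = x_new, f_new
    return x, maxit

# Example: square root of 2 via f(x) = x^2 - 2 (|f''| = 2, so Fk = 2.5 is a safe bound)
x, n = newton_with_stop(lambda x: x*x - 2.0, lambda x: 2.0*x, Fk=2.5, x0=1.0)
print(x, n, abs(x - np.sqrt(2.0)))
```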

References (1) Roothaan, C. C. J. Rev. Mod. Phys. 1951, 23, 69–89. (2) Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136, B864–B871. (3) Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, 1133. (4) Bowler, D. R.; Miyazaki, T. Rep. Prog. Phys. 2012, 75, 036503. (5) Goedecker, S.; Colombo, L. Phys. Rev. Lett. 1994, 73, 122–125. (6) Higham, N. J. Functions of matrices: theory and computation; SIAM: Philadelphia, 2008. (7) Rubensson, E. H. SIAM J. Sci. Comput. 2012, 34, B1–B23. (8) Rubensson, E. H.; Rudberg, E.; Sałek, P. J. Chem. Phys. 2008, 128, 074106. (9) Rudberg, E.; Rubensson, E. H. J. Phys.: Condens. Matter 2011, 23, 075502. (10) Benzi, M.; Boito, P.; Razouk, N. SIAM Rev. 2013, 55, 3–64. (11) McWeeny, R. Proc. R. Soc. London Ser. A 1956, 235, 496–509.

44

ACS Paragon Plus Environment

Page 45 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(12) Palser, A. H. R.; Manolopoulos, D. E. Phys. Rev. B 1998, 58, 12704–12711. (13) Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115. (14) Rubensson, E. H. J. Chem. Theory Comput. 2011, 7, 1233–1236. (15) VandeVondele, J.; Borštnik, U.; Hutter, J. J. Chem. Theory Comput. 2012, 8, 3565– 3573. (16) Rudberg, E.;

Rubensson, E. H.;

Sałek, P. Ergo (Version 3.4). Available at

http://www.ergoscf.org (Accessed 14 June 2016). (17) Rudberg, E.; Rubensson, E. H.; Sałek, P. J. Chem. Theory Comput. 2011, 7, 340–350. (18) Bock, N.; Challacombe, M.; Gan, C. K.; Henkelman, G.; Nemeth, K.; Niklasson, A. M. N.; Odell, A.; Schwegler, E.; Tymczak, C. J.; Weber, V. FreeON. 2014; Los Alamos National Laboratory (LA-CC 01-2; LA-CC-04-086), Copyright University of California. (19) Qin, X.; Shang, H.; Xiang, H.; Li, Z.; Yang, J. Int. J. Quantum Chem. 2015, 115, 647–655. (20) Cawkwell, M. J.; Niklasson, A. M. N. J. Chem. Phys. 2012, 137, 134105. (21) Borštnik, U.; VandeVondele, J.; Weber, V.; Hutter, J. Parallel Comput. 2014, 40, 47–58. (22) Cawkwell, M. J.; Wood, M. A.; Niklasson, A. M. N.; Mniszewski, S. M. J. Chem. Theory Comput. 2014, 10, 5391–5396. (23) Chow, E.; Liu, X.; Smelyanskiy, M.; Hammond, J. R. J. Chem. Phys. 2015, 142, 104103. (24) Weber, V.; Laino, T.; Pozdneev, A.; Fedulova, I.; Curioni, A. J. Chem. Theory Comput. 2015, 11, 3145–3152. 45

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(25) Daniels, A. D.; Scuseria, G. E. J. Chem. Phys. 1999, 110, 1321–1328. (26) Mazziotti, D. A. Phys. Rev. E 2003, 68, 066701. (27) Shao, Y.; Saravanan, C.; Head-Gordon, M.; White, C. A. J. Chem. Phys. 2003, 118, 6144–6151. (28) Suryanarayana, P. Chem. Phys. Lett. 2013, 555, 291–295. (29) Note that only two phases were identified in the previous work of Rubensson and Niklasson. 30 Here, a third stagnation phase is identified. (30) Rubensson, E. H.; Niklasson, A. M. N. SIAM J. Sci. Comput. 2014, 36, B147–B170. (31) Rubensson, E. H.; Rudberg, E. J. Comput. Chem. 2011, 32, 1411–1423. (32) Holas, A. Chem. Phys. Lett. 2001, 340, 552–558. (33) Mazziotti, D. A. J. Chem. Phys. 2001, 115, 8305–8311. (34) Pulay, P. Chem. Phys. Lett. 1980, 73, 393. (35) Pulay, P. J. Comput. Chem. 1982, 3, 556. (36) Rubensson, E. H.; Rudberg, E.; Sałek, P. J. Comput. Chem. 2007, 28, 2531–2537. (37) Rubensson, E. H.; Zahedi, S. J. Chem. Phys. 2008, 128, 176101. (38) Arioli, M.; Duff, I.; Ruiz, D. SIAM J. Matrix Anal. Appl. 1992, 13, 138–144. (39) Arioli, M.; Georgoulis, E. H.; Loghin, D. SIAM J. Sci. Comput. 2013, 35, A1537– A1559. (40) Axelsson, O.; Kaporin, I. Numer. Linear Algebra Appl. 2001, 8, 265–286. (41) Frommer, A.; Simoncini, V. SIAM J. Sci. Comput. 2008, 30, 1387–1412.

46

ACS Paragon Plus Environment

Page 46 of 48

Page 47 of 48

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(42) Kaasschieter, E. BIT Numer. Math. 1988, 28, 308–322. (43) Bennani, M.; Braconnier, T. CERFACS, Toulouse, France, Tech. Rep. TR/PA/94/22 1994, (44) Golub, G. H.; Ye, Q. BIT Numer. Math. 2000, 40, 671–684. (45) Vömel, C.; Tomov, S. Z.; Marques, O. A.; Canning, A.; Wang, L.-W.; Dongarra, J. J. J. Comput. Phys. 2008, 227, 7113–7124. (46) Al-Mohy, A. H.; Higham, N. J. Numer. Algorithms 2009, 53, 133–148. (47) Deadman, E.; Relton, S. D. Linear Algebra Appl. 2016, 504, 354 – 371. (48) Mathias, R. SIAM J. Matrix Anal. Appl. 1993, 14, 1061–1063.

47

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Graphical TOC Entry: idempotency error \|X_i - X_i^2\|_2 versus iteration for the SP2 expansion (polynomials x^2 and 2x - x^2) at truncation thresholds \delta_X = 10^{-3}, 10^{-5}, and 10^{-8}, with the automatically detected stopping points marked.