Large Scale Electron Correlation Calculations: Rank-Reduced Full

Jun 11, 2018 - We apply this method in the context of complete active space configuration interaction calculations to acenes with 2-5 aromatic rings, ...
0 downloads 0 Views 2MB Size
Subscriber access provided by UNIVERSITY OF TOLEDO LIBRARIES

Quantum Electronic Structure

Large Scale Electron Correlation Calculations: Rank-Reduced Full Configuration Interaction B. Scott Fales, Stefan Seritan, Nick F Settje, Benjamin G. Levine, Henrik Koch, and Todd J. Martínez J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00382 • Publication Date (Web): 11 Jun 2018 Downloaded from http://pubs.acs.org on June 14, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Large Scale Electron Correlation Calculations: Rank-Reduced Full

2

Configuration Interaction

3 4

B. Scott Fales,†,‡ Stefan Seritan, †,‡ Nick F. Settje, †,‡ Benjamin G. Levine,¶ Henrik Koch,§ Todd J. Martínez*,†,‡

5 6 7 8 9

†Department of Chemistry and the PULSE Institute, Stanford University, Stanford, CA 94305, United States ‡SLAC National Accelerator Laboratory, Menlo Park, CA 94305, United States ¶Department of Chemistry, Michigan State University, East Lansing, MI 48824, United States §Department of Chemistry, Norwegian University of Science and Technology, 7491 Trondheim, Norway

E-mail: [email protected]

10

We present the rank-reduced full configuration interaction (RR-FCI) method, a variational approach

11

for the calculation of extremely large full configuration interaction (FCI) wavefunctions. In this

12

report we show that RR-FCI can provide ground state singlet and triplet energies within kcal/mol

13

accuracy of full CI (FCI) with computational effort scaling as the square root of the number of

14

determinants in the CI space (compared to conventional FCI methods which scale linearly with the

15

number of determinants). Fast graphical processing unit (GPU) accelerated projected σ=Hc matrix-

16

vector product formation enables calculations with configuration spaces as large as 30 electrons in

17

30 orbitals, corresponding to an FCI calculation with over 2.4x1016 configurations. We apply this

18

method in the context of complete active space configuration interaction calculations to acenes with

19

2-5 aromatic rings, comparing absolute energies against FCI when possible and singlet/triplet

20

excitation energies against both density matrix renormalization group (DMRG) and experimental

21

results. The dissociation of molecular nitrogen was also examined using both FCI and RR-FCI. In

22

each case we found that RR-FCI provides a low cost alternative to FCI, with particular advantages

23

when relative energies are desired.

24

Fales – Rank-reduced Full Configuration Interaction – Page 1 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

25

1. Introduction

26

The development of accurate and efficient methods that can describe electron correlation in

27

complex molecules is an ongoing effort in theoretical and computational chemistry. Full

28

configuration interaction (FCI), a linear expansion of configurations in the vector space defined by

29

single particle basis functions, represents the exact solution to the electronic structure problem in

30

the limit of a given basis set. Besides being formally exact, FCI is also conceptually simpler than

31

other high accuracy methods. Unfortunately, because of its high computational expense and

32

factorial scaling with respect to molecular size, FCI is rarely used for routine chemical

33

investigations. Instead, it is usually used as a benchmarking tool for quantifying the relative

34

accuracy of more tractable approximate methods.

35

Benchmark FCI calculations typically include all configurations arising from exciting any of

36

the electrons within the entire orbital space. Another use of FCI is in the context of active space

37

methods, where the configurations include all electronic excitations for a limited number of “active”

38

electrons in a limited number of “active” orbitals. This defines the complete active space

39

configuration interaction (CASCI) method. When the orbitals are also varied to minimize the

40

resulting energy, the method is termed complete active space self-consistent field (CASSCF)1-3 or

41

fully optimized reaction space (FORS).4 Both CASCI and CASSCF require solution of the FCI

42

problem in a user-defined configuration space (or active space). These multiconfigurational

43

approaches were developed to describe static electron correlation, where the electronic structure is

44

poorly characterized by a single Slater determinant. This situation often arises in polyradicaloid

45

systems that are encountered in many photochemical processes, in transition metal systems, and in

46

molecules where covalent bonds have been stretched far from the equilibrium distance.

47

FCI may be considered a mature electronic structure method. The development of direct CI

48

methods reduced memory requirements significantly, since these avoid explicit formation and

49

diagonalization of the full Hamiltonian matrix.5 The success of direct CI has motivated

50

development of numerical approaches designed for the problem of iterative matrix diagonalization.6

51

Additional progress by Siegbahn7 resulted in approaches even better suited for modern computing

52

by leveraging fast matrix-vector and matrix-matrix operations, and further refinement in the context

53

of vector processing machines has been performed.8-10 In FCI programs, Slater determinants are

54

often used en lieu of spin adapted configuration state functions due to the convenient factorization

55

of the determinantal CI vector into α- and β-strings and because the one-particle coupling Fales – Rank-reduced Full Configuration Interaction – Page 2 ACS Paragon Plus Environment

Page 2 of 40

Page 3 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

56

coefficients become simple integers.11 Olsen’s determinant based algorithms for solving the FCI

57

and restricted active space CI (RASCI) problems enabled early calculations of unprecedented

58

size.12-13 Even when using cutting edge hardware and highly efficient algorithms, configuration

59

spaces for high-throughput calculations are still limited to approximately 1010 determinants.14 Thus,

60

the problem of high computational scaling with respect to the configuration space size places hard

61

limits on the dimension of problems that can be reasonably studied, both due to computational time

62

and memory constraints.

63

Efforts to improve scaling can be focused on either (or both) of the many-electron

64

configuration space or one-electron orbital space dimensions. These are of course linked, since the

65

size of the configuration space is a factorial function of the number of orbitals. However, the

66

2 distinction arises because the formal scaling of direct CI is O N det N o4 , where Ndet is the number of

67

configurations (determinants in our case) and No is the number of (active) molecular orbitals (MOs).

68

In practice, the sparsity of the Hamiltonian matrix reduces this to O N det N o4 . Algorithms having

69

smaller operation counts have been described, for example by Olsen,12 but these are difficult to

70

implement efficiently for vector machines. Approaches that attempt to improve the N o4 (orbital

71

space) scaling include pseudospectral methods,15-16 density fitting and Cholesky decomposition,17-22

72

and tensor hypercontraction.23-29 These approaches all decompose the fourth order tensor

73

representing the MO basis electron repulsion integrals as a product of lower-order tensors. This

74

decomposition can decrease both the scaling of the integral transformation and the computational

75

cost of the σ=Hc vector formation during the direct CI iterations. Unfortunately, the size of the

76

configuration space increases combinatorially with the number of orbitals, growing much faster

77

than the quartic scaling of the N o4 component. Thus, schemes aimed at improving the orbital space

78

scaling have been of limited utility so far. This motivates the development of methods that directly

79

reduce the scaling with respect to the size of the configuration space.

(

)

(

)

80

Direct CI algorithms based on Krylov subspace diagonalization rely on iteratively expanding

81

the subspace size with search directions determined by preconditioning the residual vector from the

82

previous iteration. This concept leads naturally to that of selected CI, where the search directions for

83

the addition of trial vectors to the subspace are determined through alternative approaches including

Fales – Rank-reduced Full Configuration Interaction – Page 3 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

84

those based on perturbation theory30-33 and stochastics.34 Selected CI methods are being actively

85

developed as promising lower cost FCI alternatives.

86

Stochastic approaches to solving the FCI problem such as quantum Monte Carlo-FCI35-36

87

(QMC-FCI) have been demonstrated in some cases to provide accurate descriptions of correlated

88

fermionic systems. Unfortunately, uncontrolled errors resulting from the constraint imposed by the

89

antisymmetry principle (i.e., the Fermion sign problem) remain open challenges in QMC-FCI.37

90

Alternatively, variational methods based on tensor networks, such as density matrix renormalization

91

group38-39 (DMRG), provide high-accuracy deterministic solutions for both one-dimensional and

92

branched/tree-like systems using matrix product states and tree tensor network states,40 respectively.

93

Systems possessing higher orders of quantum entanglement, however, are still challenging problems

94

for DMRG.

95

Another productive approach for reducing the configuration space scaling relies on taking

96

advantage of sparsity in the CI vector. Examples include methods by Knowles,41-42 Mitrushenkov,43-

97

44

98

correction vectors to the CI subspace, resulting in a shorter CI vector, especially during early

99

iterations, and a reduction in operation count.

and Rolik.45 These approaches adaptively expand the linear search space during the addition of

100

Olsen et al. developed the low-rank CI (LR-CI) method using a spectral resolution of the CI

101

coefficient matrix46 to compress the wavefunction. In this work a second-order Newton-Raphson

102

scheme was used to solve the CI with single and double replacements (CISD) problem. LR-CISD

103

was applied to test systems including neon, nitrogen, and water, and systematic improvement of the

104

energy relative to full CISD was observed as higher-rank wavefunctions were employed. Lindh et

105

al. followed this work by implementing a low-rank multi-configuration self-consistent field (LR-

106

SCF) method47 and demonstrated that polarizabilities were better described using LR-SCF than with

107

LR-CI, though wavefunction convergence was hindered by strong coupling between orbital

108

rotations and single excitations. Attacking the CI problem from another angle, Koch and Dalgaard

109

formulated a variational matrix decomposition using non-linear optimization methods to solve the

110

FCI problem.48 A pilot version of a program performing the decomposition was applied to the

111

ground state of beryllium atom. Rapid convergence to the FCI energies within mH accuracy was

112

observed in this case after 25 iterations, and 95% of the correlation energy was obtained after only

113

10 iterations.

Fales – Rank-reduced Full Configuration Interaction – Page 4 ACS Paragon Plus Environment

Page 4 of 40

Page 5 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

114

Both the LR-CI and variational matrix decomposition methods rely on the assumption that

115

information represented by the CI coefficients is redundant. An equivalent statement is that the

116

matrix form of the CI vector has low rank. Recently, Taylor reported a study where the singular

117

value decomposition (SVD) was used to quantify the effect of solving the linear CI equations using

118

low rank approximations to the current trial vector at each iteration with various accuracy

119

thresholds.49 We will refer to this as the low rank SVD (LR-SVD) method. Several representative

120

FCI cases were examined, including examples of both “stout” (typical of a CASSCF or a CASCI

121

calculation, where the number of electrons is similar to the number of orbitals) and “slim” (more

122

characteristic of a benchmark type calculation, where the number of orbitals is much larger than the

123

number of electrons) CI matrices. These naming conventions were derived from the shape of the

124

graphical representations of the corresponding CI spaces,50 where the slim graph is tall and narrow

125

and the stout graph is more symmetrical. In each case, Taylor demonstrated that while use of a

126

lower-rank trial vector at each iteration slowed convergence, the LR-SVD calculation ultimately

127

converged to the same eigenvector as standard FCI (to within a given residual threshold). Instead of

128

demonstrating the low rank of the converged CI vectors, however, it was observed that a relatively

129

large rank of the CI vector was required to achieve the desired accuracy. The high computational

130

3/2 cost of the SVD operation, N det versus ≈ N det for a σ build in standard direct CI, and absence of a

131

reduction in trial vector length in later LR-SVD iterations imply that the LR-SVD approach has few

132

advantages in large scale production calculations.

133

As described above, writing the CI vector (c) in matrix form (C), where the rows correspond to

134

α-strings and columns to β-strings permits a reduction in scaling by using a matrix decomposition.

135

In the present work, we extend the ideas of Koch and Dalgaard,48 where the product space vectors

136

are determined variationally through solution of a non-linear multi-scale eigenvalue problem (a type

137

of super-CI approach, similar to that which is often used in two-step CASSCF implementations51-

138

52

139

follows: in Section 2 we introduce the augmented eigenvalue problem for the product space vectors

140

(Section 2.1), describe projected σ formation (Section 2.2), and provide implementation details

141

including one-particle coupling coefficient formation (Section 2.3.1), eigenvalue problem solution

142

(Section 2.3.2), metric formation (Section 2.3.3), and projected σ formation (Section 2.3.4). Next

143

we demonstrate computational performance of the RR-FCI method (Section 3.2) before applying

), in developing a rank-reduced FCI (RR-FCI) formulation. The structure of the paper is as

Fales – Rank-reduced Full Configuration Interaction – Page 5 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 40

144

RR-FCI to acenes having 2-5 polycyclic aromatic rings (Section 3.3) and to molecular nitrogen

145

dissociation (Section 3.5). We conclude with a brief review of the RR-FCI method before

146

discussing future directions of work.

147

2. Methods

148

The second-quantized electronic Hamiltonian is

(

1 Hˆ = ∑ hkl Eˆ kl + ∑ (ij| kl ) Eˆ ij Eˆ kl − δ jk Eˆ il 2 ijkl kl

149

)

(1)

150

where i,j,k,l index molecular orbitals, hij and (ij|kl) are the one- and two-electron integrals, and Eˆ is

151

a spin-averaged single-particle excitation operator.

152

Instead of building and diagonalizing the full Hamiltonian to determine the eigenvalues and

153

their corresponding eigenvectors, direct methods allow the iterative solution for a few of the lowest-

154

lying eigenvalues and eigenvectors. In the present work we focus on the calculation of the ground

155

state (lowest for a given spin) eigenvalue/eigenvector pair. Future work will explore the

156

determination of higher-lying (excited) states. The rate-limiting step in the direct CI procedure is the

157

formation of the matrix-vector product σ defined as

σ I = ∑ H IJ cJ

158

(2)

J

( ) operations, but modern algorithms exploit Hamiltonian

159

2 Formally, σ formation requires O N det

160

sparsity to achieve an effective scaling approaching O(Ndet). In this work we describe an approach

161

that reduces the formal scaling to O(Ndet) with an observed effective scaling of O

162

rate-limiting σ formation.

163

2.1 Eigenvalue Problem

(

N det

) for the

164

We begin by describing the variational decomposition approach taken by Koch and Dalgaard.

165

We will largely restrict our attention to singlet spin states in the following. In a determinantal basis,

166

the full CI wavefunction can be written as:

167

N det

Nα N β

I =1

J =1 K =1

ψ = ∑ d I Φ I = ∑ ∑ CJ K φJα ,φ Kβ

Fales – Rank-reduced Full Configuration Interaction – Page 6 ACS Paragon Plus Environment

(3)

Page 7 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

168

where Nα and Nβ are the number of α- and β-strings and Ndet=NαNβ is the number of determinants,

169

labeled as Φ I

170

The wavefunction coefficients, C, expressed as a matrix according to the second equality in Eq. (3),

171

can be decomposed as:

with coefficients dI or as a product of α/β-strings φJα ,φ Kβ

Nr

N

( )

(

with coefficients CJK.

N

)

r r †  C = ∑ ci  P i Q i  ≡ ∑ ci P i ⊗ Q i = ∑ ci Ψ i   i i i

172

(4)

173

where Nr (the allowed rank, or equivalently, the number of product terms) is less than or equal to

174

the full rank of C, ci are product term coefficients, and P and Q are product state vectors of length

175

Nα and Nβ, respectively. In the following, we will adopt the convention of normalizing the Pi and Qi

176

vectors as follows:

177

Ψi = P i ⊗ Q i

178

Ψ i* Ψ i = Tr Q i ⊗ P i P i ⊗ Q i

((

)(

2

(5)

)) = P

i

2

2

Qi = 1

2

P i = Qi = 1

179

(6) (7)

180

The vectors P and Q correspond to coefficients of product space basis functions φα and φβ . This

181

provides us with an expression for the initial wavefunction, ψ 0 :

182

(8)

183

where PJ0 and QK0 are elements of P0 and Q0. Equivalently, we can also expand the initial

184

wavefunction in terms of the product space basis sets Nα N β

ψ 0 = ∑ ∑ φJα ,QK0 PJ0

185

(9)

K

J

Nα N β

ψ 0 = ∑ ∑ PJ0 ,φ Kβ QK0

186

J

187 188

(10)

K

The full wavefunction can be expressed as Nr

Nr

i

i

Nα N β

ψ = ∑ ci ψ i = ∑ ci ∑ ∑ φJα ,φ Kβ PJi QKi J

K

Fales – Rank-reduced Full Configuration Interaction – Page 7 ACS Paragon Plus Environment

(11)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 40

189

Applying the variational principle to the full wavefunction leads to coupled equations for the first

190

set of product terms: Nα N β

∑∑

191

J

∑∑ J

(12)

PJ ,φ Kβ ′ Hˆ − E PJ ,φ Kβ QK = 0 ∀ K ′

(13)

K

Nα N β

192

φJα′ ,QK Hˆ − E φJα ,QK PJ = 0 ∀ J ′

K

193

Addition of the second and subsequent product terms result in coupled augmented eigenvalue

194

problems:

195

      





ψ Hˆ − E ψ

ψ Hˆ − E φJα ,QK

K Nβ



φJα′ ,QK Hˆ − E ψ

K



∑ φ α ,Q J′

K

Hˆ − E φJα ,QK

K

196

for the P optimization and

197

      





ψ Hˆ − E ψ

ψ Hˆ − E PJ ,φ Kβ

J Nα



PJ ,φ Kβ ′ Hˆ − E ψ

J





PJ ,φ Kβ ′ Hˆ − E PJ ,φ Kβ

J

    a0    P  

   = 0    0 

(14)

    a0    Q  

   = 0    0  

(15)

198

for the Q optimization, where a0 is a scalar which couples the eigenvalue problems. As previously

199

described, this approach provides fast convergence to within mH accuracy of the FCI energy.

200

Unfortunately, this formulation is prone to spin contamination and only singlet states may be

201

described.48 We can remedy both of these issues by modifying this ansatz to be a linear combination

202

of products of P and Q vectors.

203

Ψi = P i ⊗ Q i ± Q i ⊗ P i

(16)

204

where the singlet corresponds to the symmetric case (+) and the triplet to the antisymmetric case

205

(−). This linear combination requires P and Q to be of length Nstr, where Nstr=Nα=Nβ is the number

206

of α- or β-trings for the restricted case described here. From this point on, we will assume that

207

Nα=Nβ, and leave it to future work to generalize the formalism further. While wavefunction

208

symmetry/antisymmetry alone does not guarantee that the final eigenvector is a spin eigenfunction

209

(quintet states can in principle contaminate the singlet states, for example), we find that in practice

Fales – Rank-reduced Full Configuration Interaction – Page 8 ACS Paragon Plus Environment

Page 9 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

210

this is not a serious issue. We have performed a detailed analysis of the expectation value of S2 for

211

the RR-FCI wavefunction in several test cases and refer the interested reader to the Supporting

212

Information. The coupled equations for the initial set of product space vectors then become N str

∑  (φ α ,Q

213

J′

K

J ,K

) ± (Q

K

)

(

) (

)

,φJβ ′ Hˆ − E φJα ,QK ± QK ,φJβ  PJ = 0 ∀J ′ 

N str

∑  ( P ,φ β ) ± (φ α ,P ) Hˆ − E ( P ,φ β ) ± (φ α ,P

214

J

K′

K′

J

J

K

K

J

J ,K

)  Q

215

giving the eigenvalue problem for the subsequent product terms as

216

     

N

str





) (

)

N

str



β ,Q K ± Q K ,φ J ′ Hˆ − E ψ J′ α

K

     

)



α J′

) (

)

(

) (

β α β ,QK ± QK ,φ J ′ Hˆ − E φ J ,QK ± QK ,φ J

218

     

N



ψ Hˆ − E ψ

str

(

) (

)

)

(

) (

β α ψ Hˆ − E PJ ,φ K ± φ K ,PJ

J N

)

K

for the P optimization and

∑ str

( P ,φ ) ± (φ J

β

α

K′

K′

)

N



, PJ Hˆ − E ψ

J

str

( P ,φ ) ± (φ J

β

α

K′

K′

β α ,PJ Hˆ − E PJ ,φ K ± φ K , PJ

J

)

(18)

a 0

P

  =  

0 0

  (19) 

     a 0  =  0  (20)  Q   0      

for the Q optimization. The full wavefunction can be expressed as Nr

N str

i

J ,K

(

) (

)

ψ = ∑ ci ∑ φJα ,φ Kβ ± φ Kα ,φJβ PJi QKi

220

221

) (

α β ψ Hˆ − E φ J ,QK ± QK ,φJ

∀K ′

K N

217

219

(

str



ψ Hˆ − E ψ

=0

K

(17)

(21)

2.2 Projected  Formation

222

Since our factorization scheme relies on separation of the  and  components of the CI vector,

223

a reasonable starting point for  formation is Olsen’s CI equations,12 where  is separated into three

224

terms

σ = σ1 + σ 2 + σ 3

225 226 227

(22)

with

(σ )

1 J ′K

Nα No

= ∑ ∑γ J

kl

N

JJ ′ kl

hkl′ CJ K

N

1 α o + ∑ ∑ γ ijJM γ klM J ′ (ij| kl )CJ K 2 J ,M ijkl

Fales – Rank-reduced Full Configuration Interaction – Page 9 ACS Paragon Plus Environment

(23)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

228

(σ )

2 JK ′

Nβ N o

N

= ∑ ∑γ

KK ′ kl kl

h ′ CJK

kl

K

N

1 β o + ∑ ∑ γ ijKM γ klM K ′ (ij| kl )CJ K 2 K ,M ijkl

(24)

Nα N β N o

(σ )

229

Page 10 of 40

3 J ′K ′

= ∑ ∑ ∑ γ ijK K ′γ klJJ ′ (ij| kl )CJ K J

(25)

K ijkl

230

where J,J’ index α occupation strings, K,K’ index β occupation strings, M indexes both α and β

231

strings, and γ ,γ are the one-particle coupling coefficients defined as

232

γ ijJ J ′ = φJα Eˆ i j φJα′

(26)

233

γ ijK K ′ = φ Kβ Eˆ i j φKβ ′

(27)

234

with Eˆ ij a single particle generator either between α occupation strings J and J’ or β occupation

235

strings K and K’ corresponding to an excitation between orbitals i and j. In the case we consider

236

here, where Nα=Nβ, the α/β coupling coefficients are equal:

γ ijK K ′ = φ Kβ Eˆ ij φ Kβ ′ = γ ijK K ′ = φ Kα Eˆ ij φ Kα ′

237

(28)

238

Note that in Eqs. (23)-(25), we have combined the one-electron part of the two-electron integrals

239

(ij|kl) with the one-electron integrals hkl for computational efficiency No

hkl′ = hkl − ∑ (kj| jl )

240

(29)

j

241

Two different types of σ formations correspond to Equations 12 and 13. Projection onto the P

242

vector space results in Nα N β

σ = ∑ ∑ QK φJα′ ,φ Kβ Hˆ φJα ,φ Kβ PJ QK P J′

243

J

244

(30)

K

while projection onto the Q vector space gives

σ

245

Nα Nβ

Q K′

= ∑ ∑ PJ φJα ,φKβ ′ Hˆ φJα ,φKβ PJ QK J

(31)

K

246

We can think about projected σ formation in terms of operations that span the full determinantal

247

space. Figure 1 depicts how a standard CI program can be used to form the projected σ vector

248

corresponding to Eq. (31).

Fales – Rank-reduced Full Configuration Interaction – Page 10 ACS Paragon Plus Environment

Page 11 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

249

Frame 1 from Figure 1 defines formation of the rank 1 full CI vector C from the P and Q

250

vectors. The outer product of the P and Q vectors forms a matrix CJK. Expanding this matrix

251

produces the standard full CI vector, C. The second frame corresponds to formation of the σ vector,

252

which is then folded into matrix form, σJ’K’. Finally, σJ’K’ is then projected by the Q vector in

253

Frame 3 to give the projected σ vector, σP. While this approach assists in understanding the nature

254

of the data structures involved in the algorithm, direct formation of the factorized σ vector is needed

255

to avoid unnecessary computation and storage.

256

Factorizing the projected σ results in 3 terms

(σ )

P

1 J′

257

N N  N o JJ ′  1 str o J M M J ′ = ∑ ∑ QK  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ijkl J K  kl  N str N str

N N  N str  N str  N o J J  1 str o JM M J ′ ′ =  ∑ QK QK  ∑  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ 2 M ijkl  K  J  kl 

(σ )

P

2 J′

258

N N  N o KJ ′  1 str o KM M J ′ = ∑ ∑ QJ  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ijkl K J  kl  N str N str

N N  N str  N str  N o K J  1 str o KM M J ′ ′ =  ∑ PJ QJ  ∑  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) QK 2 M ijkl  J  K  kl  N str N str

No

(σ ) = ∑ ∑ Q ∑γ P

3 J′

259

260

J

K ,K ′

K′

(33)

γ klJJ ′ (ij| kl )PJ QK

KK ′ ij

ijkl

   N str J J ′  KK = ∑  ∑ QK ′γ i j QK   ∑ γ kl PJ  (ij| kl ) ij kl  K ,K ′  J  No

N str

(34)

for the P factorization and

(σ )

Q

1 K′

261

N N  N o JK ′  1 α o JM MK ′ = ∑ ∑ PK  ∑ γ kl hkl′ + ∑ ∑ γ i j γ kl (ij| kl ) PJ QK 2 M i jkl K J  kl  N str N str

N N  N str  N str  N o J K  1 str o J M M K ′ ′ =  ∑ PK QK  ∑  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ 2 M ijkl  K  J  kl 

(σ )

Q

2 K′

262

(32)

N N  N o KK ′  1 str o KM M K ′ = ∑ ∑ PJ  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ij kl J K  kl 

(35)

N str N str

N N  N str  N str  N o K K  1 str o KM M K ′ ′ =  ∑ PJ PJ  ∑  ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) QK 2 M ijkl  J  K  kl 

Fales – Rank-reduced Full Configuration Interaction – Page 11 ACS Paragon Plus Environment

(36)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

( ) σ3

263

Q K′

=

N str N str

No

J ,J ′ K

ijkl

Page 12 of 40

∑ ∑ PJ ′ ∑ γ ijKK ′γ klJJ ′ (ij| kl )PJ QK

 N str   N str  = ∑  ∑ γ iKj K ′QK   ∑ PJ ′γ klJJ ′ PJ  (ij| kl ) ijkl  K   J ,J ′  No

(37)

264

for the Q terms. Since we will focus on the restricted case where Nα=Nβ, we will not distinguish

265

between  and  for the remainder of this work. Note that this restriction also leads to a loss of the

266

correspondence between α/β and J/K indices in Eqs. (32)-(37). For each projected σ2 formation

267

three state vectors are required in addition to the current trial vector. For the P optimization, for

268

example, the current Q vector is needed in addition to each of the previously converged P and Q

269

vectors, and the P being optimized is taken to be the current trial vector.

270

2.3 Computational Methods

271

2.3.1 Coupling Coefficient Formation

272

Data structures required for evaluation of σ include the one-particle coupling coefficients , the

273

orbital labels , and the configuration labels . When possible, our program constructs and stores

274

these lists in memory at the beginning of the calculation rather than computing elements on-the-fly.

275

Forming these lists for configuration spaces typical of direct CI calculations is a routine task and

276

does not require exceptional computational effort. In contrast, the large configuration spaces

277

accessible to RR-FCI require extremely large lists. To handle these cases we have developed an

278

efficient GPU accelerated algorithm listed in Algorithm 1 for construction of the one-particle

279

coupling coefficients and related data structures in a lexical (and dense) format. The approach

280

described here is also well suited for forming these lists on-the-fly if desired.

281

The outer loop of Algorithm 1 can be tiled to enable vectorization over multiple GPUs and/or

282

compute nodes. The stradr() function is an implementation of Equations 11 and 12 from Knowles

283

and Handy,8 where the binomial coefficients are retrieved from a precomputed list (i.e. Pascal’s

284

triangle) rather than calculated on-the-fly. The get_parity() function is an abstraction that can be

285

implemented trivially in several ways (we compute the value inline). This algorithm produces dense

286

data structures in the minimum number of operations, (Nstr*Ne(No-Ne)), where Ne and No are the

287

number of  electrons and orbitals in the configuration space, respectively. Even so, larger

288

configuration spaces require storage on the order of hundreds of GB. For the reader’s convenience,

289

Table 1 provides configuration space and array sizes for some of the CAS spaces used in this study. Fales – Rank-reduced Full Configuration Interaction – Page 12 ACS Paragon Plus Environment

Page 13 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

290

Journal of Chemical Theory and Computation

2.3.2 Eigenvalue Problem

291

Our program solves a nonlinear eigenvalue problem using a two-step Davidson approach,

292

where we iterate over P and Q vector optimization until they are self-consistently converged. The

293

procedure we use is given graphically in Figure 2. The inner (micro) iterations are performed using

294

a generalized Davidson algorithm to solve the non-orthonormal subspace eigenvalue problem. The

295

outer (macro) iterations are considered converged when the micro-iterations for each of the P and Q

296

vectors converge in a single step. We “balance” the eigenvalue problem by scaling the projected

297

metric matrix as recommended previously.53-54 A residual norm convergence criterion of

298

r = 1.0 ×10−6 is used for the micro-iterations.

299

To begin the iterative procedure for the P and Q vector optimizations we must provide a guess

300

vector for each. In the first macro-iteration, the guess for each of P0 and Q0 is the vector

301

corresponding to the Hartree-Fock determinant, i.e. , where the first vector element

302

corresponds to the coefficient a0 that couples the P and Q eigenvalue problems. In the second and

303

subsequent macro-iterations the guess for the P vector becomes , which corresponds

304

to the previously converged P,Q pairs being used as the Pi guess. The Qi guess corresponds to a unit

305

vector in the direction of the i th macro-iteration, i.e. if we are adding the third P,Q pair, the Q

306

guess is . This permits efficient sampling of the configuration space while avoiding

307

linear dependence of the added trial vectors. Other guesses for the Q can be conceived; the original

308

implementation used a uniform Q guess, i.e.

309

guesses derived from projected residual vector formation (similar to typical trial vector formation in

310

standard Davidson algorithms) appear promising. In our tests we investigated all three of these

311

options. Interestingly, the unit guess approach (uniformly sampling the configuration space) results

312

in the best convergence, with the other approaches producing linearly dependent trial vectors

313

shortly after mH accuracy is obtained.

0,1

N str ,1

N str ,1

N str ,... , and Q vector

314

Preconditioning of the Davidson routine is performed using the reference determinant energy

315

modified by orbital energy differences as described by Evangelisti.55 Advantages of orbital energy

316

difference preconditioning are that spin contamination is not introduced into the trial vectors and

317

convergence is generally similar to that observed when using exact diagonal energies. Convergence

318

of the RR-FCI micro-iterations generally requires 10 − 30 iterations. To test the propensity for spin

319

contamination of each of the ansatzes described in Equations 11 and 21 we have reconstructed the Fales – Rank-reduced Full Configuration Interaction – Page 13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 40

320

final wavefunction | Ψ〉 for a variety of manageable configuration spaces (i.e. 16 electrons in 16

321

orbitals and smaller). The spin purity of the full wavefunction | Ψ〉 and for each contributing P,Q

322

pair was assessed by directly calculating the expectation value of S2. In doing so we confirmed the

323

presence of spin contamination in the ansatz described in Equation 11 and the absence of spin

324

contamination in the wavefunction defined in Equation 21.

325

Metric Formation

326

Since we are solving a generalized (non-orthonormal) eigenvalue problem it is necessary to

327

build the subspace metric matrix/trial vector product corresponding to Equations 19-20. We form

328

the projected metric according to

329

   BP =    

N str



ψψ

K N str

∑ K



α J′

) (

β

)

,QK ± QK ,φJ ′ ψ

330

for the P vector optimization and

331

   Q B =   

N str

∑ K



α J′

) (

N str

∑ K

∑ K

( P ,φ ) ± (φ J′

β

α

K′

K′

)

,PJ ′ ψ

N str

∑ K

) (

(

) (φ

α J

) (

) (

β

α

K′

K′

,PJ ′

)

)

) ( P ,φ ) ± (φ J

    a0    P  

)

,QK ± QK ,φJβ

ψ PJ ,φ Kβ ± φ Kα ,PJ

( P ,φ ) ± (φ J′

β

ψ φJ ,QK ± QK ,φJ

,QK ± QK ,φJβ ′

ψψ N str

(

α

β

α

K

K

,PJ

)

  (38) 

    a0       Q   

(39)

332

for the Q vector optimization. The algorithm for computing the metric is low enough scaling that

333

we have implemented it in serial on the host (CPU). Details are given in Algorithm 2. Projected

334

metric formation requires double precision dot product (DDOT) and y = Ax + y (DAXPY)

335

operations, both of which are performed using optimized linear algebra library routines.56

336

Projected Sigma Formation

337

We have developed projected σ formation algorithms based on a hybrid of the Olsen

338

factorization12 and the Knowles and Handy FCI algorithm.8 A configuration label driven scheme for

339

formation of σP is described in Algorithms 3 – 5. Orbital label driven algorithms which more

340

closely resemble the Olsen CI factorization are presented in Supporting Information.

341

The GPU kernels in the above algorithms strongly resemble those described in our previous

342

direct CI work.10 By forming the projected σ vector directly we benefit from a significant reduction Fales – Rank-reduced Full Configuration Interaction – Page 14 ACS Paragon Plus Environment

Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

343

in scaling: in RR-FCI we are able to eliminate the loop over β strings entirely. Further, direct CI

344

kernels that contribute to the σ array must use atomic operations to avoid data collisions. While

345

hardware support for atomics ameliorates some of the computational overhead associated with their

346

use, performance still suffers relative to algorithms that can avoid atomic operations altogether. The

347

RR-FCI projected σ algorithm described above is able to restrict atomic operations to the σ1 and σ2

348

terms, using DAXPY and DGEMV operations to form the σ3 term. Each of the PQT and the QP T

349

projected σ vector combinations can be formed (for both P and Q vector optimizations) using these

350

algorithms and interchanging the ordering and identities of the input vectors. In practice, we tile the

351

loop indices to allow for configuration spaces of arbitrary size (subject to memory constraints

352

imposed by the length of the product space vectors). We avoid description of our tiling scheme in

353

Algorithms 3–5 for clarity, and instead present this information in more detail in the Supporting

354

Information. Due to the high cost associated with CPU host to GPU device memory transfers we

355

select the largest tile size possible (limited by GPU device memory) to minimize incurring memory

356

transfer penalties.

357

Results and Discussion

358

Computational Details

359

The RR-FCI method was implemented in the TeraChem GPU accelerated electronic structure

360

package using the Compute Unified Device Architecture (CUDA) API57 and the NVidia CUDA

361

Basic Linear Algebra Subroutines library (cuBLAS).58 Benchmark calculations were performed

362

using a single core of an Intel Xeon E5-2699 @2.2GHz and a single NVidia K40c GPU.

363

Configuration spaces are defined according to the notation (X,Y), where X corresponds to the

364

number of electrons and Y to the number of orbitals. Geometries for all molecules used in this work

365

are reported in Cartesian coordinates in the Supporting Information. The cc-pVDZ basis set (in

366

Cartesian form including contaminants) is used in all calculations unless stated otherwise.

367

Algorithm Performance

368

The computational performance of σ vector formation depends solely on the number of active

369

electrons and orbitals. Benchmark CASCI (RR-FCI and conventional FCI) calculations were

370

performed on ethylene dimer using molecular orbitals from a preceding Hartree-Fock calculation.

371

The active spaces range from (10,10) through (30,30) for RR-FCI and up to (16,16) for Fales – Rank-reduced Full Configuration Interaction – Page 15 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

372

conventional FCI. Results are presented in Figure 3 and in Table 2. For comparison we also present

373

GPU accelerated σ formation timings from our conventional GPU-accelerated FCI program.10

374

A linear regression fit of observed computational times shows that the conventional FCI σ

375

formation scales linearly with the number of determinants, with a scaling exponent of 1.079. This is

376

in agreement with previously reported results for computational scaling of direct CI σ vector,10 two-

377

particle reduced density matrix59 and SA-CASSCF direct Hessian matrix60 formation. In contrast,

378

RR-FCI projected σ vector formation scales with the square root of the number of determinants,

379

with an observed scaling exponent of 0.543. The RR-FCI approach is more computationally

380

efficient than FCI by a factor approaching

N det .

381

Table 2 provides detailed timings for the most time-consuming portions of RR-FCI calculations

382

with active spaces ranging from (18,18) through (30,30). Coupling coefficient and other list

383

formation requires less computational effort than σ formation in every case. To reduce the storage

384

requirements for the largest lists we considered forming elements of these lists as needed during σ

385

formation, but even considering CPU–GPU memory transfer overhead it is advantageous to

386

precalculate and store these lists when possible. If one desires to use larger configuration spaces

387

than system memory allows, these lists may be formed directly during the σ formation at a

388

computational cost proportional to a standard σ formation.

389

Projected metric formation incurs only marginal computational expense for even the largest

390

configuration spaces. Instead, as expected, the rate-limiting step is σ formation. Timings for each

391

function comprising σ formation show that for every configuration space the matrix-matrix multiply

392

is the most expensive step. This step is performed using dense matrix operations, and analysis of

393

sparsity in the intermediate matrices and two-electron integrals may offer additional performance

394

advantages. Each of σ1 and σ3 formation requires multiple (2) matrix multiplication operations

395

(DGEMMs) while σ2 formation requires only a single DGEMM. This, combined with the increased

396

number of GPU kernels (D and σ matrix formations) required, results in σ2 being less

397

computationally expensive than σ1 and σ3. Additional details related to this may be found in the

398

Supporting Information.

399

Memory transfer between the host (CPU) and device (GPU) becomes increasingly important

400

with larger configuration spaces. Active spaces smaller than (24,24) show negligible contribution

401

from CPU–GPU memory transfer, as their data structures reside entirely in GPU memory and have Fales – Rank-reduced Full Configuration Interaction – Page 16 ACS Paragon Plus Environment

Page 16 of 40

Page 17 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

402

dimension smaller than the tile size, but at configuration spaces of (24,24) and larger the high cost

403

of CPU–GPU memory transfer becomes apparent, comprising nearly 30% of the overall cost of the

404

projected σ formation for the (30,30) active space. The GPU kernels corresponding to Dij and σij

405

formation are the next most costly operations, followed by the remaining linear algebra operations

406

(DGEMV, DDOT, DAXPY). Finally, the largest configuration space required an extremely small

407

tile size (1024), and this is reflected in both the list and the σ formation times accordingly.

408

Acene Absolute Energy Convergence

409

Linear polycyclic aromatic hydrocarbons, or acenes, possess interesting electronic properties

410

making them suitable for incorporation into organic semiconductor materials. Correlation of the π

411

valence electrons provides a reasonably accurate description of the electronic structure, making

412

complete active space methods such as CASSCF or CASCI well suited for these systems. We have

413

calculated the ground singlet and triplet states for napthalene and anthracene using both CASCI and

414

RR-CASCI with full- π valence active spaces—(10,10) and (14,14) for naphthalene and anthracene,

415

respectively. Energy differences between the rank-reduced and exact (FCI) approaches are shown in

416

Figures 4 and 5. In addition to their interesting electronic characteristics, the calculations described

417

in this section are examples of configuration spaces commonly referred to as “stout” CI, where the

418

number of orbitals is similar to the number of electrons. Geometries for the singlet and triplet

419

systems described in the following study were optimized using UB3LYP/6-31G(d) and are the same

420

as those reported by Hachmann et al.61 For the reader’s convenience they are available in the

421

Supporting Information in Cartesian coordinates.

422

The RR-FCI energy rapidly converges to within mH accuracy of the FCI energy after 30

423

product terms, representing inclusion of rank structure of the CI vector having the largest influence

424

on the final energy. Once an accuracy of 1.0 × 10 −4 H is obtained, convergence slows considerably

425

and displays asymptotic behavior. Including 252 product states corresponds to the largest possible

426

rank of the CI vector, and we obtain a final accuracy approaching 1.0 × 10 −5 H. Since we solve a

427

nonorthogonal eigenvalue problem, product terms comprised of P and Q vectors may contain

428

redundant information. This occurs because we do not reoptimize P and Q vectors added in

429

previous iterations. This impedes convergence to the exact FCI solution when the number of

430

product terms included equals the FCI vector rank.

Fales – Rank-reduced Full Configuration Interaction – Page 17 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

431

The (14,14) CAS space of anthracene approaches the upper size limit of routinely performed CI

432

calculations. Similar to the naphthalene calculation, mH accuracy is achieved relatively quickly,

433

requiring fewer than 100 product terms (representing < 3% of the full rank of the CI vector, 3432 in

434

this case). Following addition of ∼ 150 product terms, convergence behavior decays before

435

asymptotically approaching an accuracy of 1.0 × 10 −4 H. Irregularities in the convergence at ∼ 350

436

and ∼ 800 product terms are likely artifacts of our guess vector procedure.

437

Relative Energy Convergence: Acene Singlet-Triplet Gaps

438

In addition to absolute energy convergence of the shortest acenes compared with FCI, we have

439

also investigated the singlet-triplet energy gap for the acene series having 2–5 rings. Singlet-triplet

440

energy gaps for each system are depicted in Figure 6. While absolute energy convergence to within

441

mH accuracy requires ∼ 30 product terms for naphthalene and ∼ 100 product terms for anthracene,

442

convergence of a relative property, in this case the singlet-triplet gap, requires fewer product terms

443

to achieve similar convergence. The S0 − T0 gap for naphthalene is converged after only 12 product

444

terms. Only slightly worse convergence is observed for the longer acenes, with anthracene,

445

tetracene, and pentacene requiring 15, 20, and 24 product terms to converge, respectively. A

446

comparison of our RR-CASCI energies with the local DMRG results of Hachmann is given in Table

447

3, along with estimates of the experimental singlet-triplet gaps for each system.62-65

448

Even with the disparity in CAS spaces between the RR-CI and DMRG results reported above,

449

the agreement between the two methods is surprisingly good. In each case, RR-CI lies within ∼ 4

450

kcal/mol of both the DMRG and the estimated experimental energies. Even more impressive is how

451

few product terms as a function of full CI space rank are required to obtain this level of agreement:

452

naphthalene requires 4.7% of the full rank, anthracene 0.4%, tetracene 0.04% and pentacene only

453

0.003%. These results are encouraging and motivate us to pursue refinement and continued

454

development of the RR-FCI approach.

455

Nitrogen Bond Dissociation

456

FCI is often used as a benchmarking tool for evaluating lower-cost approximations. A common

457

benchmark is the computation of a potential energy curve for dissociation of a multiple bond, which

458

can be particularly difficult since both static and dynamic electron correlation often contribute. We

459

applied the RR-FCI and FCI methods to molecular nitrogen, varying the bond length from 0.8 to 1.8

460

Å. The RR-FCI and FCI results are compared in Figure 7. All 10 valence electrons are correlated, Fales – Rank-reduced Full Configuration Interaction – Page 18 ACS Paragon Plus Environment

Page 18 of 40

Page 19 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

461

corresponding to a (10,28) configuration space. Conventional FCI calculations were performed

462

using the Molpro electronic structure program.66-67

463

While the configuration space corresponds to 9.6 billion determinants (without symmetry),

464

excellent agreement between RR-FCI and FCI is observed using only 100 product terms. Energy

465

differences [ ∆E = ( ERR − FCI ) − ( EFCI ) ] between the two methods are provided in the inset of Figure

466

7. It is interesting to note that RR-FCI calculations at or near the equilibrium geometry converge

467

more rapidly than the stretched geometries, as can be seen by the higher energies relative to exact

468

FCI when the nitrogen-nitrogen distance is greater than 1.2 Å. Increasing the number of product

469

terms for the stretched geometry calculations continues to reduce the errors observed relative to the

470

FCI energy. This demonstrates that our ansatz treats static and dynamic correlation in an unbalanced

471

fashion. We are currently investigating this behavior with the intention of improving the treatment

472

of static correlation. The low cost-to-accuracy ratio of RR-FCI makes it an ideal tool for use in

473

performing benchmark type calculations for evaluation of lower-cost alternatives.

474

Conclusions

475

We have presented a low-cost FCI alternative for both singlet and triplet ground states with

476

computational cost that scales as

477

large active spaces. GPU acceleration of rate-limiting components of the algorithm, including

478

projected σ and one-particle coupling coefficient formation, expands the size of accessible

479

configuration spaces to

480

our methods to the full π valence space CI calculations of acenes having 2 − 5 polycyclic aromatic

481

rings, demonstrating excellent agreement for both absolute and relative energies compared to

482

conventional FCI. Finally, we investigated dissociation of molecular nitrogen, and found that our

483

results compare favorably against benchmark FCI studies. Given the success of RR-FCI in

484

describing the ground state singlet and triplet wavefunctions, we are currently working to extend

485

our approach to allow for excited state calculations and the evaluation of molecular properties.

486

Acknowledgments

N det , allowing routine calculation of CASCI wavefunctions for

determinants while achieving sub-mH accuracy. We have applied

487

This material is based upon work supported by the U.S. Department of Energy, Office of

488

Science, Office of Advanced Scientific Computing Research, Scientific Discovery through

Fales – Rank-reduced Full Configuration Interaction – Page 19 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

489

Advanced Computing (SciDAC) program. BGL acknowledges support from the National Science

490

Foundation (CHE-1565634). Computational resources were provided through the Extreme Science

491

and Engineering Discovery Environment68 (Grant No. ACI-1548562, allocation TG-CHE140101)

492

and the XStream computational resource supported by the NSF MRI program (Grant No. ACI-

493

1429830). SS acknowledges support from an NSF Graduate Fellowship. HK acknowledges support

494

from the Norwegian Research Council through FRINATEK project no. 263110/F20. TJM is a

495

cofounder of PetaChem, LLC.

496

Fales – Rank-reduced Full Configuration Interaction – Page 20 ACS Paragon Plus Environment

Page 20 of 40

Page 21 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

497 498 499 500 501

Figure 1. Pictorial representation of projected σ formation using a full CI program. Subscripts correspond to basis indices of determinant space vectors represented as matrices in product space. Array dimensions are provided in the uppermost panel where Nα and Nβ are the numbers of α- and β-strings and Nr is the rank of the decomposition (number of product terms in RR-FCI).

502 503

Fales – Rank-reduced Full Configuration Interaction – Page 21 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

504 505 506

Figure 2. RR-FCI program flow. Note that in the first iteration, Eqs. 19 and 20 are simplified because |Ψ  0.

507 508 509

Fales – Rank-reduced Full Configuration Interaction – Page 22 ACS Paragon Plus Environment

Page 22 of 40

Page 23 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

510 511

512 513 514 515 516 517

Figure 3. CASCI σ vector formation times in seconds for ethylene dimer RR-FCI (orange circles) and conventional FCI (green triangles). Basis set is cc-pVDZ in both cases. RR-FCI active spaces range from (10,10) through (30,30) and FCI active spaces from (10,10) through (16,16). Linear regression was performed to determine the scaling exponent and is shown as a solid (RR-FCI) or a dashed (FCI) line.

518

Fales – Rank-reduced Full Configuration Interaction – Page 23 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

519 520 521 522 523

Figure 4. Energy error (relative to conventional CASCI, in Hartree) for RR-FCI with increasing number of product terms in napthalene. Lowest singlet state (brown) and lowest triplet state (blue) are shown using CAS(10,10)-CI with cc-pVDZ basis set. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively.

524

Fales – Rank-reduced Full Configuration Interaction – Page 24 ACS Paragon Plus Environment

Page 24 of 40

Page 25 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

525

526 527 528 529 530

Figure 5. Energy error (relative to conventional CASCI, in Hartree) for RR-FCI with increasing number of product terms in anthracene. Lowest singlet state (brown) and lowest triplet state (blue) are shown using CAS(14,14)-CI with cc-pVDZ basis set. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively.

531

Fales – Rank-reduced Full Configuration Interaction – Page 25 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

532

533 534 535 536

Figure 6. Singlet-triplet energy gap (in units of kcal/mol) for the acene series naphthalene through pentacene using RR-CASCI with the cc-pVDZ basis set. In each case, the active space comprises all π electrons in all valence π orbitals.

537

Fales – Rank-reduced Full Configuration Interaction – Page 26 ACS Paragon Plus Environment

Page 26 of 40

Page 27 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

538

539 540 541 542 543 544

Figure 7. Molecular N2 dissociation curve calculated with RR-FCI and conventional FCI using the cc-pVDZ basis set. RR-FCI calculations include 100 product terms each. Energy differences (in Hartree) between RR-FCI and FCI are given in the inset. Units for the inset are the same as the main plot. Time of a particular Davidson micro-iteration for N2 (RNN=0.7938 Å) is 0.72 s, and optimization of the first and second product terms required 1127.47 and 605.28 s, respectively.

545

Fales – Rank-reduced Full Configuration Interaction – Page 27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

546 547

Fales – Rank-reduced Full Configuration Interaction – Page 28 ACS Paragon Plus Environment

Page 28 of 40

Page 29 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

548 549 550 551

Fales – Rank-reduced Full Configuration Interaction – Page 29 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

552 553 554 555 556

Fales – Rank-reduced Full Configuration Interaction – Page 30 ACS Paragon Plus Environment

Page 30 of 40

Page 31 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

557 558

Fales – Rank-reduced Full Configuration Interaction – Page 31 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

559 560

Fales – Rank-reduced Full Configuration Interaction – Page 32 ACS Paragon Plus Environment

Page 32 of 40

Page 33 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

561 562 563

Journal of Chemical Theory and Computation

Table 1. Active space sizes and memory requirements for supporting data structures including orbital labels and coupling coefficients (in GB). Note that storage for P, Q, σ, and Davidson trial vectors are not included.

CAS Size Strings Memory Usage (GB) (18,18) 48,620