Subscriber access provided by UNIVERSITY OF TOLEDO LIBRARIES
Quantum Electronic Structure
Large Scale Electron Correlation Calculations: Rank-Reduced Full Configuration Interaction B. Scott Fales, Stefan Seritan, Nick F Settje, Benjamin G. Levine, Henrik Koch, and Todd J. Martínez J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00382 • Publication Date (Web): 11 Jun 2018 Downloaded from http://pubs.acs.org on June 14, 2018
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
1
Large Scale Electron Correlation Calculations: Rank-Reduced Full
2
Configuration Interaction
3 4
B. Scott Fales,†,‡ Stefan Seritan, †,‡ Nick F. Settje, †,‡ Benjamin G. Levine,¶ Henrik Koch,§ Todd J. Martínez*,†,‡
5 6 7 8 9
†Department of Chemistry and the PULSE Institute, Stanford University, Stanford, CA 94305, United States ‡SLAC National Accelerator Laboratory, Menlo Park, CA 94305, United States ¶Department of Chemistry, Michigan State University, East Lansing, MI 48824, United States §Department of Chemistry, Norwegian University of Science and Technology, 7491 Trondheim, Norway
E-mail:
[email protected] 10
We present the rank-reduced full configuration interaction (RR-FCI) method, a variational approach
11
for the calculation of extremely large full configuration interaction (FCI) wavefunctions. In this
12
report we show that RR-FCI can provide ground state singlet and triplet energies within kcal/mol
13
accuracy of full CI (FCI) with computational effort scaling as the square root of the number of
14
determinants in the CI space (compared to conventional FCI methods which scale linearly with the
15
number of determinants). Fast graphical processing unit (GPU) accelerated projected σ=Hc matrix-
16
vector product formation enables calculations with configuration spaces as large as 30 electrons in
17
30 orbitals, corresponding to an FCI calculation with over 2.4x1016 configurations. We apply this
18
method in the context of complete active space configuration interaction calculations to acenes with
19
2-5 aromatic rings, comparing absolute energies against FCI when possible and singlet/triplet
20
excitation energies against both density matrix renormalization group (DMRG) and experimental
21
results. The dissociation of molecular nitrogen was also examined using both FCI and RR-FCI. In
22
each case we found that RR-FCI provides a low cost alternative to FCI, with particular advantages
23
when relative energies are desired.
24
Fales – Rank-reduced Full Configuration Interaction – Page 1 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
25
1. Introduction
26
The development of accurate and efficient methods that can describe electron correlation in
27
complex molecules is an ongoing effort in theoretical and computational chemistry. Full
28
configuration interaction (FCI), a linear expansion of configurations in the vector space defined by
29
single particle basis functions, represents the exact solution to the electronic structure problem in
30
the limit of a given basis set. Besides being formally exact, FCI is also conceptually simpler than
31
other high accuracy methods. Unfortunately, because of its high computational expense and
32
factorial scaling with respect to molecular size, FCI is rarely used for routine chemical
33
investigations. Instead, it is usually used as a benchmarking tool for quantifying the relative
34
accuracy of more tractable approximate methods.
35
Benchmark FCI calculations typically include all configurations arising from exciting any of
36
the electrons within the entire orbital space. Another use of FCI is in the context of active space
37
methods, where the configurations include all electronic excitations for a limited number of “active”
38
electrons in a limited number of “active” orbitals. This defines the complete active space
39
configuration interaction (CASCI) method. When the orbitals are also varied to minimize the
40
resulting energy, the method is termed complete active space self-consistent field (CASSCF)1-3 or
41
fully optimized reaction space (FORS).4 Both CASCI and CASSCF require solution of the FCI
42
problem in a user-defined configuration space (or active space). These multiconfigurational
43
approaches were developed to describe static electron correlation, where the electronic structure is
44
poorly characterized by a single Slater determinant. This situation often arises in polyradicaloid
45
systems that are encountered in many photochemical processes, in transition metal systems, and in
46
molecules where covalent bonds have been stretched far from the equilibrium distance.
47
FCI may be considered a mature electronic structure method. The development of direct CI
48
methods reduced memory requirements significantly, since these avoid explicit formation and
49
diagonalization of the full Hamiltonian matrix.5 The success of direct CI has motivated
50
development of numerical approaches designed for the problem of iterative matrix diagonalization.6
51
Additional progress by Siegbahn7 resulted in approaches even better suited for modern computing
52
by leveraging fast matrix-vector and matrix-matrix operations, and further refinement in the context
53
of vector processing machines has been performed.8-10 In FCI programs, Slater determinants are
54
often used en lieu of spin adapted configuration state functions due to the convenient factorization
55
of the determinantal CI vector into α- and β-strings and because the one-particle coupling Fales – Rank-reduced Full Configuration Interaction – Page 2 ACS Paragon Plus Environment
Page 2 of 40
Page 3 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
56
coefficients become simple integers.11 Olsen’s determinant based algorithms for solving the FCI
57
and restricted active space CI (RASCI) problems enabled early calculations of unprecedented
58
size.12-13 Even when using cutting edge hardware and highly efficient algorithms, configuration
59
spaces for high-throughput calculations are still limited to approximately 1010 determinants.14 Thus,
60
the problem of high computational scaling with respect to the configuration space size places hard
61
limits on the dimension of problems that can be reasonably studied, both due to computational time
62
and memory constraints.
63
Efforts to improve scaling can be focused on either (or both) of the many-electron
64
configuration space or one-electron orbital space dimensions. These are of course linked, since the
65
size of the configuration space is a factorial function of the number of orbitals. However, the
66
2 distinction arises because the formal scaling of direct CI is O N det N o4 , where Ndet is the number of
67
configurations (determinants in our case) and No is the number of (active) molecular orbitals (MOs).
68
In practice, the sparsity of the Hamiltonian matrix reduces this to O N det N o4 . Algorithms having
69
smaller operation counts have been described, for example by Olsen,12 but these are difficult to
70
implement efficiently for vector machines. Approaches that attempt to improve the N o4 (orbital
71
space) scaling include pseudospectral methods,15-16 density fitting and Cholesky decomposition,17-22
72
and tensor hypercontraction.23-29 These approaches all decompose the fourth order tensor
73
representing the MO basis electron repulsion integrals as a product of lower-order tensors. This
74
decomposition can decrease both the scaling of the integral transformation and the computational
75
cost of the σ=Hc vector formation during the direct CI iterations. Unfortunately, the size of the
76
configuration space increases combinatorially with the number of orbitals, growing much faster
77
than the quartic scaling of the N o4 component. Thus, schemes aimed at improving the orbital space
78
scaling have been of limited utility so far. This motivates the development of methods that directly
79
reduce the scaling with respect to the size of the configuration space.
(
)
(
)
80
Direct CI algorithms based on Krylov subspace diagonalization rely on iteratively expanding
81
the subspace size with search directions determined by preconditioning the residual vector from the
82
previous iteration. This concept leads naturally to that of selected CI, where the search directions for
83
the addition of trial vectors to the subspace are determined through alternative approaches including
Fales – Rank-reduced Full Configuration Interaction – Page 3 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
84
those based on perturbation theory30-33 and stochastics.34 Selected CI methods are being actively
85
developed as promising lower cost FCI alternatives.
86
Stochastic approaches to solving the FCI problem such as quantum Monte Carlo-FCI35-36
87
(QMC-FCI) have been demonstrated in some cases to provide accurate descriptions of correlated
88
fermionic systems. Unfortunately, uncontrolled errors resulting from the constraint imposed by the
89
antisymmetry principle (i.e., the Fermion sign problem) remain open challenges in QMC-FCI.37
90
Alternatively, variational methods based on tensor networks, such as density matrix renormalization
91
group38-39 (DMRG), provide high-accuracy deterministic solutions for both one-dimensional and
92
branched/tree-like systems using matrix product states and tree tensor network states,40 respectively.
93
Systems possessing higher orders of quantum entanglement, however, are still challenging problems
94
for DMRG.
95
Another productive approach for reducing the configuration space scaling relies on taking
96
advantage of sparsity in the CI vector. Examples include methods by Knowles,41-42 Mitrushenkov,43-
97
44
98
correction vectors to the CI subspace, resulting in a shorter CI vector, especially during early
99
iterations, and a reduction in operation count.
and Rolik.45 These approaches adaptively expand the linear search space during the addition of
100
Olsen et al. developed the low-rank CI (LR-CI) method using a spectral resolution of the CI
101
coefficient matrix46 to compress the wavefunction. In this work a second-order Newton-Raphson
102
scheme was used to solve the CI with single and double replacements (CISD) problem. LR-CISD
103
was applied to test systems including neon, nitrogen, and water, and systematic improvement of the
104
energy relative to full CISD was observed as higher-rank wavefunctions were employed. Lindh et
105
al. followed this work by implementing a low-rank multi-configuration self-consistent field (LR-
106
SCF) method47 and demonstrated that polarizabilities were better described using LR-SCF than with
107
LR-CI, though wavefunction convergence was hindered by strong coupling between orbital
108
rotations and single excitations. Attacking the CI problem from another angle, Koch and Dalgaard
109
formulated a variational matrix decomposition using non-linear optimization methods to solve the
110
FCI problem.48 A pilot version of a program performing the decomposition was applied to the
111
ground state of beryllium atom. Rapid convergence to the FCI energies within mH accuracy was
112
observed in this case after 25 iterations, and 95% of the correlation energy was obtained after only
113
10 iterations.
Fales – Rank-reduced Full Configuration Interaction – Page 4 ACS Paragon Plus Environment
Page 4 of 40
Page 5 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
114
Both the LR-CI and variational matrix decomposition methods rely on the assumption that
115
information represented by the CI coefficients is redundant. An equivalent statement is that the
116
matrix form of the CI vector has low rank. Recently, Taylor reported a study where the singular
117
value decomposition (SVD) was used to quantify the effect of solving the linear CI equations using
118
low rank approximations to the current trial vector at each iteration with various accuracy
119
thresholds.49 We will refer to this as the low rank SVD (LR-SVD) method. Several representative
120
FCI cases were examined, including examples of both “stout” (typical of a CASSCF or a CASCI
121
calculation, where the number of electrons is similar to the number of orbitals) and “slim” (more
122
characteristic of a benchmark type calculation, where the number of orbitals is much larger than the
123
number of electrons) CI matrices. These naming conventions were derived from the shape of the
124
graphical representations of the corresponding CI spaces,50 where the slim graph is tall and narrow
125
and the stout graph is more symmetrical. In each case, Taylor demonstrated that while use of a
126
lower-rank trial vector at each iteration slowed convergence, the LR-SVD calculation ultimately
127
converged to the same eigenvector as standard FCI (to within a given residual threshold). Instead of
128
demonstrating the low rank of the converged CI vectors, however, it was observed that a relatively
129
large rank of the CI vector was required to achieve the desired accuracy. The high computational
130
3/2 cost of the SVD operation, N det versus ≈ N det for a σ build in standard direct CI, and absence of a
131
reduction in trial vector length in later LR-SVD iterations imply that the LR-SVD approach has few
132
advantages in large scale production calculations.
133
As described above, writing the CI vector (c) in matrix form (C), where the rows correspond to
134
α-strings and columns to β-strings permits a reduction in scaling by using a matrix decomposition.
135
In the present work, we extend the ideas of Koch and Dalgaard,48 where the product space vectors
136
are determined variationally through solution of a non-linear multi-scale eigenvalue problem (a type
137
of super-CI approach, similar to that which is often used in two-step CASSCF implementations51-
138
52
139
follows: in Section 2 we introduce the augmented eigenvalue problem for the product space vectors
140
(Section 2.1), describe projected σ formation (Section 2.2), and provide implementation details
141
including one-particle coupling coefficient formation (Section 2.3.1), eigenvalue problem solution
142
(Section 2.3.2), metric formation (Section 2.3.3), and projected σ formation (Section 2.3.4). Next
143
we demonstrate computational performance of the RR-FCI method (Section 3.2) before applying
), in developing a rank-reduced FCI (RR-FCI) formulation. The structure of the paper is as
Fales – Rank-reduced Full Configuration Interaction – Page 5 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 40
144
RR-FCI to acenes having 2-5 polycyclic aromatic rings (Section 3.3) and to molecular nitrogen
145
dissociation (Section 3.5). We conclude with a brief review of the RR-FCI method before
146
discussing future directions of work.
147
2. Methods
148
The second-quantized electronic Hamiltonian is
(
1 Hˆ = ∑ hkl Eˆ kl + ∑ (ij| kl ) Eˆ ij Eˆ kl − δ jk Eˆ il 2 ijkl kl
149
)
(1)
150
where i,j,k,l index molecular orbitals, hij and (ij|kl) are the one- and two-electron integrals, and Eˆ is
151
a spin-averaged single-particle excitation operator.
152
Instead of building and diagonalizing the full Hamiltonian to determine the eigenvalues and
153
their corresponding eigenvectors, direct methods allow the iterative solution for a few of the lowest-
154
lying eigenvalues and eigenvectors. In the present work we focus on the calculation of the ground
155
state (lowest for a given spin) eigenvalue/eigenvector pair. Future work will explore the
156
determination of higher-lying (excited) states. The rate-limiting step in the direct CI procedure is the
157
formation of the matrix-vector product σ defined as
σ I = ∑ H IJ cJ
158
(2)
J
( ) operations, but modern algorithms exploit Hamiltonian
159
2 Formally, σ formation requires O N det
160
sparsity to achieve an effective scaling approaching O(Ndet). In this work we describe an approach
161
that reduces the formal scaling to O(Ndet) with an observed effective scaling of O
162
rate-limiting σ formation.
163
2.1 Eigenvalue Problem
(
N det
) for the
164
We begin by describing the variational decomposition approach taken by Koch and Dalgaard.
165
We will largely restrict our attention to singlet spin states in the following. In a determinantal basis,
166
the full CI wavefunction can be written as:
167
N det
Nα N β
I =1
J =1 K =1
ψ = ∑ d I Φ I = ∑ ∑ CJ K φJα ,φ Kβ
Fales – Rank-reduced Full Configuration Interaction – Page 6 ACS Paragon Plus Environment
(3)
Page 7 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
168
where Nα and Nβ are the number of α- and β-strings and Ndet=NαNβ is the number of determinants,
169
labeled as Φ I
170
The wavefunction coefficients, C, expressed as a matrix according to the second equality in Eq. (3),
171
can be decomposed as:
with coefficients dI or as a product of α/β-strings φJα ,φ Kβ
Nr
N
( )
(
with coefficients CJK.
N
)
r r † C = ∑ ci P i Q i ≡ ∑ ci P i ⊗ Q i = ∑ ci Ψ i i i i
172
(4)
173
where Nr (the allowed rank, or equivalently, the number of product terms) is less than or equal to
174
the full rank of C, ci are product term coefficients, and P and Q are product state vectors of length
175
Nα and Nβ, respectively. In the following, we will adopt the convention of normalizing the Pi and Qi
176
vectors as follows:
177
Ψi = P i ⊗ Q i
178
Ψ i* Ψ i = Tr Q i ⊗ P i P i ⊗ Q i
((
)(
2
(5)
)) = P
i
2
2
Qi = 1
2
P i = Qi = 1
179
(6) (7)
180
The vectors P and Q correspond to coefficients of product space basis functions φα and φβ . This
181
provides us with an expression for the initial wavefunction, ψ 0 :
182
(8)
183
where PJ0 and QK0 are elements of P0 and Q0. Equivalently, we can also expand the initial
184
wavefunction in terms of the product space basis sets Nα N β
ψ 0 = ∑ ∑ φJα ,QK0 PJ0
185
(9)
K
J
Nα N β
ψ 0 = ∑ ∑ PJ0 ,φ Kβ QK0
186
J
187 188
(10)
K
The full wavefunction can be expressed as Nr
Nr
i
i
Nα N β
ψ = ∑ ci ψ i = ∑ ci ∑ ∑ φJα ,φ Kβ PJi QKi J
K
Fales – Rank-reduced Full Configuration Interaction – Page 7 ACS Paragon Plus Environment
(11)
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 40
189
Applying the variational principle to the full wavefunction leads to coupled equations for the first
190
set of product terms: Nα N β
∑∑
191
J
∑∑ J
(12)
PJ ,φ Kβ ′ Hˆ − E PJ ,φ Kβ QK = 0 ∀ K ′
(13)
K
Nα N β
192
φJα′ ,QK Hˆ − E φJα ,QK PJ = 0 ∀ J ′
K
193
Addition of the second and subsequent product terms result in coupled augmented eigenvalue
194
problems:
195
Nβ
∑
ψ Hˆ − E ψ
ψ Hˆ − E φJα ,QK
K Nβ
∑
φJα′ ,QK Hˆ − E ψ
K
Nβ
∑ φ α ,Q J′
K
Hˆ − E φJα ,QK
K
196
for the P optimization and
197
Nα
∑
ψ Hˆ − E ψ
ψ Hˆ − E PJ ,φ Kβ
J Nα
∑
PJ ,φ Kβ ′ Hˆ − E ψ
J
Nα
∑
PJ ,φ Kβ ′ Hˆ − E PJ ,φ Kβ
J
a0 P
= 0 0
(14)
a0 Q
= 0 0
(15)
198
for the Q optimization, where a0 is a scalar which couples the eigenvalue problems. As previously
199
described, this approach provides fast convergence to within mH accuracy of the FCI energy.
200
Unfortunately, this formulation is prone to spin contamination and only singlet states may be
201
described.48 We can remedy both of these issues by modifying this ansatz to be a linear combination
202
of products of P and Q vectors.
203
Ψi = P i ⊗ Q i ± Q i ⊗ P i
(16)
204
where the singlet corresponds to the symmetric case (+) and the triplet to the antisymmetric case
205
(−). This linear combination requires P and Q to be of length Nstr, where Nstr=Nα=Nβ is the number
206
of α- or β-trings for the restricted case described here. From this point on, we will assume that
207
Nα=Nβ, and leave it to future work to generalize the formalism further. While wavefunction
208
symmetry/antisymmetry alone does not guarantee that the final eigenvector is a spin eigenfunction
209
(quintet states can in principle contaminate the singlet states, for example), we find that in practice
Fales – Rank-reduced Full Configuration Interaction – Page 8 ACS Paragon Plus Environment
Page 9 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
210
this is not a serious issue. We have performed a detailed analysis of the expectation value of S2 for
211
the RR-FCI wavefunction in several test cases and refer the interested reader to the Supporting
212
Information. The coupled equations for the initial set of product space vectors then become N str
∑ (φ α ,Q
213
J′
K
J ,K
) ± (Q
K
)
(
) (
)
,φJβ ′ Hˆ − E φJα ,QK ± QK ,φJβ PJ = 0 ∀J ′
N str
∑ ( P ,φ β ) ± (φ α ,P ) Hˆ − E ( P ,φ β ) ± (φ α ,P
214
J
K′
K′
J
J
K
K
J
J ,K
) Q
215
giving the eigenvalue problem for the subsequent product terms as
216
N
str
∑
(φ
) (
)
N
str
∑
β ,Q K ± Q K ,φ J ′ Hˆ − E ψ J′ α
K
)
(φ
α J′
) (
)
(
) (
β α β ,QK ± QK ,φ J ′ Hˆ − E φ J ,QK ± QK ,φ J
218
N
∑
ψ Hˆ − E ψ
str
(
) (
)
)
(
) (
β α ψ Hˆ − E PJ ,φ K ± φ K ,PJ
J N
)
K
for the P optimization and
∑ str
( P ,φ ) ± (φ J
β
α
K′
K′
)
N
∑
, PJ Hˆ − E ψ
J
str
( P ,φ ) ± (φ J
β
α
K′
K′
β α ,PJ Hˆ − E PJ ,φ K ± φ K , PJ
J
)
(18)
a 0
P
=
0 0
(19)
a 0 = 0 (20) Q 0
for the Q optimization. The full wavefunction can be expressed as Nr
N str
i
J ,K
(
) (
)
ψ = ∑ ci ∑ φJα ,φ Kβ ± φ Kα ,φJβ PJi QKi
220
221
) (
α β ψ Hˆ − E φ J ,QK ± QK ,φJ
∀K ′
K N
217
219
(
str
∑
ψ Hˆ − E ψ
=0
K
(17)
(21)
2.2 Projected Formation
222
Since our factorization scheme relies on separation of the and components of the CI vector,
223
a reasonable starting point for formation is Olsen’s CI equations,12 where is separated into three
224
terms
σ = σ1 + σ 2 + σ 3
225 226 227
(22)
with
(σ )
1 J ′K
Nα No
= ∑ ∑γ J
kl
N
JJ ′ kl
hkl′ CJ K
N
1 α o + ∑ ∑ γ ijJM γ klM J ′ (ij| kl )CJ K 2 J ,M ijkl
Fales – Rank-reduced Full Configuration Interaction – Page 9 ACS Paragon Plus Environment
(23)
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
228
(σ )
2 JK ′
Nβ N o
N
= ∑ ∑γ
KK ′ kl kl
h ′ CJK
kl
K
N
1 β o + ∑ ∑ γ ijKM γ klM K ′ (ij| kl )CJ K 2 K ,M ijkl
(24)
Nα N β N o
(σ )
229
Page 10 of 40
3 J ′K ′
= ∑ ∑ ∑ γ ijK K ′γ klJJ ′ (ij| kl )CJ K J
(25)
K ijkl
230
where J,J’ index α occupation strings, K,K’ index β occupation strings, M indexes both α and β
231
strings, and γ ,γ are the one-particle coupling coefficients defined as
232
γ ijJ J ′ = φJα Eˆ i j φJα′
(26)
233
γ ijK K ′ = φ Kβ Eˆ i j φKβ ′
(27)
234
with Eˆ ij a single particle generator either between α occupation strings J and J’ or β occupation
235
strings K and K’ corresponding to an excitation between orbitals i and j. In the case we consider
236
here, where Nα=Nβ, the α/β coupling coefficients are equal:
γ ijK K ′ = φ Kβ Eˆ ij φ Kβ ′ = γ ijK K ′ = φ Kα Eˆ ij φ Kα ′
237
(28)
238
Note that in Eqs. (23)-(25), we have combined the one-electron part of the two-electron integrals
239
(ij|kl) with the one-electron integrals hkl for computational efficiency No
hkl′ = hkl − ∑ (kj| jl )
240
(29)
j
241
Two different types of σ formations correspond to Equations 12 and 13. Projection onto the P
242
vector space results in Nα N β
σ = ∑ ∑ QK φJα′ ,φ Kβ Hˆ φJα ,φ Kβ PJ QK P J′
243
J
244
(30)
K
while projection onto the Q vector space gives
σ
245
Nα Nβ
Q K′
= ∑ ∑ PJ φJα ,φKβ ′ Hˆ φJα ,φKβ PJ QK J
(31)
K
246
We can think about projected σ formation in terms of operations that span the full determinantal
247
space. Figure 1 depicts how a standard CI program can be used to form the projected σ vector
248
corresponding to Eq. (31).
Fales – Rank-reduced Full Configuration Interaction – Page 10 ACS Paragon Plus Environment
Page 11 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
249
Frame 1 from Figure 1 defines formation of the rank 1 full CI vector C from the P and Q
250
vectors. The outer product of the P and Q vectors forms a matrix CJK. Expanding this matrix
251
produces the standard full CI vector, C. The second frame corresponds to formation of the σ vector,
252
which is then folded into matrix form, σJ’K’. Finally, σJ’K’ is then projected by the Q vector in
253
Frame 3 to give the projected σ vector, σP. While this approach assists in understanding the nature
254
of the data structures involved in the algorithm, direct formation of the factorized σ vector is needed
255
to avoid unnecessary computation and storage.
256
Factorizing the projected σ results in 3 terms
(σ )
P
1 J′
257
N N N o JJ ′ 1 str o J M M J ′ = ∑ ∑ QK ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ijkl J K kl N str N str
N N N str N str N o J J 1 str o JM M J ′ ′ = ∑ QK QK ∑ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ 2 M ijkl K J kl
(σ )
P
2 J′
258
N N N o KJ ′ 1 str o KM M J ′ = ∑ ∑ QJ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ijkl K J kl N str N str
N N N str N str N o K J 1 str o KM M J ′ ′ = ∑ PJ QJ ∑ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) QK 2 M ijkl J K kl N str N str
No
(σ ) = ∑ ∑ Q ∑γ P
3 J′
259
260
J
K ,K ′
K′
(33)
γ klJJ ′ (ij| kl )PJ QK
KK ′ ij
ijkl
N str J J ′ KK = ∑ ∑ QK ′γ i j QK ∑ γ kl PJ (ij| kl ) ij kl K ,K ′ J No
N str
(34)
for the P factorization and
(σ )
Q
1 K′
261
N N N o JK ′ 1 α o JM MK ′ = ∑ ∑ PK ∑ γ kl hkl′ + ∑ ∑ γ i j γ kl (ij| kl ) PJ QK 2 M i jkl K J kl N str N str
N N N str N str N o J K 1 str o J M M K ′ ′ = ∑ PK QK ∑ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ 2 M ijkl K J kl
(σ )
Q
2 K′
262
(32)
N N N o KK ′ 1 str o KM M K ′ = ∑ ∑ PJ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) PJ QK 2 M ij kl J K kl
(35)
N str N str
N N N str N str N o K K 1 str o KM M K ′ ′ = ∑ PJ PJ ∑ ∑ γ kl hkl′ + ∑ ∑ γ ij γ kl (ij| kl ) QK 2 M ijkl J K kl
Fales – Rank-reduced Full Configuration Interaction – Page 11 ACS Paragon Plus Environment
(36)
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
( ) σ3
263
Q K′
=
N str N str
No
J ,J ′ K
ijkl
Page 12 of 40
∑ ∑ PJ ′ ∑ γ ijKK ′γ klJJ ′ (ij| kl )PJ QK
N str N str = ∑ ∑ γ iKj K ′QK ∑ PJ ′γ klJJ ′ PJ (ij| kl ) ijkl K J ,J ′ No
(37)
264
for the Q terms. Since we will focus on the restricted case where Nα=Nβ, we will not distinguish
265
between and for the remainder of this work. Note that this restriction also leads to a loss of the
266
correspondence between α/β and J/K indices in Eqs. (32)-(37). For each projected σ2 formation
267
three state vectors are required in addition to the current trial vector. For the P optimization, for
268
example, the current Q vector is needed in addition to each of the previously converged P and Q
269
vectors, and the P being optimized is taken to be the current trial vector.
270
2.3 Computational Methods
271
2.3.1 Coupling Coefficient Formation
272
Data structures required for evaluation of σ include the one-particle coupling coefficients , the
273
orbital labels , and the configuration labels . When possible, our program constructs and stores
274
these lists in memory at the beginning of the calculation rather than computing elements on-the-fly.
275
Forming these lists for configuration spaces typical of direct CI calculations is a routine task and
276
does not require exceptional computational effort. In contrast, the large configuration spaces
277
accessible to RR-FCI require extremely large lists. To handle these cases we have developed an
278
efficient GPU accelerated algorithm listed in Algorithm 1 for construction of the one-particle
279
coupling coefficients and related data structures in a lexical (and dense) format. The approach
280
described here is also well suited for forming these lists on-the-fly if desired.
281
The outer loop of Algorithm 1 can be tiled to enable vectorization over multiple GPUs and/or
282
compute nodes. The stradr() function is an implementation of Equations 11 and 12 from Knowles
283
and Handy,8 where the binomial coefficients are retrieved from a precomputed list (i.e. Pascal’s
284
triangle) rather than calculated on-the-fly. The get_parity() function is an abstraction that can be
285
implemented trivially in several ways (we compute the value inline). This algorithm produces dense
286
data structures in the minimum number of operations, (Nstr*Ne(No-Ne)), where Ne and No are the
287
number of electrons and orbitals in the configuration space, respectively. Even so, larger
288
configuration spaces require storage on the order of hundreds of GB. For the reader’s convenience,
289
Table 1 provides configuration space and array sizes for some of the CAS spaces used in this study. Fales – Rank-reduced Full Configuration Interaction – Page 12 ACS Paragon Plus Environment
Page 13 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
290
Journal of Chemical Theory and Computation
2.3.2 Eigenvalue Problem
291
Our program solves a nonlinear eigenvalue problem using a two-step Davidson approach,
292
where we iterate over P and Q vector optimization until they are self-consistently converged. The
293
procedure we use is given graphically in Figure 2. The inner (micro) iterations are performed using
294
a generalized Davidson algorithm to solve the non-orthonormal subspace eigenvalue problem. The
295
outer (macro) iterations are considered converged when the micro-iterations for each of the P and Q
296
vectors converge in a single step. We “balance” the eigenvalue problem by scaling the projected
297
metric matrix as recommended previously.53-54 A residual norm convergence criterion of
298
r = 1.0 ×10−6 is used for the micro-iterations.
299
To begin the iterative procedure for the P and Q vector optimizations we must provide a guess
300
vector for each. In the first macro-iteration, the guess for each of P0 and Q0 is the vector
301
corresponding to the Hartree-Fock determinant, i.e. , where the first vector element
302
corresponds to the coefficient a0 that couples the P and Q eigenvalue problems. In the second and
303
subsequent macro-iterations the guess for the P vector becomes , which corresponds
304
to the previously converged P,Q pairs being used as the Pi guess. The Qi guess corresponds to a unit
305
vector in the direction of the i th macro-iteration, i.e. if we are adding the third P,Q pair, the Q
306
guess is . This permits efficient sampling of the configuration space while avoiding
307
linear dependence of the added trial vectors. Other guesses for the Q can be conceived; the original
308
implementation used a uniform Q guess, i.e.
309
guesses derived from projected residual vector formation (similar to typical trial vector formation in
310
standard Davidson algorithms) appear promising. In our tests we investigated all three of these
311
options. Interestingly, the unit guess approach (uniformly sampling the configuration space) results
312
in the best convergence, with the other approaches producing linearly dependent trial vectors
313
shortly after mH accuracy is obtained.
0,1
N str ,1
N str ,1
N str ,... , and Q vector
314
Preconditioning of the Davidson routine is performed using the reference determinant energy
315
modified by orbital energy differences as described by Evangelisti.55 Advantages of orbital energy
316
difference preconditioning are that spin contamination is not introduced into the trial vectors and
317
convergence is generally similar to that observed when using exact diagonal energies. Convergence
318
of the RR-FCI micro-iterations generally requires 10 − 30 iterations. To test the propensity for spin
319
contamination of each of the ansatzes described in Equations 11 and 21 we have reconstructed the Fales – Rank-reduced Full Configuration Interaction – Page 13 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 40
320
final wavefunction | Ψ〉 for a variety of manageable configuration spaces (i.e. 16 electrons in 16
321
orbitals and smaller). The spin purity of the full wavefunction | Ψ〉 and for each contributing P,Q
322
pair was assessed by directly calculating the expectation value of S2. In doing so we confirmed the
323
presence of spin contamination in the ansatz described in Equation 11 and the absence of spin
324
contamination in the wavefunction defined in Equation 21.
325
Metric Formation
326
Since we are solving a generalized (non-orthonormal) eigenvalue problem it is necessary to
327
build the subspace metric matrix/trial vector product corresponding to Equations 19-20. We form
328
the projected metric according to
329
BP =
N str
∑
ψψ
K N str
∑ K
(φ
α J′
) (
β
)
,QK ± QK ,φJ ′ ψ
330
for the P vector optimization and
331
Q B =
N str
∑ K
(φ
α J′
) (
N str
∑ K
∑ K
( P ,φ ) ± (φ J′
β
α
K′
K′
)
,PJ ′ ψ
N str
∑ K
) (
(
) (φ
α J
) (
) (
β
α
K′
K′
,PJ ′
)
)
) ( P ,φ ) ± (φ J
a0 P
)
,QK ± QK ,φJβ
ψ PJ ,φ Kβ ± φ Kα ,PJ
( P ,φ ) ± (φ J′
β
ψ φJ ,QK ± QK ,φJ
,QK ± QK ,φJβ ′
ψψ N str
(
α
β
α
K
K
,PJ
)
(38)
a0 Q
(39)
332
for the Q vector optimization. The algorithm for computing the metric is low enough scaling that
333
we have implemented it in serial on the host (CPU). Details are given in Algorithm 2. Projected
334
metric formation requires double precision dot product (DDOT) and y = Ax + y (DAXPY)
335
operations, both of which are performed using optimized linear algebra library routines.56
336
Projected Sigma Formation
337
We have developed projected σ formation algorithms based on a hybrid of the Olsen
338
factorization12 and the Knowles and Handy FCI algorithm.8 A configuration label driven scheme for
339
formation of σP is described in Algorithms 3 – 5. Orbital label driven algorithms which more
340
closely resemble the Olsen CI factorization are presented in Supporting Information.
341
The GPU kernels in the above algorithms strongly resemble those described in our previous
342
direct CI work.10 By forming the projected σ vector directly we benefit from a significant reduction Fales – Rank-reduced Full Configuration Interaction – Page 14 ACS Paragon Plus Environment
Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
343
in scaling: in RR-FCI we are able to eliminate the loop over β strings entirely. Further, direct CI
344
kernels that contribute to the σ array must use atomic operations to avoid data collisions. While
345
hardware support for atomics ameliorates some of the computational overhead associated with their
346
use, performance still suffers relative to algorithms that can avoid atomic operations altogether. The
347
RR-FCI projected σ algorithm described above is able to restrict atomic operations to the σ1 and σ2
348
terms, using DAXPY and DGEMV operations to form the σ3 term. Each of the PQT and the QP T
349
projected σ vector combinations can be formed (for both P and Q vector optimizations) using these
350
algorithms and interchanging the ordering and identities of the input vectors. In practice, we tile the
351
loop indices to allow for configuration spaces of arbitrary size (subject to memory constraints
352
imposed by the length of the product space vectors). We avoid description of our tiling scheme in
353
Algorithms 3–5 for clarity, and instead present this information in more detail in the Supporting
354
Information. Due to the high cost associated with CPU host to GPU device memory transfers we
355
select the largest tile size possible (limited by GPU device memory) to minimize incurring memory
356
transfer penalties.
357
Results and Discussion
358
Computational Details
359
The RR-FCI method was implemented in the TeraChem GPU accelerated electronic structure
360
package using the Compute Unified Device Architecture (CUDA) API57 and the NVidia CUDA
361
Basic Linear Algebra Subroutines library (cuBLAS).58 Benchmark calculations were performed
362
using a single core of an Intel Xeon E5-2699 @2.2GHz and a single NVidia K40c GPU.
363
Configuration spaces are defined according to the notation (X,Y), where X corresponds to the
364
number of electrons and Y to the number of orbitals. Geometries for all molecules used in this work
365
are reported in Cartesian coordinates in the Supporting Information. The cc-pVDZ basis set (in
366
Cartesian form including contaminants) is used in all calculations unless stated otherwise.
367
Algorithm Performance
368
The computational performance of σ vector formation depends solely on the number of active
369
electrons and orbitals. Benchmark CASCI (RR-FCI and conventional FCI) calculations were
370
performed on ethylene dimer using molecular orbitals from a preceding Hartree-Fock calculation.
371
The active spaces range from (10,10) through (30,30) for RR-FCI and up to (16,16) for Fales – Rank-reduced Full Configuration Interaction – Page 15 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
372
conventional FCI. Results are presented in Figure 3 and in Table 2. For comparison we also present
373
GPU accelerated σ formation timings from our conventional GPU-accelerated FCI program.10
374
A linear regression fit of observed computational times shows that the conventional FCI σ
375
formation scales linearly with the number of determinants, with a scaling exponent of 1.079. This is
376
in agreement with previously reported results for computational scaling of direct CI σ vector,10 two-
377
particle reduced density matrix59 and SA-CASSCF direct Hessian matrix60 formation. In contrast,
378
RR-FCI projected σ vector formation scales with the square root of the number of determinants,
379
with an observed scaling exponent of 0.543. The RR-FCI approach is more computationally
380
efficient than FCI by a factor approaching
N det .
381
Table 2 provides detailed timings for the most time-consuming portions of RR-FCI calculations
382
with active spaces ranging from (18,18) through (30,30). Coupling coefficient and other list
383
formation requires less computational effort than σ formation in every case. To reduce the storage
384
requirements for the largest lists we considered forming elements of these lists as needed during σ
385
formation, but even considering CPU–GPU memory transfer overhead it is advantageous to
386
precalculate and store these lists when possible. If one desires to use larger configuration spaces
387
than system memory allows, these lists may be formed directly during the σ formation at a
388
computational cost proportional to a standard σ formation.
389
Projected metric formation incurs only marginal computational expense for even the largest
390
configuration spaces. Instead, as expected, the rate-limiting step is σ formation. Timings for each
391
function comprising σ formation show that for every configuration space the matrix-matrix multiply
392
is the most expensive step. This step is performed using dense matrix operations, and analysis of
393
sparsity in the intermediate matrices and two-electron integrals may offer additional performance
394
advantages. Each of σ1 and σ3 formation requires multiple (2) matrix multiplication operations
395
(DGEMMs) while σ2 formation requires only a single DGEMM. This, combined with the increased
396
number of GPU kernels (D and σ matrix formations) required, results in σ2 being less
397
computationally expensive than σ1 and σ3. Additional details related to this may be found in the
398
Supporting Information.
399
Memory transfer between the host (CPU) and device (GPU) becomes increasingly important
400
with larger configuration spaces. Active spaces smaller than (24,24) show negligible contribution
401
from CPU–GPU memory transfer, as their data structures reside entirely in GPU memory and have Fales – Rank-reduced Full Configuration Interaction – Page 16 ACS Paragon Plus Environment
Page 16 of 40
Page 17 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
402
dimension smaller than the tile size, but at configuration spaces of (24,24) and larger the high cost
403
of CPU–GPU memory transfer becomes apparent, comprising nearly 30% of the overall cost of the
404
projected σ formation for the (30,30) active space. The GPU kernels corresponding to Dij and σij
405
formation are the next most costly operations, followed by the remaining linear algebra operations
406
(DGEMV, DDOT, DAXPY). Finally, the largest configuration space required an extremely small
407
tile size (1024), and this is reflected in both the list and the σ formation times accordingly.
408
Acene Absolute Energy Convergence
409
Linear polycyclic aromatic hydrocarbons, or acenes, possess interesting electronic properties
410
making them suitable for incorporation into organic semiconductor materials. Correlation of the π
411
valence electrons provides a reasonably accurate description of the electronic structure, making
412
complete active space methods such as CASSCF or CASCI well suited for these systems. We have
413
calculated the ground singlet and triplet states for napthalene and anthracene using both CASCI and
414
RR-CASCI with full- π valence active spaces—(10,10) and (14,14) for naphthalene and anthracene,
415
respectively. Energy differences between the rank-reduced and exact (FCI) approaches are shown in
416
Figures 4 and 5. In addition to their interesting electronic characteristics, the calculations described
417
in this section are examples of configuration spaces commonly referred to as “stout” CI, where the
418
number of orbitals is similar to the number of electrons. Geometries for the singlet and triplet
419
systems described in the following study were optimized using UB3LYP/6-31G(d) and are the same
420
as those reported by Hachmann et al.61 For the reader’s convenience they are available in the
421
Supporting Information in Cartesian coordinates.
422
The RR-FCI energy rapidly converges to within mH accuracy of the FCI energy after 30
423
product terms, representing inclusion of rank structure of the CI vector having the largest influence
424
on the final energy. Once an accuracy of 1.0 × 10 −4 H is obtained, convergence slows considerably
425
and displays asymptotic behavior. Including 252 product states corresponds to the largest possible
426
rank of the CI vector, and we obtain a final accuracy approaching 1.0 × 10 −5 H. Since we solve a
427
nonorthogonal eigenvalue problem, product terms comprised of P and Q vectors may contain
428
redundant information. This occurs because we do not reoptimize P and Q vectors added in
429
previous iterations. This impedes convergence to the exact FCI solution when the number of
430
product terms included equals the FCI vector rank.
Fales – Rank-reduced Full Configuration Interaction – Page 17 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
431
The (14,14) CAS space of anthracene approaches the upper size limit of routinely performed CI
432
calculations. Similar to the naphthalene calculation, mH accuracy is achieved relatively quickly,
433
requiring fewer than 100 product terms (representing < 3% of the full rank of the CI vector, 3432 in
434
this case). Following addition of ∼ 150 product terms, convergence behavior decays before
435
asymptotically approaching an accuracy of 1.0 × 10 −4 H. Irregularities in the convergence at ∼ 350
436
and ∼ 800 product terms are likely artifacts of our guess vector procedure.
437
Relative Energy Convergence: Acene Singlet-Triplet Gaps
438
In addition to absolute energy convergence of the shortest acenes compared with FCI, we have
439
also investigated the singlet-triplet energy gap for the acene series having 2–5 rings. Singlet-triplet
440
energy gaps for each system are depicted in Figure 6. While absolute energy convergence to within
441
mH accuracy requires ∼ 30 product terms for naphthalene and ∼ 100 product terms for anthracene,
442
convergence of a relative property, in this case the singlet-triplet gap, requires fewer product terms
443
to achieve similar convergence. The S0 − T0 gap for naphthalene is converged after only 12 product
444
terms. Only slightly worse convergence is observed for the longer acenes, with anthracene,
445
tetracene, and pentacene requiring 15, 20, and 24 product terms to converge, respectively. A
446
comparison of our RR-CASCI energies with the local DMRG results of Hachmann is given in Table
447
3, along with estimates of the experimental singlet-triplet gaps for each system.62-65
448
Even with the disparity in CAS spaces between the RR-CI and DMRG results reported above,
449
the agreement between the two methods is surprisingly good. In each case, RR-CI lies within ∼ 4
450
kcal/mol of both the DMRG and the estimated experimental energies. Even more impressive is how
451
few product terms as a function of full CI space rank are required to obtain this level of agreement:
452
naphthalene requires 4.7% of the full rank, anthracene 0.4%, tetracene 0.04% and pentacene only
453
0.003%. These results are encouraging and motivate us to pursue refinement and continued
454
development of the RR-FCI approach.
455
Nitrogen Bond Dissociation
456
FCI is often used as a benchmarking tool for evaluating lower-cost approximations. A common
457
benchmark is the computation of a potential energy curve for dissociation of a multiple bond, which
458
can be particularly difficult since both static and dynamic electron correlation often contribute. We
459
applied the RR-FCI and FCI methods to molecular nitrogen, varying the bond length from 0.8 to 1.8
460
Å. The RR-FCI and FCI results are compared in Figure 7. All 10 valence electrons are correlated, Fales – Rank-reduced Full Configuration Interaction – Page 18 ACS Paragon Plus Environment
Page 18 of 40
Page 19 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
461
corresponding to a (10,28) configuration space. Conventional FCI calculations were performed
462
using the Molpro electronic structure program.66-67
463
While the configuration space corresponds to 9.6 billion determinants (without symmetry),
464
excellent agreement between RR-FCI and FCI is observed using only 100 product terms. Energy
465
differences [ ∆E = ( ERR − FCI ) − ( EFCI ) ] between the two methods are provided in the inset of Figure
466
7. It is interesting to note that RR-FCI calculations at or near the equilibrium geometry converge
467
more rapidly than the stretched geometries, as can be seen by the higher energies relative to exact
468
FCI when the nitrogen-nitrogen distance is greater than 1.2 Å. Increasing the number of product
469
terms for the stretched geometry calculations continues to reduce the errors observed relative to the
470
FCI energy. This demonstrates that our ansatz treats static and dynamic correlation in an unbalanced
471
fashion. We are currently investigating this behavior with the intention of improving the treatment
472
of static correlation. The low cost-to-accuracy ratio of RR-FCI makes it an ideal tool for use in
473
performing benchmark type calculations for evaluation of lower-cost alternatives.
474
Conclusions
475
We have presented a low-cost FCI alternative for both singlet and triplet ground states with
476
computational cost that scales as
477
large active spaces. GPU acceleration of rate-limiting components of the algorithm, including
478
projected σ and one-particle coupling coefficient formation, expands the size of accessible
479
configuration spaces to
480
our methods to the full π valence space CI calculations of acenes having 2 − 5 polycyclic aromatic
481
rings, demonstrating excellent agreement for both absolute and relative energies compared to
482
conventional FCI. Finally, we investigated dissociation of molecular nitrogen, and found that our
483
results compare favorably against benchmark FCI studies. Given the success of RR-FCI in
484
describing the ground state singlet and triplet wavefunctions, we are currently working to extend
485
our approach to allow for excited state calculations and the evaluation of molecular properties.
486
Acknowledgments
N det , allowing routine calculation of CASCI wavefunctions for
determinants while achieving sub-mH accuracy. We have applied
487
This material is based upon work supported by the U.S. Department of Energy, Office of
488
Science, Office of Advanced Scientific Computing Research, Scientific Discovery through
Fales – Rank-reduced Full Configuration Interaction – Page 19 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
489
Advanced Computing (SciDAC) program. BGL acknowledges support from the National Science
490
Foundation (CHE-1565634). Computational resources were provided through the Extreme Science
491
and Engineering Discovery Environment68 (Grant No. ACI-1548562, allocation TG-CHE140101)
492
and the XStream computational resource supported by the NSF MRI program (Grant No. ACI-
493
1429830). SS acknowledges support from an NSF Graduate Fellowship. HK acknowledges support
494
from the Norwegian Research Council through FRINATEK project no. 263110/F20. TJM is a
495
cofounder of PetaChem, LLC.
496
Fales – Rank-reduced Full Configuration Interaction – Page 20 ACS Paragon Plus Environment
Page 20 of 40
Page 21 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
497 498 499 500 501
Figure 1. Pictorial representation of projected σ formation using a full CI program. Subscripts correspond to basis indices of determinant space vectors represented as matrices in product space. Array dimensions are provided in the uppermost panel where Nα and Nβ are the numbers of α- and β-strings and Nr is the rank of the decomposition (number of product terms in RR-FCI).
502 503
Fales – Rank-reduced Full Configuration Interaction – Page 21 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
504 505 506
Figure 2. RR-FCI program flow. Note that in the first iteration, Eqs. 19 and 20 are simplified because |Ψ 0.
507 508 509
Fales – Rank-reduced Full Configuration Interaction – Page 22 ACS Paragon Plus Environment
Page 22 of 40
Page 23 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
510 511
512 513 514 515 516 517
Figure 3. CASCI σ vector formation times in seconds for ethylene dimer RR-FCI (orange circles) and conventional FCI (green triangles). Basis set is cc-pVDZ in both cases. RR-FCI active spaces range from (10,10) through (30,30) and FCI active spaces from (10,10) through (16,16). Linear regression was performed to determine the scaling exponent and is shown as a solid (RR-FCI) or a dashed (FCI) line.
518
Fales – Rank-reduced Full Configuration Interaction – Page 23 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
519 520 521 522 523
Figure 4. Energy error (relative to conventional CASCI, in Hartree) for RR-FCI with increasing number of product terms in napthalene. Lowest singlet state (brown) and lowest triplet state (blue) are shown using CAS(10,10)-CI with cc-pVDZ basis set. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively.
524
Fales – Rank-reduced Full Configuration Interaction – Page 24 ACS Paragon Plus Environment
Page 24 of 40
Page 25 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
525
526 527 528 529 530
Figure 5. Energy error (relative to conventional CASCI, in Hartree) for RR-FCI with increasing number of product terms in anthracene. Lowest singlet state (brown) and lowest triplet state (blue) are shown using CAS(14,14)-CI with cc-pVDZ basis set. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively.
531
Fales – Rank-reduced Full Configuration Interaction – Page 25 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
532
533 534 535 536
Figure 6. Singlet-triplet energy gap (in units of kcal/mol) for the acene series naphthalene through pentacene using RR-CASCI with the cc-pVDZ basis set. In each case, the active space comprises all π electrons in all valence π orbitals.
537
Fales – Rank-reduced Full Configuration Interaction – Page 26 ACS Paragon Plus Environment
Page 26 of 40
Page 27 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
538
539 540 541 542 543 544
Figure 7. Molecular N2 dissociation curve calculated with RR-FCI and conventional FCI using the cc-pVDZ basis set. RR-FCI calculations include 100 product terms each. Energy differences (in Hartree) between RR-FCI and FCI are given in the inset. Units for the inset are the same as the main plot. Time of a particular Davidson micro-iteration for N2 (RNN=0.7938 Å) is 0.72 s, and optimization of the first and second product terms required 1127.47 and 605.28 s, respectively.
545
Fales – Rank-reduced Full Configuration Interaction – Page 27 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
546 547
Fales – Rank-reduced Full Configuration Interaction – Page 28 ACS Paragon Plus Environment
Page 28 of 40
Page 29 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
548 549 550 551
Fales – Rank-reduced Full Configuration Interaction – Page 29 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
552 553 554 555 556
Fales – Rank-reduced Full Configuration Interaction – Page 30 ACS Paragon Plus Environment
Page 30 of 40
Page 31 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
557 558
Fales – Rank-reduced Full Configuration Interaction – Page 31 ACS Paragon Plus Environment
Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
559 560
Fales – Rank-reduced Full Configuration Interaction – Page 32 ACS Paragon Plus Environment
Page 32 of 40
Page 33 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
561 562 563
Journal of Chemical Theory and Computation
Table 1. Active space sizes and memory requirements for supporting data structures including orbital labels and coupling coefficients (in GB). Note that storage for P, Q, σ, and Davidson trial vectors are not included.
CAS Size Strings Memory Usage (GB) (18,18) 48,620