Article Cite This: J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
pubs.acs.org/JCTC
Fast Overlap Evaluations for Nonadiabatic Molecular Dynamics Simulations: Applications to SF-TDDFT and TDDFT Seunghoon Lee,† Eunji Kim,† Sangyoub Lee,† and Cheol Ho Choi*,‡ †
Department of Chemistry, Seoul National University, Seoul 08826, South Korea Department of Chemistry, Kyungpook National University, Daegu 702-701, South Korea
‡
Downloaded via EASTERN KENTUCKY UNIV on January 27, 2019 at 11:57:05 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.
S Supporting Information *
ABSTRACT: Fast overlap integral algorithms for the spin−flip time-dependent density functional theory (SF-TDDFT) and the linear response (LR)-TDDFT were proposed on the basis of determinant factorization (DF) and the truncated Leibnitz formula (TLF). These in turn allow efficient computation of nonadiabatic coupling terms (NACTs) in nonadiabatic molecular dynamics simulations. The TLF(0), TLF(1), and TLF(2) were proposed according to the truncation order. The DF and TLF(1) or TLF(2) provide a four order combined performance improvement to the conventional method without introducing additional errors in the finite difference approximation. On the other hand, the DF and TLF(0) provide a five orders performance improvement making it the most efficient algorithm for NACT calculations so far with errors slightly larger than those of the finite difference approximation. The same techniques can be also applicable to other determinantal wave functions.
I. INTRODUCTION Overlap integrals (OIs) between electronic states are typically required for numerical evaluation of nonadiabatic coupling terms (NACTs)1 in nonadiabatic dynamics simulations. The NACTs are required to determine the probability of electronic transitions between states. In dynamics simulations, the OIs between electronic states are evaluated at two consecutive time steps. Thus, the states at different time steps are composed of nonorthogonal spin−orbitals with different molecular geometries. If the corresponding wave functions are expressed by a linear combination of determinants, the Löwdin formula2 can be adopted, which means that the overlap of two Slater determinants with nonorthogonal spin−orbitals can be expressed by a determinant with the matrix elements of mutually overlapping orbitals. In a simple approach, the OI was approximated as a product of configurational interaction vectors of different states.3−5 Exact OI evaluation with the Löwdin formula2 was also implemented for the semiempirical6 and time-dependent density functional theory (TDDFT),7−10 as well as multireference theories.11−14 This evaluation approximately scales to O(nocc5nvir2) for OIs between configuration interaction singles (CIS)-type wave functions, where nocc and nvir are numbers of occupied and virtual orbitals, respectively. Therefore, complete evaluations of the determinants are computationally prohibitive especially for large systems. Recently, Plasser et al.15 proposed a precomputation algorithm that can reduce the number of determinant computations by storing the ones that are requested repeatedly. To further reduce computational overhead, the authors proposed additional algorithms for screenings. The © XXXX American Chemical Society
effective scaling of this approach is generally considered as O(nocc4nvir1) for complete active space self-consistent field (CASSCF) wave functions with a large active space.15,16 This method has been implemented for NACT as well as Dyson norm calculations in the multireference self-consistent field (MCSCF),17 multireference CI (MRCI),15 and TDDFT.18−23 For the numerical NACT calculation within the finite difference approximation, the determinant derivatives (DDs) are expressed by the OIs. In addition, the OIs themselves have been utilized to identify the character of the excited state.24 In the special case of spin-flip (SF)-TDDFT, the OI plays an important role in the algorithm for determining and filtering a mixed triplet state.25 On the other hand, Ryabinkin et al.26 reported a very efficient O(noccnvir2) method for evaluating NACTs without employing expensive OI calculations. The authors derived analytic expressions of NACTs in the form of orbital derivatives (ODs) with significantly improved performances. To evaluate the derivatives, the ODs are combined with the finite difference approximation. This method also has been implemented in various theories of TDDFT,27−29 timedependent density functional tight binding (TD-DFTB),30 CIS,26 second-order algebraic diagrammatic construction (ADC(2)),31,32 and second-order approximate coupled-cluster (CC2).31 The two approaches are comparable and have their own unique features that were described in the recent review by Crespo-Otero and Barbatti.16 In the OD approach, the Received: October 17, 2018 Published: January 8, 2019 A
DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
where SG is the symmetric group on the set G, i.e., the set of all possible permutations of n elements, while the σ denotes one of the permutations. The ith element of the permutation σ is σ(i). The sign of the σ permutation can be given by
calculation time is almost negligible. However, it cannot generate the OI. Meanwhile, the DD approach can calculate the OI, although additional sophisticated screening techniques with a threshold are necessary for sufficient performance. Screening techniques in general are subject to particular molecular situations and size of basis sets. Therefore, it is desirable to have stable and systematic methods with guaranteed performance and accuracy. In this respect, devising new methods with significantly reduced scaling is preferred in the first place, which of course can be combined with screening techniques for additional performance improvements. Since both the DD and OD methods for NACT calculation apply the finite difference approximation, their actual accuracies are dependent on the first order of the time-step size during molecular dynamics (MD) simulations. Ryabinkin et al.26 have estimated that the approximation introduces 10−7−10−5 errors to the resulting NACTs when using timestep size of 0.1−0.5 fs. We have noticed that determinant calculation in the DD approach is overly accurate, despite modest accuracies of NACTs exhibited by the finite difference approximation. In MD simulations, the molecular geometries at two consecutive time steps differ only slightly. Hence, the corresponding molecular orbitals (MOs) at consecutive time steps are nearly orthonormal. Therefore, the determinants comprised of them are close to the diagonal with very small off-diagonal values. This is particularly true as the time-step size becomes smaller. In this case, a truncation procedure may be devised if the determinant computations are reformulated with terms enumerated according to their relative magnitudes. In this study, we explored the possibilities of determinant factorization and the truncation procedure by taking SFTDDFT and LR-TDDFT as examples. The Leibniz formula was adopted as the proposed candidate for determinant reformulation. A brief introduction to the Leibniz formula and OIs is provided in the following Section II. The precomputation by Plasser et al.15 is presented in Section III. Subsequently, our main propositions regarding determinant factorization (DF) and the truncated Leibniz formula (TLF) are presented in Section IV and V, respectively. Finally, the performance improvements and accuracies are analyzed in Section VI.
sgn(σ ) = (− 1)N(σ )
where N(σ) is the number of inversions in the σ permutation. The fact that the terms on the right-hand side of eq 1 can be rearranged according to their relative magnitudes makes this formula particularly interesting. II.B. Overlap Integrals between the Response States. Determinants or configuration state functions (CSFs) are used to expand many electron wave functions. While determinants are a simple product of spin−orbitals, CSFs are an antisymmetrized combination of a space orbital product and a spin-adapted linear combination of simple α−β products. They are both primarily used for MCSCF33 wave functions and can also be used in CI34 and TDDFT,35 as well as SFTDDFT,36 theories, which take the same basis to expend their response states. SF-TDDFT typically take the determinant basis, while TDDFT typically take CSFs. In this paper, we used both theories to illustrate how our techniques can be implemented in such different formulations. We chose SFTDDFT as the main example for the investigations in this study. Similar derivations using linear-response (LR) TDDFT are provided in Appendix A. It is important to note that the same derivations are readily applicable to our recent mixed reference (MR)-SF-TDDFT, which eliminates the problematic spin-contaminations of SF-TDDFT.37 The SF-TDDFT uses a αα (Ms = +1) triplet-ground reference, which can be represented by a single Kohn−Sham (KS) Slater determinant as |Φ0⟩ =
∑
G = {1, 2, ···, n}
sgn(σ )ϕσ(1)(1)ϕσ(2)(2)
σ ∈ SGref
nα , (n + 1)α }
(4)
(5)
where 2n is the number of electrons and ϕσ(p)(p) denotes the spin-unrestricted KS MO. As an example, if the pth element of a permutation σ of the symmetric group on the set Gref is 1α, then ϕσ(p)(p) is represented by ϕσ(p)(p) = ϕ1 (x p) = ϕ1(rp)α(ω) α
(6)
where xp denotes both position and spin coordinates of the pth electron, xp = (rp, ω). During MD simulations using a timestep size of Δ, the Ith and Jth excited response states of SFTDDFT within the Tamm−Dancoff approximation (TDA)38 at two consecutive time steps is represented by nocc n vir
|ΨI (t − Δ)⟩ =
∑ ∑ XiaI(t − Δ)|Φia(t − Δ)⟩ i
a
(7)
nocc n vir
|Ψ′ J (t )⟩ =
sgn(σ ) ∏ ai , σ(i) i=1
∑
(2n)!
Gref = {1α , 1β , 2α , 2β , ···, iα , iβ , ···, (n − 1)α , (n − 1)β ,
n σ ∈ SG
1
··· ϕσ(2n − 1)(2n − 1)ϕσ(2n)(2n)
II. THEORETICAL BACKGROUND II.A. Leibniz Formula. Gaussian elimination (GE) algorithms or decomposition algorithms (the LU, QR, or Cholesky) are generally employed to compute the determinant. Their computations scale as O(n3), where n is the size of the matrix. Meanwhile, the direct determinant evaluation by the Leibniz formula is highly inefficient for large matrices since the number of required operations grows as n!. However, the Leibniz formula can be reformulated with the terms enumerated in a descending order of their magnitudes. In general, the Leibniz formula represents the determinant of a square matrix with regard to the matrix element permutations. If A is an n × n square matrix whose ij-element is ai,j, the determinant A is expressed as det(A) =
(3)
∑ ∑ X′ Jjb (t )|Φ′ jb(t )⟩ j
(1)
b
(8)
where the nocc and nvir denote the numbers of occupied α and virtual β KS MOs of the reference state, respectively. These are nocc = n + 1 and nvir = K − n + 1 with the basis-set size K. The
(2) B
DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation amplitude matrices XIia(t − Δ) and X′Jjb(t) can be interpreted as the configuration coefficients within the TDA. In the remaining discussions, i, j and a, b are used as indices of occupied and virtual MOs, respectively. The notation prime in eq 8 is introduced to emphasize that the state is in a different time-step t as opposed to t − Δ. With the notation prime, therefore, the time variable is omitted. The determinants |Φia⟩ and |Φ′jb⟩ in eqs 7 and 8 are formed by a spin−flip excitation of an electron from the occupied α to virtual β KS MOs from the reference of eq 4, which are expressed as 1
|Φia⟩ =
∑
(2n)!
sgn(σ )ϕσ(1)(1)ϕσ(2)(2)
σ ∈ SGia
··· ϕσ(2n − 1)(2n − 1)ϕσ(2n)(2n)
1
|Φ′ jb⟩ =
(2n)!
∑
(9) Figure 1. Overall flowchart for calculating OIs. Steps either descend or branch right and left, except the steps represented by the curved line with an arrow. The algorithm comprises five branching steps depicted by diamonds, and each branch is designated as D, P4, P2, PGE, PT0, PT1, and PT2. D represents direct calculations without screening techniques. P4 and P2 indicate the four- and two-index determinants. PGE, PT0, PT1, and PT2 represent the determinant computation methods. While PGE depicts the Gaussian Elimination, PT0, PT1, and PT2 depict TLF(0), TLF(1), and TLF(2). Six different computational procedures of the simple Löwdin formula, precomputation, precomputation with determinant factorization, TLF(0), TLF(1), and TLF(2) can be described as sequences in the chart. The Löwdin formula takes the sequence of S → D → E. The precomputation and precomputation + determinant factorization are the sequences of S → P4 → E and S → P2 → PGE → E, respectively. Finally, TLF(0), TLF(1), and TLF(2) are the sequences of S → P2 → PT0 → E, S → P2 → PT1 → E, and S → P2 → PT2 → E, respectively.
sgn(τ )ϕ′τ(1)(1)ϕ′τ(2)(2)
τ ∈ SGjb
··· ϕ′τ(2n − 1)(2n − 1)ϕ′τ(2n)(2n)
(10)
with Gia = {1α , 1β , 2α , 2β , ···, aβ , iβ , ···, (n − 1)α , (n − 1)β , nα , (n + 1)α }
(11)
Gjb = {1α , 1β , 2α , 2β , ···, bβ , jβ , ···, (n − 1)α , (n − 1)β , nα , (n + 1)α }
(12)
respectively. With these formulations, the OI between Ith and Jth excited states can be simplified as ⟨Ψ|Ψ I ′J⟩ =
∑∑ ia
β Siajb =
∑
sgn(σ )⟨ϕ1 |ϕ′σ(1)⟩···⟨ϕ(i − 1) |ϕ′σ(i − 1)⟩⟨ϕa |ϕ′σ(i)⟩ β
σ ∈ SG β
XiaI*X ′ Jjb ⟨Φia|Φ′ jb⟩
β
(13)
jb
⟨ϕi |ϕ′σ(i + 1)⟩···⟨ϕ(n − 1) |ϕ′σ(n)⟩ β
The overlap integral (OI) between Ith and Jth response states can be simply evaluated by eq 13, which is illustrated as the sequence of S → D → E in Figure 1a. Its direct evaluation requires prohibitive O(n2occn2virn3) work, since the conventional determinant calculations scale as O(n3) under the condition of quadruple summations of eq 13. This particular procedure is referred to as the Löwdin procedure throughout this paper.
β ⟨Φia|Φ′ jb⟩ = SijαSiajb
Gjbβ = {1β , 2β , ···, (j − 1)β , bβ , jβ , ···, (n − 1)β }
Sαij
(14)
α
IV. DETERMINANT FACTORIZATION The determinant Sβiajb with four indices {i, a, j, b} can be further factorized by row- and column-switching operations as
α
⟨ϕ(i + 1) |ϕ′σ(i)⟩···⟨ϕn |ϕ′σ(n − 1)⟩⟨ϕ(n + 1) |ϕ′σ(n)⟩ α
α
α
(18)
Sβiajb
Precomputations of and (eqs 15 and 16) for the sets of all indices {i, j} and {i, a, j, b} scale as O(n2occn3) and O(n2occn2virn3), respectively. The latter has the same scaling as eq 13. Although the precomputation technique certainly reduces the computational prefactor, it alone cannot reduce the overall computational scaling. The precomputation of the spin separated determinants is summarized by the paths of S → P4 → E in Figure 1a. They are denoted as precomputation in Figure 2 and Table 1. Similar derivations for LR-TDDFT are also given in Appendix A. In the case of LR-TDDFT, the OI between two singlet spin-adapted configuration state functions (CSFs), CSF ⟨ΦCSF ia |Φ′jb ⟩, is represented by four types of determinants [1] [2] [4] (S , Sijab, S[3] ia , Sjb ) in eqs A4−10.
sgn(σ )⟨ϕ1 |ϕ′σ(1)⟩···⟨ϕ(i − 1) |ϕ′σ(i − 1)⟩
σ ∈ SGαj
(16)
(17)
with
∑
β
Gjα = {1α , 2α , ···, (j − 1)α , (j + 1)α , ···, nα , (n + 1)α }
III. PRECOMPUTATION As described in the introduction, the precomputation of determinants that are repeatedly needed in the calculation, before carrying out the four-index summation of eq 13, has been proposed by Plasser et al.15 This precomputation is obviously applicable to both SF-TDDFT and LR-TDDFT. The determinant overlap ⟨Φia|Φ′jb⟩ on the right-hand side of eq 13 can be separated into two determinants for spin α and β parts as
Sijα =
β
jb
β Siajb = ( −1)i + j Sab
(15) C
(19) DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
2 2 3 2 3 overhead by two orders from O(nocc nvir n ) to O(nvir n ). However, the precomputation + factorization step is still computationally expensive. On the other hand, the LR-TDDFT with CSF also has a four-index determinant S[2] ijab of eq A6b in Appendix A. Unlike eq 19 for SF-TDDFT, the four-index term S[2] ijab of LR-TDDFT of eq A6b in Appendix A cannot be factorized into two-index factors. However, by applying an approximation proposed in the following Section V, it can be represented by one- or twoindex terms of eqs A15−17. Therefore, the computational scaling can also be reduced for LR-TDDFT as well.
V. TRUNCATED LEIBNITZ FORMULA V.A. Introduction of Order Parameter λ. As discussed in the introduction, although the MOs at consecutive time steps are formally nonorthogonal, they become nearly orthogonal as time-step size Δ becomes zero. As a result, OIs between MOs, i.e., ⟨ϕlα|ϕ′mα⟩ and ⟨ϕlβ|ϕ′mβ⟩ become nearly δlm. To exploit this condition, a parameter λ representing relative magnitudes is introduced. With this, the Sij and Sab of eqs 23 and 20 can be rewritten as
Figure 2. Ratios of computation times for evaluating OIs with respect to the time for the TLF(0) in log scale. The x-axis represents the size of a hypothetical system (nsys = nocc = nvir) in log scale. Hypothetical systems were introduced to systematically compare scaling between different algorithms. All matrix elements were generated with random numbers only for the performance analysis. Trend lines of data points for each algorithm are also illustrated, whose slope represents the approximate relative scaling of their algorithms.
Sij = ( −1)i + j
∑
sgn(σ )λ1 − δ1α ,σ(1)⟨ϕ1 |ϕ′σ(1)⟩ α
σ ∈ SGαj
with Sab =
∑
··· λ1 − δ(i−1)α ,σ(i−1)⟨ϕ(i − 1) |ϕ′σ(i − 1)⟩ α
sgn(σ )⟨ϕ1 |ϕ′σ(1)⟩···⟨ϕ(i − 1) |ϕ′σ(i − 1)⟩⟨ϕi |ϕ′σ(i)⟩ β
σ ∈ SG β
β
1 − δ(i + 1)α , σ(i)
⟨ϕ(i + 1) |ϕ′σ(i)⟩··· λ1 − δnα ,σ(n−1)⟨ϕn |ϕ′σ(n − 1)⟩
λ
β
α
b
···⟨ϕ(n − 1) |ϕ′σ(n − 1)⟩⟨ϕa |ϕ′σ(n)⟩ β
β
Gbβ = {1β , 2β , ···, (j − 1)β , jβ , ···, (n − 1)β , bβ }
Sij = ( −1)i + j Sijα
⟨ϕ(n + 1) |ϕ′σ(n)⟩
λ Sab =
(21)
(24)
α
(20)
∑
sgn(σ )λ1 − δ1β ,σ(1)⟨ϕ1 |ϕ′σ(1)⟩··· λ1 − δ(i−1)β ,σ(i−1) β
σ ∈ SG β b
Thus, eq 14 can be rewritten as ⟨Φia|Φ′ jb⟩ = SijSab
α
1 − δ(n + 1)α , σ(n)
⟨ϕ(i − 1) |ϕ′σ(i − 1)⟩λ1 − δiβ ,σ(i)⟨ϕi |ϕ′σ(i)⟩··· λ1 − δ(n−i)β ,σ(n−1) β
(22)
β
1 − δaβ , σ(n)
⟨ϕ(n − 1) |ϕ′σ(n − 1)⟩λ β
(23)
⟨ϕa |ϕ′σ(n)⟩ β
(25)
These two equations can be arranged with the powers of λ as
The four-index determinant, Sβiajb, is factorized into twoindex factors, (−1)i+j and Sab in eq 19. Subsequently, the OI between configurations can be represented by the two-index determinants, Sij and Sab, in eq 22 in the case of SF-TDDFT. The four to two index factorized determinants of Sij and Sab (eqs 20 and 23) significantly reduce the latter precomputation to O(n2virn3). After the evaluation of these factored precomputable determinants, the four-index summation step of eq 13 was performed. The corresponding algorithm of these two-index procedures is summarized by the paths of S → P2 → PGE → E in Figure 1a and denoted as precomputation + factorization (see Table 1). It should be emphasized that the four to two index factorization can theoretically reduce the computational
n
Sij =
∑ Aij(k)λk
(26)
k=0 n
Sab =
∑ Bab(k)λk
(27)
k=0
(k) Resultantly, the terms A(k) ij and Bab are given in descending order of their relative magnitudes with k. (k) V.B. Explicit Expressions of A(k) ij and Bab. The simple rearrangement of Sij in eqs B1−B6 in Appendix B can readily deliver the explicit expression of A(k) ij . On the other hand, the
Table 1. Six Algorithms with Their Corresponding Scaling and Exact Procedures for Calculating OIs in This Study overall scaling Löwdin precomputation precomputation + factorization TLF(0)a TLF(1)a TLF(2)a
O(n2occn2virn3) O(n2occn2virn3) O(n2occn3 + n2virn3) O(noccnvir) O(n2occnvir + noccn2vir) O(n2occnvir + noccn2vir + n2occn + n2virn)
procedure S S S S S S
→ → → → → →
D→E P4 → E P2 → PGE → E P2 → PT0 → E P2 → PT1 → E P2 → PT2 → E
a
Combined with precomputation and factorization. D
DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation expressions of B(k) ab can be obtained directly from eq 25. The (0) (1) (1) particular leading terms of A(0) ij , Bab , Aij , and Bab are given by Aij(0)
=
Aii(0)δij
∏ ⟨ϕkα|ϕ′kα⟩, k=1
Mα = δij α ⟨ϕi |ϕ′iα⟩
⟨ϕjα|ϕ′iα⟩ ⟨ϕiα|ϕ′iα⟩⟨ϕjα|ϕ′ jα⟩
(29)
(1 − δij)
Aij(2)
(30)
(1) Bab = Mβ ⟨ϕaβ |ϕ′bβ ⟩(1 − δab)
n−1
Mβ =
∏ ⟨ϕkβ|ϕ′kβ⟩ k=1
(31)
l n + 1 ⟨ϕα |ϕ′ α ⟩⟨ϕ α |ϕ′ α ⟩ o o 1 j k k i o o M (for i ≠ j) o ∑ α o o ⟨ϕi α|ϕ′iα⟩⟨ϕjα|ϕ′ jα⟩ k = 1 ⟨ϕkα|ϕ′kα⟩ o o o o o k≠i ,j o o =m o o n 1 n + o ⟨ϕkα|ϕ′lα⟩⟨ϕl α|ϕ′kα⟩ o 1 o o (for i = j) − Mα α α ∑ ∑ o o o ⟨ϕi |ϕ′i ⟩ k = 1 l = k + 1 ⟨ϕkα|ϕ′kα⟩⟨ϕl α|ϕ′lα⟩ o o o o o k≠i l≠i n
where
(33)
(2) Bab
l n−1 o ⟨ϕaβ |ϕ′kβ ⟩⟨ϕkβ |ϕ′bβ ⟩ o o o o o− Mβ ∑ o o ⟨ϕkβ |ϕ′kβ ⟩ o k=1 o =m o o n−2 l o o n − 1 ⟨ϕaβ |ϕ′kβ ⟩⟨ϕkβ |ϕ′aβ ⟩ o o− M o β β o ′ + ⟨ ϕ | ϕ ⟩ m∑ o ∑ β o a a o o o o ⟨ϕkβ |ϕ′kβ ⟩ o k=1 o k=1 n n
[3] Likewise, for the four types of determinants, S[1], S[2] ijab, Sia , [4] and Sjb of LR-TDDFT, the corresponding explicit expressions (k) (k) up to the second order of A(k), B(k) ijab, Cia , and Djb (k = 0, 1, 2) are given in eqs A12−A23 of Appendix A. As discussed at the end of Section IV, it is important that the explicit expressions of the four-index determinant up to the second order, B(0) ijab, (1) (2) Bijab , and Bijab in the case of LR-TDDFT, are in fact represented as one- or two-index terms, B(0)(l) , B(1)(l) (l = 1, p pq (2)(m) 2), and Bpq (m = 1−7), respectively (see Appendix A). V.C. Truncations of Leibniz Formula. With the order parameter λ, the higher terms in eqs 26 and 27 can be safely omitted. This process is referred to as truncation of the Leibniz formula (TLF) in this paper. The simplest algorithm takes only the zeroth order in eqs 26 and 27, yielding
Sij ≅ SijTLF(0) ≡ Aii(0)δij ,
(0) TLF(0) Sab ≅ Sab ≡ Baa δab
(35)
Sij ≅ SijTLF(1) ≡ Aij(0) + λAij(1) , TLF(1) (0) (1) Sab ≅ Sab ≡ Bab + λBab
(36)
Sij ≅ SijTLF(2) ≡ Aij(0) + λAij(1) + λ 2Aij(2) , TLF(2) (0) (1) (2) Sab ≅ Sab ≡ Bab + λBab + λ 2Bab
The above TLF methods can be applied directly to the LRTDDFT. With the precomputed sets of determinants of {Sij} and {Sab}, a simple contraction can further reduce the 4-index summation of {i, a, j, b} in eq 13 into a 2-index formula of ij
yzij z {k
ib
j k
j
yz
a
{
ib
n−1
∑ l=k+1
o ⟨ϕkβ |ϕ′l β ⟩⟨ϕl β |ϕ′kβ ⟩ | o } (for a = b) o ⟨ϕkβ |ϕ′kβ ⟩⟨ϕl β |ϕ′l β ⟩ o o ~
(34)
VI. RESULTS AND DISCUSSION VI.A. Performance Analysis. Two new techniques of DF and TLF approximation are introduced in Section IV and V, respectively. Their effects on the performance improvement are investigated in this section. The computation time depends on the number of occupied (nocc) and virtual orbitals (nvir). Unfortunately, there is no unique way of investigating the performances, especially regarding their scaling with respect to system size. Therefore, a hypothetical set of systems in which nocc = nvir were created. For simplicity, we adopt a variable nsys
(37)
∑ jjjjj∑ SijX′ Jjb zzzzzjjjjj∑ XiaISabzzzzz ≡ ∑ CibJDibI
(for a ≠ b)
V.D. Efficient Algorithms with TLF. With the help of determinant factorization, the computational procedures for evaluating OI with TLF(0), TLF(1), and TLF(2) are shown in Figure 1as the sequences of S → P2 → PT0 → E, S → P2 → PT1 → E, and S → P2 → PT2 → E, respectively. The Mα and Mβ of eq 32 are precomputed for all TLF(0), TLF(1), and TLF(2), which scale as O(n). With the two values, TLF(0) (0) only requires sets of {A(0) ii } and {Baa } with the scaling of O(nocc + nvir). One of the most significant aspects of TLF(0) is the linear dependence on nvir. The scaling of all other methods including TLF(1) and TLF(2) in this study exhibit a quadratic dependence on nvir. TLF(1) additionally requires sets of (1) 2 2 [{A(1) ij }, {Bab }] with the scaling of O(nocc + nvir), and the (2) second-order TLF(2) also requires sets of [{Aij }, {B(2) ab }] with the scaling of O(n2occn + n2virn). The computational scales are observed to proportionally increase with the order of TLF. These values are utilized to form {Sij} and {Sab}, followed by the contractions of {Sij} and {Sab} with {X′Jjb} and {XIia}. The contraction step requires O(n2occnvir + noccn2vir) computations. However, in the particular case of TLF(0), since {Sij} and {Sab} only survive when i = j and a = b, so TLF(0) is reduced to O(noccnvir), yielding a very efficient quadratic algorithm. In the OI calculation, the contraction is the most timeconsuming step for TLF(0) with the quadratic scaling and for both TLF(1) and TLF(2) with the cubic scaling. In the case of (2) TLF(2), the precomputing {A(2) ij } and {Bab } also requires an 3 O(n ) step. Thus, TLF(2) has a larger computational prefactor than TLF(1) with the same computational scaling. The overall features of five different procedures are summarized in Table 1.
which is defined as TLF(0). A remarkable feature of this approximation is its ability to reduce the typical O(n3) GE algorithm down to O(n). Furthermore, it is noticeable that the equation above only becomes nonzero, when i = j and a = b, dramatically reducing the terms that are computationally necessary. By keeping terms up to the first or the second order in eqs 26 and 27, more accurate TLF(1) and TLF(2) can be also defined with the scales of O(n) and O(n2), respectively.
⟨Ψ|Ψ I ′J⟩ =
(32)
(2) The second-order coefficients of Aij(2) and Bab are also represented as
(28)
(0) (0) Bab = Baa δab = Mβ ⟨ϕaβ |ϕ′aβ ⟩δab
Aij(1) = −Mα
n+1
Mα =
(38) E
DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation that represents the system size and conveniently lets nsys = nocc = nvir. The elements of determinants are generated with random numbers. All the calculations in this study were performed using the modified GAMESS program.39 Six different computational procedures of the simple Löwdin formula, precomputation, precomputation with determinant factorization, TLF(0), TLF(1), and TLF(2) were tested. It is noted that the TLF(0), TLF(1), and TLF(2) were combined with precomputation and determinant factorization techniques. They are denoted as Lö w din, precomputation, precomputation + factorization, TLF(0), TLF(1), and TLF(2), respectively (see Table 1). Since the TLF(0) is the most efficient, the relative timings of all other procedures compared to TLF(0) in log scale are illustrated in Figure 2. The x-axis represents nsys in log scale. All timing ratios exhibit almost linear lines with nsys in log scale. Consequently, their slopes represent their scaling orders 5.0 3.0 1.1 1.0 of n4.9 sys :nsys :nsys :nsys:nsys:1 for Lö wdin, precomputation, precomputation + factorization, TLF(2), TLF(1), and TLF(0), respectively. These correspond to the expected scaling of O(n7sys):O(n7sys):O(n5sys): O(n3sys):O(n3sys):O(n2sys) (see Table 1). The precomputation suggested by Plasser et al.15 reduces the computational prefactor of the Löwdin procedure, but it fails to reduce the overall scale as a function of system size. However, it should be emphasized that any screening techniques were not adopted in the current study, and thus their actual performance is expected to be higher. The determinant factorization significantly reduces the computational overhead by two orders. The algorithms with TLF(1) and TLF(2) achieve an additional performance improvement by nearly two orders. The prefactor of TLF(1) is smaller than that of TLF(2) with the same computational order, as expected. The unprecented quadratic scaling of TLF(0) makes it the most efficient algorithm. In short, the factorization with TLF achieves remarkable four to five orders of combined performance improvements making the OI computation nearly negligible. In order to study the performance in real systems, the molecule stilbene, as shown in Figure 3c, is chosen. NACTs between the first and the second lowest electronic states were calculated with six different algorithms in Table 1. The relative speed-ups of NACT calculations with respect to Löwdin methods are presented in Table 2. The speed-ups were calculated with three different basis sets to investigate the dependences on the occupied (nocc) and the virtual orbitals (nvir). It should be pointed out that no additional techniques, such as screening, were combined with any of these methods. Therefore, the actual speed-ups are different from other studies. For instance, although the precomputation suggested by Plasser et al.15 exhibits merely an improvement of a factor of 2, its actual speed-ups are much better with screening techniques. Once combined with factorization, the performances increase by 3 orders of magnitude, which is similar to the theoretically expected performance improvement O(n2vir) ∼ 103 for the STO-3G basis set and O(n2occ) ∼ 103 for the 6-31G(d) and 6-311G(d) basis sets. The TLF(2) adds an improvement of 3 orders of magnitude compared to the precomputation + factorization, which is again similar to O(n2) ∼ 103, regardless of basis set. TLF(1) exhibits slightly improved performance compared to TLF(2), since they have similar scaling with a different prefactor. In contrast, the TLF(0) adds nearly 1 order of magnitude improvement compared to TLF(1), which is similar to the theoretically expected performance improvement
Figure 3. Molecular structures of three different conical intersections studied in this work: (a) twisted-pyramidalized, (b) H-migrated ethylene, and (c) twisted-pyramidalized stilbene. Detail geometrical information is provided in Supporting Information.
O(nocc) ∼ 101 for STO-3G and O(nvir) ∼ 101 for other basis sets. Since all algorithms except for TLF(0) have a computational scaling O(n2vir), the performances are similar to larger basis sets. This tendency appears between 6-31G(d) and 6311G(d) basis sets, whose virtual orbitals are larger in comparison to occupied orbitals. VI.B. Accuracies of TLF. Determinant factorization is an exact process. Therefore, the analysis of prediction accuracies are only relevant to TLF. To investigate the accuracies in actual calculations, the NACTs near three different conical interactions were studied. They are twisted-pyramidalized, Hmigrated ethylene, and twisted-pyramidalized stilbene, as shown in Figure 3. Their exact structures are presented in the Supporting Information. The BH&HLYP functional40 in combination with 6-31G(d) basis set was utilized. By using the finite difference approximation,1 the NACT can be obtained with the OIs as 1 y ∂ 1 y i i ΨS0jjjt − Δzzz ΨS1jjjt − Δzzz 2 { ∂t 2 { k k 1 ≅ (⟨ΨS0|Ψ′S1⟩ − ⟨Ψ′S0|ΨS1⟩) 2Δ
F
(39)
DOI: 10.1021/acs.jctc.8b01049 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation Table 2. Speed-Ups for Various Algorithms with Respect to the Löwdin Algorithm without Any Screeninga speed-ups molecule
basis set
b
c
stilbene
STO-3G 6-31G(d)c 6-311G(d)c
precomp.
precomp. + factoriz.
TLF(0)
2.0 2.0 1.9
2.9 × 10 8.8 × 103 8.9 × 103
1.1 × 10 5.2 × 108 6.9 × 108
3
TLF(1) 8
TLF(2)
8.0 × 10 1.8 × 107 1.6 × 107 6
1.4 × 106 5.4 × 106 5.3 × 106
In all calculations, NACTs between the first and the second lowest electronic states were calculated. bnocc = 49 and n = 48. cnvir = 35 (STO-3G), 187 (6-31G(d)), and 255 (6-311G(d)). a
Table 3. NACTs (a.u.) between S1 and S0 States Using the Löwdin Algorithm for Three Different Conical Intersections and for Three Different Time-Step Sizes (Δ), NACT Differences (a.u.) between the Löwdin Algorithm and Other Algorithmsa NACT with TLF − NACT with Löwdin (a.u.)
NACT (a.u.) molecule twisted-pyramidalized ethylene
H-migrated ethylene
twisted-pyramidalized stilbene
Δ (fs) 0.1 0.2 0.5 0.1 0.2 0.5 0.1 0.2 0.5
Löwdin
TLF(1)b
TLF(0)
0.2361078 0.1199419 0.0480514 0.0885232 0.1156795 0.0472909 0.0105009 0.0092561 0.0085117
2.8 2.5 4.5 4.8 3.0 5.4 4.7 5.2 9.9
× × × × × × × × ×
−6
−7
10 10−6 10−6 10−6 10−6 10−6 10−6 10−6 10−6