Benchmarking for Perturbative Triple-Excitations in EE-EOM-CC

Feb 13, 2013 - Benchmarking for Perturbative Triple-Excitations in EE-EOM-CC. Methods. Thomas J. Watson, Jr.,*. ,†. Victor F. Lotrich,. †. Peter G...
0 downloads 0 Views 914KB Size
Article pubs.acs.org/JPCA

Benchmarking for Perturbative Triple-Excitations in EE-EOM-CC Methods Thomas J. Watson, Jr.,*,† Victor F. Lotrich,† Peter G. Szalay,‡ Ajith Perera,† and Rodney J. Bartlett*,† †

Quantum Theory Project, Department of Chemistry and of Physics, University of Florida, P.O. Box 118435, Gainesville, Florida 32611-8435, United States ‡ Institute of Chemistry, Eötvös University, H-1518 Budapest, P.O. Box 32, Hungary S Supporting Information *

ABSTRACT: Perturbative triples corrections ((T) and (T̃ )) to the equation of motion coupled cluster singles and doubles (EOM-CCSD) are rederived and implemented in a pilot parallel code. The vertical excitation energies of molecules in the test set of Sauer et al. [J. Chem. Theor. Comput. 2009, 5, 555] are reported and compared to the iterative EOM-CCSDT-3 method. The average absolute deviations of EOMCCSD(T) and EOM-CCSD(T̃ ) from EOM-CCSDT-3 over this wide test set are 0.06 and 0.18 eV, respectively. The poor performance of the latter suggests misbalanced handling of the (T̃ ) terms. Scaling curves for EOMCCSD(T) are also presented to demonstrate the performance across multiple compute nodes, thus enabling the routine and accurate study of excited states for ever larger molecular systems.



INTRODUCTION

Adhering to the above, the methods and programs will remain relevant to applications for ever larger and chemically relevant molecules. For ground states dominated by a single reference wave function, CC limited to single (T̂ 1) and double (T̂ 2) substitutions with a noniterative perturbative treatment of triples (T̂ 3) (CCSD(T))10−13 in the general formulation of Watts’ et al.13 satisfies the aforementioned desiderata and provides accurate results for most computationally accessible chemistry problems, though not all. Further extensions, like ΛCCSD(T),14−16 and the variants of Kowalski and Piecuch et al.17,18 sometimes offer improvements. In addition, many attempts to include quadruples (T̂ 4) effects have been attempted, some using factorized quadruples, Q f, like CCSDT(Q f), and the ∼n7 CCSD(TQ f)14,19 and approximate iterative quadruples, like CCSDTQ-1,20 which is ∼n9. A related perturbative hierarchy for the CC ground state includes the CCSD(2) method of Gwaltney and Head-Gordon et al.21−23 which also uses Q f, and higher order nonfactorized extensions in the cluster operators and perturbation truncations, m and n, respectively (CC(m)PT(n)) by Hirata et al.24 Unlike the Q f, which is a fifth-order ∼n7 approximation, in CCSDT(Q)25 the Q approximation is not factorized, making it ∼n9 like in CCSDTQ-1.3 Once the ground state solution for some T̂ approximation is obtained, then the Schrödinger equation becomes, H̅ |Φ0⟩ =

With the age of serial computing coming to an end, new algorithms for the massively parallel implementations of the most accurate theoretical methods are critical to push the boundaries toward larger molecules including, but not limited to, biologically relevant ones.1,2 However, the chosen methodology must be able to accurately describe the phenomena encountered in molecular structure and spectra and to provide a foundation for building the new theory required by the next generation. Guided by insight from the wide-ranging success of coupled cluster (CC) theory for ground states3 and its equation of motion (EOM-CC) extensions3−5 to excited states, essential desiderata for theoretical methods is that they should • provide systematic convergence to the full configuration interaction (FCI) solution; • be size-extensive (completely linked) for global properties and size-intensive for energy differences;3,6 • apply to ground and excited states of any type, with a seamless connection to all other states of interest that involve different particle numbers like ionized and electron attached states; • maintain orbital invariance between occupied to occupied, virtual to virtual, and if used, active to active orbitals, as otherwise the results can change with orbital choice; • enable feasible computations, which definition of “feasibility” is changing with the massive parallelization techniques of quantum chemistry program packages.7−9 © 2013 American Chemical Society

Received: August 30, 2012 Revised: February 6, 2013 Published: February 13, 2013 2569

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

limitations can be removed via methods like EOM-CCSD(T̃ ) and EOM-CCSD(T). These noniterative perturbative triples methods have mostly been tested on small prototypical systems. Recently, a robust test set consisting of over 24 molecules and over 120 excited states has been proposed and used to assess the accuracy of TDDFT and MRCI methods49 as well as CASPT2, CC2, EOM-CCSD, and CC3 methods.50 This test set was also used for benchmarking the perturbative CCSDR(3) method compared to its iterative analogue, CC3.51 This paper assesses how well any of these other perturbative methods perform over this wide variety of chromophores. In that vein, the EOM-CCSD(T̃ ) and EOM-CCSD(T), which satisfy the above desiderata, are rederived, implemented in a pilot parallel code within the ACESIII program package,7 and compared against the robust EOM-CCSDT-3 method28,29 with the aforementioned test set. This is followed by some computational details for the parallel implementation given in the Appendix. The side-by-side comparisons of these noniterative pertrubative triples corrections are then given, as well as performance numbers for the EOM-CCSD(T) method, which will be shown to be the more accurate method compared to EOM-CCSD(T̃ ). Conclusions and future outlook are given in the final section.

E|Φ0⟩, for H̅ = exp(−T)H exp(T) and |Φ0⟩, any single determinant wave function. When making the transition to EOM-CC for excited states, a new operator, R̂ k, is introduced such that [H̅ ,R̂ k]|Φ0⟩ = ωkR̂ k |Φ0⟩, where ωk is the excitation energy. There are contributions from single (R̂ 1), double (R̂ 2), and triple (R̂ 3)excitations in the excited state with similar computational demands as in the ground state. EOM-CCSDT26 scales as iterative ∼n8 in the ground state (CCSDT solution) and iterative ∼n8 for each excited state; and since the excited state calculation requires the iterative diagonalization of a matrix whose rank is ∼n6, another ∼n8 step is added due to the diagonalization algorithm. Clearly, simplified methods are warranted to include triples in excited states. The first noniterative correction for the equation-of-motion extensions of CC theory for excited states was, EOMCCSD(T) presented by Watts and Bartlett,.27 This one perturbatively estimates R̂ 3 from R̂ 2, where R̂ 2 is known from an EOM-CCSD calculation and H̅ is replaced by H. There is no contribution to R̂ 3 from R̂ 1. A second perturbative approximation, EOM-CCSD(T̃ ), was introduced to potentially better include the single excitations which appear once H̅ is used.28 However, in this approximation there are several terms of the same order that are neglected to retain a tractable theoretical model. This work also introduced the iterative EOM-CCSDT-1 and EOM-CCSDT-3 methods. When generalizing to the excited states, the EOM-CCSDT-3 is considered to offer the best ∼ n7 reference value, but it still requires the diagonalization of the ∼ n6 triple excitation block of H̅ . The CCSDT-3 for the ground state means that all possible contributions from T̂ 1 and T̂ 2 to T̂ 3 are included iteratively, but not T̂ 3 to T̂ 3, which makes the method scale as iterative ∼ n7. Then, based upon this transformed Hamiltonian (H̅ = H̃ + [H̃ T̂ 3]C), all R̂ 1, R̂ 2, and R̂ 3 are included in the excited state description. Several applications were presented,29−31 demonstrating that triples corrections were particularly important when there are significant double excitation contributions in excited states,30 but can also affect EOM-CCSD single excitation states by ∼0.2 eV. The Watts, et al. corrections have also been used by Kowalski et al., in their large scale applications.32 Akin to the ground state, completely renormalized variants of Piecuch et al.33−39 have been developed for excited states (including EOM-CCSD(T), EOM-CCSD(T)L, r-EOM-CCSD(T)). However, these methods do not satisfy all of our desiderata since they are either not fully size-extensive or not invariant to orbital rotations by virtue of using Epstein-Nesbet energy denominators.40,41 They have been shown to be accurate for porphyrin and fused porphyrin systems though.42,43 The CC(m)PT(n) hierarchy of Hirata et al.24 have been extended for excited states in a similar fashion and denoted EOM-CC(m)PT(n).6,44,45 This hierarchy converges to the Full CI limit, but incurs severe formal scaling bottlenecks as m and n are increased. The specific EOM-CC(2)PT(2)T variant is formally equivalent to the EOM-CCSD(T̃ ) method of Watts and Bartlett.28,29 Related approximations for excited states, like CCSDR(3), have also been developed, based on the linear response CCn family of methods of Christiansen, Koch, and Jørgensen.46−48 CC3 is computationally the same as EOM-CCSDT-3, requiring an iterative ∼n7 treatment of the ground state and the diagonalization of an ∼n6 rank matrix, due to the presence of the triples−triples block of H̅ . One would hope that both



THEORY In the EOM-CC methods, the Hamiltonian that is diagonalized in the appropriate sector of Fock space, H̅ , is not hermitian, having distinct left and right eigenvectors yielding the same eigenvalue. (H̅ 9̂ k)C |Φ0⟩ = ωk 9̂ k|Φ0⟩

(1)

⟨Φ0|3̂ kH̅ = ⟨Φ0|3̂ kωk

(2)

The similarity transformed Hamiltonian is defined as ̂ ̂ H̅ = (e−T Hê T )C − E0

(3)

being the electronic Hamiltonian parametrized with the ground state CC amplitudes, T̂ , restricted to connected terms.3 As a consequence, H̅ contains one-, two-, three-, four-, and higher body operators denoted f ̅, V̅ , II̅ I, and IV ̅ , respectively. The ground state reference determinant energy, E0, is subtracted to obtain direct expressions for the excitation energy, ωk. These vectors are chosen to be biorthonormal δkl = ⟨Φ0|3̂ k9̂ l|Φ0⟩

(4)

yielding the CC energy functional ωk = ⟨Φ0|3̂ kH̅ 9̂ k|Φ0⟩

(5)

When the CC equations are limited to single and double substitutions only, the 3̂ k and 9̂ k eigenvectors have the form 9̂ k|Φ0⟩⟩ = (r0 + R1̂ k + R̂ 2k)|Φ0⟩ = r0|Φ0⟩ +

∑ ria|Φia⟩ + ia

1 4

∑ rijab|Φijab⟩ ijab

(6)

⟨Φ0|3̂ k = ⟨Φ0|(l0 + L1̂ k + L̂ 2k ) = ⟨Φ0|l0 +

∑ ⟨Φia|lai + ai

2570

1 4

∑ ⟨Φijab|labij abij

(7)

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

is also relevant. Whereas in terms of H̃ = H̅ [0] + ΔH̃ [1], any contributors to the perturbation are first order, the second breakdown suggests the comparative importance of the terms in ΔH̃ [1], in order of V̂ , the correlation perturbation, which is useful in establishing suitable approximations like EE-EOMCCSD(T̃ ). This first partitioning will define the EE-EOM-CCSD(T̃ ) method originally formulated by Watts and Bartlett29 and derived in a different way by Watts, Gwaltney, and Bartlett.28 H̅ is given explicitly as

Note the linear vectors resemble a CI ansatz, where the r0 term is necessary to describe excited states of the same symmetry as the ground state; l0 is zero by construction for all excited states (3 0 = 1 + Λ for the ground state).3 The P′-space for EEEOM-CCSD is spanned by the reference determinant with a projector, |0⟨0| and the Q′-space is spanned by the singly, |S, and doubly, |D, excited determinants with a projector (the prime is used because these spaces are partitioned differently later in the text) Q ′ = |S⟩⟨S| + |D⟩⟨D| =∑ |Φia⟩⟨Φia| + ia

1 4

(8)

∑ |Φijab⟩⟨Φijab| (9)

ijab

This partitions the Hamiltonian, H̅ , explicitly as We note in passing that the Q3-projections (H̅ T0)would be zero if the full triples were solved for in the ground state, or any of the iterative triples methods like EOM-CCSDT-n. The Watts− Bartlett derivations exclusively use the EOM-CCSDT-n framework for the direct energy dif ference, ω, meaning H̅ T0 is zero before the perturbation approximation is made, so the issue does not arise. Also, since their derivation stays within the EOM-CC framework, it never needs to address the issue of unlinked diagrams. This is contrary to that of Shiozaki et al.,6 who first have to eliminate the latter and then try to correct from there. This causes them to address the nonvanishing triples in the ground state in their derivation, but the final expression for EOM-CCSD(T̃ ) is the same.29,28 The exact left and right-hand eigenvectors up to triples for state k, 3̂ k and 9̂ k , are

clearly indicating that only the Q′-space of H̅ needs to be diagonalized. This is critical when examining perturbative triples corrections to EE-EOM-CCSD. It will be useful to define the following quantities used in the perturbative triples correction to EE-EOM-CCSD. We will define a new P-space to be spanned by the ground, singly, and doubly excited determinants with a projector P = |0⟩⟨0| + |S⟩⟨S| + |D⟩⟨D|

=|Φ0⟩⟨Φ0| +

∑ |Φia⟩⟨Φia| + ia

(11)

1 4

∑ |Φijab⟩⟨Φijab| ijab

(12)

and the Q-space will be limited to triples, Q  Q3 = (1/36) abc ∑ijk∑abc |Φabc ijk ⟩ ⟨Φijk |. The complete similarity transformed Hamiltonian, H̅ , is partitioned in the P- and Q-spaces as

⟨Φ0|3̂ k = ⟨Φ0|3̂ k(P + Q ) = ⟨Φ0|L̂P + ⟨Φ0|LQ̂

(20)

=⟨Φ0|(L1̂ k + L̂ 2k ) + ⟨Φ0|L̂ 3k

(21)

H̅ = (P + Q )H̅ (P + Q )

(13)

9̂ k|Φ0⟩ = (P + Q )9̂ k|Φ0⟩ = R̂P|Φ0⟩ + R̂ Q |Φ0⟩

(22)

=HPP ̅ + HPQ ̅ + H̅ QP + H̅ QQ

(14)

= (r0k + R1̂ k + R̂ 2k)|Φ0⟩ + R3̂ k|Φ0⟩

(23)

and partitioned in orders relative to the zeroth-order Hamiltonian as H̅ = H̅ [0] + ΔH̃ [1]

With these definitions, the excitation energy can be written as ωk = ω = ωP + ωQ = ωCCSD + ωQ

(15)

The starting point for perturbative corrections to the excitation energy is the EE-EOM-CC energy functional partitioned with the defined P and Q-spaces,

The choice of zeroth-order Hamiltonian is H̅ [0] = PHP ̅ + Q (ωCCSD + foo + fvv )Q

(16)

ω⟨Φ0|3̂ k9̂ k|Φ0⟩ = ⟨Φ0|3̂ kH̅ 9̂ k|Φ0⟩

Note that the subscripts P and Q are used to reduce clutter. The freedom to include the EE-EOM-CCSD excitation energy, ωCCSD, and the (semi)canonical occupied-occupied and virtual− virtual Fock matrices, foo and f vv, respectively in the Q-space will define (and simplify) the resulting energy denominators while maintaining orbital invariance.6,16 A second breakdown of the perturbation in terms of the original (semicanonical) correlation perturbation, H0 =

∑ f pp {p† p}

V̂ =

∑ f ia {a†i + i†a} + i,a

1 4

∑ p,q,r ,s

(25)

(ωP + ωQ )(L̂P RP̂ + LQ̂ R̂ Q ) ⎡ HPP HPQ ̅ ̅ ⎤ ⎡ R ̂P ⎤ ⎥⎢ ⎥ = [ L̂P LQ̂ ]⎢ ⎢⎣ H̅ QP H̅ QQ ⎥⎦⎢ R̂ ⎥ ⎣ Q⎦

= L̂P HPP ̅ R̂P + L̂P HPQ ̅ R̂ Q + LQ̂ H̅ QPR̂P + LQ̂ H̅ QQ R̂ Q

(26) (27)

The first terms on the left and right-hand side cancel. We are free to subtract L̂ QωCCSDR̂ Q from both sides to remove the second term on the right-hand side and therefore arrive at the following expression for the energy correction, using the biorthonormality condition ⟨Φ0|L̂ PR̂ P|Φ0⟩ = 1,

(17)

p

(24)

⟨pq||rs⟩{p† q†sr } (18) 2571

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

Since we have limited ourselves to Q3, the correction scales ∼no3nv4. This method clearly couples triples with singles via the (ΔH̃ [1]R̂ 1k)C contribution, which occurs with the three- (III) and four-body operators (IV) as shown in the second expansion above. Since the largest terms for singly excited states are likely to occur due to R̂ 1, these couplings have the potential to be important.28,29,31 However, to include all terms that would occur in the same order in V̂ as does Q3(III R̂ 1k)|0⟩ demands the inclusion of Q3(W̃ ·R[2] 3 )|0⟩, too. And that is an ∼n8 term which is impractical in such a noniterative approximation. Watts and Bartlett considered the ∼n8 term numerically for small molecules, and its neglect seemed to be a reasonable assumption as EOM-CCSD(T̃ ) approached the Full CI result for those cases.6,28,29,31 However, as will be shown later, this approximation always over corrects the excitation energy in the large test set used. It is possible that either this ∼n8 term or even the Q4(IV R̂ 1k)C|Φ0⟩ contribution could be non-negligible for larger systems due to the similar contribution in the Q3-space (Q3(W̅ R̂ 1k|Φ0⟩)). The effect of the ∼n8 term can be estimated by shifting the denominator, D3, by the abc ̂ diagonal elements ⟨abc ijk |V|ijk ⟩ − ⟨0|V|0⟩. We traditionally resist such approximations because of their lack of orbital invariance, though others have considered them.33−39 This might be a case where the easily applied denominator shift may pay dividends to avoid computational scaling of ∼n8. A convenient approximation is based upon the alternative expansion of orders in the two-electron perturbation, V̂ , in eq 43. Recognizing that III is an order higher than Ŵ in V̂ removes the (Q3(Ŵ (1)R̂ [1] 1k )C|Φ0⟩ as it can not be connected, and yields the perturbed R̂ Q3 vector as

ωQ 1 + ωQ ⟨Φ0|LQ̂ R̂ Q |Φ0⟩ = ⟨Φ0|L̂P HPQ ̅ R̂ Q + LQ̂ H̅ QPR̂P + LQ̂ (H̅ QQ − ωCCSD)R̂ Q |Φ0⟩ (28)

To further simplify, we examine the projected equations for the right-hand side Qω 9̂ k|Φ0⟩ = QH̅ 9̂ k|Φ0⟩

(29)

(ωCCSD + ωQ )R̂ Q |Φ0⟩ = H̅ QPR̂P|Φ0⟩ + H̅ QQ R̂ Q |Φ0⟩

(30)

ωQ R̂ Q |Φ0⟩ = H̅ QPRP̂ |Φ0⟩ + (H̅ QQ − ωCCSD)R̂ Q |Φ0⟩

(31)

Insertion of this final expression in eq 28 simplifies the energy correction ωQ 1 + ωQ ⟨Φ0|LQ̂ R̂ Q |Φ0⟩ = ⟨Φ0|L̂P HPQ ̅ R̂ Q + ωQ LQ̂ R̂ Q |Φ0⟩

(32)

ωQ = ⟨Φ0|L̂P HPQ ̅ R̂ Q |Φ0⟩ [1]

(33)

[2]

[1] ̂ ωQ[4] = ⟨Φ0|L̂P H̅PQ R Q |Φ0⟩

(34)

ω[4] Q

Notice, ωQ = and higher in order of H̅ . All that remains is to define R̂ Q. To this end, we use the Hamiltonian separated by the defined orders and collect the lowest-order correction in ΔH̃ [1], to ω[4] Q . Applying this to the amplitude eq 31, showing explicitly the orders and triples Q-space (Q3) ωQ 3R̂ Q 3|Φ0⟩ = H̅ Q 3PRP̂ |Φ0⟩ + (H̅ Q 3Q 3 − ωCCSD)R̂ Q 3|Φ0⟩ (35)

(ωQ 3 − H̅ Q 3Q 3 + ωCCSD)R̂ Q 3|Φ0⟩ = H̅ Q 3PR̂P|Φ0⟩

(ωQ[4]3



H̅ Q[0]3Q 3

+

[2] [0] ωCCSD )R̂ Q 3 |Φ0⟩

=

[1] ΔH̃ Q[1]3PRP̂ |Φ0⟩

[2] [1] −(foo + fvv )R̂ Q 3 |Φ0⟩ = ΔH̃ Q[1]3PRP̂ |Φ0⟩

[2]

[2] (1) [1] R̂ Q 3 |Φ0⟩ = D3(Ŵ R̂ 2k )C |Φ0⟩

(36)

This approximation, named EE-EOM-CCSD(T), performs very satisfactorily, possibly because it eliminates the unbalanced treatment of the terms in the EE-EOM-CCSD(T̃ ) approximation discussed above. It can also be adapted to computer codes with a ground state ΛCCSD(T) implementation, since the only difference is the 9̂ amplitudes replace the T̂ amplitudes and the 3̂ amplitudes replace the Λ̂ amplitudes in the programmable equations.52,53 Since the derivation is the same as for the ground state (ΛCCSD(T))14 there can be no ambiguity with the triples contribution in the ground state, but that is not to say that the EE-EOM-CCSD(T) method is the correct, time-dependent, response equation for CCSD(T) or ΛCCSD(T). No finite order approximation can be used to generate the time-dependent response in a fully satisfactory manner. Instead, perturbation theory is being used for the ground state to define ΛCCSD(T) and to define EOMCCSD(T) for excited states. Both expressions are most easily derived from their equivalent functionals; that is, for the ground state, ET = ⟨0|(1 + Λ)(H̃ T3)C|0⟩ = ⟨0|Λ(H̃ T3)C|0⟩, since R0 = 1, and for the excited state ωT = ⟨0|L(H̃ − E)R3)C|0⟩. Then simply introducing the H[0] used to define orders, isolating any [2] terms that arise from triples, the equations for R[2] 3 and T3 are immediately derived leaving the simple expressions for (T) in either case.52,53

(37) (38)

[1]

R̂ Q 3 |Φ0⟩ = D3ΔH̃ Q[1]3PRP̂ |Φ0⟩

(39)

̃ [1] consists Note that ω[4] Q3 drops out as it is of higher order. ΔH of only two-, three- and higher electron operators because there can be no connected triple excitation contribution from f ̃ with R̂ 1 or R̂ 2. The energy denominator, D3, using the definitions given in eq 16 is D3 = Q 3(H̅ Q[0]3Q 3 − ωCCSD)Q 3

(40)

=Q 3(ωCCSD + foo + fvv − ωCCSD)Q 3

(41)

1 = 36

The

R̂ [2] Q3

∑∑

abc abc |Φijk ⟩⟨Φijk |

εi + εj + εk − εa − εb − εc

ijk

abc



Q3R̂ [2] 3k

(42)

is defined explicitly as

[2]

Q 3R3̂ k = Q 3D3((ΔH̃ [1]R1̂ k)C |Φ0⟩ + (ΔH̃ [1]R̂ 2k)C |Φ0⟩) (43)



with the fourth- order correction to the EE-EOM-CCSD excitation energy given as ωQ[4] =

[1] [1] ̂ [2] ⟨Φ0|L̂P ΔH̃PQ R |Φ0⟩ 3 Q3

(45)

COMPUTATIONAL DETAILS The EE-EOM-CCSD(T) and EE-EOM-CCSD(T̃ ) methods are implemented in the massively parallel program package,

(44) 2572

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

ACESIII,7 with specific programming details given in the Appendix. The main parallelization is over the three virtual indices in the rabc ijk amplitude array. Parallelizing only over these three virtual indices does provide a bottleneck, as the performance will begin to flatten when the number of virtual blocks cubed is equal to the number of processors. A future release will incorporate the necessary changes to overcome this bottleneck in a “black-box” manner; though preliminary results are reported for more optimized code (though the “black-box” nature is lost). However, the principle focus of this paper is the benchmarking of these perturbative methods. The CFOUR54 program package is used for the EE-EOMCCSDT-3 calculations, and has the advantage to run in the computational point group symmetry up to D2h. The test set consists of a subset of the 121 singlet valence states for 24 molecules used in the performance studies of Sauer et al.51 All geometries are obtained from MP2/6-31G* optimizations with the core electrons frozen. The subsequent EOM calculations are performed with the TZVP basis set55 with core electrons frozen to allow direct comparison to CCSDR(3) and CC3 results, which have been shown to be accurate.51 All EOM-CCSD(T) and EOM-CCSD(T̃ ) calculations on the test set were performed on 256 processors using the ARSC machine, Chugach, a Cray XE6 with 728 compute nodes each with 2 octocores, or 16 processors, per node and 30 GB accessible RAM per node. As a quick demonstration of the timings of the EOM-CCSD(T) code, the largest calculation in the test set, Naphthalene, with 238 basis functions and 24 doubly occupied valence orbitals takes 2.9 h for each spin case ijk (ααα and ααβ) of the rabc ijk and labc amplitudes for 12 total roots. The smallest calculation, ethene, with 50 basis functions and 6 doubly occupied valence orbitals takes 1.0 min for each spin case for 12 total roots. More representative timings are demonstrated with cytosine, guanine, and their corresponding Watson−Crick base pair; the latter is chosen to demonstrate the applicability of the program. The nucleobasescytosine, adenine, thymine, and guanine in aug-cc-pVDZ basis sets56 are tested over a range of processors starting with 32 and ending with 512 to demonstrate the efficiency and scaling properties. The parameters for each molecule are shown in Table 1, with the block sizes7 being kept constant across the CPU range.

Figure 1. Correlation plot for calculated single excited states with CCSD vs CCSDT-3.

comparison is from the 21A1g π → π* state of all-E-hexatriene with a deviation of 0.72 eV. Therefore, we examine the perturbative triples corrected methods. The correlation plots for the EOM-CC perturbative triples methods compared to EOM-CCSDT-3 are shown in Figure 2. The EOM-CCSD(T̃ ), which has been shown to lower EOMCCSD excitation energies for many states,28,29,31 continues to do so for this test set, overshooting the EOM-CCSDT-3 in essentially all cases, with an average deviation of 0.18 eV, and a maximum deviation of 0.32 eV in the 11A1″ n → π* excited state of s-triazine. We expect that the contribution of the other neglected terms of the same order mentioned above, which are being neglected to keep the approximation ∼ n7, would bring the excitation energy closer to that of EOM-CCSDT-3. Shiozaki et al. incorporates the quadruples contribution for small molecules including CH+, C2, and H2CO, and shows that the quadruples contribution lowers the excitation energy even more,6 as in earlier work,57 however, those molecules are too small to reach broad conclusions, as is witnessed in this study when EOM-CCSD(T̃ ) is performed with larger molecules and the ground state is treated differently between the two methods (EOM-CC(2)PT(2)6 and EOM-CCSD(T̃ )). The hypothesis remains that the following substantial term, present in Q3 for this method



RESULTS AND DISCUSSION Benchmarking of Excitation Energies. In Figure 1, a correlation plot is constructed for visual demonstration of EOM-CCSD compared to EOM-CCSDT-3. Not surprisingly, triples effects cannot be neglected in terms of absolute values compared to experiment.51 The largest deviation with this

abc rijk =

m

block sizes molecule

electrons

cytosine thymine adenine guanine

21 24 25 28

basis functions

atomic orbital

occupied

virtual

229 261 275 298

33 33 35 33

21 17 18 20

30 33 30 33

e

(46)

would cancel to a degree with the neglected terms of the same order. The EOM-CCSD(T) method performs quite well for this large test set; better than expected, as mentioned by Shiozaki et al.6 alluding to earlier studies,28,29,31 and most likely is due to the wide range of molecules in this set. The average absolute deviation is 0.06 eV, within the range of desired accuracy for excited states. The maximum deviation is 0.15 eV for the 31A1 π → π* state of furan. Comparison with CCSDR(3) is insightful, as the largest deviations typically occur in a side-by-side manner. For instance, the largest deviation of CCSDR(3) is for the 21A1g π → π* state of all-E-hexatriene, the state with largest deviation in EOM-CCSD. The deviations for CCSDR(3) and EOM-CCSD(T) are 0.16 and 0.13 eV, respectively. This trend is demonstrated in Figure 3. The maximum deviation for each system is plotted for both methods as well as the EOMCCSD(T) deviation for the root that corresponds to the root

Table 1. Parameters for each Nucleobase Used to Demonstrate Performance of EOM-CCSD(T) Calculationsa b

∑ rmaIIIijkmbc + ∑ rieIIIejkabc

a

Details on block sizes can be found in ref 7. bThe number of doubly occupied valence orbitals are reported; the core orbitals have been dropped in the calculation. 2573

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

Figure 2. Correlation plot for calculated single excited states with EOM-CC methods vs CCSDT-3.

Figure 5. Correlation plot for calculated single excited states with CCSD(T) vs CCSDR(3).

Figure 3. Maximum deviations (errors) from EOM-CCSDT-3 for each system, ordered by increasing basis set size, using the EOMCCSD(T) and CCSDR(3) methods. For detailed comparison between the two methods, the error in the root obtained with EOM-CCSD(T) corresponding to the root with maximum error using CCSDR(3) is also plotted.

an accurate description of excited states, and the maximum errors in each typically coincide. The maximum deviations of only five molecules of the 24, do not coincide. The two correlation plots in Figure 4 compare the linear response triples methods with EOM-CCSDT-3. There is little difference between the methods, supporting previous work of Sauer et al.51 Unlike EOM-CCSD(T) which only requires matrix diagonalization for a matrix of the size of single and

from CCSDR(3) with maximum deviation, to demonstrate that the excited state roots with maximum deviation are typically the same for each method. The CCSDR(3) results and the EOMCCSD(T) are both within the range of accuracy necessary for

Figure 4. Correlation plot for calculated single excited states with the linear response CC methods vs CCSDT-3. 2574

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

Table 2. Parameters and Timings of the ACESIII Calculations on Cytosine and Guanine and the Corresponding Watson−Crick Base Pair, Where All Calculations Employ the aug-cc-pVDZ Basis Set no. of

elapsed time (s)

system

atoms

val. els.

bf

CPUs

CCSD

EOM-CCSDa

(T)

total time (h)b

cytosine

13

42

229

Thymine

15

48

261

adenine

15

50

275

guanine

16

56

298

98

527

7050 3936 2273 1528 1170 14 609 6918 3751 2087 1326 63 349 34 210 18 993 12 581 9346 33 124 16 849 8354 4494 2625 276 499 152 370 85 439 51 701

5.47 3.07 1.84 1.13 0.90 10.43 5.38 3.13 1.99 1.39 27.31 14.95 8.51 5.51 4.17 25.72 13.51 7.09 4.09 2.80

29

1077 612 364 261 229 1509 811 441 264 175 2700 1521 870 624 540 4468 2347 1245 693 413 16 886 9673 5619 4135

10 192 5692 3464 1925 1517 18 734 10 132 6214 4299 3114 28 681 16 057 9544 5816 4519 49 825 26 685 14 405 8611 6448

cytosine−guanine

32 64 128 256 512 32 64 128 256 512 32 64 128 256 512 32 64 128 256 512 256 512 1024 2048

378 000 100 335 51 929

193.1 55.48 31.80

a

Timings reported are for the construction of H̅ and both the left and right EOM solutions for the lowest excited state. bReported time includes the SCF reference computation, CIS solution, integral transformation, CCSD for the ground state, left and right EOM-CCSD solutions for the lowest root, and the (T) solution for the corresponding root.

double excitations, the iterative ∼n7 CC3 method and the EOM-CCSDT-3 method require diagonalization of a matrix whose rank is the dimension of triples. This leads to an ∼n8 step in the iterative diagonalization that is worse than the underlying ∼n7 iterative procedure for the ground state solution itself. Consequently, they tend to be too prohibitive to be widely used for larger molecules, so we choose to focus on the perturbative corrections. The average absolute deviation from EOM-CCSDT-3 is 0.03 eV with a maximum deviation, as mentioned earlier, of 0.16 eV for the 21A1g π → π* state of allE-hexatriene. For further detail, a correlation plot for EOMCCSD(T) compared to CCSDR(3) is shown in Figure 5. The

Table 3. Preliminary Timing and Efficiency (Referenced to 400 Processors) for Thymine in an aug-cc-pVTZ Basis Set (552 Basis Functions)a NCPUs

time (sec)

scaling efficiency (%)

400 800 1600 3200 6400

33 952 17 717 9938 5089 2947

100 96 85 83 72

a Calculations were performed with testing allocations on the experimental Blue Waters Cray XE6.

Figure 6. A log plot depicting the scaling behavior of the EOM-CCSD(T) calculation for various nucleobases in an aug-cc-pVDZ basis set. Part A corresponds to the total time, and part B corresponds to the triples piece only. The y-axis for each nucleobase is shifted for clarity and the dashed line represents the ideal referenced to 32 pocessors. 2575

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

Figure 7. A log plot depicting the scaling behavior of the EOM-CCSD(T) calculation for the Guanine-Cytosine dimer in an aug-cc-pVDZ basis set. Part A corresponds to the total time, and part B corresponds to the triples piece only. The dashed line represents the ideal scaling referenced to 256 processors.

individual timings for the different spin blocks are available in the Supporting Information. Also, since point-group symmetry is not taken into account, the floating point operations for one root will be identical for any root that is sought. This allows one to parallelize over multiple excited state solutions and obtain, essentially, the same time for the (T) part, regardless of the number of roots. In fact, assume that one is interested in obtaining X roots, or excited state solutions for the GC pair and can run on a reasonable number of processors, e.g., 60 000. Since the EOMCCSD code has been shown to scale well up to 60 000 processors, then by running X-ααα and X-ααβ independently, each on 512 processors, one can compute the triples correction for

absolute average deviation of EOM-CCSD(T) from CCSDR(3) is 0.08 eV with a maximum deviation of 0.66 eV for the 21B1u π → π* state of all-E-octatetraene. However, the CCSDR(3) differs from CC3 for this state by 0.85 eV. The EOM-CCSD(T) differs from CC3 by only 0.19 eV. Sauer et al. recognize that all-E-octatetraene provided a few outliers, but EOM-CCSD(T) does not seem to follow this trend. Parallel Performance. With no statistically distinguishable difference between CCSDR(3) and EOM-CCSD(T) over this test set, we focus on the massively parallel performance of the EOM-CCSD(T) method. As this method is computationally equivalent to the ground state ΛCCSD(T) equations,15,16,58 except that the Λ̂ and T̂ amplitudes are replaced by 3̂ k and 9̂ k amplitudes, respectively, the existing ground state ΛCCSD(T) code can accommodate this method with minor changes. To this end, more appropriate test systems that demonstrate the capabilities of ACESIII are chosen for representative timings. These include the cytosine, thymine, adenine, and guanine nucleobases, as well as the cytosine-guanine Watson− Crick base pair. Complete timings for each step of the computations, including the SCF reference, CIS solution, and integral transform are given in the Supporting Information; these are only fractions of the total time. Focusing on the individual nucleobases demonstrates reasonable performance across the number of processors. Each calculation was purposefully submitted in a “black-box” manner; simply by specifying the molecular geometry, basis set, and number of processors desired. One can obtain better efficiency at higher processor count by modifying the block sizes (and subblock sizes). Preliminary data for thymine in an aug-cc-pVTZ basis set, taking into account appropriate block sizes as well as fine-tuning the code, are shown in Table 3. The efficiency is very good across a wider range of processers. However, this is not “black-box”, and current work is aimed at achieving this high efficiency in a seamless manner. Timings for thein this respectmost relevant system, the GC pair, are shown in Table 2, along with the nucleobase monomers. The final column reports the total time in hours for the complete calculation including only the lowest EOMCCSD excited state solution, for the right and left eigenvectors, as well as the corresponding EOM-CCSD(T) calculation. The (T) column is slightly misleading as the calculations can be partitioned according to the ααα and ααβ summations indices and can therefore be performed separately. The table lists the cumulative time of these two parts, but in essence, the different summations can be “parallelized” for time savings. The

(60000 procs)/(512procs root−1(spin case)−1) /(2 spin cases) ≈ 58 roots

This is a bit of hyperbole which is only meant to show that, for reasonably sized systems, the EOM-CCSD solutions will, potentially, be the rate-determining steps, as each successive root requires previous iteration information. The scaling curves for the total calculation on each nucleobase and the triples portion are shown in Figure 6; Figure 7 shows the scaling results for the GC pair. Thymine and adenine exhibit what appears to be superscaling performance at low processor count. However, this is an artifact of the chosen block sizes (the unit of parallelization in ACESIII) being inadequate for the smaller processor range. As the size of the nucleobase increases from cytosine to guanine, the behavior is as expected, the larger the system, the better the performance across cores. This enables larger molecules to be routinely studied in an efficient manner as exemplified by the GC scaling curves.



CONCLUSIONS In this work, we present a detailed derivation of the economical EOM-CCSD(T̃ ) and EOM-CCSD(T). We implement these methods in a massively parallel environment for their broad use for molecular systems of moderate size, and compare to the reference EOM-CCSDT-3 method, and other approximations. A surprising observation, though anticipated earlier,57 is the EOM-CCSD(T̃ ) method typically overshoots the EOMCCDST-3 result with an average absolute deviation of 0.18 eV and a maximum deviation of 0.32 eV. This is most likely due to the neglect of other terms of the same order because of their 2576

dx.doi.org/10.1021/jp308634q | J. Phys. Chem. A 2013, 117, 2569−2579

The Journal of Physical Chemistry A

Article

from the action of the permutation operator are independent and can be computed separately. Consequently, we can compute all nine permutations in each triples amplitude at the same time, parallelizing over three virtual indices. This is how the EOM-CCSD(T) method is implemented. Parallelizing only over the three virtual indices does provide a bottleneck, as the performance will begin to flatten when the number of virtual blocks cubed is equal to the number of processors. The EE-EOM-CCSD(T̃ ) amplitude equations are a bit more tedious. Explicitly, they are given in spin−orbital form, using the aforementioned expansion of H̅ , as

excessive scaling. The EOM-CCSD(T) method performs just as well as EOM-CCSDT-3, with an average absolute deviation of 0.06 eV and a maximum deviation of 0.15 eV. We therefore conclude that the EOM-CCSD(T) may perform better for excited states than what was previously anticipated from test calculation on smaller molecules.6,27 The linear response analogues are also shown to give excellent agreement compared to EOM-CCSDT-3. The CCSDR(3) and CC3 methods have average absolute errors of 0.03 and 0.05 eV, respectively with a maximum deviations of 0.16 eV for both, though for the same molecule and state hexatriene. There is also no statistical difference between EOMCCSD(T) and CCSDR(3). The parallel implementation of the EOM-CCSD(T) method demonstrates excellent performance across a wide range of processors, and is shown to scale better as the size of the system increases. This combination of accuracy and efficiency thus enables the accurate description of vertical excitation energies on a wide range of chemically relevant systems. In particular, we plan on examining nucleobase and nucleobase networks (dimers, sugar attached, and microhydrated structures).1,2

abc[2] abc d rijk = (P(i/jk) ∑ IIIdjk ri − P(a /bc) ∑ IIIijklbcrla d



+

ld

1 + 2

=A+B

ijk[2] labc = P(a /bc)P(k /ij) ∑ laeijW̅ bcek e im jk − P(i/jk)P(c /ab) ∑ lab W̅ mc + P(k /ij)P(c /ab)⟨ij||ab⟩lck l

+

ij k P(k /ij)P(c /ab)lab f c̅

(52)

The construction of the right-hand eigenvector is only two times more expensive to compute than in the EE-EOMCCSD(T) variant. Each of the computationally expedient terms, A and B, can be programmed as δR3̂ k = P(a /bc)P(k /ij) ∑ Y ekbcXijae e

e

= P(a /bc)P(k /ij) ∑

de

l

abc[2] rijk = P(a /bc)P(k /ij) ∑ ⟨bc||ek⟩rijae

ijk[2] labc

∑ IIIdjeabcrikde

ab W̅ jkmcrim )



The difference between the methods is in the amplitude equations. In the EOM-CCSD(T) method, the spin−orbital equations are

l

ld

1 ac + P(b/ac) ∑ IIIijklbmrlm ) 2 lm e

APPENDIX The details of the implementation of the triples corrected excitation energy are important, as the method formally scales as no3nv4. The triples method is ideally suited for ACESIII, since the data passing is for arrays of, at most, size nonv3, and the entire time of the computation is spent in the contraction step. The expression for the correction to the energy is identical for EOM-CCSD(T̃ ) and EOM-CCSD(T) ijk[2] 1 abc[2] Δω[4] = ∑ ∑ labc rijk εijkabc i