Efficient Computation of the Hartree–Fock Exchange in Real-Space

Jul 7, 2016 - Efficient Computation of the Hartree–Fock Exchange in Real-Space with ... an efficient projection-based real-space implementation of t...
1 downloads 0 Views 2MB Size
Article pubs.acs.org/JCTC

Efficient Computation of the Hartree−Fock Exchange in Real-Space with Projection Operators Nicholas M. Boffi,†,§ Manish Jain,‡ and Amir Natan*,†,∥ †

Department of Physical Electronics, Tel-Aviv University, Tel-Aviv 69978, Israel Department of Physics, Indian Institute of Science, Bangalore 560 012, India § Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States ∥ The Sackler Center for Computational Molecular and Materials Science, Tel-Aviv University, Tel-Aviv 69978, Israel ‡

S Supporting Information *

ABSTRACT: We describe an efficient projection-based real-space implementation of the nonlocal single-determinant exchange operator. Through a matrix representation of the projected operator, we show that this scheme works equally well for both occupied and virtual states. Our scheme reaches a speedup of 2 orders of magnitude and has no significant loss of accuracy compared to an implementation of the full nonlocal single-determinant exchange operator. We find excellent agreement upon comparing Hartree−Fock eigenvalues, dipoles, and polarizabilities of selected molecules calculated using our method to values in the literature. To illustrate the efficiency of this scheme we perform calculations on systems with up to 240 carbon atoms.



INTRODUCTION Density functional theory (DFT)1 within the Kohn−Sham formalism2 has had remarkable success in the prediction of structural and ground-state electronic properties of condensed matter systems. In the past two decades, a new class of exchange-correlation approximations, called hybrid functionals, have been introduced. Such functionals mix a fraction of the nonlocal single-determinant exchange with the conventional DFT exchange-correlation functionals.3 These functionals have a formal justification within the framework of generalized Kohn−Sham theory4 and have received significant interest due to improved agreement with experiments3 when compared with Kohn−Sham DFT alone. Furthermore, the improved accuracy incurs only a moderate computational cost. A related class of approximations is the set of screened exchange functionals, which includes examples such as HSE5,6 and other range separated functionals.7−10 Screened exchange methods have shown success in the description of molecules, solids, insulators, and metals and are therefore becoming a necessary tool in many theoretical calculations. A key step in the implementation of hybrid or screened hybrid functionals is the ability to include the nonlocal singledeterminant exchange within the conventional Kohn−Sham formalism. In one extreme limit, the hybrid functional formalism reduces to just the self-consistent solution of the Hartree−Fock (HF) equations. In this sense, hybrid functionals bridge the gap between DFT and HF theory.3,11 One of the bottlenecks which prevents universal usage of hybrid functionals in condensed matter systems, particularly for large © 2016 American Chemical Society

systems, is the computational cost of the nonlocal singledeterminant exchange operator. While this cost is minor for small molecules, it becomes prohibitive for larger systems whether they are isolated or periodic. Within the quantum chemistry community, localized Gaussian basis sets are often used to represent the single particle wave functions. Using these basis sets, the HF equations are solved self-consistently via the well-known Roothaan equations in the basis set representation.11 The local Gaussian basis set representation has two main advantages: First, the basis size is relatively small, and so the Hamiltonian can be constructed explicitly and directly diagonalized. Second, the nonlocal single-determinant exchange electrostatic integrals can be calculated analytically or prepared in advance for the basis functions.11 While the Gaussian basis set representation is very efficient, there are several reasons one may want to solve the equations in other orthogonal basis representations such as plane-waves,12−14 finite-difference,15−17 wavelets,18 finite-elements, or a Galerkin basis.19 Such basis representations enable calculations on large systems with flexible boundary conditions and with a consistent strategy to expand the basis representation and hence increase the simulation accuracy. A common problem shared with all of the listed bases is that a naive calculation of the nonlocal singledeterminant exchange operator can be extremely expensive for Received: April 14, 2016 Published: July 7, 2016 3614

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation

Projection Approximation Details. Recently, Duchemin and Gygi20 have suggested approximating the nonlocal singledeterminant exchange operator by projecting onto the manifold of occupied states. This approximation becomes exact for the occupied states at convergence of the SCF. While they have used this scheme in the plane-waves basis, the suggested approximation can be used with any other basis, including a real-space grid. The approximation is based on the fact that we can write20

large systems. As a result, nonstandard strategies are required for an efficient implementation of this operator. In this work, we propose the use of a projection scheme to alleviate the computational expense of the self-consistent HF equations in real-space. This scheme is based on a method that was suggested by Duchemin and Gygi20 in the plane-waves basis for the occupied states. As has been explained above, an efficient implementation of HF is a prerequisite for the use of hybrid functionals. We first examine an implementation of our method in real-space for isolated boundary conditions; we then show the applicability of this scheme to virtual states. After presenting the details of our real-space implementation, we demonstrate the approach by calculating properties such as eigenvalues, dipoles, and polarizabilities of several organic molecules. To illustrate the computational efficiency of this scheme, we also report studies on systems containing up to 240 carbon atoms (480 orbitals) using this method.

K̂ = K̂ ·P ̂ + P ·̂ K̂ − P ·̂ K̂ ·P ̂ + Q̂ ·K̂ ·Q̂

where P̂ = |φnσ⟩⟨φnσ| projects onto the subspace of occupied states from the previous SCF cycle, and Q̂ = I ̂ − P̂ . We can now approximate K̂ by dropping the last term in eq 5:20 K͠ ̂ ≡ K̂ ·P ̂ + P ·̂ K̂ − P ·̂ K̂ ·P ̂



(6)

The neglected term is identically zero at convergence for the occupied states, and so at self-consistency the approximation should give the exact operator for the occupied states. To gain some additional insight into the structure of this approximation, we examine its matrix representation in the basis of eigenstates of the single-particle Hamiltonian. The projection operator on the occupied states in the basis of these eigenstates can simply be written as

REAL-SPACE IMPLEMENTATION The Hartree−Fock equations (in atomic units) in the realspace representation can be written as ⎛ ∇2 ⎞ + Vpŝ + VH − K̂ ⎟φn , σ = ϵn , σ φn , σ ⎜− ⎝ 2 ⎠

(5)

σ ∑Nn=1

(1)

where the ionic pseudopotential, V̂ ps, is defined as in Chelikowsky et al.,16 and the Hartree potential, VH, is defined by the electrostatic interaction VH( r ⃗) =



d r1⃗

ρ( r1⃗) | r ⃗ − r1⃗|

∑σ ∑Nn=1

where the top-left quarter of the matrix connects occupied states to occupied states and the bottom-right connects empty states to empty states. In this matrix representation, we have Pij Nσ = ∑n=1 ⟨φiσ|φnσ⟩⟨φnσ|φjσ⟩ which equals δij if both i and j are occupied and zero otherwise. Note that in the real-space basis the number of occupied states is the number of electrons in the system while the number of empty states can reach millions (as it depends on the size of the grid). Using this matrix representation of eq 7, we can now rewrite the block elements of K͠ ̂ in eq 6 as

(2)

where ρ(r)⃗ = |φn,σ| is the density. The nonlocal single-determinant exchange operator, K̂ , is defined by N

K̂ ψσ(r ) =



∑ ⎜⎜∫ n=1



d r1⃗

2

φn*, σ ( r1⃗) ψσ( r1⃗) ⎞ ⎟φ ( r ⃗ ) | r ⃗ − r1⃗| ⎟⎠ n , σ

N



∑ Vn,ψ ,σ( r ⃗) φnσ ( r ⃗) n=1

(3)

To solve the HF equations (eq 1) in real-space, we represent the orbitals, φn,σ, on a uniform discrete grid and replace the Laplacian operator by its high order finite difference equivalent as implemented in the PARSEC code.16,21,22 The Coulomb integrals for the Hartree and nonlocal single-determinant exchange calculation are calculated by solving the equivalent Poisson equation with the iterative conjugate gradient method: 2

∇ Vn , ψ , σ(r ) = −4πφn*, σ ( r ⃗) ψσ( r ⃗)

And finally combining all the terms, eq 6 becomes

where Q̂ = I ̂ − P̂ . K̂ OO is the restriction of K̂ to the subspace of occupied states, and the representation of K̂ OO in the full Hilbert space is P̂K̂ P̂. In a similar manner, K̂ OE and K̂ EO connect occupied to virtual (empty) states. From the definition of K͠ ̂ in eq 9, it is easy to see that at self-consistency the projected operator reduces to the full operator for the occupied states. This is simply because the part of the matrix which operates on the occupied states is identical to the original operator. This can also be seen from eqs 5 and 6 as the term we dropped, Q̂ ·K̂ · Q̂ , is identically zero for the occupied states. Furthermore, it is evident that this representation stays Hermitian. Other choices of representation such as K̂ ·P̂ would also give the correct value for the occupied states at self-consistency but would not preserve the Hermitian structure of the operator. Many eigensolvers require or assume a Hermitian operator, and therefore the choice of a Hermitian representation is imperative

(4)

While it is possible to use eqs 3 and 4 to directly solve the HF equation, its solution on a real-space grid is very expensive. One approach to alleviate the problem is to develop methods for faster Poisson solvers for eq 4 or for its equivalent integral form.23−27 While there has been significant progress in this domain, the direct calculation of K̂ can still be an expensive task even with the use of such fast solvers. This is mostly because the eigenvalues are found by an iterative eigensolver which typically requires hundreds or thousands of calculations of the Fock operator per self-consistent field (SCF) iteration. In this work, we present an alternative approach which utilizes projection operators to circumvent re-evaluation of the full nonlocal single-determinant exchange operator. 3615

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation

full nonlocal single-determinant exchange operator. Furthermore, we compared our projected operator results to Gaussian basis set HF calculations that are found in the CCCBDB31 database at NIST. Atom coordinates for the molecules were taken from the CCCBDB optimized HF geometry for the augcc-pVTZ basis. We compared HF eigenvalues, dipole moments, and polarizabilities for several molecules. One difference between data in the CCCBDB and our calculations is the use of pseudopotentials in our calculations. In order to improve the comparison with all-electron HF calculations, we used HF norm conserving pseudopotentials generated with the Opium code.32 Projection Accuracy for Occupied and Virtual States. To check the projection accuracy, we have compared the projection operator scheme to the full nonlocal singledeterminant exchange operator (i.e., implemented without any approximation). It is well-known that positive virtual states of small molecules are generally strongly basis dependent, and their energies will tend to zero when the basis is large enough.33,34 We therefore focus only on the occupied states of small neutral molecules. In cationic systems and for large molecules, there is a finite or infinite number of negative virtual states. These states require a large basis when they are close to zero. Such negative virtual states have a definite physical interpretation as the electron affinity of the cationic molecule. While the domain size affects the accuracy of virtual eigenvalues, one can still check the correctness of the projected scheme by comparing results to the full nonlocal singledeterminant exchange operator for fixed domain size. We first demonstrate that the projection scheme is correct by examining the first 47 states of neutral C2H6 with a spherical domain of r = 14 au. For comparison, we project onto four virtual states and 40 virtual states and finally also do the full calculation without any projection. Figures 1 and 2 demonstrate that the error between the calculations using the full nonlocal single-determinant exchange

for convergence. Numerical experimentation demonstrates that the K̂ ·P̂ representation requires more cycles to converge with a Chebyshev−Davidson solver28 and does not converge at all with ARPACK.29 These results are discussed in the Supporting Information (SI). It should be noted that the operator approximation of eq 9 is exact only for the occupied states, as is also evident by examining the matrix representation. We now extend this construction to include some low-lying virtual states in addition to the set of occupied states. In order to do so, we replace the projection operator P̂ by P̂ M = P̂ + Q̂ M where Q̂ M projects on the first M virtual states that we want to calculate. Similar to the above equations, we can write

Using this matrix representation of the projection operator in the basis of the eigenstates of the single-particle Hamiltonian, K̂ can be approximated as

where, as before, K̂ OO connects occupied to occupied, K̂ MO and K̂ OM connect occupied to the first M virtual, and K̂ EO and K̂ OE connect occupied to the rest of the virtual states. The structure of the operator shown by eq 11 suggests that we can get the correct eigenvalues for the M virtual states in addition to the occupied states. It is evident that if the first M virtual states are correct, or even just span the correct subspace, the approximation of eq 11 should give the exact result also for those states. In practice, we see that the differences between the projected and the full operator eigenvalues are less than 2 × 10−4 eV for both the occupied and the first M virtual states when a grid of h = 0.3 au and an orbital SCF tolerance of 10−8 Ry are used. When the desired number of empty states is not large, it is significantly less expensive to use the projected operator over the full nonlocal single-determinant exchange operator. This approach is close in spirit to the recent “Adaptively Compressed Exchange” (ACE) operator by Lin Lin30 but differs in the explicit projection scheme of the approximation that is given in eq 11. Using K͠ ̂ as described by eq 11 and combining it with a selfconsistent cycle as described in Appendix A, our implementation obtains around a 100-fold speedup compared to the evaluation of the full operator as in eq 4. At each orbital SCF cycle, we take the orbitals of the last cycle, ψn, and calculate K̂ |ψn⟩, we call this the “Fock preparation stage,” and we show later that this is the most time-consuming part of the calculation. We then use the projected operator K͠ ̂ [{ψ }] for the inner density SCF diagonalization as described in Appendix A. We employed a conjugate gradient Poisson solver for the Fock pairs calculation. With a faster solver, the speedup would be smaller but still significant. A detailed analysis of the terms affecting the speedup is found later in the text and in Appendix B.

Figure 1. Comparison of the first 20 eigenvalues of C2H6 with r = 14 au. The full calculation is shown with red circles. The four-state extended projection is shown with blue crosses and the 40-state extended projection with black pluses. The vertical dashed line shows the highest occupied state. The eigenstates with positive values are virtual states that are affected by the domain size.34

operator and the approximate projected operator is less than 0.0002 eV when using a grid spacing of h = 0.3 au. We attribute the remaining discrepancy to the integration scheme used for the projection operator. This error can be made much smaller with higher order integration methods.35 We show a similar analysis for the α and β channels of C2H+6 in the Supporting



RESULTS To check the implementation, we first compared calculations on several molecules using both the projected operator and the 3616

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation

Table 1. Hartree−Fock Leading Eigenvalues for Spin Polarized Molecules (*calculated with h = 0.25 au instead of 0.3 au) Hartree−Fock Eigenvalues (values in eV) molecule

h = 0.4 au

h = 0.3 au

aug-cc-pVTZ (CCCBDB)

−16.25 −14.91 −25.72 −13.35 −13.68 −16.34 −11.31 −17.61 −17.81 −12.3 −12.68 −12.76 −9.35 −13.43 −13.93 −10.19 −10.38 −12.97 −9.3 −9.8 −12.39 −8.14 −8.9 −10.67 −7.29 −8.61 −9.65

−16.26 −14.89 −25.78 −13.29 −13.83 −16.32 −11.53 −17.33 −17.37 −12.28 −12.69 −12.76 −9.23 −13.45 −13.82 −10.09 −10.3 −12.52 −9.26 −9.69 −12.36 −8 −8.77 −10.56 −7.17 −8.50 −9.53

−16.23 −14.86 −25.72 −13.28 −13.86 −16.28 −11.59 −17.32 −17.32 −12.27 −12.68 −12.75 −9.17 −13.44 −13.75 −10.05 −10.28 −12.45 −9.23 −9.65 −12.38 −8.01 −8.75 −10.53 −7.19 −8.48 −9.52

H2 CH4 C2H6

*NH3

Figure 2. Differences in eigenvalues in comparison to the exact nonlocal single-determinant exchange operator. Calculations are for C2H6 using a domain size of r = 14 au. The 40-state extended projection is shown with large red circles, and the four-state extended projection is shown with smaller blue circles. The vertical dashed line shows the last occupied state.

chloroform (CHCl3) benzene

Information, where we achieve an error below 0.001 eV for both channels with h = 0.4 au. We also completed a PBE0 calculation for the virtual states density of states (DOS) of C60; see the SI. In all examples, it is evident, as the matrix representation eq 11 suggested, that the use of M virtual states in an extended projection scheme will provide accurate eigenvalues for M virtual states. Occupied States, Dipoles, and Polarizabilities. The occupied states, dipoles, and polarizabilities can be calculated without the use of any virtual states in the projection. We provide these calculations to demonstrate the correctness of the implementation. Virtual states are used for cationic systems and for the density of states of C60, C180, and C240, as well as for additional calculations in the SI. Occupied States. We have calculated the occupied states for the spin-unpolarized molecules H2, CH4, C2H6, ammonia (NH3), chloroform (CHCl3), methyl-chloride (CH3Cl), benzene (C6H6), naphthalene, (C10H8), anthracene (C14H10), C60, C180, and C240. We have also calculated the occupied states for the spin-polarized molecules CH3, SiCl3, CH+4 , and C2H+6 . For the cationic systems, we furthermore show the first three virtual states of each spin channel. For all molecules (excluding Cn and anthracene), we have used the atomic coordinates reported in the CCCBDB optimized using the aug-cc-pVTZ basis set. Although this basis is not always optimal for geometry, our intention was merely to compare the HF eigenvalues reported in the database to the ones calculated using the projection scheme. For anthracene, we have used the 6-31+G** basis optimized coordinates. Table 1 shows the comparison between our calculated values of the HOMO (Highest Occupied Molecular Orbital), HOMO−1, and HOMO−2 and those in the CCCBDB. As can be seen from the table, in all cases, the difference in eigenvalues between our calculations and the CCCBDB data is less than 0.1 eV (and less than 0.05 eV in most cases). It should be noted that the difference from the basis sets results is not due to the approximation scheme of eq 11 as we have shown that it reaches a much smaller error relative to the full operator. One can systematically improve this comparison by going to finer real-space grids as well as using harder pseudopotentials. We also performed spin polarized calculations on neutral and cationic molecules. Table 2 compares the leading eigenvalues calculated with the projection scheme to the CCCBDB. As also

*nitrobenzene

Cl-benzene

naphtalene

anthracene

Table 2. Hartree−Fock Leading Eigenvalues for Spin Polarized Molecules Hartree−Fock Eigenvalues (values in eV) molecule CH3

SiCl3

CH+4 , virtual

CH+4 , occupied

C2H+6 , virtual

C2H+6 , occupied

3617

h = 0.3 au, α

aug-ccpVTZ, α

−10.4

−10.45

−15.91, −15.93 −25.97 −10.24 −12.86 −13.22 −2.39 −2.43 −3.92 −22.45

h = 0.3 au, β

aug-ccpVTZ, β

−15.88

−15.51, −15.53 −23.45

−15.49 −23.42

−25.87 −10.2 −12.8 −13.21 −2.42 −2.47 −3.92 −22.43

−12.79 −13.1 −13.62 −2.37 −3.71 −8.29 −24.99

−12.76 −13.1 −13.7 −2.4 −3.73 −8.27 −24.96

−25.33 −26.86 −2.08 −2.36 −3.01 −20.27

−25.29 −26.85 −2.18 −2.44 −3.05 −20.26

−26.08 −34.31 −2.3 −2.98 −9.3 −20.05

−26.08 −34.3 −2.39 −3.02 −9.24 −20.03

−21.82 −23.31

−21.79 −23.33

−21.24 −22.73

−21.2 −22.75

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation seen in the previous table, the eigenvalues are in excellent agreement with each other. To demonstrate the scalability of this method, we performed HF calculations on C60, C180, and C240. In order to obtain the atomic coordinates of these molecules, we started with a forcefield optimized structure.36 The Hartree−Fock calculations of C180 and C240 were performed with a grid spacing of 0.4 au without additional geometry relaxation. For C60, we further relaxed the coordinates with LDA and then performed Hartree−Fock calculations on the LDA relaxed structure with a grid spacing of 0.4 au. All calculations were performed on a single node with up to 16 Xeon cores (two CPUs of eight cores). The results for the DOS are shown in Figure 3. The

Table 3. Hartree−Fock Dipole Values for Selected Molecules calculated dipoles (values in debye units) molecule

h = 0.4 au

h = 0.3 au

aug-cc-pVTZ (CCCBDB)

CO H2O NH3 nitrobenzene aniline Cl-benzene chloroform HNC HCN nitro-methane NH2COOH

0.33 1.94 1.76 4.88 1.74 2.02 1.2 2.61 3.6 4.35 2.98

0.15 1.97 1.56 4.99 1.46 2.04 1.23 2.97 3.26 3.96 2.7

0.149 1.94 1.54 4.9 1.43 1.97 1.19 2.95 3.26 3.95 2.67

there is no aug-cc-pVTZ polarizability value in the database). As shown in Table 4, we need a fine grid spacing of 0.3 au for convergence. Our converged polarizabilities are in excellent agreement with the CCCBDB reported values. Table 4. Hartree−Fock Polarizability Values for Selected Molecules calculated polarizabilities (values in Å3)

Figure 3. DOS of C60 computed using our projection scheme with a grid spacing of 0.4 au is shown in solid blue, C180 in dashed black, and C240 in solid cyan. The NWCHEM calculation using the 6-311G** basis is shown in dashed red for C60. The structure of C240 is shown in the inset.

molecule

h = 0.4 au

h = 0.3 au

aug-cc-pVTZ (CCCBDB)

CH4, αxx αyy αzz benzene, αxx αyy αzz naphtalene, αxx αyy αzz anthracene, αxx αyy αzz

2.39 2.39 2.39 11.29 11.35 6.51 9.62 23.48 17.38 12.66 39.09 23.6

2.39 2.39 2.39 11.34 11.39 6.61 9.75 23.56 17.4 12.83 39.26 23.61

2.35 2.35 2.35 11.52 11.52 6.63 9.75 23.47 17.35

Algorithm Speed Analysis. To demonstrate the performance of the code with the projection method, we have measured the restricted Hartree−Fock calculation times of both the full operator and the projected method for H2, CH4, C2H6, C6H6, C20, C30, and C40. The last three molecules were taken from force field optimized data.36 The runs were performed on a single node with two Xeon processors having eight cores each. The clock speed for each core was 2.6 GHz. All the runs were performed with a grid spacing of 0.4 au for a balance of speed and accuracy. Another factor that affects performance is the domain size: larger molecules require more extensive domains and thus longer calculation times, both due to the increased number of states and the size of the resulting algebraic systems. We used spherical domains with radii of 8 au, 10 au, 10 au, 12 au, 12 au, 14 au, and 14 au for H2, CH4, C2H6, C6H6, C20, C30, and C40, respectively. All systems were calculated with an extended projection onto four virtual states and with the Chebyshev− Davidson eigensolver. This solver was found in practice to be faster than the ARPACK solver. Figure 4 shows the total run time for the projection method and the Fock preparation time. It is easy to see that the diagonalization time becomes a small part in the larger molecules (around 10%).

DOS graph was calculated by convolving the results with a Gaussian function with σ = 0.5 eV. For C60, we have also calculated the DOS with NWCHEM37 with the 6-311G** basis. The two calculations are virtually identical over the energy range considered. Dipoles. We have compared the dipoles of small molecules with the dipoles reported in the CCCBDB database. We studied the following molecules: H2O, CO, NH3, nitrobenzene (C6H5NO2), aniline (C6H7N), and Cl-benzene (C6H5Cl). The results are summarized in Table 3. A grid spacing of 0.3 au was required for convergence. As explained previously, the necessity of a fine grid spacing is primarily due to the low order integration scheme used in PARSEC. The incorporation of higher order integration can lead to similar results on a coarser grid.35 Polarizabilities. We have also calculated the polarizabilities of CH4, benzene (C6H6), naphthalene (C10H8), and anthracene (C14H10) by calculating the derivative of the dipole with respect to the field as described by Vasiliev et al.38 We compare to the CCCBDB values for the first three molecules (for anthracene, 3618

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation

pertinent question. If we refer to Figure 5, a faster solver will reduce the calculation time for for both the full operator and the Fock preparation stage. Assuming we have five cycles in the density SCF, we will use the data for C20 from Figure 5 to analyze the possible effect of a faster Poisson solver on calculation speed. For the full operator and using CG as the Poisson solver, the total calculation time would be 5 × 2300 = 11 500 s. Also using CG as the Poisson solver, the time for projected operator calculation would be 77 + 5 × 1.7 = 85.5 s, giving a speed factor of around 134.5. Assuming a faster Poisson solver which computes the solution at 10 times the speed of CG, in the best case, both the full operator and the preparation stage would be accelerated by a factor of 10, while the projected diagonalization time would stay the same. We would then have a time of 5 × 230 = 1150 s for the full operator and 7.7 + 5 × 1.7 = 16.2 s for the projection scheme, leading to a speed factor of ∼70. Finally, if we have a 100-fold faster Poisson solver, we would have 5 × 23 = 115 s for the full operator and 0.77 + 5 × 1.7 = 9.3 s for the projection scheme, leading to a speed factor of ∼12. The fact that most of the calculation time is in the Fock preparation stage makes the calculation comparable in speed to the local EXX potential in the Krieger, Li, and Iafrate (KLI)39 approximation. Indeed the EXX-KLI real-space implementation40−42 achieves similar speeds to our full Hartree−Fock implementation. This is because our projected operator is only slightly slower than local potentials. Finally, we investigate the effect of more virtual states in the extended projection on performance using C20 as a testbed. We performed the calculations using between zero and 30 virtual states. The results are shown in Figure 6. Figure 6 shows a near linear growth in the total time for the calculation, primarily because more states are needed in the Fock preparation time. Even in the 30 virtual state case, it is evident that the performance is much faster than using the full operator, which demonstrates that the extended projection scheme is a valuable technique when a small to moderate

Figure 4. Total calculation time for restricted Hartree−Fock. The blue solid line with asterisks shows the total time; the dashed red line with circles shows the total time spent on the Fock preparation stage. The name of each molecule and the run times are also shown near each point.

We next compared the diagonalization time for the full operator to the diagonalization time for the projected operator, both calculated for a single density SCF cycle. It is evident from Figure 5 that diagonalization of the projected operator is around 1000 times faster than diagonalization of the full operator.

Figure 5. Comparison of diagonalization time for the full opearator (solid blue line with asterisks) and the projected operator (dashed red line with circles). Fock prepation time is shown with cyan diaomnds and a dashed-dotted line.

Typically there are five to seven diagonalization steps and one Fock preparation step in each density SCF cycle, and so the preparation step takes roughly 90% of the time in the projection scheme. We have used a conjugate gradient (CG) Poisson solver for the calculation of Fock pairs. This is not the fastest possible solver; thus the effect of a faster solver is a

Figure 6. Effect of number of projected virtual states on performance. Total time is shown with a solid blue line with asterisks. Total Fock preparation time is shown in solid red with circles, and total diagonalization time is shown in solid red with squares. Single Fock preparation time is shown in dashed black with circles; single diagonalization time is shown in dashed black with squares. 3619

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation Nσ + M

number of virtual states are of interest. It should be noted that, while the projection was suggested first for a plane-waves basis,20 its application in a localized grid based basis has a bigger impact as the Poisson solving is a larger bottleneck for localized nongaussian basis sets. This is true for both the occupied states (eq 5) projection and the extended projection (eq 11) with the additional virtual states.



PM̂ ·K̂ |ϕσ ⟩ =

|ψn , σ ⟩⟨ψn , σ |K̂ |ϕσ ⟩

∑ n=1 Nσ + M

=



|ψn , σ ⟩⟨KX , n , σ |ϕσ ⟩

n=1

and finally:

SUMMARY

Nσ + M

PM̂ ·K̂ ·PM̂ |ϕσ ⟩ =PM̂

We have presented an efficient real-space implementation of a projection method for computation of the nonlocal singledeterminant exchange operator for both the occupied and virtual states. We have demonstrated the projection accuracy by comparing the projected operator solution to the full nonlocal single-determinant exchange operator for a number of molecular systems. The method was also verified against traditional Gaussian basis set calculations. Our projection scheme, which is a generalization of the scheme demonstrated for plane-waves,20 works equally well on a real-space grid as it does for plane-waves. The solution of Poisson equation on a real-space grid is significantly slower than in the plane-waves basis, and therefore this projection scheme has an even higher impact on the speed of calculation. We expect similar methods to work with any local basis such as wavelet and Galerkin bases. The proposed method can easily be combined with other methods that aim to accelerate the calculation of the nonlocal single-determinant exchange itself (e.g., faster Poisson solvers) to gain additional speed. Furthermore, as the projection operation only requires dot products, this method can be easily parallelized using domain decomposition and MPI. This is particularly advantageous as it ensures that the proposed method is scalable to parallel supercomputing systems and can be adapted for calculations on large molecules in a straightforward way. While we have demonstrated this scheme for isolated systems, there is a natural extension to periodic boundary conditions as we have recently done for the EXX-KLI formalism.43,44 The implementation of hybrid functionals is simple, and we have already implemented PBE0 with this method. Another useful direction is the implementation of the screened exchange which will enable the use of more advanced functionals such as HSE and optimally tuned range-separated hybrids. To implement the screened exchange, we can note that the approximation we used in eqs 9 and 11 does not assume much about the operator K̂ . Thus, the essential ideas of our method can be used with any other operator, such as the operator for screened exchange or any other type of range separated operator.

∑ n=1



∑ ∑ L=1

|ψL , σ ⟩⟨ψL , σ |KX , n , σ ⟩⟨ψn , σ |ϕσ ⟩

n=1

Nσ + M Nσ + M

=

∑ ∑ L=1

|ψL , σ ⟩KX̅ , L , n , σ ⟨ψn , σ |ϕσ ⟩

n=1

Nσ + M

=



|ψL , σ ⟩KX , L , ϕ , σ

L=1

(14)

where we have defined K̅ X,L,n,σ ≡ ⟨ψ L,σ | K X,n,σ ⟩ and KX , L , ϕ , σ ≡ ∑n KX̅ , L , n , σ ⟨ψn , σ |ϕσ ⟩. Note that K̅ X,L,n,σ is calculated once in each SCF iteration just for the eigenvectors (Nσ+M), while KX , L , ϕ , σ needs to be calculated for every guess vector. Also, although it seems that we need 3(N + M) projections, we can actually do 2(N + M) projections, as the construction of the P̂M·K̂ ·P̂M part can make use of the quantities that were already calculated for K̂ ·P̂ M and P̂ M·K̂ , as is demonstrated in eq 14. We employ the following scheme for convergence. A similar scheme was suggested in the Quantum Espresso package implementation:

In many cases (but not always), the first LDA-based Hartree−Fock density calculation is very close to the final result. Regardless, to get accurate results, several orbital SCF cycles are necessary in general. In the inner loop (lines 6−10 in the above pseudocode), the operator K͠ ̂ [{ψ }] is fixed and is calculated with the {ψ} orbitals which do not change. The density is changed until self-consistency is reached. We use a linear mixer for the update of VH. We then check if the difference between the set {φ} and the set {ψ} is above the desired level of tolerance (line 12); if so, we update the orbitals and operator (lines 4−5) and repeat the inner loop. If tolerance has been reached, the outer orbital loop (lines 3−12) is finished and self-consistency is announced.

Nσ + M

⟨ψn , σ |ϕσ ⟩K̂ |ψn , σ ⟩ =

|KX , n , σ ⟩⟨ψn , σ |ϕσ ⟩

Nσ + M Nσ + M

=



Nσ + M

∑ n=1

APPENDIX A. PROJECTION IMPLEMENTATION DETAILS AND SCF STRATEGY In this appendix, we describe the details of the projection and SCF scheme. We use the symbols |ψn,σ⟩ for the nth eigenvector and define |KX,n,σ⟩ ≡ K̂ |ψn,σ⟩. We then show how we express the components of the projected operator in terms of these quantities by describing their operation on an arbitrary vector | ϕσ⟩: K̂ ·PM̂ |ϕσ ⟩ =

(13)

⟨ψn , σ |ϕσ ⟩|KX , n , σ ⟩

n=1

(12) 3620

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Journal of Chemical Theory and Computation



APPENDIX B. PERFORMANCE ESTIMATION Assume that the number of occupied orbitals is N, that the eigensolver requires P0 matrix−vector (Hψ̂ ) operations per eigenvalue, and that there are NSCF cycles on average for the density SCF. In the projection scheme, the most expensive calculation is the nonlocal single-determinant pairs calculation. We assume that the projection itself has little effect on the operation Hψ̂ as the kinetic term is generally more expensive. We can therefore write

(15)

where TFOCK is the time to calculate the exchange interaction for all Fock pairs, Teigensolver is the time of the eigensolver, TPoisson is the time it takes to calculate a single Poisson equation, THproj is the time for the Hamiltonian with the additional projection operation, and NSCF is the number of density SCF rounds for a given orbitals set. We can now also say that Tfull ∼ P0 × N × NSCF × (N × TPoisson + THHartree) (16)

where THHartree is the time for a Hartree-only Hamiltonian matrix−vector operation and THfull is the time for the full operator Hartree−Fock Hamiltonian. If we assume that for large N we have N × TPoisson ≫ THHartree, and (N + 1)/2 × TPoisson ≫ P0 × NSCF × THproj, and approximate N(N + 1)/2 ≃ N2/2, we find that Tfull /Tproj ∼ 2 × P0 × NSCF

(17)

For larger molecules, the number of Hψ̂ operations per eigenvalue tends to a fixed value. In our systems, we find experimentally P0 ≃ 15 for an ARPACK solver and P0 ≃ 13 for a Chebyshev−Davidson solverthis is further analyzed in the SI. In general, P0 depends on the level of accuracy that is requested from the eigensolver. If we use the example of C20 from the main text, we find a factor of around 2 × 13 × 5 = 130this is in very good agreement with the factor of 134.5 we determined experimentally.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00376. (1) Effect of projection on virtual states in spin polarized calculation of C2H+6 , (2) effect of projection on the density of states of virtual states of C60 calculated with the PBE0 hybrid functional, (3) matrix−vector operations scaling with different solvers, (4) additional performance analysis, and (5) performance of the K̂ ·P̂ representation (PDF)



ACKNOWLEDGMENTS



REFERENCES

(1) Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136, B864. (2) Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, A1133−A1138. (3) Koch, W.; Holthausen, M. A Chemist’s Guide to Density Functional Theory; Wiley-VCH, 2000. (4) Levy, M. Proc. Natl. Acad. Sci. U. S. A. 1979, 76, 6062−6065. (5) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. J. Chem. Phys. 2003, 118, 8207−8215. (6) Heyd, J.; Scuseria, G. E. J. Chem. Phys. 2004, 121, 1187−1192. (7) Toulouse, J.; Colonna, F.; Savin, A. Phys. Rev. A: At., Mol., Opt. Phys. 2004, 70, 062505. (8) Baer, R.; Neuhauser, D. Phys. Rev. Lett. 2005, 94, 043002. (9) Livshits, E.; Baer, R. Phys. Chem. Chem. Phys. 2007, 9, 2932− 2941. (10) Kronik, L.; Stein, T.; Refaely-Abramson, S.; Baer, R. J. Chem. Theory Comput. 2012, 8, 1515−1531. (11) Szabo, A.; Ostlund, N. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Dover Books on Chemistry Series; Dover Publications, 1989. (12) Giannozzi, P.; et al. J. Phys.: Condens. Matter 2009, 21, 395502. (13) Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. J. Phys.: Condens. Matter 2002, 14, 2717−2744. (14) Kresse, G.; Furthmüller, J. Phys. Rev. B: Condens. Matter Mater. Phys. 1996, 54, 11169−11186. (15) Hirose, K. First-Principles Calculations in Real-Space Formalism: Electronic Configurations and Transport Properties of Nanostructures; World Scientific Publishing Company Pte Limited, 2005. (16) Chelikowsky, J.; Troullier, N.; Wu, K.; Saad, Y. Phys. Rev. B: Condens. Matter Mater. Phys. 1994, 50, 11355−11364. (17) Castro, A.; Appel, H.; Oliveira, M.; Rozzi, C. a.; Andrade, X.; Lorenzen, F.; Marques, M. a. L.; Gross, E. K. U.; Rubio, A. Phys. Status Solidi B 2006, 243, 2465−2488. (18) Genovese, L.; Neelov, A.; Goedecker, S.; Deutsch, T.; Ghasemi, S. A.; Willand, A.; Caliste, D.; Zilberberg, O.; Rayson, M.; Bergman, A.; Schneider, R. J. Chem. Phys. 2008, 129, 014109. (19) Hu, W.; Lin, L.; Yang, C. Phys. Chem. Chem. Phys. 2015, 17, 31397−31404. (20) Duchemin, I.; Gygi, F. Comput. Phys. Commun. 2010, 181, 855− 860. (21) Chelikowsky, J.; Troullier, N.; Saad, Y. Phys. Rev. Lett. 1994, 72, 1240−1243. (22) Fornberg, B. Mathematics of computation 1988, 51, 699−706. (23) Hine, N. D. M.; Dziedzic, J.; Haynes, P. D.; Skylaris, C.-K. J. Chem. Phys. 2011, 135, 204103. (24) Losilla, S. a.; Sundholm, D. J. Chem. Phys. 2012, 136, 214104. (25) Khoromskaia, V.; Khoromskij, B. N. Comput. Phys. Commun. 2014, 185, 3162−3174. (26) Khoromskij, B. N.; Khoromskaia, V.; Flad, H.-J. SIAM Journal on Scientific Computing 2011, 33, 45−65. (27) Zuzovski, M.; Boag, A.; Natan, A. Phys. Chem. Chem. Phys. 2015, 17, 31550−31557. (28) Zhou, Y.; Saad, Y.; Tiago, M. L.; Chelikowsky, J. R. Phys. Rev. E 2006, 74, 066704. (29) ARPACK homepage. http://www.caam.rice.edu/software/ ARPACK/ (accessed June 1, 2016). (30) Lin, L. J. Chem. Theory Comput. 2016, 12, 2242−2249. (31) Russell, D., Johnson, I. I. I. NIST Computational Chemistry Comparison and Benchmark Database NIST Standard Reference Database Number 101 Release 16a, August 2013. http://cccbdb.nist. gov/ (accessed on January 15, 2016). (32) Al-Saidi, W. A.; Walter, E. J.; Rappe, A. M. Phys. Rev. B: Condens. Matter Mater. Phys. 2008, 77, 075112.

N (N + 1) × TPoisson + P0 × N × NSCF × THproj 2

= P0 × N × NSCF × THfull



N.M.B. and A.N. thank the Fulbright Foundation for funding, A.N. thanks ISF grant 1722/13 for funding.

Tproj ∼ TFOCK + Teigensolver =

Article

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest. 3621

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622

Article

Journal of Chemical Theory and Computation (33) Garza, J.; Nichols, J. A.; Dixon, D. A. J. Chem. Phys. 2000, 113, 6029. (34) Boffi, N. M.; Jain, M.; Natan, A. J. Chem. Phys. 2016, 144, 084104. (35) Scott Bobbitt, N.; Schofield, G.; Lena, C.; Chelikowsky, J. R. Phys. Chem. Chem. Phys. 2015, 17, 31542−31549. (36) Cn Fullerenes, David Tomanek and Nick Frederick at the Michigan State University Computational Nanotechnology Lab. http://www.nanotube.msu.edu/fullerene/fullerene-isomers.html (accessed onJune 1, 2016). (37) Valiev, M.; Bylaska, E.; Govind, N.; Kowalski, K.; Straatsma, T.; Van Dam, H.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T.; de Jong, W. Comput. Phys. Commun. 2010, 181, 1477−1489. (38) Vasiliev, I.; Ö ğüt, S.; Chelikowsky, J. R. Phys. Rev. Lett. 1997, 78, 4805. (39) Krieger, J.; Li, Y.; Iafrate, G. Phys. Rev. A: At., Mol., Opt. Phys. 1992, 46, 5453−5458. (40) Kümmel, S.; Kronik, L.; Perdew, J. P. Phys. Rev. Lett. 2004, 93, 213002. (41) Mundt, M.; Kümmel, S.; Huber, B.; Moseler, M. Phys. Rev. B: Condens. Matter Mater. Phys. 2006, 73, 205407. (42) Körzdörfer, T.; Kümmel, S.; Mundt, M. J. Chem. Phys. 2008, 129, 014110. (43) Natan, A.; Benjamini, A.; Naveh, D.; Kronik, L.; Tiago, M.; Beckman, S.; Chelikowsky, J. R. Phys. Rev. B: Condens. Matter Mater. Phys. 2008, 78, 075109. (44) Natan, A. Phys. Chem. Chem. Phys. 2015, 17, 31510−31515.

3622

DOI: 10.1021/acs.jctc.6b00376 J. Chem. Theory Comput. 2016, 12, 3614−3622