Optimal Path Search for Recurrence Relation in Cartesian Gaussian

Nov 16, 2016 - Center for Computational Research, University at Buffalo, Buffalo,. New York ... Department of Chemistry, Middle Tennessee State Univer...
0 downloads 0 Views 268KB Size
Subscriber access provided by University of Otago Library

Article

Optimal Path Search for Recurrence Relation in Cartesian Gaussian Integrals Fenglai Liu, Thomas R. Furlani, and Jing Kong J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.6b10468 • Publication Date (Web): 16 Nov 2016 Downloaded from http://pubs.acs.org on November 20, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

The Journal of Physical Chemistry A is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Optimal Path Search for Recurrence Relation in Cartesian Gaussian Integrals Fenglai Liu,† Thomas Furlani,‡ and Jing Kong∗,¶,† Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY, 14260, Center for Computational Research, University at Buffalo, Buffalo, NY, 14203, and Department of Chemistry, Middle Tennessee State University, TN, 37130 E-mail: [email protected]



To whom correspondence should be addressed Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY, 14260 ‡ Center for Computational Research, University at Buffalo, Buffalo, NY, 14203 ¶ Department of Chemistry, Middle Tennessee State University, TN, 37130 †

1 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 28

Abstract In quantum chemistry applications the computation of analytical integrals with Gaussian basis functions such as electron repulsion integrals is often the rate-determining step. In this work we have developed a general search algorithm to find the optimal path for the recurrence relations in the integral evaluation. This optimal path uses the least amount of intermediate integrals for building the recurrence relations to improve the computational efficiency. By further combining this with a redundant integral removal technique, and an efficient hybrid scheme to compute Incomplete Gamma functions; the algorithms are able to generate efficient integral code for electron repulsion integrals and many other type integrals used in quantum chemistry. Because the algorithms are independent of the details of the recurrence relations, the software also provides an easy way to implement recurrence relations that can be used to generate new type of analytical integrals.

Introduction Integral evaluation with primitive Gaussian functions is the major computational task for both wave function theories and density functional theory methods. 1 The integral is calculated in two ways. The first is numerical integration which is widely employed in density functional theory methods. 2,3 The other is analytical integration, which is generally expressed as:

∫ Iijkl =







ϕi (r)ϕj (r)f (r, r )ϕk (r )ϕl (r )drdr



,

(1)

ϕ is the basis function. Kinetic integrals(KI), nuclear attraction integrals(NAI), electron repulsion integrals(ERI) etc. can all be computed analytically. In particular the ERI is often the rate-determining step in practical calculations. Therefore developing efficient analytical integral methods plays an important role in quantum chemistry. Since Boys 4 first suggested to use Cartesian Gaussian functions as atomic orbital (AO) basis

2 ACS Paragon Plus Environment

Page 3 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

functions χ = xi y j z k e−αr

2

,

(2)

most of the quantum chemistry programs use Cartesian Gaussian function for fundamental integral calculations. 5–8 Cartesian Gaussian functions sharing the same angular components are often combined linearly to form a contracted AO basis function with predetermined coefficients. Uncontracted Gaussian functions are often called primitive Gaussians. Cartesian Gaussians with the same total angular momentum L = i+j +k can be grouped into a “shell”. For instance the P shell (L = 1) has three basis functions, and the D shell (L = 2) has 6 basis functions etc. The basis functions in the shell share the same contraction coefficient. In general, it is more efficient to generate the analytical integrals in Eq 1 based on shell sets, e.g. to produce all four-index ERIs associated with the same shell quartet together because different ERI in a shell quartet share intermediate integrals during calculation. There are other types of Gaussian functions with different angular parts, for example the spherical Gaussian basis function combines the spherical harmonics with the exponential part. In practise, the spherical Gaussian basis functions are transformed into Cartesian Gaussian functions before the analytical integrals are computed. How to efficiently compute ERI plays a central role in developing theory for analytical integrals. The first efficient algorithm to compute ERI was introduced by Pople and Hehre(PH). 9 This method provides an exceptional efficient algorithm for computing high contracted low angular momentum integrals involving S and P Gaussians. McMurchie and Davidson(MD) 10 proposed a general formalism for computing ERIs and analytical integrals in terms of Hermitian polynomials, and their method is applicable to integrals with high angular momentum. Dupuis, Rys and King(DRK) 11–13 developed another formalism for ERI based on exact numerical quadrature using root and weights generated from Rys polynomials. Both MD and DRK methods are indirect methods where auxiliary polynomials need to be involved. In 1980s, Obara and Saika(OS) 14,15 derived a set of recursive formulas to compute analytical integrals directly based on recursive expression of three body overlap integral. The recurrence 3 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 28

relation(RR) for ERI in the OS scheme is given as: ((a + ιi )b|cd)(m) = (Pi − Ai )(ab|cd)(m) + (Wi − Pi ) (ab|cd)(m+1) ) ρ Ni (A) ( (m) (m+1) ((a − ιi )b|cd) − ((a − ιi )b|cd) + 2ϵ ϵ ) Ni (B) ( ρ + (a(b − ιi )|cd)(m) − (a(b − ιi )|cd)(m+1) ϵ ) ( 2ϵ 1 Ni (C) + (ab|(c − ιi )d)(m+1) 2 ϵ+η ( ) Ni (D) 1 + (ab|c(d − ιi ))(m+1) , 2 ϵ+η

(3)

where the (ab|cd)(m) is:

(m)

(ab|cd)

2 =√ π



(



du 0

u2 ρ + u2

)m (ab|u|cd) .

(4)

In the above equation, a, b, c, and d refer to the Cartesian parts of the four Gaussian functions. (ab|cd)(0) is the target ERI. In the OS scheme, the bottom integrals such as (00|00)(0) , (00|00)(1) , etc. are evaluated first. The angular momentum is then raised up recursively through the Eq. 3 until the target integral (ab|cd) is reached. Lindh, Ryu and Liu 16 (LRL) proposed a similar recursive formalism based on Rys polynomials. Because the integrals here are calculated in “vertical” way, such methods are referred as “vertical recurrence relations”(VRR). Based on VRR, Head-Gordon and Pople(HGP) 17 found another general formula which performs a “horizontal recurrence relation”(HRR) on analytical integrals. In HRR the ERI is evaluated as: (a(b + ιi )|cd) = ((a + ιi )b|cd) + (Ai − Bi )(ab|cd) .

(5)

Consequently the ERI (ab|cd) can be recursively derived from a set of auxiliary integrals in the form of (e0|f 0). Because the HRR only involves the basis function centers, it can be applied to contracted integrals. Hamilton and Schaefer 18 (HS) derived a similar HRR using a translational invariance condition by shifting the integral (a + b + c + d0|00) to (a + b0|c + d0). 4 ACS Paragon Plus Environment

Page 5 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

In practice, HRR needs to be combined with other methods to finish the whole integral derivation. For instance, the HGP 17 scheme combines the HRR with the OS scheme, and Gill 19,20 combines the HRR and the MD method. Other combinations are also proposed, 16,18 and the implementations of these methods 1 constitute the kernels for analytical integrals evaluation in current quantum chemistry packages. Although VRR and HRR provide a clear formalism for recursively generating integrals, it’s still uncertain the best way to generate the integrals with the least amount of computation. This problem was referred as “tree-search problem” by Head-Gordon and Pople 17 and others. 21–23 Such a problem arises because the RR in Eq. 3 can be started from any of the four shell positions in the LHS shell quartet unless that position is an S shell. Thus each LHS integral has multiple expanding choices for expansion, and recursively its RHS integrals also have multiple expanding choices until only bottom integrals (00|00)m are left on the RHS of the Eq. 3. A set of particular choices of expanding positions from the target shell quartet to the bottom integrals forms a VRR “path” for the target. Obviously, the number of available paths grows very fast with an increase in the angular momentum. As an example, Figures 1 and 2 demonstrate two VRR paths from shell quartet (DS|P S) to bottom integrals (00|00)m . The difference between the two paths is that in Path 1 (DS|P S) is expanded from the P shell position, and in Path 2 the recurrence expansion is on the D shell position. Different paths have different computational costs. In the above example, Path 1 has 9 less intermediate integrals than Path 2 and therefore costs less computationally. The so-called “tree-search problem” is therefore fundamentally about how to find the optimal path involving with minimum number of intermediate integrals. To solve this problem, empirical rules were suggested to generate the path for forming VRR and HRR. 17 Ryu etc. 21,22 proposed rules to generate the optimal VRR path for the LRL scheme and achieved an improved FLOPS count, which however is not extendable to other methods. Johnson etc. suggest rules 23 to improve the performance of VRR in the MD algorithm. Despite these successes there has been no report of a general approach for the search of the optimal path due to the

5 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 28

(DS|PS)

(DS|SS)^{0}

(PS|SS)^{1}

(PS|SS)^{0}

(SS|SS)^{0}

(DS|SS)^{1}

(SS|SS)^{1}

(PS|SS)^{2}

(SS|SS)^{2}

(SS|SS)^{3}

Figure 1: VRR path 1 for deriving (DS|P S) complexity inside the recurrence relation. In this work, we report a general search algorithm for solving this problem on VRR. By inspecting the VRR formula, we find that the VRR has irreversible character following the change of total angular momentum of shell quartet L and m value in Eq. 4. Along this direction of VRR, for each L and m it’s able to construct a “package” where the number of LHS shell quartets that potentially share the RHS terms is rather limited. Therefore it’s able to perform an exhaustive search to find the best RR expanding position combination. This optimal combination yields the least number of intermediate integrals considering LHS shell quartets corresponding to L and m values in the package. By recursively following the changing direction of L and m values, it’s able to construct an optimal path for VRR. Such a search algorithm can be generally applied to any VRR algorithm with irreversible character, and it’s essentially a Greedy Algorithm. 24 Because the irreversible character of VRR, and the integral number for shell quartets decreases from LHS to RHS on VRR; the VRR path searching problem can be theoretically divided and conquered. Therefore we can expect that the above procedure will yield result path close to the global optimum.

6 ACS Paragon Plus Environment

Page 7 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 2: VRR path 2 for deriving (DS|P S) Once the optimal path is found, the computational effort can be further reduced by avoiding the computation of redundant intermediate Cartesian integrals within a shell quartet. There can be many integrals associated with one shell quartet, however it’s possible that not every integral is needed in the VRR/HRR. Thus an interesting question is whether all these integrals of Cartesian Gaussians are necessary in the recurrence relation generation. The question can be drawn from the observation that a shell of Cartesian Gaussian functions have redundancy in general in comparison with the spherical Gaussian functions. In this work, we design a general recursive procedure to eliminate the redundancy for both VRR and HRR. It shows that for integrals with higher angular momentum the number of redundant integrals is about one third of the total. We also devised a new method for the computation of the bottom integral (00|00)(m) . The core of the bottom integral is the incomplete Gamma function fm (t): ∫

1

fm (t) =

u2m e−tu du . 2

(6)

0

These integrals have been discussed in detail of literatures 25,26 etc. In this work we present a 7 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

hybrid scheme for the fast calculations of fm (t) based on a combination of different formulas in ref. 25 This scheme can provide exceptional performance for low angular momentum cases, and it’s able to maintain a balance between the computation cost and precision for all integral calculations.

Finding the Optimal Path for the Recurrence Relation The optimal path for RR can be defined in various ways. In this paper, the optimal path is defined as the one that involves the minimum number of intermediate integrals in an RR. Our discussion below is based on the HGP scheme. 17 However, the idea and its implementation are transferable to other RR schemes. When an ERI is computed with VRR, the calculation is performed together with other ERIs that belong to the same shell quartet to reduce the repeated use of intermediate integrals. The number of the ERIs of a shell quartet depends on the angular momentum of each shell. Following the convention in literature, we call the shells a and b on the “bra” side, c and d on the “ket” side in a shell quartet (ab|cd). Their positions in the shell quartet are called bra1, bra2, ket1 and ket2, respectively. The VRR may be started from any of the positions. For example, all of shell positions in the shell quartet (ab|cd) on the LHS of Eq. 3 are expandable unless the shell is a “S” type Gaussian function. The corresponding shell quartets on the RHS of VRR need to be expanded again and this recursive VRR process ends until all the RHS of VRR become bottom integrals. A complete path consists of all the choices of the expanding positions for all the shell quartets involved. Because a different expanding position leads to different terms on the RHS, the intermediate shell quartets and the number of intermediate ERIs vary among different paths for the same target shell quartet. Therefore finding the optimal path becomes a problem of finding the optimal expanding positions such that the number of intermediate integrals is minimized. Our method takes advantage of a unique feature of the VRR Eq. 3, which is that the change

8 ACS Paragon Plus Environment

Page 8 of 28

Page 9 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

in the total angular momentum of a shell quartet and m is directional as each shell quartet is expanded. This can be seen if the equation is generalized as in the following where SQ stands for shell quartet:

SQ(L, m) = a0 SQ0 (L − 1, m) + a1 SQ1 (L − 1, m + 1) + a2 SQ2 (L − 2, m) − a3 SQ3 (L − 2, m + 1) + a4 SQ4 (L − 2, m) − a5 SQ5 (L − 2, m + 1) + a6 SQ6 (L − 2, m + 1) + a7 SQ7 (L − 2, m + 1) .

(7)

In the above equation, L is the sum of angular momentum of each shell in shell quartet (ab|cd): L = La + Lb + Lc + Ld

,

(8)

m is the parameter for auxiliary integral (ab|cd)(m) in Eq. 4. From Eq. 7, it is clear that L is constantly decreasing and m is increasing in same manner from LHS to RHS, i.e. the VRR for ERI is “irreversible” between LHS and RHS. This irreversibility character indicates that if the optimal expanding positions for a group of LHS shell quartets are chosen, the search of the optimal expanding positions for the corresponding RHS shell quartets is independent from the previous search. Based on this feature, we propose the following search algorithm. First, we group all the LHS shell quartets for a set of given values of L and m, and other LHS shell quartets who possibly share the RHS shell quartets into a “package”. The LHS shell quartets in the package are called unsolved shell quartets because their expanding position have not been determined yet. Inside the package, an exhaustive search is performed to find the best expanding positions for the unsolved shell quartets corresponding to values of L and m. The same search is performed for all the packages at the next levels of L and m along the direction in Eq. 7. This algorithm can be described by the pseudo code below: create and initialize the unsolved shell quartet archive with

9 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

VRR output shell quartets; loop over m (from m=0 to maximum): loop over L (from maximum to 0): while(true): perform optimal expanding postion search for all (L,m) LHS shell quartets in the unsolved shell quartet archive; push the result RHS shell quartets into the unsolved shell quartet archive; if there are no (L,m) LHS shell quartets break out the loop; end while end loop with L end loop with m The above search algorithm is performed recursively with the unsolved shell quartets, because unsolved shell quartets are generated as a result of the previous search iteration; except the VRR output ones. In general such a process transforms the VRR path into a directional search based on the package following L and m changing direction. The search algorithm actually finds the shortest path inside each package, and the whole VRR path is formed by combining all of these shortest “local” paths. The reverse of the above search result gives the optimal VRR path from bottom integral (00|00)(m) to VRR output shell quartets (ab|cd). A general exhaustive search to derive the global optimum path is generally impossible due to the exponential growth of possible expanding position combinations for VRR LHS shell quartets. By grouping closely related shell quartets into packages, our method simplifies the relations between packages and allows exhaustive searches inside each package. Compared with standard search algorithms, the search scheme described above is a “Greedy Algorithm”. 24 A Greedy Algorithm is an algorithm that follows the problem solving heuristic 10 ACS Paragon Plus Environment

Page 10 of 28

Page 11 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

of making the locally optimal choice at each stage with the hope of finding a global optimum. The Greedy Algorithm is able to give the optimum solution to the problem exhibits optimal substructure if an optimal solution to the problem contains optimal solutions to the sub-problems. For this case, because the angular momentum is constantly decreasing from LHS to RHS on VRR, the integral numbers are constantly decreasing from the LHS to RHS; therefore the optimal path search for VRR is a problem with optimal substructure therefore we could expect that the above procedure may give result in a VRR path close to the global optimum. In performing a complete search for the package with respect to the given L and m values, it’s necessary to include shell quartets with different (L, m) values if they possibly share RHS terms with the LHS (L, m) shell quartets; even though their expansion positions will not be determined within the search of the current package. It can be seen in Eq. 7 that for LHS shell quartet SQ(L, m), shell quartets SQ(L − 1, m), SQ(L, m + 1) and SQ(L − 1, m + 1) may possibly share RHS with SQ(L, m). Their unsolved expansion positions will affect the determination of expanding position for the (L, m) shell quartets in the current package. Therefore it is necessary to include them in the current package. The only exception is that shell quartets with L + 1 or m − 1 are not included in the current package even though they also possibly share RHS terms with SQ(L, m), since they have been included in earlier packages along the VRR direction and thus their expanding position have been determined. This is a result of the irreversibility of the VRR discussed above. Based on the above analysis, the unsolved shell quartets for a given package can be divided into two groups: • unsolved main list; • correlated list; The unsolved main list contains the unsolved shell quartets with given values of (L, m). Their expanding positions will be determined in the current search iteration. The correlated list is 11 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

composed of the unsolved shell quartets in the unsolved shell quartet archive that possibly share the RHS terms with the unsolved main list, such as SQ(L−1,m) etc. The unsolved main list and the correlated list are combined to form a package. A search is performed on all possible expansion combinations for every unsolved shell quartets inside the package. The search result gives the final expanding positions to the unsolved main list. Once the search is done for the package at certain L, m level, the package is formed and searched at the next L, m level along the direction of VRR. This process repeats all the way to the bottom integrals (00|00)(m) , and the complete optimal path is formed through this process. It can be depicted as the following pseudo code: create archive for solved shell quartets; create archive for unsolved shell quartets; initilize the unsolved shell quartet archive with VRR output shell quartets; loop over m (from m=0 to maximum): loop over L (from maximum to 0): while(true) create main shell quartet list, and initialize it with shell quartets with (L,m) from unsolved shell quartet archive; create empty correlated shell quartet list and initialized it with shell quartets in (L-1,m), (L,m+1) and (L-1,m+1) from unsolved shell quartet archive; combine the main shell quartet list and correlated shell quartet list into package; loop over all possible combinations of expanding positions in package: compare the new RHS shell quartets with solved 12 ACS Paragon Plus Environment

Page 12 of 28

Page 13 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

and unsolved shell quartets, remove the new RHS shell quartet if it’s double couting (vertical comparison); compare the new RHS shell quartets with each other and remove the repeat ones (horizontal comparison); count the number of integrals from all remaining RHS shell quartets and find the best combination; end loop of expanding combination push the main shell quartet list into solved shell quartet archive; push the new RHS shell quartets corresponding to the best combination into unsolved shell quartet archive; store the best combination; is there remaining unsolved shell quartets with values of (L,m)? if not, step out the while loop; end while end loop with L end loop with m The final optimal path is the collection of the best combinations of expanding positions of LHS shell quartets for all (L, m) levels. We note that the algorithm above not only applies to ERI, but also to KI, NAI etc. as long as the integral can be derived from VRR with irreversible character. HRR, on the other

13 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 28

hand, cannot employ the above algorithm because the irreversibility is missing inside:

(a(b + ιi )|cd) = ((a + ιi )b|cd) + ABi (ab|cd) ⇕ ((a + ιi )b|cd) = (a(b + ιi )|cd) − ABi (ab|cd) .

(9)

In the HRR formula, if LHS shell quartet (F D|P S) is expanded in terms of shell D position according to Eq. 9, in search of expanding position of result RHS shell quartet (GP |P S) it will resort to the expansion on shell G because the previous LHS (F D|P S) becomes the RHS for expanding (GP |P S). As a result, the expansion search forms a cycle and the expanding positions for LHS shell quartets would not be correctly determined in the generated HRR path. Instead, we apply another strategy of path search for HRR. Since HRR expansion treats the bra and ket sides separately, a trial expanding test is performed to determine the best bra/ket expansion for HRR. For ERI the trial expanding positions are grouped into four cases, namely (bra1,ket1), (bra1,ket2), (bra2,ket1) and (bra2,ket2). For each position combination a HRR path search is conducted and the final HRR path uses the position combination which generates the minimum number of intermediate integrals. Based on the combination of VRR and HRR, the evaluation of target shell quartet (ab|cd) is divided into three modules: • convert (ab|cd) into a shell quartet list in the form of (e0|cd) through “HRR2” module; • convert the shell quartets of (e0|cd) into shell quartets in the form of (e0|f 0) through “HRR1” module; • use VRR module to transform the shell quartets (e0|f 0) into bottom integrals (00|00)(m) . Each module goes through its own optimal path search, and their combination in the reverse order from VRR to HRR2 generates the complete path in terms of intermediate shell quartets 14 ACS Paragon Plus Environment

Page 15 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

for a target shell quartet to be calculated from bottom integrals. After the optimal path for shell quartets is set, the next step is to generate the recursive expansion for individual integrals in each LHS shell quartet associated with the given path. The integral generation implicitly comes with a question whether every integral in an intermediate shell quartet is needed based on the consideration that the natural redundancy of the Cartesian type of Gaussian functions compared with the spherical type of Gaussian functions. We find through a recursive algorithm that the answer for this question varies from case to case and each case needs to be investigated on the fly. The algorithm begins with setting the basis functions in a particular order. For example the basis functions in a P shell can listed as Px , Py , Pz or Pz , Py , Px . Different orders are intrinsically equivalent with each other. For a predetermined basis function order, it establishes the sequence of basis set functions in a shell. Therefore in a general shell quartet (ab|cd), one-to-one mapping can be set up between the integral and its index derived from the basis function order. Using the index instead of the integral itself saves significant amount of memory for code generation and makes it feasible for high angular momentum shell quartets. With the index expression of an integral, the algorithm traverses each module above in reverse order from HRR2 to VRR to produce recurrence relation expression for integrals in every LHS shell quartets. Inside each module the recurrence relation is also generated in reverse order by beginning from module output to module input, so that during the generation process every RHS integral will be defined through a recurrence relation in the previous context. In this process, there are LHS integrals that are not defined. This is because such integrals are not used as RHS terms of recurrence relation in the following context. We call these integrals “redundant integrals”, and they are excluded from the recurrence relation. The algorithm is described in the pseudocode below: create LHS shell quartet list and initializes with output shell quartets required by 15 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

next modules; create LHS integrals list corressponding to every LHS shell quartet and initialize with integrals required by next modules; create archive to hold RR expression for this module; while(true): create temporary LHS shell quartet list and it’s associated integral list; loop over the LHS shell quartets in the list: form RR expansion for integrals appearing in the LHS integrals list; if there’s no such RR expansion in the achive, push the result RR expansion into archive; if the RR expansion already exist in archive, merge the new integral RR expansion into the archive; if the new RHS integrals can be further derived through RR, push it into the temporary LHS integral list; end loop if temporary LHS list is empty, break the loop; replace the content of LHS shell quartet list and corresponding integral list with the temporary one; end while

16 ACS Paragon Plus Environment

Page 16 of 28

Page 17 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

A New Scheme for calculating fm(t) The fm (t) integral



1

fm (t) =

u2m e−tu du , 2

(10)

0

is a necessary component in calculating the bottom integrals of ERI (00|00)m , NAI (0|0)m etc. In this work we present a hybrid scheme for efficient computation of fm (t) based on the formulas given in ref. 25 At m = 0 the fm (t) becomes error function: ∫

1

f0 (t) =

e−tu du , 2

(11)

0

which is available in variety of standard libraries. For m > 0, fm (t) is an incomplete Gamma function and it satisfies a recursive expression:

fm (t) =

( ) 1 2tfm+1 (t) + e−t 2m + 1

,

(12)

thus fm (t) can be derived from fmmax (t). On the other hand, Eq. 12 can be reorganized as:

fm (t) =

) 1 ( (2m − 1)fm−1 (t) − e−t 2t

.

(13)

This expression provides the most efficient way to compute the fm (t) by starting from f0 (t). However, it is numerically unstable due to error propagation as m grows larger. A stable computation of fm (t) can be found through a polynomial expansion for any t and m values: fm (t) = e

−t

∞ ∑ k=0

(2m − 1)!! (2t)k (2m + 2k + 1)!!

(14)

This expansion, however, converges slowly with respect to increasing t, therefore is typically applied for cases with small t. As t becomes large, a continued fraction representation can

17 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 28

be used instead to compute fm (t): 25 fm (t) =

√ (2m − 1)!! πv m 1

{ 2t 2 } v (1 − 2m)v 2v (3 − 2m)v 4v (5 − 2m)v 6v −t −e , 1+ 1+ 1+ 1+ 1+ 1+ 1 + ···

(15)

where v = (2t)−1 . Although Eq. 15 can yield a very accurate result, its computational cost is higher than that of Eq. 13. Many standard libraries adopt the hybrid strategy for implementing fm (t). For instance, the BOOST math library expands fm (t) into polynomial series for small t, for large t it uses Legendre’s continued fraction representation for computation. In our implementation a polynomial expansion of 14 is also used when t is smaller than Tlimit . This equation applies to all of m values in fm (t). For large t, instead of using continued fraction representation we try to find the upper limit of m for which fm (t) can be accurately calculated with Eq. 13. For m less than this limit Mlimit , fm (t) is calculated with Eq. 13 by starting from error function f0 (t). When m is larger than the Mlimit , the recurrence relation 12 is applied together with the calculation of fmmax (t). This hybrid procedure can be summarized as: 1. if Mmax == 0, use error function; 2. if Mmax >= 1 and Mmax Mlimit : (a) if t Tlimit , calculate fMmax (t) and use recurrence relation in 12 for all of other fm (t). Mmax is the largest m value for (00|00)m type of integrals, and Tlimit is the maximum limit of t used in polynomial expansion 14, and Mlimit is the limit of m value for which recurrence relation 13 can applied. To explore the best Tlimit and Mlimit combinations, a trial test is performed on the above hybrid scheme. In this test the T value is sampled in step length of 1.0E-6 between 0 and Tlimit for Eq. 14, and recurrence relation 13 employs T from Tlimit to Tmax with same step length. When T is large enough, the e−t term in Eq. 13 becomes 0 thereafter the recurrence relation becomes stable in terms of error propagation. Considering this fact the Tmax is set to be 40.0. For polynomial expansion 14, m is tested between 0 and 40. All of trial tests use the BOOST math library for calculating the standard fm (t). Table 1 summarizes the result of the trial test. It can be seen that the polynomial expression 14 always keeps the maximum absolute error(MAE) within 1.0E-14 for the given Tlimit , and the MAE for recurrence relation 13 with respect to different Tlimit and Mlimit combinations are all below 1.0E-12, while with the combination of Tlimit = 2.0, Mlimit = 8 the recurrence relation 13 reports MAE of 1.0E-14. From the table we can see that the new hybrid scheme is able to generate satisfiable precision for all of applications with fm (t). Table 1: maximum absolute error for recurrence relation 13

a

T = 1.8(18 terms) T = 1.9(20 terms) T = 2.0(22 terms) a

M=8 M=9 3.0E-14 1.2E-13 2.0E-14 0.7E-13 1.0E-14 0.4E-13

M = 10 6.3E-13 3.4E-13 2.0E-13

this is the number of terms used in Eq. 14

19 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 28

Performance Assessment The algorithms shown in previous sections have been implemented for a variety of analytical integrals, such as ERI, NAI, two-center and three-center overlap and kinetic integrals etc. Because the algorithms are independent of the details of recurrence relations, the implementation here provides an easy way to implement recurrence relations that can be used to generate new type analytical integrals. The software is available for download at https://github.com/murfreesboro/cppints, and it uses MIT license for free distribution for any purpose. FLOPS(floating point operations) count 17,27 is traditionally used for comparing the performance of analytical integral algorithms. The computation of ERI can be expressed as the following: (ab|cd) =

K K ∑ K ∑ K ∑ ∑ i

j

k

di dj dk dl (χi χj |χk χl ) .

(16)

l

K is the contraction degree for basis set functions. Eq. 16 assumes that all of four basis set functions in above ERI are in same contraction degree. Thus the corresponding FLOPS count is: NF LOP S−COU N T = x × K 4 + y × K 2 + z × K 0

,

(17)

where x is the FLOPS count inside four contraction loops etc. Table 2 lits the FLOPS counts for both of the original HGP implementation 17 and the new HGP implementation based on the algorithms described above. Only the operations of adds, subtracts, multiplies and divides are counted so to comply with the FLOPS count setting in original HGP work. We can see that the new implementation consistently outperforms the original well-optimized implementation for all of shell quartets except the (SP SP |SP SP ), where the K 4 FLOPS count exceeds 353 FLOPS count in contrast with the original HGP implementation. (SP SP |SP SP ) is a special type of composite shell quartet where S and P shells share the same set of exponents and are considered one composition shell. It contains many shell quartets with combination of S shells and P shells. In a (SP SP |SP SP ), the 20 ACS Paragon Plus Environment

Page 21 of 28

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

shell quartets containing one P shell and (P S|P S), (P S|SP ) and (SP |SP ) and (SP |P S) are directly calculated through VRR step in our implementation and the rest of them are calculated through the combination of VRR and HRR. Another possible implementation that may perform better is to calculate all of them through the VRR and HRR combination except (SS|SS). With this route, VRR produces the shell quartets in the type (e0|f 0). The angular momentum of different shell positions are shifted in HRR according to Eq. 9. Such a procedure saves FLOPS in VRR, but has additional costs in HRR step. The original HGP algorithm may take the second way of implementation, as a result it saves VRR cost but its HRR cost is significantly higher than the new HGP implementation. Table 2: FLOPS count between old HGP and new HGP implementation on ERI shell quartet (P P |P P ) (SP SP |SP SP )

(DD|DD)

(F F |F F )

(GG|GG)

(HH|HH)

a b

coefficients old HGPa new HGPb x(×K 4 ) 920 896 2 y(×K ) 30 7 z(×K 0 ) 330 330 x(×K 4 ) 1400 1753 2 y(×K ) 30 16 z(×K 0 ) 800 582 x(×K 4 ) 14600 13439 2 y(×K ) 30 7 z(×K) 11300 10592 x(×K 4 ) 108000 90855 y(×K 2 ) 30 7 0 z(×K ) 135000 111018 4 x(×K ) 413030 y(×K 2 ) 7 0 z(×K ) 700786 4 x(×K ) 1412642 y(×K 2 ) 7 0 z(×K ) 3209502

FLOPS count taken from original HGP 17 FLOPS count for this work

A significant part of the performance improvement in our method is due to the algorithm that removes the redundant integrals used in RR. Table 3 presents the summary of redundant

21 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 28

integrals number as well as its ratio for both VRR and HRR in ERI. The ratio is computed by dividing the redundant integrals number by the total number of integrals. The total number of integrals is the sum of redundant integrals and the LHS integrals calculated in the code. As revealed in Table 3, the proportion of the redundant integrals increases systematically with the increase of the angular momentum of the shell quartet for both VRR and HRR. By removing redundant integrals, the integral computation can save 20% or more for basis functions in F shell(L = 3) and beyond. Table 3: summary of redundant integral number and ratio in ERI shell quartet (P P |P P ) (SP SP |SP SP ) (DD|DD) (F F |F F ) (GG|GG) (HH|HH)

VRR HRR 0 0 0 0 261(%11) 335(%8) 3127(%20) 12006(%19) 15552(%22) 119140(%27) 56997(%25) 721064(%33)

To assess the correctness and the suitability of the new algorithms for practical computation, we implemented the self-consistent field(SCF) calculation for Hartree-Fock(HF) method. 28 The KI, NAI, ERI and two body overlap integrals are all generated through this implementation. ERI and NAI use the combination of Mlimit = 10, Tlimit = 1.8 for bottom integral calculation. For exchange and Coulomb calculation, the Cauchy-Schwarz Inequality is used for screening: 29 |(µν|λη)| ⩽



(µν|µν) ×

√ (λη|λη) = Qµν Qλη

.

(18)

The inequality in Eq. 18 is further combined with the absolute largest density matrix value Dmax in the corresponding shell pair block to predetermine the significance of the integral calculation. For example, the calculation of the Coulomb matrix is screened through the following criteria: max max )