Article Cite This: J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
pubs.acs.org/JCTC
Dual Basis Set Approach for Density Functional and Wave Function Embedding Schemes Bence Heǵ ely, Pet́ er R. Nagy, and Mihaĺ y Kaĺ lay* MTA-BME Lendület Quantum Chemistry Research Group, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, P.O. Box 91, H-1521 Budapest, Hungary
J. Chem. Theory Comput. Downloaded from pubs.acs.org by UNIV OF SOUTH DAKOTA on 09/02/18. For personal use only.
S Supporting Information *
ABSTRACT: A dual basis (DB) approach is proposed which is suitable for the reduction of the computational expenses of the Hartree−Fock, Kohn−Sham, and wave function-based correlation methods. The approach is closely related to the DB approximation of Head-Gordon and co-workers [J. Chem. Phys. 2006, 125, 074108] but specifically designed for embedding calculations. The new approach is applied to our variant of the projector-based embedding theory utilizing the Huzinaga-equation, multilevel local correlation methods, and combined density functional-multilevel local correlation approximations. The performance of the resulting DB density functional and wave function embedding methods is evaluated in extensive benchmark calculations and also compared to that of the corresponding embedding schemes exploiting the mixedbasis approximation. Our results show that, with an appropriate combination of basis sets, the DB approach significantly speeds up the embedding calculations, and, for chemical processes where the electronic structure considerably changes, it is clearly superior to the mixed-basis approximation. The results also demonstrate that the DB approach, if integrated with the mixedbasis approximation, efficiently eliminates the major weakness of the latter, and the combination of the DB and mixed-basis schemes is the most efficient strategy to accelerate embedding calculations.
1. INTRODUCTION The computational expenses of electronic structure calculations steeply grow with the size of the studied systems.1 The computation time of the Hartree−Fock (HF) and Kohn− Sham (KS) density functional theory (DFT) calculations using a hybrid functional scale formally as the fourth-power of the systems size (N) and the scaling of KS calculations with a “pure”, i.e., nonhybrid functional, is still N3. The more advanced ab initio correlation methods, such as the secondorder Møller−Plesset (MP2) approach2 or the coupled-cluster (CC) models,3 e.g., CC with single and double excitations (CCSD)4 and CCSD with perturbative triples [CCSD(T)],5 exhibit a scaling of at least N5. Consequently, numerous approaches have been developed over the years to break down the expenses of quantum chemical calculations. Since the computation time for any electronic structure calculation strongly depends on the size of the atomic orbital (AO) basis set applied, the most plausible strategy to decrease the computational costs is the reduction of the size of the AO basis using, for instance, the mixed-basis set approach. In this simplistic approximation the atoms of the chemically relevant part of the system are described using a large AO basis set, whereas a smaller basis set is employed for the remaining atoms.6−14 Although this approach is conceptually simple and trivial to implement, it is not elegant from the theoretical point of view. The use of basis sets of different sizes gives rise to the artificial polarization of the electronic structure since the © XXXX American Chemical Society
electron density grows in the region where more basis functions can be found. These artifacts considerably worsen the efficiency of the approach for chemical processes where the electronic structure significantly changes, such as chargemanifestation processes.15 In addition, the use of mixed basis sets might cause convergence problems for self-consistent field (SCF) algorithms and also increases the probability that the SCF procedure converges to an unstable solution (see Sect. 3.4). The so-called dual basis (DB) set approach, which is free from the aforementioned problems, is another related strategy to decrease the basis set size. In a DB HF or KS SCF calculation the SCF equations are first solved in a relatively small basis set, the resulting electron density or molecular orbitals (MOs) are some way expanded in a bigger AO basis, and usually an approximate SCF energy is also evaluated with the latter basis set, hereafter referred to as the large basis. For the computation of the correlation energy in a DB framework the simplest approach is to use the larger basis set together with the MOs constructed in the preceding DB HF calculation, but more complicated correlated DB methods also exist, in which two basis sets are employed also at the correlation calculation. The DB approximation was first employed in 1983 at the HF level by Havriliak and King,16 who studied the Received: April 14, 2018 Published: July 26, 2018 A
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
different approaches. In an orbital free representation of the electron density or if the orbitals representing the embedded and embedding electron density are not orthogonal, the sum of the subsystem electron densities does not agree with the total density. Consequently, the kinetic energy is also not additive, which can be remedied by the use of approximate nonadditive functionals58,59 or with optimized effective potential methods.60−66 In the alternative projector-based embedding theories developed by Rajchel et al.,67 Manby and coworkers,68−71 and Hoffmann et al.72 projectors are employed to impose the orthonormality of the active and the environmental orbitals. Another new approach, which resolves the above issue in an elegant way, is the embedded mean-field theory (EMFT) developed by Miller and Manby and their coworkers.73,74 In EMFT the system is partitioned at the basis set level, the two subsystems are treated by a low- and a high-level DFT approach, and the subsystem densities are simultaneously iterated in a single SCF run. Unfortunately, EMFT does not allow for WFT-in-DFT-type embedding calculations. The embedding of a WFT method into a lower-level WFT approach does not seem to be a trivial task either. Nevertheless, simple WFT-in-WFT type embedding schemes can be developed if the locality of the electron correlation is exploited, and the electrons localized on different parts of a molecule are treated at different levels of theory.75−81 Recently, we put forward a simple extension of the projector-based embedding of Manby and co-workers,80 which, instead of a somewhat arbitrary level-shift projector, uses the Huzinaga-equation.82 In the Huzinaga-equation-based embedding, hereafter shortly referred to as Huzinagaembedding, the orthonormality of the active and inactive orbitals is strictly enforced. The potential of the Huzinagaequation was subsequently also realized by Chulhai and Goodpaster for embedding schemes where the subsystem’s density or wave function is expanded in terms of basis functions associated with the subsystem’s atoms.83 A simple alternative to the Huzinaga-equation-based scheme was proposed by Hammes-Schiffer et al. utilizing the orthogonality constrained basis set expansion procedure, which also allows both the active and the environmental densities to respond to each other.84 The mixed-basis approach has also been combined with quantum embedding schemes, where the use of mixed basis sets seems to be a natural way to reduce the computational costs because the system is anyway partitioned into high- and low-level regions.73,74 Of course, the major disadvantage of the mixed-basis methods, the imbalanced description of the highand low-level densities, is also inherited by the combined approaches and can be even amplified for embedding theories where the environmental densities are not frozen, such as EMFT,15 especially when using a minimal basis set for the environment. For the latter method it was recently demonstrated that the electron density artifacts can be largely eliminated by the semiempirical Fock-corrected DFT approach,85 where the energy evaluated with an inexpensive functional and a minimal basis is corrected to recover the KS energy obtained with an accurate functional and a larger basis set. Nevertheless, to our knowledge, embedding schemes have not yet been combined with the genuine DB approaches presented above. In this paper we propose a slightly modified version of the DB method of Head-Gordon and co-workers.23 The new approach might be interesting on its own right but is
Rydberg states of the ammonium radical. Since HF calculations with large basis sets including diffuse functions were not feasible at that time, the authors solved the SCF equations with a smaller basis set. Then an appropriate virtual MO space was constructed by orthonormalizing a bigger AO basis containing the required diffuse functions to the occupied space and diagonalizing the Fock operator in the new basis. For the first time, DB sets were applied in correlated calculations by Jurgens-Lutovsky and Almlöf.17 In their DB MP2 method, a DB HF calculation was performed following Havriliak and King, and the MP2 energy was evaluated as usual but utilizing that occupied orbitals were expanded in the small basis, and consequently only integrals with two indices in the larger basis are required. This approach was later also adapted in the DB MP2 method of Wolinski and Pulay.18,19 Significant progress has been made on DB methods by Head-Gordon and co-workers.20−29 In their DB SCF approach an SCF calculation is converged in a small basis set, the density matrix is projected into the large basis set, a single Roothaan step is taken, and an energy correction is evaluated.22,23 Alternatively, the expression for the energy lowering can be reformulated to avoid the diagonalization of the Fock matrix, which makes the method suitable for a linear-scaling implementation.22 Based on the new DB HF approach a DB MP2 method was also developed utilizing the density-fitting (DF) MP2 technique.23 Later analytic gradients were also implemented for the DB SCF24 and the DF-MP2 26 methods. The utility of these approaches was demonstrated in ab initio molecular dynamics simulations28 and for noncovalent interactions.27,30 Further promising DB HF approximations were introduced by Gill et al. deriving perturbative corrections that estimate the complete basis set HF energy by diagonalizing the Fock potential from a small basis in a large one.31,32 New DB MP2 methods were also proposed by these authors, where the MOs are obtained from the large basis HF calculation, and the DB approximation is invoked at the integral transformation.33,34 An interesting dual-level DFT approach was published by Nakajima and Hirao, where, in the primary SCF calculation, not only a smaller basis set but also a less sophisticated functional was used.35 This approach was also extended to time-dependent DFT36 and relativistic DFT,37 while Gill and co-workers generalized the method by also applying a cruder quadrature grid at the first SCF run.38 The DB approximation was used in reduced-scaling correlation methods by Kobayashi and Nakai39 and Friedrich and Dolg,40−43 and very recently Røeggen and Gao proposed a scheme closely related to the DB approach for the calculation of the correlation energy of molecules and solids.44 We also note that the complete auxiliary basis set singles correction employed frequently for improving the HF reference energy in explicitly correlated methods45−48 can also be regarded as a special DB approximation. The other large class of reduced-cost methods that are relevant for our study includes quantum embedding approaches.49−57 In these methods the system is divided into an active subsystem comprising the chemically important part and its environment. The latter is described at a lower-level of theory, such as semiempirical or DFT, while a more accurate approximation, such as a more advanced DFT approach or a wave function theory (WFT) method is used for the active subsystem. The arising DFT-in-DFT and WFT-in-DFT schemes face the important problem of the coupling of the B
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
theory somewhat more accurate and theoretically more satisfactory. On the other hand, the calculation of energy gradients is more complicated with our approach, and our purified D0 is less sparse than the density matrix obtained simply by projecting the small basis density matrix into the large basis, which worsens the efficiency of the Fock-matrix construction in the large basis. Anyway, the n-representability of the density matrix or, equivalently, the orthonormality of the orbitals used to evaluate the large-basis Fock-matrix is a must for embedding theories, which are of primary concern in this study. Furthermore, as it will be obvious soon, the generalization of eq 2 to our embedding method would be problematic if E[D0] was replaced by the converged small basis SCF energy. 2.2. Huzinaga-Equation-Based Embedding. Here we briefly review our Huzinaga-embedding,80 which is a modified version of the projector-based embedding theory of Manby and co-workers.68,69 As customary in embedding theories the system is divided into two subsystems defined by the corresponding atoms, an embedded (active) part A and its environment B. In the case of DFT-in-DFT embedding the two subsystems are described by a high- and a low-level DFT method, defined by energy functionals E1[D] and E2[D], respectively. The corresponding Fock-matrices will be denoted by F1[D] and F2[D], respectively. In WFT-in-DFT calculations, a WFT method is used for the active subsystem. Then a HF-in-DFT calculation is carried out first, and E1[D] will stand for the HF energy. In the initial step of the embedding calculation the SCF equations of the low-level method defined by E2[D] are converged for the entire system. The occupied MOs are localized, and for each localized MO (LMO) the atoms are identified on which the orbital is localized invoking the Boughton−Pulay (BP) algorithm86 or an algorithm based on Mulliken populations (see the Supporting Information for more details). The LMOs which are localized on the atoms of the embedded subsystem and the environment will be denoted, respectively, by ϕAi and ϕBi . The selection of the orbitals also determines the number of the electrons in the embedded and embedding subsystems, for which symbols nA and nB will stand, respectively. From orbitals ϕAi and ϕBi the density matrices of the two subsystems, DA and DB, respectively, can be computed, and the density matrix of the entire system, DAB, is obviously their sum. In the second step of the embedding calculation a further SCF calculation is performed for the active subsystem of nA electrons keeping the ϕBi orbitals frozen. The DFT-in-DFT (HF-in-DFT) energy is evaluated as
particularly suitable for combination with embedding theories. We apply the new approach to our Huzinaga-embedding method. The performance of the resulting DB embedding theories is evaluated in extensive benchmark calculations and also compared to that of the corresponding mixed-basis embedding schemes.
2. THEORY 2.1. Dual Basis Set Approach. Let us assume that we have a system of n electrons and two basis sets, a small and a large one; the former is not necessarily a subset of the latter. We are interested in the HF or KS energy evaluated with the large basis set. The corresponding energy functional will simply be denoted by E[D], where D is a one-particle density matrix. The Fock-matrix required in the SCF procedure is defined as F=
∂E ∂D
(1)
If not obvious, notation F[D] will be used for the Fockmatrix computed from density matrix D. The DB calculation starts with solving the SCF equations in the small basis set. The resulting occupied MOs are projected into the large basis and orthonormalized. Since the method is invariant with respect to the orthogonalization algorithm, any one can be chosen. From the orthonormal MOs a density matrix, D0, is computed. In the case the initial SCF is converged with a density-based algorithm, i.e., no MOs are available, we can directly project the density matrix onto the large basis and use some purification technique to construct an n-representable D0. If MOs are needed, they can be obtained as the natural orbitals of D0 transformed to the orthogonalized AO basis, that is, by diagonalizing matrix S1/2D0S1/2, where S stands for the AO overlap matrix of the large basis. Subsequently, one SCF iteration is performed with D0 in the large basis set. The Fock-matrix F[D0] is constructed, the E[D0] energy is calculated, the Fock-matrix is diagonalized, and an improved density matrix, D1, is computed from the new MOs. To obtain a good approximation to the converged SCF energy with the large basis set, a first-order estimate for E[D1], E[1][D1], is derived. If we expand E[D] into Taylor series around D0, drop the quadratic and higher-order terms, and utilize eq 1, we arrive at the E[1][D1] = E[D0] + Tr[F[D0](D1 − D0)]
(2)
approximation, which also defines our DB SCF energy. If a DB correlation calculation is carried out, the DB SCF energy is evaluated as described above, and the correlation energy is calculated with Fock-matrix F[D0] and the MOs obtained by the diagonalization of F[D0]. Obviously, both the DB SCF and the DB correlation approaches are exact in the sense that, if the small basis and large basis sets are identical, the SCF or the correlation energy computed with the given basis set is retrieved. The DB approach outlined here differs at two important points from that presented in ref 23. First, in our energy expression, eq 2, the zeroth-order term, E[D0], is an approximate SCF energy evaluated in the large basis, while Head-Gordon and co-workers use the converged small basis SCF energy instead. In this respect our method is analogous to the approach of Gill et al.31 Second, in contrast to ref 23 and most of the other DB schemes our density matrix D0 is nrepresentable. On the one hand, these changes make the
É ÅÄÅ ∂E ÑÑÑ Å DFT ̃ A E12 [D ; DA , DB] = E12[D̃ A ; DA , DB] + TrÅÅÅ(D̃ A − DA ) 12A ÑÑÑ ÅÅÇ ∂D ÑÑÖ
(3)
where E12[D̃ A ; DA , DB] = E2[DAB] + E1[D̃ A ] − E2[DA ]
(4)
and D̃ A is the self-consistent density matrix of the second SCF run computed from orbitals ϕ̃ Ai . Note that E12 is the SCF energy of the entire system calculated with the low-level method replacing the contribution of the density of the active subsystem, DA, by that of the higher-level self-consistent density D̃ A, and the second term on the right-hand-side of eq 3 is a first-order correction to the difference between E12[D̃ A;D̃ A,DB] and E12[D̃ A;DA,DB]. To ensure the orthogonality of orbitals ϕ̃ Ai to the frozen ϕBi orbitals, the C
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation DFT DFT DFT B D S)C̃ A = SC̃ A Ẽ A (F12 − SDBF12 − F12
matrix corresponding to the low-level method is constructed AB AB with DAB 0 and diagonalized. A new approximation to D , D1 , AB is computed from the MOs, and an estimate for E2[D ] is evaluated as
(5)
Huzinaga-equation is solved, where Fock-matrix ̃A A B FDFT 12 [D ;D ,D ] is defined as DFT F12 =
∂E ∂E12 + 12A ∂D ∂D̃ A
E2[1][D1AB] = E2[D0AB] + Tr[F2[D0AB](D1AB − D0AB)]
(6)
Second, to calculate an approximate E2[D ] in a similar way, an appropriate DA1 is required. For that purpose one possibility would be to localize the orbitals obtained in the first step and assign the LMOs to the subsystems A and B as described in Sect. 2.2. In this case, however, orbitals ϕB0i and density matrix DB0 would be redefined, and we should avoid this to preserve the consistency with the other contribution. Instead, we project out the ϕB0i orbitals from the occupied MOs calculated in the first step using projector Q. This results in nA pieces of orbitals similar to ϕA0i and nB MOs of close to zero norm. These orbitals are orthonormalized employing canonical orthogonalization dropping nB eigenvectors of the overlap matrix with small eigenvalues. The remaining orbitals are the updated MOs of the active subsystem, ϕA1i, which are used to compute DA1 . The F2[DA0 ] Fock-matrix is constructed, and the approximate E2[DA] contribution is evaluated as
C̃ A denotes the matrix of MO coefficients for orbitals ϕ̃ Ai , and Ẽ A is a diagonal matrix with the corresponding orbital energies on its diagonal. Here and in the following, for brevity, we omit the arguments of E12 and FDFT 12 . By default, these will be deemed to be constructed of matrices D̃ A, DA, and DB, and it will be pointed out if the corresponding approximate density matrices are used instead of the latter. It is easy to see that the above embedding is exact, which means that the energy of the entire system evaluated with a mean-field method is recovered if that particular method is embedded into itself. In the case of WFT-in-DFT embedding, after the HF-inDFT calculation, a WFT calculation is performed with the above Fock-matrix and frozen ϕBi orbitals. The WFT-in-DFT energy is evaluated as ÄÅ É ÅÅ A ∂E12 ÑÑÑ A A B A Å ÑÑ ̃ D , D , D ] − E2[D ] − TrÅÅD ÅÅÇ ∂DA ÑÑÑÖ
E12WFT[Ψ A ; D̃ A , DA , DB] = E2[DAB] +
E1WFT[Ψ A ;
(8)
A
E2[1][D1A ] = E2[D0A ] + Tr[F2[D0A ](D1A − D0A )]
(9)
F2[DA0 ]
Note that the diagonalization of is avoided here as DA1 is calculated in an indirect way. Third, the F1[D̃ A0 ] Fockmatrix is computed, which is then used to construct DFT DFT DFT DFT B F12 [D̃ 0A;DA0 ,D0B] and the F12 − SD0BF12 − F12 D0 S Huzinaga-matrix. The latter is diagonalized to get orbitals ϕ̃ A1i, which are still orthogonal to orbitals ϕB0i. The updated density matrix of the active subsystem, D̃ A1 , is evaluated using orbitals ϕ̃ A1i. The approximate large basis E1[D̃ A] contribution is calculated similarly to the other ones as
(7)
where EWFT is the total energy computed with the WFT 1 A method using Fock-matrix FDFT 12 , and Ψ is the corresponding wave function. 2.3. Huzinaga-Embedding with Dual Basis Set. We seek the generalization of the DB approach presented in Sect. 2.1 for the Huzinaga-embedding outlined in Sect. 2.2, which is not unambiguous. Of the several possibilities, we explored two ones. In Ansatz 1 (A1), which is probably the most straightforDFT ward DB variant of the Huzinaga-embedding, E12 is computed with the small basis set as described above. The occupied ϕAi and ϕBi orbitals are projected into the large basis set. The projected ϕBi orbitals are orthonormalized, and the resulting orbitals, ϕB0i, are used to compute the nB-representable density matrix DB0 . We suppose that the ϕB0i orbitals and the DB0 density matrix are good enough approximations for the corresponding large basis quantities, or at least their quality is less important compared to that of the orbitals and densities of subsystem A. Hence, we do not endeavor to improve them, and they will be kept fixed in the following. The projected ϕAi orbitals are also orthonormalized, but beforehand orbitals ϕB0i are projected out by multiplying the corresponding MO coefficient matrix with the projector Q = 1 − DB0 S to ensure the orthogonality of the MOs of the active subsystem and the environment. From the ϕA0i orbitals constructed in this way, the nA-representable density matrix DA0 is computed, and we also A B calculate the n-representable DAB 0 = D0 + D0 density matrix. A ̃ The ϕi orbitals are processed similarly to ϕAi : they are projected onto the large basis, orbitals ϕB0i are projected out of them using Q, the resulting MOs are orthonormalized to construct orbitals ϕ̃ A0i, which are used to evaluate the nArepresentable D̃ A0 . Notice again that ϕ̃ A0i and ϕB0i are orthogonal to each other. Utilizing the new density matrices estimates for the largebasis values of the three terms on the right-hand-side of eq 4 are calculated relying on our DB approach. First, the Fock-
E1[1][D̃ 1A ] = E1[D̃ 0A ] + Tr[F1[D̃ 0A ](D̃ 1A − D̃ 0A )]
(10)
After introducing the above three contributions the large basis estimate for E12 can be defined by the [ 1] ̃ A E12 [D1 ; D1A , D0B] = E2[1][D1AB] + E1[1][D̃ 1A ] − E2[1][D1A ]
(11)
expression. To derive our large basis embedding energy we should consider the last term of eq 3. This is a first-order correction for the relaxation of the density of the active subsystem, thus we do not attempt to add another first-order correction for this term for the basis-set extension, but it is simply evaluated with the available D̃ A1 , DA1 , and E12[D̃ A0 ;DA0 ,DB0 ] quantities. Consequently, our final DB DFT-in-DFT (HF-inDFT) energy of A1 reads as DFT[1] ̃ A [D1 ; E12
D1A ,
D0B]
=
[1] ̃ A [D1 ; E12
D1A ,
D0B]
ÉÑ ÄÅ Ñ ÅÅ Å A A ∂E12 Ñ Å ̃ + TrÅÅ(D1 − D1 ) A ÑÑÑÑ ÅÅ ∂D0 ÑÑÑÖ ÅÇ
(12)
In the case of a WFT-in-DFT approach the WFT calculation ̃A A B is carried out with the FDFT 12 [D0 ;D0 ,D0 ] Fock-matrix in the A ̃ basis of the ϕ1i MOs keeping orbitals ϕB0i frozen, and the A1 DB WFT-in-DFT energy is evaluated according to the E12WFT[1][Ψ A ; D̃ 1A , D1A , D0B] = E2[1][D1AB] +
E1WFT[Ψ A ;
D̃ 1A ,
D1A ,
D0B]
−
E2[1][D1A ]
ÄÅ É ÅÅ ∂E ÑÑÑ A 12 Å ÑÑ − TrÅÅÅD1 Ñ ÅÅÅ ∂D0A ÑÑÑÑ Ç Ö (13)
D
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
then the calculations could proceed along the route similar to either A1 or A2. Alternatively, the small-basis orbitals projected into the large basis could also be localized before constructing the corresponding density matrices. We did not explore these possibilities since these are all more complicated than the two Ansätze presented above, which seem the simplest combinations of the Huzinaga-embedding and the DB approximations. Finally, we also note that the DB embedding schemes introduced can be combined with the mixed-basis approach. In the resulting dual mixed basis (DMB) scheme the small basis set of the DB embedding calculation is also the small basis of the mixed-basis approximation, while the large basis set is a mixed basis with the same small basis for the environment and a bigger one for the active subsystem. The major advantage of this scheme is that the initial SCF calculation is carried out with a homogeneous basis, and the density obtained will not be artificially polarized due to the mixing of basis sets. This density is frozen in the subsequent steps of the embedding calculation and expected to generate a more balanced embedding potential than the ones computed with mixed basis sets. Moreover, these steps of the embedding calculation will be faster because of the decreased size of the basis set. 2.4. Further Embedding Approaches. In our previous publication80 we also proposed a simple WFT-in-WFT multilevel local correlation approach based on our local correlation methods.78,87−90 In a local correlation calculation the MOs are localized, and a domain of LMOs is assembled for each occupied LMO. The approximate contribution of the latter to the correlation energy is evaluated in this domain. Additionally, pair domains are constructed for each pair of occupied LMOs, and the long-range correlation is estimated as the sum of the approximate pair correlation energies calculated within these domains. Our multilevel local correlation scheme utilizes this partitioning of the correlation energy: the occupied LMOs are classified as active or inactive as described in Sect. 2.2, and the correlation contributions are evaluated at different levels of theory depending on the type of the corresponding LMOs. For example, in a LNO-CCSD(T)-in-LMP2 embedding calculation, the active subsystem is treated at the CCSD(T) level using local natural orbitals [LNO-CCSD(T)],87,88,90 while our local MP2 (LMP2) model89 is applied to the environment. This WFT-in-WFT approach can also be combined with our Huzinaga-embedding outlined in Sect. 2.2 to define WFT-in-WFT-in-DFT embedding schemes, which will be referenced as the Huzinaga multilevel local correlation approximation. In these three-layer approaches, first, a HF-inDFT calculation is performed, and the WFT-in-WFT energy is calculated on the basis of the HF orbitals keeping the KS orbitals of the environment frozen. We also note that the embedding approaches discussed herein can also be integrated with the quantum mechanics/molecular mechanics (QM/ MM) method91 to introduce further three- or four-level embedding schemes, such as DFT-in-DFT-in-MM, WFT-inDFT-in-MM, or WFT-in-WFT-in-DFT-in-MM. The DB approximation can also accelerate the above embedding methods and the multilevel local correlation schemes as well. In the case of WFT-in-WFT approaches the original DB scheme of Sect. 2.1 can be used for the HF calculation. This is only efficient if the HF is the ratedetermining step, e.g., if a LNO-CCSD(T)-in-LMP2 calculation is carried out, and the embedded subsystem is relatively small. To speed up the three- or four-layer embedding calculations the DB method of Sect. 2.3 is applicable. Again,
equation. It is easy to see that eq 12 reduces to eq 2 in the limit that the entire system is active, and A1 is exact in the sense that the DB DFT energy evaluated for the entire system with the highlevel DFT method is recovered in this limit. A1 also inherits the exactness of the DB approach, that is, independently of the partitioning of the system, the DFT-in-DFT energy, eq 3, calculated with the particular basis set is reproduced if the small basis and large basis sets are the same. These also imply that similar statements hold for the DB WFT-in-DFT energy obtained with A1. We have also designed a somewhat simpler DB embedding scheme, Ansatz 2 (A2). Here the SCF equations are solved with the small basis set and the low-level method for the entire system, the orbitals are localized and assigned to the subsystems, but the further steps of the embedding calculation with the small basis are not taken. With the large basis set the A B ϕA0i, ϕB0i, and ϕA1i orbitals as well as the DA0 , DB0 , DAB 0 , D1 , D1 , and AB D1 density matrices are constructed, and the low-level energy AB [1] A contributions, E[1] 2 [D1 ] and E2 [D1 ], are calculated as in the case of A1. In the third step, however, the Huzinaga-equations ̃A A B are iterated to convergence with the FDFT 12 [D ;D0 ,D0 ] FockDFT B DFT DFT B and the F12 − SD0 F12 − F12 D0 S Huzinaga-matrix, where D̃ A is the self-consistent density matrix formed from orbitals ϕ̃ Ai . The DFT-in-DFT energy is calculated according to eqs 11 ̃A ̃A and 12 with replacing D̃ A1 by D̃ A and E[1] 1 [D1 ] by E1[D ]. If a WFT-in-DFT calculation is carried out, the correlation energy ̃A A B is computed using the Fock-matrix FDFT 12 [D ;D0 ,D0 ] with the B A ̃ ϕ0i and ϕi orbitals, and the energy is evaluated as EWFT[1] [ΨA;D̃ A,DA1 ,DB0 ]. 12 One can simply prove that the A2 DFT-in-DFT (WFT-inDFT) approach, in contrast to A1, is exact in the sense that the high-level DFT (WFT) energy of the whole system is retrieved if subsystem A incorporates the entire system. In turn, it is also true for A2 that the DFT-in-DFT (WFT-in-DFT) energy of the original Huzinaga-embedding is reproduced with identical small and large basis sets. Concerning the computational expenses of our DB DFT-inDFT embedding scheme, we should take into account that, in the small basis, both the low- and the high-level SCF equations must be solved with A1, whereas only the low-level SCF iterations have to be converged with A2. With the large basis set, however, altogether three Fock-matrix buildsone with n and two with nA electronsare necessary for A1, while a Fockmatrix of nA electrons has to be constructed in each SCF cycle with A2. This suggests that A2 is only competitive if the embedded subsystem is small with respect to the entire system but turns expensive when enlarging the active subsystem, even though the energy converges to a more accurate value. Concerning our WFT-in-DFT method, the DB approximation only reduces the costs of the SCF part, but those of the subsequent correlation calculation are not affected. Thus, the DB approximation can efficiently decrease the expenses of the WFT-in-DFT embedding calculation if the SCF run is a bottleneck, that is, the size of subsystem A is small with respect to that of the entire system. As mentioned above, there are further possibilities to introduce DB embedding Ansätze which are compatible with our DB approach and the Huzinaga-embedding. For example, one could directly project the small-basis DAB density onto the large basis, build the F2 Fock matrix from the projected and purified density matrix, and diagonalize it. The resulting orbitals could be localized to obtain orbitals ϕA1i and ϕB1i, and E
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation for WFT-in-WFT-in-DFT, WFT-in-DFT-in-MM, or WFT-inWFT-in-DFT-in-MM embedding considerable gain in speed can only be expected if the subsystem treated at the highest level is compact. Finally, the DMB approach can also be combined with our multilayer methods. Its major advantage is that, in the case of WFT-in-WFT or WFT-in-WFT-in-DFT embeddings, it speeds up not only the SCF iterations but also the correlation calculations.
results are shown in the main text. The rest of the results can be found in the Supporting Information. The WFT-in-DFT, WFT-in-WFT, and WFT-in-WFT-inDFT computations used the LNO-CCSD(T)87,88,90,105 method as the high-level approach with tight truncation thresholds except for aromatic systems, where the default thresholds were applied. In the multilevel local correlation calculations our LMP2 approximation89 was invoked as the low- or medium-level method. The Pipek−Mezey106 localization scheme was employed for the occupied orbitals throughout. Unless otherwise noted, the aMul orbital selection technique (see the Supporting Information) was utilized with the default ϵ = 0.30 population threshold. The improved πMul algorithm was used for the aromatic systems with the κ = 0.95 completeness criterion and the default population threshold for the σ orbitals. The benchmark systems were taken from ref 73: the substitution reaction of 1-chlorodecane with a hydroxide ion to yield 1-decanol and a chloride ion, the deprotonation of decanoic acid, the Diels−Alder reaction between the conjugated octadecanonaene and 1,3-butadiene (see Figure 2d of ref 73), and the hydrogenation of pentacene (see Figure 3d of ref 73). These test systems will shortly be referenced as a, b, c, and d, respectively. The corresponding geometries and the detailed description of the definition of the embedded subsystems can be found in our previous paper.80 The reference reaction energies for our benchmark tests were calculated using the high-level methods with the large basis sets. The reference energies are subtracted from the reaction energies computed by the embedding techniques, and the errors are plotted as a function of the embedded carbon atoms, except for the Diels−Alder reaction where the abscissa represents the number of embedded carbon atoms of octadecanonaene. In the limit of zero embedded carbon atoms, the convention of ref 73 was adopted. This means that the presented errors are the errors of the reaction energies calculated with the low-level method using the basis set compatible with the embedding scheme, namely, the small basis for the DMB and mixed-basis variants, the DB with the DB Ansätze, and the large basis for the original Huzinagaembedding or multilevel local correlation approaches. For the WFT-in-WFT-in-DFT methods, the HF and DFT regions at the initial HF-in-DFT calculations were kept fixed, and only the border of the two correlation approaches was varied. At DMB and mixed-basis WFT-in-WFT-in-DFT calculations, the large basis set was only used for the region treated by the LNO-CCSD(T) method, while the small basis set was employed for the rest of the molecule. 3.2. Dual Basis Set Approximation. To assess the performance of our DB approach and to select the possible small basis and large basis pairings we carried out benchmark calculations for the test reactions with the B3LYP, PBE, and LDA DFT approaches and the LNO-CCSD(T) method without any embedding approximation. Since in our DB scheme the small basis is not necessarily the subspace of the large basis set, one can choose an appropriate small basis from various basis set families. For the DFT theories various combinations of the Pople basis sets were tested, while for the LNO-CCSD(T) method Dunning’s (aug-)cc-pVTZ and augcc-pVQZ basis sets were employed as the large basis set combined with several Pople and Dunning basis sets as the small basis to investigate the effects of the DB approximation on the correlation energies. The basis sets augmented with
3. BENCHMARK CALCULATIONS In this section, after the technical details, first, the accuracy of our DB scheme is discussed briefly with an emphasis on choosing the small basis set. Second, the results of benchmark DFT embedding calculations are presented using our two DB set approaches, namely DB-A1 and DB-A2, and their DMB set alternatives, DMB-A1 and DMB-A2. Finally, the performance of the WFT-in-DFT, WFT-in-WFT, and WFT-in-WFT-inDFT methods is assessed employing the corresponding DB approaches. The accuracy and efficiency of these techniques are compared to the high-level methods using the large basis set, as well as to the original Huzinaga-embedding employing the large basis and mixed-basis sets. We note that the efficiency of the various techniques will be characterized here by the size of the active subsystem required for a particular accuracy. The efficiency of the methods will also be quantified by presenting factual computation times for a larger example in Sect. 4. 3.1. Computational Details. All the schemes presented here have been implemented in the MRCC program suite92 and are available in its current release. In our calculations Pople’s double- (6-31G, 6-31G*, 631G**, 6-31+G*, 6-31+G**, 6-31++G**) and triple-ζ (6311G, 6-311G*, 6-311+G*, 6-311G**, 6-311+G**, 6-311+ +G**) basis sets93,94 as well as Dunning’s (augmented) correlation consistent polarized double-, triple-, and quadrupleζ bases [(aug-)cc-pVDZ, (aug-)cc-pVTZ, and (aug-)ccpVQZ]95−97 were used. Both the SCF and the correlation calculations employed the density fitting approximation using the auxiliary bases developed by Weigend and co-workers.98,99 In all calculations spherical harmonic Gaussians were employed. The DFT-in-DFT, WFT-in-DFT, and WFT-in-WFT-inDFT embedding methods were tested with three different exchange-correlation functionals, which were chosen in order to represent the various rungs on the Jacob’s ladder of density functionals. The simplest functional of them is the local (spin) density approximation (LDA),100,101 which only depends on the electron density itself. An improved functional compared with LDA is the Perdew−Burke−Ernzerhof (PBE)102 functional of the general gradient approximation (GGA) class, which also depends on the gradient of density. The most complete functional in our tests was the hybrid GGA functional B3LYP (Becke’s three-parameter hybrid functional including the correlation functional of Lee, Yang, and Parr).103,104 As the exchange-correlation functional becomes more sophisticated, the accuracy, in general, also improves. However, this comes with increased computational expenses: B3LYP is much more expensive because of the exact exchange term compared with the PBE and LDA functionals, which have more or less similar costs. Since we have found that the convergence of the errors with the subsystem size is similar for these functionals with the Huzinaga-embedding scheme, only the B3LYP-in-LDA, WFT-in-PBE, and WFT-in-WFT-in-PBE F
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 1. Error of B3LYP (top) and LNO-CCSD(T) (bottom) reaction energies calculated with the DB approximation using various small basis sets. The KS, HF, and correlation energy errors are decomposed for the educt and the product. The 6-311++G** and aug-cc-pVTZ large basis sets were used for the B3LYP and the LNO-CCSD(T) calculations, respectively. The large basis B3LYP and LNO-CCSD(T) reference reaction energies are 350.63 and 359.03 kcal/mol, respectively.
diffuse functions were used as the large basis set if the phenomenon considered requires the presence of such functions, that is, for reactions a and b, where one of the reactants or products is an anion. Our findings are in line with the conclusions of HeadGordon et al.,23,27 who found that accurate reaction energies can be achieved by eliminating the functions with high angular momenta from the large basis set. As an example, the errors for the B3LYP and LNO-CCSD(T) approaches with the DB approximation for the deprotonation of decanoic acid are shown in Figure 1, while the other tests can be found in the Supporting Information. The results for B3LYP reveal that the DB approach can reproduce the reaction energies within a half kcal/mol when at least a double-ζ basis set with polarization functions on heavy atoms is employed. Moreover, the errors can be reduced to a few tenths of a kcal/mol with the addition of diffuse functions on heavy atoms, and the accuracy can be further improved to a couple of hundredths of a kcal/mol when the aforementioned functions are also placed on the hydrogens. These findings suggest that diffuse functions in the small basis are less important for DB DFT methods. However, we should keep in mind that the electronic structure of the decanoate anion is less diffuse as the negative charge is delocalized on the carboxylate group. For small anions with rather diffuse electron distribution, such as the chloride and hydroxide ions, diffuse functions are essential in the small basis, which can be seen from the results for test reaction a (see the Supporting Information). The LNO-CCSD(T) DB calculations require at least a double-ζ small basis set with polarization and diffuse functions on heavy atoms for an accuracy of less than a half kcal/mol in the reaction energies. Similarly to the B3LYP results, the addition of the aforementioned functions to hydrogens reduces
the reaction energy errors to a couple of hundredths of a kcal/ mol. This trend holds on for other reactions as well, which means that the correlated calculations seem to be more sensitive to diffuse functions. If the reactants or products of the reaction require diffuse functions, the small basis must also include them at least on the heavy atoms for a tolerable error. The decomposition of the reaction energy errors also reveals that individual HF energies may be more accurately approximated using a smaller basis set (for example, 631+G**) than a bigger one (aug-cc-pVDZ); however, the error cancellation is more beneficial when the small basis and large basis are from the same basis set family. It is also interesting that the error of the correlation energy is less dependent on the small basis when polarization functions are placed at least on the heavy atoms, which can be rationalized by the fact that the correlation energy is more sensitive to the underlying AO basis than the quality of the occupied orbitals. The conclusions for the DB LNO-CCSD(T) results with the aug-cc-pVQZ large basis set are similar to the triple-ζ case (see the Supporting Information) since the smallest errors are obtained when aug-cc-pVTZ is used as the small basis set. The calculations employing the aug-cc-pVDZ basis as the small basis set are also remarkably accurate, and the reaction energy errors are still a kcal/mol or smaller even with the 6-31+G* small basis set. It is also instructive to compare the results of the DB calculations and the simple calculations which are only performed with the corresponding small basis set without any DB approximation. Our results suggest (see the Supporting Information) that the DB calculations are roughly by an order of magnitude more accurate than the corresponding small basis only calculations if reliable small bases are used. In light of these findings we choose the double-ζ counterparts of the triple-ζ large basis sets as small basis in G
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 2. Error of the reaction energies using various embedding approaches as a function of the number of carbon atoms included in the embedded subsystem. The LB and MB acronyms stand for large basis and mixed basis, respectively. The 6-311++G** (a), 6-311++G** (b), 6311G** (c), and 6-311G** (d) large basis sets were used in the calculations, while the corresponding small basis sets in the mixed-basis, DB, and DMB calculations were 6-31+G**, 6-31+G**, 6-31G*, and 6-31G*, respectively. The large basis reference reaction energies are −56.71, 350.63, −16.05, and 34.76 kcal/mol, respectively.
small embedded subsystems, and the various DB schemes perform similarly well when the subsystem boundary is 5 to 6 Å away from the reaction center. Note that this distance approximately coincides with the distance where the error fluctuations of all the embedding variants disappear or reduce to a low level. Interestingly, for the complicated test systems, reactions c and d, the DB Huzinaga-embedding is practically indistinguishable from the parent Huzinaga-embedding method for any reasonable embedded subsystem. The simple mixed-basis approach performs remarkably well for the considered reactions, especially for reactions c and d, but its slower convergence is conspicuous for reaction b, where a charged species is treated with the mixed-basis. The performance of the DMB approach is as good as that of the pure DB scheme. The errors obtained with the DB-A1 and DB-A2 methods are very similar, and this trend can also be observed for the performances of the DMB-A1 and DMB-A2 techniques (see the Supporting Information). Taking into account that the A2 ansatz is more expensive because a high-level SCF calculation is carried out with the large basis, the A1 ansatz is clearly more beneficial. All in all, considering also the computational costs, all the approximate embedding schemes are more efficient than the original Huzinaga-embedding. Using larger embedded subsystems, which are anyway required for accurate results, the DB approach is at least as accurate as the mixed-basis scheme. However, increasing the size of the embedded subsystem the costs of the latter are getting more expensive, and the DB approach tends to be more efficient. The most efficient variant seems to be the DMB approach, which inherits the accuracy of the DB technique but is less expensive than both the DB and the mixed-basis methods. 3.4. WFT Embedding. In this subsection, the results of LNO-CCSD(T)-in-PBE, LNO-CCSD(T)-in-LMP2, and LNO-CCSD(T)-in-LMP2-in-PBE calculations using the large basis as well as the mixed-basis, DB, and DMB approximations
the following to maximize the accuracy of our DB embedding ansatz. 3.3. DFT Embedding. Our benchmark calculations were designed with the purpose to explore the convergence behavior of the embedding approaches with respect to the size of the embedded subsystem. Please note that the considered reactions are more and more difficult for the embedding theories. The simplest test system is reaction a, where the subsystems are separated across a singe σ bond. This is also the case for reaction b, which is, however, more difficult because the error cancellation is less significant due to the differently charged species. Dividing a system through conjugated bonds is modeled by reaction c, and the most difficult one is reaction d, where an aromatic system is broken. The errors for the B3LYP-in-LDA calculations are presented in Figure 2. For better visualization, the results calculated with ansatz A2 are not shown because they are very close to those obtained by A1. The A2 results together with the details of the DFT embedding calculations employing other functionals can be found in the Supporting Information. The DB error for the high-level method, which is equivalent to the error of the DB Huzinaga-embedding at the maximum number of carbon atoms, is less than 0.05 kcal/mol for all the reactions, which suggests that the selection of the small basis sets is appropriate. The results show that the errors of the DB, DMB, mixed-basis, and large basis schemes are similar, except for reactions a and b at smaller embedded subsystems. For the former reaction the DB and DMB variants produce errors of 1.0 to 1.5 kcal/mol larger compared to the parent method for the smaller subsystems. The opposite trends can be observed for test reaction b, where the DB and DMB approaches yield 1.5 to 2.0 kcal/mol smaller errors compared with the large basis approach, but the difference of the errors vanishes again as the embedded subsystem enlarges. These findings show that the error cancellation may have an important role in the DB calculations with small active regions but is not consistent. Nevertheless, the absolute errors are still moderate even at the H
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation are presented for our benchmark systems. The results of further embedding calculations with the LDA and B3LYP functionals can be found in the Supporting Information. The convergence of the Huzinaga multilevel local correlation method was tested with a fixed DFT and HF partitioning, and only the LNO-CCSD(T) and LMP2 regions were varied. The HF and DFT subsystems were chosen so that the size of the HF region will be minimal while also keeping the error of the LNO-CCSD(T)-in-DFT method using the same HF region as small as possible. In the case of Huzinaga multilevel local correlation with DMB or mixed-basis calculations, the large basis and small basis boundary also varies, which means that the large basis set is only placed on the LNO-CCSD(T) region, whereas the small basis set is used for the rest of the system. The results of the various embedding schemes for the benchmark systems a, b, c, and d are presented in Figures 3, 4,
Figure 4. Error of the reaction energies for the deprotonation of decanoic acid using various embedding approaches as a function of the number of carbon atoms included in the embedded subsystem. See Figures 2 and 3 for the abbreviations. The aug-cc-pVTZ and augcc-pVDZ basis sets were used as large basis and small basis sets, respectively. In the Huzinaga-embedding based multilevel local correlation calculations the boundary of DFT and HF regions was fixed at the sixth carbon atom. The reference LNO-CCSD(T)/aug-ccpVTZ energy is 359.03 kcal/mol, and the error of the DB approximation is −0.01 kcal/mol.
Despite the worse performance of the DB approximation in some cases, the errors of the Huzinaga embedding with the DB and DMB schemes follow similar trends as we have seen for the DFT embedding. The DB and DMB approximations are capable of reproducing the reaction energies computed with the large basis scheme within 1 kcal/mol for test systems a and c, while even better convergence with respect to the active subsystem size can be seen for reaction b. For reaction d, the DB (DMB) technique performs worse by 1.0 (3.0 to 7.0) kcal/ mol compared with the large basis scheme if the active subsystem is small, but the differences vanish (or significantly decrease) when the embedded subsystem consists of at least 3 carbon rings. This active subsystem size is also necessary for the original method to produce meaningful reaction energies. In contrary to the DFT results, the mixed-basis technique is unreliable for reactions a and b because large error fluctuations can be observed even at extended embedded subsystems. This is a consequence of the fact that an unstable HF-in-PBE reference is found in particular points with the default settings of the SCF algorithm. This phenomenon disappears for the most difficult test systems, reactions c and d, where no charged species are included in the reaction, but also less accurate results are produced compared to the large basis method. The multilevel local correlation approach is not only more accurate than the Huzinaga embedding but also more expensive, which makes the application of DB approaches even more desirable. The results show that the accuracy of the multilevel local correlation approaches is practically unaffected for any test reaction by the use of the DB approximation. The reaction energies of the DB and DMB schemes are within 0.2 kcal/mol of the large basis and mixed-basis approaches,
Figure 3. Error of the reaction energies for the substitution reaction of 1-chlorodecane using various embedding approaches as a function of the number of carbon atoms included in the embedded subsystem. The MLC abbreviation refers to the multilevel local correlation approach, while HMLC means the Huzinaga-embedding based MLC scheme. See Figure 2 for further abbreviations. The aug-cc-pVTZ and aug-cc-pVDZ basis sets were used as large basis and small basis sets, respectively. In the Huzinaga-embedding based multilevel local correlation calculations the boundary of DFT and HF regions was fixed at the fifth carbon atom. The reference LNO-CCSD(T)/aug-ccpVTZ energy is −52.12 kcal/mol, and the error of the DB approximation is −0.18 kcal/mol.
5, and 6, respectively. Note that the accuracy of the DB approximation for the LNO-CCSD(T) method without any embedding (see the caption of the figures) somewhat decreases compared to the DFT results for test reactions a and d. The increased error of the DB ansatz is likely due to the greater sensitivity of the HF method to the quality of the small basis in the case of challenging electronic structures, such as aromatic systems and anions with diffuse charge distributions (cf. Figure 1). Nevertheless, these errors are still tolerable and could be consistently decreased with a less aggressive truncation of the large basis set.23,27 I
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 5. Error of the reaction energies for the Diels−Alder reaction of octadecanonaene and 1,3-butadiene using various embedding approaches as a function of the number of carbon atoms included in the embedded subsystem. See Figures 2 and 3 for the abbreviations. The cc-pVTZ and cc-pVDZ basis sets were used as large basis and small basis sets, respectively. In the Huzinaga-embedding based multilevel local correlation calculations the boundary of DFT and HF regions was fixed at the fourth carbon atom from the reaction center (that is, 12 carbon atoms were included in the embedded subsystem). The reference LNO-CCSD(T)/cc-pVTZ energy is −35.55 kcal/mol, and the error of the DB approximation is −0.05 kcal/mol.
Figure 6. Error of the reaction energies for the hydrogenation of pentacene using various embedding approaches as a function of the number of carbon atoms included in the embedded subsystem. See Figures 2 and 3 for the abbreviations. The cc-pVTZ and cc-pVDZ basis sets were used as large basis and small basis sets, respectively. In the Huzinaga-embedding based multilevel local correlation calculations the boundary of DFT and HF regions was fixed at the sixth carbon atom from the reaction center (that is, 14 carbon atoms of 3 aromatic rings were included in the embedded subsystem). The reference LNO-CCSD(T)/cc-pVTZ energy is 26.02 kcal/mol, and the error of the DB approximation is −0.36 kcal/mol.
respectively, independently of the size of the embedded subsystem. The combination of the DFT embedding and the multilevel local correlation approach has even more potential for accurate and efficient calculations. The results of the Huzinaga multilevel local correlation technique show that, if the HF and DFT subsystems are carefully chosen, then for any basisset combination the LNO-CCSD(T) region has to cover only 2 to 3 bonds from the reaction center to reproduce the corresponding LNO-CCSD(T)-in-DFT reaction energy within 0.5 kcal/mol. The only exception is reaction d, where larger LNO-CCSD(T) layers are needed. The convergence pattern of the errors with the size of the LNO-CCSD(T) region is reminiscent of that for the multilevel local correlation approach, except for the mixed-basis scheme, which inherits the large error fluctuations from the parent Huzinagaembedding. These tests indicate that it is unnecessary to correlate all the occupied orbitals of the HF region at the LNO-CCSD(T) level as the LMP2 subsystem serves as a costeffective intermediate layer. In conclusion, the WFT embedding test calculations revealed that the DB and DMB schemes, either with the A1 or the A2 approaches, are capable of accelerating the Huzinaga calculations over the original large basis schemes, while the mixed-basis technique shows unacceptable inaccuracies in specific cases. Again, the A2 approach is less favorable over ansatz A1 because no improvement is provided in the accuracy in exchange for the increased computational cost. All the basis set reduction approaches considered here effectively accelerate the multilevel local correlation calculations; however, the DMB
method is the clear winner because it saves computation time not only for the HF step but also for the correlation calculation. The most promising technique is the combination of the Huzinaga DFT embedding Ansatz A1 with the DMB and multilevel local correlation, which effectively breaks down the cost of all computational steps while maintaining high accuracy.
4. APPLICATION: CATALYTIC REACTION OF CATECHOL-O-METHYLTRANSFERASE To check if the conclusions drawn for the small, quasi-onedimensional molecules are valid for real-life 3D systems and to demonstrate the applicability of our multilevel approaches, in this section, the performance of the methods is tested for an enzyme reaction of biological relevance. In these calculations our methods are also combined with the QM/MM technique. The projector-based embedding approach of Manby and coworkers was already applied to such systems;107,108 however, this is the first time when the accuracy of the WFT-in-WFT-inMM and WFT-in-WFT-in-DFT-in-MM approximations as well as multilayer DB techniques is explored with respect to the active subsystem size. Here, we choose the catechol-Omethyltransferase (COMT) enzyme as our model system, which has an important role in neurotransmitter regulation by transferring a methyl group from S-adenosyl-L-methionine (SAM) to deprotonated catechol (CAT) amines. An important feature of the COMT system is that the reacting species, the SAM and the CAT molecules, are noncovalently bound to the protein, thus the boundary effects of the QM/MM technique are small. J
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation In the upcoming subsections, the computational details are discussed. Then, the accuracy of our most efficient methods, namely the Huzinaga-A1, the multilevel local correlation method, and the combination of the latter with DMB, is compared to the reference large basis and the DB LNOCCSD(T) method. Finally, timings for the most relevant calculations are presented. 4.1. Computational Details. The reactant and the product structure of the COMT system were adopted from the work of Martı ́nez et al.109 The ANTECHAMBER110 and the TLEAP utilities of the AMBERTOOLS17 program package111 were invoked to set up the force field parameters for the MM calculations. The ff14SB112 parameters were used for the protein, while the TIP3P113 model was employed for the water and the ions. For the CAT and SAM molecules, the General Amber Force Field 2 (GAFF2)114 was utilized with the AM1BCC charge scheme.115,116 The nonbonded energy terms were not cut off, and only those bonded energy terms were retained that included at least one MM or MM host atom. The residual charges, which are the results of zeroing the charges of the QM and MM host atoms due to electrostatic embedding, were equally distributed among all MM atoms. The QM/MM calculations were also carried out with AMBERTOOLS17 invoking the AMBER-MRCC interface91 and using the aforementioned force fields for the MM region. The QM calculations were performed with the LNOCCSD(T)87,88,90 and LMP289 local correlation methods and the PBE102 density functional employing Grimme’s D3 dispersion correction117,118 together with Dunning’s cc-pVTZ and cc-pVDZ basis sets95 as the large basis and small basis sets, respectively. Again, the QM calculations employed the density fitting approximation with the auxiliary basis sets of Weigend and co-workers.98,99 The exchange contributions in the HF calculations were approximated using local fitting domains89,119 with the default domain settings proposed in ref 89. All the calculations used the Boys localization technique,120 whereas the Huzinaga-embedding, the multilevel local correlation approach, and the Huzinaga multilevel local correlation method employed the aMul orbital selection scheme. The reference reaction energy was computed at the LNO-CCSD(T)-in-MM level with the cc-pVTZ basis set. The presented errors were calculated by subtracting the reference reaction energy from the reaction energies computed with the various approaches. Other conventions that were mentioned in Sect. 3.1 are also adopted here. The selected QM region and QM subsystems are illustrated in Figure 7 and Figure 8, respectively, while the details about the selection of the various subsystems can be found in the Supporting Information. 4.2. Results. The errors of the various schemes are presented in Figure 9. The DB approximation proves very accurate for this system as the DB error is only a few hundredths of a kcal/mol. The results obtained with the Huzinaga-embedding follow similar trends as we have seen in Sect. 3.4. The errors are less than 1.5 kcal/mol even if small embedded subsystems are used. Some error fluctuation can be observed in this region for the DMB scheme, but, for most subsystems, the errors are smaller compared to the parent large basis scheme. Interestingly, the errors of the DMB scheme are also much smaller in the case of subsystems A and C compared to the B and D partitionings, which suggests that the error cancellation is more fortunate if the embedded subsystem is more compact. Nevertheless, the errors decrease to a low level
Figure 7. COMT protein including 3420 atoms. The QM region (highlighted in red) consists of 36 amino acid residues, 3 water molecules, an Mg2+ ion, the SAM, and the CAT molecules, which results in 571 QM atoms including 18 link atoms. See the Supporting Information for more details regarding our selection approach.
if the subsystem border is at least 5.5 Å away from the reaction center, which is in line with our previous findings. It is also important to note that the neglect of the D3 correction (not shown) increases the errors by about 1 kcal/mol for the small embedded regions, which proves the importance of the inclusion of dispersion interactions for embedding calculations even at the low-level method. The performance of the multilevel local correlation scheme with the large basis set is excellent, its maximum error is very low, 0.3 kcal/mol, for any embedded subsystem, and the errors monotonically decrease beyond subsystem C. On the other hand, moderate error fluctuations occur in the case of the multilevel local correlation technique with DMB, but the approach is still very accurate as the maximum error does not exceed 0.5 kcal/mol. These attributes are also inherited by the Huzinaga multilevel local correlation method because their errors approximately equal those of the corresponding multilevel local correlation schemes shifted by the error of the Huzinaga-embedding with active subsystem F. The errors of the Huzinaga multilevel local correlation methods are still within 0.5 kcal/mol compared to the reference reaction energy, which shows that accurate reaction energies can be achieved by applying the highest-level technique only to a small region around the reaction center. These results, in accordance with the conclusions for the small molecules, reveal that all the considered methods have tolerable accuracy even with relatively small embedded subsystems. Note also that the subsystem boundaries cross only one or two bonds for the small molecules, while here the subsystems are separated by many more bonds, but the error does not increase with the number of such bonds. The DFT/ HF border is recommended to be at least 5 Å away from the K
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 8. Visualization of the selected QM subsystems, which are shortly referenced as the corresponding capital letters. The MM, PBE-D3 (LMP2), and LNO-CCSD(T) methods are applied to the regions highlighted in green, blue, and red, respectively, in the case of the Huzinaga (multilevel local correlation) approach. As a reference point, the reacting methyl group is highlighted in purple. The region treated with LNOCCSD(T) is gradually expanded from the reaction center toward the full QM region (see the Supporting Information for details).
effects and to produce close to converged results. The errors of the Huzinaga multilevel local correlation method converge faster with respect to the subsystem size, but a domain of radius of 2.5 Å around the reaction center still has to be treated with the high-level method for accurate results. Taking into account these findings, to demonstrate what we can gain with the various approximations, computation times were measured for the Huzinaga and multilevel local correlation approaches using the F and C partitionings, respectively. The timings for the Huzinaga multilevel local correlation approach were determined employing the LMP2 and LNO-CCSD(T) methods for subsystems F and C, respectively. The timings of the various schemes are presented in Table 1. The results show that a speedup of 1.6 can be achieved for the reference HF calculation by using the DB approximation without embedding approaches. At first glance this can be surprising because the cc-pVTZ basis set consists of more than twice as many AOs as the cc-pVDZ basis. The reason for the relatively low efficiency of the DB approximation is that small local fitting domains are used during the SCF iterations, and finally, to get an accurate HF energy, one iteration is performed with much larger fitting domains.89,119 This final step usually requires computation time comparable to that for the preceding SCF iterations altogether. In a DB HF calculation employing the local fitting approximation the single iteration in the large basis set must be carried out with the large fitting domains, thus only the expenses of the SCF iterations performed with small fitting domains can be saved. Nevertheless, the speedup for the HF step can be increased by the DMB ansatz to 2.1, while the speedup for the SCF part is 3.2 if the DMB HF approach is also combined with DFT via the Huzinaga-embedding. Concerning the overall speedup of the local correlation calculation, we can see that the DB
Figure 9. Error of the reaction energies for the methyl transfer reaction catalyzed by the COMT enzyme using various multilevel approaches as a function of the number of QM atoms included in the embedded subsystem. See Figures 2 and 3 for the abbreviations. The cc-pVTZ and cc-pVDZ basis sets were used as large basis and small basis sets, respectively. In the Huzinaga-embedding based multilevel local correlation calculations the boundary of the DFT and HF regions was fixed, and the latter included the atoms of subsystem F. The reference LNO-CCSD(T)/cc-pVTZ reaction energy is −23.18 kcal/mol, and the error of the DB approximation is −0.06 kcal/mol.
reaction center in the case of the Huzinaga embedding to minimize the ambiguity stemming from the error cancellation L
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation Table 1. Representative Timings for the Various Multilevel Approachesa method
LNO-CCSD(T)
LNO-CCSD(T)
Huzinaga
Huzinaga
MLC
MLC
HMLC
HMLC
basis set approach
LB
DB
LB
DMB
LB
DMB
LB
DMB
HF region
full
full
F
F
full
full
F
F
CCSD(T) region
full
full
F
F
C
C
C
C
2.64 3.65 6.29
1.61 3.65 5.26 1.20 −0.06
0.43 1.32 1.08 2.83 2.23 0.15
0.16 0.66 0.90 1.71 3.68 −0.29
2.64 1.16 3.81 1.65 −0.30
1.24 0.61 1.85 3.39 0.00
0.43 1.32 0.67 2.41 2.61 −0.11
0.16 0.66 0.39 1.21 5.20 −0.21
DFT [day] HF [day] correlation [day] total [day] speedup error [kcal/mol] a
The reported computation times are wall-clock times measured on a 6-core 3.5 GHz Intel Xeon E5-1650 processor. The total computation times are decomposed into the DFT, HF, and correlation contributions. The errors and speedups are given with respect to the LNO-CCSD(T)-in-MM method using the cc-pVTZ basis set. See Figures 2 and 3 for the abbreviations.
approximation alone does not significantly affect the costs of the calculations, though it only results in a negligible loss of accuracy. The speedups are more pronounced for the multilevel schemes at the expense of 0.3 kcal/mol loss in accuracy. The DMB approach roughly doubles the speedup with respect to the corresponding multilevel scheme applying the large basis set. The most efficient but reliable method is clearly the Huzinaga multilevel local correlation technique, followed by the Huzinaga and the multilevel local correlation approach. We note that the potential speedups for the aforementioned methods can be even greater if larger errors can be afforded and a smaller domain is selected for the highest-level method.
also be safely employed together with the Huzinaga multilevel local correlation method using the large basis set only for the atoms of the CCSD(T) layer. The proposed approaches have also been combined with QM/MM techniques and tested for the biologically relevant COMT system. The trends observed for the enzyme reaction verify our conclusions drawn for the small benchmark systems. This application also demonstrates that the proposed methods are useful for the accurate treatment of large molecular systems including thousands of atoms.
■
ASSOCIATED CONTENT
* Supporting Information S
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.8b00350.
5. CONCLUSIONS In this paper, we have presented an alternative dual basis ansatz, which performs similarly to the previous approaches when applied to conventional electronic structure methods but is suitable for the generalization to embedding schemes. Relying on the new ansatz two possible DB approximations have been proposed for the projector-based embedding method using the Huzinaga-equation. We have demonstrated that the DB approximation considerably increases the efficiency of embedding calculations in exchange for negligible loss of accuracy and also outperforms the notorious mixedbasis set approach. However, if the DB and mixed-basis approximations are combined, the resulting DMB scheme inherits the advantageous features of both and further improves the efficiency for both the WFT and the DFT embeddings. The DB and DMB schemes have also been tested for our multilevel correlation approach, where various local correlation methods are used for the different regions of the system. Our results show that the accuracy of the multilevel local correlation method is practically unaffected by the basis set approximations. The combination of the Huzinagaembedding and the multilevel local correlation schemes has also been introduced, namely the Huzinaga multilevel local correlation approach. The benchmark calculations show that it is sufficient to only treat a small subsystem of the molecule at the CCSD(T) level if an MP2 layer is also inserted between the CCSD(T) and the DFT regions. For the CCSD(T) layer, if no aromatic system is broken, it is sufficient to include atoms separated by 2−3 bonds from the reactive center of the molecule, while the MP2 region is recommended to consist of further atoms that are separated by 5−6 bonds from the bonds affected in the reaction. In addition, the DMB approach can
Molecular structures, detailed results of DB calculations, DB embedding calculations with various functionals (PDF) Coordinates for the reactant of the COMT system (PDB) Coordinates for the product of the COMT system
■
(PDB)
AUTHOR INFORMATION
Corresponding Author
*E-mail:
[email protected]. ORCID
Péter R. Nagy: 0000-0001-6692-0879 Mihály Kállay: 0000-0003-1080-6625 Funding
The authors are grateful for the financial support from the National Research, Development, and Innovation Office (NKFIH, Grant No. KKP126451). This work was also supported by the BME-Biotechnology FIKP grant of EMMI (BME FIKP-BIO). The work of PRN is supported through the New National Excellence Program of the Ministry of Human Capacities, ID: Ú NKP-17-4-BME-55. The computing time granted on the Hungarian HPC Infrastructure at NIIF Institute, Hungary, is gratefully acknowledged. Notes
The authors declare no competing financial interest. M
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
■
theory: A reduced-cost reference for correlation calculations. J. Chem. Phys. 2006, 125, 074108. (24) Steele, R. P.; Shao, Y.; DiStasio, R. A.; Head-Gordon, M. DualBasis Analytic Gradients. 1. Self-Consistent Field Theory. J. Phys. Chem. A 2006, 110, 13915. (25) Steele, R. P.; Head-Gordon, M. Dual-basis self-consistent field methods: 6-31G* calculations with a minimal 6−4G primary basis. Mol. Phys. 2007, 105, 2455. (26) Distasio, R. A., Jr.; Steele, R. P.; Head-Gordon, M. The analytical gradient of dual-basis resolution-of-the-identity secondorder Møller−Plesset perturbation theory. Mol. Phys. 2007, 105, 2731. (27) Steele, R. P.; DiStasio, R. A., Jr.; Head-Gordon, M. NonCovalent Interactions with Dual-Basis Methods: Pairings for Augmented Basis Sets. J. Chem. Theory Comput. 2009, 5, 1560. (28) Steele, R. P.; Head-Gordon, M.; Tully, J. C. Ab Initio Molecular Dynamics with Dual Basis Set Methods. J. Phys. Chem. A 2010, 114, 11853. (29) Mao, Y.; Horn, P. R.; Mardirossian, N.; Head-Gordon, T.; Skylaris, C.-K.; Head-Gordon, M. Approaching the basis set limit for DFT calculations using an environment-adapted minimal basis with perturbation theory: Formulation, proof of concept, and a pilot implementation. J. Chem. Phys. 2016, 145, 044109. (30) Marshall, M. S.; Sears, J. S.; Burns, L. A.; Brédas, J.-L.; Sherrill, C. D. An Error and Efficiency Analysis of Approximations to Møller− Plesset Perturbation Theory. J. Chem. Theory Comput. 2010, 6, 3681. (31) Deng, J.; Gilbert, A. T. B.; Gill, P. M. W. Approaching the Hartree−Fock limit by perturbative methods. J. Chem. Phys. 2009, 130, 231101. (32) Deng, J.; Gilbert, A. T. B.; Gill, P. M. W. Hartree−Fock perturbative corrections for total and reaction energies. J. Chem. Phys. 2010, 133, 044116. (33) Deng, J.; Gill, P. M. W. A new approach to dual-basis secondorder Møller−Plesset calculations. J. Chem. Phys. 2011, 134, 081103. (34) Deng, J.; Gilbert, A. T. B.; Gill, P. M. W. MP2[V] − A Simple Approximation to Second-Order Møller−Plesset Perturbation Theory. J. Chem. Theory Comput. 2015, 11, 1639. (35) Nakajima, T.; Hirao, K. A dual-level approach to densityfunctional theory. J. Chem. Phys. 2006, 124, 184108. (36) Tokura, S.; Sato, T.; Tsuneda, T.; Nakajima, T.; Hirao, K. A dual-level state-specific time-dependent density-functional theory. J. Comput. Chem. 2008, 29, 1187. (37) Mizukami, W.; Nakajima, T.; Hirao, K.; Yanai, T. A dual-level approach to four-component relativistic density-functional theory. Chem. Phys. Lett. 2011, 508, 177. (38) Deng, J.; Gilbert, A. T. B.; Gill, P. M. W. Density functional triple jumping. Phys. Chem. Chem. Phys. 2010, 12, 10759. (39) Kobayashi, M.; Nakai, H. Dual-level hierarchical scheme for linear-scaling divide-and-conquer correlation theory. Int. J. Quantum Chem. 2009, 109, 2227. (40) Friedrich, J.; Dolg, M. Fully Automated Incremental Evaluation of MP2 and CCSD(T) Energies: Application to Water Clusters. J. Chem. Theory Comput. 2009, 5, 287. (41) Zhang, J.; Dolg, M. Approaching the complete basis set limit of CCSD(T) for large systems by the third-order incremental dual-basis set zero-buffer F12 method. J. Chem. Phys. 2014, 140, 044114. (42) Zhang, J.; Dolg, M. Third-Order Incremental Dual-Basis Set Zero-Buffer Approach for Large High-Spin Open-Shell Systems. J. Chem. Theory Comput. 2015, 11, 962. (43) Anacker, T.; Tew, D. P.; Friedrich, J. First UHF Implementation of the Incremental Scheme for Open-Shell Systems. J. Chem. Theory Comput. 2016, 12, 65. (44) Røeggen, I.; Gao, B. Combination of large and small basis sets in electronic structure calculations on large systems. J. Chem. Phys. 2018, 148, 134118. (45) Adler, T. B.; Knizia, G.; Werner, H.-J. A simple and efficient CCSD(T)-F12 approximation. J. Chem. Phys. 2007, 127, 221106. (46) Noga, J.; Kedžuch, S.; Š imunek, J. Second order explicitly correlated R12 theory revisited: A second quantization framework for
REFERENCES
(1) Helgaker, T.; Jørgensen, P.; Olsen, J. Molecular Electronic Structure Theory; Wiley: Chichester, 2000; DOI: 10.1002/ 9781119019572. (2) Møller, C.; Plesset, M. S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618. (3) Gauss, J. In Encyclopedia of Computational Chemistry; Schleyer, P. R., Jorgensen, W. L., Schaefer, H. F., III, Schreiner, P. R., Thiel, W., Eds.; Wiley: New York, 1998; p 615. (4) Purvis, G. D., III; Bartlett, R. J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. J. Chem. Phys. 1982, 76, 1910. (5) Raghavachari, K.; Trucks, G. W.; Pople, J. A.; Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 1989, 157, 479. (6) Bauschlicher, C. W., Jr.; Bagus, P. S. Mixed basis set calculations for atomic hydrogen on beryllium(0001). Chem. Phys. Lett. 1982, 90, 355. (7) Chesnut, D. B.; Moore, K. D. Locally dense basis sets for chemical shift calculations. J. Comput. Chem. 1989, 10, 648. (8) Orozco, M.; Luque, F. J. On the use of mixed basis sets to compute accurate molecular electrostatic potentials. Chem. Phys. Lett. 1989, 160, 305. (9) Hinton, J. F.; Guthrie, P.; Pulay, P.; Wolinski, K. Ab initio quantum mechanical calculation of the chemical shift anisotropy of the hydrogen atom in the (H2O)17 water cluster. J. Am. Chem. Soc. 1992, 114, 1604. (10) Kirby, R. A.; Hansen, A. E. Study of locally dense and locally saturated basis sets in localized molecular orbital calculations of nuclear shielding: Ab initio LORG calculations for 13C and 17O in norbornenone. Int. J. Quantum Chem. 1996, 57, 199. (11) DiLabio, G. A.; Wright, J. S. Calculation of bond dissociation energies for large molecules using locally dense basis sets. Chem. Phys. Lett. 1998, 297, 181. (12) DiLabio, G. A. Using Locally Dense Basis Sets for the Determination of Molecular Properties. J. Phys. Chem. A 1999, 103, 11414. (13) Moon, S.; Case, D. A. A comparison of quantum chemical models for calculating NMR shielding parameters in peptides: Mixed basis set and ONIOM methods combined with a complete basis set extrapolation. J. Comput. Chem. 2006, 27, 825. (14) Reid, D. M.; Kobayashi, R.; Collins, M. A. Systematic Study of Locally Dense Basis Sets for NMR Shielding Constants. J. Chem. Theory Comput. 2014, 10, 146. (15) Lee, S. J. R.; Miyamoto, K.; Ding, F.; Manby, F. R.; Miller, T. F., III Density-based errors in mixed-basis mean-field electronic structure, with implications for embedding and QM/MM methods. Chem. Phys. Lett. 2017, 683, 375. (16) Havriliak, S.; King, H. F. Rydberg Radicals. 1. Frozen-Core Model for Rydberg Levels of the Ammonium Radical. J. Am. Chem. Soc. 1983, 105, 4. (17) Jurgens-Lutovsky, R.; Almlöf, J. Dual basis sets in calculations of electron correlation. Chem. Phys. Lett. 1991, 178, 451. (18) Wolinski, K.; Pulay, P. Second-order Møller−Plesset calculations with dual basis sets. J. Chem. Phys. 2003, 118, 9497. (19) Ksiazek, A.; Wolinski, K. Molecular properties with dual basis set methods. Mol. Phys. 2008, 106, 769. (20) Lee, M. S.; Head-Gordon, M. Polarized atomic orbitals for selfconsistent field electronic structure calculations. J. Chem. Phys. 1997, 107, 9085. (21) Lee, M. S.; Head-Gordon, M. Absolute and relative energies from polarized atomic orbital self-consistent field calculations and a second order correction. Convergence with size and composition of the secondary basis. Comput. Chem. 2000, 24, 295. (22) Liang, W.; Head-Gordon, M. Approaching the Basis Set Limit in Density Functional Theory Calculations Using Dual Basis Sets without Diagonalization. J. Phys. Chem. A 2004, 108, 3206. (23) Steele, R. P.; DiStasio, R. A., Jr.; Shao, Y.; Kong, J.; HeadGordon, M. Dual-basis second-order Møller−Plesset perturbation N
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation treatment of the operators’ partitionings. J. Chem. Phys. 2007, 127, 034106. (47) Köhn, A.; Tew, D. P. Towards the Hartree−Fock and coupledcluster singles and doubles basis set limit: A study of various models that employ single excitations into a complementary auxiliary basis set. J. Chem. Phys. 2010, 132, 024101. (48) Kong, L.; Valeev, E. F. Perturbative correction for the basis set incompleteness error of complete-active-space self-consistent field. J. Chem. Phys. 2010, 133, 174126. (49) Warshel, A.; Karplus, M. Calculation of ground and excited state potential surfaces of conjugated molecules. I. Formulation and parametrization. J. Am. Chem. Soc. 1972, 94, 5612. (50) Warshel, A.; Levitt, M. Theoretical studies of enzymic reactions: Dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 1976, 103, 227. (51) Maseras, F.; Morokuma, K. IMOMM − A new integrated abinitio plus molecular mechanics geometry optimization scheme of equilibrium structures and transition-states. J. Comput. Chem. 1995, 16, 1170. (52) Senn, H. M.; Thiel, W. QM/MM methods for biomolecular systems. Angew. Chem., Int. Ed. 2009, 48, 1198. (53) Gomes, A. S. P.; Jacob, C. R. Quantum-chemical embedding methods for treating local electronic excitations in complex chemical systems. Annu. Rep. Prog. Chem., Sect. C: Phys. Chem. 2012, 108, 222. (54) Gordon, M. S.; Fedorov, D. G.; Pruitt, S. R.; Slipchenko, L. V. Fragmentation methods: A route to accurate calculations on large systems. Chem. Rev. 2012, 112, 632. (55) Libisch, F.; Huang, C.; Carter, E. A. Embedded correlated wavefunction schemes: Theory and applications. Acc. Chem. Res. 2014, 47, 2768. (56) Wesolowski, T. A.; Shedge, S.; Zhou, X. Frozen-Density Embedding Strategy for Multilevel Simulations of Electronic Structure. Chem. Rev. 2015, 115, 5891. (57) Knizia, G.; Chan, G. K.-L. Density Matrix Embedding: A Strong-Coupling Quantum Embedding Theory. J. Chem. Theory Comput. 2013, 9, 1428. (58) Cortona, P. Self-consistently determined properties of solids without band-structure calculations. Phys. Rev. B: Condens. Matter Mater. Phys. 1991, 44, 8454. (59) Wesolowski, T. A.; Warshel, A. Frozen density functional approach for ab initio calculations of solvated molecules. J. Phys. Chem. 1993, 97, 8050. (60) Goodpaster, J. D.; Ananth, N.; Manby, F. R.; Miller, T. F., III Exact nonadditive kinetic potentials for embedded density functional theory. J. Chem. Phys. 2010, 133, 084103. (61) Elliott, P.; Cohen, M. H.; Wasserman, A.; Burke, K. Density Functional Partition Theory with Fractional Occupations. J. Chem. Theory Comput. 2009, 5, 827. (62) Elliott, P.; Burke, K.; Cohen, M. H.; Wasserman, A. Partition density-functional theory. Phys. Rev. A: At., Mol., Opt. Phys. 2010, 82, 024501. (63) Fux, S.; Jacob, C. R.; Neugebauer, J.; Visscher, L.; Reiher, M. Accurate frozen-density embedding potentials as a first step towards a subsystem description of covalent bonds. J. Chem. Phys. 2010, 132, 164101. (64) Huang, C.; Pavone, M.; Carter, E. A. Quantum mechanical embedding theory based on a unique embedding potential. J. Chem. Phys. 2011, 134, 154110. (65) Goodpaster, J. D.; Barnes, T. A.; Miller, T. F., III Embedded density functional theory for covalently bonded and strongly interacting subsystems. J. Chem. Phys. 2011, 134, 164108. (66) Nafziger, J.; Wu, Q.; Wasserman, A. Molecular binding energies from partition density functional theory. J. Chem. Phys. 2011, 135, 234101. (67) Rajchel, L.; Ż uchowski, P. S.; Szczȩsń iak, M. M.; Chałasiński, G. Derivation of the supermolecular interaction energy from the monomer densities in the density functional theory. Chem. Phys. Lett. 2010, 486, 160.
(68) Manby, F. R.; Stella, M.; Goodpaster, J. D.; Miller, T. F., III A Simple, Exact Density-Functional-Theory Embedding Scheme. J. Chem. Theory Comput. 2012, 8, 2564. (69) Goodpaster, J. D.; Barnes, T. A.; Manby, F. R.; Miller, T. F., III Accurate and systematically improvable density functional theory embedding for correlated wavefunctions. J. Chem. Phys. 2014, 140, 18A507. (70) Barnes, T. A.; Goodpaster, J. D.; Manby, F. R.; Miller, T. F., III Accurate basis set truncation for wavefunction embedding. J. Chem. Phys. 2013, 139, 024103. (71) Bennie, S. J.; Stella, M.; Miller, T. F., III; Manby, F. R. Accelerating wavefunction in density-functional-theory embedding by truncating the active basis set. J. Chem. Phys. 2015, 143, 024105. (72) Tamukong, P. K.; Khait, Y. G.; Hoffmann, M. R. Density Differences in Embedding Theory with External Orbital Orthogonality. J. Phys. Chem. A 2014, 118, 9182. (73) Fornace, M. E.; Lee, J.; Miyamoto, K.; Manby, F. R.; Miller, T. F., III Embedded Mean-Field Theory. J. Chem. Theory Comput. 2015, 11, 568. (74) Ding, F.; Manby, F. R.; Miller, T. F., III Embedded Mean-Field Theory with Block-Orthogonalized Partitioning. J. Chem. Theory Comput. 2017, 13, 1605. (75) Li, S.; Shen, J.; Li, W.; Jiang, Y. An efficient implementation of the “cluster-in-molecule” approach for local electron correlation calculations. J. Chem. Phys. 2006, 125, 074109. (76) Mata, R. A.; Werner, H.-J.; Schütz, M. Correlation regions within a localized molecular orbital approach. J. Chem. Phys. 2008, 128, 144106. (77) Li, W.; Piecuch, P. Multilevel Extension of the Cluster-inMolecule Local Correlation Methodology: Merging Coupled-Cluster and Møller−Plesset Perturbation Theories. J. Phys. Chem. A 2010, 114, 6721. (78) Rolik, Z.; Kállay, M. A general-order local coupled-cluster method based on the cluster-in-molecule approach. J. Chem. Phys. 2011, 135, 104111. (79) Myhre, R. H.; Sánchez de Merás, A. M. J.; Koch, H. Multi-level coupled cluster theory. J. Chem. Phys. 2014, 141, 224105. (80) Hégely, B.; Nagy, P. R.; Ferenczy, G. G.; Kállay, M. Exact density functional and wave function embedding schemes based on orbital localization. J. Chem. Phys. 2016, 145, 064107. (81) Sparta, M.; Retegan, M.; Pinski, P.; Riplinger, C.; Becker, U.; Neese, F. Multilevel Approaches within the Local Pair Natural Orbital Framework. J. Chem. Theory Comput. 2017, 13, 3198. (82) Huzinaga, S.; Cantu, A. A. Theory of Separability of Many Electron Systems. J. Chem. Phys. 1971, 55, 5543. (83) Chulhai, D. V.; Goodpaster, J. D. Improved Accuracy and Efficiency in Quantum Embedding through Absolute Localization. J. Chem. Theory Comput. 2017, 13, 1503. (84) Culpitt, T.; Brorsen, K. R.; Hammes-Schiffer, S. Density functional theory embedding with the orthogonality constrained basis set expansion procedure. J. Chem. Phys. 2017, 146, 211101. (85) Miyamoto, K.; Miller, T. F., III; Manby, F. R. Fock-Matrix Corrections in Density Functional Theory and Use in Embedded Mean-Field Theory. J. Chem. Theory Comput. 2016, 12, 5811. (86) Boughton, J. W.; Pulay, P. Comparison of the Boys and Pipek− Mezey Localizations in the Local Correlation Approach and Automatic Virtual Basis Selection. J. Comput. Chem. 1993, 14, 736. (87) Rolik, Z.; Szegedy, L.; Ladjánszki, I.; Ladóczki, B.; Kállay, M. An efficient linear-scaling CCSD(T) method based on local natural orbitals. J. Chem. Phys. 2013, 139, 094105. (88) Kállay, M. Linear-scaling implementation of the direct randomphase approximation. J. Chem. Phys. 2015, 142, 204105. (89) Nagy, P. R.; Samu, G.; Kállay, M. An integral-direct linearscaling second-order Møller−Plesset approach. J. Chem. Theory Comput. 2016, 12, 4897. (90) Nagy, P. R.; Kállay, M. Optimization of the linear-scaling local natural orbital CCSD(T) method: Redundancy-free triples correction using Laplace transform. J. Chem. Phys. 2017, 146, 214106. O
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
L.; York, D. M.; Kollman, P. A. Amber 2017; University of California: San Francisco, 2017. (112) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696. (113) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926. (114) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general Amber force field. J. Comput. Chem. 2004, 25, 1157. (115) Jakalian, A.; Bush, B. L.; Jack, D. B.; Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: I. Method. J. Comput. Chem. 2000, 21, 132. (116) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 2002, 23, 1623. (117) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (118) Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 2011, 32, 1456. (119) Polly, R.; Werner, H.-J.; Manby, F. R.; Knowles, P. J. Fast Hartree−Fock theory using local fitting approximations. Mol. Phys. 2004, 102, 2311. (120) Foster, J. M.; Boys, S. F. Canonical Configurational Interaction Procedure. Rev. Mod. Phys. 1960, 32, 300.
(91) Hégely, B.; Bogár, F.; Ferenczy, G. G.; Kállay, M. A QM/MM program for calculations with frozen localized orbitals based on the Huzinaga equation. Theor. Chem. Acc. 2015, 134, 132. (92) Mrcc, a quantum chemical program suite written by Kállay, M.; Rolik, Z.; Csontos, J.; Nagy, P.; Samu, G.; Mester, D.; Csóka, J.; Szabó, B.; Ladjánszki, I.; Szegedy, L.; Ladóczki, B.; Petrov, K.; Farkas, M.; Mezei, P. D.; Hégely, B. http://www.mrcc.hu/ (accessed June 15, 2018). See also ref 87. (93) Hariharan, P. C.; Pople, J. A. The influence of polarization functions on molecular orbital hydrogenation energies. Theor. Chim. Acta 1973, 28, 213. (94) Krishnan, R.; Binkley, J. S.; Seeger, R.; Pople, J. A. Selfconsistent molecular orbital methods. XX. A basis set for correlated wave functions. J. Chem. Phys. 1980, 72, 650. (95) Dunning, T. H., Jr. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989, 90, 1007. (96) Kendall, R. A.; Dunning, T. H., Jr.; Harrison, R. J. Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 1992, 96, 6796. (97) Woon, D. E.; Dunning, T. H., Jr. Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J. Chem. Phys. 1993, 98, 1358. (98) Weigend, F. Hartree−Fock Exchange Fitting Basis Sets for H to Rn. J. Comput. Chem. 2008, 29, 167. (99) Weigend, F.; Köhn, A.; Hättig, C. Efficient use of the correlation consistent basis sets in resolution of the identity MP2 calculations. J. Chem. Phys. 2002, 116, 3175. (100) Dirac, P. A. M. Quantum mechanics of many-electron systems. Proc. R. Soc. London, Ser. A 1929, 123, 714. (101) Slater, J. C. A simplification of the Hartree−Fock method. Phys. Rev. 1951, 81, 385. (102) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77, 3865. (103) Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic-behavior. Phys. Rev. A: At., Mol., Opt. Phys. 1988, 38, 3098. (104) Becke, A. D. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 1993, 98, 5648. (105) Nagy, P. R.; Samu, G.; Kállay, M. Optimization of the LinearScaling Local Natural Orbital CCSD(T) Method: Improved Algorithm and Benchmark Applications. J. Chem. Theory Comput. [Online early access]. DOI: 10.1021/acs.jctc.8b00442. Published online: July 2, 2018. Online: https://pubs.acs.org/doi/abs/10.1021/ acs.jctc.8b00442. (106) Pipek, J.; Mezey, P. A fast intrinsic localization procedure applicable for ab initio and semiempirical linear combination of atomic orbital wave functions. J. Chem. Phys. 1989, 90, 4916. (107) Bennie, S. J.; van der Kamp, M. W.; Pennifold, R. C. R.; Stella, M.; Manby, F. R.; Mulholland, A. J. A Projector-Embedding Approach for Multiscale Coupled-Cluster Calculations Applied to Citrate Synthase. J. Chem. Theory Comput. 2016, 12, 2689. (108) Zhang, X.; Bennie, S. J.; van der Kamp, M. W.; Glowacki, D. R.; Manby, F. R.; Mulholland, A. J. Multiscale analysis of enantioselectivity in enzyme-catalysed ‘lethal synthesis’ using projector-based embedding. R. Soc. Open Sci. 2018, 5, 171390. (109) Kulik, H. J.; Zhang, J.; Klinman, J. P.; Martínez, T. J. How Large Should the QM Region Be in QM/MM Calculations? The Case of Catechol O-Methyltransferase. J. Phys. Chem. B 2016, 120, 11381. (110) Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graphics Modell. 2006, 25, 247. (111) Case, D. A.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Götz, A. W.; Greene, D.; Homeyer, N.; Izadi, S.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Mermelstein, D.; Merz, K. M.; Monard, G.; Nguyen, H.; Omelyan, I.; Onufriev, A.; Pan, F.; Qi, R.; Roe, D. R.; Roitberg, A.; Sagui, C.; Simmerling, C. L.; Botello-Smith, W. M.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Xiao, P
DOI: 10.1021/acs.jctc.8b00350 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX