J. Phys. Chem. A 2010, 114, 11853–11860
11853
Ab Initio Molecular Dynamics with Dual Basis Set Methods Ryan P. Steele,*,† Martin Head-Gordon,‡ and John C. Tully† Department of Chemistry, Yale UniVersity, 225 Prospect St., New HaVen, Connecticut 06520, United States, Department of Chemistry, UniVersity of California-Berkeley, Berkeley, California 94720, United States ReceiVed: August 4, 2010; ReVised Manuscript ReceiVed: September 11, 2010
On-the-fly, ab initio classical molecular dynamics are demonstrated with an underlying dual basis set potential energy surface. Dual-basis self-consistent field (Hartree-Fock and density functional theory) and resolutionof-the-identity second-order Møller-Plesset perturbation theory (RI-MP2) dynamics are tested for small systems, including the water dimer. The resulting dynamics are shown to be faithful representations of their single-basis analogues for individual trajectories, as well as vibrational spectra. Computational cost savings of 58% are demonstrated for SCF methods, even relative to Fock-extrapolated dynamics, and savings are further increased to 71% with RI-MP2. Notably, these timings outperform an idealized estimate of extendedLagrangian molecular dynamics. The method is subsequently demonstrated on the vibrational absorption spectrum of two NO+(H2O)3 isomers and is shown to recover the significant width of the shared-proton bands observed experimentally. Introduction Classical molecular dynamics (MD) involves the real-time propagation of atoms and molecules within the framework of Newtonian mechanics and the Born-Oppenheimer (BO) approximation. As such, it inherently explores configuration space beyond critical points on the potential energy surface (PES) and also forms the basis of our classically biased view of “trajectories”. It also provides real-time dynamical information, such as reaction rates and correlation functions. Although certain aspects of chemical problemsszero-point motion, quantum mechanical nuclei, low-temperature dynamics, and nonadiabatic effectsscannot be described within this ansatz, classical MD is sufficient for much of the chemical world. These shortcomings are also heavily offset by the fact that classical MD is inherently local, requiring only forces at a single molecular configuration for a given time step. Thus, classical MD is the computationally least expensive option for real-time dynamical information. It also forms the basis for more accurate treatments of nuclear dynamics, such as surface-hopping,1 semiclassical,2,3 ringpolymer4,5 and centroid6,7 MD, and even many flavors of wavepacket methods.8 With the dynamical method fixed, the sole arbiter of accuracy within the classical framework is the PES on which the classical nuclei are propagated. Accordingly, ab initio surfaces are ideal, due to their combination of high accuracy and generality. Onthe-fly generation of these forces avoids the costly fitting of multidimensional surfaces, thus retaining the locality property mentioned above. The unfortunate fact remains, however, that the computational cost gap between parametrized force fields and BOMD is enormous; often the cost of performing a single ab initio calculation is barely accessible, leading to the common reliance on critical-point properties. Therefore, significant motivation exists to both reduce the cost of standard ab initio methods, as well as improve the interface of ab initio calculations with MD methods. * To whom correspondence should be addressed. E-mail: ryan.steele@ yale.edu. † Yale University. ‡ University of California-Berkeley.
The most popular interface of these two worlds is the socalled Extended Lagrangian MD (ELMD) method9-11 with Kohn-Sham density functional theory12 (DFT) forces, popularized within the Car-Parrinello MD (CPMD) package.9,13 Instead of exact BOMD on a DFT surfaceswhich requires the iterative, self-consistent solution of the Kohn-Sham equationssELMD involves the direct propagation of the Kohn-Sham orbitals or density with an associated “fictitious mass”. This scheme reduces the costly, iterative SCF calculation to a single-step formalism, tempered by the requirement of smaller time steps. Indeed, ELMD/CPMD continues to provide access to dynamical systems that are otherwise unattainable. Recent evidence is beginning to demonstrate, however, that simpler optionsssuch as Fock matrix extrapolation14-16scan actually outperform ELMD, in the sense of cost per simulated time (the sensible metric for dynamical simulations). Although an iterative SCF calculation is still required, extrapolation reduces the number of iterations by at least a factor of 2, and the ability to employ larger timesteps renders these methods a net win. The avoidance of ELMD-based artifacts, such as artificial redshifts in vibrational spectra,10,17 further suggests that these full-BOMD methods are the current methods of choice. The more recent work of Niklasson18-20 has also been brought to our attention (see Acknowledgments). In these methods, a fully time-reversible electronic propagation is achieved. Since the role of SCF energy convergence criteria (as opposed to a fixed number of SCF cycles) has not been solidified in these methods, the results of the current work continue to employ the extrapolation scheme of refs 14 and 15. However, the methods contained herein are equally applicable to these reversiblesand likely superiorsalgorithms. In this work, we explore the use of dual basis set SCF (DBSCF) methods21-31 to further reduce the cost of BOMD simulations. These DB-SCF calculations involve the iterative solution of the SCF equations in a small subset of the target basis set, followed by a single-step correction in the latter basis. Additional errors due to this formalism have been shown to be extremely smallson the order of 0.01 kcal/mol for molecular interactions,30 0.001 Å for molecular structures,27 and 0.1 kcal/
10.1021/jp107342g 2010 American Chemical Society Published on Web 10/12/2010
11854
J. Phys. Chem. A, Vol. 114, No. 43, 2010
Steele et al.
mol for bond-breaking energies24,26swhile reducing the computational cost by a factor of 3-10. Most importantly, the DBSCF energy is well-defined and is not susceptible to systematic energy drifts15 or ELMD-type artifacts.10,32 The small sacrifice in accuracy, therefore, is offset by full precision, thereby rigorously defining a valid PES. These methods have been developed within the framework of Hartree-Fock and DFT,24,25,29 as well as second-order Møller-Plesset perturbation theory (MP2);22,23,26,30,31 analytical gradients27,28 have also been developed for all of these methods, which naturally lead to the classical MD work presented here. Methods Standard Methods. Standard BOMD involves the classical ˙, with propagation of nuclear configurations b R and velocities b R forces obtained from analytic gradients of the ab initio energy b). Although any electronic structure theory method (dE)/(dR could be utilized, cost considerations commonly limit dynamical simulations to SCF-based methods (HF and DFT). The DFT energy, for example, can be cast as
b) ) Tr PH + 1 PIIP + Exc E(R 2
(
)
(1)
where P is the one-electron density matrix, H involves the oneelectron integrals, II is the tensor of two-electron repulsion integrals, and Exc is the exchange-correlation energy. Owing to the stationarity of the converged density, analytic gradients are calculated as
(
)
b) dE dH 1 dII dS + E(R ) Tr P + P P - (PFP) xc b b 2 b b dR dR dR dR
(2)
where S is the atomic orbital overlap matrix, F ) IIP + (δExc)/ (δP) is the Fock matrix, and the last term signifies implicit derivatives of the exchange-correlation energy only (i.e., density derivatives are not required). The main cost considerations for DFT-based BOMD are, therefore, formation/contraction of the two-electron integrals (IIP) and the grid-based quadrature evaluation of the exchangecorrelation terms, all multiplied by the number of SCF iterations (typically 10-15 for small molecules). Derivatives of the twoelectron integrals comprise the main cost of the nuclear force calculation and require roughly 3-4 times the cost of a Fock build. The allure of ELMD (details can be found elsewhere9-11) is clear: instead of roughly 10 SCF iterations per nuclear time step, only one is required, with some minor additional overhead and the inherent reliance on the fictitious mass to maintain selfconsistency. The cost of the force remains the same. For a fixed time step, the total cost is, therefore, reduced by roughly a factor of 4. Unfortunately, ELMD rarely affords the same time step as standard BOMD,10 thus mitigating its savings, and alternative, simpler options appear to be more productive. In particular, a simple polynomial extrapolation of the elements of F can reduce the number of Fock builds by a factor of 3-4 with no observable bias in the resulting dynamics or need for smaller timesteps.15 Still, 3-5 SCF iterations are typically required, even with extrapolation, so motivation for further improvement remains. Dual-Basis Methods. The DB-SCF formalism naturally reduces this number of cycles to one, tempered by the small cost of the iterative, small-basis calculation. The small-basis energy, e, is formed as shown in eq 1. The converged density
matrix is then projected onto the target-basis space, and a single Fock matrix in the target basis is formed. (Note that this lone large-basis Fock matrix also may be formed cheaply, due to integral screening.26,27) Following diagonalization of F to obtain the relaxed density, the DB-SCF energy is formed as a correction in the target space:
E ) e + Tr(∆P · F)
(3)
where ∆P is the relaxation of the density in the target basis set. Despite its simplicity, this formalism affords the high accuracy mentioned in the Introduction and an order-of-magnitude reduction in computational cost. Although details of the DB-SCF analytic gradient are deferred to ref 27, some qualitative aspects warrant discussion. Most importantly, the density obtained with DB-SCF methods is not converged. A response equation33-35 must be solved for the response of the density to a nuclear perturbation (note that ELMD/CPMD should also, in principle, require these terms). Since these (linear) response equations are solved iteratively, thus rendering the cost similar to another set of SCF equations, the presence of these terms superficially appears as a death knell to DB-SCF gradients. However, the choice of correction in eq 3 leads to the finding that only small-basis responses are required. Combined with additional integral screening in the force and the already significant savings in the underlying SCF calculation, dual-basis gradients have been shown to be fast and accurate alternatives to their target-basis analogues.27 Finally, we note that even more impressive savings are demonstrated in the MP2 gradient, where response equations are already required,36 and an efficient implementation of the DB-RI-MP2 gradient has been demonstrated.28,30 With a well-defined PES and its analytical gradients, extension to classical BOMD is straightforward. The main remaining questions concern fidelity of dual-basis molecular dynamics (DBMD) with its target-basis counterparts, as well as the extent of computational savings when properly compared to Fockextrapolated dynamics. These issues and others are explored in the following section, culminating in the analysis of the vibrational spectrum of the atmospherically relevant complex NO+(H2O)3. Results The use of dual-basis methods in a dynamical context is presented in this section; the main analysis involves cost and accuracy considerations. First, we demonstrate that DBMD is a valid MD method and that results are consistent with standard BOMD results. Second, we show that the cost of DBMD is significantly less than standard BOMD simulationsseven when extrapolation is employed in the lattersand that DBMD can outperform an idealized estimate of ELMD-based approaches. Finally, the ability of DBMD to reproduce the experimental vibrational spectrum of a unique, atmospherically relevant complex is assessed. Energy Conservation. Loose convergence criteria and integral thresholds can lead to severe drifts (nonconservation) in MD energies when the previous time step’s MOs are utilized as the initial SCF guess.14,15 In some sense, DB-SCF energies utilize a similar scheme, in which the projected small-basis MOs are used to build the Fock matrix for the lone large-basis SCF step. However, as shown in Table 1, the energy conservation problems are avoided in DB-SCF. Even with the extremely loose numerical settings utilized in the tablessettings that clearly demonstrate the problem with using the previous step’s MOssthe
Ab Initio MD with Dual Basis Set Methods
J. Phys. Chem. A, Vol. 114, No. 43, 2010 11855
TABLE 1: Energy Conservation Tests for HF/6-31G* Dynamics on Tetrafluoroethylene, C2F4a method
SCF guess
noise (µEh)
drift (µEh/ps)
single
SAD previous MOs (12,6) SAD previous MOs (12,6)
30.6 7190 44.6 13.6 360 21.5
3.20 -15900 32.8 4.04 729 29.7
dual
a Statistics are averaged over five 2 ps trajectories using a 20 au time step. Initial velocities have been selected from a thermal distribution at 500 K, and initial positions are simply the optimized structure. Corresponding single- and dual-basis trajectories utilize the same initial conditions. Shown are results using an atomic density (SAD) guess at each MD step, as well as a 12-point, 6th-order Fock extrapolation,15 denoted as (12,6). The target basis is 6-31G*; for dual-basis results, the 6-31G basis has been used as the secondary basis. Integral thresholds and SCF convergence criteria have been set to 10-7 au and 10-4 au, respectively. The drift is defined as the slope of a linear fit to the total energy as a function of simulation time, and the noise is the mean absolute fluctuation after this linear term is subtracted.
DB dynamics fare just as well as their single-basis counterparts. At these rather poor settings, the Fock extrapolation scheme does show a slight drift, but any reasonable tightening of these settings alleviates this problem15 for the length of trajectories studied here. Since a fully converged, small-basis density is utilized, the DB correction (eq 3) is well-defined and devoid of numerical artifacts. Thus, DBMD satisfies a “bare minimum” requirement of dynamics: energy conservation. The time-reversible algorithms of Niklasson18-20 avoid these drift considerations by construction, and the DB dynamics presented here are fully compatible with these algorithms. These methods have not yet been tested with DB dynamics, but since the trajectories generated in the remainder of this work are stable (drift-free) over the length of the simulation, such a modification appears at present to be unnecessary. Much longer trajectories, however, may require further analysis. Fidelity with Target Basis Results. Although the DB scheme defines a valid PES alone, the main question concerning DBMD is its inherent fidelity with target-basis MD results. Past analyses23-31 on static observables have shown that the DB scheme is remarkably consistent with its target-basis counterpart, particularly for large basis sets, but consistency in dynamical quantities remains to be demonstrated. The most direct comparison of single- and dual-basis dynamics involves the propagation of individual trajectories from the same initial phase-space point. The HF molecule is a deceptively simple test case, in which the high vibrational frequency requires relatively small timesteps, and a rapidly changing character of the wave function can lead to serious artifacts in ELMD dynamics.10 In fact, a time step of