Accuracy and Interpretability: The Devil and the Holy Grail. New

Jun 12, 2019 - The past decade has witnessed an increasing interaction between experiment and theory in the field of molecular spectroscopy. On the ...
2 downloads 0 Views 11MB Size

Cite This: Chem. Rev. 2019, 119, 8131−8191

Accuracy and Interpretability: The Devil and the Holy Grail. New Routes across Old Boundaries in Computational Spectroscopy Cristina Puzzarini,*,† Julien Bloino,‡ Nicola Tasinato,‡ and Vincenzo Barone*,‡ †

Dipartimento di Chimica “Giacomo Ciamician”, Università di Bologna, Via F. Selmi 2, I-40126 Bologna, Italy Scuola Normale Superiore, Piazza dei Cavalieri 7, I-56126 Pisa, Italy

Downloaded via IDAHO STATE UNIV on July 18, 2019 at 23:56:09 (UTC). See for options on how to legitimately share published articles.

ABSTRACT: The past decade has witnessed an increasing interaction between experiment and theory in the field of molecular spectroscopy. On the computational side, ongoing developments of hardware and software have moved computational spectroscopy from a highly specialized research area to a general tool for researchers in different fields of chemical science. However, since its dawn, computational spectroscopy has been characterized by the dichotomies of qualitative and quantitative description, and of interpretation and accuracy. Indeed, the analysis of experiments is seldom straightforward because of the subtle interplay of several different effects, which are not easy to evaluate and isolate, and/or the complexity of the system under consideration. Often, the accuracy has to be set aside for a more qualitative analysis that will provide the means for a broad interpretation. In such a scenario, the most recent advances in theoretical treatments as well as computational tools have opened the way to the reconciliation of accuracy and interpretability, resulting in unequivocal analyses and assignments of experimental spectra and their unbiased interpretation. This Review aims at being a comprehensive, authoritative, critical, and readable account of general interest to the chemistry community because of the wealth of qualitative and quantitative information that can be obtained from spectroscopic investigations. Limiting ourselves to rotational and vibrational spectroscopy, emphasis will be put on accuracy and interpretability as well as on the routes toward their reconciliation and integration.

CONTENTS 1. Introduction 2. The Devil: Accuracy in Computational Spectroscopy 2.1. Electronic Structure Calculations 2.2. The Nuclear Problem 2.2.1. Resolution of the Vibrational Problem 2.2.2. Resolution of the Rotational Problem 2.2.3. Accuracy in Vibrational and Rotational Spectroscopy 2.3. Beyond the Born−Oppenheimer Approximation 3. The Holy Grail: Interpretability in Molecular Spectroscopy 3.1. Methodological Tools for the Interpretation of Molecular Spectra 3.1.1. Vibrational Spectroscopy 3.1.2. Rotational Spectroscopy 3.2. Correcting Misinterpretations 4. Virtual Multi-frequency Spectrometer: To Reconcile Accuracy with Interpretability 4.1. The Virtual Multi-frequency Spectrometer 4.2. Vibrational Spectroscopy 4.2.1. The VMS-VIB module 4.3. Rotational Spectroscopy 4.3.1. The VMS-ROT software 5. Immersive Virtual Reality: A Route To Enhance Accuracy and Interpretability 6. New Strategies at Work © 2019 American Chemical Society

6.1. Molecular Structure Determination 6.2. Astrochemistry 7. Outlook 7.1. Weak Interactions 7.2. Condensed Phase 8. Concluding Remarks Author Information Corresponding Authors ORCID Notes Biographies Acknowledgments References

8131 8133 8133 8137 8137 8143 8144 8148

8174 8176 8177 8177 8178 8178 8179 8179 8179 8179 8179 8179 8179


1. INTRODUCTION Spectroscopic methods are unique tools to probe in a noninvasive way the properties of complex molecular systems in a variety of environments and conditions. Indeed, the increasing sophistication of well-established techniques and the parallel blooming of new ones have a huge impact in several fields of science and technology (see, e.g., refs 1−9). In addition to being widely used to infer information about molecular structure and dynamics in both gas and condensed phases (see, e.g., refs 3 and 4), spectroscopy allows for the unequivocal identification of chemical species in hostile

8149 8149 8156 8161 8163 8163 8164 8165 8168 8168 8172 8173

Received: January 3, 2019 Published: June 12, 2019 8131

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


Figure 1. Molecular systems addressed in this Review.

the spectroscopic outcomes, the high computational cost and effort restrict such accurate approaches to small, isolated molecules. To extend the size and complexity of the systems, accuracy is often sacrificed to interpretability, thus leading to more qualitative descriptions. However, recent developments in theoretical and computational models, as well as in computer science (from both software and hardware perspectives), have enabled new scenarios to reconcile accuracy and interpretability. Theoretical studies in the field of molecular spectroscopy play three roles: (a) prediction and support, (b) complementarity, and (c) interpretation. Indeed, computational spectroscopy was born as a branch of quantum chemistry with the aim of providing predictions of spectroscopic properties and features. As soon as the accuracy and applicability of quantum-chemical methods increased, the predictive role turned into an effective support to the experiment itself (see, e.g., refs 2, 8, 13−17). Computational predictions thus started to guide measurements, for instance, by selecting the most appropriate systems to be studied, the frequency range to be considered and/or the experimental conditions to be used.

environments (e.g., the interstellar space) or in samples of unknown composition (see, e.g., refs 5−7, 10, and 11) and plays a pivotal role in the study of photochemical mechanisms in biological systems and in the development of new technological devices, including photovoltaic cells, optoelectronic devices, UV-resistant materials, dyes, and fluorescent probes (see, e.g., refs 7−9). Unfortunately, interpretation of experimental data is often a difficult task: the observed spectroscopic behavior results from the subtle interplay of stereo-electronic, dynamic, and environmental effects, whose specific roles are difficult to disentangle. Furthermore, although conveying additional information, spectral congestion makes quantitative interpretation of experimental data even more difficult. To disentangle these complex signatures and disclose the underlying molecular properties, detailed molecular simulations are crucial.2,12 Therefore, high accuracy of the models to be employed in the interpretation of spectroscopic results and in the derivation of the underlying physicalchemical properties is mandatory. While pushing accurate treatments of the electronic and nuclear problems to the limit ensures rigorous, quantitative analyses and interpretations of 8132

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


and intermolecular interactions in tuning the overall behavior, thus offering more accessibility to the structure−property relationships that are at the heart of modern molecular approaches. However, to reconcile accuracy and interpretation, the challenges posed by the intrinsic properties and the complexity of the system together with thermal effects (the environmental ones can usually be excluded in the gas phase) should be overcome. The way envisaged in the present Review is to rely on a direct vis-à-vis comparison between experimental and computed spectroscopic data, which is however still far from being standard. After addressing the “devil”, that is, the accuracy required, and the “holy grail”, i.e., the interpretability, the most promising solutionin our opinionfor reconciliation is presented. This is provided by a virtual multi-frequency spectrometer (VMS), which has been implemented with the aim of giving access by means of a user-friendly tool to the latest developments of computational spectroscopy, also to non-specialists.12,18−20 The structures of the systems that will be used as examples along this Review are collected in Figures 1 and 2.

Figure 2. Molecular complexes addressed in this Review.

Therefore, computational tools play an increasing role in molecular spectroscopy not only by actively supporting measurements, but also in designing new experiments that would be impossible or very expensive to perform in a blind way. This has been leading to a stronger and stronger integration between experiment and theory in many research areas (see, e.g., refs 2, 8, 12, 15−18); however, while the employment of state-of-the-art computational approaches may be quite straightforwardly pursued in computational−theoretical research groups, their application could be difficult for nonexperts of the field, like experimentally oriented researchers. In the past decade, the developments in theoretical methodologies as well as in hardware facilities have led to such improvements that nowadays state-of-the-art computational methodologies are able to provide results that are comparable to those delivered by accurate experimental techniques, even if in limited spectroscopic fields and for limitedmostly in sizesystems. As a consequence, computational spectroscopy started to play the role of complementing experiment, thus providing the missing information that cannot be obtained by the specific experiment under consideration. In parallel, the increasing reliability of computational results has led to their deeper involvement in the interpretation of experimental outcomes, thus opening the route toward the understanding of the underlying physical phenomena, which might be very difficult based on a pure experimental work. However, interpretation of experimental findings can be a very challenging task once one aims at performing it at a quantitative (and not only qualitative) level. The main aim of this contribution is to provide a comprehensive picture of the state-of-the-art, with a particular focus on how it is now possible to overcome the dichotomy between accuracy and interpretation that has characterized computational spectroscopy since its dawn. However, the field of molecular spectroscopy is too wide to be addressed in a single review. For this reason, we focus our attention on the spectroscopic techniques that we consider more mature for achieving our goal, namely, rotational and vibrational spectroscopies in the gas phase, thus providing the reader with only a few hints concerning other spectroscopic techniques and/or condensed phases. Gas-phase spectroscopies play a crucial role in retrieving structural and property information from the analysis of spectroscopic features because they provide the unique opportunity to avoid the competition between intra-

2. THE DEVIL: ACCURACY IN COMPUTATIONAL SPECTROSCOPY Accuracy is required to support and complement experiment, to retrieve structural and dynamic information from experimental results, to dissect and quantify the role of different effects on spectroscopic features, and to predict molecular and spectroscopic properties for novel systems. However, in computational spectroscopy, accuracy is synonymous with high computational cost and often with unfeasible calculations. In this section, we address the level of accuracy that can be currently reached by the resolution of the electronic and nuclear Schrödinger equations, thus pointing out the feasibility of computations in terms of molecular complexity and also trying to emphasize when high accuracy is mandatory. The last point is strongly connected with the concept of “target” accuracy: depending on the spectroscopic problem under consideration and/or the final aim of the investigation being undertaken, computations must or must not be pushed toward the limit. However, it should be kept in mind that, for a large part of the spectroscopic studies, going beyond the chemical accuracy (1 kcal/mol) and fulfilling the so-called spectroscopic accuracy (e.g., 1 cm−1 in the field of vibrational spectroscopy) are necessary. To push theory toward such an accuracy, a key point is the choice of the most appropriate methodologies, which are presented in some details in this section. 2.1. Electronic Structure Calculations

To obtain quantitative agreement between experimental and computed spectra, as a starting point, high accuracy is required in the electronic structure calculations that lay at its basis. Therefore, the best possible quantum-chemical (QC) methodologies must be employed; that is, QC calculations should be pushed to the limit in order to determine very accurate equilibrium structures, energies, and properties. The key point is thus to reduce as much as possible the errors due to the truncation of both basis set and wave function, the so-called one- and N-electron errors, respectively. To this purpose, composite schemes, which evaluate the contributions important to reach high accuracy at the best possible level and put them together, have been set up (see, e.g., refs 21−35). For systems not showing a strong multi-configurational character, composite schemes based on coupled-cluster (CC) 8133

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


is interesting to note that a very limited loss of accuracy is obtained when the contributions from the full treatment of triple and quadruple excitations in the cluster expansion are ignored, with the geometrical parameters usually characterized by an accuracy within 0.001−0.002 Å for distances and 0.05− 0.1° for angles (see, e.g., refs 15, 26, 52, and 53). An important note concerning the basis sets to be used in conjunction with the extrapolation to the CBS limit is deserved. It should be pointed out that extrapolative schemes require a hierarchy of basis sets to be used. In addition to those mentioned above, set up by Dunning and co-workers (the cc-p(C)VnZ sets), the ANO basis sets developed by Roos and co-workers54−57 and a recent modification by Claudino and Bartlett58 as well as the pc-X basis sets developed by Jensen and co-workers (see ref 59 and references therein) are available in the literature. Moving to the HEAT protocol, as demonstrated in refs 24, 37, and 38, this theoretical model chemistry is designed to achieve high accuracy for thermochemistry; indeed, it is able to reach the so-called sub-kJ accuracy. This approach is entirely independent of experimental data and does not contain any empirical parameter. For electronic energies, the scheme is analogous to that of eq 1, also accounting for first-order spin− orbit coupling, diagonal Born−Oppenheimer correction, and scalar relativistic effects. This high-accuracy fully non-empirical scheme for theoretical thermochemistry is intimately related to other high-precision protocols such as the Weizmann-3/4 (W3, W4),30,31 focal-point analysis (FPA),21,22 and Feller− Dixon−Peterson (FDP)33 approaches. A simplified version of the HEAT protocol is obtained by retaining only the extrapolation to the CBS limit at the CCSD(T) level and the incorporation of the CV corrections, thus leading to the socalled CCSD(T)/CBS+CV scheme. This approach is rather well tested in the literature (see, e.g., refs 50, 60−62) and was shown to provide results with an accuracy well within 0.5 kcal/ mol. Recently, Kállay and co-workers have proposed a new protocol aiming at fulfilling chemical accuracy at a moderate computational cost.63 The starting point of this approach, named diet-HEAT-F12, is the HEAT protocol itself. To reduce the computational cost, the explicitly correlated CCSD(T)F1264,65 method is combined with conventional CC methods up to perturbative treatment of quadruple excitations; furthermore, the HF-SCF/CBS limit is replaced by the CABS-HF/cc-pVQZ-F12 level, where CABS stands for complementary auxiliary basis set singles corrections. The diet-HEAT-F12 protocol is not too dissimilar from the W3F12 scheme.66 Indeed, the explicitly correlated CCSD(T)-F12 methods, and in particular the simplified versions that allow for reducing the computational cost of the F12 contributions, provide a faster convergence to the CBS limit, thus permitting a reduction of the computational costs associated with the extrapolation to the CBS limit within composite schemes. In passing we note that other model chemistry methods for thermochemistry have been developed so far, such as Gn (where n = 2, 3, or 4, with different modifications being available; see refs 67 and 68 and references therein) and CBS-x (where x = 4, 4M, q, Q, or QB3; see refs 23 and 69 and references therein). However, we do not address these in detail because, while they fulfill chemical accuracy, they are not suitable for obtaining spectroscopic accuracy and they involve a few empirical parameters. Concerning reference values, the Active Thermochemical Tables (ATcT)70−72 approach is a novel paradigm in thermochemistry based on constructing, analyzing, adjusting, and solving a thermochemical network

theory and accounting for all important QC contributions are well known to fulfill the accuracy requirements for a quantitative spectroscopic analysis. A significant example of automated procedures to recover the errors due to the truncation of both basis set and wave function as well as to include other minor contributions that are important when aiming at high accuracy is the composite scheme implemented in the QC program package CFOUR,36 which is at the basis of the so-called HEAT approach24,37,38 for energetics and the socalled “gradient” scheme25,26 for equilibrium structure determinations and involves calculations at the CC level. The starting point is the CC singles and doubles with perturbative triples corrections method, CCSD(T),39 which already provides a very good description of the electronic wave function. However, the full CC singles, doubles and triples (CCSDT)40−42 and CC singles, doubles, triples, quadruples (CCSDTQ)43 models can also be employed to further minimize the error associated with the truncation of the wave function. To recover the error due to the basis-set truncation, an extrapolation to the complete basis set (CBS) limit can be performed. This composite scheme (as well as, in general, all composite approaches) is based on the additivity approximation, which means thatas mentioned abovethe various contributions considered are evaluated at the best possible level and then combined together. Focusing on the “gradient” scheme, the energy gradient to be minimized in the geometry optimization procedure is set up according to the specific requirements, the target accuracy, and the dimensions of the system under consideration. Extrapolation to the CBS limit is carried out at the CCSD(T) level within the frozen core approximation, thus requiring core− valence correlation (CV) effects to be added to the energy gradient in terms of the difference between all-electron and frozen-core CCSD(T) calculations in the same basis set. In a similar manner, corrections due to a full treatment of triples (fT) and quadruples (fQ) can be included using small basis sets for their evaluation (usually of double- or triple-ζ quality). The reliability of these corrections is based on the assumption that a given contribution marginally depends on the dimension of the basis set used for its evaluation. The corresponding energy gradient (with x generically indicating nuclear coordinates) is given by the following expression: dE∞(HF‐SCF) dΔE∞(CCSD(T)) + dx dx dx dΔE(CV) dΔE(fT) dΔE(fQ) + + + (1) dx dx dx

dECBS + CV + fT + fQ


with the correlation-consistent polarized cc-p(C)VnZ basis sets44−47 being usually employed in conjunction with this scheme. The first two terms on the right-hand side denote the extrapolation to the CBS limit: Hartree−Fock self-consistent field (HF-SCF) and CCSD(T) correlation energies are extrapolated separately because of their different convergence behavior. From the available literature (see, e.g., refs 25, 26, 48−52 and references therein), the accuracy obtainable with the scheme presented above in terms of structural parameters is better than 0.001 Å for bond lengths and 0.05° for angles. However, inclusion of fT and fQ corrections in the so-denoted CCSD(T)/CBS+CV+fT+fQ scheme makes this approach very expensive from a computational point of view, while the CCSD(T)/CBS+CV one is affordable also for medium-sized molecules, e.g., pyruvic acid52 and glycine.53 In this respect, it 8134

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


Table 1. Relative Energies (ΔE, in kJ/mol) with Respect to the Most Stable Conformer computational model geometry




Pyruvic Acid B2PLYP/AVTZb B2PLYP/AVTZb B2PLYP-D3/AVTZc B2PLYP/AVTZb “cheap” geometry CCSD(T)/CBS+CV


Tt 11.41 11.69 11.69 11.68 11.62 11.67



cis 9.47 8.53 8.49



IIn 2.45 2.29

Ct 18.10 18.35 18.34 18.45 18.39 18.45

IIIp 7.42 7.44

IVn 4.89 4.87

Vn 10.99 10.88

VIp 20.34 20.32

a Conformational energies of the Tt- and Ct-pyruvic acid (PA) with respect to the most stable one, Tc-PA; see ref 52. bB2PLYP/AVTZ stands for calculations employing the double-hybrid B2PLYP functional73 in conjunction with the aug-cc-pVTZ basis set.80 cB2PLYP-D3/AVTZ stands for calculations employing the double-hybrid B2PLYP functional73 including the D3 dispersion correction81 in conjunction with the aug-cc-pVTZ basis set.80 dConformational energy of cis-acrolein with respect to trans-acrolein; see ref 61. eConformational energies of the IIn, IIIp, IVn, Vn, and VIp conformers of glycine with respect to the most stable one, Ip; see ref 50. fB3LYP/SNSD stands for calculations employing the hybrid B3LYP functional82,83 in conjunction with the double-ζ-quality SNSD basis set.84

CCSD(T)/CBS+CV+fT+fQ geometry and the one obtained at the CCSD(T) level in conjunction with a quadruple-ζ basis set. Concerning kinetics, one key point is the accuracy in evaluating transition-state energies. To give an example, the conformational analysis of glycine carried out in ref 53 showed that the choice of the molecular structure has a very limited effect on the computed values. To extend the applicability of the composite schemes presented above to larger molecules, an effective solution is obtained by replacing the CCSD(T) method with a lower, thus cheaper (from a computational point of view), level of theory to incorporate the most important corrections. This led to the definition of the so-called “cheap” (to stress the reduced computational cost) approach,74,75 which keeps the CCSD(T) method in conjunction with a triple-ζ basis set as starting point, but employs Møller−Plesset to second-order theory (MP2)76 to perform the extrapolation to the CBS limit and to include CV corrections:

(TN). This TN contains the known thermochemical interdependencies between the targeted chemical species, and it is constructed from the relevant reaction enthalpies, reaction Gibbs energies, adiabatic ionization energies, electron affinities, etc. that are in turn experimentally measured or accurately computed. An important issue that has not been mentioned so far is the effect of the geometry at which electronic energy calculations are carried out. The HEAT protocol employs structures at the CCSD(T)/cc-pVQZ level, thus suggesting that the referencegeometry effect is limited: it is not necessary to push geometry optimizations to the limit in order to obtain accurate energetics. With respect to this issue, a few interesting examples are summarized in Table 1, which collects the conformational energies obtained at different levels of theory and using different reference geometries for acrolein61 (panel O, Figure 1), pyruvic acid52 (panel P, Figure 1), and glycine50 (panel Q, Figure 1). It is clearly noted that the choice of the geometry does not affect significantly the energetics and that methods rooted in density functional theory (DFT), also employing basis sets as small as the double-ζ ones, are suitable for the purpose, thus allowing the application of CC composite schemes for electronic energy evaluations well beyond the “medium-sized molecule” limit. Another interesting aspect pointed out by the results of Table 1 is the good performance of the B2PLYP73/aug-cc-pVTZ level. Indeed, in the case of conformational energies, B2PLYP in conjunction with a tripleζ-quality basis set seems to perform better than expected based on the benchmark study accompanying its implementation.73 The same remark applies to thermodynamic and kinetic parameters, as shown, for instance, by a detailed study on the enthalpy of formation of the cis and trans isomers of protonated SO2, i.e., HOSO+ (panel E, Figure 1), considering different reactions.60 In ref 60, it was noted that, for all reactions and for both isomers, negligible differences were observed for two different molecular structures, namely, the

E(best‐“cheap”) = E(CCSD(T)/VTZ) + ΔE(MP2/CBS) + ΔE(MP2/CV)


The second term on the right-hand side accounts for the extrapolation to the CBS limit and is obtained, as in the case of the composite scheme implemented in CFOUR and discussed above, by extrapolating to the CBS limit the HF-SCF electronic energy according to the three-point extrapolation formula by Feller77 and the MP2 correlation energy by using the n−3 formula;78 the third term represents the core−valence correlation correction. Besides the remarkable reduction of the computational cost, this approach is able to provide accurate results (see, e.g., refs 52 and 74), also for flexible systems like molecular complexes.79 The so-called “cheap” composite scheme can also be applied to structural determinations,53,74,85,86 with the additivity approximation being directly applied to geometrical parame8135

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


errorsin order to be acceptableshould be extremely small, with high accuracy being required not only for the rotational constants, which are the leading terms, but also for otherin a sense minorterms, like the centrifugal-distortion constants. These issues will be further addressed along this Review. To achieve the goal of accurately calculating the leading terms of the property under investigation, whenever possible (based on the available implementations), composite schemes based on CC techniques can be once again employed. Focusing on force fields, for most of the available QC methods, quadratic force constants (which describe the harmonic force field) can be computed as analytical second derivatives of the electronic energy with respect to nuclear coordinates. However, harmonic force field evaluations are computationally more expensive than geometry optimizations, and this limits the application of composite schemes entirely based on CC calculations to systems of rather small dimensions. Once again, the “cheap” approach is affordable even for medium-sized molecules, like uracil85,92 and thiouracil,86 thus opening the way, for these systems, to the accurate computation of harmonic frequencies as well as the related spectroscopic properties, like the quartic centrifugaldistortion constants that are of relevance to rotational spectroscopy (see, e.g., refs 79, 85, 86, and 92) and the derivatives of the electric dipole moment that are fundamental for infrared (IR) spectroscopy (see, e.g., refs 92 and 93). Other properties in molecular spectroscopy require a quantitative prediction. A large part of them describes the “response” of the molecular system to an external perturbation and can be determined as the corresponding derivatives of the energy with respect to the components of the perturbation itself.94 This implies that composite schemes can be effectively applied also to their derivatives. Examples are (a) the electric dipole moment, which is described as the first derivative of the electronic energy with respect to an external electric field and for which the n−3 extrapolation formula (to the CBS limit) has been theoretically demonstrated;95 (b) the first derivative of the energy with respect to the magnetic moment at the nucleus of interest, which leads to the evaluation of the hyperfine coupling constants; and (c) the electric dipole derivatives, which are obtained as the first derivatives of the electric dipole moment with respect to nuclear coordinates and allow for the computation of the IR intensities within the harmonic approximation. As demonstrated, for example, in ref 96 for bromofluoromethane (panel I, Figure 1), the CCSD(T)/CBS +CV composite scheme, also accounting for relativistic effects on bromine by means of small-core relativistic pseudopotentials (PP), is able to quantitatively predict the electric dipole moment components once vibrational corrections are included (μcalc = 1.734 D vs μexp = 1.739(26) D). In ref 96, these predictions allowed the assignment of the Stark rotational spectrum, severely complicated by the hyperfine structure due to the Br atom. This example also allows us to point out that specific basis-set requirements need to be fulfilled. In the specific case of electric dipole moments, the basis sets to be used should incorporate diffuse functions in order to accurately describe the electronic density in the regions far from the intramolecular bonds, this being important to quantitatively predict the specific property under consideration.

ters based on the assumption that they show the same behavior as the energy (see, for example, refs 85, 87−89). As discussed in ref 89, to take into account basis-set truncation effects, the CBS limit can be evaluated by applying the consolidated n−3 extrapolation form78 to the whole parameters, that is, without separating the HF-SCF and correlation contributions. This strategy allows one to avoid geometry optimizations at the HFSCF level in conjunction with a basis set as large as a quintuple-ζ, thus only requiring equilibrium structures at the MP2/cc-pVTZ and MP2/cc-pVQZ levels. The resulting approach is denoted as “cheap” geometry scheme.51 Based on the comparison with best-estimated equilibrium structures obtained using the “gradient” CCSD(T)/CBS+CV scheme as well as with experiment,51−53,85,90 the so-called “cheap” geometry scheme is robust, reliable, and accurate, even when dealing with rather flexible systems like glycine and glycine dipeptide analogues53,74 as well as molecular complexes.79 Indeed, it is noted that for bond distances the maximum absolute deviation with respect to CCSD(T)/CBS+CV is smaller than 0.001 Å. In the field of computational spectroscopy, in addition to structure and energetics, there are other molecular properties that require high accuracy in their evaluation in order to obtain a quantitative description of experimental features. While we do not discuss in the following the implementation of the properties in electronic structure calculations (we refer the reader to, for instance, ref 91), we instead focus on the practical issues that are of interest to the present Review. Furthermore, it should be pointed out that for some properties, in particular for those related to chiral spectroscopies, implementation for higher-level methods (e.g., CCSD(T)) is still missing, thus preventing in most cases the application of composite schemes. All the spectroscopies addressed in this Review need the characterization of the portion of the potential energy surface (PES) around one or more stationary points, for which an effective description can be obtained in terms of force constants, i.e., derivatives of the electronic energies with respect to nuclear coordinates. On general grounds, while accurate quantitative results require the evaluation of anharmonic force fields, only the harmonic part really needs to be computed with high accuracy. This is a general concept at the basis of the so-called “hybrid” approaches (which will be presented in detail later in this Review), where the leading term is evaluated at the best possible level, while smaller contributions can be computed at a lower level of theory. This strategy does not affect significantly the overall accuracy because a larger uncertainty for a small term leads to an overall small error affecting the property under investigation. Another important issue that should be addressed when dealing with computational spectroscopy is the “target” accuracy, which strongly depends on the type of spectroscopic technique and/or experiment to be simulated. Let us consider once again vibrational and rotational spectroscopies. It should be noted that, in most cases, for vibrational spectroscopy we deal with medium-resolution spectra; therefore, the accuracy requirement is the prediction of transition frequencies with uncertainties within 10 cm−1. However, when dealing with high-resolution vibrational spectroscopy, a higher accuracy (with the spectroscopic accuracy being 1 cm−1) is necessary. Different is the situation in the case of rotational spectroscopy, which is by definition a very high-resolution spectroscopic technique. Thus, in the field of rotational spectroscopy the 8136

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


where Iτ,η’s are the elements of the inertia tensor (whose definition is provided later in the text), with τ, η, and ς representing the Cartesian coordinates. After insertion of eqs 7 and 8 in eq 5, the terms of the vibro-rotational Hamiltonian can be grouped by power of the vibrational (q and p) and rotational (J) operators:

2.2. The Nuclear Problem

Let us first of all focus on the total energy of an isolated molecule (Emol s ) in a given state (|Ψs⟩), which is provided by the time-independent non-relativistic Schrödinger equation: / mol(r, R)|Ψs(r, R)⟩ = Esmol|Ψs(r, R)⟩


vr vr vr / vr = / 20 + / 30 + / 40 + ... vibrational terms

where / mol(r, R) is the molecular Hamiltonian, with R and r being the position arrays of the nuclei and electrons, respectively. The direct resolution of eq 3 is rarely feasible, except for very special cases, and the Born−Oppenheimer (BO) approximation97 is commonly used to separate the electronic and nuclear degrees of freedom. A further approximation is to assume that the Eckart− Sayvetz conditions98,99 are met, so that the nuclear wave function can be further simplified by factoring out the translational motion and minimizing the couplings between vibrations and rotations, that is, ψ = ψtψvr ≈ ψtψvψr. In practice, this is verified when a molecular system, placed in the reference frame, has its center of mass coinciding with the origin of the latter, enforcing that its principal moments of inertia are collinear with the Cartesian axes. In the following, it is assumed that these constraints hold, and we focus on the vibrorotational problem: / vr|ψsvr⟩ = Esvr|ψsvr⟩

vr vr vr + / 21 + / 31 + / 41 + ... Coriolis terms vr vr vr + / 02 + / 12 + / 22 + ... rotational terms


where, for each / vrfg , f denotes the total power of operators q and p, and g that of J. In the following, the energy derivatives of eq 8, usually referred to as cubic and quartic force constants, will be written in a more compact form: ij y j ∂ 3V zzz zz fijk = jjjj j ∂qi ∂qj∂qk zz k {eq

ℏ2 2hc

The solution of full vibro-rotational Hamiltonian quickly becomes a daunting task and has to be limited to small systems.101−110 The focus of this Review is instead on methodologies able to treat medium- to large-sized molecular systems, for which state-of-the-art variational, many-body, or discrete variable representation (DVR) methods, possibly based on specific Hamiltonians and optimized coordinates systems,111−116 are hardly applicable. We start from semirigid systems, for which normal modes based on Cartesian coordinates, low-order perturbative approaches based on Watson’s Hamiltonian, and polynomial expansions of the PES perform at their best. Although the theoretical basis of these approaches dates back to the 1950s,117−120 general and robust implementations are more recent,121−124 this being especially true when considering the automatic detection of resonances and computation of intensities for medium- to large-sized systems.125−129 In most cases, the rotational and vibrational terms can be treated separately (and these are the cases considered in this Review). However, it should be emphasized that there are cases where treating rotations and vibrations as separable motions breaks down completely.130,131 2.2.1. Resolution of the Vibrational Problem. Let us first consider the purely vibrational part (g = 0). It should be noted that, by doing so, an interaction term with the rotational motion still remains, through the zeroth-order contribution from the effective inverse molecular inertia tensor μ:


ℏ2 8hc

∑ μτη (Jτ

− πτ )(Jη − πη)

τ ,η

∑ μττ



1 2


∑ ωipi 2

+= (5)


where J is the rotational angular momentum operator and π the vibrational angular momentum operator, defined as N



∑ ζij





with ζ being the matrix of Coriolis couplings and ωi the wavenumber associated with the ith mode. The exact form of the effective inverse molecular inertia tensor μ and the potential energy V being unknown, they are expanded as Taylor series with respect to the normal coordinates: μτη (q) = μτη (q eq) − 3 + 4

i=1 ∂I eq eq τς eq μττ μ ∂qi ςς i,j=1


∑∑ ς


∑ μττeq

eq ∂Iτη

∂qi eq ∂Iςη


μηηeq qi μηηeq qiqj + ◦(q 2)

i y 1 N 1 N jjj ∂ 3V zzz zz q q q V (q) = ∑ ωiqi 2 + ∑ jjj 2 i=1 6 i , j , k = 1 j ∂qi ∂qj∂qk zz i j k k {eq +

1 24

ij yz ∂ 4V jj zz 4 jj z jj ∂q ∂q ∂q ∂q zzz qiqjqk ql + ◦(q ) i ,j ,k ,l=1 k i j k l {eq

ij yz ∂ 4V j zz zz fijkl = jjjj j ∂qi ∂qj∂qk ∂ql zz k {eq


In the following we resort to Watson’s definition of the vibro-rotational Hamiltonian in terms of the dimensionless normal coordinates (q) and their conjugate momenta (p):100 / vr =


ℏ2 2hc

μτηeq πτπη =

τ ,η=x ,y,z N


∑ τ=x ,y ,z


∑ i ,j,k ,l=1

ζij , τζkl , τ

ωjωl ωiωk

qipj qk pl


with Beq being the equilibrium rotational constant tensor. The harmonic vibrational Hamiltonian is obtained by simply ignoring this term and expanding the potential energy up to the second order. However, a more physically sound approach must be based on the effective inclusion of higher orders of the expansion in eq 8.


(8) 8137

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


For diatomic molecules, a Taylor expansion (see eq 8) is not needed because V can be simply computed along a scan of the vibrational coordinate, and this approach can be extended to multi-dimensional cases, provided that the vibration of interest, represented by the mass-weighted coordinates Q1, is isolated from the rest of the system. The Cartesian displacement from equilibrium in each point of the scan can be defined as X = a(z) − X eq

This means that, for any operator with a known representation in the VBR basis, its expression in the DVR basis can be easily derived. Furthermore, by choosing a suitable form of Y so that it is orthogonal,141 the DVR basis functions can be localized on each point of the grid (χDVR = δij). In such a way, the potential ij energy operator in the DVR basis becomes diagonal, i.e., (VDVR)ij = V1(li)δij. Using the Gaussian quadrature149 and defining the elements of Y as


where z is a general parameter associated with the anharmonic mode and a its curvature in the coordinate space: a (z ) = X



z ·L1T M−1/2Q 1





V1,DVR ij =

ℏ2 2hc


∑ g 1/4 i,j=1

1 jij ∂si zyz j z MAA jjk ∂XAτ zz{

ij ∂sj yz jj z jj ∂X zzz τ A k {eq eq


and Wilson’s G matrix elements having the following form: Nat

Gij =

∑ ∑ A=1 τ=x ,y ,z


with g = det(G). For the one-dimensional problem involving the coordinate l, eq 15 can be simplified to138 ;1 = −

ℏ2 d2 , 2hc dl 2

so that

/ 1v(l) = −

ℏ2 d2 + =1(l) 2hc dl 2 (17)

The eigenvalues and eigenfunctions of this Hamiltonian can be computed, for instance, by means of DVR, the basic ideas of which go back to the 1960s.139,140 However, the method became widely popular soon after the pioneering work of Light and others in the 1980s,141−146 and it has been recently reexplored in connection with diffusive problems.147,148 This method is related to the variational basis representation (VBR), in which an operator is represented as a matrix in a truncated basis. The elements of the matrix representation of / 1v(l) in the χVBR basis set are H1, ij = − +

fiijk =


|ψsv⟩ =

DVR employs a pointwise representation of the VBR basis in the coordinate representation: χ DVR = YTχ VBR

mk DVR χ (lk)V1(lk)χjDVR (lk) m(lk) i


f jk ( +δqi) + f jk ( −δqi) − 2f jk (q eq) δqi 2


A recent review on the subject can be found in ref 162. The calculations of vibrational energies can be performed using variational (refs 111, 163−165) or perturbative (refs 112, 117, 118, 120−124, 126, 127, 161, 166−169) approaches. In both cases, the basis set is commonly composed of the eigenfunctions of the harmonic oscillator, so any wave function ψv can be written as

d2 ℏ2 χi VBR 2 χjVBR 2hc dl

ℏ2 χ VBR =1(l) χjVBR 2hc i

where mk is the weight at each of the Nscan points of the scan grid, which depends on the chosen quadrature, and m(lk) is a weight function over the interval of interest. Depending on the type of motion (rotation, stretching, inversion, ...), different DVR basis sets can be employed.144,145,150−153 Further details on DVR can be found in refs 141 and 154. While this technique is well-suited for one-dimensional problems, its extension to multi-dimensional ones is not at all straightforward due to a number of technical aspects.155−158 Promising approaches have been developed in this connection and are already effective for molecules containing up to about six atoms. The same applies to the most sophisticated variational methods (see, e.g., ref 115). For larger molecular systems, such procedures are still unpractical, and a Taylor expansion of the potential energy is more tractable. A first challenge is the calculation of the energy derivatives in eq 8. While, as already mentioned, analytic second derivatives have become fairly common and available for a wide range of QC methods, robust implementations are not generally available for higher derivatives;159−161 therefore, one has to resort to fitting procedures or, more effectively, to finite differences: ÅÄÅ fik ( +δqj) − fik ( −δqj) 1 ÅÅÅÅ f jk ( +δqi) − f jk ( −δqi) fijk = ÅÅ + 3 ÅÅÅ 2δqi 2δqj ÅÇ ÉÑ fij ( +δqk ) − fij ( −δqk ) ÑÑÑ ÑÑ + ÑÑ ÑÑ 2δqk ÑÑÖ (22)


∂ −1/2 ∂ 1/4 g Gij g ∂si ∂sj

Nscan k=1

The one-dimensional vibrational Hamiltonian is then / 1v(l) = ;1(l) + =1(l), with the kinetic energy operator expressed in terms of a generic curvilinear coordinate (s) given by136,137 ;=−


the matrix representation of the potential energy is given by the expression

with L1 being a column of the eigenvector matrix of the massweighted harmonic force constant matrix and M the diagonal matrix of atomic masses. This parametrization132−135 is valid only when the atoms not involved in the vibration can be kept fixed along the scan. A more general representation of the motion is the length of the curve l(z): l(z) =

mi DVR (l i ) χ m (l i ) j

Yij =

∑ Cv|v⟩ v


where |v⟩ represents a vector of occupation numbers for each normal mode of the system. Let us start by addressing the

(19) 8138

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


theory (VPT2),117 the popularity of which derives from the significant improvement it offers over the harmonic approximation at a reasonable overall computational cost. Higher orders have been investigated,126,181−184 but the increased cost is not justified by the marginal accuracy gain. As it is wellknown, the PT2 expressions areat least in one dimension exact for a Morse oscillator, and therefore certainly not correct for the incomplete quartic polynomial development, which, as mentioned above, is the most practical representation of the PES for medium- to large-sized molecules. However, as noticed by Handy and co-workers,185,186 the “Morsification” of the quartic potential often leads to an effective inclusion of nearly exact higher order terms. Thus, the PT2 predictions can be closer to experiment than their higher-order perturbative or even variational counterparts. By analogy with eqs 5 and 8, / v(0) to / v(2) are defined as follows:

variational approach. A variational minimization of the energy starting from the linear expansion given above leads to the secular equation HC = EC. A common way to solve it is the vibrational self-consistent field (VSCF) approach,113,164,170−173 the basic idea of which is analogous to the electronic HF-SCF model, with the significant difference that the anti-symmetrization is not needed here because vibrational modes are distinguishable. The vibrational Hamiltonian is first separated between single-mode and multi-mode terms, / v = ∑i / iv + = coupl , with / iv = ;i + =i = ;i +

1 1 1 ωiqi 2 + fiii qi 3 + f q 4 + ··· 2 6 24 iiii i (25)

and = coupl = = −

∑ =i



The modal functions are then solutions of the equation N

[/ iv

/ v(0) =


+ ⟨ ∑ |=


j=1 j≠i

| ∑ ⟩]|φi⟩ = εi|φi⟩ j=1 j≠i



By defining an effective potential = ieff = =i + ⟨∑ Nj = 1 |= coupl|∑ Nj = 1 ⟩, the previous equation can be simplified j≠i

1 N ∑ ωi(qi 2 + pi 2 ) 2 i=1

/ v(1) =

1 6


fijk qiqjqk


i ,j,k=1


as [;i + = ieff ]|φi⟩ = εi|φi⟩

/ v(2) =


A limitation of the method is that the interactions between the degrees of freedom are only partially included because treated as mean fields, but the method provides a good starting point for more refined approaches like vibrational Møller−Plesset174 or coupled cluster.175 In the vibrational configuration interaction (VCI) method, the vibrational correlation effects are accounted for through the diagonalization of the vibrational Hamiltonian in a finitedimension subspace of the Hilbert space associated with the system.176−178 However, the computational cost and memory requirements grow exponentially with the system size, thus limiting the applicability of VCI to small systems. Several different techniques, also including the Smolyak grid,179,180 have been proposed together with approximate/truncated VCI models (see, e.g., ref 116) to solve the computational issues and will be discussed later in the text. Perturbative methods are generally cheaper and can offer notable improvements over the harmonic approximation, even with relatively compact force fields. The basic idea is to expand from the harmonic solution by treating the anharmonic correction as a perturbation, these approaches thus being particularly well suited for small-amplitude motions (SAMs). Within perturbative approaches, the vibrational Hamiltonian is written as / v = / v(0) + λ / v(1) + λ 2 / v(2) + ···

1 24


fijkl qiqjqk ql +

i ,j ,k ,l=1

∑ τ=x ,y ,z



ζij , τζkl , τ

i ,j,k ,l=1

ωjωl ωiωk

qipj qk pl


For the sake of readability, the “v” superscript is dropped in the following; thus, Hamiltonians and wave functions refer to the vibrational problem. Harmonic quantities will be indicated with the “(0)” superscript (or the occupation number vector v for the wave function where relevant). To solve the Schrödinger equation, /|ψr ⟩ = εr|ψr⟩, two approaches can be followed. In the canonical van Vleck perturbation theory (CVPT),187 the Hamiltonian is transformed through a sequence of unitary operations (U = Uk...U2U1) to obtain an effective, diagonal or block-diagonal Hamiltonian /̃ with the desired properties, which are ⟨ψr(0)|/̃ |ψr(0)⟩ = ⟨ψr|/|ψr ⟩ = εr


That is to say, its eigenfunctions are the harmonic wave functions, and its eigenvalues are the same as the perturbed Hamiltonian. Each unitary transformation Uk can be written as the exponential of a Hermitian matrix Sk:

(29) k

Uk = eiλ sk = 1 + iλ k Sk −

with λ being the perturbation parameter, / the harmonic Hamiltonian, / v(1) the first-order perturbation term, / v(2) the second-order term, and so on up to the desired order. Here, we focus on second-order vibrational perturbation v(0)

λ 2k 2 Sk + ◦(Sk 2) 2


so that the effective Hamiltonian in a matrix representation can be written as 8139

DOI: 10.1021/acs.chemrev.9b00007 Chem. Rev. 2019, 119, 8131−8191

Chemical Reviews


H̃ = UHU −1 =



λ 2k 2yzz (0) Sk zz(H + λ H(1) + λ 2 H(2)) 2 z{

∏ jjjjj1 + iλk Sk − k




|ψr(2)⟩ =


λ 2l 2yzz Sl zz 2 z{

∏ jjjjj1 + iλlSl −



and εr(0) =


+ i[S2 , H(0)]

⟨vr| ∑

Sk is then defined to have H̃ diagonal, except for degenerate or nearly degenerate states. CVPT has been commonly used to derive the vibrational energies, in particular because its eigenfunctions are the same as those of the harmonic oscillator. An alternative formalism is the Rayleigh−Schrödinger perturbation theory (RSPT), where all terms in the vibrational Schrödinger equation are expanded to the second order: εr = εr(0) + λεr(1) + λ 2εr(2)


ψr = ψr(0) + λψr(1) + λ 2ψr(2)





ε0 =






∑ s≠r

⟨vs| ∑ijk εr(0)

fijk 6

qiqjqk |vr ⟩

− εs(0)

∑ Bτeq ∑ ζij ,τζkl ,τ

qiqjqkql +


τ f ijk

⟨vr| ∑ijk


qiqjqk|vt ⟩⟨vt | ∑ijk εr(0)

ωi + 2

4χij = fiijj −

f ijk 6

ωjωl ωiωk

qipj qkpl |vr ⟩










5fiii 2 3ωi

∑ j≠i

2ωi fiij 2

(8ωi 2 − 3ωj 2)fiij 2 ωj(4ωi 2 − ωj 2) 2ωj fijj 2

(52) fiii fijj

f jjj fiij


ÅÄÅ ÅÅ 2ωk (ωi 2 + ωj 2 − ωk 2)fijk 2 ∑ ÅÅÅÅÅ ÅÅÇ (ωi +ωj +ωk )(ωi −ωj −ωk )(ωj −ωi −ωk )(ωk −ωi −ωj) k≠i,j Å ÑÉ fiik f jjk ÑÑÑ 4(ωi 2 + ωj 2) ÑÑÑ + ∑ Bτeq {ζij , τ}2 − Ñ ωk ÑÑÑ ωiωj τ = x ,y ,z (53) ÑÖ

The coefficients (c(k) s ) and the perturbative contributions to the vibrational energy (ε r(k)) are obtained by multiplying (0) respectively by ⟨ψ(0) s | and ⟨ψr |, thus leading to |ψr(1)⟩


16χii = fiiii −









ÄÅ ÉÑ ÅÅ f f ÑÑ fijk 2 ÅÅ iik jjk ÑÑ ÑÑ − + ∑ ÅÅÅÅ 48(ωi + ωj + ωk) ÑÑÑÑ i ,j,k=1 Å ÅÅÇ 32ωk ÑÖ ÄÅ É 2Ñ N Ñ eq Å Å Ñ (ω − ω ) Ñ B Å ∑ τ ÅÅÅÅÅ1 − ∑ {ζij ,τ}2 i j ÑÑÑÑÑ ωiωj ÑÑ 4 ÅÅ τ i