Quantum Electronic Structure

Probing Basis Set Requirements for Calculating Core Ionization and Core Excitation Spectroscopy by the ΔSCF Approach
Maximilien A. Ambroise and Frank Jensen
J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b01071 • Publication Date (Web): 29 Nov 2018


Journal of Chemical Theory and Computation

Probing Basis Set Requirements for Calculating Core Ionization and Core Excitation Spectroscopy by the ΔSCF Approach

Maximilien A. Ambroise and Frank Jensen
Department of Chemistry, Aarhus University, DK-8000 Aarhus, Denmark

Abstract

We investigate the basis set requirements for calculating properties corresponding to removing core electrons by the ΔSCF approach using Hartree-Fock and density functional theory. Standard contracted basis sets are shown to produce large errors, and the improved performance of core-augmented basis sets is traced to the fact that the core-augmenting functions effectively create an auxiliary set of uncontracted functions in the core region. We propose two specific basis sets of double and triple zeta quality based on exponent interpolation of the polarization consistent basis sets, denoted pcX-1 and pcX-2, that display significantly lower basis set errors compared to other alternatives. These are suitable for both nonrelativistic and relativistic calculations of the Douglas-Kroll-Hess type, with typical basis set errors of 0.1 and 0.01 eV, respectively, and they can be used in a mixed basis set approach with only a minor degradation in performance. The versions augmented with diffuse functions (aug-pcX-1 and aug-pcX-2) are shown to perform better than other alternatives for X-ray absorption spectroscopy. When used in connection with range-separated hybrid density functional methods and relativistic corrections, the pcX-n basis sets can in favorable cases reproduce experimental results to within a few tenths of an eV.

Introduction

Gaussian basis sets have been used extensively for performing molecular electronic structure calculations, and many different basis sets are available.1-3 Basis sets should ideally be available in well-defined quality levels such that the basis set incompleteness can be quantified and controlled, and systematically improved towards the complete basis set (CBS) limit. The basis set quality is commonly classified by a Double/Triple/Quadruple etc. Zeta notation, which in the original use indicated the number of contracted s- and p-functions, but is currently taken to reflect the highest angular momentum function included.4 Modern Gaussian basis sets are often optimized for specific models (e.g. correlated wave function or Density Functional Theory (DFT)) in order to increase the computational efficiency and provide a stable convergence towards the CBS limit. Most of the commonly employed basis sets aim at providing accurate energies, and these basis sets are also suitable for energy derivatives, such as gradients for determining equilibrium structures and second derivatives for calculating harmonic vibrational frequencies. Molecular properties, however, may have additional basis set requirements, and a better basis set convergence may be obtained by tailoring basis sets to specific properties. Molecular properties can in many cases be defined as derivatives of an energy or quasi-energy with respect to perturbations such as electric and magnetic fields. Electric fields correspond to a modification of the potential energy in the Hamiltonian, and primarily affect the electron density in the weakly bound region far from the nuclei.
Augmenting standard basis sets with diffuse functions thus significantly improves the performance for properties such as electric dipole and higher order multipole moments, and polarizabilities and hyperpolarizabilities.5 Magnetic fields correspond to a modification of the kinetic energy in the Hamiltonian, and may lead to different basis set requirements. Nuclear magnetic shielding constants, for example, are sensitive to a good representation of the orbitals in the inner-valence region, and basis sets with an improved representation for the p-functions and a reduced contraction have been shown to provide a faster basis set convergence.6-7 Nuclear spin-spin coupling constants have four individual contributions, of which three are sensitive to a good representation of the orbitals near or at the nuclear position, and this requires addition of basis functions with large exponents and significant uncontraction of the basis set in order to achieve a satisfactory basis set convergence.8-12 Optical rotation constants, which are mixed electric-magnetic derivatives, in contrast only require augmentation with diffuse basis functions in order to provide a stable basis set convergence.13

In the present case we investigate how the polarization consistent basis sets (pc-n) can be tailored to provide a stable and systematic convergence towards the CBS limit for molecular properties related to core-excitations,14 and these will be denoted pcX-n (polarization consistent basis sets for X-ray spectroscopy). The simplest example of a core-excitation process is X-ray photoionization spectroscopy (XPS), where an electron is removed from a core orbital, and the quantity of interest is the energy difference between the neutral and ionized species. XPS is sensitive to the bonding environment and can, similarly to nuclear magnetic shielding constants, be used as a spectroscopic tool. X-ray absorption spectroscopy (XAS) denotes processes where the core electron is excited to an empty valence or Rydberg orbital. XAS is commonly divided into two regimes, Near-Edge X-ray Absorption Fine Structure (NEXAFS) for excitations below the ionization threshold, and Extended X-ray Absorption Fine Structure (EXAFS) when the photon energy is far above the ionization potential of the core electron. While XPS and XAS explore the dynamics of core-hole formation, X-ray emission spectroscopy (XES) and Auger Electron Spectroscopy (AES) explore the paths by which the core-hole can decay.
XES and AES are complementary techniques to XAS, and crucial for correct interpretation of molecules involving multiple near-degenerate core levels. All of these properties can be considered as energy differences, and do not directly involve perturbing electric or magnetic fields, but the fact that they involve ionization of core orbitals implies that standard basis sets will not be optimum. We note that multi-resolution methods are alternative approaches that are capable of very high accuracy and relieve the user of selecting a proper basis set for the problem at hand, but these methods are still in a development phase.15-16

Methodology

Core-level spectroscopy was advocated by Siegbahn et al.17-18 in the 1960s, and considerable effort has been put into reproducing experimental core ionization and excitation energies using quantum chemical methods at varying levels of theory. The core-electron binding energy (CEBE) can be defined as the difference between the energy of the ground state $E_0$ and the hole state $\bar{E}^{+}$. Each of these energies can formally be written as a sum of a Hartree-Fock (HF), an electron correlation and a relativistic contribution.

$$E_0 = E_0^{\mathrm{HF}} + E_0^{\mathrm{corr}} + E_0^{\mathrm{rel}} \tag{1}$$

$$\bar{E}^{+} = \bar{E}^{+}_{\mathrm{HF}} + \bar{E}^{+}_{\mathrm{corr}} + \bar{E}^{+}_{\mathrm{rel}} \tag{2}$$

The bar notation indicates that it is the energy of the relaxed core-hole state, and $\bar{E}^{+}_{\mathrm{HF}}$ can be decomposed into two terms, $E^{+}_{\mathrm{HF}} + E^{+}_{R}$, where the first term denotes the energy of the unrelaxed (frozen) core-hole state and the second term arises from orbital relaxation. The CEBE of the i'th core-electron can thus be written as in eq. (3).

$$\Delta E_i = (E^{+}_{\mathrm{HF}} - E_0^{\mathrm{HF}}) + E^{+}_{R} + (\bar{E}^{+}_{\mathrm{corr}} - E_0^{\mathrm{corr}}) + (\bar{E}^{+}_{\mathrm{rel}} - E_0^{\mathrm{rel}}) = -\epsilon_i + E^{+}_{R} + \Delta E_{\mathrm{corr}} + \Delta E_{\mathrm{rel}} \tag{3}$$
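The bookkeeping in eq. (3) can be illustrated with a short Python sketch. The class and function names are hypothetical, and the numerical values are placeholders chosen only to show the signs of the individual terms; they are not results from this work. The hartree-to-eV factor is the standard conversion constant.

```python
from dataclasses import dataclass

HARTREE_TO_EV = 27.211386  # standard conversion factor (rounded)


@dataclass
class CebeTerms:
    """Terms of eq. (3) in eV; field names are illustrative only."""
    orbital_energy: float     # epsilon_i of the core orbital (negative)
    relaxation: float         # E_R^+, negative: relaxation lowers the hole-state energy
    delta_corr: float = 0.0   # Delta E_corr
    delta_rel: float = 0.0    # Delta E_rel

    def koopmans(self) -> float:
        """Frozen-orbital estimate: only the -epsilon_i term of eq. (3)."""
        return -self.orbital_energy

    def total(self) -> float:
        """Full eq. (3): -epsilon_i + E_R^+ + Delta E_corr + Delta E_rel."""
        return (-self.orbital_energy + self.relaxation
                + self.delta_corr + self.delta_rel)


def delta_scf_cebe(e_ground_hartree: float, e_hole_hartree: float) -> float:
    """CEBE in eV from two separately converged total energies (hartree)."""
    return (e_hole_hartree - e_ground_hartree) * HARTREE_TO_EV


# Placeholder numbers, not computed or experimental values.
terms = CebeTerms(orbital_energy=-300.0, relaxation=-10.0,
                  delta_corr=0.5, delta_rel=0.1)
print(f"Koopmans estimate: {terms.koopmans():.1f} eV")
print(f"Eq. (3) total:     {terms.total():.1f} eV")
```

In an actual calculation the individual terms are not evaluated one by one: the difference of two converged total energies, as in `delta_scf_cebe`, already contains the relaxation term, plus whatever correlation and relativistic contributions the chosen method includes.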

The difference in the HF energy using frozen orbitals is, by virtue of Koopmans' theorem,19 the negative of the orbital energy, −ϵi. The simplest approach ignores the last three terms in eq. (3) and takes the negative orbital energy as the predicted CEBE, but this is known to be a poor estimate.20 The Δ self-consistent-field method (ΔSCF) includes orbital relaxation effects by separate optimizations of the ground and the core-hole states, and calculates the CEBE as the difference between total electronic energies, and this method is widely used. The first ΔSCF calculations were performed using HF,21-22 which ignores electron correlation and relativistic effects, but the inclusion of the relaxation energy allowed reproduction of experimental results with an accuracy of a few eV.21 Inclusion of the electron correlation contribution, however, is necessary for achieving an accuracy that is sufficient for addressing the chemical XPS shift, i.e. the effect of the chemical environment on the CEBE of a given element. The ΔSCF method can be extended to post-HF methods, such as Møller-Plesset perturbation theory (ΔMPn),23 multi-configurational self-consistent-field, configuration interaction (ΔMCSCF, ΔCI)24 and coupled cluster (ΔCCSD, ΔCCSD(T))23 methods, but these methods significantly increase the required computational resources, and achieving convergence of the highly excited hole state using coupled cluster methods can be difficult. The electron correlation contribution can alternatively be estimated by DFT, which often offers the best compromise in terms of speed, accuracy and applicability to larger molecules, and this has been denoted as ΔDFT, DFT-ΔSCF or ΔKS (Kohn-Sham). First introduced by Triguero et al.,25 the ΔKS method has rapidly become the most popular member of the ΔSCF family.26-27 Extensive studies have been performed to identify the most accurate functional23 and the most efficient basis sets28-34 by comparing to a suitable database, and the ΔKS method can achieve accuracies on the order of a few tenths of an eV. Relativistic effects are relatively small for second row elements, but become significant for elements in the third row and beyond in the periodic table. They are to a good approximation independent of the chemical environment and can therefore be included by an element-specific additive constant, or calculated by for example the Douglas-Kroll-Hess method terminated at second order (DKH2).35-37

While the ΔSCF method is comparatively simple and general, there are technical challenges in localizing and optimizing the core-hole state. Convergence failure of the SCF procedure and variational collapse to the lowest valence-hole state are common problems, as well as core-hole hopping between multiple near-degenerate core levels for different atoms with the same nuclear charge. These problems can to some extent be alleviated by constructing the initial guess from localized MOs of the ground state and removing an electron from the desired (localized) core orbital. Besley et al. have proposed the maximum overlap method,27 where the orbital occupancies in each iteration are determined by the best overlap between the old and new orbitals, rather than using the aufbau principle. The maximum overlap method reduces the risk of variational collapse and is crucial for converging core excited states of larger molecules. The core-hole hopping problem has been addressed by using mixed basis sets, where the core ionization site is described by an all-electron basis set while an effective core potential (ECP) is used for the remaining atoms. This approach has been denoted ΔSCF/ECP38 and by construction only allows core-hole states corresponding to the atom of interest. An alternative method is to use a fragment-oriented approach, where the core-hole is optimized separately from the rest of the molecule.26

The ΔSCF method has, in addition to the above-mentioned technical issues, a number of other disadvantages, especially for XAS: (1) Each transition energy from the core orbital to an unoccupied orbital needs to be calculated separately, i.e. each excited state requires a separate wave function optimization.
Although this is trivially parallelizable, it can become a major bottleneck for extended molecules with multiple inequivalent excitation sites, especially if large basis sets are used. (2) The ΔSCF approach is a single-determinant method and assumes that a transition can be represented by an excitation involving only two orbitals. The separate optimization of each excited state implies that they in general are non-orthogonal, and there is often considerable overlap between determinants corresponding to high and low energy excited states.39-41 The ΔSCF method has therefore been argued to only be applicable for low-lying states.42 (3) When used in connection with DFT, it is unclear whether there exists a variational principle for excited states.43 Triguero et al.25 have argued that, due to the similarity between the core-hole states and their 'equivalent core' species (e.g. the core-ionized states of CO mimic those of CF+ and NO+, which are ground states), one can assume applicability of the Kohn-Sham theorem for the core-ionized species as well. (4) The transition intensities cannot be computed directly but need to be calculated using Fermi's Golden Rule.44 Furthermore, to allow a comparison with experimental XAS spectra, the calculated transition energies must be convoluted by for example Gaussian functions to account for the finite experimental resolution and lifetime of the core hole. (5) The core-excited state calculated in an unrestricted HF or KS formalism is not a spin-pure singlet, but a mixture of singlet and triplet states. This spin contamination can be alleviated by applying Ziegler's spin purification formula45 shown in eq. (4), where $E_S$ and $E_T$ are the energies of the pure singlet and triplet states, respectively, and $E_{\mathrm{mixed}}$ is the energy of the spin-mixed state.

$$E_S = 2E_{\mathrm{mixed}} - E_T \tag{4}$$

Using this procedure, however, requires an additional calculation of the triplet state for each excited state and thus roughly doubles the computational effort. Even with the above-mentioned drawbacks the ΔKS method remains a convenient and accurate tool for computing and interpreting XAS spectra.38, 46
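Points (4) and (5) amount to a small post-processing step on the raw ΔSCF energies. The following Python sketch applies eq. (4) to each state and then broadens the resulting stick spectrum with Gaussians. The function names, the FWHM parameter and all numerical values are hypothetical placeholders, and a pure Gaussian line shape is assumed for simplicity.

```python
import math


def purify_singlet(e_mixed: float, e_triplet: float) -> float:
    """Eq. (4): E_S = 2 E_mixed - E_T (same energy unit for all three)."""
    return 2.0 * e_mixed - e_triplet


def gaussian_broaden(energies, intensities, grid, fwhm):
    """Sum of area-normalized Gaussians, one per transition, on the grid."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(w * norm * math.exp(-0.5 * ((x - e) / sigma) ** 2)
            for e, w in zip(energies, intensities))
        for x in grid
    ]


# Placeholder (E_mixed, E_T, intensity) triples in eV; not computed values.
raw = [(534.0, 533.5, 1.0), (536.0, 535.8, 0.4)]
singlets = [purify_singlet(em, et) for em, et, _ in raw]
weights = [w for _, _, w in raw]

grid = [530.0 + 0.1 * k for k in range(101)]  # 530.0 to 540.0 eV
spectrum = gaussian_broaden(singlets, weights, grid, fwhm=0.8)
```

Each entry in `raw` stands for two separate SCF optimizations (the spin-mixed state and the triplet), which is the doubling of effort noted above.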

Results and Discussion

The main focus of the present paper is to investigate the requirements for a systematic and efficient control of the basis set errors for core-hole spectroscopy using the ΔSCF approach. Previous work has in most cases employed standard basis sets, typically from either the Pople47-48 (6-31G and 6-311G with choices of polarization), Karlsruhe49 (Def2-SVP, -TZP, -QZV) or Dunning-Peterson50-52 hierarchies (cc-pVXZ, cc-pCVXZ). Reducing the basis set errors to below ~0.1 eV has in several cases been found to require large basis sets with core-augmentation, like the cc-pCV5Z basis set, which is surprising for a ΔSCF approach where only the electron density needs to be represented. The underlying physical reason for anticipating that standard basis sets may be sub-optimum is the requirement of achieving an error balance between two quite different situations, a stable neutral species and a highly excited core-ionized state. The latter will for the valence electrons appear almost as if the nuclear charge has been increased by one atomic unit, and the remaining core electron will also experience an increased effective nuclear charge. This is unproblematic in the CBS limit, but the goal is to obtain a useful error balance for basis sets of DZP or TZP quality. The error balance argument suggests that near-optimum basis function exponents for describing a core-ionization of an atom with atomic number Z may be functions corresponding to a nuclear charge of Z+½, which in practice can be obtained by interpolation of basis functions for atoms with nuclear charges of Z and Z+1. The necessity of describing the contraction of the valence orbitals upon ionization furthermore suggests that the basis set should be less contracted than regular basis sets.
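The interpolation between the exponents for Z and Z+1 can be sketched in a few lines of Python. The function name, the convention that `alpha = 0` corresponds to Z and `alpha = 1` to Z+1, and the exponent values are assumptions made for illustration; the numbers are invented and are not actual pc-n exponents.

```python
def interpolate_exponents(exps_z, exps_z1, alpha=0.5, geometric=True):
    """Interpolate primitive exponents between Z (alpha=0) and Z+1 (alpha=1)."""
    if len(exps_z) != len(exps_z1):
        raise ValueError("both elements must have the same number of primitives")
    if geometric:
        # Geometric interpolation; alpha = 0.5 gives sqrt(a * b).
        return [a ** (1.0 - alpha) * b ** alpha for a, b in zip(exps_z, exps_z1)]
    # Arithmetic interpolation; alpha = 0.5 gives (a + b) / 2.
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(exps_z, exps_z1)]


# Invented s-exponents for two neighbouring elements (illustration only).
exps_z = [1000.0, 150.0, 35.0, 10.0, 3.0, 0.5]
exps_z1 = [1400.0, 210.0, 48.0, 14.0, 4.5, 0.7]

half_geo = interpolate_exponents(exps_z, exps_z1)                   # "Z + 1/2"
half_ari = interpolate_exponents(exps_z, exps_z1, geometric=False)
for g, a in zip(half_geo, half_ari):
    print(f"geometric {g:10.3f}   arithmetic {a:10.3f}")
```

The geometric mean never exceeds the arithmetic mean, so the two variants differ slightly for every exponent; the difference grows with the ratio between the Z and Z+1 exponents.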
Figure 1 shows the ΔSCF calculated XPS for C and O in CH3OH at the HF level with the uncontracted pc-1 and pc-2 (upc-1 and upc-2) basis sets53-54 as a function of the exponent interpolation parameter α, where the CBS limit is taken as the result obtained with the upc-4 basis set. It is seen that using basis set exponents corresponding to a nuclear charge of Z tends to underestimate the CBS value, while basis set exponents corresponding to a nuclear charge of Z+1 tend to overestimate the CBS value, and this was found to be a general trend. The value of the interpolation parameter which reproduces the CBS result depends on the specific system and slightly on whether the interpolation is done arithmetically or geometrically, but for all our test cases, the use of exponents assigned by a geometrical averaging of basis function exponents corresponding to Z and Z+1 leads to lower errors than using basis functions corresponding to a nuclear charge of Z or Z+1. At the quadruple zeta level (upc-3) the basis set is approaching completeness and there is no systematic improvement from interpolating the exponents. We will thus define uncontracted pcX-1 and pcX-2 basis sets from the corresponding upc-n by interpolation (or extrapolation for the last element in the row of the periodic table) of the basis function exponents. We note that we have tested whether improved results can be obtained by adding s-, p- or d-functions with larger exponents to the upc-1 or upc-2 basis sets, but this was in general found not to give a better performance. Basis set contraction is always a compromise between improving computational efficiency and increasing basis set errors. The underlying physical justification for contraction is that basis functions primarily describing orbitals that are almost constant between the situations probed can be contracted by a fixed set of coefficients. The necessity of allowing both the core and valence orbitals to change upon core ionization suggests that standard basis sets are too strongly contracted.
We will define the maximum allowed contraction error as the residual error in the uncontracted basis set, defined as the error compared to the CBS limit.55 Using a general contraction scheme, where all primitive basis functions are allowed to contribute to a given contracted function, this straightforwardly provides the optimum contraction for a given zeta-level.53 Figure 2 shows the contraction errors for the pcX-1 and pcX-2 basis sets as a function of degree of s-function contraction at the HF level for CH3OH, and it is clear that basis set contraction rapidly leads to unacceptable errors and destroys the inherent accuracy of the uncontracted pcX-1 and pcX-2 basis sets. Only the three or four inner-most s-functions can be contracted without loss of accuracy, and this was found to be a general trend, even for the isolated atoms. Figure 3 shows corresponding pcX-1 and pcX-2 results for SiS at the HF level, and it is seen that the maximum s-contraction for third-row elements is four or five of the inner-most functions, which is only marginally larger than for second-row elements. The contraction coefficients $c_X$ in Figures 2 and 3 for an atom with nuclear charge Z are taken as the interpolated HF orbital coefficients of the isolated atoms of nuclear charge Z and Z+1, such that $c_X = \tfrac{1}{2}(c_Z + c_{Z+1})$, but the same conclusions regarding the maximum contraction are obtained at the DKH2 level (using DKH2 orbital coefficients) or with different DFT methods (using DFT orbital coefficients). A major caveat, however, is that the contraction coefficients are sufficiently different for different methods that it is not possible to define a common contraction that produces acceptable (small) errors across a selection of methods. An acceptable contraction for a specific exchange-correlation functional, for example, gives unacceptable errors when used for another exchange-correlation functional, and this would imply that a plethora of method-specific contracted pcX-n basis sets would be required. Figures 2 and 3 show that the pcX-n basis sets can only be weakly contracted, which implies that only a small improvement in computational efficiency can be achieved by contraction, and the conclusion is thus that the pcX-n basis sets should be used in their uncontracted forms. The composition is therefore identical to uncontracted pc-n basis sets,56 i.e. pcX-1 is 8s5p1d and 12s9p1d for second and third row elements, respectively, and pcX-2 is 11s7p2d1f and 15s11p2d1f, respectively.
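The contraction criterion and the coefficient averaging described above can be sketched as follows. This is a minimal illustration, not the procedure used to generate Figures 2 and 3: the function names are hypothetical, the degree of contraction is assumed to be the number of inner-most s-functions that are contracted, and the error values are placeholders.

```python
def average_coefficients(c_z, c_z1):
    """c_X = 0.5 * (c_Z + c_{Z+1}) for each primitive."""
    if len(c_z) != len(c_z1):
        raise ValueError("coefficient vectors must have equal length")
    return [0.5 * (a + b) for a, b in zip(c_z, c_z1)]


def max_safe_contraction(errors_by_degree, residual_error):
    """Largest contraction degree whose error stays within the residual error.

    errors_by_degree maps the number of contracted inner-most s-functions to
    the contraction error (eV). The scan stops at the first degree that
    exceeds the threshold; 0 means no contraction is acceptable.
    """
    best = 0
    for degree in sorted(errors_by_degree):
        if abs(errors_by_degree[degree]) > abs(residual_error):
            break
        best = degree
    return best


# Placeholder contraction errors (eV) and residual basis set error (eV).
errors = {2: 0.001, 3: 0.004, 4: 0.02, 5: 0.15, 6: 0.60}
print(max_safe_contraction(errors, residual_error=0.05))  # prints 4
```

Because the acceptable degree depends on coefficients that differ between methods, a scan like this would have to be repeated for each method, which is the practical argument for leaving the pcX-n sets uncontracted.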

Benchmarking

1. XPS

DFT is often the method of choice for many applications but the optimum exchange-correlation functional usually depends on the specific property. Table 1 shows the performance of different functionals at the CBS limit (obtained by extrapolating the upc-2, upc-3 and upc-4 results)53 for a set of 39 CEBEs (Table 2) where experimental values are available.57 The comparison between the DFT and DFT + DKH2 results shows that the inclusion of relativistic effects is crucial for obtaining accurate CEBEs and the performance of all functionals improves when including the DKH2 correction. The best results for the current set of data are provided by the range-separated hybrid functionals CAM-B3LYP58 and ωB97X-D,59 while the range-separated non-hybrid LC-BLYP and the hybrid B3LYP/BHandHLYP functionals give less accurate results. This suggests that both inclusion of a fraction of HF exchange as well as making this fraction dependent on the electron-electron distance is important for core spectroscopy. Similar conclusions have been drawn for calculating core properties using TDDFT.60 The experimental CEBE values in Table 2 are taken from Ref. 57, which is a compilation from 1984 with typical uncertainties of ± 0.1 eV, estimated by comparing XPS results from different groups, but it still remains one of the most comprehensive sets of reference data. More recent XPS results are often derived from surface chemistry of metal alloys and there is a distinct lack of accurate values for smaller systems containing 2nd and 3rd row elements that can be used for calibration. Holme et al. have recently created a CEBE database of small molecules with typical accuracies of ± 20 meV, but it is restricted to carbon ionization energies.23

2nd row elements

Table 3 compares basis set errors for the 39 CEBEs in Table 2, where the CBS results are obtained by extrapolating the upc-2, upc-3 and upc-4 results,53 and Figure 4 displays the AAD results for the Pople, cc-pVXZ, cc-pCVXZ, pc-n, upc-n and pcX-n basis sets graphically. The basis set dependence is essentially independent of the exchange-correlation functional and only the CAM-B3LYP results are shown, while B3LYP, PBE, ωB97X-D and HF results are available in the supplementary material. Table 3 shows that standard basis sets of DZP or TZP quality (6-31G*, 6-311G*, cc-pVDZ, cc-pVTZ, pc-1, pc-2) have significant errors (> 0.4 eV), and a comparison with the upc-1 and upc-2 results indicates that a large part of the error is due to basis set contraction. This confirms the finding in the previous section that basis set contraction makes the basis set too inflexible to describe core-ionized states. The error for the pcX-1 and pcX-2 basis sets is substantially reduced compared to the non-interpolated upc-1 and upc-2 basis sets and they also outperform other basis sets of DZP or TZP quality, while the non-interpolated upc-3 basis set already provides results essentially at the basis set limit. The basis set convergence with the cc-pVXZ hierarchy of basis sets is slow, but it is significantly improved by the corresponding core-augmented cc-pCVXZ. This was initially puzzling, but can be understood by noting that core-augmentation adds uncontracted s- and p-functions with large exponents (1s1p, 2s2p1d, 3s3p2d1f, 4s4p3d2f1g for X = D,T,Q,5, respectively) and these functions together with the contracted functions of the cc-pVXZ basis set allow a much improved representation of the change in the orbitals upon core-ionization. The largest exponent of the added functions increases in the X=D,T,Q,5 sequence, and at the T and Q levels becomes large enough that the functions have a large overlap with the important region of the 1s-orbital. The cc-pCVTZ basis set thus has 6 s- and 5 p-functions that can combine freely, which is comparable to the pcX-1 with 8 s- and 5 p-functions, and these two basis sets have a similar performance.
In terms of total functions, however, the cc-pCVTZ has 43 functions per atom compared to only 13 for pcX-1. The better performance of the cc-pCVXZ basis sets compared to cc-pVXZ is therefore an effect of the augmentation with uncontracted basis functions counteracting the errors introduced by the contraction of the cc-pVXZ basis sets. It should be noted that this error compensation is not complete, and the cc-pCV5Z basis set tends to overshoot the CBS limit as can be seen by the sign of the mean deviation, especially if relativistic effects are included.

The IGLO-II and IGLO-III basis sets perform surprisingly well, given that they were developed for calculating nuclear magnetic shielding constants. A closer inspection shows that the IGLO-II lies between DZ and TZ in terms of basis set quality in the s- and p-function space, while the IGLO-III lies between TZ and QZ in quality. They furthermore have a lower degree of contraction compared to other standard basis sets. This confirms our findings for pcX-1 and pcX-2 that the quality of the s- and p-type functions is more important than including high angular momentum polarization functions, and that basis set contraction should be avoided or kept low.

During the course of this work, Hanson-Heine et al. proposed to use dual-core basis sets where the (contracted) core function for the Z+1 element is added to the regular basis set for the element with nuclear charge Z, and this is denoted as (Z+1)cc-pVXZ when used in connection with the cc-pVXZ basis sets.34 We have in addition included results using dual-core basis sets based on the pcseg-n basis sets,55 which are denoted as (Z+1)pcseg-n. The underlying idea in the dual-core method is similar to our exponent interpolation, namely to ensure that the basis set is able to provide a balanced representation of both the ground and core-excited states. Adding only one core-function, however, neglects the requirement of allowing also the valence orbitals to change their shape, which our pcX-n approach incorporates. The dual-core method improves the results compared to the corresponding standard basis sets, but the results are inferior to the corresponding pcX-n counterparts, which shows that relaxation of the valence orbitals makes an important contribution.
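The error measures used in this comparison can be computed as in the Python sketch below. It assumes that AAD stands for the average absolute deviation from the CBS reference and that a positive mean deviation corresponds to overshooting the CBS limit. The function name is hypothetical and the CEBE values are placeholders, not entries from Table 3.

```python
def error_statistics(values, reference):
    """Return (mean deviation, average absolute deviation, max absolute deviation)."""
    if len(values) != len(reference) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [v - r for v, r in zip(values, reference)]
    n = len(errors)
    mean_dev = sum(errors) / n                      # sign shows over/undershoot
    avg_abs_dev = sum(abs(e) for e in errors) / n   # AAD (assumed meaning)
    max_abs_dev = max(abs(e) for e in errors)
    return mean_dev, avg_abs_dev, max_abs_dev


# Placeholder CEBEs in eV for one basis set and the CBS reference.
basis_values = [290.1, 539.8, 405.3]
cbs_values = [290.0, 540.0, 405.0]

md, aad, max_ad = error_statistics(basis_values, cbs_values)
print(f"MD = {md:+.3f} eV, AAD = {aad:.3f} eV, max = {max_ad:.3f} eV")
```

Reporting the signed mean deviation next to the AAD is what makes a systematic overshoot visible, since errors of opposite sign cancel in the former but not in the latter.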
3rd row elements
Systems with 3rd row elements have an additional shell of core electrons, and core ionization can consequently occur from either a K, L1, L2, or L3 shell, where K denotes generation of a hole in the 1s-orbital, L1 denotes generation of a hole in the 2s-orbital, while L2 and L3 denote removal of an electron from the 2p1/2- and 2p3/2-orbitals, respectively. In a non-relativistic or scalar-only relativistic calculation the latter two orbitals are degenerate, and benchmarking for 3rd row elements is in the present case restricted to K and L1 core-excitations. Table 4 shows the 16 molecular systems, with a total of 25 CEBEs, chosen for basis set calibration of 3rd row elements. Both 3rd row and 2nd row ionization centres are included in the test set, in order to additionally test the performance of the 3rd row element basis sets when acting as spectator atoms next to a 2nd row ionization centre. The basis set errors for the K-shell core-excitations are shown in Table 5, with Figure 5 displaying the AAD results for the Pople, cc-pVXZ, cc-pCVXZ, pc-n, upc-n and pcX-n basis sets graphically. It is noticeable that the upc-4 results now differ visibly from the CBS extrapolated results, and the latter may therefore contain residual basis set errors in the milli-eV range. The maximum errors are obtained for the metal hydrides NaH, MgH2 and AlH3, where the lack of sufficiently diffuse functions on hydrogen is the main limitation. Standard contracted basis sets in general result in unacceptable errors regardless of zeta quality. Of the standard basis sets, only the cc-pCVQZ and cc-pCV5Z produce useful results at the non-relativistic level, but these results are equalled by the pcX-1 and pcX-2, respectively, and the latter contain significantly fewer basis functions. When relativistic corrections are included, the cc-pCVQZ and cc-pCV5Z basis sets systematically underestimate the CEBE, most likely due to the contraction coefficients being taken from non-relativistic calculations, while the pcX-1 and pcX-2 basis sets provide an accuracy similar to the non-relativistic case.
The uncontracted upc-n basis sets perform similarly to the 2nd row case and display the fastest convergence of all non-interpolated basis sets. Exponent interpolation significantly decreases the basis set error, and the pcX-2 has the second lowest basis set error of all the tested basis sets, only topped by the uncontracted quintuple-zeta quality basis set upc-4. The dual-core basis sets are a large improvement over standard basis sets, but the pcX-1 and pcX-2 again perform significantly better.
Table 6 shows the basis set errors for the L1-excitation in the 16 systems in Table 4, where 2nd row element centres are left out, as they only have one core-electron shell, leading to a total of 16 CEBEs, and Figure 6 displays the AAD results for the Pople, cc-pVXZ, cc-pCVXZ, pc-n, upc-n and pcX-n basis sets graphically. Achieving wave function convergence was significantly more difficult for these systems and variational collapse was common, especially when using contracted basis sets. The cc-pVXZ basis sets here tend to perform better than the cc-pCVXZ, but appear to systematically underestimate the CEBE, and only the cc-pCV5Z basis set provides acceptable results at both the non-relativistic and relativistic levels. The pcX-1 and pcX-2 again outperform all other tested basis sets. Calibration of the basis set errors for the L2 and L3 ionization is not presented as this would require inclusion of spin-orbit effects, but there is good reason to believe that the pcX-n basis sets would be suitable for these ionizations as well.

2. XAS
Calculating XAS spectra is less straightforward than calculating XPS when using the ΔSCF method, due to peak broadening and the necessity to use additional methods to compute the peak intensity. The basis set requirements for creating the core-hole, however, are expected to be very similar to those for XPS discussed in the previous section, but the presence of the excited electron in a (previously) unoccupied, ground-state molecular orbital is expected to require addition of a set of diffuse functions to the basis set. We define aug-pcX-n basis sets by adding a set of diffuse functions to the pcX-n basis sets, where the diffuse functions are taken from the corresponding (interpolated) aug-pc-n.61 We will in the present case only consider the lowest excitations corresponding to moving an electron from the core to the lowest unoccupied orbital. Large basis sets, like aug-cc-pV5Z, aug-pcseg-n or aug-upc-4, are plagued by convergence issues due to the many diffuse functions leading to oscillations in the SCF procedure. Tuning certain SCF parameters, dropping linear combinations of basis set functions and using the maximum overlap method alleviated some but not all of the problems, and non-converged cases are given as zero in the supplementary material. The results shown in Tables 9 and 10 are thus based on averages over a slightly reduced set of compounds.
The core-excited singlet state is, as mentioned in the introduction, spin-contaminated, but an improved energy estimate can be obtained by using the spin-purification formula in eq. (4). Table 7 shows the performance of the ΔSCF method for the systems in Table 8 for different exchange-correlation functionals including the DKH2 correction and the aug-upc-4 basis set. Spin-purification improves the results for all the tested functionals, and the two range-separated hybrid functionals CAM-B3LYP and ωB97X-D again deliver the best accuracy with AAD values comparable to the XPS results in Table 1.
Table 9 shows the XAS basis set errors for the systems given in Table 8, where the focus is on the effects of adding diffuse functions to the cc-pCVXZ and upc-n basis sets and the CBS limit is estimated by extrapolation of the aug-upc-2, -3, -4 results. The spin-purification formula has not been used as only the basis set effect is probed. There is in general a significant improvement by adding diffuse functions, but the contraction error in most cases dominates for the standard basis sets. It is expected that including diffuse functions will be even more important for higher core excitation energies. The aug-pcX-1 and aug-pcX-2 basis sets again perform similarly to or better than aug-cc-pCVTZ and aug-cc-pCVQZ, respectively.
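The spin-purification step can be sketched as follows. Eq. (4) is not reproduced in this section, so the sketch assumes it has the commonly used form E_S = 2 E_mixed - E_T, where E_mixed is the energy of the spin-contaminated core-excited determinant and E_T that of the corresponding triplet. The function name, variable names and total energies are illustrative assumptions, not values from this work.

```python
HARTREE_TO_EV = 27.211386  # approximate hartree-to-eV conversion factor


def spin_purified_excitation_ev(e_ground, e_mixed, e_triplet):
    """Return (uncorrected, spin-purified) excitation energies in eV.

    Assumes the purification E_S = 2*E_mixed - E_T (an assumed form of
    eq. (4)); all input energies are total energies in hartree.
    """
    e_singlet = 2.0 * e_mixed - e_triplet
    uncorrected = (e_mixed - e_ground) * HARTREE_TO_EV
    purified = (e_singlet - e_ground) * HARTREE_TO_EV
    return uncorrected, purified


# Hypothetical total energies (hartree), for illustration only
raw, pure = spin_purified_excitation_ev(-113.30, -102.75, -102.76)
print(f"uncorrected: {raw:.2f} eV, spin-purified: {pure:.2f} eV")
```

Under this assumed form, the correction equals the mixed-state/triplet energy gap, which is why a third calculation (the triplet) is needed for each excitation.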

3rd row elements
For the basis set calibration for 3rd row elements, a smaller subset of the systems in Table 4 was used (bold entries were excluded). Achieving convergence for the excited states was problematic, and the use of the maximum overlap method was essential, but variational collapse was inevitable when using the aug-cc-pCVQZ basis set, and these results are consequently omitted. The XAS basis set errors are shown in Table 10 and they display the same behaviour as for the 2nd row elements. Adding diffuse functions to the pc-n, pcX-n or cc-pCVXZ basis sets significantly increases the accuracy, and the aug-pcX-n basis sets again perform quite well.

Local Basis Set Approximation
The necessity of leaving the pcX-n basis sets uncontracted means that these are computationally less efficient than the regular contracted pcseg-n basis sets.55 Core ionization, however, is a strongly localized process and it is therefore of interest to investigate whether the pcX-n basis sets can be used as locally dense basis sets, i.e. using pcX-n on only the ionization centre and pcseg-n on the remaining atoms, in analogy with the strategy used for calculating nuclear magnetic shielding constants.62 One could argue that using pcX-n basis sets on all atoms is inconsistent, as the interpolated basis set exponents will be non-optimum for atoms not involved in the core-ionization, but the results in Tables 3, 5, 6, 9 and 10 show that this is not a severe problem. Table 11 shows a comparison between using pcX-n on all atoms and using the mixed pcX-n/pcseg-n approach for core ionization and excitation energies for 2nd row and 3rd row elements using the benchmarking systems in Tables 2, 4 and 8, and it is seen that the mixed basis set approach leads to basis set errors very similar to those obtained by using pcX-n on all atoms.
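The locally dense assignment described above can be illustrated with a small bookkeeping sketch. The function name, the atom list and the label strings are assumptions made for illustration; how the labels are translated into program input is not covered here.

```python
def assign_local_basis(atoms, centre_index, n):
    """Map each atom to a basis-set label for a locally dense calculation.

    pcX-n is placed on the ionization centre only and pcseg-n, at the same
    level n, on every other atom.
    """
    if not 0 <= centre_index < len(atoms):
        raise IndexError("centre_index must refer to an atom in the list")
    labels = []
    for i, symbol in enumerate(atoms):
        name = f"pcX-{n}" if i == centre_index else f"pcseg-{n}"
        labels.append((symbol, name))
    return labels


# Formamide with the oxygen as ionization centre (atom order is arbitrary)
formamide = ["H", "C", "O", "N", "H", "H"]
basis = assign_local_basis(formamide, centre_index=2, n=2)
for symbol, name in basis:
    print(symbol, name)
```

The same n is used for both families in this sketch; a different ionization centre requires a new assignment and hence a new ground-state calculation.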

As an example of applying this method to a larger system we have calculated all 11 XPS values for guanine (Figure 7). Table 12 shows the AAD and MaxAD of different basis sets for computing the CEBEs for all ionization centres of guanine compared to the CBS limit using the ωB97X-D functional. The general behaviour is very similar to the results in Table 3. The use of the mixed pcX-n/pcseg-n basis sets leads to a small deterioration of the results, but the pcX-2/pcseg-2 combination will in most cases provide sufficiently low basis set errors. We note that the pcX-2/pcseg-1 combination (results not shown) leads to significantly worse results, and the use of mixed basis sets at different zeta-levels is therefore not recommended. The calculated values can be compared to experimental results (Table 13) by adding an element-specific relativistic constant, with the results given in Table 12. The constants have been calculated by Chong et al.63 using the results of Pekeris' study of two-electron atoms.64 The AAD at the basis set limit is 0.19 eV, while basis sets like cc-pV5Z show a slightly lower AAD of 0.17 eV due to cancellation of method and basis set errors. The cc-pCVXZ basis set hierarchy does not converge smoothly towards the CBS value, with the error increasing from cc-pCVQZ to cc-pCV5Z. ΔKS generally underestimates the experimental values at the CBS limit with the ωB97X-D functional. The ΔSCF approach requires two separate calculations for each CEBE (three if spin-purification is employed for XAS) and these must necessarily be carried out with the same basis set. If several core-excitations are desired for the same system (as in the guanine example above), the use of a mixed basis set approach increases the total number of calculations.
Using a pcX-n basis set on all atoms will for a system with N core-ionizations require N+1 separate calculations (one for the ground state and one for each of the core-hole states), while a mixed pcX-n/pcseg-n basis set approach will require 2N separate calculations (one ground and one core-hole state for each basis set combination). For a system with many core-excitations the computational savings by using the mixed basis set approach may thus be offset by the requirement of performing N-1 additional calculations. We have probed the computationally most efficient approach by performing calculations for alkanes, poly-alkenes and fused aromatic systems up to octane/octatetraene and anthracene, with the results provided as supplementary material. If only a single, or a few, CEBEs are desired, the mixed pcX-n/pcseg-n basis set approach leads to computational savings of ~30% and ~50% for n = 1 and 2, respectively, compared to using pcX-n for all atoms. If all the CEBEs are desired, however, there are essentially no savings at the n = 1 level, and the saving is reduced to ~30% at the n = 2 level by the mixed basis set approach. If relativistic effects are included by the DKH2 approach, the recommendation is to use the pcX-n basis set on all atoms.
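The calculation counts quoted above (N+1 versus 2N) can be tabulated with a short sketch. The function name is an assumption, and the sketch only counts calculations; it does not model the cost of each one.

```python
def n_calculations(n_centres, mixed_basis):
    """Number of separate SCF calculations needed for n_centres CEBEs.

    pcX-n on all atoms: one shared ground state plus one core-hole state
    per centre. Mixed pcX-n/pcseg-n: a ground state and a core-hole state
    for every centre, since each centre has its own basis set combination.
    """
    if n_centres < 1:
        raise ValueError("at least one ionization centre is required")
    return 2 * n_centres if mixed_basis else n_centres + 1


# Guanine has 11 core-ionization centres (C, N and O atoms)
full = n_calculations(11, mixed_basis=False)   # 12 calculations
mixed = n_calculations(11, mixed_basis=True)   # 22 calculations
print(full, mixed, mixed - full)               # the difference is N - 1 = 10
```

For a single centre both strategies need two calculations, which is why the per-calculation savings of the mixed approach carry over fully in that case.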

Computational Details
All ΔSCF calculations have been performed with the GAMESS-US (18 AUG 2016, R1) program package.65 The systems have been optimized at the B3LYP/6-31G* level using the Gaussian-09 (Revision D.01) program package.66 Core ionization/excitation energies have been calculated by first determining the wave function for the neutral species and localizing all the orbitals using either the Foster-Boys67 or the Edmiston-Ruedenberg68 scheme. The Foster-Boys method often fails to localize the core orbitals, especially for 3rd row elements and beyond, while the Edmiston-Ruedenberg method is more stable. The (localized) orbitals are reordered to make the desired hole-orbital the last occupied, and this is then used as a starting guess for the core-ionized or core-excited species. Doublet core-hole states are calculated using the UHF formalism. For small molecules this procedure in most cases converges without problems, but the maximum overlap method and a second-order convergence scheme may be required for larger systems. All systems have been calculated without symmetry, as the introduction of a core-hole often reduces the point group of the system.
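The orbital reordering used to build the starting guess can be sketched as index bookkeeping. This is only an illustration of the permutation; the function name and 0-based indexing are assumptions, and no program-specific input keywords are implied.

```python
def reorder_for_core_hole(n_occupied, hole_orbital):
    """Return a permutation of the occupied orbital indices (0-based).

    The chosen hole orbital is moved to the last occupied position, while
    all other occupied orbitals keep their relative order. Virtual orbitals
    are not touched.
    """
    if not 0 <= hole_orbital < n_occupied:
        raise ValueError("hole_orbital must be one of the occupied orbitals")
    order = [i for i in range(n_occupied) if i != hole_orbital]
    order.append(hole_orbital)
    return order


# Five occupied orbitals, core hole in the lowest (localized 1s) orbital
print(reorder_for_core_hole(5, 0))  # [1, 2, 3, 4, 0]
```

With the hole orbital in the last occupied position, removing one electron from the highest occupied orbital of the guess yields the desired core-hole occupation.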

Summary
The present work defines a family of basis sets, pcX-n, that in a systematic way can approach the basis set limit for core-excitation processes such as XPS and XAS calculated by the ΔSCF approach. They are shown to provide lower basis set errors than other basis sets of similar size, and can be used as local basis sets in combination with the energy-optimized pcseg-n basis sets for reducing computational costs when core-excitation processes are only required for a (small) subset of atoms. At the polarized triple zeta level (pcX-2) they have typical basis set errors of ~0.02 eV, at both non-relativistic and relativistic levels of theory when using density functional methods. We note that the basis set behavior when using response methods for calculating core-excitations may be different and the present findings may be valid only when using the ΔSCF approach.

Corresponding Author [email protected]

ORCID: 0000-0002-4576-5838
The authors declare no competing financial interest.

Acknowledgment. This work was supported by Grant No. 4181-00030B from the Danish Council for Independent Research.

Supporting Information Available. Tables containing raw results and pcX-n basis sets. This information is available free of charge via the Internet at http://pubs.acs.org/.

Table 1. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for the data set in Table 2 as a function of DFT method using the extrapolated CBS estimate, where relativistic results are at the DKH2 level (eV).

                  Non-Relativistic          Relativistic
Functional        MD      AAD     MaxAD     MD      AAD     MaxAD
CAM-B3LYP         0.31    0.38    1.01      0.05    0.17    0.69
ωB97X-D           0.32    0.40    1.19      0.02    0.21    0.58
BLYP              0.46    0.49    1.34      0.19    0.25    0.72
B3LYP             0.59    0.59    1.34      0.28    0.29    0.85
BHandHLYP         0.19    0.43    1.20      0.40    0.43    1.30
PBELYP            0.76    0.76    1.55      0.45    0.45    1.01
LC-BLYP           1.15    1.15    1.74      0.88    0.88    1.08
PBE               1.36    1.36    2.25      1.16    1.16    1.72
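The three statistics used throughout the tables can be computed as in the sketch below. The sign convention (calculated minus reference) is an assumption, as are the function name and the example numbers.

```python
def deviation_stats(values, references):
    """Return (MD, AAD, MaxAD) for calculated values against references.

    Deviations are taken as calculated minus reference (assumed sign
    convention), so a positive MD means overestimation on average.
    """
    if not values or len(values) != len(references):
        raise ValueError("need two non-empty sequences of equal length")
    dev = [v - r for v, r in zip(values, references)]
    md = sum(dev) / len(dev)
    aad = sum(abs(d) for d in dev) / len(dev)
    max_ad = max(abs(d) for d in dev)
    return md, aad, max_ad


# Hypothetical calculated and reference energies (eV), illustration only
md, aad, max_ad = deviation_stats([290.5, 405.6, 540.9], [291.0, 405.6, 539.9])
print(f"MD={md:.3f}  AAD={aad:.3f}  MaxAD={max_ad:.3f}")
```

By construction |MD| <= AAD <= MaxAD, and MD equals AAD in magnitude only when all deviations have the same sign.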

Table 2. Experimental core binding energies used for benchmarking (eV).57

Ion. Centre     CEBE (eV)
BF3             202.80

C2H4            290.82
CH4             290.91
C2H2            291.14
CH3NH2          291.60
CH3OH           292.42
CH3CN           292.68
CH3OCH3         292.91
HCN             293.40
CH3CN           292.98
CH3F            293.70
HCONH2          294.45
H2CO            294.47
HCOOH           295.90
CO              296.21
CO2             297.69
CF4             301.90

CH3NH2          405.15
NH3             405.56
CH3CN           405.64
HCONH2          406.41
HCN             406.78
N2              409.98
NF3             414.20

HCONH2          537.74
CH3OCH3         538.33
HCOOH           538.93
CH3OH           539.11
H2CO            539.48
H2O             539.90
HCOOH           540.65
CO2             541.28
CO              542.55

CH3F            692.91
HF              694.23
NF3             694.45
BF3             694.80
CF4             695.56
F2              696.69

Table 3. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for the data set in Table 2 as a function of basis set using the CAM-B3LYP method and upc-2,-3,-4 extrapolated results as reference, where relativistic results are at the DKH2 level (eV). The AAD results are illustrated in Figure 4.

                  Non-Relativistic            Relativistic
Basis set         MD       AAD      MaxAD     MD       AAD      MaxAD
6-31G*            2.897    2.897    3.627     2.975    2.975    3.707
6-311G*           0.487    0.487    0.746     0.413    0.413    0.612
cc-pVDZ           2.898    2.898    3.573     2.967    2.967    3.735
cc-pVTZ           0.420    0.420    0.703     0.382    0.382    0.646
cc-pVQZ           0.077    0.077    0.138    -0.006    0.025    0.091
cc-pV5Z           0.022    0.022    0.068    -0.043    0.043    0.124
cc-pCVDZ          1.339    1.339    1.811     1.274    1.274    1.691
cc-pCVTZ          0.060    0.062    0.156     0.024    0.037    0.074
cc-pCVQZ          0.001    0.013    0.053    -0.020    0.021    0.072
cc-pCV5Z         -0.001    0.004    0.044    -0.019    0.019    0.051
IGLO-II           0.095    0.101    0.219     0.035    0.067    0.177
IGLO-III          0.046    0.046    0.140     0.011    0.034    0.133
pc-1              2.391    2.391    2.899     2.480    2.480    3.086
pc-2              1.446    1.446    1.969     1.497    1.497    2.119
pc-3              0.080    0.080    0.130     0.014    0.031    0.052
pc-4              0.013    0.013    0.065    -0.027    0.027    0.082
upc-1             0.281    0.281    0.568     0.275    0.275    0.562
upc-2             0.025    0.025    0.072     0.022    0.023    0.070
upc-3             0.002    0.002    0.051     0.000    0.001    0.002
upc-4             0.000    0.000    0.000     0.000    0.000    0.000
pcX-1            -0.031    0.076    0.276    -0.036    0.077    0.279
pcX-2             0.007    0.014    0.061     0.004    0.015    0.060
(Z+1)cc-pVDZ      0.117    0.122    0.286     0.120    0.121    0.361
(Z+1)cc-pVTZ      0.022    0.024    0.065     0.171    0.172    0.463
(Z+1)pcseg-1      0.081    0.088    0.224     0.130    0.131    0.468
(Z+1)pcseg-2      0.035    0.036    0.069     0.127    0.127    0.371

Table 4. Molecular systems containing 3rd row elements selected for basis set calibration. Systems marked in bold are not included in the benchmarking for 3rd row K-edge excitations (XAS).

NaH       MgH2      AlH3      SiH4
SiS       HSiP      PH3       H2S
H2CS      CH3SH     SO2       HCl
Cl2       H2C2Cl2   CH3Cl     HCOCl

Table 5. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for K-edge excitations for the systems in Table 4 as a function of basis set using the CAM-B3LYP method and upc-2,-3,-4 extrapolated results as reference, where relativistic results are at the DKH2 level (eV). The AAD results are illustrated in Figure 5.

                  Non-Relativistic              Relativistic
Basis set         MD        AAD       MaxAD     MD        AAD       MaxAD
6-31G*           14.068    14.068    22.133    14.402    14.402    22.615
6-311G*           2.556     2.556     8.928     1.592     1.938     7.198
cc-pVDZ          15.074    15.074    22.311    16.195    16.195    22.769
cc-pVTZ           7.621     7.621    16.892     7.843     7.843    17.243
cc-pVQZ           7.855     7.855    16.526     7.622     7.624    16.900
cc-pV5Z           1.921     1.921     9.165     1.717     1.727    11.209
cc-pCVDZ          2.174     2.174     6.987     1.180     1.280     6.825
cc-pCVTZ          0.225     0.225     1.282    -0.784     0.876     2.097
cc-pCVQZ          0.040     0.041     0.378    -0.814     0.815     1.846
cc-pCV5Z          0.024     0.027     0.243    -0.483     0.494     0.982
pc-1             12.397    12.397    20.920    11.175    12.650    21.407
pc-2              8.388     8.388    13.792     7.852     9.387    16.888
pc-3              6.154     6.154     9.075     6.308     6.308     9.665
pc-4              5.349     5.349     8.961     4.618     6.384    19.374
upc-1             0.409     0.409     4.821     0.416     0.416     5.003
upc-2            -0.090     0.173     3.098    -0.107     0.186     3.069
upc-3             0.024     0.025     0.512     0.027     0.027     0.519
upc-4             0.009     0.009     0.164     0.008     0.008     0.168
pcX-1             0.044     0.078     0.240     0.035     0.062     0.243
pcX-2             0.013     0.029     0.207     0.018     0.027     0.211
pcX-3             0.007     0.011     0.165     0.008     0.011     0.169
(Z+1)cc-pVDZ      0.136     0.145     0.409    -0.467     0.594     1.512
(Z+1)cc-pVTZ      0.090     0.100     0.320    -0.809     0.871     1.990
(Z+1)pcseg-1      0.214     0.072     0.652    -0.594     0.710     1.555
(Z+1)pcseg-2      0.093     0.021     0.347    -0.604     0.628     1.535

Table 6. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for L1-edge excitations for the systems in Table 4 as a function of basis set using the CAM-B3LYP method and upc-2,-3,-4 extrapolated results as reference, where relativistic results are at the DKH2 level (eV). The AAD results are illustrated in Figure 6.

                  Non-Relativistic            Relativistic
Basis set         MD       AAD      MaxAD     MD       AAD      MaxAD
6-31G*            2.294    2.294    2.877     2.434    2.434    3.322
6-311G*           0.475    0.475    1.159     0.455    0.455    0.925
cc-pVDZ           2.054    2.054    2.573     2.251    2.251    2.976
cc-pVTZ           0.663    0.663    1.090     0.693    0.693    1.278
cc-pVQZ           0.694    0.694    0.868     0.796    0.796    1.255
cc-pV5Z           0.095    0.095    0.188     0.032    0.046    0.280
cc-pCVDZ          0.563    0.563    0.961     0.310    0.328    0.709
cc-pCVTZ          0.042    0.042    0.123    -0.020    0.055    0.111
cc-pCVQZ         -0.002    0.006    0.020    -0.082    0.082    0.128
cc-pCV5Z         -0.003    0.004    0.023    -0.052    0.052    0.079
pc-1              1.578    1.578    2.177     1.674    1.674    2.417
pc-2              0.631    0.631    0.997     0.775    0.775    1.427
pc-3              0.306    0.306    0.500     0.401    0.401    0.873
pc-4              0.163    0.163    0.284     0.236    0.236    0.609
upc-1             0.207    0.207    0.733     0.198    0.198    0.702
upc-2             0.057    0.057    0.168     0.049    0.049    0.121
upc-3             0.009    0.009    0.051     0.003    0.003    0.018
upc-4             0.000    0.000    0.003     0.001    0.001    0.008
pcX-1             0.083    0.086    0.181     0.104    0.107    0.476
pcX-2             0.019    0.022    0.098     0.016    0.019    0.053

Table 7. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for the XAS data set in Table 8 as a function of DFT method using the aug-upc-4 basis set (eV), where relativistic results are included at the DKH2 level, and with or without the use of the spin-correction formula in eq. (4).

                  No spin correction        With spin correction
Functional        MD      AAD     MaxAD     MD      AAD     MaxAD
CAM-B3LYP         0.41    0.44    1.04      0.18    0.26    0.57
ωB97X-D           0.16    0.32    0.86     -0.03    0.22    0.80
B3LYP             0.44    0.46    1.08      0.23    0.26    0.62
PBE               1.27    1.27    2.16      1.08    1.08    1.05
HF                0.14    0.34    1.00     -0.20    0.41    2.01

Table 8. Experimental XAS data for comparing the performance of the methods in Table 7 (eV).

System      Excitation Energy (eV)    Ref.
C2H4        284.69                    69
C2H2        285.71                    70
HCN         286.37                    72
CH4         287.02                    73
CO          287.42                    75
CH3NH2      287.70                    77
CH3OH       287.81                    78
HCONH2      288.09                    76
CO2         290.77                    69

HCN         399.70                    72
NH3         400.60                    77
CH3NH2      400.78                    77
N2          401.09                    69
HCONH2      401.95                    71

O2          529.53                    74
HCONH2      531.63                    76
CH3OH       533.78                    78
H2O         533.85                    79
CO          534.23                    75
CO2         535.34                    69

F2          682.20                    80
HF          687.40                    80

Table 9. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for XAS for the systems in Table 8 as a function of basis set using the CAM-B3LYP method and aug-upc-2,-3,-4 extrapolated results as reference, where relativistic results are at the DKH2 level (eV).

                  Non-Relativistic            Relativistic
Basis set         MD       AAD      MaxAD     MD       AAD      MaxAD
6-31G*            3.538    3.538    4.256     3.612    3.612    4.368
6-311G*           0.720    0.720    1.169     0.651    0.651    1.131
6-31+G*           3.256    3.256    4.260
6-311+G*          0.614    0.614    0.878
cc-pCVDZ          1.833    1.833    2.462     1.784    1.784    2.342
cc-pCVTZ          0.270    0.270    0.681     0.238    0.238    0.666
cc-pCVQZ          0.124    0.124    0.397     0.098    0.099    0.389
cc-pCV5Z          0.054    0.054    0.226     0.043    0.053    0.220
aug-cc-pCVDZ      1.500    1.500    2.139     1.441    1.441    2.022
aug-cc-pCVTZ      0.130    0.130    0.203     0.094    0.094    0.120
aug-cc-pCVQZ      0.095    0.095    0.145     0.005    0.007    0.021
aug-cc-pCV5Z      0.002    0.002    0.004    -0.013    0.014    0.038
upc-1             0.508    0.508    1.001     0.502    0.502    0.997
upc-2             0.100    0.100    0.305     0.099    0.099    0.304
upc-3             0.019    0.019    0.086     0.020    0.021    0.085
upc-4             0.004    0.004    0.020     0.007    0.007    0.030
aug-upc-1         0.338    0.338    0.509     0.332    0.332    0.504
aug-upc-2         0.025    0.025    0.036     0.025    0.025    0.040
aug-upc-3         0.001    0.001    0.004     0.002    0.002    0.008
aug-upc-4         0.000    0.000    0.000     0.000    0.000    0.000
pcX-1             0.289    0.291    0.921     0.279    0.282    0.918
pcX-2             0.109    0.111    0.361     0.102    0.104    0.360
aug-pcX-1         0.111    0.111    0.245     0.111    0.111    0.243
aug-pcX-2         0.013    0.013    0.022     0.013    0.013    0.024

Table 10. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for XAS for the systems mentioned in Table 4 (excluding the bold entries) as a function of basis set using the CAM-B3LYP method and aug-upc-2,-3,-4 extrapolated results as reference (eV).

                  Non-relativistic
Basis set         MD        AAD       MaxAD
6-31G*           13.336    13.462    18.792
6-311G*           2.431     2.481     8.931
cc-pCVDZ          1.902     1.902     4.067
cc-pCVTZ          0.187     0.193     0.578
cc-pCVQZ          0.019     0.059     0.177
cc-pCV5Z          1.167     1.281    18.017
aug-cc-pCVDZ      1.867     1.930     3.873
aug-cc-pCVTZ      0.156     0.178     0.520
aug-cc-pCV5Z     -0.022     0.037     0.249
pc-1              0.200     0.200     0.584
pc-2              0.266     0.292     4.531
pc-3             -0.046     0.064     0.532
pc-4              0.002     0.004     0.015
aug-pc-1          0.143     0.229     0.478
aug-pc-2          0.019     0.054     0.218
aug-pc-3         -0.008     0.023     0.175
aug-pc-4         -0.036     0.060     0.454
pcX-1             0.084     0.112     0.434
pcX-2             0.029     0.062     0.168
aug-pcX-1         0.021     0.099     0.391
aug-pcX-2        -0.007     0.041     0.232

Table 11. Mean deviation (MD), average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for XPS/XAS, comparing the complete basis set approach from the benchmarkings in Tables 2, 4 and 8 and the local basis set approach, as a function of basis set using the CAM-B3LYP method and upc-2,-3,-4 (XPS) or aug-upc-2,-3,-4 (XAS) extrapolated results as reference (eV).

                                          Non-relativistic
Data set              Basis set           MD       AAD      MaxAD
2nd row (XPS)         pcX-1              -0.031    0.076    0.276
                      pcX-2               0.007    0.014    0.061
                      pcX-1 (loc)         0.021    0.076    0.233
                      pcX-2 (loc)         0.009    0.022    0.107
3rd row (K) (XPS)     pcX-1               0.044    0.078    0.240
                      pcX-2               0.013    0.029    0.207
                      pcX-1 (loc)         0.074    0.099    0.348
                      pcX-2 (loc)         0.017    0.029    0.207
3rd row (L1) (XPS)    pcX-1               0.083    0.086    0.181
                      pcX-2               0.019    0.022    0.098
                      pcX-1 (loc)         0.084    0.087    0.173
                      pcX-2 (loc)         0.011    0.014    0.042
2nd row (XAS)         aug-pcX-1           0.111    0.111    0.245
                      aug-pcX-2           0.013    0.013    0.022
                      aug-pcX-1 (loc)     0.127    0.127    0.253
                      aug-pcX-2 (loc)     0.013    0.013    0.021
3rd row (K) (XAS)     aug-pcX-1           0.021    0.099    0.391
                      aug-pcX-2          -0.007    0.041    0.232
                      aug-pcX-1 (loc)     0.007    0.116    0.464
                      aug-pcX-2 (loc)    -0.013    0.038    0.230

Table 12. Average absolute deviation (AAD) and maximum absolute deviation (MaxAD) for 11 XPS values in guanine (Figure 7) as a function of basis set using the ωB97X-D method and upc-2,-3,-4 extrapolated results, and experimental values (Table 13) as reference (eV). The data comparing with the experimental results include an element-specific relativistic correction of 0.10, 0.21 and 0.39 eV for C, N and O, respectively.

                  CBS limit           Exp. values
Basis set         AAD      MaxAD      AAD      MaxAD
6-31G*            2.644    3.366      2.645    2.999
6-311G*           0.423    0.505      0.479    0.856
cc-pVDZ           2.556    3.094      2.557    2.850
cc-pVTZ           0.311    0.476      0.314    0.701
cc-pVQZ           0.057    0.091      0.200    0.493
cc-pV5Z           0.023    0.028      0.170    0.375
cc-pCVDZ          1.100    1.497      1.100    1.396
cc-pCVTZ          0.017    0.040      0.199    0.443
cc-pCVQZ          0.010    0.020      0.195    0.436
cc-pCV5Z          0.003    0.005      0.226    0.445
IGLO-II           0.147    0.199      0.230    0.598
IGLO-III          0.055    0.064      0.216    0.511
upc-1             0.368    0.486      0.369    0.800
upc-2             0.031    0.051      0.202    0.478
upc-3             0.002    0.003      0.195    0.451
upc-4             0.000    0.000      0.194    0.449
pc-1              2.225    2.605      2.226    2.592
pc-2              1.135    1.656      1.135    1.442
pc-3              0.075    0.083      0.191    0.516
pc-4              0.015    0.015      0.196    0.463
pcX-1             0.132    0.190      0.225    0.568
pcX-2             0.022    0.048      0.202    0.471
pcX-1 (loc)       0.238    0.321      0.276    0.670
pcX-2 (loc)       0.019    0.041      0.202    0.470
(Z+1)cc-pVDZ      0.348    2.604      0.487    2.612
(Z+1)cc-pVTZ      0.026    0.042      0.202    0.430
(Z+1)pcseg-1      0.219    0.315      0.273    0.645
(Z+1)pcseg-2      0.040    0.065      0.205    0.486
CBS limit         0.000    0.000      0.194    0.449
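The element-specific relativistic correction quoted in the Table 12 caption can be applied as in the following sketch. The correction constants are those from the caption and the experimental value is the O11 entry of Table 13; the function name and the computed CEBE are hypothetical.

```python
# Element-specific relativistic corrections (eV) from the Table 12 caption
REL_CORRECTION_EV = {"C": 0.10, "N": 0.21, "O": 0.39}


def corrected_cebe(element, nonrel_cebe_ev):
    """Add the element-specific constant to a non-relativistic CEBE (eV)."""
    if element not in REL_CORRECTION_EV:
        raise KeyError(f"no correction tabulated for element {element!r}")
    return nonrel_cebe_ev + REL_CORRECTION_EV[element]


# Hypothetical non-relativistic O 1s result compared with the
# experimental O11 value for guanine from Table 13 (536.7 eV)
calc = corrected_cebe("O", 536.40)
print(f"corrected: {calc:.2f} eV, deviation: {calc - 536.7:+.2f} eV")
```

Because the constant depends only on the element, it shifts all CEBEs of one element by the same amount and leaves chemical shifts within that element unchanged.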

Table 13. Experimental XPS values for guanine.81

Centre    CEBE (eV)
O11       536.7
N1        406.3
N7        406.3
N10       406.3
N9        404.5
N4        404.5
C8        293.8
C6        293.8
C2        292.1
C5        292.1
C3        290.9

Figure captions:
Figure 1. The basis set error as a function of the interpolation parameter for the C and O XPS in CH3OH with the upc-1 and upc-2 basis sets.
Figure 2. The basis set contraction error for the C and O XPS in CH3OH with the upc-1 and upc-2 basis sets as a function of number of contracted s-functions.
Figure 3. The basis set contraction error for the Si and S XPS in SiS with the upc-1 and upc-2 basis sets as a function of number of contracted s-functions.
Figure 4. Average absolute deviations (Table 3) for K-edge XPS basis set errors for the systems in Table 2.
Figure 5. Average absolute deviations (Table 5) for K-edge XPS basis set errors for the systems in Table 4.
Figure 6. Average absolute deviations (Table 6) for L1-edge XPS basis set errors for the systems in Table 4.
Figure 7. Numbering scheme for the guanine results in Table 12.


Figure 1. The basis set error as a function of the interpolation parameter α for the C and O XPS in CH3OH with the upc-1 and upc-2 basis sets.

Figure 2. The basis set contraction error for the C and O XPS in CH3OH with the upc-1 and upc-2 basis sets as a function of the number of contracted s-functions.

Figure 3. The basis set contraction error for the Si and S XPS in SiS with the upc-1 and upc-2 basis sets as a function of the number of contracted s-functions.

Figure 4. Average absolute deviations (Table 3) for K-edge XPS basis set errors for the systems in Table 2.

Figure 5. Average absolute deviations (Table 5) for K-edge XPS basis set errors for the systems in Table 4.

Figure 6. Average absolute deviations (Table 6) for L1-edge XPS basis set errors for the systems in Table 4.

Figure 7. Numbering scheme for the guanine results in Table 12.
