Assessment of pKa Determination for Monocarboxylic Acids


A: New Tools and Methods in Experiment and Theory

Assessment of pKa Determination for Monocarboxylic Acids With An Accurate Theoretical Composite Method: G4CEP
Cleuton de Souza Silva and Rogério Custodio
J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.9b05380 • Publication Date (Web): 04 Sep 2019
Downloaded from pubs.acs.org on September 5, 2019



The Journal of Physical Chemistry

ASSESSMENT OF pKa DETERMINATION FOR MONOCARBOXYLIC ACIDS WITH AN ACCURATE THEORETICAL COMPOSITE METHOD: G4CEP

Cleuton de Souza Silva a,b and Rogério Custodio 1,a

a. Instituto de Química, Universidade Estadual de Campinas, Barão Geraldo, 13083-970 Campinas – São Paulo, Brazil, P. O. Box 6154
b. Instituto de Ciências Exatas e Tecnologia, Universidade Federal do Amazonas, Campus de Itacoatiara, 69100-021 Itacoatiara – Amazonas, Brazil

1 Corresponding author: e-mail: [email protected], Tel.: +55-19-35213104; fax: +55-19-35213023


KEYWORDS: G4CEP; SMD; pKa; Monocarboxylic acids; Density functional theory


Abstract

For 22 monoprotic acids, pKa values were calculated using the G4CEP composite theory. The solvation effect was included in three ways: the continuum SMD solvation model; SMD and one explicit water molecule; and SMD, one water molecule, and a linear correction with respect to the experimental pKa values. The three tests provided mean absolute errors of 0.83, 0.51, and 0.30 pKa units, respectively, indicating excellent performance of the G4CEP method. Comparison with density functional theory at the B3LYP and BMK levels showed that DFT results are obtained quickly but with significant error. The best performance of the functionals was obtained by combining SMD, one explicit water molecule, a linear regression correction, and a basis set including diffuse functions. However, the dispersion of the results with DFT can lead to deviations of up to two pKa units, whereas for G4CEP the largest deviations seldom exceed one pKa unit.


1. Introduction

Carboxylic acids are among the classes of organic compounds most frequently found in nature.1–6 The chemical reactivity of these substances is dominated by the positive character of the carboxylic carbon atom and by the resonance stabilization of the group when the proton is lost. These two factors contribute to both the acidity and the group's dominant chemical reactivity.1–6 The determination of pKa for these substances in aqueous solution is fundamental to the understanding of a wide range of systems of chemical and biochemical importance.7–10 Despite the applicability of this property, obtaining reliable theoretical data is still a challenge.

One of the simplest ways to simulate the solvent effect theoretically is through continuum solvation methods.11–14 Such methods have been used to calculate pKa in solution using electronic structure methods. Studies of this nature use thermodynamic cycles that combine the deprotonation equilibrium in the gas phase and in solution7–10 to estimate the Gibbs energies of the solvated components involved. Another technique for pKa calculations uses explicit solvent molecules.7,8 In this approach, solute-solvent interactions are calculated by considering hundreds of explicit solvent molecules that interact with each of the species involved in a given system,7,8 and it has been used with good results to predict pKa values in aqueous solutions, especially for systems dependent on local solute-solvent interactions. The drawback of this method is the high computational cost due to the large number of solvent molecules.

One of the first works that used composite methods to calculate pKa values was developed by Liptak and Shields9 for eleven carboxylic acids, combining continuum solvation methods with the Gaussian-n (Gn) and complete basis set (CBS) composite methods.9,10 In the following years, other authors used composite methods to calculate pKa values, and some performed tests using density functional theory (DFT), such as Schmidt and Knapp,10 who performed calculations for four carboxylic acids and other organic molecules, obtaining an accuracy of 0.69 pKa units. Riojas and Wilson15 developed a method called Solv-ccCA, combining the ccCA composite method with a continuum representation of the solvent, the solvation model based on density (SMD), and applied it to nitrogen-containing compounds. This combination of methods showed a mean absolute error of 1.0 pKa unit for 19 compounds.

The Gn theory16–21 comprises one of the sets of composite methods most widely used in the literature. Its most common applications are in calculations of enthalpies of formation, ionization potentials, electron affinities, and atomization energies. In recent years, the


compact effective pseudopotential (CEP) has been combined with some of the variants of the Gn theory, providing methods as accurate as the original all-electron versions but computationally more economical. Examples of such adaptations found in the literature are G3CEP,22,23 G3(MP2)//B3LYP-CEP,24 G3(MP2)-CEP,25 G4CEP,26 and G3X-CEP.27 In addition to the calculation of thermochemical properties, some of these methods have been applied to studies of internal rotation barriers,28 phenol nitration,29 and Diels-Alder reactions.30 Among these methods, the most accurate results are obtained with G4CEP, which achieves a mean absolute error on the order of 1.1 kcal mol–1 for a set of 440 experimental data associated with the thermochemical properties mentioned above. For enthalpies of formation, G4CEP yields a mean absolute error of 1.0 kcal mol–1. The disadvantage of the G4CEP method is its greater computational demand compared to other versions of the Gn theory, even when using pseudopotentials. This demand is directly associated with the use of extrapolation methods to reach the Hartree-Fock limit.26 Thus, despite the large computational cost, it is the best alternative for the calculation of thermochemical properties among the methods combining the Gn theory and pseudopotentials.

The Gn theory has been used in pKa calculations to describe the gas-phase compounds, whereas the corresponding species in solution are usually described by other methods.31–35 To the authors' knowledge, the use of composite calculations in solution has not been considered. Therefore, the objective of this work was to evaluate the recently developed G4CEP theory for pKa calculations, considering its use for both gas-phase and solvated molecules.

2. Computational Methods

The calculation of pKa was based on the following thermodynamic definition:

$$\mathrm{p}K_a = -\log(K_a) = \frac{\Delta G_R}{2.303\,RT} \qquad (1)$$

where $K_a$ is the equilibrium constant of the deprotonation reaction $\mathrm{AH_{(aq)}} \rightleftharpoons \mathrm{A^-_{(aq)}} + \mathrm{H^+_{(aq)}}$, $\Delta G_R$ is the Gibbs energy change of this reaction, $R$ is the gas constant, and $T$ is the temperature. In theoretical approaches,7–10 pKa values can be determined through the thermodynamic cycle shown in Figure 1. The transfer of the species $\mathrm{AH}$, $\mathrm{A^-}$, and $\mathrm{H^+}$ from the gas phase to solution is characterized by the Gibbs energies of solvation,


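which are $\Delta G_{solv}(\mathrm{AH})$, $\Delta G_{solv}(\mathrm{A^-})$, and $\Delta G_{solv}(\mathrm{H^+})$, respectively. According to the thermodynamic cycle, $\Delta G_R$ can be expressed as:

$$\Delta G_R = \Delta G_g + \Delta G_{solv}(\mathrm{A^-}) + \Delta G_{solv}(\mathrm{H^+}) - \Delta G_{solv}(\mathrm{AH}) \qquad (2)$$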
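Equations 1 and 2 can be combined numerically. The following is a minimal Python sketch, not the authors' code: the function name `pka_from_cycle` and the example input energies are hypothetical, all energies are assumed to be already converted to kcal mol–1, and no standard-state conversion is applied because the text does not specify one. The proton defaults are the values adopted in this work (G_g(H+) = –6.28 kcal mol–1 and ΔG_solv(H+) = –265.9 kcal mol–1).

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
T_REF = 298.15         # temperature used in this work, K


def pka_from_cycle(g_gas_ah, g_gas_a, dg_solv_ah, dg_solv_a,
                   g_gas_h=-6.28, dg_solv_h=-265.9, temperature=T_REF):
    """Absolute pKa from the thermodynamic cycle (Eqs. 1 and 2).

    All energies are in kcal/mol. g_gas_* are gas-phase Gibbs energies and
    dg_solv_* are Gibbs energies of solvation, G_solv(X) - G_g(X).
    """
    # Gas-phase deprotonation: AH(g) -> A-(g) + H+(g)
    dg_gas = g_gas_a + g_gas_h - g_gas_ah
    # Eq. 2: close the cycle with the solvation terms
    dg_r = dg_gas + dg_solv_a + dg_solv_h - dg_solv_ah
    # Eq. 1: the factor 2.303 RT is ln(10) * R * T
    return dg_r / (math.log(10) * R_KCAL * temperature)


# Hypothetical inputs (kcal/mol, gas-phase energies relative to AH),
# chosen only to exercise the function; they are NOT results from this work.
example_pka = pka_from_cycle(g_gas_ah=0.0, g_gas_a=347.28,
                             dg_solv_ah=-6.7, dg_solv_a=-77.0)
print(f"pKa = {example_pka:.2f}")
```

Because 2.303 RT is about 1.36 kcal mol–1 at 298.15 K, an error of that size in any term of Eq. 2 changes the calculated pKa by one unit.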

All solvation Gibbs energies in Eq. 2 were calculated as the difference between the Gibbs energy of the solvated species X and its Gibbs energy in the gas phase, $\Delta G_{solv}(X) = G_{solv}(X) - G_g(X)$. In the present work all calculations were performed at 298.15 K and at three levels of calculation: G4CEP, B3LYP/cc-pVTZ,36,37 and BMK/cc-pVTZ.38 B3LYP was chosen because of its wide range of applications and because it is often used in tests of different properties. BMK was chosen because it was developed as a general-purpose functional capable of accurate determination of thermochemical properties for equilibrium structures and transition states. The $G_g(\mathrm{H^+})$ value usually used in pKa calculations is –6.28 kcal mol–1, obtained experimentally and well established in the literature.7–10 The other species in the gas phase were calculated with the three theoretical methods mentioned above. The Gibbs energies including the solvent effect were calculated with the SMD method. From the Gibbs energy differences indicated by Eq. 2, the absolute pKa values were calculated from Eq. 1.

Regardless of the procedure used for the theoretical determination of pKa, it is necessary to know the Gibbs energy of solvation of the proton, $\Delta G_{solv}(\mathrm{H^+})$. The value found in the literature is ambiguous, and the value adopted by different authors has changed over time. The procedures to circumvent this problem are either a selective choice among the available experimental values or the use of the proton solvation energy as a parameter adjusted within the limits of the experimental values. The suggested values range from –252.6 kcal mol–1 to –271.7 kcal mol–1. In the present work, the value of –265.9 kcal mol–1, suggested by Thapa and Schlegel for various organic compounds, was used.7,8 The effect of this choice will be discussed later.

3. Results and Discussion

Table 1 shows the experimental pKa values of 22 monocarboxylic acids and the deviations of the values calculated with the B3LYP/cc-pVTZ, BMK/cc-pVTZ, B3LYP/aug-cc-pVTZ, BMK/aug-cc-pVTZ, CAM-B3LYP/aug-cc-pVTZ, and G4CEP methods from the experimental data. At the bottom of the table, the mean


absolute errors, standard deviations, and the largest positive and negative deviations are presented. Part of the calculations included the solvent effect using only the SMD model, and another part used the SMD model and one explicit water molecule. These strategies are identified in the tables as SMD and SMD + H2O, respectively. The difference between the two data sets is drastic and shows the importance of local interactions through hydrogen bonds to represent the solvent effect properly, as already observed in the literature.7,8 It is worth mentioning that the explicit water molecule was placed in different positions around the carboxyl group and that the supermolecule geometry was fully optimized, also including the SMD model. Frequency calculations were carried out to confirm that each optimized structure was a true energy minimum and not a saddle point.

The G4CEP mean absolute error for the calculations including the solvent effect only through the SMD method was 0.83 pKa units, with a standard deviation of 0.62. The inclusion of a single explicit water molecule in the SMD calculations reduced the mean absolute error and the standard deviation to 0.51 and 0.29, respectively. Therefore, the dispersion of the results with 95% confidence is approximately ±1.2 and ±0.6 pKa units for the calculations using the SMD and SMD + H2O solvation models, respectively.

Table 1 shows that the mean absolute errors for the BMK and B3LYP functionals are significantly larger with the cc-pVTZ basis set. Considering only the SMD results, the mean absolute errors are 1.83 and 2.86 for BMK and B3LYP, respectively. The standard deviations indicate a dispersion of the errors between 2 and 2.5 times larger than that of the G4CEP results with the SMD model.
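The statistics quoted above can be obtained from a list of calculated and experimental pKa values. The sketch below is illustrative only: the function name `error_summary` and the input numbers are hypothetical (they are not Table 1 data), the standard deviation is assumed to be the sample standard deviation of the signed deviations, and the 95% dispersion is assumed to be 1.96 standard deviations, which is consistent with the quoted ±1.2 and ±0.6 for standard deviations of 0.62 and 0.29.

```python
import statistics


def error_summary(pka_calc, pka_expt):
    """Summarize signed deviations (calculated minus experimental)."""
    devs = [c - e for c, e in zip(pka_calc, pka_expt)]
    sd = statistics.stdev(devs)  # sample standard deviation (assumption)
    return {
        "mae": statistics.fmean([abs(d) for d in devs]),
        "sd": sd,
        "ci95": 1.96 * sd,       # half-width of the 95% interval (assumption)
        "max_pos": max(devs),    # largest positive deviation
        "max_neg": min(devs),    # largest negative deviation
    }


# Hypothetical values, used only to show the call.
summary = error_summary([4.0, 5.0, 3.0], [3.5, 5.5, 3.0])
print(summary)

# Dispersions implied by the standard deviations reported for G4CEP.
g4cep_ci95 = [round(1.96 * sd, 1) for sd in (0.62, 0.29)]
print(g4cep_ci95)
```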
The inclusion of a single explicit water molecule in the density functional calculations with SMD does not significantly reduce the mean absolute errors, producing values close to 1.56 and 2.73 pKa units for BMK and B3LYP, respectively. On the other hand, the respective standard deviations are reduced to approximately 0.84 and 1.09 pKa units; that is, the dispersion of the results with 95% confidence is reduced significantly, from ±2.4 and ±3.0 pKa units to ±1.7 and ±2.2 for BMK and B3LYP, respectively.

The last three columns of Table 1 show the effect of diffuse functions and of the long-range correction. Calculations at the BMK/aug-cc-pVTZ and B3LYP/aug-cc-pVTZ levels reduced the mean absolute errors from 1.56 and 2.73 pKa units to 0.79 and 0.76, respectively. The standard deviations are reduced by approximately 50% and 72% for BMK and B3LYP, respectively, which is significant, reducing the uncertainty of the DFT calculations to approximately ±1.6 pKa units. This standard deviation is almost 2.5 times larger than the G4CEP deviation under the same conditions. Table 1 also shows, in the last column, the effect of long-range corrections on the B3LYP calculations. Curiously, the mean absolute error for CAM-B3LYP is larger than that for B3LYP. A comparison between B3LYP and


CAM-B3LYP errors shows that the effect of the long-range correction is largest for acids containing halogens. The BMK and B3LYP calculations indicate that their gain in computational efficiency over G4CEP is not yet accompanied by comparable accuracy in the estimation of pKa values. If only the mean absolute error is considered, the comparison between G4CEP and B3LYP or G4CEP and BMK suggests that DFT calculations should be selected because of the computational economy favoring these methods. However, the dispersion of the present DFT results leads to deviations larger than those of the G4CEP calculations. The values of the largest positive and negative deviations shown in Table 1 reinforce the earlier analysis of the G4CEP reliability and the need to include local solvent effects to achieve an acceptable level of accuracy, with deviations below 1 pKa unit. On the other hand, the density functional calculations, although advantageous in terms of CPU time, give unacceptable deviations for the pKa predictions, even when explicit solvent molecules are included.

An alternative to improve the quality of the results was presented by Miguel et al.,39 who suggested that the calculated pKa values can be improved by applying a linear regression of the values calculated at a certain level of theory against the experimental pKa values, expressed mathematically as $\mathrm{p}K_a(lsqr) = a + b\,\mathrm{p}K_a(calc.)$. In an ideal situation with excellent agreement between the calculated and experimental data, the linear regression has an angular coefficient (slope) equal to one and a linear coefficient (intercept) equal to zero. Two regression forms were used in this work. The first was obtained from the conventional linear correlation of the theoretical data with respect to the experimental data. The second required the regression line to pass through the origin of the coordinate system.
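The two regression forms can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fit is assumed to regress experimental pKa on calculated pKa so that a and b can be applied directly as pKa(lsqr) = a + b·pKa(calc.), and the numbers in the usage lines are made up rather than taken from Tables 2 or 3.

```python
def fit_linear(pka_calc, pka_expt):
    """Ordinary least squares for pKa(expt) ~ a + b * pKa(calc)."""
    n = len(pka_calc)
    x_mean = sum(pka_calc) / n
    y_mean = sum(pka_expt) / n
    sxx = sum((x - x_mean) ** 2 for x in pka_calc)
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(pka_calc, pka_expt))
    b = sxy / sxx            # angular coefficient (slope)
    a = y_mean - b * x_mean  # linear coefficient (intercept)
    return a, b


def fit_through_origin(pka_calc, pka_expt):
    """Least squares with the intercept constrained to zero."""
    b = (sum(x * y for x, y in zip(pka_calc, pka_expt))
         / sum(x * x for x in pka_calc))
    return 0.0, b


def corrected_pka(pka_calc, a, b):
    """Apply the linear correction pKa(lsqr) = a + b * pKa(calc)."""
    return a + b * pka_calc


# Hypothetical data, used only to show the calls.
a1, b1 = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
a0, b0 = fit_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(a1, b1, a0, b0, corrected_pka(4.0, a1, b1))
```

A slope near one and an intercept near zero indicate that the uncorrected values already track experiment, which is the ideal case described in the text.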
The coefficients of the regressions used in this work, without and with an explicit water molecule, are shown in Table 2. The angular and linear coefficients indicate behavior closer to the ideal for the calculations performed with the G4CEP method, for both the linear regression and the linear regression through the origin. The BMK/cc-pVTZ and B3LYP/cc-pVTZ results are improved but are still far from the ideal behavior with respect to the experimental data. The effect of diffuse functions on the B3LYP and BMK calculations is substantial for the regression through the origin, providing a correction similar to that necessary for G4CEP.

Table 3 shows the pKa deviations obtained with the linear corrections and the mean absolute errors achieved by the four methods of calculation. While the G4CEP mean absolute error with SMD changed from 0.83 to 0.30, those of the B3LYP and BMK calculations decreased from 2.86 and 1.83 to 0.67 and 0.58 with cc-pVTZ and to 0.51 and 0.44 with aug-cc-pVTZ, respectively. The mean absolute error for least-squares calculations with


G4CEP and SMD + H2O data was 0.38, while for B3LYP and BMK under the same conditions the values were 0.44 and 0.50 pKa units with cc-pVTZ and 0.45 and 0.37 with aug-cc-pVTZ, respectively. CAM-B3LYP shows statistical behavior almost identical to that of B3LYP/aug-cc-pVTZ, except that the distribution of errors changes from molecule to molecule. The G4CEP calculations in any of the conditions evaluated showed the lowest mean absolute errors and the smallest dispersion of the results. However, the calculations using the B3LYP or BMK functionals become competitive when the least-squares adjustment is applied, at a computational cost much lower than that of G4CEP.

The greatest advantage of the G4CEP calculations lies in the lower expected uncertainty. Using the results with SMD and SMD + H2O and cc-pVTZ, the uncertainties are around ±0.5 and ±0.6 pKa units, respectively. For the B3LYP calculations, the uncertainties with the two models are on the order of ±1.0 and ±0.9, respectively, whereas for BMK the same models yielded uncertainties on the order of ±0.9 and ±0.8 pKa units, respectively. The inclusion of diffuse functions further reduces the uncertainties to ±0.7 and ±0.9 pKa units for BMK and for B3LYP or CAM-B3LYP, respectively. These results indicate that the deviations for the B3LYP and BMK calculations may be as much as double the uncertainty expected from G4CEP. The maximum positive and negative deviations observed for the sequence of molecules reinforce the need for the longer computational time of G4CEP to obtain results within a desired level of uncertainty. The largest positive and negative deviations in Table 3 indicate that the pKa values estimated with G4CEP can occasionally deviate by up to 1.3 units. On the other hand, the B3LYP and BMK calculations, although the maximum and minimum errors are reduced, point to possible uncertainties of up to 2 pKa units with respect to the experimental data.
Some results from the literature were calculated at an equivalent level and can be partially compared with those obtained using the G4CEP calculations. Sastre et al.34 performed calculations of pKa using isodesmic reactions at the B3LYP, M05-2X, and CBS-4B3* levels of theory for several organic molecules, among which are eight carboxylic acids evaluated in this article, shown in Table 4. The solvent effect was considered using the SMD and conductor-like polarizable continuum (CPCM) models, and the pKa of acetic acid was used as a reference. The best results were obtained with the CBS-4B3* method and the solvent effect from the SMD model. The mean absolute error found for the pKa values was 0.51 pKa units. The same series of eight molecules using the G4CEP method and the SMD solvation model provided a mean absolute error of 1.07 pKa units, or 0.41 pKa units with an explicit water molecule, or 0.32 pKa units considering the explicit water molecule and the least-squares fitting. The mean absolute error of 0.57 found by Sastre et al.34 is similar to that found in


the present work using the functionals B3LYP (0.51) and BMK (0.45) with the SMD + H2O solvation model and least-squares correction. Possibly, the inclusion of an explicit water molecule in calculations at the CBS-4B3* level would provide results as accurate as those from this work. It is worth mentioning that the use of G4CEP with isodesmic reactions may improve the calculated pKa. Work in this direction is in progress.

4. Proton Hydration Free Energy

As mentioned previously, the value of the Gibbs energy of proton solvation is not well defined in the literature, and different values are used in several articles. Liptak and Shields9 used ΔGsolv(H+) = –264.61 kcal mol–1 when pKa values for 11 carboxylic acids were calculated with the Gn and CBS methods in addition to continuum solvation. The value ΔGsolv(H+) = –259.5 ± 2.5 kcal mol–1 was used by Florian and Warshel40,41 to determine the pKa of 17 organic acids from a new parametrization of the Langevin dipole model developed for ab initio calculations in aqueous solution. Schmidt and Knapp10 estimated the pKa of four carboxylic acids and other organic molecules using ΔGsolv(H+) = –261.94 kcal mol–1 and ΔGsolv(H+) = –265.74 kcal mol–1 with the Becke(½) and B3LYP functionals.

We performed tests with the extreme values of the proton solvation free energy from the literature using the G4CEP, BMK, and B3LYP methods to evaluate the sensitivity of the results to this energy. The use of ΔGsolv(H+) = –264.61 kcal mol–1 gave a mean absolute error of 0.82 pKa units for the G4CEP calculations considering only the solvent effect from SMD, a result similar to the mean absolute error presented in Table 1. Estimates of pKa values from the BMK and B3LYP functionals yielded mean absolute errors of 2.37 and 3.62 pKa units, respectively, which are greater by more than 0.5 units than those in Table 1.
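Because ΔGsolv(H+) enters Eq. 2 additively, replacing the adopted value by another one shifts every calculated pKa by the same constant. The sketch below evaluates that shift for two of the literature values quoted above relative to the –265.9 kcal mol–1 used in this work; the function name `pka_shift` is hypothetical, and the gas constant is the standard value in kcal mol–1 K–1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def pka_shift(dg_solv_h_new, dg_solv_h_ref=-265.9, temperature=298.15):
    """Uniform change in calculated pKa when dG_solv(H+) is replaced."""
    return (dg_solv_h_new - dg_solv_h_ref) / (
        math.log(10) * R_KCAL * temperature)


# Literature values cited in the text (kcal/mol).
shifts = {value: round(pka_shift(value), 2) for value in (-264.61, -259.5)}
print(shifts)
```

The shifts are about 0.95 and 4.69 pKa units, respectively. How the mean absolute error changes depends on the sign of each individual deviation, so it need not change by the same amount as the shift.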
The inclusion of one explicit water molecule, that is, SMD + H2O, gave a mean absolute error of 0.82 pKa units for G4CEP, which is 0.31 units higher than the respective error from Table 1. The pKa values estimated from the BMK and B3LYP functionals under the same conditions produced mean absolute errors of 2.41 and 3.78 pKa units, respectively, approximately one unit higher than the values from Table 1. For calculations involving SMD + H2O, the mean absolute errors for G4CEP, BMK, and B3LYP with ΔGsolv(H+) = –259.5 kcal mol–1 were 4.56, 6.07, and 7.52 pKa units, respectively. The comparison of these results with those presented in Table 1 makes clear the large influence of the choice of proton solvation free energy on the determination of pKa by the methods used in this work.

5. Conclusion


Values of pKa for monoprotic acids were calculated using the recently developed G4CEP theory combined with the continuum SMD solvation method. Tests were also carried out using DFT at the B3LYP/cc-pVTZ and BMK/cc-pVTZ levels. Twenty-two carboxylic acids were calculated at three levels of theory using three different strategies. The first used only the calculation at the G4CEP level, including the SMD reaction field. The second included one explicit water molecule and is referred to as SMD + H2O. In the third, besides the solvent effect from SMD + H2O, the results were corrected by a linear regression of the calculated results against the experimental pKa values.

Calculations at the G4CEP level with SMD showed a mean absolute error of 0.83 pKa units. Inclusion of a single explicit water molecule reduced the error to 0.51 pKa units. Two other types of correction were tested. A linear regression with respect to experimental data provided mean absolute errors of 0.30 and 0.38 pKa units, respectively. B3LYP/cc-pVTZ and BMK/cc-pVTZ with SMD yielded mean absolute errors of 2.86 and 1.83 pKa units, respectively. The inclusion of one water molecule reduced the errors to 2.73 and 1.56 pKa units, respectively, while diffuse functions further improved the results to approximately 0.8 pKa units for both functionals and to 1.1 for CAM-B3LYP/aug-cc-pVTZ. The effect of the least-squares correction on the calculations using B3LYP/cc-pVTZ and BMK/cc-pVTZ was very significant, reducing the mean absolute errors with SMD to 0.67 and 0.58 pKa units. For these two methods, the inclusion of one explicit water molecule produced a small reduction in the mean absolute errors, to 0.44 and 0.50 pKa units with cc-pVTZ and to 0.51 and 0.44 pKa units with the aug-cc-pVTZ basis set. These deviations are slightly larger than those obtained with G4CEP, SMD + H2O, and the least-squares correction. However, the dispersion of the results with DFT can lead to deviations of up to two pKa units, whereas for the G4CEP calculations the largest deviations seldom exceed one pKa unit.

Comparison of the G4CEP results with those from a composite method of equivalent level showed similar performance. Only seven of the 22 molecules could be compared between the G4CEP calculations and the CBS-4B3* calculations of Sastre et al.34 The comparison between the two sets of results cannot be direct, because they used different pKa determination procedures. However, the errors show some tendencies that may be mentioned. The G4CEP results were obtained with the conventional thermodynamic cycle employing gas-phase and solution data. The CBS-4B3* results were obtained using isodesmic reactions, and the comparison discussed in the present work was made considering acetic acid data as the reference.34 Use of the SMD solvation model with G4CEP and CBS-4B3* gave a mean absolute error twice as large for the first method when


compared to the second one (1.07 vs 0.51 pKa units, respectively), suggesting that the isodesmic process possibly allows for more efficient error cancellation. The inclusion of a single explicit water molecule in the G4CEP and SMD calculations produced a mean absolute error slightly smaller than that of the CBS-4B3* results (0.41 vs 0.51 pKa units, respectively). Possibly, the inclusion of explicit solvent molecules in the CBS-4B3* calculations would provide improved results. In general, the G4CEP and CBS-4B3* calculations can be considered to have similar levels of accuracy for pKa calculations of monoprotic acids.
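The error cancellation attributed to the isodesmic approach can be made explicit with the proton-exchange reaction between the acid AH and a reference acid RefH (acetic acid in the work of Sastre et al.34). The relations below follow from standard thermodynamics and from Eq. 1; the symbol ΔG_exch and the label RefH are introduced here only for illustration and are not notation from the cited work.

```latex
\begin{align}
\mathrm{AH_{(aq)}} + \mathrm{Ref^-_{(aq)}} &\rightleftharpoons \mathrm{A^-_{(aq)}} + \mathrm{RefH_{(aq)}} \\
\Delta G_{exch} &= \Delta G_R(\mathrm{AH}) - \Delta G_R(\mathrm{RefH}) \\
\mathrm{p}K_a(\mathrm{AH}) &= \mathrm{p}K_a(\mathrm{RefH}) + \frac{\Delta G_{exch}}{2.303\,RT}
\end{align}
```

The proton terms $G_g(\mathrm{H^+})$ and $\Delta G_{solv}(\mathrm{H^+})$ appear in both $\Delta G_R$ values and cancel in $\Delta G_{exch}$, so a pKa obtained in this way does not depend on the value adopted for the proton solvation energy.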

6. Acknowledgments

The authors would like to acknowledge financial support from FAPEAM (Fundação de Amparo à Pesquisa do Estado do Amazonas), FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo - Center for Computational Engineering and Sciences, Grant 2013/08293-7 and Grant 2017/11485-6), and FAEPEX-UNICAMP (Fundo de Apoio ao Ensino, à Pesquisa e à Extensão da UNICAMP). The National Center of High-Performance Computing in São Paulo (CENAPAD-SP) and the National Center of High-Performance Computing in Ceará (CENAPAD-UFC) are acknowledged for access to their computational facilities.

AUTHOR INFORMATION

*E-mail: [email protected]; phone: +55-19-35213104; fax: +55-19-35213023.


References

(1) Monaco, M. R.; Fazzi, D.; Tsuji, N.; Leutzsch, M.; Liao, S.; Thiel, W.; List, B. The Activation of Carboxylic Acids via Self-Assembly Asymmetric Organocatalysis: A Combined Experimental and Computational Investigation. J. Am. Chem. Soc. 2016, 138, 14740–14749.

(2) Lang, S. B.; Cartwright, K. C.; Welter, R. S.; Locascio, T. M.; Tunge, J. A. Photocatalytic Aminodecarboxylation of Carboxylic Acids. European J. Org. Chem. 2016, 2016, 3331–3334.

(3) Oliver-Tomas, B.; Renz, M.; Corma, A. Ketone Formation from Carboxylic Acids by Ketonic Decarboxylation: The Exceptional Case of the Tertiary Carboxylic Acids. Chem. - A Eur. J. 2017, 23, 12900–12908.

(4) Olmedo, A.; Aranda, C.; del Río, J. C.; Kiebist, J.; Scheibner, K.; Martínez, A. T.; Gutiérrez, A. From Alkanes to Carboxylic Acids: Terminal Oxygenation by a Fungal Peroxygenase. Angew. Chemie Int. Ed. 2016, 55, 12248–12251.

(5) Liu, J.; Qu, R.; Wang, Z.; Mendoza-Sanchez, I.; Sharma, V. K. Thermal- and Photo-Induced Degradation of Perfluorinated Carboxylic Acids: Kinetics and Mechanism. Water Res. 2017, 126, 12–18.

(6) Tsivintzelis, I.; Kontogeorgis, G. M.; Panayiotou, C. Dimerization of Carboxylic Acids: An Equation of State Approach. J. Phys. Chem. B 2017, 121, 2153–2163.

(7) Thapa, B.; Schlegel, H. B. Theoretical Calculation of pKa's of Selenols in Aqueous Solution Using an Implicit Solvation Model and Explicit Water Molecules. J. Phys. Chem. A 2016, 120, 8916–8922.

(8) Thapa, B.; Schlegel, H. B. Improved pKa Prediction of Substituted Alcohols, Phenols, and Hydroperoxides in Aqueous Medium Using Density Functional Theory and a Cluster-Continuum Solvation Model. J. Phys. Chem. A 2017, 121, 4698–4706.

(9) Liptak, M. D.; Shields, G. C. Accurate pKa Calculations for Carboxylic Acids Using Complete Basis Set and Gaussian-n Models Combined with CPCM Continuum Solvation Methods. J. Am. Chem. Soc. 2001, 123, 7314–7319.

(10) Schmidt am Busch, M.; Knapp, E.-W. Accurate pKa Determination for a Heterogeneous Group of Organic Molecules. ChemPhysChem 2004, 5, 1513–1522.

(11) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378–6396.

(12) Marenich, A. V.; Olson, R. M.; Kelly, C. P.; Cramer, C. J.; Truhlar, D. G. Self-Consistent Reaction Field Model for Aqueous and Nonaqueous Solutions Based on Accurate Polarized Partial Charges. J. Chem. Theory Comput. 2007, 3, 2011–2033.

(13) Klamt, A. The COSMO and COSMO-RS Solvation Models. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 1, 699–709.

(14) Mennucci, B. Polarizable Continuum Model. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 386–404.

(15) Riojas, A. G.; Wilson, A. K. Solv-ccCA: Implicit Solvation and the Correlation Consistent Composite Approach for the Determination of pKa. J. Chem. Theory Comput. 2014, 10, 1500–1510.

(16) Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. Gaussian-1 Theory: A General Procedure for Prediction of Molecular Energies. J. Chem. Phys. 1989, 90, 5622–5629.

(17) Curtiss, L. A.; Raghavachari, K.; Trucks, G. W.; Pople, J. A. Gaussian-2 Theory for Molecular Energies of First- and Second-row Compounds. J. Chem. Phys. 1991, 94, 7221–7230.

(18) Baboul, A. G.; Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-3 Theory Using Density Functional Geometries and Zero-Point Energies. J. Chem. Phys. 1999, 110, 7650–7657.

(19) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-4 Theory. J. Chem. Phys. 2007, 126, 084108–084119.

(20) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gn Theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 1, 810–825.

(21) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K.; Pople, J. A. Gaussian-3X (G3X) Theory Using Coupled Cluster and Brueckner Energies. Chem. Phys. Lett. 2002, 359, 390–396.

(22) Pereira, D. H.; Ramos, A. F.; Morgon, N. H.; Custodio, R. Implementation of Pseudopotential in the G3 Theory for Molecules Containing First-, Second-, and Non-Transition Third-Row Atoms. J. Chem. Phys. 2011, 135, 034106.

(23) Pereira, D. H.; Ramos, A. F.; Morgon, N. H.; Custodio, R. Erratum: “Implementation of Pseudopotential in the G3 Theory for Molecules Containing First-, Second-, and Non-Transition Third-Row Atoms” [J. Chem. Phys. 135, 034106 (2011)]. J. Chem. Phys. 2011, 135, 219901.

(24) Rocha, C. M. R.; Pereira, D. H.; Morgon, N. H.; Custodio, R. Assessment of G3(MP2)//B3 Theory Including a Pseudopotential for Molecules Containing First-, Second-, and Third-Row Representative Elements. J. Chem. Phys. 2013, 139, 184108.

(25) Pereira, D. H.; Rocha, C. M. R.; Morgon, N. H.; Custodio, R. G3(MP2)-CEP Theory and Applications for Compounds Containing Atoms from Representative First, Second and Third Row Elements of the Periodic Table. J. Mol. Model. 2015, 21, 204–210.

(26) Silva, C. de S.; Pereira, D. H.; Custodio, R. G4CEP: A G4 Theory Modification by Including Pseudopotential for Molecules Containing First-, Second- and Third-Row Representative Elements. J. Chem. Phys. 2016, 144, 204118–204126.

(27) Silva, C. D. S.; Custodio, R. Empirical Corrections in the G3X and G3X(CCSD) Theories Combined with a Compact Effective Pseudopotential. Theor. Chem. Acc. 2018, 137, 1–9.

(28) Pereira, D. H.; Ducati, L. C.; Rittner, R.; Custodio, R. A Study of the Rotational Barriers for Some Organic Compounds Using the G3 and G3CEP Theories. J. Mol. Model. 2014, 20, 2199–2212.

(29) Rocha, C. M. R.; Rodrigues, J. A. R.; Moran, P. J. S.; Custodio, R. An Interpretation of the Phenol Nitration Mechanism in the Gas Phase Using G3(MP2)//B3-CEP Theory. J. Mol. Model. 2014, 20, 2524–2531.

(30) Leal, R. C.; Pereira, D. H.; Custodio, R. An Energetic Analysis of the Diels-Alder Endo:Exo Selectivity Reaction by Using Composite Methods. Comput. Theor. Chem. 2018, 1123, 161–168.

(31) Vipperla, B.; Griffiths, T. M.; Wang, X.; Yu, H. Theoretical pKa Prediction of the α-Phosphate Moiety of Uridine 5′-Diphosphate-GlcNAc. Chem. Phys. Lett. 2017, 667, 220–225.

(32) Khursan, S. L.; Ovchinnikov, M. Y. The pKa Theoretical Estimation of C-H, N-H, O-H and S-H Acids in Dimethylsulfoxide Solution. J. Phys. Org. Chem. 2014, 27, 926–934.

(33) Çiftcioğlu, G. A.; Trindle, C. Computational Estimates of Thermochemistry and pKa Values of Cyclopropenyl Imine Superbases. Int. J. Quantum Chem. 2014, 114, 392–399.

(34) Sastre, S.; Casasnovas, R.; Muñoz, F.; Frau, J. Isodesmic Reaction for pKa Calculations of Common Organic Molecules. Theor. Chem. Acc. 2013, 132, 1310.

(35) Gupta, M.; da Silva, E. F.; Svendsen, H. F. Postcombustion CO2 Capture Solvent Characterization Employing the Explicit Solvation Shell Model and Continuum Solvation Models. J. Phys. Chem. B 2016, 120, 9034–9050.

(36) Becke, A. D. Density-functional Thermochemistry. III. The Role of Exact Exchange. J. Chem. Phys. 1993, 98, 5648–5652.

(37) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785–789.

(38) Boese, A. D.; Martin, J. M. L. Development of Density Functionals for Thermochemical Kinetics. J. Chem. Phys. 2004, 121, 3405–3416.

(39) Miguel, E. L. M.; Silva, P. L.; Pliego, J. R. Theoretical Prediction of pKa in Methanol: Testing SM8 and SMD Models for Carboxylic Acids, Phenols, and Amines. J. Phys. Chem. B 2014, 118, 5730–5739.

(40) Florián, J.; Warshel, A. Langevin Dipoles Model for Ab Initio Calculations of Chemical Processes in Solution: Parametrization and Application to Hydration Free Energies of Neutral and Ionic Solutes and Conformational Analysis in Aqueous Solution. J. Phys. Chem. B 1997, 101, 5583–5595.

(41) Florián, J.; Warshel, A. Calculations of Hydration Entropies of Hydrophobic, Polar, and Ionic Solutes in the Framework of the Langevin Dipoles Solvation Model. J. Phys. Chem. B 1999, 103, 10282–10288.

(42) CRC Handbook of Chemistry and Physics, 96th ed.; Haynes, W. M., Ed.; CRC Press, 2016.

(43) Dobos, D. Electrochemical Data: A Handbook for Electrochemists in Industry and Universities; Elsevier Scientific Pub. Co., 1975.

Page 17 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46

The Journal of Physical Chemistry

Table 1. Experimental pKa values and differences between experimental and calculated values obtained at the G4CEP, BMK, and B3LYP levels. SMD corresponds to the use of the respective continuous solvation model, and SMD+H2O is the SMD calculation including one explicit H2O molecule. DFT calculations were carried out using the cc-pVTZ basis set, except for the last three columns, which used aug-cc-pVTZ.

Substance                   Exp.(a)  G4CEP(b)      G4CEP(b)      BMK(b)        BMK(b)        B3LYP(b)      B3LYP(b)      BMK(c)        B3LYP(c)      CAM-B3LYP(c)
                                     SMD           SMD+H2O       SMD           SMD+H2O       SMD           SMD+H2O       SMD+H2O       SMD+H2O       SMD+H2O
Acetic acid                 4.76     -0.61         -0.50         -2.16         -1.43         -3.56         -3.57          0.27         -0.79          0.52
Chloroacetic acid           2.86      0.06         -0.48         -0.17         -2.19         -1.62         -3.22         -0.72         -0.44         -0.22
Trichloroacetic acid        0.70      0.31         -0.75          4.12          1.88          2.47         -0.09          3.35          2.69          3.54
2-Chlorobutanoic acid       2.83      0.05         -0.34         -0.22         -0.69         -1.97         -1.90          0.77          0.87          1.20
3-Chlorobutanoic acid       3.98     -1.16          0.65         -1.52         -2.33         -3.01         -2.73         -0.87          0.04          2.60
4-Chlorobutanoic acid       4.52     -1.25          0.26         -1.31          0.13         -3.28         -1.64          1.59          1.14          1.82
Bromoacetic acid            2.90     -0.52          0.55         -0.60         -1.76         -1.90         -2.45         -0.29          0.32          1.57
3-Butenoic acid             4.35      0.80         -0.37          1.91         -1.98          0.50         -1.85          0.52          0.93         -0.60
2-Methylpropanoic acid      4.84     -0.91         -0.67          1.45         -0.35          0.09         -2.64          0.14          0.13         -0.09
2,2-Dimethylpropanoic acid  5.03     -1.01          0.17          0.67         -2.71         -0.84         -3.71         -1.25         -0.93          0.84
3-Methylbutanoic acid       4.77     -2.02         -0.50         -0.64         -1.32         -1.58         -3.14          0.14         -0.35         -0.40
2-Methylbutanoic acid       4.80     -1.60          0.52         -2.17         -0.36         -3.63         -2.75          1.10          0.02         -0.97
2-Butynoic acid             2.62      0.44          1.19         -4.34         -3.41         -6.39         -5.92         -1.94         -3.13         -2.01
2-Chloropropanoic acid      2.83      0.09          0.16          1.24         -1.79         -1.38         -2.79         -0.32         -0.01          0.15
3-Bromopropanoic acid       4.00      0.28         -0.63         -2.72         -0.20          4.56         -1.71          1.25          1.06          0.72
3-Chloropropanoic acid      3.98     -0.69         -0.26         -2.03         -1.68         -3.66         -2.41         -0.22          0.36          1.28
trans-Crotonic acid         4.69     -0.95          0.67         -1.32         -1.08         -3.54         -3.12          0.37         -0.34         -0.17
Pentanoic acid              4.83     -1.04         -0.07         -3.73         -1.52         -4.30         -2.99         -0.06         -0.20          0.89
Hexanoic acid               4.85     -2.35         -0.38         -3.86         -2.35         -5.37         -3.00         -0.88         -0.21          0.97
Formic acid                 3.75     -0.10         -0.10         -1.29         -1.20         -2.48         -2.54          0.25          0.23          0.78
Butanoic acid               4.83     -0.45         -0.96         -0.96         -1.45         -3.23         -1.85          0.10         -1.27          1.51
Propanoic acid              4.87     -1.51         -0.96         -1.76         -2.50         -3.53         -4.06         -1.04         -1.27         -1.50
MAE(d)                                0.83          0.51          1.83          1.56          2.86          2.73          0.79          0.76          1.11
Std. Dev.(e)                          0.62          0.29          1.21          0.84          1.53          1.09          0.76          0.79          0.83
Largest negative deviation           -2.35         -0.96         -4.34         -3.41         -6.39         -5.92         -1.94         -3.13         -2.01
Largest positive deviation            0.80          1.19          4.12          1.88          4.56         -0.09          3.35          2.69          3.54

a. Data from references 42,43.
b. Calculations using the cc-pVTZ basis set.
c. Calculations using the aug-cc-pVTZ basis set.
d. MAE is the mean absolute error.
e. Std. Dev. is the standard deviation.
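The summary rows of Table 1 can be checked directly from the tabulated deviations. The Python sketch below does this for the G4CEP SMD column. It assumes that Std. Dev. is the population standard deviation of the absolute deviations, a convention inferred from the numbers rather than stated in the footnotes; the function name `summarize` is illustrative only.

```python
from statistics import mean, pstdev

# Signed deviations (pKa units) of the G4CEP SMD column of Table 1, 22 acids.
g4cep_smd = [
    -0.61, 0.06, 0.31, 0.05, -1.16, -1.25, -0.52, 0.80, -0.91, -1.01, -2.02,
    -1.60, 0.44, 0.09, 0.28, -0.69, -0.95, -1.04, -2.35, -0.10, -0.45, -1.51,
]

def summarize(deviations):
    """Return (MAE, std. dev., largest negative, largest positive)."""
    abs_err = [abs(d) for d in deviations]
    # Assumption: population standard deviation of the absolute errors.
    return mean(abs_err), pstdev(abs_err), min(deviations), max(deviations)

mae, sd, lowest, highest = summarize(g4cep_smd)
print(f"MAE={mae:.2f} SD={sd:.2f} min={lowest:.2f} max={highest:.2f}")
# prints: MAE=0.83 SD=0.62 min=-2.35 max=0.80
```

Under this assumption the four numbers agree with the summary rows of the G4CEP SMD column.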


Table 2. Linear regression coefficients of theoretical values obtained at different levels of calculation with respect to the experimental data [pKa(lsqr) = a + b × pKa(calc.)]. SMD corresponds to the use of the respective continuous solvation model and SMD+H2O is the SMD calculation including one explicit H2O molecule.

Method        Basis set     Solvent effect  a         b         a         b
G4CEP         -             SMD             1.27482   0.58531   0.00000   0.82790
G4CEP         -             SMD+H2O         0.86576   0.76050   0.00000   0.95099
BMK           cc-pVTZ       SMD             2.48894   0.30122   0.00000   0.69483
BMK           cc-pVTZ       SMD+H2O         1.38189   0.48514   0.00000   0.71781
B3LYP         cc-pVTZ       SMD             2.58873   0.22655   0.00000   0.57015
B3LYP         cc-pVTZ       SMD+H2O         0.65463   0.49562   0.00000   0.58708
BMK           aug-cc-pVTZ   SMD+H2O         2.12381   0.51986   0.00000   0.99972
B3LYP         aug-cc-pVTZ   SMD+H2O         2.10985   0.50089   0.00000   0.97210
CAM-B3LYP     aug-cc-pVTZ   SMD+H2O         2.45633   0.47752   0.00000   1.06719
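As an illustration of how the coefficients of Table 2 are applied, the sketch below corrects a calculated pKa with pKa(lsqr) = a + b × pKa(calc.), using the first (a, b) pair listed for G4CEP with SMD. It assumes that the tabulated deviations are experimental minus calculated values, a sign convention inferred from the numbers in Tables 1 and 3 rather than stated explicitly; the function names are hypothetical.

```python
# First (a, b) pair of Table 2 for G4CEP with SMD.
A, B = 1.27482, 0.58531

def lsqr_corrected(pka_calc, a=A, b=B):
    """Apply pKa(lsqr) = a + b * pKa(calc.)."""
    return a + b * pka_calc

def corrected_deviation(pka_exp, deviation, a=A, b=B):
    """Deviation after correction, assuming deviation = exp - calc."""
    pka_calc = pka_exp - deviation
    return pka_exp - lsqr_corrected(pka_calc, a, b)

# (experimental pKa, uncorrected G4CEP SMD deviation) taken from Table 1.
examples = {
    "Acetic acid": (4.76, -0.61),
    "Chloroacetic acid": (2.86, 0.06),
    "Trichloroacetic acid": (0.70, 0.31),
}
results = {
    name: corrected_deviation(exp, dev) for name, (exp, dev) in examples.items()
}
for name, value in results.items():
    print(f"{name}: {value:+.2f}")
```

Under these assumptions the three corrected deviations come out as +0.34, -0.05, and -0.80, matching the G4CEP SMD column of Table 3. Because the inputs are rounded to two decimals, other entries may differ from the tabulated values by about 0.01.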


Table 3. Experimental pKa values and the differences between experimental and calculated values obtained at the G4CEP, BMK, and B3LYP levels modified by linear regression. SMD corresponds to the use of the respective continuous solvation model and SMD+H2O is the SMD calculation including one explicit H2O molecule. DFT calculations were done using the cc-pVTZ basis set, except for the last three columns, which used aug-cc-pVTZ.

Substance                   Exp.(a)  G4CEP(b)      G4CEP(b)      BMK(b)        BMK(b)        B3LYP(b)      B3LYP(b)      BMK(c)        B3LYP(c)      CAM-B3LYP(c)
                                     SMD           SMD+H2O       SMD           SMD+H2O       SMD           SMD+H2O       SMD+H2O       SMD+H2O       SMD+H2O
Acetic acid                 4.76      0.34         -0.10          0.19          0.38          0.29         -0.02          0.30          0.66          0.28
Chloroacetic acid           2.86     -0.05         -0.54         -0.54         -0.97         -0.74         -0.81         -0.38         -0.46         -0.86
Trichloroacetic acid        0.70     -0.80         -1.26         -0.76         -0.11         -1.49         -0.35         -0.05         -0.41         -0.40
2-Chlorobutanoic acid       2.83     -0.07         -0.44         -0.58         -0.26         -0.85         -0.17         -0.36         -0.26         -0.40
3-Chlorobutanoic acid       3.98     -0.30          0.59         -0.17         -0.46         -0.19          0.00          0.24         -0.10          0.86
4-Chlorobutanoic acid       4.52     -0.13          0.42          0.27          1.01          0.16          0.81          0.87          0.72          0.77
Bromoacetic acid            2.90     -0.38          0.26         -0.64         -0.74         -0.78         -0.41         -0.58         -0.50         -0.19
3-Butenoic acid             4.35      1.00         -0.10          1.13         -0.10          0.89          0.62          0.24          0.53          0.10
2-Methylpropanoic acid      4.84      0.20         -0.21          1.33          0.94          1.18          0.48          0.27          0.37          0.12
2,2-Dimethylpropanoic acid  5.03      0.22          0.48          1.23         -0.11          1.11          0.04         -0.36          0.87          0.57
3-Methylbutanoic acid       4.77     -0.48         -0.09          0.65          0.43          0.74          0.20          0.24          0.45          0.23
2-Methylbutanoic acid       4.80     -0.22          0.69          0.21          0.91          0.30          0.40          0.75          0.30          0.51
2-Butynoic acid             2.62      0.07          0.68         -1.97         -1.69         -2.01         -2.27         -1.87         -2.37         -2.05
2-Chloropropanoic acid      2.83     -0.05         -0.06         -0.14         -0.79         -0.71         -0.61         -0.60         -0.69         -0.91
3-Bromopropanoic acid       4.00      0.55         -0.38         -0.51          0.58          1.54          0.52          0.45          0.42         -0.02
3-Chloropropanoic acid      3.98     -0.03         -0.10         -0.32         -0.15         -0.34          0.16         -0.10          0.06          0.23
trans-Crotonic acid         4.69      0.11          0.78          0.39          0.51          0.24          0.16          0.32          0.40          0.08
Pentanoic acid              4.83      0.12          0.25         -0.24          0.37          0.17          0.30          0.23          0.40          0.49
Hexanoic acid               4.85     -0.64          0.02         -0.26         -0.02         -0.05          0.30          0.66          0.42          0.54
Formic acid                 3.75      0.22         -0.03         -0.26         -0.03         -0.25         -0.02         -0.19         -0.12         -0.12
Butanoic acid               4.83      0.46         -0.43          0.60          0.40          0.42          0.86          0.25         -0.34          0.79
Propanoic acid              4.87     -0.14         -0.42          0.38         -0.09          0.38         -0.21         -0.33         -0.32         -0.63
MAE(d)                                0.30          0.38          0.58          0.50          0.67          0.44          0.44          0.51          0.51
Std. Dev.(e)                          0.26          0.30          0.45          0.41          0.51          0.47          0.37          0.45          0.44
Largest negative deviation           -0.80         -1.26         -1.97         -1.69         -2.01         -2.27         -1.87         -2.37         -2.05
Largest positive deviation            1.00          0.78          1.33          1.01          1.54          0.86          0.87          0.87          0.86

a. Data from references 42,43.
b. Calculations using the cc-pVTZ basis set.
c. Calculations using the aug-cc-pVTZ basis set.
d. MAE is the mean absolute error.
e. Std. Dev. is the standard deviation.


Table 4. Experimental pKa values and the differences between experimental and calculated values obtained at the G4CEP level in different conditions and results from the literature obtained at the CBS-4B3* level of theory. SMD corresponds to the use of the respective continuous solvation model, SMD+H2O is the SMD calculation including one explicit H2O molecule, and SMD+H2O+LSQR is the previous calculation including linear regression correction.

Substance                   Exp.(a)  G4CEP         G4CEP         G4CEP         CBS-4B3*(b)   CBS-4B3*(b)
                                     SMD           SMD+H2O       SMD+H2O+LSQR  SMD           CPCM
Chloroacetic acid           2.86      0.06         -0.48         -0.54         -0.92          0.95
3-Chlorobutanoic acid       3.98     -1.16          0.65          0.59          0.08         -0.22
4-Chlorobutanoic acid       4.52     -1.25          0.26          0.42         -0.05         -0.58
Pentanoic acid              4.83     -1.04         -0.07          0.25          0.53         -1.14
Hexanoic acid               4.85     -2.35         -0.38          0.02         -0.26         -1.08
Formic acid                 3.75     -0.10         -0.10         -0.03          1.13          0.56
Propanoic acid              4.87     -1.51         -0.96         -0.42         -0.61         -1.02
MAE(c)                                1.07          0.41          0.32          0.51          0.79
Std. Dev.(d)                          0.80          0.32          0.23          0.41          0.34
Largest negative deviation           -2.35         -0.96         -0.54         -0.92         -1.14
Largest positive deviation            0.06          0.65          0.59          1.13          0.95

a. Data from references 42,43.
b. Data from reference 34.
c. MAE is the mean absolute error.
d. Std. Dev. is the standard deviation.


Figure Captions

Figure 1. Thermodynamic cycle connecting the gas (g) phase and the aqueous (aq) phase to calculate the absolute pKa values.
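A minimal numerical sketch of such a cycle is given below. It relies on the standard relation pKa = ΔG(aq) / (RT ln 10). The decomposition of ΔG(aq) into a gas-phase term and solvation terms follows the generic direct cycle for HA → A- + H+; the function names, argument names, and the optional standard-state term are assumptions made for illustration and are not taken from Figure 1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def dg_aq_from_cycle(dg_gas, dg_solv_anion, dg_solv_proton, dg_solv_acid,
                     std_state_corr=0.0):
    """Aqueous deprotonation free energy (kcal/mol) for HA -> A- + H+.

    std_state_corr is an optional standard-state conversion term supplied by
    the caller (assumption: zero if all terms share one standard state).
    """
    return (dg_gas + dg_solv_anion + dg_solv_proton - dg_solv_acid
            + std_state_corr)

def pka_from_dg(dg_aq, temperature=298.15):
    """Convert an aqueous deprotonation free energy (kcal/mol) into a pKa."""
    return dg_aq / (R_KCAL * temperature * math.log(10))

# One pKa unit corresponds to RT ln 10 at the chosen temperature.
kcal_per_pka_unit = R_KCAL * 298.15 * math.log(10)
print(f"{kcal_per_pka_unit:.2f} kcal/mol per pKa unit")  # prints 1.36
```

The conversion factor shows why absolute pKa calculations are demanding: an error of about 1.36 kcal/mol in ΔG(aq) at 298.15 K shifts the computed pKa by one unit.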


Figure 1.


Table of Contents Graphic

THE pKa DETERMINATION FOR MONOCARBOXYLIC ACIDS USING G4CEP
