ARTICLE pubs.acs.org/JPCA

Application of the QSPR Approach to the Boiling Points of Azeotropes

Alan R. Katritzky,*,† Iva B. Stoyanova-Slavova,† Kaido Tämm,‡ Tarmo Tamm,§ and Mati Karelson‡

†Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
‡Institute of Chemistry, University of Tartu, Ravila 14A, 50411 Tartu, Estonia
§Institute of Technology, University of Tartu, Nooruse 1, 50411 Tartu, Estonia

ABSTRACT: CODESSA Pro derivative descriptors were calculated for a data set of 426 azeotropic mixtures by the centroid approximation and the weighted-contribution-factor approximation. The two approximations produced almost identical four-descriptor QSPR models relating the structural characteristics of the individual components of azeotropes to the azeotropic boiling points. These models were supported by internal and external validations. The descriptors contributing to the QSPR models are directly related to the three components of the enthalpy (heat) of vaporization.

■ INTRODUCTION

Historically, chemistry has been oriented toward the synthesis and testing of unique individual compounds as pure entities. With the advent of combinatorial chemistry, it is now standard practice to synthesize simultaneously, in the same container, libraries consisting of mixtures of a hundred or more distinct chemical species. Some libraries made by combinatorial synthesis can be tested as such. An acknowledged technique for high-throughput screening (HTS) is to deliberately mix compounds available as individual species with the aim of reducing the number of samples to be tested.

Although mostly used for individual compounds, the quantitative structure–property relationship (QSPR) methodology could, in principle, be applied to estimate some desired properties of mixtures. However, the increased complexity intrinsic to mixtures requires a modified approach for characterization. Considering the simplest case of a binary mixture, a fundamental question is: How should the descriptors characterizing the mixture be calculated? Two feasible approaches are (1) to simply calculate them as averages of the corresponding molecular descriptors for each component (centroid approximation)1 and (2) to scale each descriptor using weighting factors proportional to the molar fraction of each component in the mixture.2,3

Our current work is focused on the development of QSPRs for the prediction of the normal boiling points, Tb (i.e., the boiling points at 1 atm), of azeotropic binary mixtures. The normal boiling point (together with the ratio between the components) is an important physicochemical parameter used to characterize azeotropic mixtures. As in the case of pure liquids, the boiling point of a mixture indicates its physical state, whether liquid or gas. The boiling point also characterizes the volatility of azeotropic mixtures, which is a very important technological property related to the safety, transportation, and storage of materials, whether individual compounds or mixtures.

© 2011 American Chemical Society

The classical estimation of phase equilibria and critical properties of mixtures is usually based on equations of state, be it the historic (cubic) van der Waals equation or some of its more accurate but also more complex modifications (Redlich–Kwong, Peng–Robinson, Elliott, Suresh, Donohue, Ritter, etc.).4–8 The more accurate equations of state, however, commonly make use of a number of compound-specific parameters that have to be either measured (critical parameters, etc.) or estimated using fitting constants. Moreover, some of the fitting parameters used for equations of state might be relevant only for homologous classes of compounds (for instance, hydrocarbons).9 Another option is to use group contribution schemes.10 The shortcoming of these is similar: the available parameters usually apply only to sets of compounds of closely related chemical structures. A more fundamental theoretical approach is based on the direct quantum mechanical calculation of the intermolecular interactions in dense fluids11 that are further applied within continuum theory12–14 or by Monte Carlo or molecular dynamics simulations.15,16 However, such “first-principles” calculations have rather limited precision and are often applicable only to chemically similar compounds.

The advantage of using the QSPR approach based on theoretical descriptors is that all of the necessary parameters for prediction can be calculated purely from the three-dimensional representation of the molecular structure of each of the compounds of the mixtures, including mixtures of chemically diverse compounds. Thus, the present study considers a much more extensive set with much larger chemical diversity than previous investigations; the resulting models are, therefore, also applicable to a much wider array of possible mixtures.

Received: May 11, 2010
Revised: March 5, 2011
Published: March 30, 2011

dx.doi.org/10.1021/jp104287p | J. Phys. Chem. A 2011, 115, 3475–3479

The Journal of Physical Chemistry A


Table 1. Statistical Parameters of the QSPR Models Using the Weighted-Contribution-Factor Approximation

number of descriptors    R2       Rcv2     R2 − Rcv2    F
2                        0.556    0.546    0.010        198.88
3                        0.690    0.680    0.010        234.75
4                        0.750    0.740    0.010        235.74
5                        0.750    0.736    0.014        188.23
6                        0.750    0.735    0.015        156.54
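As a rough cross-check of Table 1, the sketch below recomputes F from the tabulated R2 values. It assumes (the text does not state this explicitly) that F is the usual overall regression statistic for a linear model with an intercept, F = (R2/k)/((1 − R2)/(N − k − 1)), with k descriptors and N = 320 training mixtures. The function name `f_statistic` is illustrative only.

```python
def f_statistic(r2: float, n_points: int, n_descriptors: int) -> float:
    """Overall regression F for a linear model with an intercept (assumed form)."""
    explained = r2 / n_descriptors
    unexplained = (1.0 - r2) / (n_points - n_descriptors - 1)
    return explained / unexplained


# (number of descriptors, R2, F) as printed in Table 1; training set N = 320
TABLE_1 = [
    (2, 0.556, 198.88),
    (3, 0.690, 234.75),
    (4, 0.750, 235.74),
    (5, 0.750, 188.23),
    (6, 0.750, 156.54),
]

for k, r2, f_reported in TABLE_1:
    # Print recomputed and reported F side by side for comparison.
    print(k, round(f_statistic(r2, 320, k), 2), f_reported)
```

Because R2 is tabulated to only three decimals, the recomputed values agree with the printed F column to within roughly one unit rather than exactly.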

Table 2. Statistical Parameters of the QSPR Models Using the Centroid Approximation Scheme

number of descriptors    R2       Rcv2     R2 − Rcv2    F
2                        0.585    0.573    0.012        223.43
3                        0.714    0.705    0.009        263.48
4                        0.755    0.744    0.011        242.18
5                        0.793    0.783    0.010        240.86
6                        0.803    0.792    0.011        213.18

Figure 1. Statistical distribution of the normal boiling point values in the data set and the corresponding normal (Gaussian) distribution curve.

Our literature search located only a single publication17 utilizing a QSPR approach to study thermodynamic properties of azeotropic mixtures, in which a QSPR model expresses the entropy of vaporization, ΔSv, in terms of Tb and molecular weight M. Our current work demonstrates that both the centroid and the weighted-contribution-factor approximations can provide accurate predictions of the normal boiling points of azeotropes. Moreover, in terms of the descriptors involved, our results are completely compatible with models reported earlier for single organic species.18,19

■ DATA SET

The normal boiling points (in kelvin) and molar fractions of 426 binary azeotropic mixtures were extracted from a publicly available database.20 The compounds forming the mixtures of the data set are chemically diverse, including alcohols, esters, alkanes, amines, ketones, ethers, and halogeno and nitro compounds. Several compounds, such as water, toluene, and 1-butanol, are present in two or more of these mixtures. Details of the mixtures investigated (exact composition of each mixture, boiling points of the components and of the mixture itself) are provided in the Supporting Information.

To properly assess the quality of a QSPR model (using, for instance, the F statistic) and to ensure its predictive power, the property values should follow a statistical normal (Gaussian) distribution. As shown in Figure 1, the normal boiling point values of the azeotropes utilized do follow a Gaussian distribution, and therefore, no transformation of the original property was necessary.

■ COMPUTATIONAL PROCEDURE

A three-dimensional conversion and preoptimization procedure was performed on the structures of each of the compounds in the mixtures using molecular mechanics (MM+) as implemented in HyperChem 8.0.21 Final geometry optimization of the individual molecules composing the mixtures was carried out by applying the semiempirical quantum-mechanical AM1 parametrization.22 The optimized geometries were loaded into CODESSA Pro software,23 and more than 600 theoretical descriptors were calculated for each structure.24

In the next step, the values of the mixture descriptors were calculated according to both the centroid and the weighting-factor schemes. The resulting modified descriptors were then imported into Statistica 6.0,25 where two different QSPR modeling approaches were employed:26 (1) The initial set was separated into training (75% of the mixtures, 320 azeotropes) and test (25% of the mixtures, 106 azeotropes) subsets, and the latter was used to validate the model. (2) All mixtures were used to build a general model that was further tested by a scrambling validation procedure. To obtain a truly representative validation set (the “test set” of case 1), all mixtures were sorted in ascending order of their Tb values, and every fourth point was selected to form the validation set. At the next stage, the stepwise forward multiple linear regression method implemented in Statistica 6.0 was used to process the descriptor data.
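The two mixture-descriptor schemes and the every-fourth-point split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, and the weighted form x1·d1 + (1 − x1)·d2 is one plausible reading of "weighting factors proportional to the molar fraction" for a binary mixture.

```python
from __future__ import annotations

from typing import Sequence


def centroid_descriptor(d1: float, d2: float) -> float:
    """Centroid approximation: plain average of the two component descriptors."""
    return 0.5 * (d1 + d2)


def weighted_descriptor(d1: float, d2: float, x1: float) -> float:
    """Mole-fraction weighting (assumed form): x1*d1 + (1 - x1)*d2."""
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("mole fraction must lie in [0, 1]")
    return x1 * d1 + (1.0 - x1) * d2


def every_fourth_split(tb: Sequence[float]) -> tuple[list[int], list[int]]:
    """Sort mixtures by ascending Tb and send every fourth one to the test set.

    Returns (train_indices, test_indices) into the original sequence.
    """
    order = sorted(range(len(tb)), key=lambda i: tb[i])
    test = order[3::4]  # 4th, 8th, 12th, ... mixture in Tb order
    held_out = set(test)
    train = [i for i in order if i not in held_out]
    return train, test
```

For an equimolar mixture (x1 = 0.5) the two schemes coincide. With 426 mixtures, taking every fourth entry of the Tb-sorted list yields 106 test and 320 training mixtures, which matches the subset sizes quoted above.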

■ RESULTS AND DISCUSSION

For the training set consisting of 320 mixtures, models with up to six descriptors were generated by the CODESSA Pro procedure. The corresponding statistical parameters of these models are given in Tables 1 and 2 and illustrated in Figures 2 and 3. The optimal QSPRs were selected based on the following criteria: (1) retaining the smallest number of parameters to avoid overfitting, (2) finding the smallest difference R2 − Rcv2 (squared correlation coefficient minus squared cross-validated correlation coefficient), (3) finding the highest value of the Fisher criterion, (4) selecting a model in which the number of descriptors n represents a “break point” in the plot of n versus R2, and (5) achieving overall self-consistency. As can be seen from the statistical parameters listed in Table 1 (calculated according to the weighted-contribution-factor scheme) and the plot of R2 versus the number of descriptors in the model (Figure 2), the model involving four parameters satisfied the first four criteria. As can be seen from Figure 2 and the R2 column of Table 1, after the addition of the fourth descriptor, there was practically no improvement in the statistical


Figure 2. Plot of R2 versus the number of descriptors in the weighted-contribution-factor approximation.

Figure 3. Plot of R2 versus the number of descriptors in the centroid approximation.

Figure 4. Experimental boiling points versus values predicted by the weighted-contribution-factor approximation.

parameters of the models (addition of a fifth descriptor affected R2 only in the fourth position after the decimal point). Applying the same selection procedure to the models obtained using the centroid approximation identified the three-descriptor model as optimal (see Table 2 and Figure 3). However, our desire to work with consistent models (the fifth criterion) that could be compared at a later stage led us to the final selection of a four-descriptor model. The plot of the predicted versus experimental Tb values for the training and test subsets in the case of the weighted-contribution-factor approximation is shown in Figure 4. As can be seen, the two regression lines almost coincide (the same applies to the centroid approximation model shown in Figure 5). The minor difference in their intercepts indicates a negligible overestimation of Tb for the azeotropes in the test set. Applying the first four selection criteria to the complete data set of 426 azeotropes led us to the models of Tables 5 and 6. The boiling points were predicted utilizing the general models of Tables 5 and 6, as reported in Table SI1 of the Supporting Information. The regression coefficients of the four models described in Tables 3–6 are compared in Table 7. As can be seen from Table 7, the regression coefficients of the descriptors vary only slightly. This variation is practically

Figure 5. Experimental boiling points versus the values predicted using the centroid approximation.

negligible for the most important descriptor [gravitation index (all bonds)], with a level of significance at least twice that of the second most important descriptor [HA-dependent HDCA-1/TMSA (Zefirov PC) (all)]. The same is valid for the intercept of the models, with a variation between its minimum and maximum values of less than 3%. In terms of intermolecular interactions, the normal boiling point represents the point at which the molecules forming the azeotropic mixture have enough kinetic energy to overcome the various intermolecular attractions binding the molecules together and, therefore, undergo a phase change into the gaseous phase. Hence, the boiling point of the azeotropes is also an indicator of the strength of the attractive forces between the molecules in the mixture. Similarly to the normal boiling points of simple organics,18 the boiling points of azeotropic mixtures were thus found to depend mainly on the molecular dispersion forces and the effective mass distribution taken into account by the gravitation index. However, use of the cube root of the gravitation index of the


Table 3. Best Four-Parameter Weighted-Contribution-Factor QSPR Modelᵃ

number    X         ΔX         t        name of descriptor
0         166.8     8.099      20.60    intercept
1         0.3580    0.01511    23.68    gravitation index (all bonds)
2         4059      367.2      11.06    HA-dependent HDCA-1/TMSA (Zefirov PC) (all)
3         1.131     0.1309     8.639    DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)
4         17.01     2.113      8.050    PPSA3 atomic-charge-weighted PPSA (Zefirov PC)

ᵃ Training set: N = 320, Rtraining2 = 0.750, Rcv2 = 0.740, Rtest2 = 0.683, F = 235.7, s = 23.40.

Table 4. Best Four-Parameter Centroid QSPR Modelᵃ

number    X         ΔX         t        name of descriptor
0         166.8     8.633      19.32    intercept
1         0.3801    0.01601    23.74    gravitation index (all bonds)
2         4581      421.4      10.87    HA-dependent HDCA-1/TMSA (Zefirov PC) (all)
3         0.9487    0.1302     7.287    DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)
4         11.52     2.348      4.904    PPSA3 atomic-charge-weighted PPSA (Zefirov PC)

ᵃ Training set: N = 320, Rtraining2 = 0.755, Rcv2 = 0.744, Rtest2 = 0.693, F = 242.2, s = 23.16.

Table 5. Best Four-Parameter Weighted-Contribution-Factor QSPR Model for the Complete Data Setᵃ

number    X         ΔX         t        name of descriptor
0         170.8     6.962      24.53    intercept
1         0.3541    0.01290    27.46    gravitation index (all bonds)
2         3972      331.0      12.00    HA-dependent HDCA-1/TMSA (Zefirov PC) (all)
3         1.153     0.1039     11.10    DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)
4         16.73     1.769      9.458    PPSA3 atomic-charge-weighted PPSA (Zefirov PC)

ᵃ N = 426, R2 = 0.732, Rcv2 = 0.723, F = 287.0, s = 23.94.

Table 6. Best Four-Parameter Centroid QSPR Model for the Complete Data Setᵃ

number    X         ΔX         t        name of descriptor
0         168.9     7.518      22.47    intercept
1         0.3703    0.01396    26.52    gravitation index (all bonds)
2         4220      380.9      11.08    HA-dependent HDCA-1/TMSA (Zefirov PC) (all)
3         0.9546    0.1088     8.773    DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)
4         12.43     2.036      6.103    PPSA3 atomic-charge-weighted PPSA (Zefirov PC)

ᵃ N = 426, R2 = 0.738, Rcv2 = 0.730, F = 297.1, s = 23.64.

azeotropes (using both the weighted-contribution-factor scheme and the centroid approximation), as utilized in ref 18, did not improve the statistical parameters of the models (R2 = 0.572 for the weighting-factor scheme and R2 = 0.586 for the centroid approximation).

According to the Clausius–Clapeyron equation expressed in the form

$$T_b = \left( \frac{R \ln(P_0)}{\Delta H_{vap}} + \frac{1}{T_0} \right)^{-1}$$

where R is the gas constant and P0 is the vapor pressure (in atm) at a reference temperature T0, the normal boiling point (Tb) depends on the enthalpy of vaporization. The enthalpy (heat) of vaporization has three main components taking into account the following intermolecular interactions: (1) nonspecific or van der Waals attractions between all molecules, (2) unevenly distributed electron densities

causing induced dipole–dipole interactions, and (3) hydrogen-bond formation. In terms of ΔHvap, the gravitation index can be related to interactions of type 1. The second most important descriptor in our models, namely, “HA-dependent HDCA-1/TMSA (Zefirov PC) (all)” (hydrogen-bond-acceptor-dependent hydrogen-bond-donor charged surface area/total molecular surface area), can be related to the hydrogen-bond formation ability of the components of the azeotropic mixtures, described by interactions of type 3. Both the gravitation index and the HA-dependent HDCA-1/TMSA descriptors are characterized by positive regression coefficients, and their increase leads to higher boiling points. This is opposite to what was observed in our QSPR model for the vapor pressure,18 where the regression signs of those descriptors [in the case of “HA-dependent HDCA-1/TMSA (Zefirov PC)


Table 7. Regression Coefficients of the Models in Tables 3–6ᵃ

X(WF)     X(CA)     X(WFCS)    X(CACS)    name of descriptor
166.8     166.8     170.8      168.9      intercept
0.3580    0.3801    0.3541     0.3703     gravitation index (all bonds)
4059      4581      3972       4220       HA-dependent HDCA-1/TMSA (Zefirov PC) (all)
1.131     0.9487    1.153      0.9546     DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)
17.01     11.52     16.73      12.43      PPSA3 atomic-charge-weighted PPSA (Zefirov PC)

ᵃ WF, weighted contribution factors; CA, centroid approximation; WFCS, weighted contribution factors for the complete set; CACS, centroid approximation for the complete set.

(all)”, its derivative “HA-dependent HDCA-1 (Zefirov PC)”] are negative. This behavior is to be expected because of the inverse relationship between the vapor pressure and the normal boiling point (the higher the vapor pressure of a liquid at a given temperature, the lower the normal boiling point). The remaining two descriptors, namely, “DPSA2 difference in CPSAs (PPSA2-PNSA2) (Zefirov PC)” (where PPSA = partial positively charged surface area and PNSA = partial negatively charged surface area) and “PPSA3 atomic-charge-weighted PPSA (Zefirov PC)”, can be related to the second component of the ΔHvap function, concerning the charge redistribution within a molecule and thus responsible for the induced dipole–dipole interactions. Our model suggests that the order of overall significance of these three types of intermolecular interactions is: nonspecific or van der Waals attractions > hydrogen bonding > dipole–dipole interactions.

A scrambling procedure was applied to examine the sensitivity of the proposed QSPR model to chance correlations; that is, the model was fitted to randomly reordered boiling points and then compared with that obtained for their actual values.27 Twenty randomizations were performed and produced an average R2 = 0.056 (ranging from 0.037 to 0.092). The substantial difference between the actual R2 value and the average R2 value from the scrambling procedure supported the stability of the model and the lack of chance correlations.
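The scrambling (y-randomization) check described above can be sketched as follows. This is an illustrative version only: it uses NumPy ordinary least squares rather than the Statistica workflow of the paper, the function names are hypothetical, and it refits a fixed set of descriptors, whereas the text does not say whether descriptor selection was repeated for each randomization.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    total = y - y.mean()
    return 1.0 - float(residual @ residual) / float(total @ total)


def scrambled_r2(X: np.ndarray, y: np.ndarray,
                 n_runs: int = 20, seed: int = 0) -> list[float]:
    """Refit the model n_runs times against randomly reordered property values."""
    rng = np.random.default_rng(seed)
    return [r_squared(X, rng.permutation(y)) for _ in range(n_runs)]
```

A model free of chance correlations should show a real R2 far above the average of the scrambled values, which is the comparison reported in the paragraph above.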

■ CONCLUSIONS

Rigorous QSPR approaches utilizing both the centroid approximation and the weighted-contribution-factor scheme were employed to relate the normal boiling points of 426 azeotropic mixtures to the structural characteristics of the individual components. Both methodologies resulted in equations involving identical descriptors with only a slight variation of their corresponding regression coefficients. These two approaches produced virtually equivalent QSPR models that are fully compatible with previously published work concerning the boiling points5 and the vapor pressures6 of simple organics. All four descriptors could be directly related to the three components of the enthalpy (heat) of vaporization.

■ ASSOCIATED CONTENT

Supporting Information. Normal boiling points of the individual components and azeotropes, molar fractions of the individual components, and predictions of boiling points according to the general models of Tables 5 and 6 (Table SI1). This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]fl.edu.

■ REFERENCES

(1) Sheridan, R. P. J. Chem. Inf. Comput. Sci. 2000, 40, 1456.
(2) Ajmani, S.; Rogers, S. C.; Barley, M. H.; Livingstone, D. J. J. Chem. Inf. Model. 2006, 46, 2043.
(3) Ajmani, S.; Rogers, S. C.; Barley, M. H.; Burgess, A. N.; Livingstone, D. J. QSAR Comb. Sci. 2008, 27, 1346.
(4) Abu-Eishah, S. I.; Darwish, N. A.; Aljundi, I. H. Int. J. Thermophys. 1998, 19, 239.
(5) Aslam, N.; Sunol, A. K. Phys. Chem. Chem. Phys. 2004, 6, 2320.
(6) Fotouh, K.; Shukla, K. Chem. Eng. Commun. 1998, 166, 35.
(7) Yuan, W.; Hansen, A. C.; Zhang, Q. Fuel 2005, 84, 943.
(8) Ritter, J. A.; Pan, H. H.; Balbuena, P. B. Langmuir 2010, 26, 13968.
(9) Orbey, H.; Sandler, S. I. AIChE J. 1996, 42, 2327.
(10) Larsen, B. L.; Rasmussen, P.; Fredenslund, A. Ind. Eng. Chem. Res. 1987, 26, 2274.
(11) Prausnitz, J. M.; Tavares, F. W. AIChE J. 2004, 50, 739.
(12) Tomasi, J. Theor. Chem. Acc. 2004, 112, 184.
(13) Cramer, C. J.; Truhlar, D. G. Acc. Chem. Res. 2008, 41, 760.
(14) Klamt, A.; Eckert, F.; Arlt, W. Annu. Rev. Chem. Biomol. Eng. 2010, 1, 101.
(15) Punnathanam, S.; Monson, P. A. J. Chem. Phys. 2006, 125, 024508.
(16) Zhu, S.; Elcock, A. H. J. Chem. Theory Comput. 2010, 6, 1293.
(17) Demirel, Y. Thermochim. Acta 1999, 339, 79.
(18) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. J. Chem. Inf. Comput. Sci. 1998, 38, 28.
(19) Katritzky, A. R.; Slavov, S.; Dobchev, D.; Karelson, M. Comput. Chem. Eng. 2007, 31, 1123.
(20) Azeotrope Databank, http://ecosse.org/chem_eng/azeotrope_bank.html. Accessed February 13, 2010.
(21) HyperChem, version 8.0, 2007; Hypercube Inc.: Gainesville, FL; www.hyper.com. Accessed December 6, 2009.
(22) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
(23) CODESSA Pro Software; University of Florida: Gainesville, FL, 2002; www.codessa-pro.com. Accessed January 11, 2010.
(24) Karelson, M. Molecular Descriptors in QSAR/QSPR; Wiley-Interscience: New York, 2000.
(25) Statistica, version 6.0, 2004; StatSoft Inc.: Tulsa, OK; www.statsoft.com. Accessed November 5, 2005.
(26) Katritzky, A. R.; Kuanar, M.; Slavov, S.; Hall, C. D.; Karelson, M.; Kahn, I.; Dobchev, D. A. Chem. Rev. 2010, 110, 5714.
(27) Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Environ. Health Perspect. 2003, 111, 1361.
