ARTICLE pubs.acs.org/IECR
Quantitative Structure-Property Relations (QSPRs) for Predicting the Standard Absolute Entropy (S°298 K) of Gaseous Organic Compounds
Lailong Mu* and Hongmei He
School of Chemistry & Chemical Engineering, Xuzhou Normal University, Xuzhou, Jiangsu 221116, People’s Republic of China
Xuzhou College of Industrial Technology, Xuzhou, Jiangsu 221006, People’s Republic of China
ABSTRACT: To predict the standard absolute entropies of gaseous organic compounds, the variable molecular connectivity index (ᵐχ′) and the Ring parameter (H) were proposed. They are based on the adjacency matrix of molecular graphs, the variable atomic valence connectivity index (δ′_i), and the number of atomic chains (cycles) of the molecule (n_iR). The optimal values of the parameters b, c, m_i, and y included in the definitions of δ′_i and ᵐχ′ can be found by using an optimization method. When b = 1.3, c = 0.91, and y = 0.22, a good four-parameter model for the standard absolute entropies of gaseous organic compounds can be constructed from H and ᵐχ′ by using the best subsets regression analysis method. The correlation coefficient (r), standard error (s), and average absolute deviation (AAD) of the multivariate linear regression (MLR) model are 0.9988, 8.16 J K⁻¹ mol⁻¹, and 6.13 J K⁻¹ mol⁻¹, respectively, for the 726 gaseous organic compounds of the training set. The AAD of the predicted values of the standard absolute entropy of another 364 gaseous organic compounds (test set) is 6.14 J K⁻¹ mol⁻¹ for the MLR model. The results show that the MLR method can provide an accurate model for the prediction of the standard absolute entropies of gaseous organic compounds.
1. INTRODUCTION
Quantitative structure-property/activity relationship (QSPR/QSAR) studies of organic compounds have long been a focus of attention.¹⁻³ A large number of QSPR/QSAR models have been developed, using various model parameters, to describe and predict the physical properties and biological activities of organic compounds from their molecular structures. Topological indices (TIs) play an important role in research on the prediction of the physical, chemical, and biological properties of organic molecules. The use of these molecular descriptors has permitted the prediction of many properties of chemical, pharmaceutical, toxicological, and environmental relevance.⁴⁻⁷ The TIs used in these studies can be considered “classic” molecular descriptors, and some of them were created in contexts that are very different from the QSPR/QSAR research fields. Thus, there is no guarantee that a “classic” TI can usefully predict a property/activity for an arbitrary set of compounds, or even for the same series of compounds originally used to define the TI. Generally, two approaches are used to remedy the lack of correlation of one particular TI with an experimental property. One is to try another TI, in the hope that it will give a better correlation. The other is to combine several TIs into a multivariate linear regression (MLR) model. Instead of these nonoptimal approaches, we have proposed an optimization approach for the Kier-Hall index.⁷ Here, we explain this approach in detail and use it to optimize models for predicting the standard absolute entropy of gaseous organic compounds.
The absolute standard entropy (S°298 K) represents thermodynamic data of special significance, forging the link between enthalpy and Gibbs energy, which is the true arbiter of chemical equilibrium and stability in processes whose outcome is determined by thermodynamic (as opposed to kinetic) considerations. While enthalpy data are widely published⁸⁻¹⁴ or can be estimated¹⁵⁻²⁰ for many compounds, entropy values are often unavailable, so simple estimation procedures become particularly useful. Latimer²¹ reported an additive method for the estimation of the standard entropies of solids, monatomic aqueous ions, and nonpolar molecules, based on summation of elemental contributions obtained from the equation

$$ S^{\circ}_{298\,\mathrm{K}} / (\mathrm{J\ K^{-1}\ mol^{-1}}) = \tfrac{3}{2} R \ln M - 3.93 \qquad (1) $$

where R is the gas constant (R = 8.314 J K⁻¹ mol⁻¹) and M is the atomic mass of the element in question. Contributions from anions are dependent on the charge residing on the cation. A more-complex summation procedure was used for inorganic compounds by Fyfe et al.,²² who employed the entropy and volume of component oxides to derive an estimate for multiple oxide phases, effectively applying a “volume correction” to the individual oxide entropy contributions. Rihani and Doraiswamy,²³ using group contribution methods (GCMs), introduced an effective way to estimate the absolute entropy of the ideal gas. Duchowicz et al.²⁴ reported that the standard entropies of acyclic and aromatic compounds can be estimated on the basis of fundamental concepts of molecular structure, such as the count of atoms and the types of chemical bonds. Jenkins et al.²⁵,²⁶ reported that the formula unit volume (Vm) can be employed for general estimation of the standard entropy, S°298 K, for inorganic compounds, organic liquids, and solids, through three simple linear correlations between entropy and molar volume. Vm can be obtained from several possible sources, or, alternatively, the density (ρ) may be used as the source of data. The approach can also be extended to estimate entropies for hypothesized compounds. Mu and co-workers,²⁷⁻³⁰ Feng and co-workers,³¹,³² and others have developed, by using various model parameters, QSPR/QSAR models to estimate the entropies of inorganic compounds, alkanes, and chain hydrocarbons.

Recently, we have developed accurate QSPR/QSAR models to estimate the diamagnetic susceptibilities of organic compounds by constructing the variable atomic valence connectivity index δ′_i and the variable molecular connectivity index ᵐχ′.³³⁻³⁷ In this study, to construct an accurate model for predicting the standard absolute entropies of gaseous organic compounds, the atomic valence connectivity index (δ′_i) was modified again, using some parameters based on our former works. A new variable molecular connectivity index (ᵐχ′) was proposed based on the Kier-Hall index⁷ and δ′_i. The optimal values of these parameters can be found using an optimization method. In addition, the Ring parameter (H) was proposed based on the number of atomic chains (cycles) of the molecule (n_iR). The optimal novel molecular connectivity index, together with the Ring parameter H, has a better correlation with the standard absolute entropies of gaseous organic compounds.

Received: February 17, 2011. Revised: May 23, 2011. Accepted: June 1, 2011. Published: June 1, 2011.
dx.doi.org/10.1021/ie2003335 | Ind. Eng. Chem. Res. 2011, 50, 8764–8772
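As a small illustration of the Latimer elemental contribution quoted in the Introduction (eq 1), the sketch below evaluates (3/2) R ln M − 3.93 for a few elements. It assumes that the constant in eq 1 is subtracted, and the atomic masses are standard values supplied here for illustration, not taken from the paper.

```python
import math

R = 8.314  # gas constant in J K^-1 mol^-1, as quoted with eq 1


def latimer_contribution(atomic_mass):
    """Elemental entropy contribution from eq 1, in J K^-1 mol^-1.

    Assumes eq 1 reads S = (3/2) R ln M - 3.93.
    """
    return 1.5 * R * math.log(atomic_mass) - 3.93


# Atomic masses below are standard values chosen for illustration only.
for symbol, mass in [("C", 12.011), ("O", 15.999), ("Cl", 35.45)]:
    print(f"{symbol}: {latimer_contribution(mass):.2f} J K^-1 mol^-1")
```

Because the contribution depends on M only through ln M, heavier elements add entropy slowly: doubling the atomic mass adds (3/2) R ln 2 regardless of the starting mass.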
2. VARIABLE MOLECULAR CONNECTIVITY INDICES AND RING PARAMETER
Molecular valence connectivity indices have been widely used as structure descriptors.⁷ The general expression for the mth-order molecular valence connectivity index is as follows:

$$ {}^{m}\chi^{v}_{k} = \sum_{j=1}^{n_m} \left( \prod_{i=1}^{m+1} \delta^{v}_{i} \right)_{j}^{-1/2} \qquad (2) $$

where parameter m is the order of the molecular valence connectivity index, and k denotes a contiguous-path type of fragment, which is divided into paths (P), clusters (C), path/clusters (PC), and chains (cycles) (CH). Parameter n_m is the number of the relevant paths, and δ^v_i is the atomic valence connectivity index, which is defined as

$$ \delta^{v}_{i} = \frac{Z^{v}_{i} - h_i}{Z_i - Z^{v}_{i} - 1} \qquad (3) $$

where parameter Z_i is the number of electrons of atom i, Z^v_i the number of valence electrons, and h_i the number of hydrogen atoms connected to atom i. In our study, it was found that the δ^v_i value does not distinguish the precise chemical environment of an atom. For instance, the δ^v_i values of the carbon atom are all 3 in the atom groups “sssCH”, “dsCH”, “dsCH(c)”, and “tCH”. (Symbols: “s” = single bond, “d” = double bond, “t” = triple bond, “c” = conjugated.) Therefore, we defined a novel δ′_i value for all atoms by a unified formula, according to the structural characteristics of different atoms. The δ′_i value is defined as

$$ \delta'_{i} = c \left[ \frac{(Z^{v}_{i} - h_i)\, m_i}{(Z_i - Z^{v}_{i} + 0.9)^{b}} \cdot \frac{x_i}{2.551} \right] \qquad (4) $$

where parameter Z_i is the number of electrons of atom i, Z^v_i the number of valence electrons, h_i the number of hydrogen atoms connected to atom i, and x_i the orbital electronegativity.³⁸ The constants 0.9 and 2.551 represent an empirical value and the sp3 hybrid orbital electronegativity of the carbon atom, respectively. Parameter c, the atom-conjugated correction parameter, is defined as c = 1 in the nonconjugated π-electron system. The δ′_i value of atom i in the nonconjugated π-electron system can therefore be defined as follows:

$$ \delta'_{i} = \frac{(Z^{v}_{i} - h_i)\, m_i}{(Z_i - Z^{v}_{i} + 0.9)^{b}} \cdot \frac{x_i}{2.551} \qquad (4a) $$

The values of parameters m_i, b, and c (in the conjugated π-electron system) are variable, and the optimal values can be found using an optimization method. The x_i, m_i, and δ′_i values (for b = 1.3 and, in the conjugated π-electron system, c = 0.91) for some atoms in different atom groups are calculated from eqs 4 and 4a and are listed in Table 1. When δ′_i is used instead of δ^v_i, the novel connectivity index ᵐχ′ can be defined as follows:

$$ {}^{m}\chi'_{k} = \sum_{j=1}^{n_m} \left( \prod_{i=1}^{m+1} \delta'_{i} \right)_{j}^{-y} \qquad (5) $$

where parameter m is the order of the molecular connectivity index and y is a variable whose optimal value can be found by an optimization method. The ⁰χ′, ¹χ′, ²χ′, ³χ′p, ³χ′c, ⁴χ′p, ⁴χ′c, ⁵χ′p, ⁴χ′pc, ⁵χ′pc, and χ′ch of 1090 gaseous organic compounds were calculated using a MATLAB program for various values of parameters b, c, and y. The Ring parameter (H) is defined as follows:

$$ H = \left( \ln n_{iR} \right)^{2.5} \qquad (6) $$

where the parameter n_iR is the number of atomic chains (cycles) of a molecule (for a noncyclic molecule, n_iR = 1) and the constant 2.5 is an empirical parameter. For example, the n_iR values are 3 and 1 for methylcyclopropane and butane, respectively.
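The definitions in eqs 4, 4a, and 6 can be checked numerically against Table 1. The sketch below is a minimal illustration, not the authors' MATLAB program: it reads m_i in eq 4 as a multiplicative factor on (Z^v_i − h_i), an interpretation that reproduces the tabulated δ′_i values, and the function names are hypothetical.

```python
import math


def delta_prime(Z, Zv, h, m, x, b=1.3, c=1.0):
    """Variable atomic valence connectivity index (eq 4; eq 4a when c = 1).

    Assumption: m multiplies (Zv - h); this reading reproduces Table 1.
    """
    return c * (Zv - h) * m / (Z - Zv + 0.9) ** b * (x / 2.551)


def ring_parameter(n_ring):
    """Ring parameter H of eq 6; n_ring = 1 for a noncyclic molecule."""
    return math.log(n_ring) ** 2.5


# Rows 1, 7, 8, and 12 of Table 1 (b = 1.3, c = 0.91 for conjugated atoms).
print(round(delta_prime(6, 4, 3, 1.0, 2.551), 4))          # sCH3
print(round(delta_prime(6, 4, 1, 1.1, 2.64), 4))           # dsCH
print(round(delta_prime(6, 4, 1, 1.1, 2.64, c=0.91), 4))   # dsCH(c)
print(round(delta_prime(6, 4, 1, 1.6, 2.818), 4))          # tCH

# H vanishes for butane (n_iR = 1) and is positive for methylcyclopropane (n_iR = 3).
print(ring_parameter(1), ring_parameter(3))
```

With these inputs the three CH groups that share δ^v_i = 3 receive distinct δ′_i values, which is the distinction the modified index is meant to provide.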
3. DATASET
The quantitative structure-property relationship (QSPR) research started with the collection of the dataset. The experimental absolute standard entropy (S°298 K) data were gathered from refs 8, 9, 13, and 14. A total of 1090 gaseous organic compounds were selected as the dataset (see TS1 and TS2, which are given as Supporting Information). The quality and robustness of the predictive power of a QSPR model are heavily dependent on the diversity of the dataset. To select significant descriptors for a QSPR model that captures all the underlying interaction mechanisms, it is advisable to represent as many structural features as possible in the dataset. The working dataset included hydrocarbons, nonhydrocarbons, and their substituted compounds.

4. SEARCH FOR OPTIMAL VALUES OF PARAMETERS b, c, y, AND m_i
To find the optimal values of the parameters b, c, and y for the four-parameter regression model of the absolute standard entropies of 1090 gaseous organic compounds, we initially varied b, c, and y in the intervals [1.0, 1.6], [0.88, 0.94], and [0.16, 0.26], respectively. The intervals were selected according to the results of our preliminary tests. The comparison of the four-parameter regression
Table 1. Atomic Attributes and δ′_i Values for Organic Compounds (b = 1.3 and c = 0.91)

No.  atom group (a, b)  Z_i  Z^v_i  h_i  m_i   x_i    δ′_i
1    sCH3               6    4      3    1     2.551  0.2505
2    ssCH2              6    4      2    1     2.551  0.5011
3    sssCH              6    4      1    1     2.551  0.7516
4    ssssC              6    4      0    1     2.551  1.0022
5    dCH2               6    4      2    1     2.64   0.5186
6    dCH2(c)            6    4      2    1     2.64   0.4719
7    dsCH               6    4      1    1.1   2.64   0.8556
8    dsCH(c)            6    4      1    1.1   2.64   0.7786
9    dssC               6    4      0    0.9   2.64   0.9334
10   dssC(c)            6    4      0    0.9   2.64   0.8494
11   ddC                6    4      0    1.6   2.818  1.7713
12   tCH                6    4      1    1.6   2.818  1.3285
13   stC                6    4      0    1.4   2.818  1.5499
14   sOH                8    6      1    0.2   2.747  0.2698
15   ssO                8    6      0    0.7   2.747  1.1331
16   dO                 8    6      0    0.7   3.751  1.5473
17   dO(c)              8    6      0    0.7   3.751  1.4080
18   sNH2               7    5      2    0.35  2.929  0.3021
19   sNH2(c)            7    5      2    1     3.32   0.8902
20   ssNH               7    5      1    1     2.929  1.1507
21   ssNH(c)            7    5      1    1     3.32   1.1869
22   sssN               7    5      0    1     2.929  1.4383
23   sssN(c)            7    5      0    1     3.32   1.4836
24   dNH                7    5      1    0.4   3.32   0.5217
25   dNH(c)             7    5      1    0.4   3.32   0.5934
26   dsN                7    5      0    0.6   3.32   0.9782
27   dsN(c)             7    5      0    0.6   3.32   0.8902
28   tN                 7    5      0    1.5   3.515  2.5892
29   ddN                7    5      0    1     3.515  1.7261
30   ddN(c)             7    5      0    1     3.515  1.5708
31   tsN                7    5      0    0.8   3.515  1.3809
32   ddsN               7    5      0    1     3.32   1.6303
33   ddsN(c)            7    5      0    1     3.32   1.4836
34   dssN               7    5      0    1     3.32   1.6303
35   dssN(c)            7    5      0    1     3.32   1.4836
36   sSH                16   6      1    0.28  2.101  0.0517
37   ssS                16   6      0    1.2   2.101  0.2657
38   dS                 16   6      0    1.1   2.761  0.3201
39   dS(c)              16   6      0    1.1   2.761  0.2913
40   dssS               16   6      0    4     2.761  1.1639
41   dssS(c)            16   6      0    4     2.761  1.0591
42   ddssS              16   6      0    14    2.596  3.8302
43   sF                 9    7      0    0.93  3.515  2.2474
44   sCl                17   7      0    1.1   2.626  0.3552
45   sBr                35   7      0    1     2.404  0.0832
46   sI                 53   7      0    0.9   2.121  0.0352

a Symbols: “s” = single bond, “d” = double bond, and “t” = triple bond.
b The term “(c)” denotes that the atomic group is in the conjugated π-electron system.
models based on the Ring parameter H and the novel connectivity index ᵐχ′ for different parameter sets (b, c, and y) is made based on their standard error of estimate. The best subsets regression analysis method is applied to select the optimal values of the parameters b, c, and y for linear QSPR models. The standard errors of the four-parameter regression models, which are constructed by the best subsets regression of the absolute standard entropies, S°298 K, versus H, ⁰χ′, ¹χ′, ²χ′, ³χ′p, ³χ′c, ⁴χ′p, ⁴χ′c, ⁵χ′p, ⁴χ′pc, ⁵χ′pc, and χ′ch of 1090 gaseous organic compounds for the various values of the parameters c, b, and y, are listed in TS3 (given in the Supporting Information). The boldfaced values in TS3 are the local minima of the standard error of estimate for different c and y. Figure 1 graphically illustrates the optimization results of parameters b and y for c = 0.91. The optimal values of the parameters m_i can be found by the same algorithm; to avoid repetition, the details are not described. The optimal values of the parameters m_i are listed in Table 1.
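The search described in this section can be pictured as an exhaustive grid scan over (b, c, y). The sketch below is only a schematic: the step sizes (0.1 for b, 0.01 for c and y) are inferred from the parameter values reported later in the paper, and `toy_standard_error` is a hypothetical stand-in for the real objective, which would refit the four-parameter best-subsets regression on the 1090 compounds for each triple.

```python
from itertools import product


def frange(start, stop, step):
    """Inclusive, evenly spaced grid, rounded to suppress float drift."""
    n = round((stop - start) / step)
    return [round(start + k * step, 10) for k in range(n + 1)]


def grid_search(standard_error, b_values, c_values, y_values):
    """Return the (b, c, y) triple with the smallest standard error."""
    return min(product(b_values, c_values, y_values),
               key=lambda triple: standard_error(*triple))


def toy_standard_error(b, c, y):
    # Hypothetical objective with a single minimum; NOT the paper's data.
    return 8.23 + (b - 1.3) ** 2 + (c - 0.91) ** 2 + (y - 0.22) ** 2


best = grid_search(
    toy_standard_error,
    frange(1.0, 1.6, 0.1),    # interval for b from the text
    frange(0.88, 0.94, 0.01), # interval for c from the text
    frange(0.16, 0.26, 0.01), # interval for y from the text
)
print(best)
```

With these assumed step sizes the grid holds 7 × 7 × 11 = 539 triples, so an exhaustive scan stays cheap even though each evaluation involves a best-subsets regression.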
5. MULTIVARIATE LINEAR REGRESSION MODEL
In order to obtain an effective QSPR model, the dataset was divided into two datasets: a training set and a test set. The training and test sets represented 66.67% (726 data points) and 33.33% (364 data points), respectively, of the dataset. The k-Means Cluster Analysis (k-MCA) may be used in designing the training and test sets.³⁹⁻⁴³ From the results of the best subsets regression analysis, it was found that only four indices (H, ¹χ′, ²χ′, and ³χ′p) need to be included in the regression models, so only these four indices were considered in the k-MCA. The k-MCA split the 1090 gaseous organic compounds into eight clusters with 72, 172, 128, 40, 192, 131, 199, and 156 members, respectively. The main results of the k-MCA for the 1090 gaseous organic compounds are given in Table 2. The training and test sets were selected by taking, in a random way, compounds that belong to each cluster.

Figure 1. The results of optimization of parameters b and y for c = 0.91.

Table 2. Main Results of the k-Means Cluster Analysis for the 1090 Gaseous Organic Compounds (Variance Analysis)

molecular connectivity index  between SS (a)  within SS (b)  Fisher ratio, F  p-level (c)
H                             283.17          4.07           69.63            0.00
¹χ′                           5119.10         3.10           1653.37          0.00
²χ′                           8676.27         3.83           2268.41          0.00
³χ′p                          16831.20        5.50           3061.26          0.00

a Variability between groups. b Variability within groups. c Level of significance.

MLR analysis was carried out with MATLAB and SPSS. The regression of the standard absolute entropies, S°298 K, versus H, ¹χ′, ²χ′, and ³χ′p for the 726 gaseous organic compounds (training set), for the parameters b = 1.3, c = 0.91, and y = 0.22, resulted in the best model. The model is

$$ S^{\circ}_{298\,\mathrm{K}} / (\mathrm{J\ K^{-1}\ mol^{-1}}) = 176.1190 - 20.7266\,H + 35.0040\,{}^{1}\chi' - 3.3612\,{}^{2}\chi' - 1.5902\,{}^{3}\chi'_{p} \qquad (7) $$

with the following parameters: n is the number of organic compounds included in the model (n = 726), r the correlation coefficient (r = 0.9988), r² the square of the correlation coefficient (r² = 0.9976), s the standard deviation of the regression (s = 8.16 J K⁻¹ mol⁻¹), F the Fisher ratio (F = 75632.26), AIC the Akaike information criterion (AIC = 67.21), and FIT the Kubinyi function (FIT = 407.72).⁴⁴,⁴⁵ The calculated results from the model described by eq 7 for the 726 gaseous organic compounds are shown under the “Cal.1” column in TS1 (given in the Supporting Information).

Finally, to test the prediction ability of the model described by eq 7, the standard absolute entropies of another 364 gaseous organic compounds (the test set) were calculated from the model and are shown under the “Cal.1” column in TS2 (given in the Supporting Information). The correlation coefficient and the standard deviation are r = 0.9986 and s = 8.47 J K⁻¹ mol⁻¹. The plot of the calculated standard absolute entropies versus the experimental data for all the organic compounds in this study is shown in Figure 2. This figure shows good agreement between the experimental and calculated standard absolute entropy values for the 1090 gaseous organic compounds with diverse structures. The histogram of the prediction errors is shown in Figure 3, where a near-Gaussian error distribution curve centered at zero is seen.

Figure 2. Plot of calculated versus experimental values of standard absolute entropies, using the MLR model described by eq 7.

Figure 3. Distribution of the predicting error, using the MLR model described by eq 7.
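To make the use of eq 7 concrete, the following sketch evaluates it for n-butane from the Table 1 δ′_i values. It is an illustration under stated assumptions rather than the authors' code: the coefficients after the constant term are read as −20.7266, +35.0040, −3.3612, and −1.5902; the molecule is treated as a hydrogen-suppressed graph; each simple path is counted once; and `path_index` and `entropy_eq7` are hypothetical helper names.

```python
import math


def delta_prime(Z, Zv, h, m, x, b=1.3, c=1.0):
    # Eq 4 with m read as a multiplicative factor (reproduces Table 1).
    return c * (Zv - h) * m / (Z - Zv + 0.9) ** b * (x / 2.551)


def path_index(deltas, bonds, order, y=0.22):
    """Path-type index of eq 5: sum over simple paths with `order` bonds."""
    neighbors = {i: set() for i in range(len(deltas))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    paths = set()

    def extend(path):
        if len(path) == order + 1:
            paths.add(min(path, path[::-1]))  # count each path once
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + (nxt,))

    for start in neighbors:
        extend((start,))
    return sum(math.prod(deltas[i] for i in p) ** (-y) for p in paths)


def entropy_eq7(H, chi1, chi2, chi3p):
    # Assumed sign pattern of eq 7; result in J K^-1 mol^-1.
    return 176.1190 - 20.7266 * H + 35.0040 * chi1 - 3.3612 * chi2 - 1.5902 * chi3p


# n-Butane as a hydrogen-suppressed chain: CH3-CH2-CH2-CH3, no ring (H = 0).
ch3 = delta_prime(6, 4, 3, 1.0, 2.551)
ch2 = delta_prime(6, 4, 2, 1.0, 2.551)
deltas = [ch3, ch2, ch2, ch3]
bonds = [(0, 1), (1, 2), (2, 3)]

s_butane = entropy_eq7(
    H=0.0,
    chi1=path_index(deltas, bonds, 1),
    chi2=path_index(deltas, bonds, 2),
    chi3p=path_index(deltas, bonds, 3),
)
print(f"S(n-butane) from eq 7: {s_butane:.1f} J K^-1 mol^-1")
```

Under these assumptions the estimate comes out at roughly 318 J K⁻¹ mol⁻¹. For cyclic molecules the Ring parameter H would also have to be supplied from eq 6.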
6. RESULTS AND DISCUSSION
From Figure 1, it can be found that the local optimal values of parameter y are 0.24, 0.24, 0.23, 0.22, 0.21, 0.20, and 0.19 for b = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, and 1.6, respectively; the pair (b = 1.3, y = 0.22) has the smallest standard deviation (s = 8.2626 J K⁻¹ mol⁻¹) for c = 0.91. In the same way, it can be found that the pair (b = 1.3, y = 0.22) always has the smallest standard deviation (s = 8.2454, 8.2349, 8.2306, 8.2325, 8.2402, and 8.2535 J K⁻¹ mol⁻¹) for c = 0.88, 0.89, 0.90, 0.92, 0.93, and 0.94, respectively. Thus, it can be seen that the best model, with the smallest standard deviation (s = 8.2306 J K⁻¹ mol⁻¹), can be constructed from the set (b = 1.3, c = 0.91, and y = 0.22). In other words, the optimal values of parameters b, c, and y are 1.3, 0.91, and 0.22 in this study.

From the definition of δ′_i, it is known that the δ′_i values are not the same as the original atomic valence connectivity index δ^v_i. The δ′_i parameter defined in this paper varies for different atoms, because of their different electron numbers, valence electron numbers, numbers of hydrogen atoms connected to atom i, values of parameter m_i, and orbital electronegativities. For the different atom groups of an atom in the same valence state, the δ′_i values are unequal, because of the different parameters h_i, m_i, or x_i. For example, all parameters h_i are 1, so the δ^v_i value is 3 for “sssCH”, “dsCH”, and “tCH”. However, because the values of parameter x_i are 2.551, 2.64, and 2.818, and the values of parameter m_i are 1.0, 1.1, and 1.6, the δ′_i values are 0.7516, 0.8556, and 1.3285, respectively, for b = 1.3. In addition, the δ′_i of an atom in the conjugated π-electron system is multiplied by a factor of c, because the π-electron does not belong to one atom only but is shared by several atoms in the conjugated system. For example, the δ′_i values of “dsCH” and “dsCH(c)” are different; they are 0.8556 and 0.7786, respectively, for b = 1.3 and c = 0.91.
These facts show that the definition of the new δ′_i parameter of a skeletal atom expresses both electronic and topological information and can reflect the different chemical environments of the given atom. At the same time, from the definitions of the variable connectivity index ᵐχ′, it can be found that the basic features
Table 3. Statistical Parameters of the Model Described by eq 7 for Several Random Splits

       Training Set              Test Set
No.    r       r²      s         r       r²      s
1      0.9988  0.9975  8.12      0.9987  0.9974  8.48
2      0.9987  0.9975  8.11      0.9987  0.9975  8.53
3      0.9987  0.9974  8.23      0.9988  0.9976  8.28
4      0.9988  0.9975  8.15      0.9987  0.9974  8.44
5      0.9988  0.9975  8.14      0.9987  0.9974  8.47
6      0.9988  0.9975  8.25      0.9987  0.9974  8.23

Table 4. Statistics of Different Subsets for the MLR Model Described by eq 7

subsets                                number of compounds  r       r²      s
aliphatic hydrocarbons                 331                  0.9986  0.9973  7.38
aromatic hydrocarbons                  90                   0.9985  0.9970  6.99
alcohols, ethers, phenols              76                   0.9990  0.9980  7.87
aldehydes, ketones, acids, anhydrides  44                   0.9946  0.9893  8.34
fluoride                               57                   0.9951  0.9902  6.86
chloride                               69                   0.9847  0.9697  8.13
bromide                                50                   0.9881  0.9763  8.51
iodide                                 23                   0.9916  0.9832  5.92
amine                                  24                   0.9854  0.9710  7.93
nitrile, cyanide, diazo-compound       22                   0.9888  0.9778  7.43
sulfide                                126                  0.9997  0.9994  6.02
others                                 178                  0.9980  0.9960  6.78
of the original molecular connectivity indices are maintained in our variable valence molecular connectivity indices. When y = 0.5, the variable molecular connectivity index ᵐχ′ has the same functional form as the molecular valence connectivity index ᵐχ^v_k. Therefore, the variable valence molecular connectivity indices preserve the features of the original molecular connectivity indices and may be useful for QSAR and QSPR studies.

It is useful to be able to estimate standard entropy data. First, there is a paucity of standard entropy data for compounds in standard thermochemical tables.⁸⁻¹⁴ Second, experimental determination of the absolute entropy (S°298 K) by calorimetry is both a lengthy and nontrivial procedure. Such measurements are no longer a fashionable science and, for this reason, increasing reliance must be placed on estimation techniques for thermochemical data. The procedure can be applied to new (or even hypothetical) gaseous organic compounds, as well as to already-synthesized, existing gaseous organic compounds. The ultimate importance of the approach is in its use to estimate changes in the Gibbs energy for reactions. In the estimation of Gibbs energy data via the TΔS contribution to the ΔG term (in units of kJ mol⁻¹) at 298 K, the value of ΔS (given in units of J K⁻¹ mol⁻¹), derived as the difference between the absolute standard entropies of products and reactants, is multiplied by the factor T/(1000 K) = 0.298. Effectively, a consequence of this factor is that a larger error can be tolerated in the standard entropies. This significant point enhances the value of the correlation reported here. Furthermore, the TΔS term in ΔG is generally quite small, relative to ΔH, at or near room temperature, where much of chemistry is studied, so that rule-of-thumb procedures are likely to prove suitable, even when the entropy may be somewhat in error.
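As a rough worked illustration of the scaling by T/(1000 K), suppose (purely as an assumption for this example) that an estimated entropy change ΔS were in error by the training-set standard error of the model, 8.16 J K⁻¹ mol⁻¹. The corresponding error in the TΔS term at 298 K would then be about

```latex
\[
\delta(T\Delta S) \approx \frac{T}{1000\ \mathrm{K}}\,\delta(\Delta S)
= 0.298 \times 8.16 \approx 2.43\ \mathrm{kJ\ mol^{-1}}
\]
```

This treats the error in ΔS as a single number and ignores how the errors of the individual product and reactant entropies combine, so it indicates an order of magnitude only.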
The theoretical prediction of the standard entropy of gaseous organic compounds is a complicated task that requires information from the electron, atom, and molecule levels. Obviously, the limiting factor here is not the use of a particular QSPR method but rather molecular descriptors that fail to account for all the details of the underlying system. The result of the MLR analysis has shown that the linear model described by eq 7 is a good fit, with a correlation coefficient (r) of 0.9988. The model explains 99.76% of the variance in the experimental values of the absolute standard entropy for these organic compounds. From TS1 and TS2 (given in the Supporting Information), one can observe that there are 65 compounds (5.96%) showing deviations greater than two standard errors (16.32 J K⁻¹ mol⁻¹). The greatest error in this dataset is observed for cyclopropane, for which the present approach gives an error of 38.91 J K⁻¹ mol⁻¹. As shown in column Cal.1 in TS1 (given in the Supporting Information), the calculated values agree well with the available
experimental ones. The AAD of the 726 gaseous organic compounds (the training set) is 6.13 J K⁻¹ mol⁻¹. The model that is described by eq 7 has been verified by cross validation, using the leave-one-out method; the cross-validated correlation coefficient r_cv and the normal r are 0.9988 and 0.9988, respectively, and the standard deviations s_cv and s are 8.23 and 8.16 J K⁻¹ mol⁻¹, respectively. These data reveal that the cross-validation result is very close to the normal result for the model described by eq 7, which means that the model constructed in this work is stable.

To test the prediction ability of the model described by eq 7, the standard absolute entropies of another 364 gaseous organic compounds were calculated from the model. The calculated values are listed in column Cal.1 in TS2 (given in the Supporting Information). The predicted values agree well with the experimental values, and the AAD is 6.14 J K⁻¹ mol⁻¹. Finally, to prove the stability of the model described by eq 7, the model has been estimated for several random splits into a training set (726 data points) and a test set (364 data points); the results are shown in Table 3. From Table 3, one can determine that the correlation coefficients, the squares of the correlation coefficients, and the standard deviations of the splits are very close to each other for the training set and the test set. The results show that the model that is described by eq 7 is very stable.

In this paper, the working dataset contained eight nonhydrogen elements (C, N, O, S, F, Cl, Br, and I) and included hydrocarbons, nonhydrocarbons, and their substituted compounds. Therefore, our method should have a broad application domain. On the other hand, the range of the experimental values of the standard absolute entropies is large, from 200 J K⁻¹ mol⁻¹ to 1000 J K⁻¹ mol⁻¹.
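The statistics quoted throughout this section (r, s, AAD, and the count of deviations beyond two standard errors) can be computed as in the sketch below. The paper does not spell out its formulas, so the definitions here are assumptions: s is taken as the usual standard error of regression with n − p − 1 degrees of freedom, and r as the Pearson correlation between experimental and calculated values. The numbers in the example are made up for illustration and are not from TS1 or TS2.

```python
import math


def fit_statistics(experimental, calculated, n_params):
    """Return (r, s, aad, n_outliers) for paired entropy values.

    Assumed definitions: s = sqrt(SSE / (n - n_params - 1));
    outliers are deviations larger than two standard errors.
    """
    n = len(experimental)
    errors = [c - e for e, c in zip(experimental, calculated)]
    aad = sum(abs(err) for err in errors) / n
    s = math.sqrt(sum(err * err for err in errors) / (n - n_params - 1))
    mean_e = sum(experimental) / n
    mean_c = sum(calculated) / n
    cov = sum((e - mean_e) * (c - mean_c)
              for e, c in zip(experimental, calculated))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    r = cov / math.sqrt(var_e * var_c)
    n_outliers = sum(1 for err in errors if abs(err) > 2 * s)
    return r, s, aad, n_outliers


# Made-up values in J K^-1 mol^-1, for illustration only.
exp_values = [200.0, 250.0, 300.0, 350.0, 400.0, 450.0]
calc_values = [206.0, 244.0, 305.0, 343.0, 404.0, 452.0]
r, s, aad, n_outliers = fit_statistics(exp_values, calc_values, n_params=1)
print(f"r = {r:.4f}, s = {s:.2f}, AAD = {aad:.2f}, outliers = {n_outliers}")
```

For the four-descriptor model of eq 7, `n_params` would be 4. The two-standard-error threshold corresponds to the 16.32 J K⁻¹ mol⁻¹ limit used in the text for s = 8.16 J K⁻¹ mol⁻¹.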
In this case, it is necessary to analyze the statistics of different subsets and compare the values of the standard deviation (s), the correlation coefficient (r), and the square of the correlation coefficient (r²). The results of the model that is described by eq 7 for the different subsets of gaseous organic compounds are shown in Table 4. From Table 4, it can be found that the standard deviations of the subsets are close to the overall result of the model described by eq 7, although the r and r² values are slightly lower than the overall result for some subsets. The results show that the model described by eq 7 is sufficiently good. In other words, the model
Figure 4. Plot of calculated versus experimental values of standard absolute entropies, using the MLR model that is described by eq 8.

Figure 5. Distribution of predicting error, using the MLR model that is described by eq 8.
that is described by eq 7, constructed from the variable molecular connectivity index (ᵐχ′) and the Ring parameter (H), can be used to predict the standard absolute entropies of gaseous organic compounds with extensive structural diversity.

To explain the effect of the Ring parameter H, the best four-parameter regression model without H has been constructed by the best subsets regression of the absolute standard entropy (S°298 K) versus ⁰χ′, ¹χ′, ²χ′, ³χ′p, ³χ′c, ⁴χ′p, ⁴χ′c, ⁵χ′p, ⁴χ′pc, ⁵χ′pc, and χ′ch of 726 gaseous organic compounds. The result is as follows:

$$ S^{\circ}_{298\,\mathrm{K}} / (\mathrm{J\ K^{-1}\ mol^{-1}}) = 160.8373 + 33.6073\,{}^{1}\chi' - 2.6185\,{}^{5}\chi'_{p} - 1.6812\,{}^{4}\chi'_{pc} - 38.5026\,\chi'_{ch} \qquad (8) $$

with n = 726, r = 0.9948, r² = 0.9923, s = 17.11 J K⁻¹ mol⁻¹, F = 17047.87, AIC = 295.74, and FIT = 91.90. To test the prediction ability of the model described by eq 8, the absolute standard entropies of another 364 gaseous organic compounds were calculated from the model, and these are listed in the Cal.2 column of TS2 (given in the Supporting Information). The correlation coefficient, the standard deviation, and the AAD are r = 0.9923, s = 19.68 J K⁻¹ mol⁻¹, and 13.34 J K⁻¹ mol⁻¹, respectively. The plot of the standard absolute entropies calculated from the model that is described by eq 8 versus the experimental data for all the gaseous organic compounds in this study is shown in Figure 4. The histogram of the prediction errors from the model described by eq 8 for the 1090 gaseous organic compounds is shown in Figure 5. Comparison with the model that is described by eq 8 shows that the statistical parameters of the model that is described by eq 7 are improved: the values of r, r², F, and FIT increased, and the values of s and AIC decreased. Figures 3 and 5 indicate that, compared with the model that is described by eq 8, the standard deviation of the predicted values from the model described by eq 7 is decreased by 54.11% for the 1090 gaseous organic compounds. The results show that a more accurate model for the prediction of the standard absolute entropies of gaseous organic compounds can be constructed by combining the novel valence molecular connectivity indices ᵐχ′ and the Ring parameter H into an MLR model.

The most common software packages used in chemical engineering design incorporate efficient algorithms for the prediction
of thermodynamic and physical properties of interest, by means of the GCM.4648 These techniques are easy to apply, relying solely on the sum of contributions of each molecular structure fragment to a given thermodynamic property, but it suffers from some drawbacks: one is that, in its basic form (without corrections), it cannot model isomeric structure; this is not a problem for small organic compounds, although the situation gets worse for bigger size compounds with increasing number of conformers. The second important associated problem is that there are not always measured data available to extend these methods to less-common compounds, such as molecules containing fused aromatic rings or to organometallic compounds. To compare our method with the GCM, the 54 compounds from ref 23 are selected as the investigation dataset, including hydrocarbon and nonhydrocarbon structures. The regression result of the calculated values of the GCM (from ref 23) versus the experimental values (from this paper) shows that the correlation coefficient, the standard deviation, and the AAD are r = 0.9655, s = 23.86 J K1 mol1, and AAD = 11.28 J K1 mol1. According to our method, the standard absolute entropies can be calculated from the model described by eq 7 for the 54 compounds. The correlation coefficient, the standard deviation, and the AAD are r = 0.9980, s = 5.92 J K1 mol1, and AAD = 4.74 J K1 mol1. The results show that the current method is more effective than the GCM for the prediction of the standard absolute entropies of organic compounds. The GCM mentioned above include the statistical mechanics term, R ln σ, where σ is the rotational symmetry number consisting of the both internal and external symmetry numbers. This term arises due to overcounting of the rotational contribution to the entropy for molecules with some rotational symmetry. When the term R ln σ is added to the model described by eq 7, a five-parameter regression model can be obtained. 
The resulting five-parameter model is as follows:

$$S^{\circ}_{298\,\mathrm{K}}\ (\mathrm{J\,K^{-1}\,mol^{-1}}) = 179.1015 - 20.6132H + 34.8655\,{}^{1}\chi' - 3.3502\,{}^{2}\chi' - 1.6056\,{}^{3}\chi'_{p} - 0.5494R\ln\sigma \qquad (9)$$

with n = 726, r = 0.9990, r² = 0.9979, s = 7.63 J K⁻¹ mol⁻¹, F = 69192.19, AIC = 58.95, and FIT = 460.67. To test the prediction ability of the model described by eq 9, the standard absolute entropies of another 364 gaseous organic compounds were calculated from it. The correlation coefficient, the standard deviation, and the AAD are r = 0.9987, s = 8.00 J K⁻¹ mol⁻¹, and AAD = 5.88 J K⁻¹ mol⁻¹. Comparison with the model described by eq 7 shows that the statistical parameters of the model described by eq 9 are improved: r, r², and FIT increase, whereas F, s, and AIC decrease. The results show that the prediction ability of the model described by eq 9 is slightly improved by including the term R ln σ.

To compare the novel valence molecular connectivity indexes and the Ring parameter H with other types of descriptors derived from the molecular structure, 1664 descriptors for the 1090 gaseous organic compounds were calculated using the Dragon software provided by Talete srl. After excluding constant, near-constant, and highly correlated variables, 760 descriptors remained and were used to perform the MLR analysis. The best four-parameter regression model for the 726 gaseous organic compounds (training set) is constructed from SP (sum of atomic polarizabilities (scaled on the carbon atom)), RBN (number of rotatable bonds), S1K (1-path Kier alpha-modified shape index), and CIC2 (complementary information content (neighborhood symmetry of 2-order)). The model is shown as follows:

$$S^{\circ}_{298\,\mathrm{K}}\ (\mathrm{J\,K^{-1}\,mol^{-1}}) = 181.3361 + 5.6793\,\mathrm{SP} + 13.3823\,\mathrm{RBN} + 16.2609\,\mathrm{S1K} + 24.9314\,\mathrm{CIC2} \qquad (10)$$

with n = 726, r = 0.9978, r² = 0.9957, s = 11.00 J K⁻¹ mol⁻¹, F = 41555.41, AIC = 122.08, and FIT = 224.02. To test the prediction ability of the model described by eq 10, the standard absolute entropies of another 364 gaseous organic compounds were calculated from it; these are listed in the Cal.3 column of TS2 (given in the Supporting Information). The correlation coefficient, the standard deviation, and the AAD are r = 0.9976, s = 10.93 J K⁻¹ mol⁻¹, and AAD = 8.21 J K⁻¹ mol⁻¹. The plot of the standard absolute entropies calculated from the model described by eq 10 versus the experimental data for all the organic compounds in this study is shown in Figure 6. The histogram of prediction errors from the model described by eq 10 for the 1090 gaseous organic compounds is shown in Figure 7.

Figure 6. Plot of calculated versus experimental values of standard absolute entropies, using the MLR model that is described by eq 10.

Figure 7. Distribution of prediction errors, using the MLR model that is described by eq 10.

Comparison with the model described by eq 10 shows that the statistical parameters of the model described by eq 7 are better: r, r², F, and FIT are higher, and s and AIC are lower. Figures 3 and 7 show that, relative to the model described by eq 10, the standard deviation of the values predicted by the model described by eq 7 is lower by 24.86% for the 1090 gaseous organic compounds. The results show that the novel valence molecular connectivity indexes, together with the Ring parameter H, predict the standard absolute entropies of organic compounds more accurately than the other types of classic molecular descriptors considered.
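The descriptor-screening step used before the MLR analysis on the Dragon descriptors (dropping constant, near-constant, and highly correlated variables) can be sketched as follows. This is only an illustrative sketch in Python with NumPy: the function name `prune_descriptors`, the three thresholds, and the greedy rule of keeping the first column of any highly correlated pair are assumptions, not the settings actually applied in this work.

```python
import numpy as np


def prune_descriptors(X, names, min_std=1e-8, max_dominant_frac=0.95, max_abs_corr=0.95):
    """Drop constant, near-constant, and highly intercorrelated descriptor columns.

    X is an (n_compounds, n_descriptors) array and names labels its columns.
    All thresholds are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Constant descriptor: no spread at all.
        if np.std(col) <= min_std:
            continue
        # Near-constant descriptor: one value dominates almost every compound.
        _, counts = np.unique(col, return_counts=True)
        if counts.max() / col.size > max_dominant_frac:
            continue
        # Highly correlated with a descriptor that has already been kept.
        if any(abs(np.corrcoef(col, X[:, k])[0, 1]) > max_abs_corr for k in keep):
            continue
        keep.append(j)
    return X[:, keep], [names[j] for j in keep]


# Small synthetic descriptor table (hypothetical data, 40 "compounds").
rows = np.arange(40, dtype=float)
X = np.column_stack([
    rows,                        # informative descriptor
    np.ones(40),                 # constant
    (rows == 0).astype(float),   # near-constant: 39 of 40 values identical
    2.0 * rows + 1.0,            # perfectly correlated with the first column
    rows % 3,                    # only weakly correlated with the first column
])
names = ["a", "const", "near_const", "dup_of_a", "indep"]

X_kept, kept = prune_descriptors(X, names)
print(kept)
```

The surviving columns would then be passed to a best-subsets search over candidate MLR models; that step is not shown here.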
7. CONCLUSION
To predict the standard absolute entropy of an organic compound, a variable molecular connectivity index (${}^{m}\chi'$) and the Ring parameter (H) have been proposed, based on the adjacency matrix of molecular graphs, the variable atomic valence connectivity index $\delta'_{i}$, and the numbers of atomic chains (cycles) of molecule niR. The optimal values of the parameters b, c, mi, and y included in the definitions of $\delta'_{i}$ and ${}^{m}\chi'$ can be found using an optimization method. When b = 1.3, c = 0.91, and y = 0.22, a good four-parameter QSPR model for the standard absolute entropies can be constructed from H and ${}^{m}\chi'$, using the best subsets regression analysis method. The correlation coefficient r, standard error s, and average absolute deviation (AAD) of the MLR model are 0.9988, 8.16 J K⁻¹ mol⁻¹, and 6.13 J K⁻¹ mol⁻¹, respectively, for the 726 gaseous organic compounds (the training set). Cross validation using the leave-one-out method demonstrates that the MLR model is statistically highly reliable. The AAD of the predicted values of the standard absolute entropies of another 364 gaseous organic compounds (the test set) is 6.14 J K⁻¹ mol⁻¹ for the MLR model. The results show that the MLR method can provide an accurate model for the prediction of the standard absolute entropies of gaseous organic compounds.

ASSOCIATED CONTENT
Supporting Information. Experimental and calculated standard absolute entropies of 726 gaseous organic compounds (the training set) and another 364 gaseous organic compounds (the test set), and values of the standard error of estimate for various values of the parameters b, c, and y (PDF). This material is available free of charge via the Internet at http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
*Tel: +86-516-83403061. Fax: +86-516-83403067. E-mail address: [email protected].
ACKNOWLEDGMENT
This work is supported by the university natural science foundation of Jiangsu Province in China (through Contract Grant No. 04KJD150195). The authors express their gratitude to the referees for their valuable comments.

REFERENCES
(1) Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976; p 181.
(2) Balaban, A. T. Applications of Graph Theory in Chemistry. J. Chem. Inf. Comput. Sci. 1985, 25, 334–343.
(3) Balaban, A. T. Chemical Graphs: Looking Back and Glimpsing Ahead. J. Chem. Inf. Comput. Sci. 1995, 35, 339–350.
(4) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17–20.
(5) Randic, M. Characterization of Molecular Branching. J. Am. Chem. Soc. 1975, 97, 6609–6615.
(6) Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Research Studies: Letchworth, England, 1986.
(7) Devillers, J.; Balaban, A. T. Topological Indices and Related Descriptors in QSPR and QSAR; Gordon and Breach: Amsterdam, 1999.
(8) David, R. L.; Grace, B.; Lev, I. B.; Robert, N. G.; Henry, V. K.; Kozo, K.; Gerd, R.; Dana, L. R.; Daniel, Z. CRC Handbook of Chemistry and Physics, 85th Edition; CRC Press: Boca Raton, FL, 2002; pp 5-5–5-60.
(9) Dean, J. A. Lange's Handbook of Chemistry, 15th Edition; McGraw-Hill: New York, 1999; pp 6.81–6.123.
(10) Wagman, D. D.; Evans, W. H.; Parker, V. B.; Schumm, R. H.; Nutall, R. L. Selected Values of Chemical Thermodynamic Properties; U.S. Department of Commerce, National Bureau of Standards: Washington, DC, 1982.
(11) Robie, R. A.; Hemingway, B. S.; Fisher, J. R. Thermodynamic Properties of Minerals and Related Substances at 298.15 K and 1 bar (10^5 Pascals) Pressure and at Higher Temperatures; Geological Survey Bulletin 1452; U.S. Government Printing Office: Washington, DC, 1978.
(12) Saxena, S. K.; Chatterjee, N.; Fei, Y.; Shen, G. Thermodynamic Data on Oxides and Silicates; Springer: Berlin, 1993.
(13) Scientific database, http://www.enginchem.csdb.cn/sdb_2004/all_thermochemistry.html.
(14) Thermophysical properties database, http://www.fiz-chemie.de/infotherm/servlet/infothermSearch.
(15) Jenkins, H. D. B.; Roobottom, H. K.; Passmore, J.; Glasser, L. Relationships among Ionic Lattice Energies, Molecular (Formula Unit) Volumes, and Thermochemical Radii. Inorg. Chem. 1999, 38, 3609–3620.
(16) Glasser, L.; Jenkins, H. D. B. Lattice Energies and Unit Cell Volumes of Complex Ionic Solids. J. Am. Chem. Soc. 2000, 122, 632–638.
(17) Jenkins, H. D. B.; Tudela, D.; Glasser, L. Lattice Potential Energy Estimation for Complex Ionic Salts from Density Measurements. Inorg. Chem. 2002, 41, 2364–2367.
(18) Jenkins, H. D. B.; Glasser, L. Ionic Hydrates, MpXq·nH2O: Lattice Energy and Standard Enthalpy of Formation Estimation. Inorg. Chem. 2002, 41, 4378–4388.
(19) Jenkins, H. D. B.; Roobottom, H. K.; Passmore, J. Estimation of Enthalpy Data for Reactions Involving Gas Phase Ions Utilizing Lattice Potential Energies: Fluoride Ion Affinities (FIA) and pF Values of mSbF5(l) and mSbF5(g) (m = 1, 2, 3), AsF5(g), AsF5·SO2(c). Standard Enthalpies of Formation: ΔfH°(SbmF5m+1−,g) (m = 1, 2, 3), ΔfH°(AsF6−,g), and ΔfH°(NF4+,g). Inorg. Chem. 2003, 42, 2886–2893.
(20) Christe, K. O.; Jenkins, H. D. B. Quantitative Measure for the “Nakedness” of Fluoride Ion Sources. J. Am. Chem. Soc. 2003, 125, 9457–9461.
(21) Latimer, W. M. Methods of Estimating the Entropies of Solid Compounds. J. Am. Chem. Soc. 1951, 73, 1480–1482.
(22) Fyfe, W. S.; Turner, F. J.; Verhoogen, J. Metamorphic Reactions and Metamorphic Facies; Geological Society of America: Boulder, CO, 1958; Memoir 73.
(23) Rihani, D. N.; Doraiswamy, L. K. Estimation of the ideal gas entropy of organic compounds. Ind. Eng. Chem. Fundam. 1968, 7, 375–380.
(24) Duchowicz, P. R.; Castro, E. A.; Fernandez, F. M.; Pankratov, A. N. QSPR evaluation of thermodynamic properties of acyclic and aromatic compounds. J. Argent. Chem. Soc. 2006, 94, 31–45.
(25) Jenkins, H. D. B.; Glasser, L. Standard Absolute Entropy, S298°, Values from Volume or Density, 1. Inorganic Compounds. Inorg. Chem. 2003, 42, 8702–8708.
(26) Glasser, L.; Jenkins, H. D. B. Standard absolute entropies, S298°, from volume or density, Part II. Organic liquids and solids. Thermochim. Acta 2004, 414, 125–130.
(27) Mu, L. L.; Feng, C. J. Topological Research of Standard Entropies of Alkanes. Chin. J. Chem. Phys. 2003, 16, 197–202.
(28) Mu, L. L.; Feng, C. J. Topological Research on Standard Entropies of Alkali and Alkaline Earth Metal Compounds. Chin. J. Chem. Phys. 2003, 16, 19–24.
(29) Mu, L. L.; Feng, C. J. Quantitative Structure Property Relations (QSPRs) for Predicting Standard Absolute Entropy, S298°, of Inorganic Compounds. MATCH 2007, 57, 111–134.
(30) Mu, L. L.; He, H. M.; Feng, C. J. Topological Research on Standard Absolute Entropies, S298°, for Binary Inorganic Compounds. Chin. J. Chem. 2008, 26, 1201–1209.
(31) Feng, C. J. Topological Researches on the F Center Energy Bands, Lattice Energy and Standard Entropy of Alkaline Halides. Chin. J. Struct. Chem. 2004, 23, 556–559.
(32) Feng, C. J.; Mu, L. L. Edge valence topological study on standard entropies of chain hydrocarbons. Chin. J. Beijing Univ. Chem. Technol. 2005, 32 (6), 57–60.
(33) Mu, L. L.; He, H. M.; Feng, C. J. Modeling Diamagnetic Susceptibilities of Organic Compounds with the Novel Connectivity Index. Ind. Eng. Chem. Res. 2008, 47, 2428–2433.
(34) Mu, L. L.; He, H. M.; Feng, C. J. Topological research on diamagnetic susceptibilities of organic compounds. J. Mol. Model. 2008, 14, 109–134.
(35) Mu, L. L.; He, H. M.; Yang, W. H.; Feng, C. Improved QSPR Study of Diamagnetic Susceptibilities for Organic Compounds Using Two Novel Molecular Connectivity Indexes. Chin. J. Chem. 2009, 27, 1045–1054.
(36) Mu, L. L.; He, H. M.; Yang, W. H.; Feng, C. Variable Molecular Connectivity Indices for Predicting the Diamagnetic Susceptibilities of Organic Compounds. Ind. Eng. Chem. Res. 2009, 48 (8), 4165–4175.
(37) Mu, L. L.; He, H. M. The application of variable molecular connectivity indices for topological research on diamagnetic susceptibilities of organic compounds. Chin. J. Chem. Ind. Eng. 2009, 60, 1859–1872.
(38) Li, G. Y. A new graduation of orbital electronegativity. Chin. Univ. Chem. 2006, 16 (5), 57–61.
(39) Gonzalez, M. P.; Helguera, A. M.; Cabrera, M. A. Quantitative structure-activity relationship to predict toxicological properties of benzene derivative compounds. Bioorg. Med. Chem. 2005, 13, 1775–1781.
(40) Gonzalez, M. P.; Dias, L. C.; Helguera, A. M.; Rodriguez, Y. M.; de Oliveira, L. G.; Gomez, L. T.; Diaz, H. G. TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new anti-inflammatory compounds. Bioorg. Med. Chem. 2004, 12, 4467–4475.
(41) Molina, E.; Diaz, H. G.; Gonzalez, M. P.; Rodriguez, E.; Uriarte, E. Designing antibacterial compounds through a topological substructural approach. J. Chem. Inf. Comput. Sci. 2004, 44, 515–521.
(42) Gonzalez, M. P.; Gonzalez Diaz, H.; Molina Ruiz, R.; Cabrera, M. A.; Ramos de Armas, R. TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new herbicides. J. Chem. Inf. Comput. Sci. 2003, 43, 1192–1199.
(43) Gonzalez, M. P.; Teran, C.; Fall, Y.; Diaz, L. C.; Morales, A. H. A topological sub-structural approach to the mutagenic activity in dental monomers. 3. Heterogeneous set of compounds. Polymer 2005, 46, 2783–2790.
(44) Saiz-Urra, L.; Gonzalez, M. P.; Teijeira, M. QSAR studies about cytotoxicity of benzophenazines with dual inhibition toward both topoisomerases I and II: 3D-MoRSE descriptors and statistical considerations about variable selection. Bioorg. Med. Chem. 2006, 14, 7347–7358.
(45) Saiz-Urra, L.; Gonzalez, M. P.; Teijeira, M. 2D-autocorrelation descriptors for predicting cytotoxicity of naphthoquinone ester derivatives against oral human epidermoid carcinoma. Bioorg. Med. Chem. 2007, 15, 3565–3571.
(46) Predict program, http://www.mwsoftware.com/dragon/desc.html.
(47) ChemEng Software Design, http://www.cesd.com/chempage.htm.
(48) Artist program, http://www.ddbst.de/new/Win_DDBSP/frame_Artist.htm.