Determination of Diffusion Coefficient of Organic Compounds in Water

Article pubs.acs.org/IECR

Determination of Diffusion Coefficient of Organic Compounds in Water Using a Simple Molecular-Based Method

Farhad Gharagheizi*

Department of Chemical Engineering, Buinzahra Branch, Islamic Azad University, Buinzahra, Iran

Supporting Information available

ABSTRACT: In this study, a new simple three-parameter equation is presented for calculation/prediction of the diffusion coefficient of nonelectrolyte organic compounds in water at infinite dilution. The model variables are three molecular-based descriptors. The model is developed using the genetic function approximation (GFA) method, which is applied to select the model parameters from more than 3000 molecular-based parameters. To propose a comprehensive and predictive model, 4728 pure chemical compounds are investigated. Furthermore, several statistical methods are implemented to evaluate the predictive power of the model. The root-mean-square error and the average relative deviation of the model are approximately equal to $3.13 \times 10^{-6}$ cm$^2$·s$^{-1}$ and 3.6%, respectively.

Industrial & Engineering Chemistry Research, Article. Received: August 29, 2011. Revised: December 17, 2011. Accepted: January 4, 2012. Published: January 4, 2012. dx.doi.org/10.1021/ie201944h | Ind. Eng. Chem. Res. 2012, 51, 2797−2803. © 2012 American Chemical Society

1. INTRODUCTION

The term “molecular diffusion” is defined as the net transport of material within a single phase as a result of its nonuniform distribution.1 Molecular diffusion in binary mixtures is usually described by the well-known Fick’s laws. Fick’s first law applies to steady (time-independent) problems and is defined as

$$J_{Ax} = -D_{AB}\frac{\partial c_A}{\partial x} \qquad (1)$$

where $J_{Ax}$ is the mass flux of substance A in the x direction, $c_A$ is the concentration of A, and $D_{AB}$ is the diffusion coefficient of A in B. Since the majority of processes in the real world are unsteady (time-dependent), Fick’s second law should be used:2−4

$$\frac{\partial c_A}{\partial t} = D_{AB}\frac{\partial^2 c_A}{\partial x^2} \qquad (2)$$

To employ Fick’s laws, the value of $D_{AB}$ must be known. Several points highlight the need for predictive computational methods for determining $D_{AB}$. Experimental $D_{AB}$ data are not available for many chemicals currently in use, particularly new compounds. Moreover, measuring $D_{AB}$ experimentally is expensive and time-consuming and requires special facilities. The estimation of the diffusion coefficient of organic compounds in liquid solutions has been the subject of many studies. These studies have been based on calculating the diffusion coefficient for an infinitely dilute solution of A in B, which is called the diffusion coefficient at infinite dilution ($D_{AB}^{\circ}$). In the majority of applications, $D_{AB}^{\circ}$ and $D_{AB}$ have almost identical values for concentrations of A of 5−10 mol %.1 The first model for $D_{AB}^{\circ}$ is the one proposed by Wilke and Chang,5 based on an empirical modification of the Stokes−Einstein equation:

$$D_{AB}^{\circ} = \frac{7.4\times10^{-8}\,(\varphi M_B)^{1/2}\,T}{\eta_B V_A^{0.6}} \qquad (3)$$

where $\varphi$, $M_B$, $\eta_B$, $V_A$, and $T$ are, respectively, the association factor of solvent B (dimensionless), the molecular weight of solvent B (g/mol), the viscosity of solvent B (cP), the molar volume of solute A at its normal boiling point (cm$^3$/mol), and the temperature (K). $D_{AB}^{\circ}$ is in cm$^2$·s$^{-1}$. Comparing with experimental values of $D_{AB}^{\circ}$ for different compounds at different temperatures, they obtained an average relative deviation (ARD%) of 10%. Since then, several modifications of the Wilke and Chang correlation have been proposed.6−10 A comparison of these modifications reveals that none of them has been generally accepted among researchers.1 In another study, Tyn and Calus11 proposed a model for estimation of $D_{AB}^{\circ}$. This model is as follows:

$$D_{AB}^{\circ} = 8.93\times10^{-8}\left(\frac{V_A}{V_B}\right)^{1/6}\left(\frac{P_B}{P_A}\right)^{0.6}\frac{T}{\eta_B} \qquad (4)$$

where $V_A$, $V_B$, $P_A$, $P_B$, $T$, and $\eta_B$ are, respectively, the molar volumes of solute A and solvent B, the parachors of solute A and solvent B, the temperature (K), and the viscosity of solvent B (cP). $D_{AB}^{\circ}$ is in cm$^2$·s$^{-1}$. This model has some drawbacks; for example, it cannot predict $D_{AB}^{\circ}$ of components in viscous solvents. A comparison between the model outputs and experimental values demonstrated an average relative deviation of 9%.1 Another limitation of the model is the need for parachor values. Experimental parachor values are rare and may not be available for the majority of compounds. Hayduk and Minhas12 suggested another correlation for $D_{AB}^{\circ}$ of components in water, given by eq 5.
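To make the use of such property-based correlations concrete, the following minimal Python sketch evaluates the Wilke−Chang relation (eq 3). The function name, argument names, and default values are illustrative assumptions rather than part of the original work: the defaults take water as the solvent with an association factor of 2.6 and a molar mass of 18.015 g/mol, and the solute in the usage example is hypothetical.

```python
def wilke_chang_diffusivity(temperature_k, solvent_viscosity_cp, solute_molar_volume,
                            solvent_molar_mass=18.015, association_factor=2.6):
    """Infinite-dilution diffusion coefficient (cm^2/s) from the Wilke-Chang form of eq 3.

    temperature_k        -- temperature T in K
    solvent_viscosity_cp -- solvent viscosity eta_B in cP
    solute_molar_volume  -- solute molar volume V_A at the normal boiling point, cm^3/mol
    solvent_molar_mass   -- M_B in g/mol (assumed default: water)
    association_factor   -- phi, dimensionless (assumed default: 2.6 for water)
    """
    if min(temperature_k, solvent_viscosity_cp, solute_molar_volume) <= 0:
        raise ValueError("temperature, viscosity and molar volume must be positive")
    numerator = 7.4e-8 * (association_factor * solvent_molar_mass) ** 0.5 * temperature_k
    return numerator / (solvent_viscosity_cp * solute_molar_volume ** 0.6)


if __name__ == "__main__":
    # Hypothetical solute (V_A = 100 cm^3/mol) in water at 298.15 K,
    # taking the viscosity of water as roughly 0.89 cP (assumed value).
    d = wilke_chang_diffusivity(298.15, 0.89, 100.0)
    print(f"D = {d:.3e} cm^2/s")
```

The sketch shows why this class of correlation depends on auxiliary properties: the solvent viscosity and the solute molar volume must both be known before any estimate can be made.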

$$D_{AB}^{\circ} = 1.25\times10^{-8}\left(V_A^{-0.19} - 0.292\right)T^{1.52}\,\eta_w^{\,(9.58/V_A - 1.12)} \qquad (5)$$

where $\eta_w$ is the viscosity of water (cP). The remaining parameters are the same as in eq 3. They reported the same average relative deviation of 10% using the same data set used by Tyn and Calus.11 Nakanishi13 developed a model for estimation of $D_{AB}^{\circ}$ based on the molar volumes of solute and solvent, the temperature, and the viscosity of the solvent. This multiparameter correlation showed an average relative deviation of 13%. Recently, two models were proposed by the author and his co-workers for representation/prediction of the diffusion coefficient of nonelectrolyte organic compounds in water at infinite dilution.2,3 The first model is a quantitative structure−property relationship model based on a small data set of 320 organic compounds.2 The model showed a squared correlation coefficient of 0.98. In another study employing a data set of 4852 organic compounds, a complicated artificial neural network-group contribution (ANN-GC) model was proposed. The model showed an average relative deviation of 1.5%.3

All the aforementioned methods for estimation of $D_{AB}^{\circ}$ can be categorized into two main classes. In the first group, $D_{AB}^{\circ}$ is correlated with other physical or chemical properties. This may be regarded as a drawback because those properties are not always available, and when they are missing no estimate can be made. All of the methods reviewed, except those proposed by the author and co-workers, are examples of this category. Although the models in this category can give a temperature- and/or pressure-dependent $D_{AB}^{\circ}$, they cannot be employed for a wide range of compounds, because noncomprehensive data sets were used to develop them. The second group of models is based on the studies of the author and his co-workers.

The molecular structure of chemical compounds contains much valuable information that can be used to study their behavior, particularly their physical and chemical properties. The quantitative structure−property relationship (QSPR) technique employs parameters called “molecular descriptors” to describe chemical structure.2−4,14−48 Developing predictive models using these parameters is the main idea pursued in QSPR. The simplest form of QSPR is perhaps the group contribution (GC) method, in which the occurrences of various chemical groups are counted and then used to develop a model. These methods (QSPR and particularly GC) were used to develop models for $D_{AB}^{\circ}$ in water. The major advantage of this category of correlations is that the models are more likely to be predictive, because statistical techniques are applied to evaluate their predictive power. In this study, the major aim is to develop a simple, accurate, and comprehensive QSPR model for prediction of $D_{AB}^{\circ}$ of nonelectrolyte organic compounds in water ($D_{AW}^{\circ}$).

2. MATERIALS AND METHODS

2.1. Materials. The quality of a model directly depends on the reliability and accuracy of the data set employed for its development. In addition, the comprehensiveness of the data set determines the applicability range of the model. Comprehensiveness covers both the diversity of the investigated chemical families and the number of pure compounds available in the data set. In this work, the database provided by Yaws49 was used, which is one of the most comprehensive sources of property data for organic compounds, for example, $D_{AW}^{\circ}$ at 298 K. The $D_{AW}^{\circ}$ values for the 4728 organic compounds found in the database are used as the main data set in this study.

2.2. Providing the Molecular Descriptors. To obtain a QSPR model, the required input parameters are molecular descriptors of the compounds. The descriptors were calculated as follows: the chemical species were drawn separately in HyperChem50 and preoptimized using the MM+ molecular mechanics force field. A more precise optimization was then done with the semiempirical RM1 method. All calculations were carried out at the restricted Hartree−Fock level with no configuration interaction. The molecular structures were optimized using the Polak−Ribiere algorithm until the root-mean-square (rms) gradient was 0.001. Next, the final optimized structures were exported to the Dragon software for calculation of the molecular descriptors. Note that there is an online version of the Dragon software that enables users to compute descriptors; it is available at no charge (http://www.vcclab.org/lab/edragon/). Additionally, Dragon-based molecular descriptors have been calculated for nearly 250 000 synthesized or fictional compounds and are available at no charge (http://michem.disat.unimib.it/mol_db/). About 3000 descriptors from 22 diverse classes of descriptors are calculated by the Dragon software.51 After the descriptor calculation was completed, descriptors that could not be calculated for some compounds were excluded completely from the list. Next, the pair correlation for each pair of descriptors was calculated. For each pair with a pair correlation greater than 0.9, one of the two descriptors was omitted randomly.

2.3. Genetic Function Approximation (GFA). The state-of-the-art genetic function approximation (GFA), developed in the pioneering work of Rogers and Hopfinger,52 is a combination of two seemingly distinct algorithms: Friedman’s multivariate adaptive regression splines (MARS)53 and Holland’s genetic algorithm (GA).54 In most QSPR cases, models are represented as the sum of linear or nonlinear terms:

$$F(X) = a_0 + \sum_{k=1}^{M} a_k\,\phi_k(X) \qquad (6)$$

The terms are called basis functions and are denoted by $\{\phi_k\}$. They are functions of one or more features, which are denoted as $X_i$. The model parameters $\{a_k\}$ are regressed by fitting procedures. The linear strings of basis functions play the role of DNA in the application of the GA. The initial QSPR models are generated by the random selection of several descriptors from the pool. Next, the basis functions are built with these descriptors. Generally, these functions are linear in most QSPR models. Then, the genetic model is built from the random sequence of basis functions. The fitness function used in the GFA during the evolution is Friedman’s lack of fit (LOF) scoring function, a penalized least-squares error measure given by eq 7.
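As a rough illustration of how a single candidate model of the form of eq 6 might be evaluated and scored, the Python sketch below computes the model output, its least-squares error, and the LOF penalty in the form of eq 7, $\mathrm{LOF} = \mathrm{LSE}/\{N[1-(c+1+d\,p)/N]^2\}$. All function and argument names are hypothetical, the LSE is assumed here to be the sum of squared residuals, and the coefficients are taken as given rather than regressed.

```python
def evaluate_model(x, intercept, coefficients, basis_functions):
    """F(X) = a0 + sum_k a_k * phi_k(X), the additive form of eq 6."""
    return intercept + sum(a * phi(x) for a, phi in zip(coefficients, basis_functions))


def least_squares_error(samples, targets, intercept, coefficients, basis_functions):
    """Sum of squared residuals of the model over a data set (assumed meaning of LSE)."""
    return sum(
        (y - evaluate_model(x, intercept, coefficients, basis_functions)) ** 2
        for x, y in zip(samples, targets)
    )


def lack_of_fit(lse, n_samples, n_basis, n_params, smoothing=1.0):
    """Friedman's LOF score in the form of eq 7.

    n_basis   -- c, number of nonconstant basis functions
    n_params  -- p, total number of parameters in the model
    smoothing -- d, user-set smoothing factor
    """
    penalty = 1.0 - (n_basis + 1 + smoothing * n_params) / n_samples
    if penalty <= 0:
        raise ValueError("model is too large for the number of samples")
    return lse / (n_samples * penalty ** 2)


if __name__ == "__main__":
    # Two hypothetical basis functions acting on a feature tuple x.
    basis = [lambda x: x[0], lambda x: x[1] ** 2]
    samples = [(2.0, 3.0), (0.0, 1.0)]
    targets = [21.0, 3.0]
    lse = least_squares_error(samples, targets, 1.0, [0.5, 2.0], basis)
    print("LSE:", lse)
    print("LOF for 100 samples:", lack_of_fit(10.0, 100, 2, 2))
```

Because the penalty term shrinks as the number of basis functions and parameters grows, a larger model needs a clearly lower LSE to achieve a better LOF score, which is how the measure discourages overfitting.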

$$\mathrm{LOF(model)} = \frac{1}{N}\,\frac{\mathrm{LSE(model)}}{\left(1 - \dfrac{c + 1 + (d \times p)}{N}\right)^{2}} \qquad (7)$$

In this LOF function, c is the number of nonconstant basis functions, N is the number of samples in the data set, d is a smoothing factor to be set by the user, p is the total number of parameters in the model, and LSE is the least-squares error of the model. Use of the LOF leads to models with better prediction without overfitting. The next step is the repeated application of the genetic recombination or crossover operation: the two best models according to their fitness are selected as “parents”. Then, they are randomly cut into two sections. A new model is obtained by merging sections from each parent. The model with the worst fitness is replaced with the new model. This process is halted when no significant fitness improvement is observed in the population. For a population of 300 models, 3000 to 10000 genetic operations are usually sufficient to achieve convergence. The GFA approach has a number of important advantages over other techniques:

• It produces multiple output models rather than a single model.
• It automatically selects the features present in the basis functions and evaluates the optimum number of basis functions by testing the full-size model rather than by incremental addition of features.
• It gives better control of fit smoothness and avoids overfitting by applying the LOF score function.
• It is compatible with a broader variety of basis functions, such as splines, Gaussians, or higher-order polynomials.
• It gives useful additional insight into the preferred model length and useful partitions of the data set, which could not be obtained by standard regression analysis.52

The successful application of GFA is reported extensively in diverse engineering, medical, and pure science applications.55−61 In this study, the GFA algorithm was implemented for subset variable selection.

3. RESULTS AND DISCUSSION

An accurate linear correlation was obtained by using the GFA algorithm to choose the final subset of molecular descriptors. The procedure began by using the GFA to obtain models with one descriptor. The output models were investigated, and the model with the best $R^2$ was selected. The process continued by incremental addition of descriptors and investigation of the related models to obtain the best model for a given number of descriptors. The process was halted when incremental addition of descriptors did not significantly change the accuracy of the final model. The most accurate linear equation was found with three descriptors as follows:

$$D_{AW}^{\circ} = -9.19249(\pm 0.10979) + 0.83383(\pm 0.00892)\,\mathrm{Ms} + 6.09171(\pm 0.09163)\,\mathrm{VRA2} + 28.40759(\pm 0.10797)\,\mathrm{VED2} \qquad (8)$$

$n_{\mathrm{training}} = 3783$; $n_{\mathrm{test}} = 945$; $R^2 = 0.9789$; $R^2_{\mathrm{adj}} = 0.9789$; $Q^2_{\mathrm{LOO}} = 0.9788$; $Q^2_{\mathrm{BOOT}} = 0.9788$; $Q^2_{\mathrm{EXT}} = 0.9774$; RMSE $= 3.13 \times 10^{-7}$ cm$^2$·s$^{-1}$; SDE $= 3.13 \times 10^{-7}$ cm$^2$·s$^{-1}$; $a = -0.014$; $\Delta K = 0.095$; $\Delta Q = 0.0001$; $R^P = 0.013$; $R^N = 0.0001$; $F = 58454.40$

where $D_{AW}^{\circ}$ is in $10^{6}$ × cm$^2$·s$^{-1}$ units. The term “Ms” is the mean electrotopological state. It is defined as the mean value of the electrotopological index, which is defined as follows:

$$S_i = I_i + \Delta I_i = I_i + \sum_{j=1}^{A}\frac{I_i - I_j}{(d_{ij} + 1)^{k}} \qquad (9)$$

where $I_i$ is the intrinsic state of the ith atom, $\Delta I_i$ is the field effect on the ith atom calculated as the perturbation of the intrinsic state of the ith atom by all other atoms in the molecule, $d_{ij}$ is the topological distance between the ith and jth atoms, and A is the number of atoms. $I_i$ is calculated using

$$I_i = \frac{(2/L_i)^2\,\delta_i^{v} + 1}{\delta_i} \qquad (10)$$

where $L_i$ is the principal quantum number (2 for C, N, O, F atoms, 3 for Si, S, Cl, ...), $\delta_i^{v}$ is the number of valence electrons, and $\delta_i$ is the number of sigma electrons of the ith atom in the hydrogen-depleted molecular graph.62 “Ms” is a measure of molecular composition according to the distances between each pair of atoms. In other words, having dissimilar atoms located at the lowest possible distance increases this descriptor and therefore increases molecular diffusion. Having dissimilar atoms located at the lowest possible distance makes the molecule more polar and spherical. As a result, the more polar and the smaller a molecule is, the higher its diffusion coefficient. “VRA2” is the average Randic-type eigenvector-based index from the adjacency matrix, which is defined as follows:

$$\mathrm{VRA2} = \frac{1}{B}\sum_{b}\left(l_i^{A}\cdot l_j^{A}\right)_b^{-1/2} \qquad (11)$$

where $l_i^{A}$ is the eigenvector associated with the largest negative eigenvalue of the edge adjacency matrix. The summation runs over all of the edges in the molecular graph (B edges in total); $l_i^{A}$ and $l_j^{A}$ are the local vertex invariants of the two vertices incident to the considered edge.62 Finally, “VED2” is similarly defined as follows:62

$$\mathrm{VED2} = \frac{1}{A}\sum_{i=1}^{A} l_i^{A} \qquad (12)$$

“VRA2” and “VED2” were proposed to take the local vertex invariants in a molecule into account. They show the discrimination among graph vertices. Lower values correspond to vertices of lower degree, farther from the center or from a vertex of high degree.62
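Once the three descriptors have been computed for a compound, applying eq 8 is a single linear evaluation. The Python sketch below is only an illustration: the names are hypothetical, the descriptor values in the usage example are invented, and the output is interpreted, as an assumption based on the unit statement for eq 8, as $10^{6} \times D_{AW}^{\circ}$ with $D_{AW}^{\circ}$ in cm$^2$·s$^{-1}$.

```python
# Regression coefficients of eq 8 (central values, uncertainties omitted).
EQ8_INTERCEPT = -9.19249
EQ8_COEFFICIENTS = {"Ms": 0.83383, "VRA2": 6.09171, "VED2": 28.40759}


def predict_d_aw(descriptors):
    """Evaluate eq 8 for a mapping holding the descriptors Ms, VRA2, and VED2.

    The returned number is assumed to be 1e6 x D_AW, with D_AW in cm^2/s.
    """
    missing = set(EQ8_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return EQ8_INTERCEPT + sum(
        coefficient * descriptors[name] for name, coefficient in EQ8_COEFFICIENTS.items()
    )


def to_cm2_per_s(scaled_value):
    """Convert the scaled output of eq 8 to cm^2/s under the assumption above."""
    return scaled_value * 1e-6


if __name__ == "__main__":
    # Invented descriptor values, for illustration only (not taken from the data set).
    example = {"Ms": 2.0, "VRA2": 1.5, "VED2": 0.3}
    scaled = predict_d_aw(example)
    print(f"eq 8 output: {scaled:.4f}  ->  D_AW = {to_cm2_per_s(scaled):.3e} cm^2/s")
```

Real descriptor values would have to come from the descriptor calculation described in section 2.2; the model itself needs no other physical property as input.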

The related statistical parameters for the obtained linear model are presented below eq 8. The terms $n_{\mathrm{training}}$ and $n_{\mathrm{test}}$ are the numbers of compounds in the training set and test set, respectively, and $R^2$ is the squared correlation coefficient of the model. RMSE and SDE are, respectively, the root-mean-square error of the model results in comparison with the experimental values and the standard deviation error. F is the F-ratio, which is defined as the ratio between the model summation of squares (MSS) and the residual summation of squares (RSS):63

$$F = \frac{\mathrm{MSS}/df_M}{\mathrm{RSS}/df_E} \qquad (13)$$

where $df_M$ and $df_E$ denote the degrees of freedom of the obtained model and of the overall error, respectively. It is a comparison between the variance explained by the model and the residual variance. It should be noted that high values of the F-ratio test indicate the reliability of models. To evaluate the reliability of the model, several internal validation techniques were applied. Initially, the leave-one-out cross-validation technique was applied for the internal validation of the model. The associated parameter obtained from this technique is called $Q^2_{\mathrm{LOO}}$ and is calculated as follows:14−21

$$Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{ic}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (14)$$

where $y_i$ is the $D_{AW}^{\circ}$ of the ith compound, $\bar{y}$ is the mean value of $D_{AW}^{\circ}$ for all of the investigated compounds, and $\hat{y}_{ic}$ is the response of the ith object predicted by the obtained model ignoring the value of the related object (the ith experimental $D_{AW}^{\circ}$). If the absolute difference between this value and the calculated $R^2$ is small, the reliability of the model is validated. The leave-one-out cross-validation parameter of the obtained linear model is 0.9788. The adjusted $R^2$ is another statistical parameter used to assess the internal reliability of a linear model in QSPR and is defined as follows:

$$R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\left(\frac{n - 1}{n - p'}\right) \qquad (15)$$

where n and p′ are the numbers of experimental values and model parameters, respectively. The smaller the difference between this value and the $R^2$ parameter, the more valid the model is expected to be. The adjusted $R^2$ of the obtained linear model is 0.9789. To avoid chance correlations in the model and improve its prediction, Todeschini et al.64 proposed four RQK constraints, which must all be satisfied:14−21,32,65

1. $\Delta K = K_{XY} - K_X > 0$ (quick rule)
2. $\Delta Q = Q^2_{\mathrm{LOO}} - Q^2_{\mathrm{ASYM}} > 0$ (asymptotic $Q^2$ rule)
3. $R^P > 0$ (redundancy $R^P$ rule)
4. $R^N > 0$ (overfitting $R^N$ rule)

where $K_{XY}$ and $K_X$ are calculated using the equation

$$K = \frac{\sum_j \left|\dfrac{\lambda_j}{\sum_j \lambda_j} - \dfrac{1}{p}\right|}{2(p - 1)/p}, \qquad j = 1, \ldots, p \quad \text{and} \quad 0 \le K \le 1 \qquad (16)$$

where λ, n, and p are, respectively, the eigenvalues obtained from the correlation matrix of the data set X(n, p), the number of experimental data, and the number of model parameters. $K_X$ and $K_{XY}$ are calculated using the set of selected variables alone and the selected variables together with the $D_{AW}^{\circ}$ values, respectively. The statistical parameters $Q^2_{\mathrm{ASYM}}$ and $R^P$ are defined as follows:

$$Q^2_{\mathrm{ASYM}} = 1 - \left(1 - R^2\right)\left(\frac{n}{n - p'}\right)^{2} \quad \text{and} \quad R^P = \prod_{j=1}^{p^{+}}\left(1 - M_j\left(\frac{p}{p - 1}\right)\right), \qquad M_j > 0 \quad \text{and} \quad 0 \le R^P \le 1 \qquad (17)$$

and

$$R^N = \sum_{j=1}^{p^{-}} M_j \qquad (18)$$

where $M_j$ is defined as below:

$$M_j = \frac{R_{jy}}{R} - \frac{1}{p}, \qquad -\frac{1}{p} \le M_j \le 1 - \frac{1}{p} \qquad (19)$$

$p^{+}$ and $p^{-}$ are the numbers of cases in which $M_j$ has positive and negative values, respectively. The calculated values of the RQK test are as follows: ΔK = 0.095, ΔQ = 0.001, $R^P$ = 0.035, and $R^N$ = 0.001. These values, which are greater than or equal to zero, indicate not only the validity of the model but also the absence of chance correlation. Bootstrap is another validation technique applied in this study to assess the predictive power of the obtained model. “$Q^2_{\mathrm{boot}}$” is one of the key parameters of this technique and is defined as the average of the prediction error sum of squares (PRESS) calculated in this technique. PRESS is defined as follows:

$$\mathrm{PRESS} = \sum_{i=1}^{n}\left(y_i - \hat{y}_{i/i}\right)^2 \qquad (20)$$

where $\hat{y}_{i/i}$ denotes the ith $D_{AW}^{\circ}$ predicted using the obtained model without the use of the ith experimental $D_{AW}^{\circ}$. The bootstrapping was repeated 5000 times. Consequently, the $Q^2_{\mathrm{boot}}$ parameter of the obtained model was evaluated to be 0.9788. In addition, the y-scrambling validation technique was applied to test the model for chance correlations. Chance correlations are able to predict the responses successfully, but their parameters are actually of no real significance. To eliminate this possibility, the y-scrambling technique is advocated. In this method the connection between the target values and the descriptors is deliberately destroyed by permutation of the y data, while all x values are left untouched. A new correlative model is then obtained, and its accuracy is compared with that of the original model in terms of $R^2$ or $Q^2$. If the

original model has no chance correlation, there is a significant difference between the quality of the original model and that of the models obtained with random responses. The y-scrambling parameter is the intercept of the following equation:

$$Q_k^2 = a + b\,r_k(y, \tilde{y}_k) \qquad (21)$$

where $Q_k^2$ is the explained variance of the model obtained using the same predictors but the kth y-scrambled vector, and $r_k$ is the correlation between the true response vector and the kth y-scrambled vector. The value of the intercept a indicates the possibility of chance correlation. If the value of a is close to zero, the lack of chance correlation is verified. On the other hand, a large value of a marks the model as a chance correlation model and indicates that it is unstable. The y-scrambling should be repeated hundreds of times (300 times in this work). The value of the intercept a was calculated as −0.014 for the developed linear model. Finally, the external validation technique was applied to our model. It is conducted by testing additional compounds (the validation set) in order to assess the prediction capability of the model. $Q^2_{\mathrm{ext}}$ is defined as follows:66

$$Q^2_{\mathrm{ext}} = 1 - \frac{\sum_{i=1}^{n_{\mathrm{test}}}\left(\hat{y}_{i/i} - y_i\right)^2}{\sum_{i=1}^{n_{\mathrm{test}}}\left(y_i - \bar{y}_{\mathrm{training}}\right)^2} \qquad (22)$$

where $\bar{y}_{\mathrm{training}}$ is the average value of the $D_{AW}^{\circ}$ of the compounds present in the training set and $\hat{y}_{i/i}$ is the response of the ith object predicted by the obtained model ignoring the value of the related object (the ith experimental $D_{AW}^{\circ}$). The smaller the difference between this value and the $R^2$ parameter, the more valid the model is expected to be. The $Q^2_{\mathrm{ext}}$ parameter of the obtained linear model is 0.9774. Ultimately, all the validation techniques show the final model to be a valid, stable, nonchance correlation with high predictive power. Figure 1 depicts the $D_{AW}^{\circ}$ predicted by eq 8 versus the experimental data for the results obtained by GFA. In addition, the relative error of these results in comparison with the experimental values is shown in Figure 2.

Figure 1. Comparison between experimental $D_{AW}^{\circ}$ values and predicted ones by eq 8.

Figure 2. Relative error (%) between experimental $D_{AW}^{\circ}$ values and obtained ones by eq 8.

All of the $D_{AW}^{\circ}$ values estimated by eq 8 and the corresponding deviations from experimental values, together with the molecular descriptors of each compound, are presented as Supporting Information. As shown in both figures, there is some curvature in the lower diffusion coefficient region. This curvature may be due to changes in diffusion coefficient behavior that cannot be completely described by the model. This issue is probably related to the fact that all explicit QSPR methods use linear regression for subset variable selection. There are some complicated implicit nonlinear subset variable selection methods, such as artificial neural networks and support vector machines, that are currently in use; however, their models are not simple correlations. As a result, more powerful and simpler subset variable selection methods are required in QSPR. The highest relative deviation, 15%, was found for 1,2-diiodopropane. In addition, the average relative error of the model over all of the 4728 compounds used is 3.6%, which demonstrates that the model can accurately predict the $D_{AW}^{\circ}$ of nonelectrolyte organic compounds. The results show that the model is more accurate and more comprehensive than previously proposed models, except the recent group contribution-based model suggested by the author and co-workers. An important point is that the model proposed here is considerably simpler than all of the previously proposed methods and does not need any information other than the chemical structure of a molecule.

4. CONCLUSION

A quantitative structure−property relationship study was performed to develop a predictive model for the diffusion coefficient of nonelectrolyte organic compounds in water at infinite dilution ($D_{AW}^{\circ}$). To do so, a comprehensive experimental data set was used, and the powerful genetic function approximation (GFA) method was employed to select the molecular descriptors with the greatest effect on $D_{AW}^{\circ}$. Finally, a three-parameter model was obtained that could accurately predict the $D_{AW}^{\circ}$ of nonelectrolyte organic compounds. Additionally, several validation techniques demonstrated the predictability of

(16) Gharagheizi, F. QSPR studies for solubility parameter by means of genetic algorithm-based multivariate linear regression and generalized regression neural network. QSAR Comb. Sci. 2008, 27 (2), 165−170. (17) Gharagheizi, F. New neural network group contribution model for estimation of lower flammability limit temperature of pure compounds. Ind. Eng. Chem. Res. 2009, 48 (15), 7406−7416. (18) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. QSPR approach for determination of parachor of non-electrolyte organic compounds. Chem. Eng. Sci. 2011, 66 (13), 2959−2967. (19) Gharagheizi, F.; Gohar, M. R. S.; Vayeghan, M. G. A quantitative structure-property relationship for determination of enthalpy of fusion of pure compounds. J. Therm. Anal. Calorim. 2011, 1−6. (20) Gharagheizi, F.; Mehrpooya, M. Prediction of standard chemical exergy by a three descriptors QSPR model. Energy Convers. Manage. 2007, 48 (9), 2453−2460. (21) Gharagheizi, F.; Mehrpooya, M. Prediction of some important physical properties of sulfur compounds using quantitative structureproperties relationships. Mol. Diversity 2008, 12 (3−4), 143−155. (22) Gharagheizi, F. A new accurate neural network quantitative structure-property relationship for prediction of θ (lower critical solution temperature) of polymer solutions. e-Polym. 2007. (23) Vatani, A.; Mehrpooya, M.; Gharagheizi, F. Prediction of standard enthalpy of formation by a QSPR model. Int. J. Mol. Sci. 2007, 8 (5), 407−432. (24) Gharagheizi, F.; Alamdari, R. F. A molecular-based model for prediction of solubility of C60 fullerene in various solvents. Fullerenes, Nanotubes, Carbon Nanostruct. 2008, 16 (1), 40−57. (25) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. A new neural network-group contribution method for estimation of flash point temperature of pure components. Energy Fuels 2008, 22 (3), 1628− 1635. (26) Gharagheizi, F.; Alamdari, R. F. 
Prediction of flash point temperature of pure components using a quantitative structureproperty relationship model. QSAR Comb. Sci. 2008, 27 (6), 679−683. (27) Gharagheizi, F.; Fazzeli, A. Prediction of the Watson characterization factor of hydrocarbon components from molecular properties. QSAR Comb. Sci. 2008, 27 (6), 758−767. (28) Gharagheizi, F. A new group contribution-based model for estimation of lower flammability limit of pure compounds. J. Hazard. Mater. 2009, 170 (2−3), 595−604. (29) Gharagheizi, F. Prediction of upper flammability limit percent of pure compounds from their molecular structures. J. Hazard. Mater. 2009, 167 (1−3), 507−510. (30) Gharagheizi, F. Prediction of the standard enthalpy of formation of pure compounds using molecular structure. Aust. J. Chem. 2009, 62 (4), 376−381. (31) Gharagheizi, F.; Sattari, M. Prediction of the θ(UCST) of polymer solutions: A quantitative structure−property relationship study. Ind. Eng. Chem. Res. 2009, 48 (19), 9054−9060. (32) Gharagheizi, F.; Tirandazi, B.; Barzin, R. Estimation of aniline point temperature of pure hydrocarbons: A quantitative structureproperty relationship approach. Ind. Eng. Chem. Res. 2009, 48 (3), 1678−1682. (33) Gharagheizi, F. Chemical structure-based model for estimation of the upper flammability limit of pure compounds. Energy Fuels 2010, 24 (7), 3867−3871. (34) Gharagheizi, F.; Abbasi, R. A new neural network group contribution method for estimation of upper flash point of pure chemicals. Ind. Eng. Chem. Res. 2010, 49 (24), 12685−12695. (35) Gharagheizi, F.; Abbasi, R.; Tirandazi, B. Prediction of Henry’s law constant of organic compounds in water from a new groupcontribution-based model. Ind. Eng. Chem. Res. 2010, 49 (20), 10149− 10152. (36) Gharagheizi, F.; Sattari, M. Prediction of triple-point temperature of pure components using their chemical structures. Ind. Eng. Chem. Res. 2010, 49 (2), 929−932.

the model. Although in this work a comprehensive model was developed to predict the DAW ° of a large number of nonelectrolyte organic compounds, there are still some limitations. The model has a wide range of applicability, but the prediction capability of the model is restricted to the compounds, which are similar to those ones applied to develop the model. Application of the model for the totally different compounds than the investigated ones is not recommended although it may be used for a rough estimation of the DAW ° of these kinds of compounds.



ASSOCIATED CONTENT

S Supporting Information *

Table containing the collection of the names of 4728 pure compounds, their molecular descriptors, and the presented model in comparison with the experimental data. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Fax: +98 21 77926580. E-mail: [email protected]; fghara@gmail. com.



REFERENCES

(1) Poling, B. E.; Prausnitz, J. M.; O’Connell, J. P. The Properties of Gases and Liquids, 5th ed.; McGraw-Hill: New York, 2001.
(2) Gharagheizi, F.; Sattari, M. Estimation of molecular diffusivity of pure chemicals in water: A quantitative structure-property relationship study. SAR QSAR Environ. Res. 2009, 20 (3−4), 267−285.
(3) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Representation and Prediction of Molecular Diffusivity of Nonelectrolyte Organic Compounds in Water at Infinite Dilution Using the Artificial Neural Network-Group Contribution Method. J. Chem. Eng. Data 2011, 56 (5), 1741−1750.
(4) Sattari, M.; Gharagheizi, F. Prediction of molecular diffusivity of pure components into air: A QSPR approach. Chemosphere 2008, 72 (9), 1298−1302.
(5) Wilke, C. R.; Chang, P. Correlation of diffusion coefficients in dilute solutions. AIChE J. 1955, 1 (2), 264−270.
(6) Amourdam, M. J.; Laddha, G. S. Diffusivities of some binary liquid systems using a diaphragm cell. J. Chem. Eng. Data 1967, 12 (3), 389−391.
(7) Lusis, M. A. Predicting liquid diffusion coefficients. Chem. Process. Eng. 1971, 5, 27−35.
(8) Olander, D. R. The diffusivity of water in organic solvents. AIChE J. 1961, 7, 175−176.
(9) Wise, D. L.; Houghton, G. The diffusion coefficients of ten slightly soluble gases in water at 10−60 °C. Chem. Eng. Sci. 1966, 21 (11), 999−1010.
(10) Witherspoon, P. A.; Bonoli, L. Correlation of diffusion coefficients for paraffin, aromatic, and cycloparaffin hydrocarbons in water. Ind. Eng. Chem. Fundam. 1969, 8 (3), 589−591.
(11) Tyn, M. T.; Calus, W. F. Diffusion coefficients in dilute binary liquid mixtures. J. Chem. Eng. Data 1975, 20 (1), 106−109.
(12) Hayduk, W.; Minhas, B. S. Correlations for prediction of molecular diffusivities in liquids. Can. J. Chem. Eng. 1982, 60 (2), 295−299.
(13) Nakanishi, K. Prediction of diffusion coefficient of nonelectrolytes in dilute solution based on generalized Hammond-Stokes plot. Ind. Eng. Chem. Fundam. 1978, 17 (4), 253−256.
(14) Gharagheizi, F. QSPR analysis for intrinsic viscosity of polymer solutions by means of GA-MLR and RBFNN. Comput. Mater. Sci. 2007, 40 (1), 159−167.
(15) Gharagheizi, F. Quantitative structure-property relationship for prediction of the lower flammability limit of pure compounds. Energy Fuels 2008, 22 (5), 3037−3039.

dx.doi.org/10.1021/ie201944h | Ind. Eng. Chem. Res. 2012, 51, 2797−2803

(37) Mehrpooya, M.; Gharagheizi, F. A molecular approach for the prediction of sulfur compound solubility parameters. Phosphorus, Sulfur, Silicon, Relat. Elem. 2010, 185 (1), 204−210.
(38) Eslamimanesh, A.; Gharagheizi, F.; Mohammadi, A. H.; Richon, D. Artificial neural network modeling of solubility of supercritical carbon dioxide in 24 commonly used ionic liquids. Chem. Eng. Sci. 2011, 66 (13), 3039−3044.
(39) Gharagheizi, F.; Keshavarz, M. H.; Sattari, M. A simple accurate model for prediction of flash point temperature of pure compounds. J. Therm. Anal. Calorim. 2011, 1−8.
(40) Gharagheizi, F.; Babaie, O.; Mazdeyasna, S. Prediction of vaporization enthalpy of pure compounds using a group contribution-based method. Ind. Eng. Chem. Res. 2011, 50 (10), 6503−6507.
(41) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of critical properties and acentric factors of pure compounds using the artificial neural network group contribution algorithm. J. Chem. Eng. Data 2011, 56 (5), 2460−2476.
(42) Gharagheizi, F.; Mirkhani, S. A.; Tofangchi Mahyari, A. R. Prediction of standard enthalpy of combustion of pure compounds using a very accurate group-contribution-based method. Energy Fuels 2011, 25 (6), 2651−2654.
(43) Gharagheizi, F.; Salehi, G. R. Prediction of enthalpy of fusion of pure compounds using an artificial neural network-group contribution method. Thermochim. Acta 2011, 521 (1−2), 37−40.
(44) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of parachor of various compounds using an artificial neural network-group contribution method. Ind. Eng. Chem. Res. 2011, 50 (9), 5815−5823.
(45) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Representation/prediction of solubilities of pure compounds in water using artificial neural network-group contribution method. J. Chem. Eng. Data 2011, 56 (4), 720−726.
(46) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Artificial neural network modeling of solubilities of 21 commonly used industrial solid compounds in supercritical carbon dioxide. Ind. Eng. Chem. Res. 2011, 50 (1), 221−226.
(47) Gharagheizi, F.; Sattari, M.; Tirandazi, B. Prediction of crystal lattice energy using enthalpy of sublimation: A group contribution-based model. Ind. Eng. Chem. Res. 2011, 50 (4), 2482−2486.
(48) Keshavarz, M. H.; Gharagheizi, F.; Pouretedal, H. R. Improved reliable approach to predict melting points of energetic compounds. Fluid Phase Equilib. 2011, 308 (1−2), 114−128.
(49) Yaws, C. L. Yaws’ Handbook of Thermodynamic and Physical Properties of Chemical Compounds; Knovel: New York, 2003.
(50) HyperChem Release 7.5 for Windows, Molecular Modeling System; Hypercube, Inc.: Gainesville, FL, 2002.
(51) Dragon for Windows Software for Molecular Descriptor Calculation, version 5.4; Talete, SRL: Milano, Italy, 2007.
(52) Rogers, D.; Hopfinger, A. J. Application of genetic function approximation to quantitative structure−activity relationships and quantitative structure−property relationships. J. Chem. Inf. Comput. Sci. 1994, 34 (4), 854−866.
(53) Friedman, J. H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19 (1), 1−67.
(54) Holland, J. H. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence, 1st ed.; MIT Press: Cambridge, MA, 1992; pp xiv; 211.
(55) Ahmed, S. S. S. J.; Ahameethunisa, A.; Santosh, W. QSAR and pharmacophore modeling of 4-arylthieno[3,2-d]pyrimidine derivatives against adenosine receptor of Parkinson’s disease. J. Theor. Comput. Chem. 2010, 9 (6), 975−991.
(56) Chhabria, M. T.; Jani, M.; Parmar, K.; Singh, M. QSAR study of a series of 2,3-dihydroimidazo[1,2-c]pyrimidines as antibacterial agents. Med. Chem. Res. 2010, 1−8.
(57) Chhabria, M. T.; Suhagia, B. N.; Mandhare, A. B.; Brahmkshatriya, P. S. QSAR study of a series of cholesteryl ester transfer protein inhibitors. Collect. Czech. Chem. Commun. 2011, 76 (7), 803−813.

(58) Khaled, K. F. Modeling corrosion inhibition of iron in acid medium by genetic function approximation method: A QSAR model. Corros. Sci. 2011, 53, 3457−3465.
(59) Ojha, P. K.; Roy, K. Chemometric modelling of antimalarial activity of aryltriazolylhydroxamates. Mol. Simul. 2010, 36 (12), 939−952.
(60) Reyes, O. J.; Patel, S. J.; Mannan, M. S. Quantitative structure property relationship studies for predicting dust explosibility characteristics (Kst, Pmax) of organic chemical dusts. Ind. Eng. Chem. Res. 2011, 50 (4), 2373−2379.
(61) Sivakumar, P. M.; Iyer, G.; Doble, M. QSAR studies on substituted 3- or 4-phenyl-1,8-naphthyridine derivatives as antimicrobial agents. Med. Chem. Res. 2011, 1−8.
(62) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics, 2nd rev., enl., ed.; Wiley-VCH: Chichester, U.K., 2009; p 2 v.
(63) Krzanowski, W. J. Principles of Multivariate Analysis: A User’s Perspective, rev. ed.; Oxford University Press: Oxford, 2000; pp xxi, 586.
(64) Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. Detecting “bad” regression models: Multicriteria fitness functions in regression analysis. Anal. Chim. Acta 2004, 515 (1), 199−208.
(65) Gharagheizi, F.; Sattari, M. Estimation of molecular diffusivity of pure chemicals in water: A quantitative structure−property relationship study. SAR QSAR Environ. Res. 2009, 20 (3−4), 267−285.
(66) Chiou, J. Hybrid method of evolutionary algorithms for static and dynamic optimization problems with application to a fed-batch fermentation process. Comput. Chem. Eng. 1999, 23 (9), 1277−1291.
