ARTICLE pubs.acs.org/IECR
Prediction of Vaporization Enthalpy of Pure Compounds using a Group Contribution-Based Method
Farhad Gharagheizi,*,† Omid Babaie,† and Sahar Mazdeyasna‡
† Saman Energy Giti Co., Postal Code 3331619636, Tehran, Iran
‡ Department of Chemical Engineering, Iran University of Science and Technology, Tehran, Iran
ABSTRACT: In this work, the artificial neural network-group contribution (ANN-GC) method is applied to estimate the vaporization enthalpy of pure chemical compounds at their normal boiling point. A group of 4907 pure compounds from various chemical families is investigated to propose a comprehensive and predictive model. The results show a squared correlation coefficient (R²) of 0.993, a root mean square error of 1.1 kJ/mol, and an average absolute deviation below 1.5% between the estimated and experimental values.
1. INTRODUCTION
The vaporization enthalpy at the normal boiling point (ΔvapHb), also known as the latent heat of evaporation, is the difference between the enthalpy of the saturated vapor and that of the saturated liquid at the boiling point temperature. Physical properties of compounds are required when designing chemical and petrochemical processes. Among these properties, ΔvapHb is of great importance, particularly in processes in which vapor-liquid equilibrium occurs. Additionally, ΔvapHb is occasionally used to estimate other physical properties.1 Because the experimental measurement of ΔvapHb is time-consuming and laborious, an accurate estimation method, especially one based on molecular structure, is essential.
Many studies have sought to develop an accurate model for estimation of ΔvapHb. These methods can be categorized into two main groups according to the type of parameters they use. The first class comprises correlations that use only chemical structure-based parameters. Of this type are the models presented by Joback and Reid,2 Zhou et al.,3 Wenying et al.,4 Dalmazzone et al.,5 Sanghvi and Yalkowsky,6 and Jia et al.7 The second class includes correlations that require at least one other physical property. The most important models in this class are those presented by Van Nhu et al.,8 Carruth and Kobayashi,9 Basarova and Svoboda,10 Liley,11 Meyra et al.,12 Riedel,13 Chen,14 Vetere,15–17 Liu,18 Sivaraman et al.,19 Morgan and Kobayashi,20 Morgan,21 and Mohammadi and Richon.22
Some important points should be considered when comparing the two classes. The second class has a main disadvantage.
The accuracy of the models in this class depends directly on the accuracy of the physical properties they use to estimate ΔvapHb. Furthermore, if one of the required physical properties is not available, the models cannot produce any estimate. Another important question is which of the two classes gives better estimations. Along these lines, Cachadina and Mulero23 reviewed several models using a data set of experimental data for 290 compounds. They concluded that none of the models gives a good estimation for all the compounds. It should also be noted that only a few hundred compounds were used, both by the original researchers in developing these models and in later comparisons of the models. Limited comprehensiveness is therefore a general weakness of the previously presented models.
In this study, a new accurate and comprehensive model is presented for estimation of the ΔvapHb of pure compounds. The model is an artificial neural network (ANN) that uses group contributions (GCs) to estimate ΔvapHb.
© 2011 American Chemical Society
2. MATERIALS AND METHODS
2.1. Materials. The accuracy and reliability of models for estimation of physical properties, especially those dealing with a large number of experimental data, depend directly on the quality and comprehensiveness of the data set used for their development. Comprehensiveness here covers both the diversity of the chemical families investigated and the number of pure compounds in the data set. In this work, the database prepared by Yaws24 was used; it is one of the most comprehensive sources of physical property data, such as ΔvapHb, for chemical species. The ΔvapHb values of the 4907 compounds found in the database were used as the main data set in this study.
2.2. Providing the Collection of Chemical Groups. In this step, the chemical structures of all pure compounds in the data set were examined, and 147 chemical groups were found useful for predicting ΔvapHb. The chemical groups found and used in this study are presented in full as Supporting Information.
Received: January 25, 2011
Accepted: April 7, 2011
Revised: April 7, 2011
Published: April 7, 2011
dx.doi.org/10.1021/ie2001764 | Ind. Eng. Chem. Res. 2011, 50, 6503–6507
Industrial & Engineering Chemistry Research
Figure 1. Schematic structure of the three layer feed forward neural network used in this study: W1 first layer weight, W2 second layer weight, b1 first layer bias, and b2 second layer bias.
The number of occurrences of each chemical group in the compounds of the data set is also presented as Supporting Information.
2.3. Developing Model. The next calculation step, and perhaps the most significant one, is to search for a relationship between the chemical functional groups and the ΔvapHb of chemical compounds. The simplest approach is to assume a multilinear relationship between these groups and the desired property (here, ΔvapHb). This technique is similar to that used in most classical group contribution methods. Several calculations showed that this approach does not give good results for the current problem in comparison with the previously presented methods. Consequently, the nonlinear mathematical method of the artificial neural network (ANN) was investigated.
Artificial neural networks are extensively used in various scientific and engineering problems, e.g., estimation of physical and chemical properties of pure compounds.25–47 These capable mathematical tools are generally applied to study complicated systems. Theoretical explanations of artificial neural networks can be found elsewhere.48 Using the neural network toolbox of MATLAB (Mathworks Inc.), a three layer feed forward artificial neural network (FFANN) was developed for the problem. The typical structure of a three layer FFANN is shown schematically in Figure 1. The capabilities of this kind of ANN have been demonstrated in previous works.25–47
All 147 functional groups and the ΔvapHb values are normalized between −1 and +1 to decrease computational errors during model development. The normalization uses the maximum and minimum values of each functional group for the input data and the maximum and minimum values of ΔvapHb for the output.
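The normalization step described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names (`fit_minmax`, `scale_to_unit_range`, `unscale`) and the small count matrix are hypothetical, and the handling of constant columns is an assumption, since the paper does not say how groups with identical minimum and maximum are treated.

```python
import numpy as np

def fit_minmax(x):
    """Return the per-column minimum and maximum of a 2-D array."""
    return x.min(axis=0), x.max(axis=0)

def scale_to_unit_range(x, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> +1."""
    # Assumption: a constant column (hi == lo) uses a span of 1 and maps to -1.
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (x - lo) / span - 1.0

def unscale(y, lo, hi):
    """Invert the mapping, e.g. to turn a network output back into a property value."""
    return (y + 1.0) * (hi - lo) / 2.0 + lo

# Hypothetical occurrence counts: 3 compounds x 3 groups (the paper uses 147 groups).
counts = np.array([[0.0, 2.0, 1.0],
                   [3.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])
lo, hi = fit_minmax(counts)
scaled = scale_to_unit_range(counts, lo, hi)
```

The same two functions would be applied to the target column, so that a prediction in the normalized range can be converted back with `unscale`.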
Next, the main data set is divided into three subdata sets: the “training” set, the “validation (optimization)” set, and the “test (prediction)” set. In this work, the training set is used to generate the ANN structure, the validation (optimization) set is applied for optimization of the model, and the test (prediction) set is used to investigate the prediction capability and validity of the obtained model. The division is performed randomly: about 80%, 10%, and 10% of the main data set are randomly selected for the training set (3927 compounds), the validation set (490 compounds), and the test set (490 compounds), respectively. The effect of the allocation percentages of the three subdata sets on the accuracy of the ANN model has been studied elsewhere.26
Generating an ANN model amounts to determining the weight matrices and bias vectors.48 As shown in Figure 1, there are two weight matrices and two bias vectors in a three layer FFANN: W1 and W2 and b1 and b2.25–48 These parameters are obtained by minimization of an objective function. The objective function used in this study is the sum of squares of errors between the outputs of the ANN (predicted ΔvapHb values) and the target values (experimental ΔvapHb values). This minimization is performed with the Levenberg-Marquardt (LM)48 optimization strategy. More accurate optimization methods exist, but they need much more convergence time; in other words, the more accurate the optimization, the more time the algorithm needs to converge to the global optimum. LM48 is the most widely used training algorithm because it is robust and accurate enough to deal with the considered system.25–48
In most cases, the number of neurons in the hidden layer (n) is fixed. Therefore, the main goal is to produce an ANN model that is able to predict the target values as accurately as expected. This step is repeated until the best ANN is obtained. Generally, and especially for three layer FFANNs, it is more efficient to optimize the number of neurons in the hidden layer according to the accuracy of the obtained FFANN.25–48
Figure 2. Comparison between the predicted and experimental ΔvapHb.
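The random split, the three layer network, and the sum-of-squares objective can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the hidden-layer activation (tanh) and the linear output are assumed because the paper does not name them, the weights are random placeholders rather than the published parameters, and Levenberg-Marquardt training itself is not implemented here. All function and variable names are hypothetical.

```python
import numpy as np

N_GROUPS = 147   # number of functional-group inputs stated in the paper
N_HIDDEN = 10    # hidden-layer size reported in Section 3

def split_indices(n_total, n_val, n_test, rng):
    """Randomly partition range(n_total) into training, validation, and test indices."""
    order = rng.permutation(n_total)
    val = order[:n_val]
    test = order[n_val:n_val + n_test]
    train = order[n_val + n_test:]
    return train, val, test

def forward(x, W1, b1, W2, b2):
    """Three layer feed forward pass: inputs -> hidden layer -> single output."""
    hidden = np.tanh(x @ W1.T + b1)   # assumed activation; shape (n_samples, N_HIDDEN)
    return hidden @ W2.T + b2         # assumed linear output; shape (n_samples, 1)

def sse(pred, target):
    """Sum of squared errors, the objective function named in the text."""
    return float(np.sum((pred - target) ** 2))

rng = np.random.default_rng(0)
train_idx, val_idx, test_idx = split_indices(4907, 490, 490, rng)

# Placeholder parameters with the shapes implied by a 147-n-1 network.
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_GROUPS))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(1, N_HIDDEN))
b2 = np.zeros(1)

# Random stand-in data in the normalized range, only to exercise the functions.
x_demo = rng.uniform(-1.0, 1.0, size=(5, N_GROUPS))
y_demo = rng.uniform(-1.0, 1.0, size=(5, 1))
pred = forward(x_demo, W1, b1, W2, b2)
loss = sse(pred, y_demo)
```

A training routine would adjust `W1`, `b1`, `W2`, and `b2` to reduce `sse` on the training indices while monitoring the validation indices.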
3. RESULTS AND DISCUSSION
Using the presented procedure, a three layer feed forward neural network was obtained for prediction of the ΔvapHb of pure compounds. To determine the number of neurons in the hidden layer of the neural network, values from 1 through 20 were tested, and 10 gave the best results. Therefore, the best three layer feed forward neural network has a 147-10-1 structure. The mat file (MATLAB file format) of the obtained neural network, containing all parameters of the model, is freely available from the author by email. The predicted values of ΔvapHb are compared with the experimental data in Figure 2. The predicted ΔvapHb values, as well as the status of each pure compound in the model (belonging to the training set, the validation set, or the test set), are presented as Supporting Information. The statistical parameters of the model are presented in Table 1. The squared correlation coefficients over the training set, the validation set, the test set, and the main data set are 0.993, 0.994, 0.994, and 0.993, respectively; the corresponding average percent errors are 1.48%, 1.48%, 1.47%, and 1.48%, and the corresponding root mean square errors are 1.102, 1.032, 0.974, and 1.083.
According to the predicted ΔvapHb values (in the Supporting Information), there are only three compounds for which the model gives more than 18% error: carbon dioxide (148%), formaldehyde (36%), and ethane (32%). These three appear to be the only outliers of the model. No relation among these three compounds was found from which a general weakness of the model could be inferred.

Table 1. Statistical Parameters of the Obtained Model
statistical parameter          value
training set
  R2                           0.99
  average percent error        1.48%
  standard deviation error     12.96
  root mean square error       1.10
  n                            3927
validation set
  R2                           0.99
  average percent error        1.48%
  standard deviation error     13.52
  root mean square error       1.03
  n                            490
test set
  R2                           0.99
  average percent error        1.47%
  standard deviation error     12.26
  root mean square error       0.97
  n                            490
training + validation + test set
  R2                           0.99
  average percent error        1.48%
  standard deviation error     12.95
  root mean square error       1.08
  n                            4907

To compare the model with previous models, the data set used by Jia et al.7 was employed. This data set contains experimental data for 309 organic compounds and was extracted from the well-known DIPPR 801 database.49 The models presented by Riedel,13 Chen,14 Vetere,16 Liu,18 Joback and Reid,2 and Jia et al.7 were included to make the comparison more comprehensive. The results are shown in Table 2. The presented model is more accurate than all of the previous models except that of Jia et al.,7 and the difference between the two is negligible (AAE of 1.06 versus 1 kJ/mol and APE of 2.83% versus 2.74%). Furthermore, it should be noted that Jia et al.7 used this data set directly to develop their models, whereas it serves as an external test set for other methods such as the one presented here.
The model is therefore an accurate one: it shows a very low average percent error over the experimental data, which makes it attractive to those who require accurate ΔvapHb values for pure compounds. In addition, the model was evaluated using a data set of 4907 compounds belonging to most chemical families. This shows the capability of the model for estimating the ΔvapHb of various chemical compounds and demonstrates its comprehensiveness in comparison with the previously presented models.

Table 2. Comparison between the Presented Model and Previous Models^a
model                  AAE (kJ/mol)    APE%
Riedel13               1.65            4.01
Chen14                 1.54            3.8
Vetere16               1.15            2.91
Liu18                  1.18            2.98
Joback and Reid2       14.18           36.81
Jia et al.7            1               2.74
this work              1.06            2.83
^a $\mathrm{APE\%} = \frac{1}{N}\sum_{i=1}^{N} \frac{|\mathrm{EXP}(i) - \mathrm{PRED}(i)|}{|\mathrm{EXP}(i)|}$, $\mathrm{AAE} = \frac{1}{N}\sum_{i=1}^{N} |\mathrm{EXP}(i) - \mathrm{PRED}(i)|$.
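The error measures used in Tables 1 and 2 can be computed as in the short Python sketch below. The function names are hypothetical, and the three experimental and predicted values are made-up numbers for illustration only, not data from the paper. The factor of 100 in `ape_percent` is an assumption made so that the result reads as a percentage; the Table 2 footnote writes the formula without it. The squared correlation coefficient is taken here as the square of the Pearson correlation, which is also an assumption about the authors' definition.

```python
import numpy as np

def ape_percent(exp, pred):
    """Average percent error: mean of |EXP - PRED| / |EXP|, times 100 (assumed)."""
    exp, pred = np.asarray(exp, dtype=float), np.asarray(pred, dtype=float)
    return float(100.0 * np.mean(np.abs(exp - pred) / np.abs(exp)))

def aae(exp, pred):
    """Average absolute error, in the units of the property (kJ/mol in Table 2)."""
    exp, pred = np.asarray(exp, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(exp - pred)))

def rmse(exp, pred):
    """Root mean square error."""
    exp, pred = np.asarray(exp, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((exp - pred) ** 2)))

def r_squared(exp, pred):
    """Squared Pearson correlation between experimental and predicted values."""
    return float(np.corrcoef(exp, pred)[0, 1] ** 2)

# Illustrative values only (hypothetical, not taken from the paper's data set).
exp_demo = [30.0, 40.0, 50.0]
pred_demo = [31.0, 39.0, 50.0]
summary = {
    "APE%": ape_percent(exp_demo, pred_demo),
    "AAE": aae(exp_demo, pred_demo),
    "RMSE": rmse(exp_demo, pred_demo),
    "R2": r_squared(exp_demo, pred_demo),
}
```

Applying these functions separately to the training, validation, and test subsets would reproduce the layout of Table 1.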
4. CONCLUSION
In this work, a group contribution-based model was presented for estimation of the vaporization enthalpy of pure compounds at the normal boiling point (ΔvapHb). The model combines a feed forward neural network with group contributions. Its required parameters are the numbers of occurrences of 147 functional groups in each investigated molecule. Because most of these functional groups are not present in any particular molecule at the same time, computing the required parameters from the chemical structure of a molecule is simple. The model was developed using experimental ΔvapHb values from a large data set containing 4907 pure compounds from various chemical families.
Although a comprehensive model was developed in this work to predict the ΔvapHb of a large number of pure compounds, there are still some limitations. The model has a wide range of applicability, but its prediction capability is restricted to compounds similar to those used to develop it. Applying the model to compounds totally different from the investigated ones is not recommended, although it may be used for a rough estimation of their ΔvapHb.
In addition, the presented model may be used to test the reliability of experimental data reported in the literature. Finally, the average absolute deviation of the model results from experimental values demonstrates the accuracy of the presented model.
APPENDIX
The model is very easy to apply. One only needs to drag and drop the mat file (freely available upon request) into the workspace of the MATLAB environment (any version). The following example shows step by step how to obtain a response from the model. Assume that one wishes to predict the ΔvapHb of abietic acid using the developed model. First, the group-contribution parameters are determined from the chemical structure of abietic acid (refer to the Supporting Information). Then, after dragging and dropping the mat file, the following commands are entered in the MATLAB workspace:
The estimated ΔvapHb is then returned as 72.3880. The experimental value24 for this compound is 72.48 (approximate ARD = 0.13%).
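The quoted relative deviation for abietic acid can be checked with a few lines of Python. This is only an arithmetic check on the two values stated above; the variable names are hypothetical, and ARD is assumed to mean the absolute relative deviation expressed in percent.

```python
predicted = 72.3880      # model output quoted in the Appendix
experimental = 72.48     # experimental value quoted in the Appendix

# Absolute relative deviation in percent (assumed meaning of ARD).
ard_percent = 100.0 * abs(predicted - experimental) / abs(experimental)
print(f"{ard_percent:.2f}%")  # prints 0.13%
```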
ASSOCIATED CONTENT
Supporting Information. Supplementary tables contain the collection of 147 functional groups, the names of the 4907 pure compounds,24 their properties, and the presented model in comparison with the previous models. This material is available free of charge via the Internet at http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
*Fax: +98 21 77926580. E-mail address:
[email protected] and
[email protected].
REFERENCES
(1) Poling, B. E.; Prausnitz, J. M.; O’Connell, J. P. The properties of gases and liquids, 5th ed.; McGraw-Hill: New York, 2001.
(2) Joback, K. G.; Reid, R. C. Estimation of Pure-Component Properties from Group-Contributions. Chem. Eng. Commun. 1987, 57, 233–243.
(3) Zhou, C.; Chu, X.; Nie, C. Predicting thermodynamic properties with a novel semiempirical topological descriptor and path numbers. J. Phys. Chem. B 2007, 111 (34), 10174–10179.
(4) Wenying, W.; Jinyu, H.; Wen, X. Group vector space method for estimating enthalpy of vaporization of organic compounds at the normal boiling point. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1436–9.
(5) Dalmazzone, D.; Salmon, A.; Guella, S. A second order group contribution method for the prediction of critical temperatures and enthalpies of vaporization of organic compounds. Fluid Phase Equilib. 2006, 242 (1), 29–42.
(6) Sanghvi, R.; Yalkowsky, S. H. Estimation of the Normal Boiling Point of Organic Compounds. Ind. Eng. Chem. Res. 2006, 45 (8), 2856–2861.
(7) Jia, Q.; Wang, Q.; Ma, P. Prediction of the enthalpy of vaporization of organic compounds at their normal boiling point with the positional distributive contribution method. J. Chem. Eng. Data 2010, 55 (12), 5614–5620.
(8) Van Nhu, N.; Singh, M.; Leonhard, K. Quantum mechanically based estimation of perturbed-chain polar statistical associating fluid theory parameters for analyzing their physical significance and predicting properties. J. Phys. Chem. B 2008, 112 (18), 5693–5701.
(9) Carruth, G. F.; Kobayashi, R. Extension to low reduced temperatures of three-parameter corresponding states: Vapor pressures, enthalpies and entropies of vaporization, and liquid fugacity coefficients. Ind. Eng. Chem. Fundam. 1972, 11 (4), 509–517.
(10) Basarova, P.; Svoboda, V. Prediction of the enthalpy of vaporization by the group contribution method. Fluid Phase Equilib. 1995, 105 (1), 27–47.
(11) Liley, P. E. Correlations for the Enthalpy of Vaporization of Pure Substances. Ind. Eng. Chem. Res. 2003, 42 (24), 6250–6251.
(12) Meyra, A. G.; Kuz, V. A.; Zarragoicoechea, G. J. Universal behavior of the enthalpy of vaporization: An empirical equation. Fluid Phase Equilib. 2004, 218 (2), 205–207.
(13) Riedel, L. Eine Neue Universelle Dampfdruckformel. Chem. Ing. Tech. 1954, 26, 83–89.
(14) Chen, N. H. Generalized correlation for latent heat of vaporization. J. Chem. Eng. Data 1965, 10 (2), 207–210.
(15) Vetere, A. New Generalized Correlation for Enthalpy of Vaporization of Pure Compounds; 1973.
(16) Vetere, A. New correlations for predicting vaporization enthalpies of pure compounds. Chem. Eng. J. 1979, 17 (2), 157–162.
(17) Vetere, A. Methods to predict the vaporization enthalpies at the normal boiling temperature of pure compounds revisited. Fluid Phase Equilib. 1995, 106 (1–2), 1–10.
(18) Liu, Z. Y. Estimation of heat vaporization of pure liquid at its normal boiling temperature. Chem. Eng. Commun. 2001, 184, 221–228.
(19) Sivaraman, A.; Magee, J. W.; Kobayashi, R. Generalized correlation of latent heats of vaporization of coal liquid model compounds between their freezing points and critical points. Ind. Eng. Chem. Fundam. 1984, 23 (1), 97–100.
(20) Morgan, D. L.; Kobayashi, R. Extension of Pitzer CSP models for vapor pressures and heats of vaporization to long-chain hydrocarbons. Fluid Phase Equilib. 1994, 94 (C), 51–87.
(21) Morgan, D. L. Use of transformed correlations to help screen and populate properties within databanks. Fluid Phase Equilib. 2007, 256 (1–2), 54–61.
(22) Mohammadi, A. H.; Richon, D. New predictive methods for estimating the vaporization enthalpies of hydrocarbons and petroleum fractions. Ind. Eng. Chem. Res. 2007, 46 (8), 2665–2671.
(23) Cachadina, I.; Mulero, A. Evaluation of correlations for prediction of the normal boiling enthalpy. Fluid Phase Equilib. 2006, 240 (2), 173–178.
(24) Yaws, C. L. Yaws’ Handbook of Thermodynamic and Physical Properties of Chemical Compounds; 2003.
(25) Gharagheizi, F. A new accurate neural network quantitative structure-property relationship for prediction of lower critical solution temperature of polymer solutions. e-Polym. 2007.
(26) Gharagheizi, F. QSPR analysis for intrinsic viscosity of polymer solutions by means of GA-MLR and RBFNN. Comput. Mater. Sci. 2007, 40 (1), 159–167.
(27) Gharagheizi, F. QSPR studies for solubility parameter by means of genetic algorithm-based multivariate linear regression and generalized regression neural network. QSAR Combin. Sci. 2008, 27 (2), 165–170.
(28) Gharagheizi, F.; Alamdari, R. F. A molecular-based model for prediction of solubility of C60 fullerene in various solvents. Fullerenes Nanotubes Carbon Nanostruct. 2008, 16 (1), 40–57.
(29) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. A new neural network-group contribution method for estimation of flash point temperature of pure components. Energy Fuels 2008, 22 (3), 1628–1635.
(30) Gharagheizi, F.; Mehrpooya, M. Prediction of some important physical properties of sulfur compounds using quantitative structure-properties relationships. Molec. Divers. 2008, 12 (3–4), 143–155.
(31) Sattari, M.; Gharagheizi, F. Prediction of molecular diffusivity of pure components into air: A QSPR approach. Chemosphere 2008, 72 (9), 1298–1302.
(32) Gharagheizi, F. A new group contribution-based model for estimation of lower flammability limit of pure compounds. J. Hazard. Mater. 2009, 170 (2–3), 595–604.
(33) Gharagheizi, F. New neural network group contribution model for estimation of lower flammability limit temperature of pure compounds. Ind. Eng. Chem. Res. 2009, 48 (15), 7406–7416.
(34) Gharagheizi, F. Prediction of the standard enthalpy of formation of pure compounds using molecular structure. Aust. J. Chem. 2009, 62 (4), 376–381.
(35) Gharagheizi, F.; Sattari, M. Estimation of molecular diffusivity of pure chemicals in water: A quantitative structure-property relationship study. SAR QSAR Environ. Res. 2009, 20 (3–4), 267–285.
(36) Gharagheizi, F.; Tirandazi, B.; Barzin, R. Estimation of aniline point temperature of pure hydrocarbons: A quantitative structure-property relationship approach. Ind. Eng. Chem. Res. 2009, 48 (3), 1678–1682.
(37) Gharagheizi, F.; Abbasi, R.; Tirandazi, B. Prediction of Henry’s law constant of organic compounds in water from a new group-contribution-based model. Ind. Eng. Chem. Res. 2010, 49 (20), 10149–10152.
(38) Mehrpooya, M.; Gharagheizi, F. A molecular approach for the prediction of sulfur compound solubility parameters. Phosphorus, Sulfur Silicon Relat. Elements 2010, 185 (1), 204–210.
(39) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Artificial neural network modeling of solubilities of 21 commonly used industrial solid compounds in supercritical carbon dioxide. Ind. Eng. Chem. Res. 2011, 50 (1), 221–226.
(40) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Representation/Prediction of Solubilities of Pure Compounds in Water using Artificial Neural Network-Group Contribution Method. J. Chem. Eng. Data 2010, 55, 5059–5064.
(41) Gharagheizi, F.; Sattari, B.; Tirandazi, B. Prediction of Lattice Crystal Energy Using Enthalpy of Sublimation: A Group Contribution-Based Model. Ind. Eng. Chem. Res. 2011, 50 (4), 2482–2486.
(42) Gharagheizi, F.; Abbasi, R. A new neural network group contribution method for estimation of upper flash point of pure chemicals. Ind. Eng. Chem. Res. 2010, 49 (24), 12685–12695.
(43) Gharagheizi, F. An accurate model for prediction of autoignition temperature of pure compounds. J. Hazard. Mater. 2011, 189, 211–221.
(44) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Critical Properties and Acentric Factors of Pure Compounds Using the Artificial Neural Network Group Contribution Algorithm. J. Chem. Eng. Data, in press doi: 10.1021/je200019g.
(45) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of Use of Artificial Neural Network-Group Contribution Method to Determine Surface Tension of Pure Compounds. J. Chem. Eng. Data, in press doi: 10.1021/je2001045.
(46) Eslamimanesh, A.; Gharagheizi, F.; Mohammadi, A. H.; Richon, D. Artificial Neural Network Modeling of Solubilities of Supercritical Carbon Dioxide in Various Ionic Liquids. Chem. Eng. Sci. 2011, 56 (4), 720–726.
(47) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of Parachors of Various Compounds: Artificial Neural Network-Group Contribution Approach. Ind. Eng. Chem. Res., in press doi: 10.1021/ie1002464t.
(48) Hagan, M. T.; Demuth, H. B.; Beale, M. Neural Network Design; International Thomson: Andover, MA, 2002.
(49) Project 801, Evaluated Process Design Data, Public Release Documentation, Design Institute for Physical Properties (DIPPR); American Institute of Chemical Engineers (AIChE), 1996.