Prediction of Standard Enthalpy of Combustion of Pure Compounds

ARTICLE pubs.acs.org/EF

Prediction of Standard Enthalpy of Combustion of Pure Compounds Using a Very Accurate Group-Contribution-Based Method

Farhad Gharagheizi,*,† Seyyed Alireza Mirkhani,‡ and Ahmad-Reza Tofangchi Mahyari‡

† Saman Energy Giti Company, 333161-9636 Tehran, Iran
‡ Department of Chemical and Petroleum Engineering, Sharif University of Technology, 11365-6891 Tehran, Iran

ABSTRACT: The artificial neural network-group contribution (ANN-GC) method is applied to estimate the standard enthalpy of combustion of pure chemical compounds. A total of 4590 pure compounds from various chemical families are investigated to propose a comprehensive and predictive model. The results show a squared correlation coefficient (R2) of 0.99999, a root-mean-square error of 12.57 kJ/mol, and an average absolute deviation below 0.16% between the estimated and the experimental values.

1. INTRODUCTION

The enthalpy of combustion of a chemical compound (ΔcH°) is defined as the increase in enthalpy when the compound, in its standard state at 298.15 K and 1 atm, undergoes oxidation to defined combustion products. These combustion products are CO2(g), F2(g), Cl2(g), Br2(g), I2(g), SO2(g), N2(g), H3PO4(s), H2O(g), and SiO2(cristobalite). This value is required when evaluating the thermal efficiency of equipment that produces power or heat.1 ΔcH° indicates how much heat energy a given fuel can supply for useful work. Therefore, reliable and accurate ΔcH° values are essential for plant design.2 Because measuring ΔcH° is difficult and time-consuming for many compounds, such as toxic, volatile, explosive, or highly reactive ones, an accurate method for estimating this property from the chemical structure alone would be of great interest.

Several methods for estimating ΔcH° have been presented in the literature. Cardozo3 proposed a method based on the equivalent n-alkane chain length, which is essentially a group contribution (GC)-based method. The method was developed using 1168 pure organic compounds; unfortunately, its accuracy was not discussed in detail. In another study, Seaton and Harrison2 suggested a method based on Benson's method, originally presented for estimating the enthalpy of formation of pure compounds. However, Seaton and Harrison reported neither the accuracy of their model nor any information about the data set used to develop it. In addition, Hshieh et al.4,5 developed two empirical models for estimating ΔcH° of organosilicon compounds and polymers, respectively.
Their models are simple, but their applicability is limited to particular chemical families (organosilicons and polymers). In another study, Gharagheizi6 developed a simple three-parameter quantitative structure-property relationship (QSPR) for the prediction of ΔcH°. The model was successfully evaluated over 1714 pure compounds and showed a squared correlation coefficient (R2) of 0.996. Recently, Pan and co-workers7,8 proposed two chemical-structure-based methods for estimating ΔcH° of pure compounds. In the first, an artificial neural network (ANN) uses atom-type electrotopological state indices to estimate ΔcH°.7 That model showed an R2 of 0.991, lower than that of Gharagheizi's model; its data set is a subset of the one used by Gharagheizi6 (1496 of 1714 compounds). In the second, Pan et al.8 used a larger subset of the same data set (1650 of 1714 compounds) to develop a four-parameter QSPR model, which showed an R2 of 0.995 over the 1650 compounds.

A comparison among the previously presented models (using the statistical parameters reported by the original authors and excluding works that gave no numerical information about their models) shows that the most comprehensive model is the one presented by Gharagheizi.6 At first glance, the QSPR model of Pan et al.8 may appear more accurate because of its reported R2, but there is an important caveat regarding their data set: they eliminated 64 compounds from the data set used by Gharagheizi without giving any reason. When those 64 compounds are taken into account, Gharagheizi's model is both simpler (one parameter fewer than the model of Pan et al.8) and more accurate.

In this study, a new, highly accurate model is presented for predicting ΔcH° of pure compounds. The model is an ANN that uses GCs as inputs.

Received: January 14, 2011
Revised: April 20, 2011
Published: April 22, 2011
© 2011 American Chemical Society

dx.doi.org/10.1021/ef200081a | Energy Fuels 2011, 25, 2651–2654

Energy & Fuels


Figure 1. Schematic structure of the three-layer FFANN used in this study: W1, first layer weight; W2, second layer weight; b1, first layer bias; and b2, second layer bias.
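Figure 1 corresponds to the mapping ŷ = W2·tanh(W1·x + b1) + b2, where x is the vector of normalized group counts. The NumPy sketch below shows that forward pass for the 142-10-1 layout. It assumes a tanh hidden layer and a linear output layer (a common MATLAB tansig/purelin pairing; the transfer functions are not stated in this article), and the random weights are placeholders, not the trained model.

```python
import numpy as np


def ffann_predict(X, W1, b1, W2, b2):
    """Forward pass of a three-layer FFANN (input -> hidden -> output).

    Assumed activations: tanh in the hidden layer, linear output.
    X has shape (n_samples, n_inputs); the result has shape (n_samples,).
    """
    hidden = np.tanh(X @ W1.T + b1)  # (n_samples, n_hidden)
    return hidden @ W2 + b2          # (n_samples,)


n_inputs, n_hidden = 142, 10  # the 142-10-1 structure reported in Section 3
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))  # first-layer weights (placeholder)
b1 = np.zeros(n_hidden)                                 # first-layer biases (placeholder)
W2 = rng.normal(scale=0.1, size=n_hidden)               # second-layer weights (placeholder)
b2 = 0.0                                                # second-layer bias (placeholder)

n_params = W1.size + b1.size + W2.size + 1
X = rng.uniform(-1.0, 1.0, size=(5, n_inputs))  # five hypothetical normalized inputs
y_scaled = ffann_predict(X, W1, b1, W2, b2)     # outputs still on the [-1, +1] scale
print(n_params, y_scaled.shape)  # 1441 (5,)
```

The parameter count (1420 + 10 + 10 + 1 = 1441) is what the optimizer has to determine for a 142-10-1 network.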

2. MATERIALS AND METHODS

2.1. Materials. The accuracy and reliability of models for estimating physical properties, especially models built on a large amount of experimental data, depend directly on the quality and comprehensiveness of the data set used to develop them. Comprehensiveness here covers both the diversity of the chemical families investigated and the number of pure compounds in the data set. In this work, the database compiled by Yaws9 was used; it is one of the most comprehensive sources of physical property data for chemical species, including ΔcH° of pure compounds. Note that the values reported in ref 9 are the negative of the enthalpy of combustion. In total, ΔcH° values for 4590 compounds were found in the database and used as the main data set in this study.

2.2. Providing the Collection of Chemical Groups. In this step, the chemical structures of all pure compounds in the data set were examined with an algorithm that compares chemical groups in order to identify the most effective contributions for evaluating the heat of combustion. In the end, 142 chemical groups were found to be the most effective for predicting ΔcH°. These chemical groups, together with the number of occurrences of each group in each compound of the data set, are given in the Supporting Information.

2.3. Developing the Model. The next, and perhaps the most significant, step is to find a relationship between the chemical functional groups and ΔcH° of the compounds. The simplest approach is to assume a multilinear relationship between the group counts and the property (ΔcH°),10 as in most classical GC methods.
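To illustrate the multilinear GC idea (the baseline, not the model developed in this work), the sketch below fits one contribution per group by ordinary least squares. The three groups, the five small molecules, and the per-group contributions used to generate the targets are all hypothetical choices for illustration; the targets are synthetic, not values from ref 9.

```python
import numpy as np

# Toy group-count matrix (rows: molecules, columns: groups). The three groups
# are illustrative only; they are not the 142 groups used in this work.
groups = ["CH3", "CH2", "OH"]
counts = np.array([
    [2, 1, 0],  # propane
    [2, 2, 0],  # n-butane
    [2, 3, 0],  # n-pentane
    [1, 1, 1],  # ethanol
    [1, 2, 1],  # 1-propanol
], dtype=float)

# Synthetic targets built from assumed per-group contributions (kJ/mol); NOT experimental data.
assumed_contrib = np.array([700.0, 650.0, -150.0])
targets = counts @ assumed_contrib

# Multilinear GC fit: property ~= sum_k count_k * contribution_k, solved by least squares.
contrib, residuals, rank, _ = np.linalg.lstsq(counts, targets, rcond=None)

# Predict a molecule outside the fitting set from its group counts.
hexane = np.array([2, 4, 0], dtype=float)  # n-hexane: 2 CH3, 4 CH2
print(round(float(hexane @ contrib), 6))  # 4000.0
```

Because every target here is exactly linear in the counts, the fit recovers the assumed contributions; the text reports that real ΔcH° data are not captured this well by a linear form, which motivates the nonlinear ANN.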
However, several calculations showed that applying this multilinear approach to the current problem does not give good results compared with the previously presented methods. Consequently, the nonlinear ANN method was investigated. ANNs are extensively used in various scientific and engineering problems, e.g., for estimating physical and chemical properties of pure compounds.10-37 These powerful mathematical tools are generally applied to study complicated systems; the theory of ANNs is described elsewhere.38 Using the Neural Network Toolbox of the MATLAB software (MathWorks, Inc.), a three-layer feed-forward artificial neural network (FFANN) was developed for the problem. The typical structure of a three-layer FFANN is shown schematically in Figure 1, and the capabilities of this kind of ANN have been demonstrated in previous works.10-38

All 142 functional-group counts and the ΔcH° values are normalized to the range between −1 and +1 to reduce computational errors during model development. The maximum and minimum values of each functional group are used to scale the input data, and the maximum and minimum values of ΔcH° are used to scale the output. The main data set is then divided into three subsets: the "training" set, the "validation (optimization)" set, and the "test (prediction)" set. In this work, the training set is used to generate the ANN structure, the validation set is used to optimize the model, and the test set is used to investigate the prediction capability and validity of the obtained model. The division is performed randomly: about 80, 10, and 10% of the main data set are selected for the training set (3672 compounds), the validation set (459 compounds), and the test set (459 compounds), respectively. The effect of the allocation percentages of the three subsets on the accuracy of the ANN model has been studied elsewhere.10

Generating an ANN model amounts to determining its weight matrices and bias vectors.38 As shown in Figure 1, a three-layer FFANN has two weight matrices (W1 and W2) and two bias vectors (b1 and b2).10-38 These parameters are obtained by minimizing an objective function, which in this study is the sum of squared errors between the outputs of the ANN (predicted ΔcH° values) and the target values (experimental ΔcH° values). The minimization is performed with the Levenberg-Marquardt (LM)38 optimization strategy. More accurate optimization methods exist, but they need much more time to converge; in general, the more accurate the optimization, the longer the algorithm takes to converge to the global optimum. LM38 is the most widely used training algorithm because it is robust and accurate enough for the considered system.10-38 In most cases, the number of neurons in the hidden layer (n) is fixed, and the main goal is to produce an ANN model that predicts the target values as accurately as expected; this step is repeated until the best ANN is obtained. Generally, and especially for three-layer FFANNs, it is more efficient to optimize the number of hidden-layer neurons according to the accuracy of the obtained FFANN.10-38

Figure 2. Comparison between the predicted and experimental ΔcH°.

Table 1. Statistical Parameters of the Obtained Model

statistical parameter              training set   validation set   test set   training + validation + test sets
R2                                 0.999          0.999            0.999      0.999
average percent error (%)          0.16           0.18             0.20       0.16
standard deviation error           4838.333       4677.2           4990.111   4837.487
root mean square error (kJ/mol)    11.87          12.47            17.23      12.57
n                                  3672           459              459        4590
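The scaling to [−1, +1] and the random 80/10/10 division described in Section 2.3 can be sketched as follows. This is a NumPy sketch under stated assumptions: the helper names are hypothetical, the linear min-max mapping mirrors what MATLAB's mapminmax does with its default output range, and the group counts and property values are random placeholders rather than the Yaws data.

```python
import numpy as np


def fit_minmax(A):
    """Column-wise minimum and range, used to map data to [-1, +1]."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid division by zero (they map to -1)
    return lo, span


def to_unit_range(A, lo, span):
    return 2.0 * (A - lo) / span - 1.0      # column minimum -> -1, maximum -> +1


def from_unit_range(Z, lo, span):
    return (Z + 1.0) / 2.0 * span + lo      # inverse mapping, e.g. for ANN outputs


def random_split(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Random training/validation/test index sets; the remainder goes to the test set."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


rng = np.random.default_rng(1)
X = rng.integers(0, 6, size=(4590, 142)).astype(float)  # hypothetical group counts
y = rng.uniform(100.0, 10000.0, size=(4590, 1))          # hypothetical property values

x_lo, x_span = fit_minmax(X)
y_lo, y_span = fit_minmax(y)
Xn, yn = to_unit_range(X, x_lo, x_span), to_unit_range(y, y_lo, y_span)
train, val, test = random_split(len(X))
print(len(train), len(val), len(test))  # 3672 459 459
```

With 4590 compounds, rounding 80% and 10% reproduces the 3672/459/459 split reported above; network outputs would be mapped back with from_unit_range before errors are computed in kJ/mol.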

3. RESULTS AND DISCUSSION

Using the presented procedure, a three-layer FFANN was obtained for predicting ΔcH° of pure compounds. To determine the number of neurons in the hidden layer, values from 1 to 20 were tested, and 10 neurons gave the best results. The best three-layer FFANN therefore has a 142-10-1 structure. The MAT file (MATLAB file format) of the obtained neural network, containing all parameters of the model, is freely available from the corresponding author by e-mail. The predicted ΔcH° values are compared with the experimental data in Figure 2. The predicted ΔcH° values, together with the status of each pure compound in the model (training, validation, or test set), are given in the Supporting Information.

The statistical parameters of the model are presented in Table 1. Over the training set, the validation set, the test set, and the main data set, respectively, the model gives R2 values of 0.999, 0.999, 0.999, and 0.999; average percent errors of 0.16, 0.18, 0.20, and 0.16%; and root-mean-square errors of 11.87, 12.47, 17.23, and 12.57 kJ/mol. Although the root-mean-square error over the test set is somewhat higher than over the training and validation sets, R2 and the average percent error remain nearly unchanged for compounds not used in training, which indicates the high predictive power of the model.

Note that the model cannot distinguish cis and trans isomers. However, the difference between the heats of combustion of cis and trans isomers is very small. For example, the ΔcH° values reported for cis-1,3-pentadiene and trans-1,3-pentadiene are 2990.8 and 2983.9 kJ/mol, respectively, a difference of 6.9 kJ/mol, which is within the root-mean-square error of the model (12.57 kJ/mol).

The model is thus very accurate. Compared with the previous model presented by one of the authors,6 it is more accurate: its R2 is practically equal to the ideal value of 1, and its very low average percent error makes it attractive to those who require the most accurate ΔcH° values for pure compounds. In addition, the model was evaluated on a data set of 4590 pure compounds from various chemical families, which shows its capability to estimate ΔcH° for a wide variety of chemical compounds. This data set is about 2.7 times the size of the most comprehensive data set previously used to develop a ΔcH° model (4590 versus 1714 compounds in the author's previous work6), which reflects the comprehensiveness of the model.
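The LM training step and the scan over hidden-layer sizes can be sketched in NumPy. This is not MATLAB's trainlm: it is a basic Levenberg-Marquardt loop on the sum-of-squared-errors objective with identity damping, a forward-difference Jacobian, random initialization, no validation-based early stopping, and an assumed tanh hidden layer. It runs on a small synthetic data set with candidate sizes 1 to 3 instead of 1 to 20, and all function names are hypothetical.

```python
import numpy as np


def n_params(n_in, n_hid):
    return n_hid * n_in + 2 * n_hid + 1  # W1, b1, W2, b2


def predict(p, X, n_hid):
    """Forward pass with all weights and biases packed in the flat vector p."""
    n_in = X.shape[1]
    k = n_hid * n_in
    W1 = p[:k].reshape(n_hid, n_in)
    b1 = p[k:k + n_hid]
    W2 = p[k + n_hid:k + 2 * n_hid]
    b2 = p[-1]
    return np.tanh(X @ W1.T + b1) @ W2 + b2


def train_lm(X, y, n_hid, n_iter=50, mu=1e-2, seed=0):
    """Minimize the sum of squared errors with a basic Levenberg-Marquardt loop.

    Returns the parameters and the history of accepted SSE values.
    """
    m = n_params(X.shape[1], n_hid)
    p = np.random.default_rng(seed).normal(scale=0.5, size=m)
    r = predict(p, X, n_hid) - y
    history = [r @ r]
    eps = 1e-6
    for _ in range(n_iter):
        # Forward-difference Jacobian of the residuals (adequate for toy sizes only).
        J = np.empty((len(y), m))
        for j in range(m):
            dp = np.zeros(m)
            dp[j] = eps
            J[:, j] = (predict(p + dp, X, n_hid) - y - r) / eps
        g, H = J.T @ r, J.T @ J
        while mu < 1e10:
            step = np.linalg.solve(H + mu * np.eye(m), -g)
            r_new = predict(p + step, X, n_hid) - y
            if r_new @ r_new < history[-1]:
                p, r = p + step, r_new
                history.append(r @ r)
                mu = max(mu / 10.0, 1e-9)  # accepted: move toward Gauss-Newton
                break
            mu *= 10.0  # rejected: move toward gradient descent
        else:
            break  # no improving step found; stop
    return p, history


def select_hidden_size(X_tr, y_tr, X_val, y_val, candidates):
    """Train one network per hidden-layer size and keep the lowest validation RMSE."""
    best = None
    for h in candidates:
        p, _ = train_lm(X_tr, y_tr, h)
        rmse = float(np.sqrt(np.mean((predict(p, X_val, h) - y_val) ** 2)))
        if best is None or rmse < best[1]:
            best = (h, rmse, p)
    return best


rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(120, 3))
y = np.tanh(X @ np.array([1.0, -0.5, 0.3]))  # synthetic target, not ΔcH° data
h_best, rmse_best, p_best = select_hidden_size(X[:100], y[:100], X[100:], y[100:], range(1, 4))
print(h_best, round(rmse_best, 4))
```

For a 142-10-1 network each Jacobian would have 1441 columns, so an analytic (backpropagated) Jacobian would be preferable to finite differences at that size.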

4. CONCLUSION

In this study, a GC-based model was presented for estimating the standard enthalpy of combustion (ΔcH°) of pure compounds. The model combines FFANNs with GCs; its required inputs are the numbers of occurrences of 142 functional groups in each molecule. Because most of these functional groups do not appear simultaneously in any particular molecule, computing the required inputs from the chemical structure of a molecule is simple. To develop the model, experimental ΔcH° values from a large data set9 of 4590 pure compounds from various chemical families were used. As a result, a comprehensive model was obtained that can predict ΔcH° for many pure compounds. Some limitations remain. Although the model has a wide range of applicability, its predictive capability is restricted to compounds similar to those used to develop it. Applying the model to compounds totally different from those investigated is not recommended, although it may be used for a rough estimate of ΔcH° for such compounds. The model may also serve as a tool for testing the reliability of experimental data reported in the literature. Finally, the average absolute deviation of the model results from the experimental values9 is lower than 0.16%, demonstrating the accuracy of the presented model.
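The statistics in Table 1 and the data-screening use suggested above can be sketched as follows. R2 is computed here as the squared Pearson correlation, matching the term "squared correlation coefficient"; the k = 3 RMSE cutoff for flagging a suspect experimental value is an assumption, not a criterion from this work, and all numbers are hypothetical.

```python
import numpy as np


def squared_correlation(y_exp, y_pred):
    """Squared Pearson correlation coefficient between experimental and predicted values."""
    return float(np.corrcoef(y_exp, y_pred)[0, 1] ** 2)


def average_absolute_percent_deviation(y_exp, y_pred):
    return float(100.0 * np.mean(np.abs((y_pred - y_exp) / y_exp)))


def rmse(y_exp, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_exp) ** 2)))


def flag_suspect_points(y_exp, y_pred, k=3.0):
    """Indices whose absolute residual exceeds k times the overall RMSE (k is an assumed cutoff)."""
    return np.flatnonzero(np.abs(y_pred - y_exp) > k * rmse(y_exp, y_pred))


# Hypothetical experimental/predicted values in kJ/mol (not data from this study).
y_exp = np.linspace(1000.0, 10000.0, 10)
y_pred = y_exp + 10.0
y_pred[4] += 190.0  # one compound deviates by 200 kJ/mol

print(round(squared_correlation(y_exp, y_pred), 5))
print(round(rmse(y_exp, y_pred), 2))      # 63.95
print(flag_suspect_points(y_exp, y_pred))  # [4]
```

A flagged compound is only a candidate for re-examination: a large residual can come from an erroneous experimental value or from a structure the group set represents poorly.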

ASSOCIATED CONTENT

Supporting Information. Collection of 142 chemical groups, names of 4590 pure compounds and their properties, and chemical structure of all of the compounds used in this study. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Fax: +98-21-77926580. E-mail: [email protected] and/or fghara@gmail.com.


REFERENCES

(1) American Society for Testing and Materials (ASTM). ASTM D4809-09a. Standard Test Method for Heat of Combustion of Liquid Hydrocarbon Fuels by Bomb Calorimeter; ASTM: West Conshohocken, PA, 2010.
(2) Seaton, W. H.; Harrison, B. K. J. Loss Prev. Process Ind. 1990, 3, 311.
(3) Cardozo, R. L. AIChE J. 1986, 32, 844.
(4) Hshieh, F. Y. Fire Mater. 1999, 23, 79.
(5) Hshieh, F. Y.; Hirsch, D. B.; Beeson, H. D. Fire Mater. 2003, 27, 9.
(6) Gharagheizi, F. Chemom. Intell. Lab. Syst. 2008, 91, 177.
(7) Cao, H. Y.; Jiang, J. C.; Pan, Y.; Wang, R.; Cui, Y. J. Loss Prev. Process Ind. 2009, 22, 222.
(8) Pan, Y.; Jiang, J. C.; Wang, R.; Jiang, J. J. J. Loss Prev. Process Ind. 2011, 24, 85.
(9) Yaws, C. L. Yaws' Handbook of Thermodynamic and Physical Properties of Chemical Compounds; Knovel: Norwich, NY, 2003; http://www.knovel.com/web/portal/browse/display?_EXT_KNOVEL_DISPLAY_bookid=667&VerticalID=0.
(10) Gharagheizi, F. Comput. Mater. Sci. 2007, 40, 159.
(11) Gharagheizi, F. e-Polymers 2007, No. 114.
(12) Vatani, A.; Mehrpooya, M.; Gharagheizi, F. Int. J. Mol. Sci. 2007, 8, 407.
(13) Gharagheizi, F.; Mehrpooya, M. Mol. Diversity 2008, 12, 143.
(14) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. Energy Fuels 2008, 22, 1628.
(15) Gharagheizi, F.; Fazeli, A. QSAR Comb. Sci. 2008, 27, 758.
(16) Gharagheizi, F.; Alamdari, R. F. Fullerenes, Nanotubes, Carbon Nanostruct. 2008, 16, 40.
(17) Gharagheizi, F. QSAR Comb. Sci. 2008, 27, 165.
(18) Sattari, M.; Gharagheizi, F. Chemosphere 2008, 72, 1298.
(19) Gharagheizi, F.; Sattari, M. SAR QSAR Environ. Res. 2009, 20, 267.
(20) Gharagheizi, F. Ind. Eng. Chem. Res. 2009, 48, 7406.
(21) Gharagheizi, F. Aust. J. Chem. 2009, 62, 376.
(22) Gharagheizi, F.; Tirandazi, B.; Barzin, R. Ind. Eng. Chem. Res. 2009, 48, 1678.
(23) Gharagheizi, F. Energy Fuels 2010, 24, 3867.
(24) Mehrpooya, M.; Gharagheizi, F. Phosphorus Sulfur Relat. Elem. 2010, 185, 204.
(25) Gharagheizi, F.; Abbasi, R.; Tirandazi, B. Ind. Eng. Chem. Res. 2010, 49, 10149.
(26) Gharagheizi, F.; Abbasi, R. Ind. Eng. Chem. Res. 2010, 49, 12685.
(27) Gharagheizi, F.; Sattari, M. Ind. Eng. Chem. Res. 2011, 50, 2482.
(28) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Ind. Eng. Chem. Res. 2011, 50, 221.
(29) Gharagheizi, F. J. Hazard. Mater. 2011, 189, 211.
(30) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Ind. Eng. Chem. Res. 2011, 50, 5815.
(31) Gharagheizi, F.; Babaie, O.; Mazdeyasna, S. Ind. Eng. Chem. Res. 2011, 50, 6503.
(32) Gharagheizi, F.; Salehi, G. R. Thermochim. Acta 2011, DOI: 10.1016/j.tca.2011.04.001.
(33) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. J. Chem. Eng. Data 2011, 56, 720.
(34) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. J. Chem. Eng. Data 2011, 56, 2460.
(35) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. J. Chem. Eng. Data 2011, 56, 2587.
(36) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. J. Chem. Eng. Data 2011, 56, 1741.
(37) Eslamimanesh, A.; Gharagheizi, F.; Mohammadi, A. H.; Richon, D. Chem. Eng. Sci. 2011, 66, 3039.
(38) Hagan, M.; Demuth, H. B.; Beale, M. H. Neural Network Design; International Thomson Publishing: Boston, MA, 2002.
