Energy Fuels 2010, 24, 3867–3871 Published on Web 06/08/2010

DOI: 10.1021/ef100207x

Chemical Structure-Based Model for Estimation of the Upper Flammability Limit of Pure Compounds

Farhad Gharagheizi*,†

Department of Chemical Engineering, Faculty of Engineering, University of Tehran, Post Office Box 11365-4563, Tehran, Iran

Received February 23, 2010. Revised Manuscript Received May 21, 2010

In the present work, a new molecular-based model is presented for estimation of the upper flammability limit (UFL) of pure compounds. The parameters of the model are the numbers of occurrences of a new collection of 113 functional groups. On the basis of these 113 functional groups, a feed-forward neural network is presented to estimate the UFL of pure compounds. The squared correlation coefficient, absolute percent error, standard deviation, and root-mean-square error of the model over the 867 pure compounds used for its development are 0.9469, 7.07%, 0.883, and 0.882, respectively. Therefore, the model is accurate and can be used to predict the UFL for a wide range of pure compounds.

1. Introduction

The development of safe practices for the handling and storage of combustible compounds in the chemical industries requires knowledge of the flammability characteristics of those compounds.1,2 These characteristics define the range of conditions under which a compound can be used without risk of fire or explosion. Every combustible gas burns in air only over a limited range of concentrations. Below a certain concentration of the compound in air, called the lower explosion limit or lower flammability limit (LFL), the mixture is too lean to burn, while above another concentration, called the upper explosion limit or upper flammability limit (UFL), the mixture is too rich. The concentrations between these two limits constitute the flammable range. Therefore, knowledge of the LFL and UFL is essential to prevent fire and explosion of a flammable gas.

The UFL depends upon several factors, such as the nature of the compound, the geometry of the apparatus, the strength of the ignition source, the test temperature and pressure, the degree of mixing, the oxygen concentration, and the concentration of the diluents.3 Therefore, measuring the UFL requires the standard apparatus and conditions specified in American Society for Testing and Materials (ASTM) E681.4 For most compounds, the UFL is measured in a 5 L spherical glass test vessel, but for compounds that have large quenching distances and may be difficult to ignite, ASTM E681 recommends a 12 L spherical flask. Following ASTM E681, gaseous mixtures of the compound with air are subjected to an electrical spark-ignition source, and the absence or presence of flame propagation is determined visually. Because the visual assessment of flame propagation is subjective for near-limit mixtures, video-recording systems are used for subsequent analysis.

Although the UFL has already been determined for many common compounds, accurate computational methods are needed to estimate or even predict the UFL of compounds for which it has not been measured. Several methods for estimating the UFL of pure compounds are available in the literature, and they can be divided into three main classes.

The first class comprises empirical methods that correlate the UFL with other physicochemical properties; the work of Suzuki and Koide5 is an example. The main disadvantage of this class is that the accuracy of the models is directly tied to the accuracy of the physicochemical properties used as their parameters.

The second class comprises mathematical methods based on theoretical laws, such as the works of Jones6 and Seaton.7 These methods are very useful because they give some theoretical insight into combustion and fire. Their main disadvantage is that they cannot estimate the UFL of pure compounds with suitable accuracy; therefore, they are not suitable for practical estimation.

The third class comprises quantitative structure-property relationship (QSPR) methods. These methods are extensively used to estimate various physicochemical properties and have been used successfully by the author for estimation of some flammability characteristics.8-14 In this class, the chemical structures of molecules are used to develop parameters that can then be used to correlate the UFL of pure compounds. Examples are the models presented by Shebeko et al.,15 High and Danner,16 Albahri,17 Gharagheizi,11 and Pan et al.18 Perhaps the simplest of these are the well-known classic group contribution (GC) methods.16,17 Generally, in GC methods, some functional groups are identified as useful for relating the property to the chemical structure. Then, the number of occurrences of each functional group is computed and used as a parameter to correlate the property.

The aim of this study is to present a new method for estimation of the UFL of pure compounds. This method belongs to the third class and combines a new collection of functional groups, chosen to give maximum information about the chemical structure, with neural networks as a powerful mathematical tool to develop the model.

*To whom correspondence should be addressed. Fax: +98-2166957784. E-mail: [email protected] or [email protected].
† Saman Energy Giti Co., Postal Code: 3331619636, Tehran, Iran.

(1) Crowl, D. A.; Louvar, J. F. Chemical Process Safety Fundamentals with Applications, 2nd ed.; Prentice-Hall: Upper Saddle River, NJ, 2002.
(2) Lees, F. P. Loss Prevention in the Process Industries, 2nd ed.; Butterworth-Heinemann: Oxford, U.K., 1996.
(3) Sheldon, M. Fire Prev. 1984, 174, 23.
(4) Britton, L. G.; Cashdollar, K. L.; Fenlon, W.; Furip, D.; Going, J.; Harrison, B. K.; Niemeier, J.; Ural, E. A. Process Saf. Prog. 2005, 24, 12.
(5) Suzuki, T.; Koide, K. Fire Mater. 1994, 18, 393.
(6) Jones, G. W. Chem. Rev. 1938, 22, 1.
(7) Seaton, W. H. J. Hazard. Mater. 1991, 27, 169.
(8) Gharagheizi, F. Energy Fuels 2008, 22, 3037.
(9) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. Energy Fuels 2008, 22, 1628.
(10) Gharagheizi, F. Chemom. Intell. Lab. Syst. 2008, 91, 177.
(11) Gharagheizi, F. J. Hazard. Mater. 2009, 167, 507.
(12) Gharagheizi, F. J. Hazard. Mater. 2009, 169, 217.
(13) Gharagheizi, F. J. Hazard. Mater. 2009, 170, 595.
(14) Gharagheizi, F. Ind. Eng. Chem. Res. 2009, 48, 7406.

© 2010 American Chemical Society. pubs.acs.org/EF

Table 1. Functional Groups Used To Develop the Model^d

a The superscript represents the formal oxidation number. The formal oxidation number of a carbon atom equals the sum of the conventional bond orders with electronegative atoms. The C--N bond order in pyridine may be considered as 2 when we have one such bond and 1.5 when we have two such bonds. The C..X bond order in pyrrole or furan may be considered as 1.
b An α-C may be defined as a C attached through a single bond with -C=X, -C#X, or -C--X.
c Pyrrole-type structure.
d R represents any group linked through carbon. X represents any electronegative atom (O, N, S, P, Se, and halogens). Al and Ar represent aliphatic and aromatic groups, respectively. = represents a double bond. # represents a triple bond. -- represents an aromatic bond, such as in benzene, or delocalized bonds, such as the N-O bond in a nitro group. .. represents aromatic single bonds, such as the C-N bond in pyrrole.
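As an illustration of the group contribution idea described in the Introduction, the sketch below turns a molecule, given as a hand-written list of fragment labels, into a vector of group occurrence counts. The four-entry group list, the fragment labels, and the function name are hypothetical stand-ins for the 113 groups of Table 1, and the fragmentation of a real structure into groups is assumed to be done elsewhere.

```python
from collections import Counter

# Hypothetical, heavily shortened group list; the model itself uses the
# 113 functional groups of Table 1.
GROUPS = ["CH3", "CH2", "OH", "C=O"]


def group_count_vector(fragments, groups=GROUPS):
    """Return the number of occurrences of each group, in the order of `groups`."""
    counts = Counter(fragments)
    unknown = sorted(set(counts) - set(groups))
    if unknown:
        raise ValueError(f"fragments not in the group list: {unknown}")
    # Counter returns 0 for groups that do not occur in the molecule.
    return [counts[g] for g in groups]


# 1-Propanol (CH3-CH2-CH2-OH), fragmented by hand for this illustration.
propanol = ["CH3", "CH2", "CH2", "OH"]
print(group_count_vector(propanol))  # prints [1, 2, 1, 0]
```

Most entries of such a vector are zero for any single molecule, which is why the counts are simple to compute even when the full list contains 113 groups.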

2. Experimental Section

2.1. Data Set Preparation. The validity of a model depends directly upon the accuracy of the data set used to develop it. The comprehensiveness of a model, in turn, is related to the size of the data set and to the number of chemical families it contains. Both validity and comprehensiveness are preserved here by using the Design Institute for Physical Properties (DIPPR) 801 database.19 This database is the result of a vast literature survey of more than 23 000 diverse references and is updated every year. During its development, several methods were used to evaluate the collected properties; therefore, the most accurate data are presented in this database. After the database was examined, 867 pure compounds with UFL values were found, and these UFLs were extracted and used to develop the model. This data set is exactly the one used in a previous study by the author.11 It is presented as Supporting Information.

2.2. Development of a New Collection of Functional Groups. In this step, the chemical structures of all 867 pure compounds were analyzed, and 113 functional groups were finally found useful for correlating the UFL. These functional groups and their chemical structures are presented in Table 1. The 113 functional groups and their numbers of occurrences in the pure compounds are presented as Supporting Information. The numbers of occurrences are used as the input parameters of the model.

2.3. Development of the Model. Neural networks are widely used in various scientific and engineering areas. These powerful tools are usually applied to complicated problems, such as the prediction of physicochemical properties of pure compounds and mixtures from chemical structures.20 The main advantage of neural networks is that complex, nonlinear relationships can be modeled without any assumptions about the form of the model.20 The theoretical basis of neural networks can be found elsewhere.21 Three-layer feed-forward neural networks (FFNNs) with a sigmoidal transfer function are among the most widely used types of neural networks and have found various applications in the estimation of physical properties. FFNNs are available in the Neural Network Toolbox of MATLAB (Mathworks, Inc.), and this type of neural network was selected to develop the model.

The first step was to divide the main data set into two data sets: the first for training the network and the second for testing it. Neural networks are good at fitting functions, and there is a proof that even a simple neural network can fit any data set very well; therefore, the predictive power of the network must be checked with the test set. The test set is used only to check the validity of the trained neural network and is not used to train it. The effect of the fraction of the main data set allocated to the test set on the accuracy of neural networks has been studied by the author.22 The results of that study show that the test set should contain between 5 and 35% of the main data set. In this work, 20% of the main data set (173 compounds) was randomly allocated to the test set, and the remaining 80% (694 compounds) was allocated to the training set. The schematic structure of the three-layer FFNN used in this work is shown in Figure 1.

Figure 1. Schematic structure of the three-layer FFNN used in this study.

The simplified form of the relationship between the input parameters and the output of a three-layer FFNN can be written as eq 1.

$$y_{\mathrm{calc}}(i) = W_2 \tanh\left(W_1 T_i + b_1\right) + b_2 \qquad (1)$$

In this equation, T is the input matrix of dimension nparam × nds, where nparam is the number of functional groups (113 in this study) and nds is the number of compounds in the training set (694 in this study). Ti is the ith column of matrix T. W1 is the first weight matrix of the three-layer FFNN and is of dimension n × nparam, where n is the number of neurons in the hidden layer. b1 is the first bias matrix, of dimension n × 1. W2 is the second weight matrix, belonging to the output layer, and is of dimension 1 × n. b2 is the second bias, belonging to the output layer, and is a scalar. ycalc(i) is the ith output of the network, which should be compared to the ith value of the property.

Usually, all inputs and outputs of an FFNN are normalized between -1 and +1 to decrease computational errors. This normalization is based on the minimum and maximum values of every input parameter and of the output. The values of W1, W2, b1, and b2 are obtained by minimization of an objective function. In this study, the mean squared error between the outputs of the neural network and the target values was used. This minimization is usually performed using the Levenberg-Marquardt algorithm, which is rapid and accurate in the training of neural networks.20 This type of neural network has been used by the author in previous work; therefore, detailed explanations about the three-layer FFNN used in this study can be found elsewhere.9,13,23-31

Figure 2. Comparison between estimated and experimental values of the UFLs.

Table 2. Comparison between the Obtained Model and Previously Presented Models
(columns as extracted; some cells are empty in the source, so the entries of different columns do not align row by row)
model: Suzuki and Koide,5 Jones,6 Seaton,7 Shebeko et al.,15 High and Danner,16 Albahri,17 Gharagheizi,11 Pan et al.,18 this work
R^2: 0.23, 0.92, 0.92, 0.92, 0.758, 0.9469
absolute percent error: 27.8, 20, 26.4, 11.8, 9.2, 7.07
n: 95, 142, 181, 464, 865, 465, 867

(15) Shebeko, Y. N.; Ivanov, A. V.; Alekhina, E. N.; Barmakova, A. A. Sov. Chem. Ind. 1983, 15, 599.
(16) High, M. S.; Danner, R. P. Ind. Eng. Chem. Res. 1987, 26, 1395.
(17) Albahri, T. Chem. Eng. Sci. 2003, 58, 3629.
(18) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. Ind. Eng. Chem. Res. 2009, 48, 5064.
(19) Design Institute for Physical Properties (DIPPR). Project 801, Evaluated Process Design Data, Public Release Documentation. American Institute of Chemical Engineers (AIChE), New York, 2006.
(20) Taskinen, J.; Yliruusi, J. Adv. Drug Delivery Rev. 2003, 55, 1163.
(21) Hagan, M.; Demuth, H. B.; Beale, M. H. Neural Network Design; International Thomson Publishing: Tampa, FL, 2002.
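The preprocessing and forward pass described in Section 2.3 can be sketched as follows. This is a minimal standard-library illustration, not the author's MATLAB implementation: the function names, the toy 3-2-1 network, its weights, the per-group maxima, and the output range are all assumptions made for the example, and training by the Levenberg-Marquardt algorithm is not shown.

```python
import math
import random


def scale_to_unit_range(value, lo, hi):
    """Min-max normalization of `value` from [lo, hi] to [-1, +1]."""
    if hi == lo:
        return 0.0  # assumption: a constant input is placed mid-range
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def unscale_from_unit_range(scaled, lo, hi):
    """Inverse of scale_to_unit_range, used to recover the UFL from the output."""
    return (scaled + 1.0) * (hi - lo) / 2.0 + lo


def split_train_test(items, test_fraction=0.2, seed=0):
    """Randomly allocate `test_fraction` of the items to the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def ffnn_forward(t_i, w1, b1, w2, b2):
    """Eq 1 for one compound: y = W2 tanh(W1 t_i + b1) + b2.

    t_i: n_param normalized inputs; w1: n rows of n_param weights;
    b1: n biases; w2: n output weights; b2: scalar bias.
    """
    hidden = [
        math.tanh(sum(w * t for w, t in zip(row, t_i)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Toy 3-2-1 network with made-up weights (the paper's network is 113-4-1).
w1 = [[0.5, -0.25, 0.0], [0.0, 1.0, -1.0]]
b1 = [0.1, -0.1]
w2 = [0.8, -0.3]
b2 = 0.05
counts = [1, 2, 0]       # group counts of one hypothetical compound
max_counts = [4, 4, 2]   # assumed per-group maxima over the training set
t_i = [scale_to_unit_range(c, 0, m) for c, m in zip(counts, max_counts)]
y_scaled = ffnn_forward(t_i, w1, b1, w2, b2)
print(unscale_from_unit_range(y_scaled, 2.0, 100.0))  # assumed UFL range

train, test = split_train_test(range(867))
print(len(train), len(test))  # prints 694 173
```

The split reproduces the 694/173 allocation quoted in the text; which compounds fall into each set depends on the random seed, which the paper does not report.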

3. Results and Discussion

From the procedure presented in the previous section, an optimized FFNN was obtained for the prediction of the UFL. To determine the number of neurons in the hidden layer, values from 1 to 50 were tested; 4 neurons gave the best results, so the best three-layer FFNN has the structure 113-4-1. The .mat file (MATLAB file format) of the obtained neural network, containing all parameters of the model, is freely available from the author by e-mail. The UFL values predicted by this model are compared with the DIPPR 801 values in Figure 2. These values are also reported as Supporting Information.

The results obtained by the model are presented in Table 2. Over the training set, test set, and main data set, respectively, the squared correlation coefficients are 0.9476, 0.9433, and 0.9469; the absolute percent errors are 7.26%, 6.32%, and 7.07%; the standard deviations are 0.881, 0.887, and 0.883; and the root-mean-square errors are 0.881, 0.882, and 0.882. The absolute percent error obtained by the model over all 867 pure compounds is shown in Figure 3. As can be seen, the obtained model estimates the UFL of pure compounds accurately.

Also, a comparison between the three classes of methods presented in the Introduction shows that the quantitative structure-property relationships are more accurate than both empirical and theoretical methods in estimation of the UFL of pure compounds. Within this class, the method previously presented by the author11 gave better results than the other methods. Moreover, a comparison between the present model and that QSPR method11 (developed with the same data set used in this study) shows that the present model is more accurate, is simpler to use, and has fewer outliers.

The average absolute percent errors of the model over each chemical family of pure compounds used in this study are shown in Table 3. As can be seen, the largest error is found for the cycloaliphatic alcohols, which shows that the model cannot predict the UFL of this chemical family as well as those of other families. The main reason for this weakness is probably the number of compounds of this family in the data set: of the 867 pure compounds, only 1 is a cycloaliphatic alcohol, and a single compound is too few to shape the behavior of the model for the whole family. Improving the model for this family requires more experimental data.

Figure 3. Absolute percent error obtained by the presented model and the number of pure compounds in each range.

Table 3. Average Absolute Percent Errors of the Model over Each Chemical Family of Compounds Used in This Study
family | average percent error
cycloaliphatic alcohols | 30.92
dicarboxylic acids | 18.26
aromatic esters | 16.42
polyfunctional acids | 14.02
cycloalkenes | 12.17
other aliphatic alcohols | 11.52
other hydrocarbon rings | 11.23
diphenyl/polyaromatics | 10.85
n-alcohols | 10.74
polyfunctional amides/amines | 10.69
dialkenes | 10.04
other aliphatic amines | 9.88
alkylcyclohexanes | 9.81
methylalkanes | 9.40
aliphatic ethers | 9.12
anhydrides | 8.63
aromatic alcohols | 8.45
aromatic chlorides | 8.33
other alkanes | 8.20
silanes/siloxanes | 8.10
polyfunctional esters | 8.08
2,3,4-alkenes | 7.87
polyfunctional nitriles | 7.80
cycloalkanes | 7.77
polyols | 7.76
n-aliphatic acids | 7.17
unsaturated aliphatic esters | 7.16
C3 and higher aliphatic chlorides | 6.90
other ethers/diethers | 6.78
dimethylalkanes | 6.74
aldehydes | 6.73
1-alkenes | 6.71
naphthalenes | 6.55
ketones | 6.47
other monoaromatics | 6.44
other alkylbenzenes | 6.30
other polyfunctional organics | 6.17
methylalkenes | 5.99
other saturated aliphatic esters | 5.65
formates | 5.64
other amines and imines | 5.45
other polyfunctional C, H, and O | 5.27
other aliphatic acids | 5.17
propionates and butyrates | 5.03
aromatic carboxylic acids | 4.94
polyfunctional C, H, O, and N | 4.51
n-aliphatic primary amines | 4.47
terpenes | 4.33
ethyl and higher alkenes | 4.31
aromatic amines | 4.26
C, H, and Br compounds | 4.20
mercaptans | 4.05
polyfunctional C, H, O, and halides | 3.89
nitriles | 3.84
n-alkanes | 3.82
sulfides/thiophenes | 3.43
acetates | 3.00
polyfunctional C, H, O, and S | 2.96
polyfunctional C, H, N, halides, and (O) | 2.62
nitroamines | 2.47
C, H, and NO2 compounds | 2.24
isocyanates/diisocyanates | 2.19
C1/C2 aliphatic chlorides | 2.18
C, H, and F compounds | 1.98
organic salts | 1.76
n-alkylbenzenes | 1.75
C, H, and I compounds | 1.52
peroxides | 1.34
multi-ring cycloalkanes | 1.24
alkylcyclopentanes | 1.19
C, H, and multi-halogen compounds | 1.02
epoxides | 0.61
other condensed rings | 0.60

(22) Gharagheizi, F. Comput. Mater. Sci. 2007, 40, 159.
(23) Gharagheizi, F. e-Polymers 2007, Article 114.
(24) Gharagheizi, F.; Fazeli, A. QSAR Comb. Sci. 2008, 27, 758.
(25) Gharagheizi, F.; Alamdari, R. F. Fullerenes, Nanotubes, Carbon Nanostruct. 2008, 16, 40.
(26) Gharagheizi, F.; Mehrpooya, M. Mol. Diversity 2008, 12, 143.
(27) Gharagheizi, F.; Tirandazi, B.; Barzin, R. Ind. Eng. Chem. Res. 2009, 48, 1678.
(28) Sattari, M.; Gharagheizi, F. Chemosphere 2008, 72, 1298.
(29) Gharagheizi, F. Aust. J. Chem. 2009, 62, 374.
(30) Gharagheizi, F.; Sattari, M. SAR QSAR Environ. Res. 2009, 20, 267.
(31) Mehrpooya, M.; Gharagheizi, F. Phosphorus, Sulfur Silicon Relat. Elem. 2010, 185, 204.

4. Conclusion

In this study, a molecular-based model was presented for prediction of the UFL of pure compounds. The model is the result of a combination of group contributions and FFNNs. The parameters of the model are the numbers of occurrences of 113 functional groups in every molecule. Because most of these 113 functional groups do not occur simultaneously in a single molecule, computing these parameters from the chemical structure of a molecule is simple. To develop the model, 867 pure compounds were used; therefore, this model can be used to predict the UFL of every regular compound with some limitations.
These 867 pure compounds cover many families of compounds; therefore, the model has a wide range of applicability. However, application of the model is restricted to compounds similar to those used to develop it and is not recommended for compounds that are completely different from them.

Supporting Information Available: Data set of the 867 pure compounds and their UFLs used to develop the model. This material is available free of charge via the Internet at http://pubs.acs.org.
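The error statistics quoted in Section 3 can be computed along the following lines. The paper does not spell out its formulas, so the definitions below are assumptions: the squared correlation coefficient is taken as the coefficient of determination, 1 - SS_res/SS_tot, and the absolute percent error as the mean of |calc - exp|/|exp| × 100. The function names and the three data points are invented for illustration.

```python
import math


def average_absolute_percent_error(y_exp, y_calc):
    """Mean of |calc - exp| / |exp|, expressed in percent."""
    total = sum(abs(c - e) / abs(e) for e, c in zip(y_exp, y_calc))
    return 100.0 * total / len(y_exp)


def root_mean_square_error(y_exp, y_calc):
    """Square root of the mean squared deviation."""
    total = sum((c - e) ** 2 for e, c in zip(y_exp, y_calc))
    return math.sqrt(total / len(y_exp))


def r_squared(y_exp, y_calc):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(y_exp, y_calc))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


# Invented experimental and calculated UFL values, for illustration only.
y_exp = [10.0, 20.0, 30.0]
y_calc = [11.0, 18.0, 30.0]
print(round(average_absolute_percent_error(y_exp, y_calc), 2))  # prints 6.67
print(round(root_mean_square_error(y_exp, y_calc), 3))          # prints 1.291
print(round(r_squared(y_exp, y_calc), 3))                       # prints 0.975
```

Under this definition, the percent error of the main data set is the size-weighted mean of the training and test values, which is consistent with the reported figures: (694 × 7.26 + 173 × 6.32)/867 ≈ 7.07%.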