Article pubs.acs.org/IECR
Prediction of Minimum Ignition Energy from Molecular Structure Using Quantitative Structure−Property Relationship (QSPR) Models Beibei Wang,†,‡ Lulu Zhou,‡ Kaili Xu,*,† and Qingsheng Wang*,‡,§ †
School of Resources and Civil Engineering, Northeastern University, Shenyang, Liaoning 110819, China Department of Fire Protection & Safety, Oklahoma State University, Stillwater, Oklahoma 74078, United States § Department of Chemical Engineering, Oklahoma State University, Stillwater, Oklahoma 74078, United States ‡
S Supporting Information *
ABSTRACT: Minimum ignition energy (MIE) is one of the most important and most widely used parameters when characterizing hazardous chemicals. However, it is extremely difficult to obtain experimental MIE data due to the high cost, time involved, and safety issues of experimental tests. In this work, two quantitative structure−property relationship (QSPR) models were built based on existing experimental data and molecular simulations through analysis of multiple linear regression (MLR) and support vector machine (SVM). Experimental MIE data of 61 chemicals were collected, and their molecular descriptors were derived solely from their molecular structures, which were optimized at B3LYP/631G(d) level using Gaussian 09. Both models were validated to have excellent performances in goodness-of-fit, internal robustness, and external predictive ability, and hence, they are qualified to predict MIE values for chemicals with no experimental MIE data available. These two validated models can also help gain a better understanding of effects of molecular structures on ignition properties of hydrocarbon fuels.
1. INTRODUCTION Minimum ignition energy (MIE) is an important parameter for studying and quantifying ignition hazards of fuels.1−3 Each flammable mixture has a specific threshold of ignition energy corresponding to MIE, and ignition sources with energy below this threshold will never ignite the mixture.4 Generally, MIE is determined from the energy stored in a capacitor at a known voltage which is then discharged through a specified fixed electrode gap (i.e., ASTM E582).4−7 However, for flammable and explosive chemicals, experimental measurements could be very dangerous and costly along with high experimental uncertainties. Hence, it can be of great help to explore a convenient alternative method to experimental measurements.8,9 A quantitative structure−property relationship (QSPR) model can reveal a mathematical relationship between structural attribute(s) and property of interest at the quantum chemical level. The use of such models for the prediction of target properties for a variety of chemicals prior to or in lieu of expensive and labor-intensive experimental measurements has naturally been very promising and enticing.10 Nevertheless, to the best of our knowledge, there is no qualified QSPR model for the prediction of MIE for hydrocarbon fuels. In 1952, Calcote et al. reported the values of MIE for a number of fuels by performing extensive experiments.11 They also investigated © XXXX American Chemical Society
the effect of molecular structures on MIE values qualitatively. In 2016, Baati tried to build some models to predict MIE values but unfortunately none of them are qualified.1 The global model in her PhD thesis comprised as many as 16 parameters and the average deviation of this model even exceeded 1000%. Among the three local models she built, the only satisfying one was the model built for the solid state materials. However, this model included as many as 27 parameters, which was very complex and unpractical. In this work, experimental MIE data of 61 chemicals were collected from Calcote’s work and two excellent QSPR models were developed.11 Both linear and nonlinear relationships between molecular structures and the denary logarithms of MIE values were studied by a combination of QSPR methodology and molecular descriptors obtained from molecular simulations at a high level, B3LYP/6-31G(d), using Gaussian 09. Moreover, these two QSPR models were validated and compared with each other. This is the first time that QSPR methodology and quantum chemistry are combined to study MIE properties for hydrocarbon fuels. This study could also be extended to predict Received: Revised: Accepted: Published: A
November 8, 2016 December 12, 2016 December 15, 2016 December 15, 2016 DOI: 10.1021/acs.iecr.6b04347 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research Table 1. SP Value for Each Type of Chemicals type of chemicals
SP
type of chemicals
SP
type of chemicals
SP
alkane alkene alkyne alcohol sulfur alcohol chloride
2.0 −0.3 −3.0 2.1 1.0 3.0
primary amine secondary amine tertiary amine aldehyde ketone ester
5.0 3.0 2.0 2.0 2.6 5.0
ether sulfur ether peroxide cyclic compound cyclic compound (oxygen-containing ring) inorganic substance
2.1 1.5 −1.8 0.5 0 0
η = εLUMO − εHOMO
MIE values for all kinds of chemicals used in the process industries.
ω=
2. METHODOLOGY 2.1. Data Preparation. As MIE data is extremely sensitive to test conditions, in this work, all experimental MIE values were extracted from a single reference to guarantee the compatibility among them.11 In this reference, all experiments were performed with stoichiometric fuel−air mixtures at a pressure of 1 atm, and in total, 61 MIE values were collected. They were divided into a training set (49 samples), which was used to establish the QSPR models, and a test set (12 samples) for external validation. 2.2. Computational Details. Molecular structures of the 61 chemicals were downloaded from the PubChem database.12 It is well known that molecular descriptors are dependent on the molecular structures. In order to achieve the structure with the lowest energy, geometry optimization was performed for each chemical. Vibrational frequencies were also calculated to guarantee that the optimized structure is a minimum rather than a transition state. The optimizations and the frequency calculations were carried out using Gaussian 09 at 298 K and 1 atm. 13−15 Becke, 3-parameter, Lee, Yang, and Parr (B3LYP)16,17 and the 6-31G(d) basis set,18,19 including polarization function for angular flexibility, were adopted. Molecular descriptors are defined as numeric characteristics of a chemical compound directly calculated from its molecular structure with special algorithms.20 In this work, a series of effective descriptors were selected, and the details were explained in the following. Constitutional Descriptors. This group of descriptors consists of the molecular weight (MW), number of carbon atoms (nC), number of hydrogen atoms (nH), number of oxygen atoms (nO), number of nitrogen atoms (nN), number of sulfur atoms (nS), number of halogen atoms (nh), and structure parameter (SP). Some molecular structural fragments increase the MIE of an energetic chemical, while other molecular structural fragments decrease the MIE. SP is a numeric value which presents the positive and negative contributions of these molecular structural fragments on MIE. According to the study by Calcote et al., each type of chemicals was given a specific SP value as shown in Table 1.11 Quantum Chemical Descriptors. This group of descriptors includes the dipole moment (DM), energy of the highest occupied molecular orbital (εHOMO), energy of the lowest unoccupied molecular orbital (εLUMO), chemical potential (μ), hardness (η), and electrophilicity index (ω). Chemical potential, hardness, and electrophilicity index are calculated as follows: μ=
(εHOMO + εLUMO) 2
(2)
μ2 2η
(3)
2.3. Model Generation and Validation. In this work, multiple linear regression (MLR) and support vector machine (SVM) were applied to generate the prediction models. As a relatively simple approach, MLR attempts to model the linear relationship between explanatory variables and a response variable by fitting a linear equation. The general form is given as follows:20 Y = A 0 + A1x1 + A 2 x 2 + ... + A nxn
(4)
SVM is a novel statistical machine learning method based on an algorithm developed by Vapnik et al.21 Originally, this method was only used for classification problems.22−25 Recently, it has been extended to solve regression problems and has shown great performance in QSPR studies due to its ease in handling complex nonlinear problems.26−29 Model validation is essential for successful application and interpretation of QSPR models.10 Generally, the developed models are evaluated in three terms: goodness-of-fit, internal robustness, and external predictive ability. The goodness-of-fit is evaluated by the coefficient of determination (R2), average absolute error (AAE), and rootmean-square error (RMSE). AAE and RMSE are calculated as follows:30 n
AAE =
∑i = 1 |yi ,pred − yi ,exp | (5)
n n
RMSE =
∑i = 1 (yi ,pred − yi ,exp )2 n
(6)
where yi,pred is the predicted value, yi,exp is the experimental value, and n is the number of chemicals in the data set. The internal robustness is achieved by the leave-one-out (LOO) cross-validation test on the training set. The outcome from such a test is a cross-validated correlation coefficient, Q2LOO, which is calculated according to the following formula:30 2 Q LOO =1−
training
(yi − yi ̂ )2
training
(yi − y ̅ )2
∑i = 1 ∑i = 1
(7)
where yi, ŷi, and y ̅ are the measured, predicted, and mean experimental values of the training set, respectively. In order to evaluate the predictive ability of the generated models, it is essential to compare the predicted and measured values of a sufficiently large external test set which are not used during the model development. The predictive power of a QSPR model can be judged by an external Q2ext defined as follows:30,31
(1) B
DOI: 10.1021/acs.iecr.6b04347 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research test
2 Q ext
=1−
∑i = 1 (yi − yi ̂ )2 test
∑i = 1 (yi − ytr̅ )2
selected as c = 1024.0, g = 0.0078125, and p = 1.0 by LOO cross validation, and the number of support vectors is 47. The predicted lg(MIE) values are plotted against the experimental data in Figure 2. As shown from Figure 2, the
(8)
where yi, ŷ are the measured and predicted values of the test set, respectively, and yt̅ r is the mean experimental values of the training set.
3. RESULTS AND DISCUSSION 3.1. Multiple Linear Regression. The SPSS 19 program package was employed to perform the MLR analysis.32 The optimal linear equation obtained between the denary logarithms of MIE values and molecular descriptors through this analysis is shown as follows: lg(MIE) = 0.590 + 0.010MW − 0.345nO − 0.221nN − 0.362nS − 0.198nh + 0.158SP + 0.097DM + 1.810εHOMO − 10.772ω
(9)
This equation is composed of nine descriptors: molecular weight (MW), number of oxygen atoms (nO), number of nitrogen atoms (nN), number of sulfur atoms (nS), number of halogen atoms (nh), structure parameter (SP), dipole moment (DM), energy of the highest occupied molecular orbital (εHOMO), and electrophilicity index (ω). The predicted lg(MIE) values are plotted against the experimental data in Figure 1. As shown from Figure 1, the
Figure 2. Predicted versus experimental values (SVM).
predicted values are very close to the experimental data. The results of this model also present a great fitness with the experimental data (R = 0.928, R2 = 0.861, AAE = 0.136, and RMSE = 0.180 for the training set and R = 0.915, R2 = 0.837, AAE = 0.138, and RMSE = 0.165 for the test set), an adequate robustness (Q2LOO = 0.690) and a good prediction power (Q2ext = 0.833). The residuals of the predicted values are plotted against the experimental data in Figures 3 and 4 for both models. As most
Figure 1. Predicted versus experimental values (MLR).
predicted values fit the experimental data very well with R = 0.921, R2 = 0.848, AAE = 0.147, and RMSE = 0.187 for the training set and AAE = 0.161 and RMSE = 0.181 for the test set. This model also presents an excellent internal robustness (Q2LOO = 0.757) and a pretty good predictive ability (Q2ext = 0.799), while the basic criteria of a successful QSPR model is R2 > 0.6, Q2LOO> 0.5.33 3.2. Support Vector Machine. The SVM analysis was performed by LIBSVM 3.21 program package.34 In order to compare the performance of the two models, the training set, test set, and molecular descriptors of this model are exactly the same as the previous one. The denary logarithms of measured MIE values and the values of descriptors were scaled in the range of [0, 1] before modeling, and the RBF kernel function was adopted. The optimal parameters of this model were
Figure 3. Residuals versus experimental values (MLR).
of the residuals are distributed on both sides of the zero line, it can be verified that there is no systematic error during the development of the models. It should be noted that all parameters are very similar for the training set and the test set, which suggests that these two models also have excellent generalization performances. Both models are qualified and can be used to predict MIE values of other hydrocarbon fuels by theoretical descriptors calculated solely from their molecular structures. The MLR model is easier to develop, more convenient to apply, and C
DOI: 10.1021/acs.iecr.6b04347 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
support and computing time. The authors also appreciated the support from the Dale F. Janes Endowed Professorship at Oklahoma State University.
■
(1) Baati, N. Predictive Models for Thermal Stability and Explosive Properties of Chemicals from Molecular Structure. Ph.D. Thesis, École Polytechnique Fédérale De Lausanne, 2016. (2) Lewis, B.; von Elbe, G. Combustion. In Flames and Explosions of Gases, 3rd ed.; Academic Press, Inc.: Orlando, FL, 1987. (3) Maas, U.; Warnatz, J. Ignition Process in Hydrogen-oxygen Mixtures. Combust. Flame 1988, 74 (1), 53−69. (4) Bane, S. P. M.; Ziegler, J. L.; Boettcher, P. A.; Coronel, S. A.; Shepherd, J. E. Experimental Investigation of Spark Ignition Energy in Kerosene, Hexane, and Hydrogen. J. Loss Prev. Process Ind. 2013, 26 (2), 290−294. (5) Kurdyumov, V.; Blasco, J.; Sánchez, A. L.; Linan, A. On the Calculation of the Minimum Ignition Energy. Combust. Flame 2004, 136, 394−397. (6) Frendi, A.; Sibulkin, M. Dependence of Minimum Ignition Energy on Ignition Parameters. Combust. Sci. Technol. 1990, 73 (1−3), 395−413. (7) ASTM E582-07. Standard Test Method for Minimum Ignition Energy and Quenching Distance in Gaseous Mixtures; American Society for Testing and Materials International (ASTM): West Conshohocken, PA, 2007. (8) Wang, Q.; Wang, J.; Larranaga, M. D. Simple Relationship for Predicting Onset Temperatures of Nitro Compounds in Thermal Explosions. J. Therm. Anal. Calorim. 2013, 111 (2), 1033−1037. (9) Wang, Q.; Rogers, W. J.; Mannan, M. S. Thermal Risk Assessment and Rankings for Reaction Hazards in Process Safety. J. Therm. Anal. Calorim. 2009, 98, 225−233. (10) Tropsha, A.; Gramatica, P.; Gombar, V. K. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR Comb. Sci. 2003, 22 (1), 69−77. (11) Calcote, H. F.; Gregory, C. A.; Barnett, C. M.; Gilmer, R. B. Spark Ignition Effect of Molecular Structure. Ind. Eng. Chem. 1952, 44 (11), 2656−2662. (12) The PubChem Project. https://pubchem.ncbi.nlm.nih.gov/ #opennewwindow (accessed December 2016). (13) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Keith, T.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, O.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian; Gaussian, Inc., Wallingford, CT, 2010. (14) Wang, Q.; Zhang, Y.; Rogers, W. J.; Mannan, M. S. Molecular Simulation Studies on Chemical Reactivity of Methylcyclopentadiene. J. Hazard. Mater. 2009, 165 (1−3), 141−147. (15) Wang, Q.; Ng, D.; Mannan, M. S. Study on the Reaction Mechanism and Kinetics of the Thermal Decomposition of Nitroethane. Ind. Eng. Chem. Res. 2009, 48 (18), 8745−8751. (16) Becke, A. D. Density-functional Thermochemistry. III. The Role of Exact Exchange. J. Chem. Phys. 1993, 98, 5648−5652. (17) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti Correlation-energy Formula into a Functional of the Electron Density. Phys. Rev. B: Condens. Matter Mater. Phys. 1988, 37, 785−789.
Figure 4. Residuals versus experimental values (SVM).
presents a better internal robustness, while the SVM model has a better goodness-of-fit and external predictive power. A suitable model should be selected according to the practical situation.
4. CONCLUSIONS In the present work, two excellent QSPR models were developed for the prediction of MIE through MLR and SVM methods based on the existing experimental data. Both models were validated to have an excellent goodness-of-fit, internal robustness, and external predictive ability. The MLR model is more convenient to apply with a much better internal robustness. As a black-box model, the SVM model has some drawbacks in interpreting ability, but its goodness-of-fit and external predictive ability are much better. Both models can be used to predict the MIE of new chemicals for which experimental data are unknown. Moreover, MIE data generated from the validated QSPR models could provide necessary guidance for developing safety standards and practical process designs.
■
ASSOCIATED CONTENT
S Supporting Information *
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.6b04347. Experimental and predicted MIE values and list of molecular descriptors (Table S1). Example of the calculation of molecular descriptors. (PDF)
■
REFERENCES
AUTHOR INFORMATION
Corresponding Authors
*E-mail:
[email protected] (K. Xu). *E-mail:
[email protected] (Q. Wang). ORCID
Qingsheng Wang: 0000-0002-6411-984X Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS This research was financially supported by China Scholarship Council (CSC). The authors thank the High Performance Computing Center at Oklahoma State University for software D
DOI: 10.1021/acs.iecr.6b04347 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research (18) Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self-consistent Molecular Orbital Methods. XII. Further Extensions of Gaussian-type Basis Sets for Use in Molecular Orbital Studies of Organic Molecules. J. Chem. Phys. 1972, 56, 2257−2261. (19) Krishnan, R.; Binkley, J. S.; Seeger, R.; Pople, J. A. Selfconsistent Molecular Orbital Methods. XX. A Basis Set for Correlated Wave Functions. J. Chem. Phys. 1980, 72, 650−654. (20) Lu, Y.; Ng, D.; Mannan, M. S. Prediction of the Reactivity Hazards for Organic Peroxides Using the QSPR Approach. Ind. Eng. Chem. Res. 2011, 50 (3), 1515−1522. (21) Vapnik, V. The Nature of Statistical Learning Theory, 2nd ed.; Springer, 2013. (22) Belousov, A. I.; Verzakov, S. A.; von Frese, J. A Flexible Classification Approach with Optimal Generalisation Performance: Support Vector Machines. Chemom. Intell. Lab. Syst. 2002, 64 (1), 15− 25. (23) Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug Design by Machine Learning: Support Vector Machines for Pharmaceutical Data Analysis. Comput. Chem. 2001, 26 (1), 5−14. (24) Suykens, J. A. K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999, 9 (3), 293−300. (25) Niazi, A.; Ghasemi, J.; Zendehdel, M. Simultaneous Voltammetric Determination of Morphine and Noscapine by Adsorptive Differential Pulse Stripping Method and Least-squares Support Vector Machines. Talanta 2007, 74 (2), 247−254. (26) de Cerqueira Lima, P.; Golbraikh, A.; Oloff, S.; Xiao, Y.; Tropsha, A. Combinatorial QSAR Modeling of P-Glycoprotein Substrates. J. Chem. Inf. Model. 2006, 46 (3), 1245−1254. (27) Fatemi, M. H.; Gharaghani, S. A Novel QSAR Model for Prediction of Apoptosis-inducing Activity of 4-aryl-4-H-chromenes Based on Support Vector Machine. Bioorg. Med. Chem. 2007, 15 (24), 7746−7754. (28) Fatemi, M. H.; Gharaghani, S.; Mohammadkhani, S.; Rezaie, Z. Prediction of Selectivity Coefficients of Univalent Anions for Anionselective Electrode Using Support Vector Machine. Electrochim. Acta 2008, 53 (12), 4276−4282. (29) Niazi, A.; Jameh-Bozorghi, S.; Nori-Shargh, D. Prediction of Toxicity of Nitrobenzenes Using Ab Initio and Least Squares Support Vector Machines. J. Hazard. Mater. 2008, 151 (2−3), 603−609. (30) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. A Novel QSPR Model for Prediction of Lower Flammability Limits of Organic Compounds Based on Support Vector Machine. J. Hazard. Mater. 2009, 168 (2−3), 962−969. (31) Wang, B.; Yi, H.; Xu, K.; Wang, Q. Prediction of the Selfaccelerating Decomposition Temperature of Organic Peroxides using QSPR Models. J. Therm. Anal. Calorim. 2016, DOI: 10.1007/s10973016-5922-8, [Online early access]. (32) Nie, N. H.; Hull, C. H.; Jenkins, J. G.; Steinbrenner, K.; Bent, D. H. SPSS: Statistical Package for the Social Sciences, 2nd ed.; McGrawHill: New York, 1975. (33) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphics Modell. 2002, 20 (4), 269−276. (34) Chang, C. C.; Lin, C. J. LIBSVM: a Library for Support Vector Machines. ACM T. Int. Syst. Tech. 2011, 2 (3), 1−27 Software available at https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
E
DOI: 10.1021/acs.iecr.6b04347 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX