Prediction of Flash Points for Fuel Mixtures Using Machine Learning

Article pubs.acs.org/EF

Prediction of Flash Points for Fuel Mixtures Using Machine Learning and a Novel Equation

Diego Alonso Saldana,† Laurie Starck,† Pascal Mougin,† Bernard Rousseau,‡ and Benoit Creton*,†

† IFP Energies nouvelles, 1-4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
‡ Laboratoire de Chimie-Physique, Université Paris-Sud, UMR 8000 CNRS, 91405 Orsay, France



S Supporting Information *

ABSTRACT: In this work, a set of computationally efficient, yet accurate, methods to predict flash points of fuel mixtures based solely on their chemical structures and mole fractions was developed. Two approaches were tested using data obtained from the existing literature: (1) machine learning directly applied to mixture flash point data (the mixture QSPR approach) using additive descriptors and (2) machine learning applied to pure compound properties (the QSPR approach) in combination with Le Chatelier rule based calculations. It was found that the second method performs better than the first with the available databank and for the target application. We proposed a novel equation, and we evaluated the performance of the resulting, fully predictive, Le Chatelier rule based approach on new experimental data of surrogate jet and diesel fuels, yielding excellent results. We predicted the variation in flash point of diesel−gasoline blends with increasing proportions of gasoline.

1. INTRODUCTION Alternative fuels represent a promising solution to social issues such as the increase of energy demand, the sustainability of current petroleum sources, and the reduction of greenhouse gas emissions. The deployment of biofuel is currently in progress, and the introduction of compounds originating from renewable sources, such as normal and iso-paraffins, naphthenic and aromatic compounds, normal and iso-olefins, alcohols, and/or esters, in fuels still requires a large amount of R&D work.1 For instance, the prediction of a substance’s physicochemical properties is a key task in several fields and applications, as they drive, among other things, the conditions for storage, transportation, and combustion quality. From process design to drug discovery applications, from pure compounds to mixtures, being able to anticipate the behavior of a substance can be a significant design advantage. In recent years, we have devoted large efforts to develop accurate, yet fast, machine learning (ML) based models, also known as Quantitative Structure−Property Relationships (QSPR), to predict a number of physical properties of alternative fuels.1−5 The properties of interest are those that take part in fuel specifications, such as those encountered in the ASTM6,7 and the European Norm:8,9 the Flash Point (FP),2 the Cetane Number (CN),2,3 the energetic content that can be assumed as the net heat of combustion (ΔcH),4 the melting point (Tm),4 and temperature dependent properties such as density (ρ(T)),5 and viscosity (η(T)).5 Recently, this promising approach to assist the formulation of alternative fuels has been taken up and completed by Dahmen et al., who have considered some ethers in the pool of renewable molecules.10 However, alternative fuels are blends containing a large number of compounds, and it is necessary to extend these QSPRs to mixtures in order to be usable during alternative fuel formulations. 
There is a wide spectrum of approaches that could be taken and have been previously applied in order to achieve this goal. We may classify these methods into two large

categories: empirical and theoretical methods. Empirical methods require no a priori knowledge of the property’s behavior with respect to its parameters, and methods based on ML fall in this category. One of the advantages of these methods is that they can be tailored in order to fit any available experimental data. However, care must be taken not to overfit such models to the data, which is why proper validation methods are necessary. Theoretical methods attempt to express the property as a function of a number of known parameters, through an equation that can be derived from previously known relations. Since these methods are based on physical grounds, they provide a degree of theoretical reliability. However, they often require knowledge of certain physical parameters that are characteristic of each compound or mixture, such as reaction enthalpies, activity coefficients, etc. In this paper, we mainly focus our efforts on the prediction of FP for complex mixtures such as alternative fuels, which represents a significant extension of our previous work on FP for neat compounds.2 The flash point is defined as the lowest temperature at which the vapor phase of a substance is at a sufficient concentration in the air such that it is able to ignite and produce a flash when exposed to an energy source, such as a spark. This is important for safety considerations, particularly during fuel storage, since there is a risk that the fuel might unexpectedly ignite if the flash point is low enough and an energy source is present. In the case of jet A1 fuels, the FP is required to be at least 311 K,6 while for diesel fuels the minimum is 328 K.7 Empirical methods have been developed for the prediction of mixture FPs, although these are not based on ML principles. Most of these methods involve some form of vapor−liquid equilibrium calculation, while avoiding the need for experimental data of physical properties as much as possible. 
Received: March 27, 2013 Revised: June 5, 2013

dx.doi.org/10.1021/ef4005362 | Energy Fuels XXXX, XXX, XXX−XXX

Energy & Fuels

Article

McLaren,17 Liaw et al.,18−22 and Gmehling and Rasmussen.24 In total, the modeling databank comprises 25 mixtures, 21 of which are binary and four of which are ternary; 21 pure compounds, six of which are hydrocarbons and 15 of which are oxygenates; and a total of 287 data points. These data points were used both to train models for the prediction of the flash point of mixtures and to evaluate the performance of the FP models. As an additional form of validation, some FP measurements were carried out for surrogate jet and diesel fuels. The ASTM D3828 test method by a small scale closed cup tester was followed for the FP measurements.32 The molecules used for the different mixtures have a purity of ca. 99%. Note that the esters come from a real blend of fatty acid esters from vegetable oils, mainly rapeseed. The detailed compositions of the surrogate fuels, largely inspired by compositions provided in the literature,33,34 are given in the Supporting Information. The developed models are evaluated on their ability to predict FP for these complex mixtures as well as on Carareto’s very recently published data.27 Data points used to train machine learning models for pure compound FP (FPi), enthalpy of vaporization (ΔHvap), critical temperature (TC), critical pressure (PC), and boiling point (Tb) were extracted exclusively from the DIPPR database.37 The database used to train machine learning models for FPi in our previous work2 has been extended by considering ketones, ethers, and carboxylic acids. 2.2. Descriptor Calculation. As described in our previous studies, Simplified Molecular Input Line Entry Specification (SMILES) formulas are assigned to each mixture component. 
In order to obtain a unique SMILES formula and thus to avoid duplicate molecules, SMILES formulas were then canonicalized using Pybel, the Open Babel module for the Python programming language.35 Three-dimensional structures were then created, and geometries were optimized using the COMPASS (Condensed-Phase Optimized Molecular Potentials for Atomistic Simulation Studies) force field and Gasteiger charges within the Forcite module of Accelrys Materials Studio 5.0.36 The first, second, and third principal axes of each structure were aligned to the x, y, and z axes of the Cartesian coordinate system, respectively. Then, Molecular Descriptor (MD) values were calculated using Materials Studio. Functional Group Count Descriptors (FGCDs) were calculated using the same SMARTS databank used in our previous papers2,4,5 as well as the SMARTS patterns used for the calculation of UNIFAC groups. Since the previously described MD and FGCD are pure compound descriptors, it is necessary to extend their information to mixtures. There are three main approaches: (1) additive descriptors, which consist of calculating weighted sums of the pure components in the mixture; (2) nonadditive descriptors, which use equations that do not follow a purely additive scheme; and (3) fragment based descriptors, including schemes such as Simplex Representation of Molecular Structure (SiRMS) descriptors.14 While this latter approach was not used in this study, the first was followed for MD, and the first two approaches were tested for FGCD. Additive mixture descriptors were calculated using a linear combination of pure component descriptor values weighted by the mole fractions xi, also known as Kay’s mixing rule:38

A notable example is that of Catoire et al., who have developed an equation based on pure compound predictions and involving the vaporization enthalpy (ΔHvap), the boiling point (Tb), and the number of carbon atoms11 and later extended this equation to mixtures.12,13 The application of ML methods to the prediction of mixture properties based on the structures of their components is a subject of growing interest.14 Such methods are frequently referred to as Mixture-QSPR (MQSPR). To our knowledge, no MQSPR model devoted to the prediction of the FP for mixtures has been reported in the literature. A number of studies on the prediction of FP for mixtures have been published using theoretical methods.17−24 To the best of our knowledge, all theoretical methods in the literature are based on some form of the Le Chatelier rule for flammability limits. The rule itself started as an empirical method used by Le Chatelier to study firedamp flammability.23 The relation was more recently proved to be theoretically valid by Mashuga and Crowl,25 provided that a number of conditions are met, namely, that the product heat capacities as well as the number of moles of gas are constant, that the combustion kinetics remain unchanged in the mixture compared to the pure components, and that the adiabatic temperature increase at the flammability limit is the same for all species. Particularly notable are the efforts of Liaw et al.,18−22 in which they explored the different aspects of, and possible improvements to, the Le Chatelier rule. They have studied mixtures exhibiting minimum or maximum FPs,18,19 experimented with the prediction of FPs of miscible and partially miscible mixtures,20,21 and analyzed different methods to predict activity coefficients,22 among other things. In all of these studies, the Antoine equation is used to predict vapor pressures. 
More recently, the same authors have used the UNIFAC-Dortmund 93 as their method of choice for the prediction of activity coefficients.22 Nevertheless, Liaw et al. do not seem to take the influence of temperature on flammability limits into account, as it has been explored by Gmehling and Rasmussen.24 The possibility of using ML methods to predict model input parameters has not been discussed so far. Group contribution methods have been proposed as an alternative to experimental pure component flash points.26 Moreover, most methods rely on the Antoine equation to predict vapor pressures, which requires the knowledge of the equation’s parameters for each mixture’s component. Few studies on the applicability of mixture FP prediction methods to fuels have been published. Recently, Carareto et al. examined the possibility of using the Le Chatelier rule along with predicted pure component FPs and activity coefficients calculated using the NRTL equation in order to predict FPs of biodiesel−ethanol blends.27 The aim of this study is to compare two main approaches to the problem of predicting FP for complex miscible mixtures: (i) the development of MQSPRs and (ii) feeding Liaw’s equation using QSPR model predictions for pure component FPs, vapor pressures, and activity coefficients. In addition, we compare the influence of vapor pressure predictions performed using the Lee−Kesler28 equation and the Clausius−Clapeyron29 relation, as well as activity coefficient predictions performed using UNIFAC Lyngby30 and the Original UNIFAC 201231 on the Le Chatelier rule’s predictive performance.

$$d_{mix} = \sum_{i=1}^{M} x_i d_i \tag{1}$$

where the sum runs over M, the number of components in the mixture; dmix and di denote the mixture descriptor value and the descriptor value for component i, respectively. Quadratic mixture descriptors were calculated using a mixing rule inspired by the Euclidean distance:

$$d_{mix} = \sqrt{\sum_{i=1}^{M} x_i d_i^{2}} \tag{2}$$

2. MATERIALS AND METHODS

Logarithmic mixture descriptors were calculated using a mixing rule inspired by the Grunberg−Nissan equation:39

2.1. Experimental Databank. Experimental data on mixture FPs are particularly scarce. The main data sources used for databank construction include the papers of Catoire et al.,11−13 Affens and

$$d_{mix} = \exp\left(\sum_{i=1}^{M} x_i \ln(d_i + 1)\right) \tag{3}$$
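As a minimal sketch of how eqs 1 and 3 can be evaluated, the following Python functions compute the additive and logarithmic mixture descriptors from mole fractions and pure component descriptor values. The function names and the example values are hypothetical and serve only as an illustration; they are not taken from the authors' implementation.

```python
import math

def additive_descriptor(x, d):
    """Eq 1 (Kay's mixing rule): mole-fraction-weighted sum of d_i."""
    return sum(xi * di for xi, di in zip(x, d))

def logarithmic_descriptor(x, d):
    """Eq 3: exp(sum_i x_i * ln(d_i + 1)); requires every d_i > -1."""
    return math.exp(sum(xi * math.log(di + 1.0) for xi, di in zip(x, d)))

# Hypothetical binary mixture: a functional group count of 1 in the
# first component and 0 in the second (illustrative values only).
x = [0.25, 0.75]
d = [1, 0]
print(additive_descriptor(x, d))
print(logarithmic_descriptor(x, d))
```

The shift by one inside the logarithm keeps the expression defined for group counts equal to zero.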


Since FGCD values can be, and often are, equal to zero, it was decided to increase them by one in every case so they have a defined logarithm value. In this study, eqs 1, 2, and 3 were used to calculate mixture FGCD descriptors, whereas only eq 1 was used to calculate mixture MD descriptors, since eqs 2 and 3 are not guaranteed to have a defined value in the latter case. In this way, the mixture descriptors described in this paper are the result of the application of certain mixing rules to pure compound descriptors.

2.3. Machine Learning Methods. In the present study, two machine learning approaches have been used to map descriptor values to FP values of mixtures: Genetic Function Approximation (GFA) and Support Vector Machines (SVM). The GFA approach is a method developed by Rogers and Hopfinger,40 which leads to the development of multilinear QSPR. This method consists of an evolutionary algorithm in which each individual is a set of selected descriptors di, used in a linear equation having the following form:

$$\hat{y} = \sum_{i=1}^{N} \alpha_i d_i + b \tag{4}$$

where the sum runs over N, which is the number of descriptors selected, αi are coefficients associated with each descriptor, and b is the intercept. The αi and b coefficients are optimized to fit the data, and the Friedman Lack-of-Fit (LOF) function41 was chosen to evaluate the individual’s fitness. After several generations involving selection, crossover, and mutation operations, the optimal set of equations is retained. Throughout our previous studies,1−5 Support Vector Machines (SVM) have consistently appeared among the best ML methods that have been tested for the prediction of pure compound properties. For this reason, SVM were chosen as the ML method used to calculate all pure compound properties in this work. The predictive relation for a radial basis function kernel SVM regression is42

Figure 1. Data point percentage distribution histogram of flash point values in the total (databank), training, and test data sets.

As with pure compound models, both an external test set (30% of the total data points) and a form of internal validation were used. However, since obtaining LCO validation subsets is particularly difficult because of the complexity of the data separation process, a Bootstrap LCO (BLCO) sampling with replacement was chosen over a cross-validation type sampling. The internal validation protocol that we propose, which uses only the training set, can thus be summarized as follows:
1. Randomly exclude compounds from the training set such that one-third of the data points are excluded and assigned to an internal test set, and the remaining two-thirds of the training set are kept as an internal training set. Internal subset sizes should be chosen so as to ensure that there are enough data in both internal subsets in order to obtain a good model and an internal test set that is as representative as possible.
2. Train the SVM model normally using the internal training set only.
3. Evaluate the error of the resulting model using an appropriate error function, such as the Average Absolute Error (AAE), Average Absolute Relative Error (AARE), Root Mean Square Error (RMSE), coefficient of determination (R2), and the Concordance Correlation Coefficient (CCC). Recently, Chirico and Gramatica have shown that the use of this latter coefficient is advocated considering various scenarios such as location shifts, scale shifts, and location plus scale shifts.15,16 The error of the resulting model should be calculated on the internal test set and stored in a list. Repeat from step 1 ten times.
4. Calculate the average internal test error. This serves as an empirical estimate of the predictive error that would be obtained when predicting FPs of mixtures containing compounds outside of those found in the internal training set.
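The four steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the authors' code: the data layout (a list of dictionaries with a "compounds" set), the function names lco_split and blco_error, and the toy model are all hypothetical. Each round draws an independent random set of excluded compounds, which is one simple way to realize repeated sampling with replacement.

```python
import random
from statistics import mean

def lco_split(points, test_fraction=1 / 3, rng=None):
    """Step 1: exclude random compounds until about `test_fraction` of the
    points contain an excluded compound. Returns (train, test, excluded)."""
    rng = rng or random.Random()
    compounds = sorted({c for p in points for c in p["compounds"]})
    rng.shuffle(compounds)
    excluded = set()
    for c in compounds:
        excluded.add(c)
        n_test = sum(1 for p in points if excluded & p["compounds"])
        if n_test >= test_fraction * len(points):
            break
    train = [p for p in points if not excluded & p["compounds"]]
    test = [p for p in points if excluded & p["compounds"]]
    return train, test, excluded

def blco_error(points, fit, error, n_rounds=10, seed=0):
    """Steps 2 to 4: train on each internal training set, score on the
    internal test set, and average the stored errors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_rounds):
        train, test, _excluded = lco_split(points, rng=rng)
        if train and test:  # skip degenerate splits
            errors.append(error(fit(train), test))
    return mean(errors) if errors else float("nan")

# Toy usage with a hypothetical "model" that predicts the mean training FP.
points = [
    {"compounds": frozenset({"A", "B"}), "fp": 300.0},
    {"compounds": frozenset({"A", "C"}), "fp": 310.0},
    {"compounds": frozenset({"B", "C"}), "fp": 305.0},
    {"compounds": frozenset({"D", "E"}), "fp": 330.0},
    {"compounds": frozenset({"D", "F"}), "fp": 335.0},
    {"compounds": frozenset({"E", "F"}), "fp": 340.0},
]
fit = lambda train: mean(p["fp"] for p in train)
aae = lambda model, test: mean(abs(p["fp"] - model) for p in test)
print(blco_error(points, fit, aae))
```

Because whole compounds are excluded, the internal test fraction can overshoot one-third when an excluded compound appears in many mixtures; the sketch accepts this rather than searching for an exact split.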
This four-step protocol is used both to optimize the set of mixture descriptors, using the forward selection method, as well as to optimize SVM regression parameters during the development of SVM models for the FP of mixtures. 2.5. Flash Point Estimation Using Vapor Liquid Equilibrium. 2.5.1. The Le Chatelier Rule. The empirical rule expressed in eq 6 was initially developed by Le Chatelier during his work on firedamp flammability at the mines.23

$$\hat{y} = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \tag{5}$$

where the sum runs over N (the number of samples in the training set), αi is the Lagrange multiplier, the kernel function K is the radial basis function applied to the input vector x and the training set vector xi, and b is the intercept. Samples belonging to the training set and having nonzero Lagrange multipliers assigned to their kernel function values after model optimization are called support vectors. More details on SVM regression and the sequential minimal optimization algorithm used to find the optimal set of Lagrange multipliers can be found elsewhere.43 The SVM were also used to select optimal sets of mixture descriptors; for this purpose, a forward selection protocol was applied as described in our previous works.2,4,5 2.4. External and Internal Validation Protocols. For the specific case of MQSPR, Muratov et al. have shown that the “compounds out” approach should be preferred over the “mixtures out” and “points out” approaches when the aim is to create a model capable of predicting mixtures having new compounds.14 For this reason, the Leave Compounds Out (LCO) validation protocol was used to separate the data set into two subsets: a training set and a test set. The test set is chosen by excluding a number of neat compounds, which means that these selected compounds do not appear in any of the mixtures in the training set. The model is then trained normally, by optimizing model parameters using the training set, and an estimate of the predictive error can be obtained by comparing the predictions and experimental values for compounds belonging to the test set. The test set error is, therefore, a predictive error calculated over compounds that are new to the model. Figure 1 shows the distribution of data percentages in each data set for different FP intervals. 
The relatively small size of the databank, combined with the need for a Leave-Compounds-Out (LCO) type validation, renders the data set sampling task more difficult, which is reflected in the lack of similarity between the distributions of the training and test sets.

$$\sum_{i=1}^{N} \frac{y_i}{LFL_i} = 1 \tag{6}$$


where the sum runs over the N components of the mixture, yi is the mole fraction of component i in the vapor−air mixture, and LFLi is the lower flammability limit of substance i. When this relation holds true, the substance has reached its lower flammability limit. According to the principles of vapor−liquid equilibrium, yi is given by the following expression:

$$y_i = \frac{x_i \gamma_i(T,x) P_i^{\sigma}(T) \phi_i^{\sigma}}{P \phi_i^{V}(T,P,y_i)} \approx \frac{x_i \gamma_i(T,x) P_i^{\sigma}(T)}{P} \tag{7}$$

where xi is the liquid phase mole fraction of component i, P is the atmospheric pressure, ϕiσ is the fugacity coefficient of component i in solution at saturation, ϕiV is the fugacity coefficient of the vapor phase of component i, and Piσ(T) and γi(T,x) are the vapor pressure and the activity coefficient of component i at the temperature T, respectively. The second part of the equality in eq 7 holds true at low pressures, since the vapor phase behaves ideally and fugacity coefficients at saturation are close to unity. By definition, LFLi, the lower flammability limit of compound i, is given by

$$LFL_i = \frac{P_i^{\sigma}(FP_i) \phi_i^{\sigma}}{P \phi_i^{V}(FP_i,P,y_i)} \approx \frac{P_i^{\sigma}(FP_i)}{P} \tag{8}$$

By substituting eqs 7 and 8 into 6, Liaw developed the following equation,18 which should hold true when the temperature, T, equals the flash point of the mixture:

$$\sum_{i=1}^{N} \frac{x_i \gamma_i(FP,x) P_i^{\sigma}(FP)}{P_i^{\sigma}(FP_i)} = 1 \tag{9}$$

It is interesting to note that this equation is only valid when LFLi does not vary with T. This is a reasonable approximation, as pointed out by Gmehling and Rasmussen.24 However, the authors, while acknowledging that the temperature has negligible influence on LFLi for moderate temperature ranges, took the influence of T over LFLi into account by using a correlation developed by Zabetakis,44 since they were using LFLi values corresponding to a fixed temperature of 298 K. For a given mixture and its set of mole fractions xi, and knowing for each component i, Piσ(T), γi(T,x), and FPi, it is possible to estimate the FP of the mixture by using an iterative method running over T, such as the Newton−Raphson method.45 The following three subsections describe the methods used for the evaluation of Piσ(T), γi(T,x), and FPi for neat compounds.

2.5.2. Vapor Pressures. Two types of empirical models were evaluated on their ability to predict the vapor pressure: (i) the Lee−Kesler equation,28 in which Pσ(T) is expressed as

$$P^{\sigma}(T) = \exp\left(A + B\frac{T_C}{T} + C\ln\left(\frac{T}{T_C}\right) + D\left(\frac{T}{T_C}\right)^{6} + \omega\left[E + F\frac{T_C}{T} + G\ln\left(\frac{T}{T_C}\right) + H\left(\frac{T}{T_C}\right)^{6}\right]\right) \tag{10}$$

where the values of coefficients A to H can be found in Table 1, TC is the critical temperature, and the acentric factor, ω, is expressed as

$$\omega = \frac{-\ln(P_C) - A - B\frac{T_C}{T_b} - C\ln\left(\frac{T_b}{T_C}\right) - D\left(\frac{T_b}{T_C}\right)^{6}}{E + F\frac{T_C}{T_b} + G\ln\left(\frac{T_b}{T_C}\right) + H\left(\frac{T_b}{T_C}\right)^{6}} \tag{11}$$

where PC is the critical pressure, the coefficients A to H are the same as in eq 10, and Tb is the normal boiling point. The Lee−Kesler method was tested with values of TC, Tb, and PC predicted using specifically trained SVM models, as described hereafter. However, the Lee−Kesler equation is generally considered to predict Pσ(T) values of hydrocarbons accurately and those of oxygenates less accurately, as coefficients A to H have been regressed only considering hydrocarbons.46 (ii) In the Clausius−Clapeyron relation,29 assuming that the enthalpy of vaporization ΔHvap does not vary with T, the liquid phase molar volume can be neglected in comparison to the vapor phase molar volume, and the vapor phase behaves ideally, the vapor pressure Pσ(T) is expressed as

$$P^{\sigma}(T) = P\exp\left(\frac{\Delta H_{vap}}{R}\left(\frac{1}{T_b} - \frac{1}{T}\right)\right) \tag{12}$$

where P is the atmospheric pressure and Tb is the normal boiling point. This relation was tested by using experimental and pseudoexperimental ΔHvap and Tb values found in the DIPPR database and by predicting ΔHvap and Tb values using specifically trained SVM models, as described hereafter.

2.5.3. Activity Coefficients. The calculation of activity coefficients γi(T,x) was performed both using the UNIFAC method with the Lyngby30 model and set of parameters and using the original model together with a recently reoptimized set of parameters: Original UNIFAC 2012.31 The Lyngby model is a modified version of the Original UNIFAC 197547 model that changes the influence of the temperature on the group interaction parameters by using a more complex equation involving three parameters. Details on the calculation of Original UNIFAC and UNIFAC Lyngby activity coefficients can be found elsewhere.30,31

2.5.4. Pure Compound Property Predictions. In this work, the pure compound properties (FPi, TC, PC, and Tb as well as ΔHvap) were modeled using SVM models, and models were trained using FGCD descriptors according to the procedure described in our previous papers.2,4,5 However, since the mixture FP databank contains a large number of ketones, it was decided to expand the previous pure compound FP databanks2 by including data for this latter family of compounds. Numbers of data points used for each property in the training, test, and total sets are shown in Table 2. Relevant sets of descriptors were selected using the forward selection procedure, and each SVM model was trained using these sets of descriptors. Table 2 also presents the predictive ability of each model, i.e., the test set coefficients of determination (R2) of their respective SVM model.
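To illustrate the iterative estimation described in Section 2.5.1, the following Python sketch solves eq 9 for the mixture flash point, with vapor pressures given by the Clausius−Clapeyron relation (eq 12). Several choices here are assumptions of the sketch rather than the authors' implementation: bisection is used in place of the Newton−Raphson method because it needs no derivative, activity coefficients default to unity (ideal solution) unless a hypothetical callable gamma(T, x) is supplied, and the function name, search bracket, and example values are illustrative.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def mixture_flash_point(x, fp_pure, dh_vap, gamma=None, tol=1e-8):
    """Solve eq 9 for T by bisection.

    x        liquid mole fractions
    fp_pure  pure compound flash points FP_i (K)
    dh_vap   vaporization enthalpies (J/mol)
    gamma    optional callable gamma(T, x) -> activity coefficients
    """
    def residual(T):
        g = gamma(T, x) if gamma else [1.0] * len(x)
        total = 0.0
        for xi, gi, fpi, dhi in zip(x, g, fp_pure, dh_vap):
            # With eq 12, the ratio P(T)/P(FP_i) no longer depends on P or Tb.
            ratio = math.exp(dhi / R * (1.0 / fpi - 1.0 / T))
            total += xi * gi * ratio
        return total - 1.0

    # Hypothetical search bracket around the pure compound flash points.
    lo, hi = 0.5 * min(fp_pure), 1.5 * max(fp_pure)
    if residual(lo) > 0 or residual(hi) < 0:
        raise ValueError("flash point not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative equimolar binary mixture (values are not from the databank).
print(mixture_flash_point([0.5, 0.5], [300.0, 350.0], [40000.0, 50000.0]))
```

For an ideal mixture, the left-hand side of eq 9 increases monotonically with T, so the bracketed root is unique; with strongly temperature-dependent activity coefficients this may need to be checked.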

3. RESULTS AND DISCUSSION The workflow was implemented fully in Python35 using the NumPy48 library for numerical calculations, the OpenBabel49 library for molecular identification (using the canonical SMILES50 functionality), and functional group matching

In eq 10, the values of coefficients A to H can be found in Table 1, TC is the critical temperature, and ω is the acentric factor, given by eq 11.

Table 1. Parameter Values of the Lee−Kesler Equation

parameter    coefficient
A              5.927140
B             −6.096480
C             −1.288620
D              0.169347
E             15.251800
F            −15.687500
G            −13.472100
H              0.435770
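As a minimal sketch of eqs 10 and 11, the following Python functions compute the acentric factor and the Lee−Kesler vapor pressure using the standard Lee−Kesler coefficients A to H. Two points are assumptions of this sketch: the exponential of eq 10 is interpreted as the reduced pressure Pσ/PC, as in the standard Lee−Kesler correlation, and PC is expressed in atm so that the vapor pressure equals 1 atm at the normal boiling point. The function names and the approximate n-heptane values are illustrative.

```python
import math

# Standard Lee-Kesler coefficients A to H.
A, B, C, D = 5.92714, -6.09648, -1.28862, 0.169347
E, F, G, H = 15.2518, -15.6875, -13.4721, 0.43577

def _f0(tr):
    """Simple-fluid term of eq 10 at reduced temperature tr = T/TC."""
    return A + B / tr + C * math.log(tr) + D * tr ** 6

def _f1(tr):
    """Term of eq 10 multiplied by the acentric factor."""
    return E + F / tr + G * math.log(tr) + H * tr ** 6

def acentric_factor(tb, tc, pc_atm):
    """Eq 11, with PC in atm and temperatures in K."""
    theta = tb / tc
    return (-math.log(pc_atm) - _f0(theta)) / _f1(theta)

def lee_kesler_pressure(t, tb, tc, pc_atm):
    """Vapor pressure in atm; assumes exp(...) in eq 10 equals P/PC."""
    omega = acentric_factor(tb, tc, pc_atm)
    tr = t / tc
    return pc_atm * math.exp(_f0(tr) + omega * _f1(tr))

# Approximate literature values for n-heptane, for illustration only.
tb, tc, pc = 371.6, 540.2, 27.0
print(acentric_factor(tb, tc, pc))
print(lee_kesler_pressure(298.15, tb, tc, pc))
```

Since eq 9 only involves ratios of vapor pressures of the same compound at two temperatures, the PC prefactor cancels there; it matters only when absolute vapor pressures are compared, as in Figure 4.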

Table 2. Number of Compounds in Each Training and Test Set and the Performance of Pure Compound Property SVM-FGCD Models Calculated on the Test Sets

property    training    test    total    R2
FP          571         245     816      0.914
TC          724         311     1035     0.808
PC          724         311     1035     0.945
Tb          724         311     1035     0.939
ΔHvap       726         311     1037     0.879


Figure 2. Schematic representation of the computational workflow used in this work.

has been shown to lead to instability of the SVM classifier, and overfitting occurs as the number of feature dimensions is higher than the size of the training set.55 The two GFA-based models exhibit better performances than the SVM-based models. However, while the GFA-MD model shows promising performance regarding external validation, with a test CCC of 0.892, the test CCC value for the GFA-FGCD model falls to 0.435. This shows that the combination of a simple multilinear model with MD descriptors can outperform the other alternatives, including the nonlinear SVM model with MD descriptors, in a case where the training sample size is small. The GFA-MD multilinear equation is expressed as follows:

$$FP = 10.6809514 D_1 + 5.51404545 D_2 - 146.547333 D_3 + 162.141056 D_4 + 251.333042 \tag{13}$$

where descriptors Di are presented in Table 4. Figure 3 shows the regression scatter plot which indicates a reasonable agreement between experimental and predicted values.
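As a brief illustration of how eq 13 is applied, the Python sketch below first builds the four additive mixture descriptors with eq 1 and then evaluates the multilinear model. The coefficients are those of eq 13; the function names and the data layout (one list of pure component values per descriptor) are hypothetical, and real descriptor values would have to come from the descriptor software named in Table 4.

```python
# Coefficients of D1 to D4 and the intercept, as given in eq 13.
COEFFS = (10.6809514, 5.51404545, -146.547333, 162.141056)
INTERCEPT = 251.333042

def additive(x, values):
    """Eq 1: mole-fraction-weighted sum of pure component values."""
    return sum(xi * vi for xi, vi in zip(x, values))

def gfa_md_flash_point(x, pure_descriptors):
    """Evaluate eq 13.

    x                 mole fractions of the mixture components
    pure_descriptors  four lists; list k holds descriptor D(k+1)
                      for every component of the mixture
    """
    mixture_d = [additive(x, dk) for dk in pure_descriptors]
    return INTERCEPT + sum(a * d for a, d in zip(COEFFS, mixture_d))
```

Because all four descriptors use the additive mixing rule, the predicted FP is itself linear in the mole fractions, so this model cannot reproduce mixtures exhibiting minimum or maximum flash points.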

(using the SMARTS51 functionality). In addition, a relational database was implemented using PostgreSQL52 and interfaced with Python using the Psycopg2 module.53 Figure 2 shows a schematic representation of database integration, the development of predictive models, and the evaluation of models, summarizing the work performed in this study. Two approaches were considered to predict the flash point of mixtures: the first model is based on the use of QSPR methods to directly predict FP for mixtures, MQSPR, and the second, labeled “hybrid predictive model” (HPM), consists of using QSPR predictions as inputs of eq 9, which is based on thermodynamic considerations. 3.1. MQSPR Model Evaluation. MQSPR models were built using the GFA and SVM regression algorithms previously described in this paper. Mixture descriptors and regression parameters were selected according to each algorithm’s optimization protocol. Table 3 shows the performances of the MQSPR models built using GFA and SVM algorithms in combination with MD and FGCD mixture descriptors. Table 3 presents some negative R2 values, which are indicative of models having poor predictive performances.54 The SVM-MD model performs very poorly on the training set and fails entirely to predict the FP of mixtures in the test set. SVM-FGCD model predictions for mixtures belonging to the training set are in agreement with experimental data (CCC = 0.899), but SVM-FGCD seems to be subject to overfitting, as predicted FPs of mixtures in the test set deviate widely from experimental data. The failure of the SVM algorithm in the regression may be explained by the small number of samples in the database which

Table 4. List of MD Mixture Descriptors Used in eq 13

symbol    mixing rule    descriptor name
D1        additive       Mean Polarizability (VAMP Electrostatics)
D2        additive       E-state Keys Sums s_soh_ (Fast Descriptors)
D3        additive       FPSA1 (Jurs Descriptors)
D4        additive       RPSA (Jurs Descriptors)

Figure 3. Regression scatter plot for the final MQSPR (GFA-MD) model for compounds belonging to the training and test sets.

Table 3. Performance Comparison of MQSPR Models for Mixture FP on the Training and Test Sets

              training set                        test set
              MD               FGCD               MD                 FGCD
              GFA      SVM     GFA      SVM       GFA      SVM       GFA       SVM
R2            0.935    0.408   0.984    0.806     0.795    −2.147    −0.723    −1.994
CCC           0.966    0.527   0.992    0.899     0.892    −0.160    0.435     −0.164
RMSE (K)      5.1      15.3    2.5      8.8       13.3     52.1      38.6      50.9
AAE (K)       3.0      5.7     1.6      3.5       10.1     39.3      26.8      37.0
AARE (%)a     1.1      1.8     0.6      1.2       3.2      11.9      8.6       11.1
bias (K)      0        −3.6    0        1.1       −8.1     −36.6     −23.9     −31.3

a AARE are calculated on the basis of FP values expressed in K.
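The statistics reported in Table 3 can be computed along the following lines. The paper does not spell out the formulas, so this Python sketch rests on common definitions that should be read as assumptions: R2 is taken as one minus the ratio of the residual to the total sum of squares (which is what allows negative values), CCC is Lin's concordance correlation coefficient with population variances, and bias is the mean of predicted minus experimental values. The function name is hypothetical.

```python
import math

def regression_metrics(y_exp, y_pred):
    """Return R2, CCC, RMSE, AAE, AARE (%), and bias for FP values in K."""
    n = len(y_exp)
    mean_e = sum(y_exp) / n
    mean_p = sum(y_pred) / n
    resid = [p - e for e, p in zip(y_exp, y_pred)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((e - mean_e) ** 2 for e in y_exp)  # must be nonzero
    var_e = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(y_exp, y_pred)) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "CCC": 2.0 * cov / (var_e + var_p + (mean_e - mean_p) ** 2),
        "RMSE": math.sqrt(ss_res / n),
        "AAE": sum(abs(r) for r in resid) / n,
        "AARE": 100.0 * sum(abs(r) / e for r, e in zip(resid, y_exp)) / n,
        "bias": sum(resid) / n,
    }

# Hypothetical predictions shifted by a constant offset: the trend is
# reproduced perfectly, yet R2 is negative and CCC is close to zero.
print(regression_metrics([300.0, 310.0, 320.0], [350.0, 360.0, 370.0]))
```

With these definitions, a model whose errors exceed the spread of the experimental values around their mean yields a negative R2, consistent with the test set entries of the SVM models in Table 3.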


experimental Tb and ΔHvap values, (2) a model using Piσ(T) values calculated using the Clausius−Clapeyron relation supplied with estimations of ΔHvap and Tb using SVM-FGCD models, which will be labeled predictive Clausius−Clapeyron, and (3) a model using Piσ(T) values calculated using the Lee− Kesler equation supplied with estimations of TC, PC, and Tb using SVM-FGCD models, which will be labeled predictive Lee−Kesler. The results presented in Table 6 show that all models have nearly equal performance over the mixture database, with both predictive models exhibiting predictive capabilities similar to that of the Clausius−Clapeyron relation supplied with experimental Tb and ΔHvap values. Moreover, the predictive Clausius−Clapeyron model performs slightly better than the predictive Lee−Kesler model. Figure 4 shows a comparison between Root Mean Square Logarithmic Error (RMSLE) values calculated by comparing

Table 5. Performance Comparison of FP Prediction Methods When Using UNIFAC Lyngby and Original 2012 as Activity Coefficient Predictive Models

              UNIFAC Original 2012    UNIFAC Lyngby
R2            0.996                   0.980
CCC           0.998                   0.990
RMSE (K)      1.7                     3.8
AAE (K)       1.1                     2.4
AARE (%)a     0.4                     0.8
bias (K)      −0.4                    −2.2

a AARE are calculated on the basis of FP values expressed in K.

3.2. Hybrid Predictive Model Evaluation. In the following subsections, we present a stepwise evaluation of the so-called hybrid predictive model. This evaluation, which can be seen as a sensitivity analysis of the HPM, begins by supplying eq 9 with as many experimental property values as possible and ends with a fully predictive model for FP of mixtures. It is important to mention that no data set splitting (separation in training and test subsets) is needed to externally evaluate the hybrid model since it is not optimized using the databank’s mixture FP values. Finally, as experimental vapor pressure values are lacking for some temperatures and since the Lee−Kesler equation’s parameters were regressed using only hydrocarbons,28 the Clausius−Clapeyron relation using DIPPR Tb and ΔHvap experimental values was initially chosen as the benchmark vapor pressure calculation method.

Influence of Activity Coefficient Predictions. Table 5 shows a comparison between the performance of (1) a model using γi(T,x) given by the Original UNIFAC with 2012 parameters and (2) a model using γi(T,x) given by UNIFAC Lyngby. As mentioned above, the Clausius−Clapeyron relation was used to estimate Piσ(T) for both of these models, and experimental FP was used for neat compounds. The results show that, although both models exhibit a good performance, as reflected in their low Average Absolute Errors (AAE), the Original UNIFAC 2012 model has a better overall performance. A tendency toward underestimation on the part of both models is also apparent, as evidenced by the negative bias obtained in both cases, but it is less marked in the case of the Original UNIFAC 2012.

Influence of Vapor Pressure Predictions. From conclusions drawn in the previous subsection, the Original UNIFAC 2012 method was used to estimate γi(T,x) values for all neat compounds. 
Table 6 shows a comparison between the performance of three models: (1) a model using Piσ(T) values calculated using the Clausius−Clapeyron relation supplied with

Figure 4. Comparison of RMSLE values for Pσ(FPi) predictions for oxygenated compounds from databank Clausius−Clapeyron (A), predictive Clausius−Clapeyron (B), and predictive Lee−Kesler models (C).

Piσ(FPi) values of oxygenated compounds at their FPi obtained from the DIPPR database and predicted using (1) the Clausius−Clapeyron relation using databank ΔHvap and Tb values, (2) the predictive Clausius−Clapeyron model, and (3) the predictive Lee−Kesler model. Figure 4 shows that for oxygenated compounds both predictive models lead to less accurate results as compared to those obtained with the Clausius−Clapeyron relation using databank ΔHvap and Tb values. Furthermore, Figure 4 surprisingly shows that both predictive models have nearly equal RMSLE values for oxygenated compounds, with the predictive Clausius−Clapeyron model performing slightly better than the predictive Lee−Kesler model. There seems to be little agreement between the models, which seems to indicate that the error introduced by vapor pressure deviations is small. However, there seems to be very good agreement regarding Piσ(FP)/Piσ(FPi) ratios (see Table S2 and Figure S1), which are the terms that truly influence the performance of eq 9. By substituting eq 12 into eq 9, the HPM’s equation becomes

Table 6. Performance Comparison of FP Prediction Methods When Using Databank Clausius−Clapeyron, Predictive Clausius−Clapeyron, and Predictive Lee−Kesler to Determine Vapor Pressures

              Clausius−Clapeyron   predictive Clausius−Clapeyron   predictive Lee−Kesler
R2            0.996                0.995                           0.994
CCC           0.998                0.997                           0.997
RMSE (K)      1.7                  1.9                             2.1
AAE (K)       1.1                  1.2                             1.3
AARE (%)a     0.4                  0.4                             0.5
bias (K)      −0.4                 −0.4                            0.1

a AARE are calculated on the basis of FP values expressed in K.


model, and FPi values predicted using the SVM-FGCD model. The model’s performance is nearly identical to that of the third

$$\sum_{i=1}^{N} x_i \, \gamma_i(FP, x) \, \exp\left[ \frac{\Delta H_{vap,i}}{R} \left( \frac{1}{FP_i} - \frac{1}{FP} \right) \right] = 1 \qquad (14)$$

showing that, when predicting the ratio of two vapor pressure values under the assumptions of the Clausius−Clapeyron relation used in this paper, although the predicted Piσ(T) values are affected by ΔHvap, Tb, T, and P, as shown in eq 12, only ΔHvap and the two final temperature (in this case FP and FPi) values affect the predicted ratio. Therefore, we propose eq 14 as an alternative to eq 9 proposed by Liaw et al. for the prediction of mixture FPs.

Table 7. Performance Comparison of FP Prediction Methods When Using Mixture Databank, DIPPR, and SVM-FGCD Predicted FPi Values

              mixture databank   DIPPR   SVM-FGCD
R2            0.996              0.989   0.975
CCC           0.998              0.994   0.987
RMSE (K)      1.7                2.9     4.3
AAE (K)       1.1                2.1     3.4
AARE (%)a     0.4                0.8     1.2
bias (K)      −0.4               0.2     −1.2
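As an illustration of how eq 14 can be used in practice, the following minimal Python sketch solves it for the mixture FP. It is not the authors' implementation: the function names, the bisection solver, the temperature bracket, the ideal-solution placeholder for γi (the paper uses UNIFAC), and the numerical inputs are all assumptions made for the example. The helper `cc_vapor_pressure` is included only to show that Tb and the reference pressure cancel in the vapor pressure ratio.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def cc_vapor_pressure(t, dh_vap, tb, p_ref=101325.0):
    """Clausius-Clapeyron vapor pressure (Pa), anchored at the boiling point tb."""
    return p_ref * math.exp(-dh_vap / R * (1.0 / t - 1.0 / tb))


def cc_pressure_ratio(fp, fp_i, dh_vap):
    """P_sat(FP) / P_sat(FP_i): tb and p_ref cancel, only dh_vap, FP, FP_i remain."""
    return math.exp(dh_vap / R * (1.0 / fp_i - 1.0 / fp))


def ideal_gamma(t, x):
    """Placeholder activity model (all gamma_i = 1); an assumption, not UNIFAC."""
    return [1.0] * len(x)


def eq14_residual(fp, x, fp_pure, dh_vap, gamma=ideal_gamma):
    """Left-hand side of eq 14 minus 1; it vanishes at the mixture flash point."""
    g = gamma(fp, x)
    total = sum(xi * gi * cc_pressure_ratio(fp, fpi, dhi)
                for xi, gi, fpi, dhi in zip(x, g, fp_pure, dh_vap))
    return total - 1.0


def mixture_flash_point(x, fp_pure, dh_vap, gamma=ideal_gamma,
                        lo=150.0, hi=800.0, tol=1e-6):
    """Solve eq 14 for FP (K) by bisection on the assumed bracket [lo, hi]."""
    f_lo = eq14_residual(lo, x, fp_pure, dh_vap, gamma)
    f_hi = eq14_residual(hi, x, fp_pure, dh_vap, gamma)
    if f_lo * f_hi > 0.0:
        raise ValueError("eq 14 has no sign change on the given bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = eq14_residual(mid, x, fp_pure, dh_vap, gamma)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Hypothetical equimolar binary: FP_i in K, dh_vap in J/mol (illustrative values).
fp_mix = mixture_flash_point([0.5, 0.5], [286.0, 330.0], [40.0e3, 45.0e3])
print(f"ideal-mixture FP = {fp_mix:.1f} K")
```

For a pure compound (x1 = 1, γ1 = 1), the sketch returns FP = FP1, so any error in a pure compound FP is passed directly to the prediction. A composition-dependent activity model can be supplied through the `gamma` argument, which is what allows minimum or maximum FP behavior to appear.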

Figure 5. Regression scatter plot for the final fully predictive model (HPM) on the main modeling databank.

model in Table 7, which confirms that FPi prediction is the main factor influencing the quality of a fully predictive model using the Le Chatelier rule. Although the model still has a bias of −1.2 K, it seems to predict mixture FP values very accurately, with an R2 value of 0.974 and an AAE of 3.4 K over the entire databank. FP experimental reproducibility can range from ca. 1 K for FPs of 293 K to ca. 12 K for FPs of 533 K; furthermore, FP measurement reproducibility for FAMEs is reported as ca. 15 K.57 As shown in Figure 5, the regression scatter plot shows very good agreement between experimental and predicted values. An analysis of the deviations confirms that the FPi prediction is the most important stage of this prediction method, since the highest deviation corresponds to pure 1-phenylethan-1-one, which has an experimental FP of 356.5 K and a predicted FP of 340.4 K. Furthermore, the four highest deviations correspond to either 1-phenylethan-1-one or mixtures containing this compound.

Minimum and Maximum FP Behaviors. Figures 6 and 7 show predictions carried out using the HPM for ethanol−

AARE are calculated on the basis of FP values expressed in K.

Influence of Pure Compound Flash Points. Table 7 shows a comparison between the performance of (1) a model using FPi values found in the mixture FP databank, (2) a model using FPi values found in the DIPPR database, and (3) a model using FPi values estimated using the SVM model. The HPM as presented in eq 14 was used, and the Original UNIFAC 2012 method was used to estimate γi(T,x) values for all of these models. Results indicate that the prediction of FP for mixtures through eq 14 is mostly sensitive to FP values of pure components in the mixture. Thus, using different sources for FPi values results in a certain amount of deviation. These differences in experimental values may be due to differences in FP measurement methods and experimental error, as Liaw et al. pointed out previously.19 The fully predictive model (using SVM prediction to estimate the FP of pure components), as might be expected from a model using predicted FPi’s, has the highest deviation. However, these deviations are comparable to those of pure compound FP models56 and not very far from the deviations of the first model using as much input experimental data as possible.

3.3. Use of the HPM for Biofuel Blends. Table 8 shows the performance of the fully predictive model using γi(T,x) values computed using the Original UNIFAC 2012 model, Piσ(T) calculated using the predictive Clausius−Clapeyron

Table 8. Performance Evaluation of the Final Fully Predictive (HPM) Model on the Modeling Databank

              HPM model
R2            0.974
CCC           0.986
RMSE (K)      4.4
AAE (K)       3.4
AARE (%)a     1.2
bias (K)      −1.2

Figure 6. Prediction of flash point variation for octane−ethanol mixtures, where x1 is the ethanol mole fraction.

AARE are calculated on the basis of FP values expressed in K.

Figure 7. Prediction of flash point variations for cyclohexanol−phenol mixtures, where x1 is the phenol mole fraction.

Figure 9. Prediction of flash point variations in a palm oil ethyl biodiesel (POEBD)−ethanol mixture, where x1 is the ethanol mole fraction.

octane and cyclohexanol−phenol binary mixtures. Although deviations due to using predicted FPi’s can be observed, the results show that the model successfully predicts minimum and maximum FP behaviors for these two binary mixtures. One can remark that the use of experimental FPi’s leads to an excellent agreement between experimental data and predictions. Additionally, deviations are not far from the deviations commonly observed in pure compound FP models.

Surrogate Fuel Predictions. Table 9 shows the experimental values measured on surrogate jet and diesel fuels as well as the predictions made using the fully predictive HPM. The detailed composition of each surrogate fuel is given in the Supporting Information. The results and the regression plot, shown in Figure 8, indicate excellent agreement between the fully predictive HPM and experimental values. For this data subset, the calculated R2 is 0.985 and the AAE is 2.0 K.

Table 9. Predictions of the Final Fully Predictive Model on Surrogate Jet and Diesel Fuelsa

mixture     inspired from ref   exptl. flash point (K)   pred. flash point (K)
Jet-1       33                  326.0                    323.0
Jet-2       33                  322.5                    319.4
Jet-3       33                  294.5                    290.0
Jet-4       33                  313.0                    311.8
Jet-5       33                  312.5                    309.7
Jet-6       33                  304.5                    300.4
Jet-7       34                  338.5                    337.6
Jet-8       34                  330.0                    328.8
Jet-9       34                  345.0                    346.6
Diesel-1    34                  337.0                    336.9
Diesel-2    34                  346.5                    348.2
Diesel-3    34                  329.0                    328.6
Diesel-4    34                  358.5                    359.5
Diesel-5    34                  350.5                    356.5
Diesel-6    34                  345.5                    347.3

a See Supporting Information for the detailed composition of each mixture.
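The performance statistics used throughout the tables can be recomputed from the values of Table 9 with a short script. The sketch below is illustrative only: the metric definitions are assumptions (R2 is taken as the squared Pearson correlation, CCC as Lin's concordance correlation coefficient, and bias as the mean of predicted minus experimental values, so that a negative bias means underestimation), and the function and variable names are hypothetical.

```python
import math

# Experimental and predicted flash points (K) from Table 9, Jet-1 ... Diesel-6.
exptl = [326.0, 322.5, 294.5, 313.0, 312.5, 304.5, 338.5, 330.0,
         345.0, 337.0, 346.5, 329.0, 358.5, 350.5, 345.5]
pred = [323.0, 319.4, 290.0, 311.8, 309.7, 300.4, 337.6, 328.8,
        346.6, 336.9, 348.2, 328.6, 359.5, 356.5, 347.3]


def regression_metrics(y_exp, y_pred):
    """Return R2, CCC, RMSE, AAE, AARE and bias under the assumed definitions."""
    n = len(y_exp)
    err = [p - e for e, p in zip(y_exp, y_pred)]  # predicted minus experimental
    mean_e = sum(y_exp) / n
    mean_p = sum(y_pred) / n
    var_e = sum((e - mean_e) ** 2 for e in y_exp) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(y_exp, y_pred)) / n
    return {
        "R2": cov ** 2 / (var_e * var_p),  # squared Pearson correlation
        "CCC": 2.0 * cov / (var_e + var_p + (mean_e - mean_p) ** 2),
        "RMSE": math.sqrt(sum(d * d for d in err) / n),  # K
        "AAE": sum(abs(d) for d in err) / n,  # K
        "AARE": 100.0 * sum(abs(d) / e for d, e in zip(err, y_exp)) / n,  # %
        "bias": sum(err) / n,  # K
    }


for name, value in regression_metrics(exptl, pred).items():
    print(f"{name}: {value:.3f}")
```

Applied to the rounded values of Table 9, this sketch gives an AAE of about 2.2 K and a negative bias. Small differences from the quoted statistics may come from rounding of the tabulated values or from the metric definitions assumed here.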

100% of POEBD, with a reported experimental FP of 421.8 K and a predicted FP of 385.4 K. There is also, however, significant disagreement between several of the reported experimental pure component FP values and those found in other sources. For example, ethyl octadecanoate has a reported FP of 464 K, while Sigma-Aldrich reports58 386 K and LookChem59 reports 434 K. Similarly, ethyl tetradecanoate has a reported FP of 424 K, while Sigma-Aldrich reports 386 K and LookChem reports 408 K. Figure 9 shows that the mixture quickly approaches the flash point of ethanol, which is much lower than that of POEBD. It has previously been shown that, within a complex blend of fuel compounds, the FP of the mixture is in first approximation governed by that of the molecule exhibiting the smallest FP value.2,60

Prediction of Flash Points of a Hypothetical Diesel−Gasoline Mixture. Figure 10 shows the variation in predicted FP of a diesel−gasoline mixture as the volume proportion of gasoline increases. Volume calculations were made by using a pure compound ρ(T) model developed previously5 and by assuming that pure compound molar volumes remain

Figure 8. Regression scatter plot for the final fully predictive model (HPM) for surrogate jet and diesel fuels. Values are given in Table 9.

Prediction of Flash Points for a Palm Oil Ethyl Biodiesel−Ethanol Mixture. Figure 9 shows the experimental and predicted FP for a palm oil ethyl biodiesel (POEBD)−ethanol mixture with increasing amounts of ethanol as reported by Carareto et al.27 There is satisfactory agreement between all points except in the case of the first, which corresponds to


using experimental FPi data, with an R2 of 0.996 for the experimental data model and 0.975 for the fully predictive model. The model was further validated using new experimental data for surrogate jet and diesel fuels, and the predictions were in excellent agreement with the data. Additionally, we predicted the variation in FP of palm oil ethyl biodiesel−ethanol and diesel−gasoline blends with increasing proportions of ethanol and gasoline, respectively, and found that the FP of these mixtures quickly approaches that of the fuel having the lowest FP. This shows the potential of the presented methods as a tool for the formulation and blending of existing fuels and, potentially, for the development of entirely new fuels.



ASSOCIATED CONTENT

Supporting Information

Figure 10. Predicted flash point variations for diesel−gasoline mixtures, where v is the volumetric proportion of gasoline in the mixture.

Results of the MQSPR mixture FP models on the test set, the SVM-FGCD pure compound FP model, and a detailed description of the compositions of fuels that appear in Table 9 as well as in Figures 8 and 10. This material is available free of charge via the Internet at http://pubs.acs.org.

unchanged in the mixture. The surrogate diesel fuel chosen for this study is Diesel-1 (see the Supporting Information), and the surrogate gasoline fuel composition is the following (in mole fraction): n-pentane (0.36), 2,2,4-trimethylpentane (0.46), and n-undecane (0.18).34 According to the HPM model, a behavior similar to that in Figure 9 is observed, where the FP of the mixture quickly approaches that of the fuel having the lowest FP, in this case gasoline. There is no minimum or maximum FP behavior in this mixture, and the minimum FP required by European Norm Diesel specifications (328 K) is attained at a volumetric proportion of 0.01 gasoline.
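The conversion from a volumetric proportion of gasoline to the overall mole fractions needed by the mixture FP equation can be sketched as follows, under the stated assumption that pure compound molar volumes remain unchanged on mixing. This is a hedged illustration rather than the authors' procedure: the function name is hypothetical, the gasoline molar volumes are rough room-temperature values (the paper uses its own ρ(T) model), and the diesel surrogate is replaced by a single hypothetical pseudo-component because its composition is given only in the Supporting Information.

```python
def blend_mole_fractions(v_gasoline, x_gas, vm_gas, x_diesel, vm_diesel):
    """Overall mole fractions of a blend defined by its gasoline volume fraction.

    x_* are component mole fractions within each fuel and vm_* the pure
    compound molar volumes (same units for both fuels). Ideal volume
    additivity is assumed. Gasoline components are returned first.
    """
    # Molar volume of each fuel under ideal mixing.
    vbar_gas = sum(x * v for x, v in zip(x_gas, vm_gas))
    vbar_diesel = sum(x * v for x, v in zip(x_diesel, vm_diesel))
    # Moles of each fuel per unit volume of blend.
    n_gas = v_gasoline / vbar_gas
    n_diesel = (1.0 - v_gasoline) / vbar_diesel
    n_tot = n_gas + n_diesel
    return ([n_gas * x / n_tot for x in x_gas]
            + [n_diesel * x / n_tot for x in x_diesel])


# Surrogate gasoline from the text: n-pentane, 2,2,4-trimethylpentane, n-undecane.
x_gas = [0.36, 0.46, 0.18]
vm_gas = [115.0, 165.0, 211.0]  # cm3/mol, rough illustrative values (assumed)
# Hypothetical one-component stand-in for the Diesel-1 surrogate (assumed).
x_diesel = [1.0]
vm_diesel = [260.0]  # cm3/mol, illustrative only

fractions = blend_mole_fractions(0.01, x_gas, vm_gas, x_diesel, vm_diesel)
print([round(f, 4) for f in fractions])
```

The resulting mole fractions, together with pure compound FPi and ΔHvap values, are the inputs that a mixture FP calculation requires at each volumetric proportion.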



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



REFERENCES

(1) Saldana, D. A.; Creton, B.; Mougin, P.; Jeuland, N.; Rousseau, B.; Starck, L. Rational Formulation of Alternative Fuels Using QSPR Methods: Application to Jet Fuels. Oil Gas Sci. Technol.−Rev. IFP Energies Nouv. 2013, Accepted. DOI: 10.2516/ogst/2012034.
(2) Saldana, D. A.; Starck, L.; Mougin, P.; Rousseau, B.; Pidol, L.; Jeuland, N.; Creton, B. Flash Point and Cetane Number Predictions for Fuel Compounds Using QSPR Methods. Energy Fuels 2011, 25, 3900−3908.
(3) Creton, B.; Dartiguelongue, C.; de Bruin, T.; Toulhoat, H. Prediction of the Cetane Number of Diesel Compounds Using the Quantitative Structure Property Relationship. Energy Fuels 2010, 24, 5396−5403.
(4) Saldana, D. A.; Starck, L.; Mougin, P.; Rousseau, B.; Creton, B. On the Rational Formulation of Alternative Fuels: Melting Point and Net Heat of Combustion Predictions for Fuel Compounds Using Machine Learning Methods. SAR QSAR Environ. Res. 2013, 24, 525−543.
(5) Saldana, D. A.; Starck, L.; Mougin, P.; Rousseau, B.; Creton, B. Prediction of Density and Viscosity of Biofuel Compounds Using Machine Learning Methods. Energy Fuels 2012, 26, 2416−2426.
(6) Standard Specification for Aviation Turbine Fuels, ASTM D1655-07; American Society for Testing and Materials: West Conshohocken, PA, 2007.
(7) Standard Specification for Diesel Fuel Oils, ASTM D975; American Society for Testing and Materials: West Conshohocken, PA, 2011.
(8) Standard Specification on the Quality of European Diesel Fuel, EN590; European Standards Organization, CEN: Brussels, Belgium, 2009.
(9) Standard Specification on the Quality of European Gasoline Fuel, EN-228; European Standards Organization, CEN: Brussels, Belgium, 2004.
(10) Dahmen, M.; Hechinger, M.; Victoria, J.; Marquardt, W. Towards Model-Based Identification of Biofuels for Compression Ignition Engines. SAE Int. J. Fuels Lubr. 2012, 5, 990−1003.
(11) Catoire, L.; Naudet, V. A Unique Equation to Estimate Flash Points of Selected Pure Liquids Application to the Correction of Probably Erroneous Flash Point Values. J. Phys. Chem. Ref. Data 2004, 33, 1083−1111.

4. CONCLUSIONS In this work, we evaluated a number of methods to predict Flash Points (FP) of mixtures. The results indicated that the machine learning approach directly applied to mixture data, as implemented here, did not yield models with sufficient predictive performance. This may be due to the small size of the databank used to build the models but also to the need to develop models that not only interpolate FPs between mixtures of the same compounds but also predict FPs of mixtures containing unknown compounds. Additional work is still necessary, especially on the development of descriptors capable of correlating mixture FPs. The results obtained by using Le Chatelier rule based models were far more accurate, and we tested a number of different methods to predict the physical parameters that are needed by this type of model, specifically vapor pressure, Pσ(T), and activity coefficients, γi(T,x). We also tested the influence of these parameters, as well as that of predicted pure compound Flash Points (FPi), on the quality of the predictions. For Pσ(T), we found that a predictive Clausius−Clapeyron relation, using SVM-FGCD models to predict enthalpies of vaporization and boiling points, performed slightly better than a predictive Lee−Kesler equation, using SVM-FGCD models to predict critical values and boiling points. As for γi(T,x), we found that using the Original UNIFAC with 2012 parameters yielded better results than UNIFAC Lyngby. By combining FPi predictions obtained using an SVM-FGCD model, as well as Pσ(T) predictions using a predictive Clausius−Clapeyron model and γi(T,x) predictions using the Original UNIFAC 2012, we obtained a fully predictive model with a predictive performance comparable to that of models


behaviour. 19th International Congress of Chemical and Process Engineering CHISA; Prague, Czech Republic, 2010.
(35) Van Rossum, G. Python Tutorial; Centrum voor Wiskunde en Informatica (CWI): Amsterdam, 1995.
(36) Materials Studio, version 5.0; Accelrys Software Inc.: San Diego, CA, 2009.
(37) Rowley, R. L.; Wilding, W. V.; Oscarson, J. L.; Yang, Y.; Zundel, N. A.; Daubert, T. E.; Danner, R. P. DIPPR Data Compilation of Pure Compound Properties. http://dippr.byu.edu/public/chemsearch.asp.
(38) Kay, W. B. Density of Hydrocarbon Gases at High Temperature and Pressure. Ind. Eng. Chem. 1936, 28, 1014−1019.
(39) Grunberg, L.; Nissan, A. H. Mixture Law for Viscosity. Nature 1949, 164, 799−800.
(40) Rogers, D.; Hopfinger, A. J. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854−866.
(41) Friedman, J. H. Multivariate Adaptive Regression Splines, Technical Report No. 102; Laboratory for Computational Statistics, Department of Statistics, Stanford University: Stanford, CA (November 1988, rev. August 1990).
(42) Smola, A.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. 2004, 14, 199−222.
(43) Platt, J. C. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Advances in kernel methods Support Vector Learning; The MIT Press: Cambridge, MA, 1998.
(44) Zabetakis, M. G. Flammability characteristics of combustible gases and vapors (Bulletin 267); U. S. Bureau of Mines: Washington, DC, 1965.
(45) Raphson, J. Analysis Aequationum Universalis; University of Cambridge: Cambridge, England, 1690.
(46) Lee, B. I.; Kesler, M. G. A Generalized Thermodynamic Correlation Based on Three-Parameter Corresponding States. AIChE J. 1975, 21, 510−527.
(47) Fredenslund, A.; Jones, R. L.; Prausnitz, J. M. Group-Contribution Estimation of Activity Coefficients in Nonideal Mixtures. AIChE J. 1975, 21, 1086−1099.
(48) Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python. http://www.scipy.org/.
(49) The Open Babel Package, version 2.3.5. http://openbabel.org/wiki/Main_Page (accessed December 2012).
(50) Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31−36.
(51) SMARTS - A Language for Describing Molecular Patterns; Daylight Chemical Information Systems Inc.: Laguna Niguel, CA.
(52) PostgreSQL Global Development Group, PostgreSQL. http://www.postgresql.org.
(53) Federico Di Gregorio, Psycopg2 Package Documentation. http://initd.org/psycopg/.
(54) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphics Modell. 2002, 20, 269−276.
(55) Dacheng, T.; Xiaoou, T.; Xuelong, L.; Xindong, W. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. Pattern Anal. Machine Intell., IEEE Trans. 2006, 28, 1088−1099.
(56) Rowley, J. R.; Rowley, R. L.; Wilding, W. V. Estimation of the Flash Point of Pure Organic Chemicals from Structural Contributions. Process Saf. Prog. 2010, 29, 353−358.
(57) Standard Test Method for Flash Point, ASTM D3828; American Society for Testing and Materials: West Conshohocken, PA, 2009.
(58) Sigma-Aldrich. http://www.sigmaaldrich.com/france.html.
(59) LookChem. http://www.lookchem.com/.
(60) Pidol, L.; Lecointe, B.; Jeuland, N. Ethanol as a Diesel Base Fuel: Managing the Flash Point Issue - Consequences on Engine Behavior. SAE Int. 2009, 2009-01-1807.

(12) Catoire, L.; Paulmier, S. Estimation of Closed Cup Flash Points of Combustible Solvent Blends. J. Phys. Chem. Ref. Data 2006, 35, 9−14.
(13) Catoire, L.; Paulmier, S.; Naudet, V. Experimental Determination and Estimation of Closed Cup Flash Points of Combustible Solvent Blends. Process Saf. Prog. 2006, 25, 33−39.
(14) Muratov, E. N.; Varlamova, E. V.; Artemenko, A. G.; Polishchuk, P. G.; Kuz’min, V. E. Existing and Developing Approaches for QSAR Analysis of Mixtures. Mol. Inf. 2012, 31, 202−221.
(15) Chirico, N.; Gramatica, P. Real external predictivity of QSAR models: How to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J. Chem. Inf. Model. 2011, 51, 2320−2335.
(16) Chirico, N.; Gramatica, P. Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 2012, 52, 2044−2058.
(17) Affens, W. A.; McLaren, G. W. Flammability Properties of Hydrocarbon Solutions in Air. J. Chem. Eng. Data 1972, 17, 482−488.
(18) Liaw, H. J.; Lee, T. P.; Tsai, J. S.; Hsiao, W. H.; Chen, M. H.; Hsu, T. T. Binary Liquid Solutions Exhibiting Minimum Flash-Point Behavior. J. Loss Prev. Process. 2003, 16, 173−186.
(19) Liaw, H. J.; Lin, S. C. Binary Mixtures Exhibiting Maximum Flash-Point Behavior. J. Hazard. Mater. 2007, 140, 155−163.
(20) Liaw, H. J.; Chiu, Y. Y. A General Model for Predicting the Flash Point of Miscible Mixtures. J. Hazard. Mater. 2006, A137, 38−46.
(21) Liaw, H. J.; Chen, C. T.; Gerbaud, V. Flash-Point Prediction for Binary Partially Miscible Aqueous−Organic Mixtures. Chem. Eng. Sci. 2008, 63, 4543−4554.
(22) Liaw, H. J.; Gerbaud, V.; Li, Y. H. Prediction of Miscible Mixtures Flash-Point From UNIFAC Group Contribution Methods. Fluid Phase Equilib. 2011, 300, 70−82.
(23) Le Chatelier, H. Estimation of Firedamp by Flammability Limits. Ann. Mines 1891, 19, 388−395.
(24) Gmehling, J.; Rasmussen, P. Flash Points of Flammable Liquid Mixtures Using UNIFAC. Ind. Eng. Chem. Fundam. 1982, 21, 186−188.
(25) Mashuga, C. V.; Crowl, D. A. Derivation of Le Chatelier’s Mixing Rule for Flammable Limits. Process Saf. Prog. 2000, 19, 112−117.
(26) Vidal, M.; Rogers, W. J.; Mannan, M. S. Prediction of Minimum Flash Point Behavior for Binary Mixtures. Process Saf. Environ. 2006, 84, 1−9.
(27) Carareto, N. D. D.; Kimura, C. Y. C. S.; Oliveira, E. C.; Costa, M. C.; Meirelles, A. J. A. Flash Points of Mixtures Containing Ethyl Esters or Ethylic Biodiesel and Ethanol. Fuel 2012, 96, 319−326.
(28) Lee, B. I.; Kesler, M. G. A Generalized Thermodynamic Correlation Based on Three-Parameter Corresponding States. AIChE J. 1975, 21, 510−527.
(29) Clausius, R. Ueber die bewegende Kraft der Wärme und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen. Ann. Phys. 1850, 155, 500−524.
(30) Larsen, B. L.; Rasmussen, P.; Fredenslund, A. A Modified UNIFAC Group-Contribution Model for Prediction of Phase Equilibria and Heats of Mixing. Ind. Eng. Chem. Res. 1987, 26, 2274−2286.
(31) 10th Common Meeting of UNIFAC Consortium and DDBST GmbH; Oldenburg, Germany, September 18th, 2012; The UNIFAC Consortium: Oldenburg, Germany, 2012.
(32) Standard Test Methods for Flash Point by Small Scale Closed Cup, ASTM D3828; American Society for Testing and Materials: West Conshohocken, PA, 2005.
(33) Eddings, E. G.; Yan, S.; Ciro, W.; Sarofim, A. F. Formulation of a Surrogate for the Simulation of Jet Fuel Pool Fires. Combust. Sci. Technol. 2005, 177, 715−739.
(34) Di Lella, A.; de Hemptinne, J.-C.; Bruneaux, G. Use of a predictive equation of state to investigate biofuel vaporisation
