Energy Fuels 2010, 24, 5396–5403 Published on Web 09/22/2010
DOI: 10.1021/ef1008456
Prediction of the Cetane Number of Diesel Compounds Using the Quantitative Structure Property Relationship

Benoit Creton,*,† Cyril Dartiguelongue,‡ Theodorus de Bruin,† and Herve Toulhoat§

†IFP Energies nouvelles, Direction Chimie et Physico-Chimie Appliquées, 1 et 4 avenue de Bois Préau, 92852 Rueil-Malmaison, France, ‡IFP Energies nouvelles, Direction Physique et Analyse, Rond-point de l'échangeur de Solaize, BP3, 69360 Solaize, France, and §IFP Energies nouvelles, Direction Scientifique, 1 et 4 avenue de Bois Préau, 92852 Rueil-Malmaison, France
Received July 2, 2010. Revised Manuscript Received September 7, 2010
In the present work, a quantitative structure property relationship (QSPR) methodology has been applied to predict the cetane number (CN) of hydrocarbons that are likely to be found in diesel fuels. A database containing 147 molecules has been set up with experimental CNs available in the literature. The prediction of the CN was improved by dividing the database into four chemical families: (i) linear (n-) and branched (iso-) paraffins, (ii) naphthenes, (iii) aromatics, and (iv) n- and iso-olefins. A genetic algorithm working on molecular descriptors was used to build specific CN models for each of these classes. The predictive models return CN values roughly in the range from 0 to 100, which is in line with the definition of CN, with average absolute deviations similar to the experimental reproducibility (3-5 points).
1. Introduction

The cetane number (CN) is one of the most stringent specifications for diesel fuels. It is used to quantify the combustion quality of middle distillates in diesel engines by measuring the self-ignition delay, defined as the time elapsed between the injection of the fuel and its combustion. CN measurements consist of running the fuel in a single-cylinder cooperative fuel research (CFR) engine, as specified in American Society for Testing and Materials (ASTM) D613.1 Two standard compounds define the CN scale: cetane (n-hexadecane) and isocetane (2,2,4,4,6,8,8-heptamethylnonane, also called HMN), whose CNs are fixed at 100 and 15, respectively. The CN of a fuel is obtained from the cetane/HMN blend that exhibits the same ignition delay as the fuel under the specified test conditions: CN = (vol % cetane) + 0.15 × (vol % HMN). For example, a 1:1 volumetric mixture of cetane and isocetane has a CN of 50 + 0.15 × 50 = 57.5. The CN is widely used to check diesel fuel specifications and, in the petroleum industry, to evaluate the quality of refinery streams. However, the ASTM D613 measurement has important drawbacks: it is time-consuming and requires a large sample volume (about 1 L). Moreover, the reproducibility of the engine test is poor (3 to 5 CN points, depending on the CN range). In recent years, alternative tests have been developed to overcome the main problems raised by the CFR engine. The ignition quality tester (IQT), operated according to ASTM D6890, measures the fuel ignition delay in a constant-volume combustion chamber.2 An empirical correlation has been developed to convert this ignition delay into a derived cetane number (DCN), which is equivalent to the CN measured by ASTM D613 on the CFR engine. The main advantage of this apparatus is the small volume of fuel needed to obtain a DCN (about 100 mL).

Diesel engine technology has improved considerably over the last 10 years, with the widespread use of high-pressure direct injection and the development of new combustion processes, such as homogeneous charge compression ignition (HCCI). In this context, understanding the impact of the fuel on ignition has become a major issue. In particular, it is crucial to elucidate the link between the CN and the molecular structures of the pure compounds that make up diesel fuels. Indeed, engine tests performed over the last 60 years have demonstrated that the CN of a molecule depends strongly on its structure, as illustrated in Figure 1 for linear and branched alkanes. Long n-paraffins exhibit very high CNs (e.g., n-nonadecane = 110), whereas monoaromatic hydrocarbons tend to have extremely low CNs (e.g., p-xylene = -13). While many models have been designed to predict the CN of diesel fuels, only a few models dedicated to pure hydrocarbons have been reported in the literature.3-6 Yang et al. developed a neural network model based on a limited number of branched paraffins with experimentally known CNs3 and compared their results with values computed using the group additivity model proposed by DeFries et al. in the 1980s.7 In comparison to the equation by DeFries et al., the
(3) Yang, H.; Fairbridge, C.; Ring, Z. Pet. Sci. Technol. 2001, 19, 573.
(4) Smolenskii, E. A.; Bavykin, V. M.; Ryshov, A. N.; Slovokhotova, O. L.; Chuvaeva, I. V.; Lapidus, A. L. Russ. Chem. Bull. 2008, 57, 461.
(5) Lapidus, A. L.; Smolenskii, E. A.; Bavykin, V. M.; Myshenkova, T. N.; Kondrat'ev, L. T. Pet. Chem. 2008, 48, 277.
(6) Taylor, J. D.; McCormick, R. L.; Clark, W. Report on the Relationship between Molecular Structure and Compression Ignition Fuels, Both Conventional and HCCI; National Renewable Energy Laboratory (NREL): Golden, CO, 2004; SR-540-36726.
(7) DeFries, T. H.; Indritz, D.; Kastrup, R. V. Ind. Eng. Chem. Res. 1987, 26, 188.
*To whom correspondence should be addressed. E-mail: benoit.[email protected].
(1) American Society for Testing and Materials (ASTM). ASTM D613, Standard Test Method for Cetane Number of Diesel Fuel Oil; ASTM: West Conshohocken, PA, 2010.
(2) American Society for Testing and Materials (ASTM). ASTM D6890, Standard Test Method for Determination of Ignition Delay and Derived Cetane Number (DCN) of Diesel Fuel Oils by Combustion in a Constant Volume Chamber; ASTM: West Conshohocken, PA, 2010.
© 2010 American Chemical Society
5396
pubs.acs.org/EF
Table 1. Experimental and Calculated CNs for n- and iso-Paraffins Belonging to the Training and Test Sets
Figure 1. Evolution of the CN of linear and branched paraffins with the number of carbon atoms. Lines denote logarithmic and linear regressions for linear and branched paraffins, respectively.
neural network model of Yang et al. leads to a significant improvement of the CN prediction (in ref 3, the authors report R2 coefficients of 0.97 for their neural network model and 0.55 when using the equation by DeFries et al.). Nevertheless, this approach does not describe the link between the CN and the molecular structure, and its validity is limited to branched paraffins. Taylor et al. have reported the development of models based on correlations between molecular structure and CN but did not explicitly present the models or the predicted values.6 More recently, Smolenskii et al. built a model based on topological indices to predict the CNs of paraffinic and naphthenic compounds.4,5 Their model shows excellent agreement with experimental data, and these authors report predicted CNs for some hydrocarbons. However, some of the CN values returned by this model do not seem realistic, e.g., 3-ethyl-3-methylpentane = -213. In this work, we present correlative models constructed using a quantitative structure property relationship (QSPR) approach to estimate the CNs of pure hydrocarbons likely to be present in diesel fuels. The paper is organized as follows: section 2 details how the predictive models of the CNs of hydrocarbons are built; section 3 presents the predictive models and discusses the calculated values, comparing them with experimental results when available; section 4 gives the conclusions.

2. Materials and Methods

2.1. Experimental Data. The size and quality of the database are keystones for the accuracy of a predictive model. Here, the only property of interest is the CN. On the basis of several sources, Murphy et al. compiled a compendium of CNs for a large number of hydrocarbons and oxygenated compounds.8 Note that experimental CN values may differ depending on the measurement method.
The database used to build the predictive model must represent the chemistry of the diesel compounds targeted by the model. Consequently, only “true” hydrocarbons (non-oxygenated compounds) whose CN is expected to be reliable are selected from
(8) Murphy, M. J.; Taylor, J. D.; McCormick, R. L. Compendium of Experimental Cetane Number Data; National Renewable Energy Laboratory (NREL): Golden, CO, 2004; SR-540-36805.
(9) Santana, R. C.; Do, P. T.; Santikunaporn, M.; Alvarez, W. E.; Taylor, J. D.; Sughrue, E. L.; Resasco, D. E. Fuel 2006, 85, 643.
(10) Heyne, J. S.; Boehman, A. L.; Kirby, S. Energy Fuels 2009, 23, 5879.
Training set
hydrocarbon | CNexp. | CNcalc.
n-pentane | 30 | 27
n-hexane | 44 | 42
n-heptane | 55 | 48
n-octane | 64 | 59
n-nonane | 72 | 65
n-decane | 76 | 74
n-undecane | 81 | 81
n-tridecane | 90 | 91
n-tetradecane | 95 | 95
n-pentadecane | 96 | 98
n-hexadecane | 100 | 101
n-heptadecane | 105 | 101
n-octadecane | 106 | 102
n-nonadecane | 110 | 101
2,2-dimethylbutane | 24 | 27
3-methylpentane | 30 | 36
2,3(R)-dimethylpentane | 21 | 23
2,3(S)-dimethylpentane | 21 | 22
2,2,4-trimethylpentane | 15 | 26
2,2,5-trimethylhexane | 24 | 32
2,2-dimethyloctane | 59 | 45
2,2,4,6,6-pentamethylheptane | 9 | 18
(4R,5R)-diethyloctane | 20 | 22
(4R,5S)-diethyloctane | 20 | 24
(4S,5S)-diethyloctane | 20 | 22
3-ethyldecane | 47 | 61
2,5(R)-dimethylundecane | 58 | 59
2,5(S)-dimethylundecane | 58 | 57
4-propyldecane | 39 | 38
5-butylnonane | 53 | 55
2,7-dimethyl-(4R,5R)-diethyloctane | 39 | 32
2,7-dimethyl-(4R,5S)-diethyloctane | 39 | 30
2,7-dimethyl-(4S,5S)-diethyloctane | 39 | 32
2,2,4,4,6(R),8,8-heptamethylnonane | 15 | 7
2,2,4,4,6(S),8,8-heptamethylnonane | 15 | 7
(7R,8R)-dimethyltetradecane | 40 | 48
(7R,8S)-dimethyltetradecane | 40 | 51
(7S,8S)-dimethyltetradecane | 40 | 48
9-methylheptadecane | 66 | 82
(7R,8R)-diethyltetradecane | 67 | 59
(7R,8S)-diethyltetradecane | 67 | 59
(7S,8S)-diethyltetradecane | 67 | 58
(9R,10R)-dimethyloctadecane | 59 | 58
(9R,10S)-dimethyloctadecane | 59 | 58
(9S,10S)-dimethyloctadecane | 59 | 57
2,9-dimethyl-5,6-diisoamyldecane | 48 | 54
(9R,10R)-dipropyloctadecane | 47 | 45
(9R,10S)-dipropyloctadecane | 47 | 47
(9S,10S)-dipropyloctadecane | 47 | 53
(10R,13R)-dimethyldocosane | 56 | 57
(10R,13S)-dimethyldocosane | 56 | 54
(10S,13S)-dimethyldocosane | 56 | 55
9-heptylheptadecane | 87 | 92

Test set
hydrocarbon | CNexp. | CNcalc.
n-dodecane | 85 | 86
n-eicosane | 110 | 100
2,4-dimethylpentane | 29 | 32
2,3(R),4,5(R),6-pentamethylheptane | 9 | 14
2,3(R),4,5(S),6-pentamethylheptane | 9 | 15
2,3(S),4,5(S),6-pentamethylheptane | 9 | 15
5-butyldodecane | 45 | 64
7-butyltridecane | 70 | 69
8-propylpentadecane | 48 | 68
5,6-dibutyldecane | 30 | 34
7-hexylpentadecane | 83 | 84
the available data in the literature.8-10 Nevertheless, because most of the data were acquired several decades ago, the CN values of many hydrocarbons carry a measurement uncertainty. This uncertainty arises mainly from (i) impurities in the chemical products and (ii) the way the CN was measured (neat
Table 2. Predicted CNs for iso-Paraffins
hydrocarbon | CN
4(R)-butyldodecane | 58
4(S)-butyldodecane | 62
6(R)-butyldodecane | 61
6(S)-butyldodecane | 63
7-propyltridecane | 64
3-ethyltetradecane | 85
4(R)-ethyltetradecane | 73
4(S)-ethyltetradecane | 67
5(R)-ethyltetradecane | 72
5(S)-ethyltetradecane | 73
6(R)-ethyltetradecane | 74
6(S)-ethyltetradecane | 71
7(R)-ethyltetradecane | 71
7(S)-ethyltetradecane | 73
3(R)-methylpentadecane | 87
3(S)-methylpentadecane | 86
5(R)-methylpentadecane | 76
5(S)-methylpentadecane | 77
7(R)-methylpentadecane | 75
7(S)-methylpentadecane | 73
(3R,4R)-dimethyltetradecane | 61
(3R,4S)-dimethyltetradecane | 64
(3S,4R)-dimethyltetradecane | 58
(3S,4S)-dimethyltetradecane | 62
(5R,6R)-dimethyltetradecane | 51
(5R,6S)-dimethyltetradecane | 51
(5S,6R)-dimethyltetradecane | 48
(5S,6S)-dimethyltetradecane | 52
2-methylnonane | 68
3(R)-methylnonane | 56
3(S)-methylnonane | 54
4(R)-methylnonane | 40
4(S)-methylnonane | 43
5-methylnonane | 40
2-methylpentadecane | 100
4(R)-methylpentadecane | 73
4(S)-methylpentadecane | 72
6(R)-methylpentadecane | 74
6(S)-methylpentadecane | 75
8-methylpentadecane | 75
2-methylheptadecane | 101
3(R)-methylheptadecane | 89
3(S)-methylheptadecane | 87
4(R)-methylheptadecane | 77
4(S)-methylheptadecane | 77
5(R)-methylheptadecane | 82
5(S)-methylheptadecane | 77
6(R)-methylheptadecane | 79
6(S)-methylheptadecane | 78
7(R)-methylheptadecane | 79
7(S)-methylheptadecane | 80
8(R)-methylheptadecane | 78
8(S)-methylheptadecane | 79
(3S,5R)-dimethylnonane | 41
(3S,5S)-dimethylnonane | 35
(3R,5S)-dimethylnonane | 43
(3R,5R)-dimethylnonane | 37
3(R)-methylundecane | 69
3(S)-methylundecane | 70
(3R,6R)-dimethylundecane | 54
(3S,6S)-dimethylundecane | 50
(3R,6S)-dimethylundecane | 53
(3S,6R)-dimethylundecane | 48
3(R)-methyltetradecane | 82
3(S)-methyltetradecane | 84
5(R)-methyldodecane | 64
5(S)-methyldodecane | 60
5(R)-propyldecane | 43
5(S)-propyldecane | 43
3(R)-methylnonadecane | 91
3(S)-methylnonadecane | 87
3(R)-methyltridecane | 80
3(S)-methyltridecane | 78
(3R,7R)-dimethyltridecane | 61
(3R,7S)-dimethyltridecane | 63
(3S,7R)-dimethyltridecane | 64
(3S,7S)-dimethyltridecane | 59
(3R,8R)-dimethylpentadecane | 70
(3R,8S)-dimethylpentadecane | 67
(3S,8R)-dimethylpentadecane | 68
(3S,8S)-dimethylpentadecane | 67
(3R,9R)-dimethylheptadecane | 69
(3R,9S)-dimethylheptadecane | 70
(3S,9R)-dimethylheptadecane | 70
(3S,9S)-dimethylheptadecane | 71
(3R,10R)-dimethylnonadecane | 71
(3R,10S)-dimethylnonadecane | 71
(3S,10R)-dimethylnonadecane | 70
(3S,10S)-dimethylnonadecane | 70
(3R,5S,6R)-trimethyloctane | 32
(3R,5R)-dimethyl-6(R)-ethylnonane | 29
(3R,5R)-dimethyl-6-ethyltridecane | 49
(3R,5S)-dimethyl-6-ethyltridecane | 45
(3S,5R)-dimethyl-6-ethyltridecane | 46
(3S,5S)-dimethyl-6-ethyltridecane | 48
(3R,5S)-dimethyl-6(R)-ethylundecane | 37
(3R,5R)-dimethyl-11-ethyltridecane | 63
(3R,5S)-dimethyl-11-ethyltridecane | 59
(3S,5R)-dimethyl-11-ethyltridecane | 61
(3S,5S)-dimethyl-11-ethyltridecane | 63
(3R,5R)-dimethyl-9-ethylundecane | 57
(3R,5S)-dimethyl-9-ethylundecane | 55
(3S,5R)-dimethyl-9-ethylundecane | 58
(3S,5S)-dimethyl-9-ethylundecane | 58
ethylpentane | 41
2,2-dimethylpentane | 16
3-ethyl-2-methylpentane | 27
3-ethyl-3-methylpentane | 30
2-methyl-3-ethylhexane | 23
2-methyl-4-ethylhexane | 34
3-methyl-3-ethylhexane | 17
3-methyl-4-ethylhexane | 33
2-methyl-3-ethylheptane | 36
2-methyl-4-ethylheptane | 32
3-methyl-3-ethylheptane | 33
3-methyl-4-ethylheptane | 29
4-methyl-4-ethylheptane | 7
4-methyl-5-ethylheptane | 31
molecule or in mixture with a fuel). The complete database of 147 hydrocarbons can be extracted from Tables 1-5.

2.2. From Molecular Structures to Predictive Models.

2.2.1. Molecular Geometries. Materials Studio 5.0 software was used in this work.11 The hydrocarbon structures were drawn with the main chain parallel to the x axis of the Cartesian coordinate system. Molecular geometries were optimized with the Forcite module using the condensed-phase optimized molecular potentials for atomistic simulation studies (COMPASS) force field, and atomic charges were assigned using the Gasteiger method.12-14

2.2.2. Molecular Descriptors. A wide range of molecular descriptors was computed from the optimized geometries. The Materials Studio software calculates 1D descriptors, derived from the empirical formula (counting descriptors, molecular mass, etc.), 2D descriptors, obtained from the structural formula (atom connectivity, number of functional groups, etc.), and 3D descriptors, which require a spatial representation of the molecule (topology index, molecular shadows, etc.). In total, ca. 150 descriptors were calculated for the 147 molecules of the database. The
(11) Accelrys Software, Inc. Materials Studio, Release 5.0; Accelrys Software, Inc.: San Diego, CA, 2009.
(12) Sun, H. J. Phys. Chem. B 1998, 102, 7338.
(13) Sun, H.; Ren, P.; Fried, J. R. Comput. Theor. Polym. Sci. 1998, 8, 229.
(14) Gasteiger, J.; Marsili, M. Tetrahedron 1980, 36, 3219.
Table 3. Experimental and Calculated CNs for Naphthene Compounds Belonging to the Training and Test Sets

Training set
hydrocarbon | CNexp. | CNcalc.
cyclohexane | 15 | 13
cyclooctane | 22 | 26
ethylcyclohexane | 45 | 45
propylcyclohexane | 52 | 44
butylcyclohexane | 46.5 | 45
3(R)-cyclohexylhexane | 36 | 39
3(S)-cyclohexylhexane | 36 | 39
1-methyl-3-dodecylcyclohexane | 70 | 67
2(R)-cyclohexyltetradecane | 57 | 57
2(S)-cyclohexyltetradecane | 57 | 64
1,2,4-trimethyl-5-hexadecylcyclohexane | 42 | 39
5(R)-cyclohexyleicosane | 66 | 63
5(S)-cyclohexyleicosane | 66 | 68
bicyclohexyl | 51 | 45
2-methyl-3(R)-cyclohexylnonane | 63 | 64
2-methyl-3(S)-cyclohexylnonane | 63 | 62
trans-decalin | 32 | 35
cis-decalin | 41.6 | 40
propyldecalin | 35 | 35
butyldecalin | 31 | 32
tert-butyldecalin | 24 | 29
octyldecalin | 31 | 37
4-methyl-4-decalylheptane | 21 | 24
3(R)-methyl-3-decalylnonane | 18 | 15
3(S)-methyl-3-decalylnonane | 18 | 19
2-methyl-2-decalyldecane | 37 | 30

Test set
hydrocarbon | CNexp. | CNcalc.
methylcyclohexane | 21 | 35
1,3,5-trimethylcyclohexane | 30.5 | 24
2-methyl-2-cyclohexylpentadecane | 45 | 54
sec-(R)butyldecalin | 34 | 32
sec-(S)butyldecalin | 34 | 31

Table 3. Continued
hydrocarbon | CNcalc.
1-methyl-3-pentylcyclohexane | 45
1-methyl-3-propylcyclohexane | 38
1,3-dimethyl-5-(1-ethylbutyl)cyclohexane | 24
1-methyl-3-heptylcyclohexane | 52
1,3-dimethyl-5-(1-ethylhexyl)cyclohexane | 26
1-methyl-3-nonylcyclohexane | 55
1-methyl-3-undecylcyclohexane | 64
2-methyldecalin | 34
2-ethyl-7-butyldecalin | 15
2-hexyldecalin | 35
2,4-dimethyldecalin | 37
2-decyldecalin | 39
14-hydroanthracene | 26
16-hydropyrene | 20

Table 4. Experimental and Calculated CNs for Aromatic Compounds Belonging to the Training and Test Sets

Training set
hydrocarbon | CNexp. | CNcalc.
ethylbenzene | 8 | 4
1,3-dimethylbenzene | 1 | -1
1,2-dimethylbenzene | 8 | 11
1,4-dimethylbenzene | -13 | -10
isopropylbenzene | 15 | 6
sec-(R)butylbenzene | 6 | 8
sec-(S)butylbenzene | 6 | 11
1-methyl-4-isopropylbenzene | 2 | -4
1,2,3,4-tetramethylbenzene | 17 | 11
hexylbenzene | 26 | 30
diisopropylbenzene | -8 | 2
2(R)-phenyloctane | 33 | 32
2(S)-phenyloctane | 33 | 30
nonylbenzene | 50 | 47
octylxylene | 20 | 22
2(R)-phenylundecane | 51 | 44
2(S)-phenylundecane | 51 | 45
(Z)-2-phenyl-undec-2-ene | 23 | 25
(E)-2-phenyl-undec-2-ene | 23 | 21
7-phenyltridecane | 41 | 43
tetradecylbenzene | 72 | 76
2(R)-phenyltetradecane | 49 | 49
2(S)-phenyltetradecane | 49 | 53
2-methyl-2-phenylpentadecane | 39 | 39
2-methyl-2-phenylheptadecane | 39 | 36
1,2,4-trimethyl-5-hexahexylbenzene | 42 | 39
5(R)-phenyleicosane | 39 | 43
5(S)-phenyleicosane | 39 | 39
tetralin | 13 | 7
butyltetralin | 18 | 10
sec-(R)butyltetralin | 7 | 17
sec-(S)butyltetralin | 7 | 5
octyltetralin | 18 | 22
1,4-dioctyltetralin | 26 | 25
1-methylnaphthalene | 0 | 0
biphenyl | 21 | 17
2,6-dimethylnaphthalene | -13 | -5
diphenylmethane | 11 | 10
1,2-diphenylethane | 1 | 12
2-tert-butylnaphthalene | 3 | 7
2-methyl-2-(β-naphthyl)hexane | 10 | 15
2-octylnaphthalene | 18 | 7
4-methyl-4-(β-naphthyl)heptane | 9 | 13
(3R,6R)-dimethyl-3-(β-naphthyl)octane | 18 | 15
(3R,6S)-dimethyl-3-(β-naphthyl)octane | 18 | 20
(3S,6R)-dimethyl-3-(β-naphthyl)octane | 18 | 16
(3S,6S)-dimethyl-3-(β-naphthyl)octane | 18 | 17
2-methyl-2-(β-naphthyl)decane | 18 | 20
3-ethyl-3-(β-naphthyl)nonane | 13 | 12

Test set
hydrocarbon | CNexp. | CNcalc.
tert-butylbenzene | -1 | 11
1,3-diethylbenzene | 9 | 4
pentylbenzene | 9 | 21
heptylbenzene | 35 | 32
dodecylbenzene | 68 | 66
4(R)-phenyldodecane | 42 | 43
4(S)-phenyldodecane | 42 | 39
propyltetralin | 8 | 9
tert-butyltetralin | 17 | 14
1-butylnaphthalene | 6 | 1
5-methyl-5-(β-naphthyl)nonane | 12 | 15
return values were subsequently standardized for each descriptor according to eq 1, so that the weight of each descriptor in the final predictive QSPR equation can easily be evaluated:

$$\tilde{X}_i = \frac{X_i - \langle X \rangle}{\sigma} \qquad (1)$$

where $\tilde{X}_i$ is the standardized value of descriptor X for molecule i, $\langle X \rangle$ is the mean value of descriptor X over the data set, and σ is its standard deviation.

2.2.3. Correlation. Because the total number of molecular descriptors exceeds the number of molecules in the database, only the most relevant descriptors are retained to build the predictive QSPR equations. For example, descriptors that are highly correlated with the target CN property are good candidates, whereas descriptors that are highly correlated with one another hold essentially the same information and are redundant. To sort them, the correlation matrix was constructed. This m × m matrix, with m equal to the number of descriptors plus 1 for the experimental CN, contains the correlation coefficients defined by eq 2:

$$r(X, Y) = \frac{C(X, Y)}{\sqrt{V_X V_Y}} \qquad (2)$$
Table 4. Continued
hydrocarbon | CNcalc.
1-ethylnaphthalene | 3
1,3,5-triethylbenzene | -13
1-ethyl-3-methylnaphthalene | -4
1,3-diethyl-5-butylbenzene | -2
2-propylnaphthalene | 2
1-methyl-4-pentylbenzene | 8
2-methyl-6-propylnaphthalene | -8
1-methyl-4-heptylbenzene | 21
1-ethylanthracene | -4
1-methyl-4-(1-ethylpentyl)benzene | 10
1-methyltetralin | 6
1-ethyltetralin | 5
5-ethyltetralin | 14
5-methyltetralin | 14
2-propylindane | 9

Table 5. Experimental and Calculated CNs for Olefins Belonging to the Training and Test Sets
where X and Y denote descriptors or experimental values of the target property, and V and C are the variance and covariance, respectively, as detailed in eq 3, where i runs over the n molecules of the data set:

$$V_X = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \langle X \rangle\right)^2 \quad \text{and} \quad C(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \langle X \rangle\right)\left(Y_i - \langle Y \rangle\right) \qquad (3)$$
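Equations 1-3 and the correlation-based screening of section 2.2.3 can be illustrated with a short NumPy sketch. It is not the Materials Studio implementation used in this work: the function names, the two screening thresholds (min_r_cn and max_r_pair), and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np


def standardize(X):
    """Eq 1: center each descriptor (column) on its mean and divide by its standard deviation.

    np.std defaults to ddof=0, i.e. the 1/n convention of eq 3.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def correlation_matrix(X, cn):
    """Eqs 2-3: m x m matrix of r(X, Y) over the descriptors plus the CN (last row/column)."""
    Z = np.column_stack([X, cn]).astype(float)
    centered = Z - Z.mean(axis=0)
    cov = centered.T @ centered / Z.shape[0]       # C(X, Y); the diagonal holds V_X
    var = np.diag(cov)
    return cov / np.sqrt(np.outer(var, var))       # r(X, Y) = C(X, Y) / sqrt(V_X V_Y)


def screen_descriptors(X, cn, min_r_cn=0.3, max_r_pair=0.9):
    """Keep descriptors that correlate well with the CN and poorly with those already kept.

    Both thresholds are hypothetical; no numerical cut-offs are reported in the text.
    """
    r = correlation_matrix(X, cn)
    r_cn = np.abs(r[:-1, -1])                      # |r(descriptor, CN)|
    kept = []
    for j in np.argsort(-r_cn):                    # most CN-correlated descriptors first
        if r_cn[j] < min_r_cn:
            break
        if all(abs(r[j, k]) < max_r_pair for k in kept):
            kept.append(int(j))
    return kept


# Toy data (hypothetical): d1 duplicates the information of d0, d3 is pure noise.
rng = np.random.default_rng(0)
n = 40
d0, d2, d3 = rng.normal(size=(3, n))
d1 = 2.0 * d0 + 1.0
X = np.column_stack([d0, d1, d2, d3])
cn = 50 + 8 * d0 + 8 * d2 + rng.normal(scale=0.5, size=n)

Xs = standardize(X)
kept = screen_descriptors(X, cn)
print("kept descriptor columns:", kept)
```

Ranking descriptors by their correlation with the CN before discarding redundant ones mirrors the rule stated in this section; in the toy data, only one of the two fully correlated columns (d0 and its linear transform d1) survives the screening.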
For two strongly correlated descriptors X and Y, |r(X,Y)| is close to 1. Using the correlation matrix thus obtained, only the descriptors that correlate well with the CN and poorly with the other descriptors are retained.

2.2.4. Multivariate Analysis. In most cases, QSPR studies consist of determining an equation that faithfully reproduces the property. In this work, we have chosen a linear form for the target predictive model, as described by eq 4:

$$\mathrm{CN}_{\mathrm{calc.}} = \lambda_0 + \sum_{i} \lambda_i X_i \qquad (4)$$
where i runs from 1 to at most 8, the maximum number of descriptors admitted in the predictive equation, and the λi are weight factors. The representative descriptors were selected, and the associated constants optimized, with the genetic function approximation (GFA) available in the Materials Studio software. This approach starts from an initial population of randomly chosen equations. The terms of the equations (and, more generally, their powers and splines) are treated as strings, and the population evolves through iterative selection, crossover, and mutation of these strings. During the evolution, the equations are scored using a slightly modified version of Friedman's lack-of-fit (LOF) function,11 which was preferred to the well-known coefficient of determination, R2:

$$R^2 = \frac{\sum_i \left(\mathrm{CN}_{\mathrm{calc.},i} - \langle \mathrm{CN} \rangle\right)^2}{\sum_i \left(\mathrm{CN}_{\mathrm{exp.},i} - \langle \mathrm{CN} \rangle\right)^2} \qquad (5)$$
Training set
hydrocarbon | CNexp. | CNcalc.
hex-1-ene | 27 | 29
hept-1-yne | 22 | 21
cis-hept-2-ene | 44 | 41
trans-hept-2-ene | 44 | 38
vinylcyclohexane | 38 | 37
oct-1-ene | 41 | 46
2,4,4-trimethylpent-1-ene | 10 | 13
cis-oct-2-ene | 43 | 47
trans-oct-2-ene | 43 | 45
non-1-ene | 51 | 51
2,6-dimethylhept-1-ene | 51 | 48
dec-1,9-diene | 40 | 40
dec-1-ene | 59 | 60
dodec-1-ene | 71 | 69
(Z)-2,6,7-trimethyltrideca-2,6-diene | 24 | 22
(E)-2,6,7-trimethyltrideca-2,6-diene | 24 | 21
hexadec-1-ene | 86 | 83
(Z)-4-butyldodec-4-ene | 45 | 43
(E)-4-butyldodec-4-ene | 45 | 40
5-butyldodec-5-ene | 45 | 51
2,2,6,6,8,8-hexamethyl-4-methylenenonane | 4.5 | 3
7(R)-butyltridec-1-ene | 36 | 38
7(S)-butyltridec-1-ene | 36 | 38
3,12-diethyltetradec-3,11-diene | 26 | 29
(Z)-9-methylheptadec-9-ene | 66 | 66
(E)-9-methylheptadec-9-ene | 66 | 67
(Z)-(7R,10R)-methylhexadec-8-ene | 43 | 44
(Z)-(7R,10S)-methylhexadec-8-ene | 43 | 47
(Z)-(7S,10S)-methylhexadec-8-ene | 43 | 44
(E)-(7R,10R)-methylhexadec-8-ene | 43 | 42
(E)-(7R,10S)-methylhexadec-8-ene | 43 | 43
(E)-(7S,10S)-methylhexadec-8-ene | 43 | 43
octadec-1-ene | 90 | 89
cis-(10R,13R)-dimethyldocos-11-ene | 56 | 55
cis-(10R,13S)-dimethyldocos-11-ene | 56 | 57
cis-(10S,13S)-dimethyldocos-11-ene | 56 | 55
trans-(10R,13R)-dimethyldocos-11-ene | 56 | 57
trans-(10R,13S)-dimethyldocos-11-ene | 56 | 55
trans-(10S,13S)-dimethyldocos-11-ene | 56 | 54

Test set
hydrocarbon | CNexp. | CNcalc.
hept-1-ene | 32 | 37
undec-1-ene | 65 | 63
tetradec-1-ene | 79 | 76
(Z)-8-propylpentadec-8-ene | 45 | 44
(E)-8-propylpentadec-8-ene | 45 | 43
7-hexylpentadec-7-ene | 47 | 46
where i runs over the compounds of the data set and CNcalc., CNexp., and ⟨CN⟩ denote the calculated, experimental, and mean experimental CNs, respectively. Among the parameters of the revised LOF, one term accounts for the number of terms in the model; it increases the LOF as the number of descriptors in the equation grows. The equations evolve until convergence, i.e., until the scores no longer improve. In practice, a first set of equations was generated using all available descriptors. Then, a second set of equations was built using the 10 descriptors most correlated with the CN. The best predictive equation is the one with the lowest average absolute deviation (AAD), defined as

$$\mathrm{AAD} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{CN}_{\mathrm{exp.},i} - \mathrm{CN}_{\mathrm{calc.},i}\right| \qquad (6)$$

where i runs over the n compounds of the data set and CNexp. and CNcalc. are the experimental and calculated CNs, respectively. Before carrying out the GFA calculations, the whole database was split into two subsets: (i) the training set, which represents 80% of the database, is used to build the predictive models and is chosen to be representative of the chemistry and of the range of CN in the database; (ii) the remaining molecules make up the test set. These molecules are treated as external compounds and are used to test the predictive power of the equations and to select the best predictive model. An 80%/20% split between training and test sets is commonly used in QSPR studies.15,16

3. Results and Discussion

As a first attempt, a single equation was set up with the aim of predicting the CN of any hydrocarbon likely to be present in a diesel fuel. Unfortunately, the predictive power of this equation was unsatisfactory (R2 = 0.886, cross-validated R2 = 0.870, and AAD = 7.1 and 9.5 for the training and test sets, respectively). Therefore, the data set was divided into four classes of compounds, each corresponding to a specific chemical family: (i) n- and
(15) Gharagheizi, F. Energy Fuels 2008, 22, 3037.
(16) Patel, S. J.; Ng, D.; Mannan, M. S. Ind. Eng. Chem. Res. 2009, 48, 7378.
Figure 2. Comparison between calculated and experimental values of CNs for n- and iso-paraffins. The dashed blue line denotes the ideal prediction, and the gap between the blue lines represents the experimental uncertainty of the measurements (5 CN points).
Figure 3. Comparison between calculated (filled symbols) and experimental (empty symbols) evolutions of the CN with the number of carbon atoms for n-paraffins (circles) and n-olefins (triangles), with the double bond in position 1. Red and green denote molecules in the training and test sets, respectively.
iso-paraffins, (ii) naphthenes, (iii) aromatics, and (iv) n- and iso-olefins. We noticed that some 3D descriptors return different values for the S and R enantiomers of chiral molecules. When all enantiomers are included in the training set with equal CNs, the effect of these descriptor differences cancels out, and the predictive models return near-identical CNs for each set of enantiomers, as shown hereafter.

3.1. n- and iso-Paraffinic Compounds. As shown in Table 1, the training set consists of 36 linear and branched molecules, and 9 compounds are used to test the predictive model. Several predictive models were obtained. Equation 7 has the lowest AAD for the training and test sets, 4.9 and 6.9, respectively:

$$\mathrm{CN}_{\mathrm{calc.}} = -12.0X_1 + 37.1X_6 - 29.1X_9 + 13.8X_{16} + 16.5X_{19} + 52.8 \qquad (7)$$

R2 = 0.940, cross-validated R2 = 0.926, F value = 147.380

where descriptors X1, X6, X9, X16, and X19 are “propyl”, “κ-3”, “principal moment of inertia Z”, “FPSA3”, and “total energy”, respectively.17 Table 1 compares the experimental and calculated CNs for n- and iso-paraffins. The calculated CN of cetane (n-hexadecane) is in excellent agreement with the experimental value. For isocetane (2,2,4,4,6,8,8-heptamethylnonane), the model returns a low CN, as expected (CNcalc. = 7), but underestimates the experimental value of 15. Figure 2 shows that most predicted CNs lie within the standard error of the engine test, which is commonly assumed to be 3-5 points. Figure 3 presents the evolution of the CN with the number of carbon atoms in n-paraffins. Excellent agreement with experimental data is found for molecules having between 5 and 18 carbon atoms. However, the predictive model seems to systematically underestimate the CN for molecules with 19 or more carbon atoms. Nevertheless, this limitation has little practical impact, because diesel blends generally contain n-paraffins only up to C20.

The same model was used to predict the CNs of iso-paraffins likely to be found in diesel fuels. The predicted values are gathered in Table 2. This table shows that the predictive equation returns near-identical CNs for pairs of enantiomers, as physically expected. Furthermore, the largest CN is calculated for 2-methylheptadecane, with a value of 101. The CN falls to ca. 88 when the methyl group is in position 3 and seems to remain roughly constant (ca. 78) for all other positions. This change in CN with the branching location along the hydrocarbon skeleton is also observed for methylnonane, methylpentadecane, and ethyltetradecane. This observation is in line with the poor anti-knock properties of symmetrical branched paraffins compared to their isomers, as mentioned by Puckett and co-workers.18 The model was also applied to estimate the CNs of the compounds for which Smolenskii et al. predicted large negative values.4 The CNs obtained with eq 7 lie between 7 and 41, in line with expected values (see Table 2).

3.2. Naphthenic Compounds. The training set devoted to naphthenic hydrocarbons is composed of 21 compounds, and 4 molecules are set aside to test the predictive models. For cis- and trans-decalin, we used the experimental values measured by Heyne and co-workers.10 Equation 8 is the model with the lowest AAD for both the training and test sets, 3.0 and 6.7, respectively:
$$\mathrm{CN}_{\mathrm{calc.}} = 7.1X_2 - 45.9X_5 + 27.1X_6 + 4.3X_{10} + 23.4X_{15} - 7.5X_{17} + 41.4 \qquad (8)$$

R2 = 0.947, cross-validated R2 = 0.902, F value = 59.090
where descriptors X2, X5, X6, X10, X15, and X17 are “isopropyl”, “κ-1”, “κ-3”, “dipole moment”, “DPSA3”, and “RPCS”, respectively.17 Table 3 presents the experimental CNs of naphthenic compounds and the CNs computed with eq 8.
(18) Puckett, A. D.; Caudle, B. H. Ignition Qualities of Hydrocarbons in the Diesel Fuel Boiling Range; United States Bureau of Mines (USBM): Washington, D.C., 1948; Information Circular 7474.
(17) The reader is referred to the Supporting Information for the description and values of the molecular descriptors.
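To make eqs 4-6 concrete, the following NumPy sketch evaluates a linear QSPR equation of the form of eq 4 and computes the AAD of eq 6 and the R2 of eq 5. The only numbers taken from this paper are the eq 7 weights and the rounded test-set values of Tables 1 and 5; with these rounded values the AAD evaluates to 6.9 (paraffins) and 2.3 (olefins), the test-set AADs quoted for eqs 7 and 10. The descriptor values themselves are given in the Supporting Information and are not reproduced here, so the example only evaluates eq 7 for a hypothetical molecule whose standardized descriptors all equal zero; the function names are illustrative.

```python
import numpy as np

# Eq 7 (n- and iso-paraffins): weights of the standardized descriptors and constant term.
EQ7_WEIGHTS = {"X1": -12.0, "X6": 37.1, "X9": -29.1, "X16": 13.8, "X19": 16.5}
EQ7_INTERCEPT = 52.8


def predict_cn(weights, intercept, descriptors):
    """Eq 4: CN_calc = lambda_0 + sum_i lambda_i * X_i (X_i standardized as in eq 1)."""
    return intercept + sum(w * descriptors[name] for name, w in weights.items())


def aad(cn_exp, cn_calc):
    """Eq 6: average absolute deviation between experimental and calculated CNs."""
    cn_exp, cn_calc = np.asarray(cn_exp, float), np.asarray(cn_calc, float)
    return float(np.mean(np.abs(cn_exp - cn_calc)))


def r2_eq5(cn_exp, cn_calc):
    """Eq 5: spread of the calculated CNs around <CN> over that of the experimental CNs."""
    cn_exp, cn_calc = np.asarray(cn_exp, float), np.asarray(cn_calc, float)
    mean_cn = cn_exp.mean()                           # <CN>: mean experimental CN
    return float(np.sum((cn_calc - mean_cn) ** 2) / np.sum((cn_exp - mean_cn) ** 2))


# Rounded test-set values from Table 1 (eq 7) and Table 5 (eq 10).
paraffin_exp = [85, 110, 29, 9, 9, 9, 45, 70, 48, 30, 83]
paraffin_calc = [86, 100, 32, 14, 15, 15, 64, 69, 68, 34, 84]
olefin_exp = [32, 65, 79, 45, 45, 47]
olefin_calc = [37, 63, 76, 44, 43, 46]

print(f"AAD, paraffin test set: {aad(paraffin_exp, paraffin_calc):.1f}")  # 6.9
print(f"AAD, olefin test set: {aad(olefin_exp, olefin_calc):.1f}")        # 2.3
print(f"eq 5 R2, olefin test set: {r2_eq5(olefin_exp, olefin_calc):.3f}")

# Hypothetical molecule whose standardized descriptors all equal the data-set mean (0).
mean_molecule = {name: 0.0 for name in EQ7_WEIGHTS}
print(predict_cn(EQ7_WEIGHTS, EQ7_INTERCEPT, mean_molecule))  # 52.8
```

Because the descriptors are standardized (eq 1), the constant term of each equation is simply the CN predicted for a molecule sitting at the mean of every selected descriptor, and the magnitude of each weight can be compared directly across descriptors.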
Figure 4. Comparison between calculated and experimental values of CNs for naphthenic compounds. The legend is the same as in Figure 2.
Figure 5. Comparison between calculated and experimental values of CNs for aromatic compounds. The legend is the same as in Figure 2.
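The equations discussed in this section (eqs 7 and 8, and eqs 9 and 10 below) were selected with the GFA procedure of section 2.2.4. The sketch below illustrates that kind of search in a simplified form and is not the Materials Studio algorithm: it considers only linear terms (no powers or splines), scores descriptor subsets with one common form of Friedman's lack-of-fit function rather than the modified LOF used in this work, and relies on a hypothetical smoothing parameter d = 1.0, population size, mutation rate, and synthetic data.

```python
import numpy as np


def lof(X, y, cols, d=1.0):
    """One common form of Friedman's lack of fit for a linear model on descriptors `cols`.

    LOF = (SSE / M) / (1 - (c + d * p) / M) ** 2, with M molecules, c fitted terms
    (constant included here, an assumption) and p descriptors; d = 1.0 is hypothetical.
    """
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # least-squares weights of eq 4
    mse = np.mean((y - A @ coef) ** 2)
    m, p = len(y), len(cols)
    penalty = 1.0 - (p + 1 + d * p) / m
    return float(mse / penalty ** 2) if penalty > 0 else np.inf


def gfa_like_search(X, y, max_terms=8, pop_size=30, generations=60, mutation=0.3, seed=0):
    """Toy genetic search over descriptor subsets: selection, crossover and mutation."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    def score(cols):
        return lof(X, y, cols)

    # Initial population of randomly chosen equations (descriptor subsets).
    population = []
    for _ in range(pop_size):
        k = int(rng.integers(1, max_terms + 1))
        population.append(sorted(rng.choice(n_desc, size=min(k, n_desc), replace=False).tolist()))

    history = []
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]             # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            pool = sorted(set(parents[i]) | set(parents[j]))  # crossover: mix the parents' terms
            k = int(rng.integers(1, min(max_terms, len(pool)) + 1))
            child = set(rng.choice(pool, size=k, replace=False).tolist())
            if rng.random() < mutation:                   # mutation: toggle one random descriptor
                child ^= {int(rng.integers(n_desc))}
            if 0 < len(child) <= max_terms:
                children.append(sorted(child))
        population = parents + children
    population.sort(key=score)
    return population[0], history


# Synthetic example: 60 hypothetical molecules, 12 standardized descriptors,
# with a CN that truly depends on descriptors 0, 3 and 7 only.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(60, 12))
cn_demo = 45 + 10 * X_demo[:, 0] - 6 * X_demo[:, 3] + 4 * X_demo[:, 7] + rng_demo.normal(size=60)
best_cols, history = gfa_like_search(X_demo, cn_demo)
print("selected descriptors:", best_cols, "LOF:", round(lof(X_demo, cn_demo, best_cols), 3))
```

Because the better half of each generation is carried over unchanged, the best LOF can only decrease or stay constant from one generation to the next, and the penalty in the denominator grows with the number of descriptors, which discourages overfitted equations in the spirit of the LOF term described in section 2.2.4.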
In agreement with the values of Heyne et al.,10 the model returns a CN for cis-decalin (40) greater than that of transdecalin (35). Very recent experimental measurements report CNs of 38.1 and 35.9 for the cis- and trans-decalin, respectively.19 The excellent agreement between these latter values and our estimations strengthens the capability of our model in predicting CN of naphtenic compounds. As shown in Figure 4, predicted CNs are in the standard error of the experimental engine test. Additionally, we can note that eq 8 correctly returns the near-identical CNs for pairs of enantiomers. 3.3. Aromatic Compounds. The predictive model was built using training and test sets containing 38 and 10 compounds, respectively. In the training set, two conformations can be considered for the 2-phenylundec-2-ene. Because no information about the conformation used during experimental measurements was available, both Z and E molecules were considered and the experimental value of 23 was attributed to both structures. Among all of the predictive equations obtained, eq 9 has the lowest AAD for the training and test sets, 3.9 and 4.4, respectively
$$
\mathrm{CN}_{\mathrm{calc}} = 78.7X_{3} + 35.1X_{6} - 4.4X_{11} - 91.4X_{13} + 14.1X_{16} - 3.1X_{20} + 20.7 \qquad (9)
$$
$$
R^{2} = 0.931, \quad \text{cross-validated } R^{2} = 0.906, \quad F = 94.819
$$
where descriptors X3, X6, X11, X13, X16, and X20 are “atom”, “κ-3”, “dipole moment Y”, “DPSA2”, “FPSA3”, and “torsion energy”, respectively.17 In Table 4, the CNs computed with eq 9 for aromatic molecules are compared with experimental values where available. The model correctly reproduces the variation of CN with the positions of the methyl groups in the dimethylbenzenes. Figure 5 shows that, for aromatic compounds, most predicted CNs fall within the standard error of the experimental values.
Figure 6. Comparison between calculated and experimental values of CNs for olefins. The legend is the same as in Figure 2.
3.4. n- and iso-Olefinic Compounds. For this family of compounds, the training set contains 23 compounds, and the predictive models were tested on 5 molecules with known experimental CNs. Some of the linear and branched olefins in the database can exist as cis (Z) or trans (E) isomers. The CN of hept-2-ene was measured for the cis isomer. For all other compounds, no information was available on which isomer was used in the CN measurements. Although the CNs of cis (Z) and trans (E) isomers can differ (see, e.g., the CNs of cis- and trans-decalin), we chose to assign the same CN value to the Z and E isomers of each molecule. Among all of the predictive equations established, eq 10 has the lowest AAD for the training and test sets, 2.0 and 2.3, respectively:
$$
\mathrm{CN}_{\mathrm{calc}} = 142.1X_{4} + 25.4X_{6} - 16.1X_{7} - 5.1X_{8} + 2.6X_{12} - 6.8X_{14} - 150.0X_{18} + 45.4 \qquad (10)
$$
$$
R^{2} = 0.978, \quad \text{cross-validated } R^{2} = 0.963, \quad F = 198.485
$$
where descriptors X4, X6, X7, X8, X12, X14, and X18 are “subgraph counts”, “κ-3”, “Balaban index JX”, “E-state keys”, “dipole moment Z”, “FNSA1”, and “vertex adjacency/magnitude”, respectively.17 The CNs computed using eq 10 are presented in Table 5, and Figure 6 compares calculated and experimental values. The comparison shows that the model predicts the CNs of both linear and branched alkenes very well. Moreover, Figure 3 shows how the CN of linear olefins with the double bond in position one evolves with the number of carbon atoms, again in excellent agreement with experimental data.
(19) Experimental values measured at IFP Energies nouvelles following the ASTM D613 method.
3.5. Mathematical Selection versus Chemical Intuition. It is of course appealing to ask whether a physical meaning can be ascribed to the predictive models resulting from the mathematical selection (here, the GFA approach). Equations 7-10 all contain the descriptor κ-3 with a positive coefficient, which may account for the change of CN with the length of the hydrocarbon skeleton. In eq 7, the descriptor X19 (VAMP total energy), the only quantum-chemical descriptor selected, has a positive coefficient and reinforces the correlation between CN and carbon number for paraffins. The descriptor X9 (principal moment of inertia Z), which carries a negative coefficient, is mostly linked to how CN evolves with the number of carbon atoms in n-paraffins (see Figure 3). The descriptor X1 (propyl count), also with a negative coefficient in eq 7, may mirror the effect of branching on CN. We thus observe that the selected descriptors and their coefficients in the multilinear predictive equations are consistent with known trends relating molecular structure and CN. Note, however, that it would be highly misleading to select descriptors on the sole basis of chemical or physical intuition.
We recommend instead selecting descriptors on mathematical criteria from the largest possible initial set, while making sure that this set includes the descriptors suggested by chemical intuition, which are then expected to survive the selection. This is probably why statistically sound selection methods tend to be proposed by mathematicians, whereas successful correlations are obtained with these methods by users with intuitive knowledge of the field of application (e.g., chemists).
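The recommendation above, letting a mathematical criterion choose descriptors from a large pool, can be illustrated with a simplified genetic algorithm. This is a minimal sketch, not the GFA implementation used in this work. It assumes a bit-string encoding of descriptor subsets, one-point crossover, bit-flip mutation, and elitism. Models are scored by a lack-of-fit measure of the form LSE/[1 − (c + d·p)/M]², where M is the number of molecules, p the number of descriptors, c the number of model terms including the intercept, and d a smoothing parameter; the penalty discourages padding a model with uninformative descriptors. The names `lof_score` and `select_descriptors`, the parameter defaults, and the synthetic data are all hypothetical.

```python
import numpy as np


def lof_score(X, y, cols, d=1.0):
    """LOF-style score (lower is better): mean squared error / (1 - (c + d*p)/M)^2."""
    m = len(y)
    A = np.column_stack([np.ones(m), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    lse = float(np.mean((A @ coef - y) ** 2))
    c, p = A.shape[1], len(cols)  # model terms incl. intercept, descriptors used
    denom = 1.0 - (c + d * p) / m
    return np.inf if denom <= 0 else lse / denom ** 2


def select_descriptors(X, y, n_pop=40, n_gen=60, p_mut=0.05, seed=0):
    """Genetic algorithm over descriptor subsets encoded as boolean masks."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def score(mask):
        cols = np.flatnonzero(mask)
        return np.inf if cols.size == 0 else lof_score(X, y, cols)

    pop = rng.random((n_pop, n_feat)) < 0.3  # random initial subsets
    for _ in range(n_gen):
        fitness = np.array([score(mask) for mask in pop])
        elite = pop[np.argsort(fitness)[: n_pop // 2]]  # keep the best half unchanged
        children = []
        while len(children) < n_pop - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]  # pick two parents
            cut = rng.integers(1, n_feat)  # one-point crossover position
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    fitness = np.array([score(mask) for mask in pop])
    best = int(np.argmin(fitness))
    return np.flatnonzero(pop[best]).tolist(), float(fitness[best])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 10))  # hypothetical pool of 10 descriptors for 60 molecules
    y = 30 + 5 * X[:, 0] - 3 * X[:, 3] + rng.normal(scale=0.1, size=60)
    cols, score = select_descriptors(X, y)
    print("selected descriptor columns:", cols, "score:", score)
```

On synthetic data where only columns 0 and 3 carry signal, the search should return a subset containing both. Any subset missing either column has a far larger error, and elitism ensures that the best subset found is never lost.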
4. Conclusions
In this work, a QSPR approach was applied to predict the CN of hydrocarbons. A database containing 147 molecules for which the CN is experimentally known was compiled, mainly from the compendium of Murphy et al.8 To improve the prediction of CN, the database was divided into four classes based on the chemistry of the compounds, and specific QSPR models were built for each class: (i) n- and iso-paraffins, (ii) naphthenes, (iii) aromatics, and (iv) n- and iso-olefins. The CNs computed with these models are in good agreement with experimental data, with AADs similar to the experimental reproducibility of the standard engine test. Equations 7-10 were then used to estimate the CN of hydrocarbons found in diesel fuels for which, to our knowledge, no experimental CN is available in the literature. The predicted CNs are consistent with the experimental CNs of comparable compounds: for example, the CN of methylnonane is lower than that of n-decane. Moreover, all predicted CN values lie roughly in the range 0-100, in line with the definition of CN (a mixture of cetane and HMN). This work shows that, with a good-quality database and with the chemistry of the compounds taken into account, the QSPR approach is a powerful tool for estimating the CN of complex and/or difficult-to-isolate hydrocarbons. The CN of diesel fuels can now be estimated by combining (i) the detailed composition provided by two- and three-dimensional gas chromatography analysis,20,21 (ii) the predictive models presented in this work, and (iii) a linear22 or nonlinear23 volumetric mixing rule. This study will be extended to other families of compounds and/or other fuel properties, e.g., oxygenated compounds, with the aim of predicting the CN of the fatty acid methyl esters present in biodiesel fuels.
Acknowledgment. The authors thank Melanie Loos, Nadege Charon, and Alain Quignard of IFP Energies nouvelles for the measurement of experimental CNs.
Supporting Information Available: Brief description of the molecular descriptors in QSPR models and values of molecular descriptors computed for each molecule of the training and test sets. This material is available free of charge via the Internet at http://pubs.acs.org.
(20) Vendeuvre, C.; Ruiz-Guerrero, M.; Bertoncini, F.; Duval, L.; Thiebaut, D.; Hennion, M. C. J. Chromatogr., A 2005, 1086, 21.
(21) Adam, F.; Thiebaut, D.; Bertoncini, F.; Courtiade, M.; Hennion, M. C. J. Chromatogr., A 2010, 1217, 1386.
(22) Dartiguelongue, C.; Celse, B.; Adam, F.; Courtiade, M.; Espinat, D. Proceedings of the American Institute of Chemical Engineers (AICHE) Spring National Meeting; Tampa, FL, April 26-30, 2009.
(23) Ghosh, P.; Jaffe, S. B. Ind. Eng. Chem. Res. 2006, 45, 346.