Artificial Neural Network Based Group Contribution Method for

Sep 28, 2017 - Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable...

Correlation

An Artificial-Neural-Network-Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds William Louis Kubic, Rhodri W. Jenkins, Cameron M. Moore, TA Semelsberger, and Andrew D. Sutton Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.7b02753 • Publication Date (Web): 28 Sep 2017 Downloaded from http://pubs.acs.org on October 1, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


An Artificial-Neural-Network-Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds

William L. Kubic, Jr.,*† Rhodri W. Jenkins,‡ Cameron M. Moore,‡ Troy A. Semelsberger,§ and Andrew D. Sutton‡

† Applied Energy and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.

‡ Chemistry Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.

§ Material Physics Applications Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.

ABSTRACT. Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable. We developed an artificial-neural-network-based group contribution method for estimating cetane and octane numbers that captures the complex dependence of fuel properties of pure compounds on chemical structure and is statistically superior to current methods.

1. INTRODUCTION


The majority of advanced biofuels research is devoted to converting non-food biomass into fuels that are compatible with the existing transportation infrastructure, engine technologies, and government regulations. The scientific pursuit of low-cost next-generation biofuels has prompted interest in novel chemical pathways, which yield hydrocarbons and oxygenated organic compounds that are not found in current fuel supplies. A recent U.S. Department of Energy study1 stated that 1 billion tons of biomass are available in the U.S. for biofuel production; however, the available biomass displaces only 30% of current U.S. petroleum consumption. Therefore, research efforts need to focus on producing compounds or mixtures that improve or maintain the performance characteristics of petroleum-based fuels.

Researchers typically synthesize potential advanced biofuels that contain compounds for which key physical and chemical properties have not been measured or published. The only reliable information available may be the molecular structures of the component species. Consequently, no screening metrics exist for ascertaining the viability of bio-derived fuels, aside from ASTM testing. Testing potential fuel compounds can be expensive and time-consuming, and if the chemicals are not available commercially, considerable time and effort may be needed to produce testable quantities.2 For example, ignition quality testing for determining derived cetane number typically requires 100 mL, and ASTM tests for cetane number (CN), motor octane number (MON), and research octane number (RON) each require 500 mL. Producing these quantities of biofuels may be impractical or infeasible for the typical researcher, so a priori assumptions are often made as to what constitutes a good fuel blendstock molecule based on a qualitative assessment of chemical structure.3 The disadvantages of the qualitative approach are exacerbated when considering blends containing multiple bio-derived fuels.

A reliable method for estimating the physical and chemical properties from molecular structure would help distinguish promising fuels from mediocre fuels. The time, effort, and cost to scale up production to obtain testable quantities of a mediocre fuel would be eliminated, and so these resources would be


available to focus on research and development of the most promising fuels. Reliable estimation methods could be used to optimize process chemistry by tailoring the reaction selectivity, yield, and activity of a defunctionalization pathway that optimizes the targeted fuel properties. Process optimization could reduce the number of unit operations, feedstock consumption, and energy consumption, all of which improve the overall process economics and reduce lifecycle greenhouse gas emissions.

A quantitative structure-property relationship (QSPR) expresses a physical or chemical property as a function of descriptors derived from the molecular structure.4 QSPRs have been developed for thermodynamic and transport properties, but no widely accepted methods have been developed for estimating CN, MON, or RON of pure components. Our objective was to develop a reliable QSPR for estimating CN, RON, and MON of hydrocarbons and oxygenated organic compounds that are found in advanced biofuels. QSPR predictions for pure components could be used in conjunction with blend models5,6 to estimate the CN, RON, and MON of mixtures of bio-derived compounds and of mixtures of bio-derived compounds with current fuels.

2. BACKGROUND

QSPRs differ in terms of descriptor type and the number of descriptors used. The simplest QSPR is the correlation of the property values for a homologous series of compounds with molecular weight.7 Other QSPRs use more complex parameters, such as topological indices, to relate properties to structure.8,9 Topological indices can vary from relatively simple measures of carbon atom connectivity10 to descriptors accounting for size, shape, and branching.9 Descriptors that characterize reactivity, shape, and binding properties have been defined with the aid of molecular modeling.9 Perhaps the most widely used QSPRs are group contribution methods. Organic molecules are divided into structural groups, which are recognizable groups of bound atoms, and properties are correlated to


the group type and number of groups. While topological indices and model-based descriptors are theoretically satisfying, structural groups have a greater intuitive appeal.

The first group contribution method, published in 1944,10 was a correlation for predicting thermodynamic properties of ideal gases. In 1949, Thomas11 expanded this approach to critical properties. In the 1950s, more reliable group contribution methods for estimating critical properties were developed,12 and researchers continued to develop new group-contribution correlations for thermodynamic properties including critical properties,13 freezing and boiling points,13 heat capacities,14,15,16 enthalpies of formation,15,16 entropies,15,16 and activity coefficients.17 Group contribution methods have also been developed for estimating liquid18 and gas19 viscosities.

More recently, group contribution methods have been developed for estimating RON,20 MON,20 and CN;21,22 however, these methods did not include oxygen-containing groups. Dahmen and Marquardt’s group contribution method23 for estimating CN includes structural groups for most common oxygenated organic compounds. We tested these correlations against an independently compiled data set containing CN data for 449 compounds and octane number data for 218 compounds. The published CN correlations were based on data for 63 to 284 compounds, and published octane number correlations were based on data for 63 to 188 compounds. The comparisons with data showed that the published group contribution methods are unreliable, particularly when applied to compounds not included in the developmental database.

The unreliability of these methods can be explained by Smolenskii’s concept of property complexity.24 Smolenskii considers a property to be complex if its value is sensitive to details of the chemical structure, and he measures this sensitivity by considering variability of the property within groups of isomers.
Figure 1 shows the boiling point and RON of paraffinic hydrocarbons as functions of number of carbon atoms in the molecule. The variation of boiling point within each set of isomers is small, which indicates that boiling point is not very sensitive to molecular structure. RON, however, is


much more sensitive to molecular structure as indicated by the large variation for a given set of isomers (e.g., C8 isomers). Smolenskii would consider RON to be a more complex property because it has a larger variability within groups of isomers than boiling point.

Figure 1. Normal boiling point and RON for paraffinic hydrocarbons plotted as functions of number of carbon atoms in the molecule. Data are from Ref. 25. [Plot not reproduced: normal boiling point (K, 0 to 500) and research octane number (0 to 125) versus number of carbon atoms (0 to 11).]

Smolenskii proposed a quantitative correlation-coefficient-like complexity index. It measures the variability of the property within each isomer group relative to the total variability of the property. Based on the data plotted in Fig. 1, the complexity index for the normal boiling point of paraffins would be small because the variability in each isomer group is small relative to the total variability in normal boiling point. The complexity index for RON would be greater because its variability in some isomer groups is large relative to the total variability. According to Smolenskii’s complexity index, RON of paraffins is a moderately complex property, as is the CN of paraffins and naphthenes. Smolenskii et al.9 also observed that accurate QSPRs can be developed for simple properties, while developing models of complex properties may be futile. Group contribution methods that correlate property values with the sum of group parameters work well for


simple properties. However, these methods break down when applied to moderately complex properties because they fail to capture structural details and complex group-group interactions.

Several researchers have investigated new approaches that capture structural details and group-group interactions within the context of a standard group contribution model. Albahri17 defines position-dependent structural groups for his RON and MON models. He also investigated an artificial-neural-network (ANN)-based group contribution method that could account for group-group interaction.20 Yang et al.22 developed an ANN-based group contribution method with position-dependent groups for predicting CN of paraffins. Their ANN yielded good results, but the scope of their study was limited. Albahri’s ANN is limited to RON and MON of hydrocarbons.20 Yang et al. developed an ANN for CN using a training set consisting of only 13 isoparaffins.22
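Smolenskii's exact definition of the complexity index is not reproduced in this section, so the following is only a minimal sketch of the idea described above: it compares the variability of a property within isomer groups with its total variability. The function name `complexity_ratio` and all sample values are hypothetical placeholders, not data from Ref. 25 or from the data set of this study.

```python
from statistics import fmean


def complexity_ratio(groups):
    """Within-group sum of squares divided by total sum of squares.

    `groups` maps an isomer-group label (e.g., carbon number) to a list of
    property values. A ratio near 0 suggests a "simple" property; a larger
    ratio suggests a more "complex" one. This is an illustrative stand-in,
    not Smolenskii's published index.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = 0.0
    for values in groups.values():
        group_mean = fmean(values)
        within_ss += sum((v - group_mean) ** 2 for v in values)
    return within_ss / total_ss


# Hypothetical placeholder values keyed by carbon number (not measured data).
boiling_point_like = {
    4: [262.0, 272.0],
    5: [283.0, 301.0, 309.0],
    6: [323.0, 333.0, 342.0],
}
octane_like = {
    4: [94.0, 101.0],
    5: [62.0, 86.0, 92.0],
    6: [25.0, 73.0, 92.0],
}

bp_ratio = complexity_ratio(boiling_point_like)
ron_ratio = complexity_ratio(octane_like)
print(f"boiling-point-like ratio: {bp_ratio:.2f}")
print(f"octane-like ratio: {ron_ratio:.2f}")
```

With placeholder values of this shape, the property that varies strongly among isomers yields the larger ratio, which mirrors the qualitative contrast between boiling point and RON drawn from Figure 1.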

3. APPROACH

Group contribution methods have the advantage of a relatively simple approach to describing molecular structure. These methods do not require computation of abstract topological indices or complex descriptions of three-dimensional structures, so group contribution methods are easily accessible to a broad sector of the scientific and engineering community. Therefore, we based our QSPR for CN, RON, and MON on a group contribution approach. Recognizing that CN, RON, and MON are moderately complex properties, we used an ANN to capture the complex, nonlinear interactions among groups. Developing an ANN-based group contribution method consists of three parts: compiling the data set, developing the ANN, and validating the model.

3.1. Data Set Selection. Developing a reliable group contribution method for estimating CN, RON, and MON requires a comprehensive data set that includes a broad range of compounds. We assembled a data set from published data compilations,25,26,27,28,29 data sets used to develop other structure-property relationships,9,23,30,31 and primary sources.32,33,34,35,36 The data set includes paraffins,


olefins, alkynes, naphthenes, aromatics, alcohols, esters, furans, aldehydes, ketones, carboxylic acids, and triglycerides. Table 1 summarizes the data set used in this study. A complete listing of the data is included in the electronic supplemental information (ESI). For CN, the data set includes compounds containing from 1 to 57 carbon atoms. For RON and MON, the data set includes compounds containing from 1 to 12 carbon atoms. Figures 2–4 illustrate the distribution of CN, RON, and MON in the data set. The data for hydrocarbons cover a broad range of values for CN, RON, and MON. The CN data for oxygenated compounds also cover a broad range of values. However, the RON and MON data for oxygenated compounds are concentrated within the range of 80–120. Only 16% of the RON data and one of the MON values for oxygenated compounds lie outside of this range.

The literature contains duplicate measurements for many compounds, and the reported values are not always consistent. Huber and Hauber37 found significant differences in CN measurements obtained by different laboratories using the same fuel and the same model of test engine. Differences were as great as 4.5% for standard diesel fuels and 7.5% for premium quality diesel fuel. We found that reported values of CN may differ by more than 50 and that the spread in RON and MON measurements can be as great as 40.

Table 1. A summary of compounds in the data set used for this study.

                            Number of Compounds     Number of Measurements
Compound Type               CN     RON    MON       CN     RON    MON
Paraffins                   77     45     45        115    70     66
Olefins and Alkynes         39     68     65        45     72     68
Naphthenes                  47     43     35        63     51     36
Aromatics                   65     35     37        70     37     39
Alcohols                    28     13     12        40     25     22
Esters and Triglycerides    130    3      3         168    3      3
Other Oxygenates            63     9      11        70     22     12
Total                       449    216    208       571    280    246

Figure 2. Histogram illustrating the distribution of CN data. [Plot not reproduced: number of measurements in the database versus cetane number range; recovered legend entry: Hydrocarbons.]

Figure 3. Histogram illustrating the distribution of RON data. [Plot not reproduced: number of measurements in the database versus research octane number range; recovered legend entry: Hydrocarbons.]

Figure 4. Histogram illustrating the distribution of MON data. [Plot not reproduced: number of measurements in the database versus motor octane number range; recovered legend entry: Hydrocarbons.]

To ensure our database accurately reflects the observed variability in the measurements, it includes duplicate measurements. However, for a few compounds, large numbers of duplicate measurements have been reported. For example, the literature contains 11 different values of CN for n-decane. The inclusion of all measurements in the data set could bias the results by giving too much weight to a few compounds that have been studied extensively. Therefore, in order to mitigate data skewing, we limited the data set to only include three data points for a given compound—the “best estimate,” the minimum, and the maximum. In operations research, the probability distribution for activity durations is commonly estimated from three points —a “pessimistic,” an “optimistic,” and a “most likely” value.38 Limiting the number of measurements per compound to three is analogous to this practice. Our approach resembles data compression rather than data censoring. We employed minimal censoring when compiling the data set. We eliminated obvious outliers and values reported as inequalities. Some data compilations included compounds with incomplete or ambiguous structural information, so we did not include these compounds. We then used variability in duplicate measurements to estimate the uncertainty in the data. Uncertainty in measured CN has a standard deviation of 4.1 for hydrocarbons and a standard deviation of 7.5 for oxygenated organic compounds. For RON and MON, we observed no statistically significant difference in the uncertainty for hydrocarbons and oxygenated organic compounds. Based on variability in the data, the standard deviations for RON and MON are 3.4 and 3.3, respectively; and these values apply to all classes of compounds.
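The three-point compression of duplicate measurements described above can be sketched as follows. The text does not state how the "best estimate" was chosen, so this sketch assumes the median as a stand-in; the function name `compress_measurements` and the sample values are hypothetical, not data from this study.

```python
from statistics import median


def compress_measurements(values):
    """Reduce duplicate measurements for one compound to at most three points.

    Returns [minimum, best estimate, maximum]. The median is used as the
    "best estimate" purely as an assumption for illustration. Compounds with
    three or fewer measurements are returned unchanged (sorted).
    """
    if not values:
        raise ValueError("at least one measurement is required")
    if len(values) <= 3:
        return sorted(values)
    return [min(values), median(values), max(values)]


# Hypothetical duplicate CN measurements for a single compound.
reported = [76.0, 77.5, 78.0, 76.9, 80.1, 74.3, 77.0]
compressed = compress_measurements(reported)
print(compressed)
```

Keeping the extremes alongside a central value preserves the observed spread of the measurements while preventing a heavily studied compound from dominating the regression.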

3.2. Developing the Artificial Neural Network. An ANN is a nonlinear, multivariable regression model. Like all large, multivariable regression models, ANNs are subject to overfitting the data. Palmer and Mitchell encountered the problem of overfitting the data while using a random forest method to develop a QSPR for the solubility of drug-like molecules.39 They recognized the general nature of the


overfitting problem by stating that “QSPR models adept at modeling noise.” Lack of guaranteed convergence to a global minimum is another problem that plagues ANNs. Support vector regression (SVR) is a method of avoiding these problems. An SVR model is essentially an ANN with a single layer of hidden nodes formulated in a way that avoids overfitting the data and guarantees convergence to a global minimum.40

In this study, we developed a new heuristic approach to developing an ANN. Unlike many applications, we did not fit the data to a predetermined network structure. Instead, we used a statistically informed method to evolve a simple starting network representing a small subset of the database into a complex network encompassing the entire database. This approach helps to avoid overfitting the data to an arbitrary network structure with too many adjustable parameters and gives greater assurance of global convergence.

An ANN is an acyclic directed graph consisting of multiple layers of interconnected nodes. The edges only connect the nodes of one layer to the nodes in the next higher layer. We used a three-layered ANN consisting of an input layer representing the number of each structural group, a single hidden layer, and an output layer representing CN, RON, and MON. The nodes in the hidden layer represent empirical functions relating the number of each structural group to a set of intermediate parameters. The hidden nodes represent data features that are somewhat analogous to the eigenvectors in a principal component analysis; but, unlike eigenvectors, they are not necessarily orthogonal or statistically independent. The output layer consists of nodes representing empirical functions relating the output parameters from the hidden nodes to CN, RON, and MON.

The output of an input layer node is the number of the corresponding structural group in the molecule. For example, the node corresponding to –CH2– groups would have a value of six for n-octane.
The output of a hidden node is a nonlinear function of the following linear combination of the inputs to that node.


$$ z_k = w_{o,k} + \sum_{i=1}^{n} w_{i,k} \cdot x_i , \tag{1} $$

where $x_i$ is the output value from input node $i$, $n$ is the number of inputs to the node, $z_k$ is the weighted sum of the inputs to hidden node $k$, and $w_{o,k}$ and $w_{i,k}$ are empirical parameters or weighting factors determined from the regression analysis. We used the logistic function to relate $z_k$ to the output of the hidden node.

$$ y_k = \frac{1}{1 + e^{-z_k}} , \tag{2} $$

where $y_k$ is the output from hidden node $k$. The logistic function is, perhaps, the most commonly used function to relate the inputs to an ANN node to its output, but other functions can and have been used. We selected the logistic function as the first choice because of its common usage. Because we obtained satisfactory results, we did not investigate other functions. We used the same functional relationships to relate the outputs from the hidden nodes to the outputs from the output nodes. The range of the logistic function is the open interval (0,1). Therefore, the outputs of the ANN are scaled values of CN, RON, and MON that are converted into the actual values using the following equation.

$$ R_l = ( R_{l,\max} - R_{l,\min} ) \cdot r_l + R_{l,\min} , \tag{3} $$

where $R_l$ is the actual value of CN, RON, or MON, $r_l$ is the scaled value of CN, RON, or MON from the output nodes of the ANN, $R_{l,\max}$ is the maximum value, and $R_{l,\min}$ is the minimum value. The index $l$ in Eq. (3) is CN, RON, or MON. $R_{l,\max}$ and $R_{l,\min}$ are empirical parameters.

We began developing the network by selecting an initial working data set consisting of n-alkanes and terminal alcohols of n-alkanes. The initial network consisted of three input nodes representing the –CH3, –CH2–, and –OH groups; three hidden nodes; and three output nodes. Initially, each hidden node was connected to only one output node as illustrated in Figure 5. Additional connections and nodes were


added to the ANN to incorporate more complex structures into the model and to improve the fit of the data. We used the following statistically informed procedure for expanding and optimizing the ANN:

1. Optimize the parameters for the current network structure and working subset of the data.
2. Modify the ANN by (a) making finer distinctions in structural groups, (b) adding an additional connection between nodes, or (c) adding an additional hidden node.
3. Optimize parameters for the modified ANN.
4. Use an F-test to determine if the ANN modification improves the model significantly. If the probability of an improvement is >0.9, the modification is accepted. If the probability of an improvement is […]

Figure 5. Initial ANN configuration for CN, RON, and MON of n-alkanes and primary alcohols. [Diagram not reproduced: input nodes –CH3, –CH2–, and –OH; hidden nodes CN-1, RON-1, and MON-1; output nodes CN, RON, and MON.]

C< (C# ≥ 3)
=CH–
–O– (2nd+)
=CH2
=CC= (C# ≥ 3)
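Equations (1) through (3), applied to the initial network of Figure 5, can be sketched as a forward pass for a single property. This is a minimal illustration, not the authors' implementation: the weights, biases, and scaling limits are arbitrary placeholder numbers (the fitted parameters are not given in this section), and the names `logistic`, `node_output`, and `predict` are hypothetical.

```python
import math


def logistic(z):
    """Eq. (2): map a weighted sum onto the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(bias, weights, inputs):
    """Eqs. (1) and (2): bias plus weighted sum of inputs, then logistic."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(z)


def predict(group_counts, hidden, output, limits):
    """Forward pass for one property.

    group_counts: number of each structural group (input-layer outputs).
    hidden: list of (bias, weights) pairs, one per hidden node.
    output: (bias, weights) pair for the output node.
    limits: (R_min, R_max) used by Eq. (3) to rescale the result.
    """
    y = [node_output(b, w, group_counts) for b, w in hidden]
    r = node_output(output[0], output[1], y)  # scaled value in (0, 1)
    r_min, r_max = limits
    return (r_max - r_min) * r + r_min  # Eq. (3)


# n-Octane: two -CH3 groups, six -CH2- groups, and no -OH group.
n_octane = [2, 6, 0]

# Placeholder parameters chosen only so the example runs (not fitted values).
hidden_nodes = [(-1.0, [0.1, 0.3, -0.5])]  # one hidden node, as for CN-1
output_node = (-2.0, [4.0])
cn_limits = (-20.0, 120.0)

cn_estimate = predict(n_octane, hidden_nodes, output_node, cn_limits)
print(f"illustrative output: {cn_estimate:.1f}")
```

Because the output node also uses the logistic function, the rescaled result always lies strictly between the two limits, which is why the limits are treated as empirical parameters rather than hard physical bounds.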

In a conventional group contribution method, properties of homologous series of compounds should have similar shapes when plotted as a function of a common variable such as the number of carbon atoms in the molecule. For example, the CN of primary alcohols of n-alkanes plotted as a function of number of carbon atoms should have a shape similar to that of the CN of n-alkanes plotted as a function of number


of carbon atoms. However, Figure 7 shows that the CNs of these two homologous series have different shapes. It is important to note that our ANN model captures the distinct differences in shape for both data series. Figure 8 illustrates the ability of the ANN to capture the nonlinear behavior of the CN and RON for 1-alkenes, including the maximum in RON at low molecular weights.

Figure 7. ANN-based group contribution method results for CN of n-alkanes and primary alcohols of n-alkanes. [Plot not reproduced: cetane number versus carbon number; series are n-alkane data, ANN for n-alkanes, primary alcohol data, and ANN for primary alcohols.]

Figure 8. ANN-based group contribution method results for CN and RON of 1-alkenes. [Plot not reproduced: cetane/octane number versus carbon number; series are CN data, ANN for CN, RON data, and ANN for RON.]

Figures 9 and 10 are parity plots for the ANN prediction of CN for hydrocarbons and oxygenates, respectively. Residuals for the ANN fit zero-mean normal distributions. The residuals for hydrocarbons had a constant standard deviation of 7.2, and the residuals for oxygenates had a CN-dependent standard deviation given by the following equation.

$$ \sigma = 6.3 + 0.083 \cdot \mathrm{CN} . \tag{5} $$

CN in Equation (5) is the CN estimate from the ANN. Figures 11 and 12 are parity plots for RON and MON, respectively. RON and MON residuals fit zero-mean normal distributions with constant standard deviations of 6.9 for RON and 6.1 for MON. Zero-mean residuals imply that the model is unbiased. However, the standard deviations are much larger than the estimated experimental errors. Experimental errors only account for 25% to 32% of the observed variance in the residuals. The remainder is modeling error.
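As a worked check of the variance split quoted above, the share of residual variance attributable to experimental error can be taken as the ratio of squared standard deviations. This reading is an assumption, since the text does not give the formula. The sketch uses only the constant standard deviations reported in Section 3.1 and in this section; the CN case for oxygenates, whose residual standard deviation follows Eq. (5), is left out. The names `reported` and `experimental_share` are hypothetical.

```python
# (experimental std. dev., residual std. dev.) as quoted in the text.
reported = {
    "CN, hydrocarbons": (4.1, 7.2),
    "RON": (3.4, 6.9),
    "MON": (3.3, 6.1),
}


def experimental_share(sigma_exp, sigma_res):
    """Fraction of the residual variance explained by measurement error."""
    return (sigma_exp / sigma_res) ** 2


shares = {name: experimental_share(*pair) for name, pair in reported.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%} experimental, {1 - share:.0%} modeling")
```

Under this assumption the experimental share comes out at roughly one quarter to one third of the residual variance, broadly in line with the range quoted above.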


Figure 9. Parity plots for CN of hydrocarbons. The dotted lines represent 2 standard deviations from the mean.

Figure 10. Parity plots for CN of oxygenated compounds. The dotted lines represent 2 standard deviations from the mean.

Figure 11. Parity plots for RON of oxygenated compounds. The dotted lines represent 2 standard deviations from the mean.

Figure 12. Parity plots for MON of oxygenated compounds. The dotted lines represent 2 standard deviations from the mean.

[Plots for Figures 9 to 12 not reproduced: predicted versus measured values, both axes spanning -25 to 150.]

Table 3 compares correlation coefficients ($r^2$) for our ANN-based method to those for other group contribution methods. Correlation coefficients were determined from the following equation.

$$ r^2 = 1 - \frac{\sigma_{res}^2}{\sigma_{data}^2} , \tag{6} $$

where σres is the standard deviation of the residuals or errors in the ANN predictions and σdata is the standard deviation of the data. All correlation coefficients in Table 3 were calculated using the data set developed for this study.
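Equation (6) can be evaluated as in the short sketch below, which also shows why some entries in Table 3 are negative: when the residual variance exceeds the variance of the data, $r^2$ drops below zero. The measured and predicted values are hypothetical, the function name `r_squared` is illustrative, and the use of the population variance for both terms is an assumption.

```python
from statistics import pvariance


def r_squared(measured, predicted):
    """Eq. (6): one minus residual variance over data variance."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return 1.0 - pvariance(residuals) / pvariance(measured)


# Hypothetical values, not taken from the data set of this study.
measured = [20.0, 40.0, 60.0, 80.0]
good_fit = [22.0, 38.0, 61.0, 79.0]
poor_fit = [80.0, 20.0, 90.0, 10.0]

r2_good = r_squared(measured, good_fit)
r2_poor = r_squared(measured, poor_fit)
print(f"good fit: {r2_good:.3f}")
print(f"poor fit: {r2_poor:.3f}")
```

A negative value therefore means that a correlation predicts the data worse than simply using the mean of the data for every compound.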

Table 3. Comparison of correlation coefficients for the ANN-based group contribution method of this study with published group contribution methods.

                       CN                                                       RON                       MON
Compound Class         This Study   DeFries et al.18   Dahmen & Marquardt20     This Study   Albahri17    This Study   Albahri17
Paraffins              0.91         0.73               0.53                     0.94         0.86         0.95         0.87
Olefins and Alkynes    0.90         *                  -0.48                    0.90         0.53         0.65         -1.55
Naphthenes             0.81         *                  0.25                     0.85         0.75         0.89         -0.40
Aromatics              0.87         0.44               0.58                     0.76         -3.28
Oxygenates             0.85         *                  0.41                     0.62         *            0.56         *
Overall                0.90         0.64               0.53                     0.93         0.55         0.91         -1.16

* Correlation is not applicable to the compound class.
† Insufficient data for meaningful evaluation of r2.


For the ANN, correlation coefficients are low for the CN of naphthenes and the CN, RON, and MON of oxygenated compounds. Correlation coefficients can be low because the standard deviation in the residual is high or because the range of values in a given subset of data is small. For CN of naphthenes, the standard deviation of residuals is small, but the distribution of CNs for naphthenes in the database is narrow, resulting in a lower correlation coefficient than for other classes of compounds. For CN of oxygenated compounds, the distribution of values is comparable to that of hydrocarbons (see Fig. 2), but the residuals are greater because the CN measurement errors for oxygenated compounds are greater than those for hydrocarbons (see Section 3.1). As illustrated in Figs. 3 and 4, the distributions of RON and MON for oxygenated compounds are narrow, so they have lower correlation coefficients than hydrocarbons even though the residuals are comparable to those of other compounds.

Dahmen and Marquardt’s23 method for estimating CN is the only other group contribution method that includes oxygenated organic compounds. Our ANN group contribution method is the only published method that includes oxygenated organic compounds for estimating RON and MON.

Comparing the correlation coefficients in Table 3 shows that our ANN-based method fits the data better than the other group contribution methods, but this comparison only tells part of the story. The group contribution methods of DeFries et al.21, Dahmen and Marquardt23, and Albahri17 perform well for compounds similar to those in the data sets used in their development. However, predicted values of CN, RON, and MON are inaccurate when these correlations are extrapolated to compounds in our data with molecular structures that deviate from those in their original developmental data sets.

4.2. Discussion of Model Validation. Table 4 is a summary of the 10-fold validation results. The average errors for the combined validation sets were zero for all properties. The standard deviations for the combined errors in the validation sets are slightly larger than the standard deviations for the regression analysis residuals. We used an F-test to evaluate the statistical significance of these


differences. The significance level of the difference is low. Also, the magnitude of the difference is low (