ARTICLE pubs.acs.org/EF
Automated Method for Determining Hydrocarbon Distributions in Mobility Fuels Nathan J. Begue,‡ Jeffery A. Cramer,† Chris Von Bargen,§ Kristina M. Myers,^ Kevin J. Johnson,† and Robert E. Morris*,† †
U.S. Naval Research Laboratory, Chemical Sensing and Fuel Technology Section, Code 6181, 4555 Overlook Avenue, SW, Washington, DC 20375, United States ‡ National Research Council (NRC), The National Academies, Washington, DC 20001, United States § Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-3803, United States ^ Nova Research, Inc., 1900 Elkin Street Suite 230, Alexandria, Virginia, USA 22308 ABSTRACT: The analysis of hydrocarbon profiles in complex fuel samples has been a daunting task for fuel scientists for decades. Although many studies of specific compound classes on a limited number of fuel samples have been published, a large-scale survey of many samples has been lacking. The complexity and extensive manpower requirements have inhibited comprehensive, wide-scoped studies. Presented here is a novel, automated chemical component classification scheme which is based on a set of selection rules that operate on either the molecular formula or chemical name from a previously generated “hit list.” The method is validated against synthetically generated data and a PIANO blend standard. Average jet and diesel fuel profiles are presented.
INTRODUCTION
The characterization of hydrocarbon fuels (middle-distillates) has been a challenging task for decades due to the striking complexity of the samples. It has been proposed that diesel fuels may contain more than 100,000 distinct chemical components.1 Many different analytical techniques have been employed in such studies, but gas chromatography (GC)2-10 and high-performance liquid chromatography (HPLC)11-13 have been the most prominent. Although advances in instrumentation, particularly comprehensive gas chromatography (GCxGC)1,9,14-18 and high-resolution mass spectrometry (HRMS),5,19-21 have drastically increased the throughput and resolving power of such analyses, the resulting large, complex data sets have shifted the burden to data analysis. Determination of the distribution and types of hydrocarbons present in a given fuel is a necessary prerequisite for determining the Fit-For-Purpose (FFP) of a fuel for its intended use. This is of particular importance as the fuel user community is faced with the task of certifying fuels derived from alternative nonpetroleum sources. While many of these alternative fuels are suitable for use as a replacement for their petroleum-derived counterparts, their constituency can be distinctly different from that of the traditional petroleum-derived fuels that they are intended to replace. Thus, an understanding of the chemical constituency of these fuels is critical for certification and quality surveillance. The primary method for a broader hydrocarbon analysis is ASTM D2425 (ref 22), in which compositional information is developed from fuels by performing a liquid-liquid extraction to separate the sample into aromatic and saturate fractions, followed by GC-MS analysis of each fraction. By summing characteristic mass fragments of different hydrocarbons, estimates of compound class concentrations are obtained in mass percent.
This method does not account for heteroatomic species, and sulfur content greater than 0.25 mass percent can interfere. Also, the separation of aromatic and saturate fractions by ASTM D2549,23 a necessary precursor for ASTM D2425, is time and labor intensive. Additionally, ASTM D2425 was designed to function exclusively with petroleum fuels, as it employs many isotopic correction factors and mole sensitivities empirically derived from such fuels; in its current form it will therefore not function properly with hydrorefined or synthetic fuels, because of their uniquely narrow and often discontinuous hydrocarbon distributions. Additionally, ASTM D1319 (ref 24) can be used to determine aromatic, olefin, and saturate content by fluorescent indicator adsorption. This method utilizes a set of fluorescent dyes that coelute with each class on a silica gel column. The lengths of the colored bands observed under ultraviolet light are measured to determine the relative percentages of each compound class. A battery of other ASTM methods exists for the analysis of single compound classes25-29 or specific compounds30-34 in fuels. More recently, Vendeuvre and co-workers18 extended their previous work and demonstrated an alternative to ASTM D2425 using GCxGC-FID. In this work, peaks falling within predetermined retention time windows were defined to be of a given class and carbon number. As the first column modulation split peaks across several slices, the integration was performed manually and “consists of selecting a blob...using mouse clicking”. The primary advantage of this technique is that it eliminates the need for complex sample preparation. Also, this methodology enabled the generation of carbon profiles and not just average carbon numbers for each compound class. Unfortunately, this method requires significant user intervention by means of the manual peak integration and maintenance of retention time windows, presumably through intermittent runs of retention time standards. Additionally, while the prevalence of GCxGC instrumentation has grown dramatically in recent years, such instruments are not nearly as ubiquitous as their single-column counterparts. To facilitate the analysis of complex mixtures, such as fuels, we present a novel classification algorithm based on a text parser that operates on the chemical name and formula of each chromatographic feature in a list of identified compounds, herein referred to as the “hit list”. Although automated peak detection and deconvolution is a continuously evolving area of study,35-42 the hit lists here were derived in a manner consistent with the work of Stein.40 The focus of this particular work is GC-MS data, but the source of the hit list is irrelevant as long as it contains a minimum set of peak variables: specifically chemical name, formula, and integrated area, although CAS number can also be useful in classifying compounds with uncommon names.
Received: December 2, 2010. Revised: February 11, 2011. Published: March 11, 2011.
© 2011 American Chemical Society. dx.doi.org/10.1021/ef101635a | Energy Fuels 2011, 25, 1617–1623
EXPERIMENTAL SECTION
Instrumental Methodology. The GC-MS data were acquired on an Agilent 7890A GC with a standard multimode inlet and a 5975C mass selective detector (MS). An Agilent autoinjector with a 10-μL syringe was used to introduce 0.5 μL of neat fuel into the inlet, which was split at a 200:1 split ratio. A DB-1MS column (Agilent, 60 m × 0.25 mm × 0.25 μm film) was used with an oven temperature program that began at 40 °C, held for 1.5 min, ramped at 10 °C/min to 290 °C, and held for 10 min. The MS was scanned from 40 to 350 m/z, resulting in a scan rate of 5.19 Hz. Validation of the profiler classification logic employed a comprehensive fuel sample set which consisted of 408 jet fuels, comprising Jet A, Jet A-1, JP-5, and JP-8; 615 diesel fuels, including samples of F76, MGO, and ULSD; and 76 alternative fuels, including samples of Fischer-Tropsch (FT) fuels, fatty acid methyl ester (FAME) biodiesels, and fuels produced from hydrorefined biomass. These samples were sourced from around the globe over the last five years in an attempt to maximize the variability of the source material. A sample of the DHA PIANO (detailed hydrocarbon analysis; paraffins, iso-paraffins, aromatics, naphthenes, and olefins) blend was acquired from Restek and used as received. Additionally, 100 synthetic GC-MS chromatograms were generated consistent with the work of Dixon et al.37 using the 100 most commonly observed compounds identified in the validation set. All 100 compounds were present in each synthetic data set, unlike Dixon’s work, where each component had a randomly assigned frequency of occurrence across the data set. While the retention times for each component were held fixed across all 100 synthetic chromatograms, the standard area (A) for each component was modulated by a random number generated from a normal distribution with a mean of A and a standard deviation of 0.1A.
Computational Algorithms.
The Agilent ChemStation data files were sequentially imported into MATLAB R2009b (The MathWorks, Inc., Natick, MA) and analyzed. Noise factor analysis, peak picking, and peak deconvolution were performed in accordance with the work of Stein,40 which is the foundation of NIST’s AMDIS software package. A characteristic noise factor is derived for each sample by scanning each individual ion chromatogram, determining signal-free windows, and estimating the instrumental noise characteristics assuming Poisson statistics. An averaged noise factor is derived from the individual noise factors across all mass channels. This noise factor is then used in the peak picking step to set a minimum signal-to-noise ratio (SNR) threshold for determination of peaks. The peak picking, or component perception, step identifies maxima of each mass channel that exceed the SNR threshold and accounts for the skewing that is present in scanning mass
spectrometers such as the quadrupole used in this work. Peaks across the mass channels are collected by matching retention times and peak shape. Next, the peaks are deconvoluted using a least-squares method fitting to a model peak profile. It should be noted that while custom code was used in the present work to generate the hit list, the presented profiling scheme can be applied to any hit list that contains a minimum of the IUPAC name, chemical formula, and peak area for each identified compound. Whereas the method published by Stein included algorithms for mass spectral matching to a library, the NIST MS Search Program v 2.0 was used instead to take advantage of the NIST08 mass spectral library. Calls to the NIST MS Search Program were made from within the MATLAB environment. Only peaks that accounted for more than 0.001% of the total integrated area were passed to the library search, and only results with a match factor >70% were accepted. Two factors contribute to the ability to use such a low limit. First, exact identification of each component is not necessary for this work, as the desired information is hydrocarbon class, which is much easier to determine from mass spectral fingerprints. For example, differentiation of 3-methylheptane from 2-methylheptane is not necessary as they are both iso alkanes, but they have very different spectra from aromatics such as xylene. Second, low match factors tend to occur for peaks with low peak areas and a low SNR. Most of these peaks are rejected by the minimum area requirement, and those that remain would likely represent an insignificant amount of the total integrated area. Profiling of the identified compounds was performed by an in-house written text parser that operates on the compound names and chemical formulas in the hit list generated in the previous step and assigns each spectral match to a compound type classification.
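As a rough illustration of the two acceptance thresholds described above, the following Python sketch filters a hit list by relative peak area and library match factor. The `Hit` record, its field names, and `filter_hits` are hypothetical; the original work used custom MATLAB code together with the NIST MS Search Program, and only the two threshold values come from the text. For simplicity the sketch applies both thresholds to an already searched list, whereas in the text the area threshold is applied before the library search.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_AREA_PERCENT = 0.001  # minimum share of total integrated area (from the text)
MIN_MATCH_FACTOR = 70.0   # minimum library match factor in percent (from the text)


@dataclass
class Hit:
    """One hit-list entry; the field names are illustrative assumptions."""
    name: str
    formula: str
    area: float
    match_factor: float
    cas: Optional[str] = None  # optional, since some entries have no CAS number


def filter_hits(hits: List[Hit]) -> List[Hit]:
    """Keep hits that pass both the relative-area and match-factor thresholds."""
    total_area = sum(h.area for h in hits)
    if total_area <= 0:
        return []
    kept = []
    for h in hits:
        area_percent = 100.0 * h.area / total_area
        if area_percent > MIN_AREA_PERCENT and h.match_factor > MIN_MATCH_FACTOR:
            kept.append(h)
    return kept


# Made-up example values, chosen only to exercise each threshold.
example_hits = [
    Hit("Heptane, 3-methyl-", "C8H18", area=5.0e6, match_factor=91.0),
    Hit("Naphthalene", "C10H8", area=2.0e6, match_factor=65.0),  # match factor too low
    Hit("Decane", "C10H22", area=10.0, match_factor=95.0),       # area share too small
]
accepted = filter_hits(example_hits)
```

In this example only the first entry survives: the second fails the match factor threshold and the third contributes far less than 0.001% of the total area.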
These assignments were performed in accordance with the hierarchical rules presented in Table 1. For each compound class, there was a set of three rules that operated on either the chemical name or the chemical formula, depending on compound class. The rules to sequentially filter out classes were applied in the order shown in Table 1 and consist of sets of character strings that must (1) all be included in, (2) have at least one member included in, and (3) all be excluded from the chemical name or formula. For example, 2,6,10-trimethyl-tetradecane would be classified as an iso alkane as it includes all of ['ane', 'yl-'], at least one of ['ane'], and excludes all of ['ene,', 'ene-', 'ane-', 'cyclo', 'ox', 'rane', 'olane']. Rules operating on the chemical formula operate in a similar fashion and were used to classify heteroatomic species. In addition to these rules, two filters were used to correct several known misclassifications and override several known erroneous classifications, such as column bleed and partially saturated aromatics. This was accomplished by first comparing each CAS registry number to a list of known “bad actors.” If the CAS number was present in the list, a predefined class was returned and the logical rules were bypassed. The advantage of this method is that CAS numbers are unique identifiers and therefore make an ideal key for a lookup table approach. The disadvantage of this technique is that these bad actors must be identified in advance of the analysis. Additionally, several compounds commonly associated with column bleed, such as trichlorodocosylsilane, were added to the lookup table to force their classification into the “Other” class. Saturated and partially saturated aromatics were identified by searching for the string ‘hydro’ and common aromatic frameworks. If the string ‘hydro’ is found in the test string, then the second prefilter is invoked. The prefilter compares the test string to common hydrogenation patterns for the aromatic in question.
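The three-part rule test and the CAS lookup described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' parser: the function names, the placeholder CAS key, and the decision to lowercase names before matching are assumptions. Only three rule sets are included (the iso alkane rule quoted in the text and the two naphthalene rules from Table 1), so names outside those classes fall through to "other" here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (class name, include all, include at least one, exclude all); order matters.
Rule = Tuple[str, Sequence[str], Sequence[str], Sequence[str]]

RULES: List[Rule] = [
    ("branched naphthalenes", ["naphthalene"], [",", "-"], []),
    ("naphthalene", ["naphthalene"], [], [",", "-", " n", "ln"]),
    ("iso alkanes", ["ane", "yl-"], ["ane"],
     ["ene,", "ene-", "ane-", "cyclo", "ox", "rane", "olane"]),
]

# Hypothetical lookup table of known "bad actors": CAS number -> forced class.
# The key below is a placeholder, not a real registry number.
KNOWN_BAD_ACTORS: Dict[str, str] = {"0000-00-0": "other"}


def matches(name: str, rule: Rule) -> bool:
    """Apply the include-all, include-at-least-one, and exclude-all tests."""
    _, include_all, include_any, exclude_all = rule
    return (
        all(s in name for s in include_all)
        and (not include_any or any(s in name for s in include_any))
        and not any(s in name for s in exclude_all)
    )


def classify(name: str, cas: Optional[str] = None) -> str:
    """Return the first matching class, checking the CAS lookup table first."""
    if cas is not None and cas in KNOWN_BAD_ACTORS:
        return KNOWN_BAD_ACTORS[cas]  # the text rules are bypassed entirely
    lowered = name.lower()  # assumption: matching is done case-insensitively
    for rule in RULES:
        if matches(lowered, rule):
            return rule[0]
    return "other"
```

With these rules, `classify("2,6,10-trimethyl-tetradecane")` gives "iso alkanes", `classify("Naphthalene, 1-methyl-")` gives "branched naphthalenes", and any name supplied with the placeholder CAS key is forced into "other" regardless of its text.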
For example, by the rules presented in Table 1, decahydronaphthalene would be incorrectly classified as belonging to the naphthalene class. For this example, the combination of the 'naphthalene' and 'decahydro' strings results in classification as a dicyclo alkane. Alternatively, ‘1,2,3,4-tetrahydronaphthalene’ would be classified under indans and tetralins due to the
Table 1. Text Parser Rules for Chemical Profiling and Degrees of Unsaturation (DU) for Each Chemical Class^a

compound class | DU | include all | include at least one | exclude all
methyl esters |  | 'acid', 'methyl ester' | -- | --
sulfur-bound |  | 'S' | -- | 'Sc', 'Sr', 'Sn', 'Si', 'Sb', 'Se'
nitrogen-bound |  | 'N' | -- | 'Na', 'Nb', 'Ni', 'Ne'
oxygen-bound |  | 'O' | -- | 'Os'
chlorine-bound |  | 'Cl' | -- | --
other halogen-bound |  | -- | 'F', 'Br', 'I' | 'Ir', 'In', 'Fr', 'Fe', 'Fm'
tricycloaromatics | 10 | -- | 'acenaphthylene', 'phenanthrene', 'anthracene', 'pyrene' | --
acenaphthylenes | 9 | 'acenaphthylene' | -- | --
acenaphthenes | 8 | -- | 'acenaphthene', 'naphthyleneethylene' | --
branched naphthalenes | 7 | 'naphthalene' | ',', '-' | --
naphthalene | 7 | 'naphthalene' | -- | ',', '-', ' n', 'ln'
indenes | 6 | 'indene' | -- | --
indans and tetralins | 5 | -- | 'tetralin', 'Indane', 'indan' | --
alkyl benzenes | 4 | -- | 'ylbenzene', 'benzene,', 'xylene', 'tolu', 'phenyl' | 'acid', 'ester', '(benzene'
benzene | 4 | 'benzene' | -- | ',', '-', ' b', 'lb'
cyclo alkenes | 2 | 'ene' | 'cyclo', 'benz', 'xyl', 'tolu', 'phenyl', 'azulene', 'benzo', 'styrene', 'fluorene', 'pentalene', 'annulene' | --
any alkenes | 1 | 'ene' | -- | 'heneicosane'
alkyl tricyclo alkanes | 3 | 'ane', 'tricyclo', 'yl' | 'ane', 'adamantane' | 'ene', 'ane-', 'ox', 'rane', 'olane'
tricyclo alkanes | 3 | 'ane', 'tricyclo' | 'ane' | 'ene', 'ane-', 'yl-', 'ox', 'rane', 'olane'
alkyl dicyclo alkanes | 2 | 'ane', 'yl' | 'decalin', 'bicyclo', 'dicyclo', 'spiro' | 'ene', 'ane-', 'ox', 'rane', 'olane'
dicyclo alkanes | 2 | 'ane' | 'decalin', 'bicyclo', 'dicyclo', 'spiro' | 'ene', 'ane-', 'yl-', 'ox', 'rane', 'olane'
alkyl monocyclo alkanes | 1 | 'ane', 'yl', 'cyclo' | 'ane' | 'ene', 'ane-', 'ox', 'rane', 'olane'
monocyclo alkanes | 1 | 'ane', 'cyclo' | 'ane' | 'ene', 'ane-', 'yl', 'ox', 'rane', 'olane'
iso alkanes | 0 | 'ane', 'yl-' | 'ane' | 'ene,', 'ene-', 'ane-', 'cyclo', 'ox', 'rane', 'olane'
normal alkanes | 0 | 'ane' | 'heneicos', 'ane' | 'ene,', 'ene-', 'ane-', 'yl-', 'cyclo', 'tert', ',', 'ox', 'rane', 'olane'
other |  | -- | -- | --

^a All rules operate on the chemical name except for the heteroatomic classes, which operate on the chemical formula.
presence of the string ‘1,2,3,4-tetrahydro’ in addition to the naphthalene string. Similar testing logic is applied to six other aromatic frameworks that were observed during validation. This filter was applied after testing for acid methyl esters and heteroatom classes, but before the aromatic classes. Further analysis of each compound class was performed by determining the average carbon number (C_ave) and carbon profiles of each class and calculating the area percent for double bond equivalents (DBE) between zero and eleven. A weighted average was used to calculate C_ave, where the weighting factor was the number of carbon atoms in each compound (as determined from the molecular formula). DBE can be calculated in accordance with eq 1 as

$$\mathrm{DBE} = \frac{(2a + 2) - (d - b)}{2} \qquad (1)$$

where a, b, and d are the numbers of tetra-, tri-, and monovalent atoms, respectively; divalent elements such as sulfur and oxygen do not contribute to the DBE. Finally, a text file is written which includes the percent integrated
area per compound class, the 20 largest peaks, area percent and average carbon number for degrees of unsaturation from 0 to 11, area percent for each carbon number for each class, and a detailed compound list for all classes. All compositional information is presented in area percent unless otherwise noted.
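To make eq 1 concrete, the following Python sketch computes the DBE from a molecular formula string and an average carbon number for a set of peaks. Several details are assumptions rather than the authors' implementation: the function names, the element-to-valence sets, the restriction to simple formulas without parentheses, and the reading of C_ave as an area-weighted mean of the carbon count.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed valence groupings; divalent O and S are deliberately left out (eq 1).
TETRAVALENT = {"C", "Si"}
TRIVALENT = {"N", "P"}
MONOVALENT = {"H", "F", "Cl", "Br", "I"}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C10H8' (no parentheses)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def dbe(formula: str) -> float:
    """Double bond equivalents: ((2a + 2) - (d - b)) / 2."""
    counts = parse_formula(formula)
    a = sum(n for el, n in counts.items() if el in TETRAVALENT)
    b = sum(n for el, n in counts.items() if el in TRIVALENT)
    d = sum(n for el, n in counts.items() if el in MONOVALENT)
    return ((2 * a + 2) - (d - b)) / 2


def average_carbon_number(peaks: Iterable[Tuple[str, float]]) -> float:
    """Area-weighted mean carbon number over (formula, area) pairs (assumed reading)."""
    peak_list = list(peaks)
    total_area = sum(area for _, area in peak_list)
    weighted = sum(parse_formula(f)["C"] * area for f, area in peak_list)
    return weighted / total_area
```

For naphthalene (C10H8) this gives a DBE of 7 and for benzene (C6H6) a DBE of 4, in agreement with the DU values listed for those classes in Table 1, while a saturated alkane such as C14H30 gives 0.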
RESULTS AND DISCUSSION
Classification Validation. To validate the logic of the selection rules, a compound list was generated by extracting the unique chemical compounds from the hit lists for the first 990 fuel samples. This resulted in a set of more than 814,000 matches with match factors >70%, consisting of 1656 unique chemical compounds, which constitute nearly 1% of the entries in the NIST08 electron ionization library. The CAS number was used as a unique identifier when available. For the 289 compounds that had no CAS number, the chemical name was used as a test for uniqueness. Each classification was manually verified
Table 2. Hydrocarbon Profile Based on Area% by Class for PIANO Blend Standard^a

compound class | Profiler (area%) | ChemStation (area%) | actual (mass%)
alkyl benzenes | 23.8 ± 0.8% | 32.59% | 22.86%
normal alkanes | 15.4 ± 1.2% | 17.00% | 19.90%
any alkenes | 15.6 ± 1.3% | 17.43% | 18.84%
iso alkanes | 21.4 ± 1.3% | 14.84% | 18.70%
alkyl monocyclo alkanes | 18.8 ± 1.9% | 17.47% | 17.95%
monocyclo alkanes | 3.0 ± 0.4% | 0.74% | 2.72%
alkyl dicyclo alkanes | 2.0 ± 0.7% | 0.00% | 0.00%
oxygen-bound | 0.02 ± 0.01% | 0.00% | 0.00%

^a Profiler values were the average of the results from five replicate injections, and the ChemStation results are derived from a single, manually integrated and profiled sample. The reference values, in mass%, supplied by the vendor are reported for comparison.
Table 3. Hydrocarbon Profile Based on Area% by Degrees of Unsaturation (DU) for PIANO Blend Standard^a

DU | Profiler (area%) | ChemStation (area%) | actual (mass%)
0 | 36.8 ± 0.9% | 40.68% | 38.59%
1 | 38.8 ± 1.3% | 25.39% | 37.83%
2 | 0.6 ± 0.2% | 0.09% | 0.68%
4 | 23.8 ± 0.7% | 33.88% | 22.86%

^a The reference values, in mass%, supplied by the vendor are reported for comparison.
which identified 45 “bad actors”. Additionally, 56 saturated or partially saturated aromatic compounds were trapped by the prefilter. The initial “bad actors” were added to the lookup table prefilter to prevent future misclassification of these compounds. This filter should remain applicable for hydrocarbon-based fuels, synthetic or petroleum derived. If this algorithm were to be applied to a radically different sample matrix, soil extracts for example, the “bad actors” list should be revisited. To benchmark the automated profiler against manual classification, a DHA PIANO blend standard was analyzed. The hydrocarbon profiles obtained for the DHA PIANO blend are presented in Tables 2 and 3, where only compound classes showing more than 0.001% by area are shown. Results for the Profiler are for the fully automated technique presented here. Results from ChemStation were from the manual classification of each hit identified by the ChemStation Integrator. The errors presented in Tables 2 and 3 are one standard deviation of the five replicate injections and are within the GC-MS method error. More than 190 chromatographic features were classified by the automated profiler in each of the five replicate samples, while the ChemStation Integrator detected 105 chromatographic features. Repeated analysis of a single data file using the same setup parameters resulted in identical profiles, as the algorithm is completely deterministic. In general, the misidentifications of the 45 bad actors cannot be corrected through amendments to the profiler logic but are an inherent consequence of the hierarchical rules used in the classification and the limitations of the source MS library. The most common cause of misclassification is the existence of ringed
side chains (20 of 45 bad actors). For example, cyclohexylcyclooctane classifies as an alkyl monocyclo alkane instead of an alkyl dicyclo alkane. Furthermore, misclassifications of this type will have minimal impact on the profiles, as the compounds will still be profiled as cycloalkanes, which have similar chemical activity. This error could possibly be corrected if another layer of logic were added to remove the initially matched strings, followed by profiling the remaining name string. While this strategy would likely be successful, the current impact on the intended use of this algorithm is low and does not justify the increased complexity of additional layers of classification logic. No simple extension of the profiler logic would correctly classify the remaining bad actors, due to the inherent limitations imposed by the methodical rules used. Three general cases were observed: (1) the use of non-IUPAC names, (2) inconsistently formatted names in the NIST library, and (3) combinations of multiple complex functional groups which frustrate the profiler in a manner similar to the cyclo side chains discussed earlier. Fourteen misclassifications were caused by non-IUPAC names such as menthane (as opposed to 1-methyl-4-isopropylcyclohexane). As the selection rules were designed to operate on standard IUPAC names, it is impossible to predict how these events will affect the profile. Eight misclassifications were due to formatting, and all were iso alkanes that were classified as normal alkanes. For example, 5-ethyldecane should be formatted as “decane, 5-ethyl-”, which would be correctly identified as an iso alkane due to the 'yl-' string, but as formatted it is incorrectly identified as a normal alkane. An observed complex combination of functional groups that thwarted the profiler is octahydro-4,7-methano-1H-indene. While octahydroindene would be classified as a dicyclo alkane (and is correctly classified due to the saturated aromatic prefilter), the addition of the extra ring hinders the classification.
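The extra layer of logic suggested above (removing the initially matched string and profiling what remains) could look roughly like the following Python sketch. It is not part of the published profiler: the function names and the choice of 'cyclo' as the only ring token are assumptions made for illustration, and names such as 'bicyclo...' or 'decalin', which encode more than one ring in a single token, would need additional handling.

```python
def count_ring_tokens(name: str, token: str = "cyclo") -> int:
    """Count ring tokens by repeatedly removing the first match and re-testing."""
    remaining = name.lower()
    count = 0
    while token in remaining:
        # Replace with a space so removal cannot splice two fragments into a new token.
        remaining = remaining.replace(token, " ", 1)
        count += 1
    return count


def ring_prefix(name: str) -> str:
    """Map the token count to a ring-count label (hypothetical labels)."""
    labels = {0: "no ring token", 1: "monocyclo", 2: "dicyclo", 3: "tricyclo"}
    return labels.get(count_ring_tokens(name), "more than three ring tokens")
```

Applied to cyclohexylcyclooctane, the second pass finds two 'cyclo' tokens and would therefore point to a dicyclo alkane, whereas a single pass stops at the first match.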
Although initial inspection would imply that the profiler logic failed to correctly classify 2.7% of the unique chemical compounds (45 of 1656), this is not the fairest metric to use. The 990 chromatograms used in this validation study had a combined total integration area of 6.9 × 10^12 counts. Of this, the misclassified compounds accounted for less than 1% of that area. Furthermore, these misclassifications were less than 0.7% of the more than 814,000 chromatographic components identified in the automated search. Additionally, nearly half of the misclassifications were due to cyclic alkane side chains causing dicyclo- and tricycloalkanes to be inadvertently classified as monocyclic, which arguably has very little impact on the interpretation of the profile. Lastly, these specific misclassifications can be prevented in the future by the known bad actor prefilter. Also, a check is instituted in the profiler to flag previously unverified classifications to facilitate periodic updates of the known bad actors list. Such a test based on the remaining 109 samples (roughly 10% of the samples) in the data set generated 44 newly observed chemical compounds, resulting in only 1 novel bad actor. The benchmark test based on the analysis of the PIANO blend highlights the speed advantage of the profiler. Automated profiling of the hit list required less than 5 s per fuel sample (