
Energy & Fuels 2006, 20, 727-733


Evaluating the Predictive Powers of Spectroscopy and Chromatography for Fuel Quality Assessment

Kevin J. Johnson, Robert E. Morris,* and Susan L. Rose-Pehrsson

Naval Research Laboratory, Code 6181, Washington, D.C. 20375-5320

Received October 21, 2005. Revised Manuscript Received January 13, 2006

In this work, a set of 45 fuel samples from around the world was acquired, with complete compositional and specification test results. These samples were analyzed by near-infrared (NIR) spectrophotometry, Raman spectroscopy, and gas chromatography with mass selective detection (GC-MS). The measured NIR, Raman, and GC-MS data were evaluated for their ability to predict various fuel properties via partial least squares modeling (PLS) as part of an effort directed toward developing robust, sensor-based fuel quality assessment methodologies. Currently, fuel quality in the field or onboard a ship is assessed with a series of traditional American Society for Testing and Materials (ASTM) fuel test procedures. A sensor-based device to perform these tests would not only provide significant savings in cost and manpower but also reduce the hazards associated with handling large volumes of fuel samples. It would also provide faster and, in many cases, more-consistent results. We are currently developing sensing technologies to perform fuel quality assessment and diagnostics. This technology is based upon the prediction of critical fuel properties from an array of optical and other specialized sensors. These predictions will take advantage of predictive models derived from chemometric analysis of the data stream. We report here the findings of our preliminary model development, taking advantage of novel chemometric methodologies developed at Naval Research Laboratory (NRL) for detecting minute compositional changes in fuels from chromatographic analyses.

Background

Hydrocarbon-based fuels are manufactured to comply with performance specifications based on properties rather than composition. While there are limits placed on certain fuel constituents, fuel quality and suitability for use are based on a series of physical and chemical measurements. These measurements are performed in accordance with the applicable ASTM test methods.¹ The NATOPS Aircraft Refueling Manual² requires that aviation fuel (JP-5) received by ships for aircraft fueling be tested for API gravity, flash point, particulates, fuel system icing inhibitor (FSII), and free water. Then, during flight operations, the fuel that is dispensed to aircraft must be tested each day for appearance, particulates, free water, and FSII. All of these tests are performed in the shipboard quality assurance (QA) fuel laboratory, which requires significant manpower and time. The necessity of sampling and transporting fuel samples from the source to the fuel lab also entails manpower costs and safety considerations. In addition, the ASTM tests that are employed require that the analyst be trained in and familiar with the fuel test methods. Shipboard and land-based fuel-handling operations would both benefit from an instrumental method to monitor fuel quality and to perform the necessary quality assurance testing. In addition to yielding significant cost savings, such a method would reduce the time necessary to determine fuel quality. If this technology could be developed to include in-line real-time quality monitoring, it would be invaluable for monitoring fuel quality throughout the fuel-handling system. Such a system could be automated to provide continuous real-time fuel quality monitoring for both shipboard and land-based fuel-handling and distribution operations.

The use of compositional analysis as a tool to predict fuel performance has not been widely successful, partly because many of the critical fuel properties are determined or influenced by trace levels of certain constituents. Spectroscopy has been more successful and has been used in various forms for over 60 years to predict fuel properties. More recently, there has been renewed interest in spectroscopic analysis of fuels due to the development of advanced statistical methods in the field of chemometrics. We have successfully employed chemometrics to screen fuels for their tendency to promote engine combustor failure, using capillary gas chromatography (GC) analysis.³ However, the predictive model derived from GC data was limited to similar fuel sources because differences between fuels can be much greater than the subtle compositional differences of interest. To overcome these limitations, we extended our studies to multiway techniques for combined gas chromatography-mass spectrometry (GC-MS) data.⁴ Key to the success of this approach was the development of a novel analysis of variance (ANOVA) technique⁵ that vastly improved the predictive power of the chemometric models by restricting the analysis to only the relevant portions of the data. The resultant data were referred to as the feature-selected data and proved to be very useful for revealing minute compositional changes in a fuel, down to the determination of individual chemical compounds that were taking part in the fuel degradation process. This technique also was shown to be a very sensitive way to determine minute quantities of contaminants in a fuel sample, by comparing the GC-MS data of the pure fuel with those of a contaminated sample.

* Corresponding author. E-mail: [email protected].
(1) ASTM. Specification of Aviation Turbine Fuels. In Annual Book of ASTM Standards; ASTM: Philadelphia, PA, 1997; Vol. 05.01, ASTM D1655-96c.
(2) Aircraft Refueling NATOPS Manual; NAVAIR Report No. 00-80T-109; June 2002. Naval Air Technical Data and Engineering Service Command, Naval Air Station North Island, Code 3.3A, P.O. Box 357031, San Diego, California 92135-7031.
(3) Morris, R. E.; Hammond, M. H.; Shaffer, R. E.; Gardner, W. P.; Rose-Pehrsson, S. L. Energy Fuels 2004, 18, 485-89.
(4) Johnson, K. J.; Rose-Pehrsson, S. L.; Morris, R. E. Energy Fuels 2004, 18, 844-50.
(5) Johnson, K. J.; Synovec, R. E. Chemom. Intell. Lab. Syst. 2002, 60, 225-37.

Energy & Fuels, Vol. 20, No. 2, 2006. DOI: 10.1021/ef050347t
This article not subject to U.S. Copyright. Published 2006 by the American Chemical Society. Published on Web 02/18/2006.

While chemometric modeling has been shown to be a viable method for correlating compositional information with fuel quality parameters, chromatography is not without its limitations as a real-time data-gathering technique. While chromatography can provide unique compositional information, it generally entails significant instrumental complexity and yields time-domain data that are susceptible to significant reproducibility issues. Moreover, the development of a rugged miniaturized chromatographic sensor represents a significant technological challenge. Spectroscopic methods are more desirable for analysis of fuel samples because of the relative simplicity of instrumentation, rapid analysis time, and high quality of the data from a chemometric perspective. Chemometric modeling provides the potential for rapid analysis, simultaneous prediction of multiple properties, and the ability to address large data sets automatically. A survey of current literature shows that a variety of fuel types, ranging from gasoline to jet and diesel, have been examined using both near-infrared⁶⁻⁹ and Fourier transform infrared¹⁰ instruments as well as FT-Raman¹¹⁻¹³ instruments. A number of fuel properties have been predicted via chemometric regression of spectroscopic data, including octane/cetane number;¹⁴ flash point; freeze point; density; viscosity; sulfur content;¹⁵ oxygenates (such as methyl tert-butyl ether (MTBE) and ethanol); aromatic, olefin, and saturate content; distillation fractions; and vapor pressure.
Of these, the correlation of octane number to near-infrared (NIR) spectra has been the most widespread, with numerous octane analyzers based on this method on the market today. The demonstrated feasibility of spectroscopic analysis to predict a number of desirable properties and continuing advances in chemometric analysis techniques for calibration of spectroscopic response data all suggest that this approach could be used to develop a sensing technology capable of predicting a range of useful fuel properties. However, such claims must be examined carefully, as it is generally quite easy to overstate the robustness and accuracy of a chemometric model, particularly when sample sets are limited. In this paper, we compare the predictive power of chemometric regression models based on chromatographic data with those generated from spectroscopic data (NIR and Raman) to evaluate the potential of these analytical techniques in the development of an advanced fuel quality sensor system.

(6) Swarin, S. J.; Drumm, C. A. Prediction of Gasoline Properties with Near-Infrared Spectroscopy and Chemometrics; SAE Paper No. 912390; Society of Automotive Engineers: 400 Commonwealth Drive, Warrendale, Pennsylvania 15096-0001.
(7) Fodor, G. E.; Kohl, K. B. Energy Fuels 1993, 7, 598-601.
(8) Westbrook, S. R. Army Use of Near-Infrared Spectroscopy to Estimate Selected Properties of Compression Ignition Fuels; SAE Paper No. 930734; Society of Automotive Engineers: Warrendale, PA, 1993.
(9) Macho, S.; Larrechi, M. S. Trends Anal. Chem. 2002, 21 (12), 799-806.
(10) Gomez-Carracedo, M. P.; Andrade, J. M.; Calvino, M.; Fernandez, E.; Prada, D.; Muniategui, S. Fuel 2003, 82 (10), 1211-18.
(11) Flecher, P. E.; Welch, W. T.; Albin, S.; Cooper, J. B. Spectrochim. Acta, Part A 1997, 53A (2), 199-206.
(12) Workman, J., Jr. J. Near Infrared Spectrosc. (review article) 1996, 4 (1-4), 69-74.
(13) de Bakker, C. J.; Fredericks, P. M. Appl. Spectrosc. 1995, 49 (12), 1766-71.
(14) Kelly, J. J.; Barlow, C. H.; Jinguji, T. M.; Callis, J. B. Anal. Chem. 1989, 61, 313-20.
(15) Breitkreitz, M. C.; Raimundo, I. M., Jr.; Rohwedder, J. J. R.; Pasquini, C.; Dantas Filho, H. A.; Jose, G. E.; Araujo, M. C. U. Analyst (Cambridge, U. K.) 2003, 128 (9), 1204-07.

Table 1. Summary of Fuel Property Data

ASTM    property name                       min       max       range     mean
D4052   density at 60 °F, g/mL              0.78      0.82      0.04      0.80
D93     flash point (P-M), °F               105       154       49        120
D3828   flash point (mini), °F              103       144       41        120
D5972   freeze point, °C                    -72.0     -44.0     28.0      -52.3
D5949   pour point, °C                      -80       -60       20        -69
D2622   total sulfur, ppm                   7         2453      2446      418
D1840   naphthalenes, vol %                 0.0       3.8       3.8       1.3
D1319   aromatics, vol %                    11.8      22.0      10.2      17.9
D6379   aromatics, vol %                    13.0      24.4      11.4      19.6
D1319   saturates, vol %                    75.8      87.0      11.2      80.5
D1159   olefins, vol %                      0.06      1.53      1.47      0.35
D1319   olefins, vol %                      0.7       2.3       1.6       1.6
D3701   hydrogen, weight %                  13.71     14.47     0.76      14.15
D4809   net heat content, btu/lb            18331     18589     258       18506
D445    viscosity at 20 °C, mm²/s           1.3       3.0       1.7       1.8
D445    viscosity at -20 °C, mm²/s          2.7       6.2       3.5       4.2
D445    viscosity at -40 °C, mm²/s          4.8       14.6      9.8       8.6
D1218   refractive index                    1.44      1.46      0.02      1.45
D2624   conductivity, pS/m                  0         395       395       93
D3242   acid number, (mg of KOH)/g          0.000     0.020     0.020     0.006
D3241   thermal stability breakpoint, °F    265       370       105       287
D5001   lubricity, mm WSD                   0.54      0.71      0.17      0.62
D86     initial boiling point, °F           294.1     362.8     68.7      318.2
D86     10% distillation, °F                329.1     388.6     59.5      347.7
D86     20% distillation, °F                335.2     400.0     64.8      358.8
D86     50% distillation, °F                346.4     439.0     92.6      391.3
D86     90% distillation, °F                372.4     488.3     115.9     454.7
D86     final boiling point, °F             386.8     521.7     134.9     481.0

Experimental Section

Fuel Sample Set. A set of 45 jet fuels sampled from around the world was used in this initial survey. The set consisted of Jet A (11 samples), Jet A-1 (22 samples), JP-8 (9 samples), JP-5 (2 samples), and a petroleum (Stoddard) solvent (1 sample). The samples were supplied with measured values for 28 fuel-specification properties, and all fuel samples met the appropriate specifications. The ranges of property values reported for this fuel set are given in Table 1.

NIR Spectroscopy. Near-infrared (NIR) spectra were obtained with a Cary model 5E spectrophotometer. Suprasil cells with path lengths of 10 and 1 mm were used. Initial spectra were obtained from 300 to 2300 nm, with a resolution of 1 nm. For chemometric analysis, the spectral region from 1000 to 2300 nm was used. Repeatability of the spectra was excellent, with the exception of some baseline variations observed when the 1 mm sample cell was used, presumably due to slight differences in positioning the cell in the beam. These baseline variations in the 1 mm cell data were corrected using the multiplicative scatter correction function in the PLS Toolbox for Matlab (Eigenvector Research, Inc.) prior to multivariate analysis. Preliminary comparisons between the 1 mm and 10 mm data indicated no significant advantage to the 1 mm cell, and so its use was discontinued. Data were collected with the Cary software provided with the instrument and exported in comma-separated value (CSV) format. The resultant numerical representations of the spectra were imported into Matlab and combined in one array.

Raman Spectroscopy. Raman spectra for a 30-fuel subset of the total sample set were provided by Real-Time Analyzers, Inc.
(Middletown, CT), and were acquired with a portable scanning FT-Raman spectrometer of their manufacture. Spectra were taken from 500 to 3500 cm⁻¹ in increments of 1 cm⁻¹.

GC-MS Analysis. To avoid exceeding the linear range of the mass selective detector during the analysis, samples for GC-MS analysis were prepared by first diluting 2 µL of each sample with 2 mL of dichloromethane. An autosampler injected 1.0 µL aliquots of each of five replicate samples in random order into an Agilent model 5890 capillary gas chromatograph coupled to an HP 5971 mass selective detector. A split/splitless injector at 250 °C with a split flow ratio of 60:1 was used along with a 50 m × 0.2 mm Agilent HP-1 (dimethylpolysiloxane) capillary column. The oven temperature profile was 50 °C for 1 min, followed by a ramp to 290 °C at 10 °C/min and a 7 min hold, giving a run time of 32 min. A solvent delay of 4 min was used, which reduced the data acquisition time to 28 min per run. Masses were scanned from m/z of 40 to 240. Prior to analysis, chromatograms were examined to ensure proper instrument function and to make sure that the sample did not saturate the mass selective detector. The GC-MS data acquired from these runs were converted from the native HP Chemstation data files to raw text format utilizing an MS Windows program written in-house. The chromatograms were then aligned to one another to minimize retention time variations from sample to sample and imported into MATLAB, Version 6.5 (Mathworks, Inc., Natick, MA), for chemometric analyses. Chromatographic alignment was accomplished via a stand-alone MS Windows program implementing the correlation-optimized warping algorithm described by Vest Nielsen et al.¹⁶

Chemometric Regression. Since this particular training set of fuels was obtained with complete specification properties, we did not limit our modeling to only those properties critical for fuel acceptance testing but instead applied the chemometric analysis to all the reported measurements. Some measured fuel properties are more highly correlated to the presence of specific chemical functionalities, while others are more related to overall physical properties of a broad range of chemical constituents.
Moreover, the chemical species that may be closely related to a particular fuel property may also be different in different grades of fuels, such as the prediction of flash point in jet fuel versus diesel fuel. A model based on spectroscopy, which is more closely correlated to compound structures, may not perform as well as one based on gas chromatographic measurements, which are more closely correlated with constituent polarity and volatility. Therefore, comparisons of the mathematical correlations of the spectroscopic and chromatographic measurement data were performed over a wide range of properties to provide insight into how much overlap these measurements might have with different fuel types, and how chromatography and spectroscopy might be combined through a data-fusion approach to predict certain fuel properties with greater precision, and over a greater range of fuel types, than is possible with any single measurement.

Partial least squares (PLS) and principal components regression (PCR) were performed utilizing the NIR spectra, Raman spectra, GC total ion current chromatograms, and unfolded GC-MS chromatograms against the 28 measured fuel properties. Both are inverse least squares regression models that use factor analysis to reduce the spectral or chromatographic data prior to regression.¹⁷ PCR projects the input data onto a lower dimensional subspace calculated to most efficiently represent the sample-to-sample variation contained within the calibration data. PLS projects the input data onto a lower dimensional subspace calculated to best represent the covariance between the calibration data and corresponding reference values. These two regression techniques were chosen for consideration as they are well-established, well-characterized, and widely implemented in various software packages.
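The difference between the PCR and PLS subspaces described above can be illustrated with a small sketch. This is not the PLS Toolbox implementation used in this work; it is a minimal pure-Python illustration, and the toy data and the function names (first_pca_loading, first_pls_weight) are hypothetical. It computes only the first direction of each method: the leading principal component loading (by power iteration) and the first PLS weight vector, which is proportional to the covariance between the centered data and the reference values.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def center_columns(X):
    """Subtract each column mean from the corresponding column."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in X]

def first_pca_loading(X, iterations=200):
    """Leading eigenvector of X^T X by power iteration (first PCR direction)."""
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    v = normalize([1.0] * p)
    for _ in range(iterations):
        scores = [sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)]
        v = normalize([sum(Xc[i][j] * scores[i] for i in range(n))
                       for j in range(p)])
    return v

def first_pls_weight(X, y):
    """First PLS weight vector, proportional to X^T y after centering."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [value - y_mean for value in y]
    return normalize([sum(Xc[i][j] * yc[i] for i in range(len(Xc)))
                      for j in range(len(Xc[0]))])

# Hypothetical toy data: column 0 varies strongly but is unrelated to y,
# while column 1 varies weakly but determines y exactly.
X = [[10.0, 1.0], [-10.0, 1.0], [10.0, -1.0], [-10.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]

pca_direction = first_pca_loading(X)
pls_direction = first_pls_weight(X, y)
print("first PCR direction:", [round(abs(v), 3) for v in pca_direction])
print("first PLS direction:", [round(abs(v), 3) for v in pls_direction])
```

In this contrived case the first principal component follows the high-variance column, whereas the first PLS weight follows the column that covaries with the reference values. With real spectra the two directions are usually much closer, which is consistent with PCR and PLS often giving similar errors.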
Additionally, multiway partial least squares regression (N-PLS) was used to regress GC-MS datasets against the provided fuel properties for each of the 45 fuels in the sample set.

(16) Vest Nielsen, N.-P.; Carstensen, J. M.; Smedsgaard, J. J. Chromatogr., A 1998, 805, 17-35.
(17) Kramer, R. Chemometric Techniques for Quantitative Analysis; Marcel Dekker: New York, 1998.

NIR and Raman spectra, once imported into Matlab, were assembled into matrices in which each row was a spectrum of a different fuel sample. The acquired dataset for GC-MS analysis consisted of a series of two-dimensional GC-MS chromatograms, one for each sample analyzed, stacked on each other to form a three-dimensional array, or cube of data. Total ion current (TIC) chromatograms were constructed by summing each GC-MS dataset along the m/z axis. "Unfolded" GC-MS chromatograms were created by reshaping the data matrix for each GC-MS chromatogram into a single row vector. Prior to unfolding, each GC-MS chromatogram was boxcar averaged with a window of five points.

PLS and PCR algorithms were implemented utilizing the PLS Toolbox for Matlab 3.0 (Eigenvector Research, Inc.). Calibration models were evaluated utilizing "leave one out" cross-validation, in which the property value of each sample is predicted utilizing a calibration model built from all of the other data. Regression models were built utilizing from 1 to 10 latent variables (or components), and the model with the lowest root-mean-square error of cross-validation (RMSECV) was chosen for inclusion in the results. A limit of 10 latent variables was imposed to guard against overfitting the data with excessively complex models; with 45 fuels, this corresponds to roughly five fuel samples per latent variable in the most complex models. For purposes of comparison between models of different properties, RMSECV values were normalized by the mean observed value of the fuel property they were predicting.

Six preprocessing strategies were examined for each dataset. For spectroscopic data, these strategies were as follows: (1) no preprocessing, (2) second derivative, (3) autoscaling, (4) mean-centering, (5) second derivative followed by autoscaling, and (6) second derivative followed by mean-centering. Second derivative transformation was implemented through the Savitzky-Golay filter algorithm in the PLS Toolbox.
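The cross-validation scheme described above (leave-one-out prediction, 1 to 10 latent variables, lowest RMSECV retained, RMSECV normalized by the mean property value) can be sketched as follows. This is a minimal pure-Python illustration, not the PLS Toolbox code used in this work. The single-response PLS routine, the function names, and the noise-free toy data are all assumptions made for the example; in particular, dividing by the absolute value of the mean is an assumption adopted here so that properties with negative means give a positive normalized error.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by successive deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [value - y_mean for value in y]                        # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        w_norm = math.sqrt(sum(v * v for v in w))
        if w_norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [v / w_norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict one sample by repeating the training deflation steps."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat

def loo_rmsecv(X, y, n_components):
    """Root-mean-square error of leave-one-out cross-validation."""
    squared_error = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        squared_error += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(X))

def best_model(X, y, max_components=10):
    """Return (latent variables, RMSECV, RMSECV / |mean y|) of the best model."""
    errors = {k: loo_rmsecv(X, y, k) for k in range(1, max_components + 1)}
    best_k = min(errors, key=errors.get)
    y_mean = sum(y) / len(y)
    return best_k, errors[best_k], errors[best_k] / abs(y_mean)

# Hypothetical noise-free toy data: y = 1 + 2*x0 - x1 + 0.5*x2.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [3.0, 0.0, 1.0], [4.0, 2.0, 3.0],
     [5.0, 1.0, 0.0], [6.0, 3.0, 2.0], [7.0, 1.0, 4.0], [8.0, 2.0, 1.0]]
y = [1.0 + 2.0 * a - b + 0.5 * c for a, b, c in X]

best_k, rmsecv, normalized_rmsecv = best_model(X, y)
print(best_k, rmsecv, normalized_rmsecv)
```

Because the toy response is an exact linear function of three variables, the cross-validated error collapses to numerical noise once enough latent variables are included. Measured fuel data are noisy, so the minimum of the RMSECV curve is far less sharp, which is one reason a cap on model complexity is useful.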
For chromatographic data, the preprocessing strategies tested were as follows: (1) no preprocessing, (2) normalization, (3) autoscaling, (4) mean-centering, (5) normalization followed by autoscaling, and (6) normalization followed by mean-centering. Normalization was implemented by dividing each individual chromatogram by its Euclidean norm, in order to minimize the effect of any injection volume variation from run to run. Thus, for each preprocessing scheme and each regression technique, an optimized regression model using up to 10 latent variables was calculated and the RMSECV of that model was recorded.

Additionally, N-PLS was used to build regression models to predict fuel properties with entire GC-MS chromatograms. N-PLS¹⁸⁻²⁰ is a multiway generalization of the commonly used partial least squares regression algorithm designed to be used with second- or higher-order data sets. In accordance with the philosophy behind standard, first-order PLS regression, the algorithm seeks to decompose the data into a factor model that best describes the covariance between the independent (predictive) and dependent (predicted) variables. This model is then applied to subsequently measured data to make quantitative predictions. The chief difference between PLS and N-PLS is that, in N-PLS, this factor model is trilinear in form. N-PLS has previously been shown²¹,²² to be effective in the quantification of multicomponent compositional properties of both industrial naphtha and fuel samples by GC × GC. For this analysis, GC-MS data were boxcar averaged along the retention-time axis with a window of 10 points, and the mass spectral axis of the data was truncated to the first 100 masses acquired (m/z of 40-139) in order to speed calculations.
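The chromatographic data reduction steps mentioned in this section (boxcar averaging, truncation of the mass axis, construction of TIC chromatograms, unfolding, and normalization by the Euclidean norm) can be sketched as below. This is a hedged pure-Python illustration and not the code used in this work. All function names and the tiny chromatogram are hypothetical, and "boxcar averaged" is interpreted here as averaging non-overlapping blocks of consecutive scans, which is an assumption; a moving average would be an equally plausible reading.

```python
import math

def boxcar_average(scans, window):
    """Average non-overlapping blocks of `window` consecutive scans
    (assumed interpretation); a trailing partial block is dropped."""
    n_blocks = len(scans) // window
    n_channels = len(scans[0])
    return [[sum(scans[b * window + k][j] for k in range(window)) / window
             for j in range(n_channels)]
            for b in range(n_blocks)]

def truncate_masses(scans, n_masses):
    """Keep only the first `n_masses` m/z channels of every scan."""
    return [scan[:n_masses] for scan in scans]

def total_ion_current(scans):
    """Sum each scan along the m/z axis to obtain the TIC chromatogram."""
    return [sum(scan) for scan in scans]

def unfold(scans):
    """Reshape a (scan x m/z) matrix into a single row vector."""
    return [value for scan in scans for value in scan]

def euclidean_normalize(vector):
    """Divide a chromatogram by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

# Tiny hypothetical chromatogram: 4 scans (rows) by 3 m/z channels (columns).
chromatogram = [[1.0, 2.0, 0.0],
                [3.0, 0.0, 0.0],
                [0.0, 4.0, 2.0],
                [2.0, 0.0, 2.0]]

smoothed = boxcar_average(chromatogram, window=2)
tic = total_ion_current(chromatogram)
unfolded = unfold(truncate_masses(smoothed, 2))
normalized_tic = euclidean_normalize(tic)
print(smoothed, tic, unfolded)
```

For data of the size described here, the same functions would be called with window=5 before unfolding, or with window=10 and n_masses=100 for the N-PLS input.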

Results and Discussion

The fuel sample densities, determined in accordance with ASTM D4052,²³ were predicted from PLS regression of the Raman spectra and plotted against the measured values in Figure 1. Reasonably good agreement was obtained when the data were mean-centered and regressed with 10 latent variables. The predicted vs measured aromatic contents of the fuel samples from PLS regression of NIR spectra are shown in Figures 2 and 3, for measurements obtained by the high-performance liquid chromatography (HPLC) method (ASTM D6379)²⁴ and the fluorescent indicator adsorption (FIA) method (ASTM D1319),²⁵ respectively. The PLS regressions were performed on the mean-centered data, using seven latent variables. As shown, good agreement was obtained between the predicted and measured values for aromatic content by both the HPLC and the FIA methods. However, when the HPLC measurements are plotted against the corresponding FIA measurements in Figure 4, it is evident that the HPLC values were systematically higher. This illustrates the ability of PLS correlation modeling to derive reasonably accurate predictions from the same set of spectral data for two different measurements of a single property, even if the results of those two techniques are not in complete agreement with each other. Clearly, both the HPLC and the FIA methods are self-consistent, but it is important to specify which ASTM method is being used, since the models derived in this manner are only representative of the data used in the training set.

Figure 1. Density of the jet-fuel sample set predicted by PLS regression of Raman spectra vs density measured by ASTM D4052. The PLS regression model was constructed using 10 latent variables.

Figure 2. Jet-fuel sample aromatic content predicted by PLS regression of NIR spectra versus aromatic content measured by ASTM D6379 (HPLC method). The PLS regression model was constructed using seven latent variables.

Figure 3. Jet-fuel sample aromatic content predicted by PLS regression of NIR spectra versus aromatic content measured by ASTM D1319 (FIA method). The PLS regression model was constructed using seven latent variables.

Figure 4. Aromatic content in the jet-fuel sample set, measured by ASTM D6379 vs ASTM D1319.

(18) Bro, R. J. Chemometrics 1996, 10 (1), 47-61.
(19) de Jong, S. J. Chemometrics 1998, 12, 77-81.
(20) Bro, R.; Smilde, A. K.; de Jong, S. Chemom. Intell. Lab. Syst. 2001, 58 (1), 3-13.
(21) Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. J. Sep. Sci. 2004, 27, 410-16.
(22) Prazen, B. J.; Johnson, K. J.; Synovec, R. E. Anal. Chem. 2001, 73 (23), 5677-82.
(23) ASTM. Standard Test Method for Density and Relative Density of Liquids by Digital Density Meter; ASTM: Philadelphia, PA, 2005; Vol. 05.02, ASTM D4052-96.
(24) ASTM. Standard Test Method for Determination of Aromatic Hydrocarbon Types in Aviation Fuels and Petroleum Distillates-High Performance Liquid Chromatography Method with Refractive Index Detection. In Annual Book of ASTM Standards; ASTM: Philadelphia, PA, 2005; Vol. 05.03, ASTM D6379-04.
(25) ASTM. Standard Test Method for Hydrocarbon Types in Liquid Petroleum Products by Fluorescent Indicator Adsorption. In Annual Book of ASTM Standards; ASTM: Philadelphia, PA, 2005; Vol. 05.01, ASTM D1319-03.

Table 2. Comparison of Regression Model Root-Mean-Square Error of Cross-Validation (RMSECV) Calculated from Mean Observed Value for Each Property over All Samples, with the Published ASTM Method Reproducibility and Repeatabilityᵃ

                            ASTM method errors      PLS minimum RMSECV
ASTM    property            reprod.    repeat.      NIR       Raman     GC
D4052   density             0.0005     0.0001       0.0026    0.0018    0.0031
D93     flash pt. (PM)      8.5        3.5          8.8       8.9       7.1
D3828   flash pt. (mini)    6.2        1.6          7.8       7.5       7.4
D5972   freeze pt.          1.3        0.7          3.6       4.2       3.0
D5949   pour pt.            6.8        3.4          2.7       2.2       2.1
D2622   sulfur              30         21           283       233       419
D1840   naphthalenes        0.069      0.051        0.47      0.41      0.55
D1319   arom., FIA          2.70       1.30         0.66      0.92      1.32
D6379   arom., HPLC         1.897      0.938        0.64      1.09      1.48
D1319   saturates           4.40       1.40         0.75      1.20      1.37
D1159   olefins             0.40       0.20         0.21      0.34      0.22
D1319   olefins, FIA        2.10       0.60         0.34      0.48      0.31
D3701   hydrogen            0.11       0.09         0.11      0.20      0.12
D4809   heat content        77.4       22.9         27.0      48.0      40.4
D445    visc. at 20 °C      -          -            0.25      0.30      0.24
D445    visc. at -20 °C     -          -            0.47      0.47      0.39
D445    visc. at -40 °C     -          -            1.16      1.00      0.86
D1218   ref. index          0.0005     0.0002       0.0012    0.0019    0.0013
D2624   conductivity        17         5            77        92        87
D3242   TAN                 0.0030     0.0010       0.0034    0.0030    0.0034
D3241   thermal stab.       -          -            46        58        47
D5001   lubricity           0.070      0.046        0.037     0.038     0.033
D86     IBP                 15.3       6.3          11.0      11.4      9.6
D86     10%                 10.8       5.1          9.2       7.8       5.7
D86     20%                 13.9       5.3          8.5       6.9       5.1
D86     50%                 24.2       9.0          8.9       15.2      10.5
D86     90%                 11.6       5.4          14.0      13.3      10.9

ᵃ Uncertainty values were not available for ASTM D445 and D3241.
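One way to read Table 2 is to divide the best RMSECV for a property by the ASTM reproducibility, and separately by the mean property value from Table 1. The sketch below does this for a handful of rows. The numbers are copied from Tables 1 and 2; the row selection, the variable names, and the use of the minimum across the three techniques are choices made for this illustration rather than part of the original analysis.

```python
# (property, mean from Table 1, ASTM reproducibility, NIR, Raman, GC RMSECV)
rows = [
    ("density", 0.80, 0.0005, 0.0026, 0.0018, 0.0031),
    ("flash point D93", 120.0, 8.5, 8.8, 8.9, 7.1),
    ("total sulfur", 418.0, 30.0, 283.0, 233.0, 419.0),
    ("hydrogen", 14.15, 0.11, 0.11, 0.20, 0.12),
    ("conductivity", 93.0, 17.0, 77.0, 92.0, 87.0),
]

def summarize(row):
    """Return (best RMSECV / reproducibility, best RMSECV / |mean|)."""
    name, mean, reproducibility, nir, raman, gc = row
    best = min(nir, raman, gc)
    return best / reproducibility, best / abs(mean)

ratio_to_reproducibility = {}
normalized_rmsecv = {}
for row in rows:
    ratio, normalized = summarize(row)
    ratio_to_reproducibility[row[0]] = ratio
    normalized_rmsecv[row[0]] = normalized
    print(f"{row[0]:16s} RMSECV/reprod = {ratio:6.2f}   RMSECV/mean = {normalized:.5f}")
```

A ratio near 1 (hydrogen, flash point) indicates a model error comparable to the reference method uncertainty, while ratios well above 1 (sulfur, conductivity) flag properties that the measured data do not capture well. The normalized values computed this way will only approximate those reported elsewhere in the paper, because the inputs used here are rounded table entries.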

The error of prediction of a chemometric regression model is a function of the uncertainty in the original ASTM reference values as well as of the error associated with the analytical technique used to acquire the spectroscopic or chromatographic data utilized for model building. One would expect that the error in prediction of a good regression model would be comparable in magnitude to the uncertainty of the reference values used to construct it. Accordingly, the chemometric regression models were first evaluated to see how their rates of error compared to the reproducibility and repeatability values of the respective ASTM methods that were used to acquire the reference measurements. Reproducibility and repeatability values were calculated using the formula published in the method specification and the mean property value observed across the entire set of fuel samples. These values are tabulated in Table 2, along with the root-mean-square errors of cross-validation (RMSECV) for the chemometric regression models constructed. An examination of Table 2 shows that the PLS model RMSECV values do, in fact, tend to be roughly similar in magnitude to the ASTM method values, with a few notable exceptions. The corresponding PCR model RMSECV values were similar. Predictions made for sulfur content and conductivity exhibited much greater error than would be expected from the ASTM method uncertainty alone. This is unsurprising, as none of the analytical techniques examined detected chemical species that directly affect these fuel properties.

Next, we examined which properties are most amenable to prediction by chemometric regression of compositional data. Summarized in Tables 3 and 4 are the lowest observed RMSECV values for each property, across all analytical techniques and preprocessing schemes. Table 3 contains RMSECV values for PLS model predictions, and Table 4 contains RMSECV values for PCR model predictions, normalized to mean property values for the sake of comparison across different properties.

Table 3. Summary of PLS Calibration Resultsᵃ

property                  RMSECV      property                  RMSECV
refractive index          0.00085     freezing point            0.0570
sp. heat cap. at 0 °C     0.00146     flash point, D93          0.0597
density at 60 °F          0.00219     flash point, D3828        0.0615
hydrogen                  0.00765     viscosity at -20 °C       0.0920
saturates                 0.00929     viscosity at -40 °C       0.0995
dist 10%                  0.0164      viscosity at 20 °C        0.136
dist 50%                  0.0226      thermal stability         0.165
dist FBP                  0.0291      olefins, D1319            0.194
pour point                0.0297      naphthalenes              0.319
dist IBP                  0.0302      total sulfur              0.537
aromatics, D6379          0.0329      acid number               0.545
aromatics, D1319          0.0366      olefins, D1159/D27        0.605
lubricity scar            0.0540      conductivity              0.840

ᵃ The lowest observed RMSECV for each property (normalized to mean property value) is displayed, and these results are sorted from lowest to highest.

Table 4. Summary of PCR Calibration Resultsᵃ

property                  RMSECV      property                  RMSECV
refractive index          0.000941    flash point, D93          0.0575
sp. heat cap. at 0 °C     0.00172     flash point, D3828        0.0601
density at 60 °F          0.00327     lubricity scar            0.0624
hydrogen                  0.00735     viscosity at -20 °C       0.0916
saturates                 0.0103      viscosity at -40 °C       0.0996
dist 10%                  0.0174      viscosity at 20 °C        0.134
dist 50%                  0.0244      thermal stability         0.163
dist FBP                  0.0289      olefins, D1319            0.192
dist IBP                  0.0304      naphthalenes              0.329
pour point                0.0321      acid number               0.545
aromatics, D6379          0.0357      olefins, D1159/D27        0.576
aromatics, D1319          0.0429      total sulfur              0.653
freezing point            0.0559      conductivity              0.909

ᵃ The lowest observed RMSECV for each property (normalized to mean property value) is displayed, and these results are sorted from lowest to highest.

The worst-performing models are those for olefins (ASTM D1319), naphthalenes, acid number, total sulfur, olefins (ASTM D1159/D27), and conductivity, all giving RMSECV values that were >20% of the mean property values being predicted, given in Table 1. A number of models, however, gave RMSECV values that were