Energy & Fuels 2008, 22, 2079–2083
Multivariate Calibration by Variable Selection for Blends of Raw Soybean Oil/Biodiesel from Different Sources Using Fourier Transform Infrared Spectroscopy (FTIR) Spectra Data

Itânia P. Soares,† Thais F. Rezende,† Renzo C. Silva,‡ Eustáquio Vinícius R. Castro,‡ and Isabel C. P. Fortes*,†

Laboratório de Ensaio de Combustíveis, Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Campus Pampulha, CEP 31270-901, Belo Horizonte, Minas Gerais, Brazil, and Departamento de Química, Instituto de Ciências Exatas, Universidade Federal do Espírito Santo, Av. Fernando Ferrari, CEP 29060-900, Vitória, Espírito Santo, Brazil

Received September 5, 2007. Revised Manuscript Received January 3, 2008
The partial least-squares (PLS) calibration method was used as a chemometric tool to develop calibration models from Fourier transform infrared spectroscopy (FTIR) spectra of biodiesel samples from different sources (cotton, castor, and palm) that were mixed with raw soybean oil to simulate adulteration. PLS calibration was applied with and without variable selection to quantify the amount of raw soybean oil in these samples. The classic forward and stepwise variable selection methods were applied to all sources together and to each source separately. Variable selection improves not only the stability of the model against collinearity in the multivariate spectra but also the interpretability of the relationship between the model and the sample composition, which makes it easier to detect and quantify the raw soybean oil mixed into each biodiesel source.
1. Introduction

Biodiesel is usually produced by the transesterification of a vegetable oil or animal fat with a short-chain alcohol in the presence of a catalyst.1,2 Biodiesel is an attractive alternative fuel because it contributes to a reduction in the emission of CO2, a major contributor to the greenhouse effect. Emissions of SOx and particulate matter from combustion are also lower than those of conventional fossil diesel.3 Brazil has the opportunity, and the challenge, of becoming a worldwide reference in renewable fuel production as a replacement for fossil fuels because of its large territory, geographic location, and abundant sunlight. In Europe and several other countries, biodiesel is mainly produced from rapeseed oil. In Brazil, many oleaginous plants could be used for biodiesel production, and the climatic diversity favors some oil crops in certain regions more than in others. For instance, palm is more common in northern Brazil, while castor is easier to find in the northeast, and soybean grows better in the south and southeast. Thus, Brazil has great potential as a world producer and exporter of these raw materials and of the respective biofuels.

A common use of biodiesel is in blends with conventional mineral diesel fuel. In the European Union (EU), biodiesel is used at a volume fraction of 5% in petrodiesel blends, while in the U.S., energy legislation has mandated the use of at least a 2% volume fraction.4 In Brazil, the use of biodiesel was authorized in 2005 at a volume fraction of 2%, and from January 2008, diesel in Brazil must contain 2% (v/v) biodiesel. One of the challenges of this program is to eliminate fuel adulteration. Fuel adulteration, a criminal practice, has been observed in Brazil since the end of the monopoly on fuel distribution and the introduction of market reforms.5 In the last 5 years, the ANP (Brazilian National Agency for Petroleum, Natural Gas, and Biofuels) has made efforts to prevent fuel adulteration. In 2000, ANP data showed that 12.5% of the gasoline samples and 7.3% of the diesel samples collected in different localities of the country did not conform to the ANP rules for fuel quality.6

Nowadays, international standard methods, such as EN 14103 and ASTM D6584, among others, are used for the quality control of biodiesel. Both methods are based on gas chromatography, and their results can indicate whether or not a sample is adulterated with raw vegetable oil. However, these methodologies have disadvantages: time-consuming sample preparation, the use of more than one internal standard, long analysis times, and an expensive technique. Using raw vegetable oils directly in engines can cause carbon deposition, injector blocking, and incomplete combustion because of their high viscosities, low volatilities, and polyunsaturated character, as well as gum formation caused by oxidation and polymerization.3 Considering the possibility of biodiesel adulteration with vegetable oil, it is important to develop a quick, simple, and economical methodology able to certify that a biodiesel sample is free of raw vegetable oil.

One of the most widely used analytical techniques for monitoring the quality of biodiesel and petrodiesel blends is infrared (IR) spectroscopy. This technique has many advantages: it is nondestructive and very reliable, and it allows direct and fast determination of several properties without sample pretreatment.4 IR spectroscopy comprises many different types of equipment, which operate in different spectral regions and use different kinds of detectors and accessories. Fourier transform infrared spectroscopy (FTIR) has become one of the major analytical techniques because of its suitability for quality screening, its speed, and its low cost of analysis; it can be thought of as a molecular "fingerprinting" method. Mid-infrared (MIR) spectroscopy, in particular, rapidly provides information on a very large number of analytes, and the absorption bands are sensitive to the physical and chemical states of individual constituents. FTIR can be coupled with accessories such as attenuated total reflection (ATR), allowing the analysis of a wide range of solid or liquid samples. The technique has proven useful for a range of identification and quantification problems in several sectors, such as food chemistry,7 biology,8 environment,9 and fuels.10 Recently, Pereira et al.10 applied multivariate analysis to near-infrared (NIR) and MIR spectra to determine gasoline adulteration.

* To whom correspondence should be addressed. Telephone/Fax: 05531-3499-6650. E-mail: [email protected].
† Universidade Federal de Minas Gerais.
‡ Universidade Federal do Espírito Santo.

(1) Drown, D. C.; Harper, K.; Frame, E. J. Am. Oil Chem. Soc. 2001, 78, 574–584.
(2) Chang, D. Y. Z.; Gerpen, J. H. V. J. Am. Oil Chem. Soc. 1996, 73, 1549–1555.
(3) Leung, D. Y. C.; Koo, B. C. P.; Guo, Y. Bioresour. Technol. 2006, 97, 250–256.
(4) Pimentel, F. P.; Teixeira, L. S. G.; Ribeiro, G. M. S.; Cruz, R. S.; Stragevitch, L.; Filho, J. G. A. P. Microchem. J. 2006, 82, 201–206.
(5) Oliveira, F. C. C.; Brandão, C. R. R.; Ramalho, H. F.; Costa, L. A. F.; Suarez, P. A. Z.; Rubim, J. C. A. Anal. Chim. Acta 2007, 587, 194–199.
(6) Knothe, G. J. Am. Oil Chem. Soc. 2001, 78, 1025–1028.

10.1021/ef700531n CCC: $40.75 © 2008 American Chemical Society. Published on Web 02/29/2008
Multivariate analysis is important here because the IR spectra of vegetable oils and of their respective esters are very similar, producing overlapping signals.11 Che Man and Setiowaty12 used MIR spectroscopy and partial least-squares (PLS) calibration to determine fatty acids in palm olein. Knothe6 used IR spectroscopy and PLS to monitor the completion of the transesterification reaction in biodiesel production. Oliveira et al.13 used MIR and NIR spectroscopy to build calibration models for determining the methyl ester content of biodiesel blends (methyl ester plus diesel). Pimentel et al.4 developed multivariate calibration models based on MIR and NIR spectroscopy to determine the biodiesel content of diesel fuel blends, considering the presence of raw vegetable oil; an F test applied to their PLS calibration models showed the same efficiency for both spectral regions.

PLS regression is a popular multivariate calibration method that has been widely applied to multicomponent spectral analysis, especially in vibrational spectroscopy such as IR, NIR, and Raman.14 In PLS regression, the objective is to model the relationship between a set of predictor variables x (here, absorbances) and a set of response variables y (here, concentrations).4 In the present work, PLS was employed to determine the concentration of soybean oil in biodiesel using leave-one-out cross-validation, in which each sample in turn is left out of the calibration and predicted by the model built from the remaining samples, to calculate the predictive residual error sum of squares (PRESS).15 PRESS and the root-mean-square error of calibration (RMSEC) are commonly used criteria for selecting the number of latent variables (LV); the number of LVs is chosen as the point beyond which adding further LVs decreases RMSEC only insignificantly.16 The root-mean-square error of cross-validation (RMSECV) was calculated for each calibration set of 40 samples from the three biodiesel sources (120 samples in total), and also for the full set of 120 samples. To evaluate the predictive ability of the models by external validation, a set of 15 samples from each source (45 samples in total) was used to calculate the root-mean-square error of prediction (RMSEP).

Wavenumber or wavelength selection consists of choosing the subset of spectral channels whose calibration model gives the minimum prediction error.15 Its benefit is not only the stability of the model against collinearity in the multivariate spectra but also the interpretability of the relationship between the model and the sample composition.17 Elaborate wavelength selection methods, such as genetic algorithms,18 moving-window partial least-squares regression,17 and simulated annealing,19 have been developed. However, these methods tend to be slow and cumbersome compared to simpler and more intuitive methods such as forward and stepwise selection.20

(7) Sedman, J.; Voort, F. R.; Ismail, A. A. J. Am. Oil Chem. Soc. 2000, 77, 399–403.
(8) Nadtochenko, V. A.; Rincon, A. G.; Stanca, S. E.; Kiwi, J. J. Photochem. Photobiol., A 2005, 169, 131–137.
(9) Acha, V.; Meurens, M.; Naveau, H.; Agathos, S. N. Biotechnol. Bioeng. 2000, 68, 473–487.
(10) Pereira, R. C. C.; Skrobot, V. L.; Castro, E. V. R.; Fortes, I. C. P.; Pasa, V. M. D. Energy Fuels 2006, 20, 1097–1102.
(11) Zagonel, G. F.; Peralta-Zamora, P.; Ramos, L. P. Talanta 2004, 63, 1021–1025.
(12) Che Man, Y. B.; Setiowaty, G. Food Chem. 1999, 66, 109–114.
(13) Oliveira, F. C. C.; Souza, A. T. P. C.; Dias, J. A.; Dias, S. C. L.; Rubim, J. C. Quim. Nova 2004, 27, 218–225.
(14) Hasegawa, T. In Handbook of Vibrational Spectroscopy; Chalmers, J., Griffiths, P. R., Eds.; Wiley: Chichester, U.K., 2001; p 2293.

The forward selection method adds variables to the model one at a time. The first variable included in the model is the one with the highest correlation with the dependent variable y.
The variable that enters the model second is the one with the highest correlation with y after y has been adjusted for the first variable. The process stops when the last variable to enter has an insignificant regression coefficient or when all variables are included in the model.21 In a stepwise procedure, a variable that entered the model at an earlier stage may be excluded at a later stage; that is, the stepwise method is essentially forward selection in which, at each stage, the possibility of excluding a variable, as in backward elimination, is also considered. The number of variables retained in the model is governed by the significance levels assumed for the inclusion and exclusion of variables.21 Several stepwise selection schemes have been proposed to select wavelengths from small data sets faster and more methodically than search procedures; their advantages are speed and simplicity.20,22

(15) Du, Y. P.; Liang, Y. Z.; Jiang, J. H.; Berry, R. J.; Ozaki, Y. Anal. Chim. Acta 2004, 501, 183–191.
(16) Jouan-Rimbaud, D.; Walczak, B.; Massart, D. L.; Last, I. R.; Prebble, K. A. Anal. Chim. Acta 1995, 304, 285–295.
(17) Jiang, J.; Berry, R. J.; Siesler, H. W.; Ozaki, Y. Anal. Chem. 2002, 74, 3555–3565.
(18) Bangalore, A. S.; Schaffer, R. E.; Small, G. W.; Arnold, M. A. Anal. Chem. 1996, 68, 4200–4212.
(19) Horchner, U.; Kalivas, J. H. Anal. Chim. Acta 1995, 311, 1–13.
(20) Spiegelman, C. H.; McShane, M. J.; Goetz, M. J.; Motamedi, M.; Yue, Q. L.; Coté, G. L. Anal. Chem. 1998, 70, 35–44.
(21) Xu, L.; Zhang, W. Anal. Chim. Acta 2001, 446, 477–483.
(22) Centner, V.; Massart, D. L.; de Noord, O. E.; de Jong, S.; Vandeginste, B. M.; Sterna, C. Anal. Chem. 1996, 68, 3851–3858.

In this work, biodiesel from three different sources was mixed with raw soybean oil at various concentrations and analyzed by attenuated total reflection (ATR)-FTIR. To detect and quantify the adulteration of biodiesel with raw soybean oil, multivariate PLS calibration models based on MIR spectroscopy were developed using the classic forward and stepwise variable selection methods.

Figure 1. IR spectra of (a) cotton biodiesel, (b) castor biodiesel, (c) palm biodiesel, and (d) raw soybean oil.

2. Experimental Section

2.1. Samples. Biodiesel samples from different parts of Brazil were donated by companies and/or universities that already produced biodiesel for the market or at bench scale. Castor oil ester was provided by Universidade Estadual de Santa Cruz (Bahia State); palm oil ester was provided by Agropalma (Pará State); and cotton oil ester was obtained from Soyminas (Minas Gerais State). The calibration samples were prepared by mixing biodiesel from each source with raw soybean oil at percentages from 1 to 40% (v/v) in 1% (v/v) increments, giving 40 samples per source and 120 samples in total. The external validation set comprised 15 further samples per source, prepared in the same way as the calibration set but with randomly chosen percentages of raw soybean oil, giving 45 samples in total.

2.2. ATR-FTIR Analysis. ATR-FTIR spectra were obtained using an ABB Bomem MB 102 FTIR spectrometer equipped with an ATR sampling accessory and a deuterated triglycine sulfate (DTGS) detector. All spectra were collected at 16 ± 1 °C as an average of 16 scans, with a spectral resolution of 2 cm⁻¹. Background spectra were obtained using the clean ATR accessory. After each spectrum was recorded, the cell was cleaned by successive rinses with heptane. The average spectrum from triplicate analyses over 4000–665 cm⁻¹ was treated chemometrically using MINITAB software, version 14. To develop a good PLS calibration model from the FTIR data, it was necessary to eliminate spectral regions that carry little information, that is, regions where changing the concentration of raw soybean oil in the mixture did not cause substantial changes in absorbance.
Removing these regions also eliminates the noise associated with the corresponding spectral channels. The spectral region chosen was 2760–1800 cm⁻¹. The forward and stepwise methods were then used to build PLS calibration models from the FTIR data.
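The calibration step itself, PLS with leave-one-out cross-validation as described in the Introduction, can be sketched in plain Python. This is a minimal, hypothetical illustration and not the MINITAB implementation used in this work: it assumes a single response (PLS1, NIPALS algorithm), mean-centered data, and spectra stored as lists of absorbance rows. The names `pls1_fit`, `pls1_predict`, and `rmsecv_loo` are illustrative only.

```python
import math


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model by NIPALS. X: list of spectra (rows), y: concentrations."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / n for k in range(m)]
    y_mean = sum(y) / n
    # Mean-center the spectra (E) and the response (f).
    E = [[row[k] - x_mean[k] for k in range(m)] for row in X]
    f = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: covariance of each variable with the current y residual.
        w = [sum(E[i][k] * f[i] for i in range(n)) for k in range(m)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [a / norm for a in w]
        t = [sum(E[i][k] * w[k] for k in range(m)) for i in range(n)]  # scores
        tt = sum(a * a for a in t)
        p = [sum(E[i][k] * t[i] for i in range(n)) / tt for k in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][k] - t[i] * p[k] for k in range(m)] for i in range(n)]
        f = [f[i] - t[i] * q for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict one spectrum by repeating the centering and deflation steps."""
    e = [a - b for a, b in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(e, w))
        e = [a - t * b for a, b in zip(e, p)]
        y_hat += t * q
    return y_hat


def rmsecv_loo(X, y, n_lv):
    """Leave-one-out cross-validation: return (RMSECV, PRESS)."""
    press = 0.0
    for i in range(len(y)):
        # Calibrate without sample i, then predict sample i.
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return math.sqrt(press / len(y)), press


if __name__ == "__main__":
    # Toy data (not from this study): two "absorbances" per sample, y = 2*x1 - x2.
    X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 1]]
    y = [2 * a - b for a, b in X]
    for n_lv in (1, 2):
        rmsecv, press = rmsecv_loo(X, y, n_lv)
        print(n_lv, round(rmsecv, 4), round(press, 4))
```

In the setting of this work, each row of X would hold the absorbances retained from the selected spectral window and y the raw soybean oil content (% v/v). Running the loop over increasing LV counts and stopping where the error no longer drops appreciably mirrors the LV criterion described in the Introduction.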
3. Results

3.1. ATR-FTIR Analysis. The MIR spectra of castor, palm, and cotton biodiesel and of soybean oil are shown in Figure 1. All of the biodiesel samples absorbed strongly in the regions 3700–3000, 1900–1500, and 1800–800 cm⁻¹. Peaks around 1200 cm⁻¹ may be assigned to the antisymmetric axial stretching vibrations of the C-C(=O)-O bonds of the ester, while peaks around 1183 cm⁻¹ may be assigned to the asymmetric axial stretching vibrations of the O-C-C bonds.23 The 1300–900 cm⁻¹ region is known as the "fingerprint" region; its complex spectra include many coupled vibrational bands. These overlapping peaks indicate that univariate calibration models may give significant prediction errors when quantifying biodiesel samples of different concentrations in the presence of raw oil. Such models are also inadequate for identifying raw oil in a spoiled blend, whether it arises from incomplete conversion during the transesterification reaction or from the illegal addition of raw oil. Zagonel et al.11 also observed overlapping peaks of soybean oil and its corresponding ester in MIR spectra; those authors used the 1800–1700 cm⁻¹ region, corresponding to the axial stretching vibrations of carbonyl groups, to distinguish soybean oil from its ester. Another important feature, seen in Figure 1b, is that the spectrum of castor biodiesel differs most from the others in the region around 3330 cm⁻¹. This band can be assigned to the axial stretching vibration of the O-H bond of the hydroxyl group23 of ricinoleic acid.24

3.2. Chemometric Analysis. Experiments were carried out to develop PLS calibration models from the FTIR data with the aim of detecting and quantifying raw soybean oil in the different samples. The forward and stepwise selection methods were tested for each mixture of the three biodiesel sources with soybean oil. Both variable selection methods led to quite satisfactory results, except for the palm biodiesel mixture, for which the stepwise method failed. The forward method was therefore the better choice for obtaining an appropriate calibration model for all of the mixtures studied.
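The forward and stepwise procedures described in the Introduction can be sketched as follows. This is a hypothetical illustration, not MINITAB's algorithm: it ranks candidate variables by a partial F statistic derived from the partial correlation with y after adjusting for the variables already selected, and it assumes fixed F-to-enter and F-to-remove thresholds (4.0 and 3.9) in place of the significance levels used by the actual software. It also shows one way stepwise selection can end with no variable at all, as was observed for the palm mixture: if no candidate passes the entry threshold at the first step, the selected set stays empty.

```python
import math


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _residualize(v, basis):
    """Remove from v its projection onto an orthonormal basis (Gram-Schmidt step)."""
    r = list(v)
    for b in basis:
        c = sum(x * z for x, z in zip(r, b))
        r = [x - c * z for x, z in zip(r, b)]
    return r


def _basis(cols):
    """Orthonormal basis spanning the given (centered) columns."""
    basis = []
    for c in cols:
        r = _residualize(c, basis)
        norm = math.sqrt(sum(x * x for x in r))
        if norm > 1e-12:
            basis.append([x / norm for x in r])
    return basis


def partial_f(cols, y, j, others):
    """Partial F for adding column j to a model that already contains `others`."""
    dof = len(y) - len(others) - 2  # residual degrees of freedom of the larger model
    if dof <= 0:
        return 0.0
    basis = _basis([_center(cols[k]) for k in others])
    rx = _residualize(_center(cols[j]), basis)
    ry = _residualize(_center(y), basis)
    sxx = sum(a * a for a in rx)
    syy = sum(a * a for a in ry)
    if sxx < 1e-12 or syy < 1e-12:
        return 0.0
    r2 = sum(a * b for a, b in zip(rx, ry)) ** 2 / (sxx * syy)  # squared partial correlation
    if r2 >= 1.0:
        return math.inf
    return r2 / (1.0 - r2) * dof


def stepwise_select(cols, y, f_in=4.0, f_out=3.9, allow_removal=True, max_steps=100):
    """Stepwise selection; with allow_removal=False it reduces to forward selection."""
    selected = []
    for _ in range(max_steps):
        candidates = [j for j in range(len(cols)) if j not in selected]
        if not candidates:
            break
        best_f, best_j = max((partial_f(cols, y, j, selected), j) for j in candidates)
        if best_f < f_in:
            break  # no remaining variable is significant enough to enter
        selected.append(best_j)
        while allow_removal and len(selected) > 1:
            # Backward check: re-test each selected variable given all the others.
            worst_f, worst_j = min(
                (partial_f(cols, y, j, [k for k in selected if k != j]), j) for j in selected
            )
            if worst_f >= f_out:
                break
            selected.remove(worst_j)
    return selected


if __name__ == "__main__":
    # Toy "absorbance" columns (not data from this study).
    cols = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [2, 1, 2, 1, 2, 1, 2, 1],
        [0, 1, 0, 2, 0, 1, 3, 1],
    ]
    y = [2 * a + 3 * c for a, c in zip(cols[0], cols[2])]
    print(stepwise_select(cols, y))                             # -> [0, 2]
    print(stepwise_select(cols, y, allow_removal=False))        # -> [0, 2]
    print(stepwise_select(cols, [1, -1, -1, 1, 1, -1, -1, 1]))  # -> []
```

Because the first step uses no adjustment, the first variable chosen is simply the one most correlated with y, which matches the description of forward selection given earlier.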
Figures 2–5 illustrate the results obtained with and without variable selection for the FTIR data of each of the three biodiesel sources (cotton, palm, and castor) and for the whole set together. The PLS calibration models with variable selection were much more efficient than those without it for every biodiesel source mixture studied. Figure 2 shows that, for cotton biodiesel, real and predicted values match better when variable selection was used; the forward method led to lower data dispersion along the analytical curve and therefore proved more appropriate. According to Figure 3, for castor biodiesel both the forward and the stepwise methods led to a good match between real and predicted values, with similar data dispersion, so either method could be used to build a PLS calibration model for this biodiesel.

Figure 4 shows the PLS calibration models for palm biodiesel with and without variable selection. The calibration model obtained with the forward method gave a very good response, while the stepwise method was not able to select any variable. Without variable selection, the calibration model showed some dispersion along the analytical curve.

Figure 5 illustrates the PLS calibration models for the entire calibration set (120 samples), developed with each selection method and without either of them. Only slight dispersion along the analytical curve is seen for the stepwise method, meaning that this method gives a better match between real and predicted values than the other approaches.

Table 1 confirms this evaluation. It compares, for the PLS calibration models with and without variable selection, the correlation coefficient (R), the mean error of the calibration model (ME), RMSEC, RMSEP, the number of latent variables (LV), and the number of variables in the calibration model (MV). RMSEC and RMSEP were calculated for each mixture of biodiesel source and raw soybean oil using MINITAB software, version 14. Because of a limitation of the software algorithm with respect to the number of variables, it was not possible to perform external validation, and consequently to calculate RMSEP, for the PLS models built with all variables: the software accepts at most 1000 variables, whereas the calculation with all variables involves 1488 (744 calibration variables plus 744 predicted variables). Table 1 shows that, in general, applying variable selection tends to reduce the errors for all mixtures with different biodiesel sources. However, the values of R do not change significantly.

(23) Silverstein, R. M.; Webster, F. X. Spectrometric Identification of Organic Compounds, 6th ed.; Wiley: New York, 1998.
(24) Conceição, M. M.; Candeia, R. A.; Silva, F. C.; Bezerra, A. F.; Fernandes, V. J., Jr.; Souza, A. G. Renewable Sustainable Energy Rev. 2007, 11, 964–975.

Figure 2. PLS calibration of cotton biodiesel: (a) without variable selection, (b) with stepwise selection, and (c) with forward selection.
Figure 3. PLS calibration of castor biodiesel: (a) without variable selection, (b) with stepwise selection, and (c) with forward selection.
Figure 4. PLS calibration of palm biodiesel: (a) without variable selection and (b) with forward selection.
Figure 5. PLS calibration of all origins: (a) without variable selection, (b) with stepwise selection, and (c) with forward selection.

Table 1. PLS Calibration Model Results for Biodiesel Samples Mixed with Raw Soybean Oil

| Variable selection | Biodiesel | RMSEC (%, v/v) | R | ME (%, v/v) | MV | LV | RMSEP (%, v/v) | SEN |
|---|---|---|---|---|---|---|---|---|
| none | cotton | 1.37 | 0.994 | 5.67 | 744 | 6 | / | 6.0 × 10⁻⁴ |
| none | castor | 2.15 | 0.993 | 0.25 | 744 | 6 | / | 4.3 × 10⁻⁴ |
| none | palm | 4.04 | 0.964 | 2.65 | 744 | 4 | / | 9.1 × 10⁻⁴ |
| none | complete | 1.30 | 0.988 | 2.99 | 744 | 10 | / | 7.8 × 10⁻⁴ |
| stepwise | cotton | 1.21 | 0.989 | 1.42 | 29 | 6 | 1.32 | 1.8 × 10⁻⁴ |
| stepwise | castor | 0.80 | 0.995 | 0.08 | 23 | 7 | 0.96 | 9.8 × 10⁻⁵ |
| stepwise | complete | 1.05 | 0.993 | 0.45 | 20 | 10 | 1.39 | 1.6 × 10⁻⁴ |
| forward | cotton | 1.11 | 0.990 | 0.43 | 25 | 7 | 1.02 | 1.8 × 10⁻⁴ |
| forward | castor | 0.77 | 0.995 | 0.47 | 18 | 5 | 0.65 | 1.4 × 10⁻⁴ |
| forward | palm | 1.16 | 0.989 | 2.20 | 26 | 5 | 1.40 | 3.7 × 10⁻⁴ |
| forward | complete | 1.32 | 0.989 | 1.44 | 13 | 10 | 2.09 | 1.1 × 10⁻⁴ |

MV, number of variables in the calibration model; /, RMSEP could not be calculated (see text). Stepwise selection retained no variable for palm biodiesel, so no stepwise model is listed for it.
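The error figures discussed above can be recomputed from real and predicted concentrations. The sketch below uses hypothetical helpers (`rmse`, `pearson_r`, `relative_errors`), not the MINITAB routines. Applied to the castor/forward-selection predictions listed in Table 2 (column E), it gives an RMSEP of about 0.655% (v/v), in line with the 0.65% (v/v) reported for that model in Table 1. It also lists the relative error of each prediction, which is roughly 9–17% for the three lowest concentrations but only about 1.5% at 39.5% (v/v). The mean error (ME) of Table 1 is not reproduced because its exact definition is not given.

```python
import math


def rmse(real, predicted):
    """Root-mean-square error (RMSEC or RMSEP, depending on the samples used)."""
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(real, predicted)) / len(real))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def relative_errors(real, predicted):
    """Absolute prediction error as a percentage of the real concentration."""
    return [100.0 * abs(p - r) / r for r, p in zip(real, predicted)]


# Real concentrations and castor/forward predictions (column E of Table 2), % (v/v).
REAL = [2.5, 4.5, 6.5, 7.5, 11.5, 13.5, 17.5, 19.5, 22.5, 26.5, 27.5, 31.5, 33.5, 34.5, 39.5]
CASTOR_FORWARD = [2.21, 4.89, 5.43, 7.18, 10.93, 14.01, 18.29, 20.16, 23.02, 26.15,
                  28.37, 31.04, 33.1, 35.69, 38.89]

if __name__ == "__main__":
    print(f"RMSEP = {rmse(REAL, CASTOR_FORWARD):.3f} % (v/v)")  # -> RMSEP = 0.655 % (v/v)
    print(f"R     = {pearson_r(REAL, CASTOR_FORWARD):.4f}")
    for r, e in zip(REAL, relative_errors(REAL, CASTOR_FORWARD)):
        print(f"{r:5.1f}  {e:5.1f} %")
```

Swapping in another column of Table 2 gives the corresponding RMSEP; for example, column D (cotton, forward) gives about 1.03% (v/v).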
According to Table 1, for the cotton biodiesel mixture in particular, although R is higher when variable selection was not applied, the mean error is almost 4 times greater. This indicates a large dispersion of the data, which is confirmed by Figure 2. For the palm biodiesel mixture, RMSEC, ME, and the number of variables all decreased, and R increased, when variable selection was used. The number of variables used to develop the calibration models dropped significantly for all samples analyzed (from 744 to between 13 and 29), which means that fewer variables can be used to build a better calibration model. The RMSEP values obtained by external validation were very close to the RMSECV values, which confirms the efficiency of the models. Table 1 also shows the sensitivity (SEN) of each model with and without variable selection. SEN is of the order of 10⁻⁴ for all PLS calibration models built, except for palm biodiesel without variable selection, which means that the model with better precision will also be the more sensitive.

Table 2. Real versus Predicted Values (% v/v raw soybean oil) for the Prediction Set Samples^a

| Real | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 2.5 | 3.31 | 1.66 | 4.13 | 2.81 | 2.21 | 1.72 | 4.13 |
| 4.5 | 3.66 | 3.96 | 3.37 | 4.87 | 4.89 | 3.56 | 2.69 |
| 6.5 | 7.34 | 5.83 | 5.91 | 5.38 | 5.43 | 8.27 | 4.19 |
| 7.5 | 6.22 | 8.08 | 9.42 | 6.29 | 7.18 | 8.91 | 9.29 |
| 11.5 | 10.01 | 10.75 | 10.74 | 10.72 | 10.93 | 12.74 | 13.68 |
| 13.5 | 12.03 | 12.19 | 12.18 | 14.66 | 14.01 | 13.14 | 15.07 |
| 17.5 | 18.91 | 16.38 | 15.87 | 18.62 | 18.29 | 16.29 | 15.38 |
| 19.5 | 18.18 | 20.23 | 20.03 | 20.85 | 20.16 | 18.08 | 17.57 |
| 22.5 | 23.04 | 23.48 | 23.46 | 23.17 | 23.02 | 20.63 | 20.21 |
| 26.5 | 27.83 | 27.89 | 27.85 | 25.02 | 26.15 | 28.36 | 28.92 |
| 27.5 | 28.29 | 28.39 | 29.76 | 28.84 | 28.37 | 28.87 | 25.77 |
| 31.5 | 33.13 | 33.01 | 33.12 | 32.95 | 31.04 | 32.85 | 28.94 |
| 33.5 | 35.52 | 33.98 | 32.02 | 33.28 | 33.1 | 31.73 | 31.26 |
| 34.5 | 32.37 | 33.52 | 33.41 | 33.41 | 35.69 | 36.18 | 36.86 |
| 39.5 | 40.03 | 38.61 | 38.11 | 39.32 | 38.89 | 40.83 | 41.64 |

^a A, cotton stepwise; B, castor stepwise; C, complete stepwise; D, cotton forward; E, castor forward; F, palm forward; and G, complete forward.

Table 2 compares real and predicted values for each set of samples analyzed with variable selection. In general, all PLS calibration models show larger relative errors for the samples with lower concentrations of raw soybean oil. However, the forward method showed a trend toward lower errors for the biodiesel/raw soybean oil blends.

4. Conclusions

The PLS multivariate models based on MIR spectra developed in this work proved suitable as a practical analytical method to predict the raw soybean oil content of biodiesel blends from 1 to 40% (v/v). Variable selection was an important tool for developing good PLS calibration models. The two variable selection methods differed in efficiency across the groups of samples analyzed, which means that each data set carries different information. The forward method led to better PLS calibration models for each biodiesel source/raw soybean oil blend considered alone, whereas the stepwise method built a better PLS model when the complete set (all 120 samples) was used. The calibration models developed in this study appear to be quite good, since they predicted all samples analyzed at the 95% confidence level.