Energy & Fuels 2007, 21, 2212-2215
Identification of Gasoline Origin by Physical and Chemical Properties and Multivariate Analysis
Paulo J. S. Barbeira,* Rita C. C. Pereira, and Camila N. C. Corgozinho
Laboratório de Ensaios de Combustíveis, Departamento de Química, ICEx, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, 31270-901, Belo Horizonte, Minas Gerais, Brazil
Received August 28, 2006. Revised Manuscript Received April 19, 2007
Hierarchical cluster and linear discriminant analysis, associated with physicochemical assays, were used to identify the origin of gasoline commercialized in Brazilian gas stations, which is important information to determine the profile of fuel and aid in the identification of dissimilar samples that may be associated with adulterations.
1. Introduction
The Brazilian fuel market has gone through great changes in the last few years, because oil refining and fuel production ceased to be a state monopoly. These changes increased the number of distribution companies and gas stations and encouraged competition, thus causing great price variation. Despite this, the quality of the fuel is not guaranteed.
The Brazilian fuel market commercializes four types of automotive fuels: gasoline [blended with 20-25% (v/v) ethanol], diesel oil, natural gas, and hydrated alcohol. In Brazil, refineries and petrochemical plants produce gasoline and sell it to distributors, who in turn add ethanol and sell it to gas stations. In 2006, 24.0 billion liters of gasoline were commercialized,1 and it is estimated that 10% was adulterated with the addition of industrial solvents, mainly heavy aliphatic, light aliphatic, and aromatic hydrocarbons (HCs). Gasoline and solvents are taxed differently, which makes adulteration easy and profitable. Furthermore, each state has its particular taxation and allows gas stations to buy fuel from any dealer, produced in different refineries, and, therefore, to commercialize gasoline from various origins in Brazil, mainly from the Minas Gerais state. A study of the gasoline source based on information found on the invoices of the samples collected for the National Petroleum Agency Monitoring Program, in Minas Gerais, indicated that, during some months, more than 50% of the samples did not have a known origin.2 Hence, tax evasion and fuel adulteration, mainly of gasoline, are problems that need to be controlled.
Many authors have studied adulteration of fuel, especially the addition of kerosene to diesel.3-10 Gasoline adulteration with kerosene was analyzed,9 and some techniques to detect it were proposed.11-14 Many authors have studied the adulteration of Brazilian gasoline with the addition of solvents.2,15-18 Barbeira2 used histograms and normal distribution curves (Gaussian fit) to determine the average composition of the gasoline commercialized in the state of Minas Gerais and to detect samples with dissimilar composition. This atypical behavior may have two possible causes: samples from different origins (e.g., different refineries) or careful adulteration, in which solvents are added in controlled amounts so that the product still falls within the legal specifications. The addition of solvents may cause poor engine performance, rubber corrosion, and environmental pollution, as well as tax revenue losses because industrial solvents and fuels are taxed differently. The study showed a considerable number of samples (24%) of dissimilar composition, although the samples tested were from the same source. Furthermore, a great number of samples of unknown source justified this study.
Multivariate analysis is an important statistical tool that has been used to predict physicochemical properties of petrochemical products from gas chromatography (GC),19-20 Fourier transform infrared spectroscopy (FTIR),21-26 nuclear magnetic

* To whom correspondence should be addressed. E-mail: barbeira@ufmg.br.
(1) www.anp.gov.br (accessed Feb 28, 2007).
(2) Barbeira, P. J. S. Engenharia Térmica 2002, 2, 48-50.
(3) Kumar, A.; Singh, V. R.; Parashar, D. C. Res. Ind. 1991, 36 (3), 168-170.
(4) Bahari, M. S.; Criddle, W. J.; Thomas, J. D. R. Analyst 1992, 117 (4), 701-706.
(5) Roy, S. Sens. Actuators, B 1999, 55 (2-3), 212-216.
(6) Bali, L. M.; Srivastava, A.; Shukla, R. K.; Srivastava, A. Opt. Eng. 1999, 38 (10), 1715-1721.
(7) Kalligeros, S.; Zannikos, F.; Stournas, S.; Lois, E.; Anastopoulos, G. Int. J. Energy Res. 2001, 25 (15), 1381-1390.
(8) Patra, D.; Mishra, A. K. Anal. Chim. Acta 2002, 454 (2), 209-215.
(9) Kalligeros, S.; Zannikos, F.; Stournas, S.; Lois, E. Energy 2003, 28 (1), 15-26.
(10) Kalligeros, S.; Zannikos, F.; Stournas, S.; Lois, E.; Anastopoulos, G. Energy Convers. Manage. 2005, 46 (5), 677-686.
(11) Culp, R. A.; Noakes, J. E. J. Agric. Food Chem. 1992, 40 (10), 1892-1897.
(12) Burgess, D. S. Photonics Spectra 2004, 38 (3), 26.
(13) Suri, S. K.; Prasad, K.; Ahluwalia, J. C.; Rogers, D. W. Talanta 1981, 28 (5), 281-286.
(14) Dhole, V. R.; Ghosal, G. K. J. Liq. Chromatogr. 1995, 18 (12), 2475-2488.
(15) de Oliveira, F. S.; Teixeira, S. G.; Araújo, M. C. U.; Korn, M. Fuel 2004, 83 (7-8), 917-923.
(16) Moreira, L. S.; D’Avila, L. A.; Azevedo, D. D. Chromatographia 2003, 58 (7-8), 501-505.
(17) Wiedemann, L. S. M.; D’Avila, L. A.; Azevedo, D. D. Fuel 2005, 84 (4), 467-473.
(18) Wiedemann, L. S. M.; D’Avila, L. A.; Azevedo, D. D. J. Braz. Chem. Soc. 2005, 16 (2), 139-146.
(19) Morris, R. E.; Hammond, M. H.; Shaffer, R. E.; Gardner, W. P.; Rose-Pehrsson, S. L. Energy Fuels 2004, 18 (2), 485-489.
(20) Skrobot, V. L.; Castro, E. V. R.; Pereira, R. C. C.; Pasa, V. M. D.; Fortes, I. C. P. Energy Fuels 2005, 19 (6), 2350-2356.
(21) Soyemi, O. O.; Busch, M. A.; Busch, K. W. J. Chem. Inf. Comput. Sci. 2000, 40 (5), 1093-1100.
10.1021/ef060436l CCC: $37.00 © 2007 American Chemical Society Published on Web 06/19/2007
Communications
resonance (NMR),27,28 and other optical methods.29,30 Chemometric statistical methods were used to analyze fuel degradation,31,32 the mechanism of fire retardants,33 the relationship between composition and HC emission,34 particulate matter compositions,35 solvent extraction of petroleum HCs in sediments,36 and parameters of smoke emission control.37 GC and chemometric techniques were used for pattern recognition classification of diesel38 and jet fuel39 and in the analysis of specific constituents in fuel, such as naphthalenes40 and aromatics in jet fuel,41 HC in gases,42 or multicomponents in fuels,43-48 besides sulfur in diesel49 by FTIR.
Oliveira et al.15 used the distillation curve profiles associated with the discriminatory soft independent modeling of class analogy (SIMCA) model classification to screen gasoline adulteration. Only three of the evaluated parameters for gasoline quality control (ethanol concentration and the temperatures at 90 and 100% evaporated) were really necessary to characterize sample adulterations, but the temperature at 60% evaporated was the most important parameter for SIMCA

(22) Reboucas, M. V.; Neto, B. D. J. Near Infrared Spectrosc. 2001, 9 (4), 263-273.
(23) Macho, S.; Larrechi, M. S. TrAC, Trends Anal. Chem. 2002, 21 (12), 799-806.
(24) Pavoni, B.; Rado, N.; Piazza, R.; Frignani, S. Ann. Chim. 2004, 94 (7-8), 521-532.
(25) Oliveira, F. C. C.; de Souza, A. T. P. C.; Dias, J. A.; Dias, S. C. L.; Rubim, J. C. Quim. Nova 2004, 27 (2), 218-225.
(26) Pereira, R. C. C.; Skrobot, V. L.; Castro, E. V. R.; Fortes, I. C. P.; Pasa, V. M. D. Energy Fuels 2006, 20 (3), 1097-1102.
(27) Meusinger, R. Fuel 1996, 75 (10), 1235-1243.
(28) Meusinger, R. Oil Gas Mag. 2001, 27 (4), 35-38.
(29) Litani-Barzilai, I.; Sela, I.; Bulatov, V.; Zilberman, I.; Schechter, I. Anal. Chim. Acta 1997, 339 (1-2), 193-199.
(30) Cooper, J. B. Chemom. Intell. Lab. Syst. 1999, 46 (2), 231-247.
(31) Johnson, K. J.; Morris, R. E.; Rose-Pehrsson, S. Abstr. Pap. Am. Chem. Soc. 2004, 228, U182-U182 84-PETR Part 2.
(32) Johnson, K. J.; Rose-Pehrsson, S. L.; Morris, R. E. Energy Fuels 2004, 18 (3), 844-850.
(33) Smith, C. S.; Metcalfe, E. Polym. Int. 2000, 49 (10), 1169-1176.
(34) Schuetzle, D.; Siegl, W. O.; Jensen, T. E.; Dearth, M. A.; Kaiser, E. W.; Gorse, R.; Kreucher, W.; Kulik, E. Environ. Health Perspect. 1994, 102 (4), 3-12.
(35) Jeon, S. J.; Meuzelaar, H. L. C.; Sheya, S. A. N.; Lighty, J. S.; Jarman, W. M.; Kasteler, C.; Sarofim, A. F.; Simoneit, B. R. T. J. Air Waste Manage. Assoc. 2001, 51 (5), 766-784.
(36) Chee, K. K.; Wong, M. K.; Lee, H. K. Environ. Monit. Assess. 1997, 44 (1-3), 587-603.
(37) Blanco, M.; Coello, J.; Maspoch, S.; Puigdomenech, A.; Peralta, X.; Gonzalez, J. M.; Torres, J. Oil Gas Sci. Technol. 2000, 55 (5), 533-541.
(38) Johnson, K. J.; Wright, B. W.; Jarman, K. H.; Synovec, R. E. J. Chromatogr., A 2003, 996 (1-2), 141-155.
(39) Johnson, K. J.; Synovec, R. E. Chemom. Intell. Lab. Syst. 2002, 60 (1-2), 225-237.
(40) Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. J. Sep. Sci. 2004, 27 (5-6), 410-416.
(41) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 2000, 72 (17), 4154-4162.
(42) Svenningstorp, H.; Widen, B.; Salomonsson, P.; Ekedahl, L. G.; Lundstrom, I.; Tobias, P.; Spetz, A. L. Sens. Actuators, B 2001, 77 (1-2), 177-185.
(43) Bruckner, C. A.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 1998, 70 (14), 2796-2804.
(44) Prazen, B. J.; Bruckner, C. A.; Synovec, R. E.; Kowalski, B. R. Anal. Chem. 1999, 71 (6), 1093-1099.
(45) Prazen, B. J.; Bruckner, C. A.; Synovec, R. E.; Kowalski, B. R. J. Microcolumn Sep. 1999, 11 (2), 97-107.
(46) Synovec, R. E.; Prazen, B. J.; Johnson, K. J.; Fraga, C. G.; Bruckner, C. A. Adv. Chromatogr. 2003, 42, 1-42.
(47) Hope, J. L.; Johnson, K. J.; Cavelti, M. A.; Prazen, B. J.; Grate, J. W.; Synovec, R. E. Anal. Chim. Acta 2003, 490 (1-2), 223-230.
(48) Johnson, K. J.; Wright, B. W.; Jarman, K. H.; Synovec, R. E. J. Chromatogr., A 2003, 996 (1-2), 141-155.
(49) Breitkreitz, M. C.; Raimundo, I. M.; Rohwedder, J. J. R.; Pasquini, C.; Dantas, H. A.; José, G. E.; Araújo, M. C. U. Analyst 2003, 128 (9), 1204-1207.
Energy & Fuels, Vol. 21, No. 4, 2007 2213
classification. The model correctly classified 92% of the 50 gasoline samples analyzed. Azevedo et al.16-18 used GC-mass spectrometry (MS) and physicochemical parameters, associated with a chemometric statistical method (hierarchical cluster analysis), to detect gasoline adulteration. Many types of adulteration, such as the addition of heavy and light aliphatic HCs, can be detected.
The purpose of this work is to use the pattern recognition ability of multivariate analysis methods (principal components analysis, PCA; hierarchical cluster analysis, HCA; and linear discriminant analysis, LDA), together with physicochemical assays, to identify the origin of gasoline commercialized in gas stations in the state of Minas Gerais (Brazil), which is important information to determine the profile of fuel.
During inspection procedures, it is often important to determine the origin of the fuel, especially in the cases in which this information is not available because the retailer does not have the purchase invoice. However, the origin of fuel cannot be determined using only physicochemical methods. It is important to establish the origin because of the great turnover of fuel in the country and the higher incidence of adulteration in certain regions, where the committee for control and inspection carries out special procedures to trace the product and solve the problem. Besides, in the case of leakage and environmental damage, the origin of the product is of primary importance to identify those responsible and speed up legal actions. To determine whether an anomalous result in the physicochemical analysis of a gasoline is related to adulteration, it is necessary to exclude the possibility that this gasoline was produced in a different refinery.
In the face of this, the importance of developing simple, fast, efficient, and low-cost methods and analytical techniques to certify the origin, quality, and authenticity of the commercialized fuels becomes conspicuous.
2. Experimental Section
A total of 1328 samples of regular and special gasolines were collected in the east area of Minas Gerais (Figure 1), in about 550 municipalities. Approximately 80% of the samples were from Regap, the only refinery in the state of Minas Gerais. Samples from other refineries are fewer because most of them are commercialized near the borders with other states, where gas stations are closer to other refineries and the fuel sometimes has lower taxation, although such gasoline is commercialized in practically the whole state. The origin of each sample was taken from the invoices issued by the gas stations before the experiments were carried out.
The samples were kept under refrigeration (8-15 °C) until the experiments were performed, in the following sequence to avoid the loss of the more volatile components: specific gravity, infrared spectroscopy, and distillation. An automatic densimeter Anton Paar DMA 1500 [American Society for Testing and Materials (ASTM) D 4052], an automatic distiller Herzog HDA 627 (ASTM D 86), and an infrared analyzer Petrospec GS 1000 were used for the assays.
The multivariate analysis was performed with statistical software (Minitab Release 14) using 10 parameters determined for each sample by three distinct assays: specific gravity, distillation (temperatures equivalent to 10, 50, and 90% of the distilled volume and final evaporation point), and infrared spectroscopy [motor octane number (MON) and research octane number (RON) values, besides aromatics, olefins, and saturates contents].
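The data layout described above (one row per sample, one column for each of the 10 parameters) can be sketched in a few lines of Python. This is only an illustration using the standard library: the column names are hypothetical labels for the parameters listed in the text, the three rows are invented values rather than measurements from this study, and scaling each column to unit standard deviation is an assumed preprocessing choice that is common when variables have different units.

```python
from statistics import mean, stdev

# Hypothetical column labels for the ten parameters listed in the text.
COLUMNS = [
    "specific_gravity", "T10", "T50", "T90", "FBP",
    "MON", "RON", "aromatics", "olefins", "saturates",
]

# Invented values for three samples (NOT data from the study).
samples = [
    [0.7512, 54.1, 71.8, 162.3, 205.0, 82.4, 95.1, 18.2, 21.5, 35.3],
    [0.7538, 55.0, 72.4, 165.9, 208.7, 82.9, 96.0, 19.0, 20.1, 35.9],
    [0.7490, 53.2, 71.1, 158.8, 201.2, 82.1, 94.6, 17.1, 23.0, 34.9],
]


def autoscale(rows):
    """Mean-center each column and scale it to unit sample standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centers, spreads)]
        for row in rows
    ]


scaled = autoscale(samples)
```

After this step every column has zero mean and unit spread, so a property measured in °C does not dominate one measured in g/cm3 simply because of its numerical range.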
3. Results and Discussion
According to Table 1, samples from refineries located in São Paulo, Rio de Janeiro, Bahia, and Minas Gerais itself were
Figure 2. Evaluation of the formation of clusters in gasoline samples from Rlam, Regap, Replan, Revap, and Reduc refineries using PCA with a correlation matrix.
Figure 1. Minas Gerais state location, sampling region, neighboring states, and location of the closest refineries (black dots). (1) Regap, (2) Reduc, (3) Manguinhos, (4) Replan, (5) Recap, (6) Rpbc, (7) Revap, and (8) Rlam.

Table 1. Source and Amount of Samples Collected and Their Codes Based on Invoices Issued by the Gas Stations
refinery (code)    number of samples    percentage (%)
Rlam (1)           32                   2.4
Regap (2)          1085                 81.7
Replan (3)         30                   2.3
Revap (4)          41                   3.1
Reduc (5)          140                  10.5
total              1328                 100.0
Figure 3. Dendrogram of gasoline samples from Rlam (1), Regap (2), Replan (3), Revap (4), and Reduc (5) refineries obtained with Euclidean distance and Ward linkage criteria.
Table 3. Summary of the Agglomerative Coefficient for Gasoline Samples from Different Refineries
Table 2. Results of the Principal Components and Accumulated Variance for Samples Collected
principal component    accumulated variance (%)
PC1                    46.9
PC2                    71.1
PC3                    80.6
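The accumulated variance in Table 2 is a running sum of the share of variance carried by each component. The sketch below shows that arithmetic. It relies on the fact that, for a PCA on the correlation matrix, the eigenvalues sum to the number of variables (10 here). The three eigenvalues are back-calculated from Table 2 for illustration only; they are an assumption and were not reported by the authors.

```python
def accumulated_variance(eigenvalues, n_variables):
    """Return the running percentage of total variance for the leading PCs.

    For a correlation-matrix PCA the eigenvalues sum to the number of
    variables, so n_variables serves as the total variance.
    """
    running = 0.0
    result = []
    for value in eigenvalues:
        running += value
        result.append(round(100.0 * running / n_variables, 1))
    return result


# Eigenvalues back-calculated from Table 2 (an assumption, not reported values):
# individual shares of 46.9%, 24.2%, and 9.5% of a total variance of 10.
leading = [4.69, 2.42, 0.95]
print(accumulated_variance(leading, n_variables=10))
```

Read this way, PC2 adds 24.2 percentage points to PC1 and PC3 adds a further 9.5, which is how the Table 2 column reaches 80.6%.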
found in the Minas Gerais state. A total of 1328 samples of known origin were used in the model construction. Initially, the physicochemical data were arranged in a data matrix with the gasoline samples in the rows and the values of density, distillation temperatures, etc., in the columns. The data matrix was preprocessed by mean centering. Out of the 1328 gasoline samples, approximately 120 (∼25 gasoline samples from each refinery) were used in the construction of the PCA and HCA models to make the clusters easier to visualize in the plots. Analyzing the original data matrix by PCA with a correlation matrix, it was observed that the first three principal components (accounting for 80.6% of the accumulated variance; Table 2) permitted the separation of samples by origin: Rlam (1), Regap (2), Replan (3), Revap (4), and Reduc (5), as shown in Figure 2. Table 3 summarizes the agglomerative coefficients obtained for the gasoline samples according to the linkage criterion and distance measure used. The best segregation was obtained with the combination of the Euclidean distance and the Ward linkage. Applying HCA to the original data matrix, it was noticed that the samples segregated
linkage     Euclidean distance    Manhattan distance
average     0.8902                0.8860
complete    0.9384                0.9254
single      0.8439                0.8601
Ward        0.9632                0.9562
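Table 3 compares two distance measures and four linkage criteria. As a minimal sketch of what those choices mean, the Python below defines the Euclidean and Manhattan distances between two samples and the merge cost used by the Ward criterion (the increase in within-cluster sum of squares when two clusters are joined). The function names and the two example points are illustrative assumptions; this is not the Minitab implementation used in the study.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two samples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_merge_cost(centroid_a, size_a, centroid_b, size_b):
    """Increase in within-cluster sum of squares when two clusters merge."""
    gap = euclidean(centroid_a, centroid_b)
    return size_a * size_b / (size_a + size_b) * gap ** 2


# Two illustrative points in a two-variable space.
p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q), manhattan(p, q), ward_merge_cost(p, 1, q, 1))
```

At each step, Ward linkage joins the pair of clusters with the smallest merge cost, which tends to produce compact groups of similar size.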
Table 4. Summary of the Classification of the Training Set Samples (a)
                    Rlam    Regap    Replan    Revap    Reduc
Rlam                22      0        0         0        0
Regap               0       748      2         0        2
Replan              0       7        14        2        1
Revap               0       2        4         27       2
Reduc               0       2        0         0        93
total of samples    22      759      20        29       98
(a) Percentage of samples correctly classified in the training set = 97.4%.
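The percentage quoted in the footnote of Table 4 can be checked directly from the counts: correctly classified samples sit on the diagonal of the classification table. The sketch below does that arithmetic. The matrix is a transcription of Table 4, read on the assumption that each source line of counts is one column and that the final entry of each column is its total.

```python
# Counts from Table 4, one inner list per row label, totals row omitted.
TABLE_4 = [
    [22, 0, 0, 0, 0],     # Rlam
    [0, 748, 2, 0, 2],    # Regap
    [0, 7, 14, 2, 1],     # Replan
    [0, 2, 4, 27, 2],     # Revap
    [0, 2, 0, 0, 93],     # Reduc
]


def percent_correct(matrix):
    """Return (diagonal count, total count, percentage on the diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


correct, total, pct = percent_correct(TABLE_4)
print(f"{correct} of {total} samples on the diagonal: {pct:.1f}%")
```

With these counts, 904 of the 928 tabulated samples lie on the diagonal, which reproduces the 97.4% stated in the footnote.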
according to the origin (Figure 3), and a similar result was obtained using the PCA technique. To carry out the LDA, the 1328 samples were divided into two sets: the training set (70% of the samples, 930 samples) and the test set (30% of the samples, 398 samples). The samples for each set were chosen randomly. The results showed that the percentage of correctly classified samples was 97.4% in the training set and 97.0% in the test set, as shown in Tables 4 and 5. Table 6 shows the Mahalanobis distances between the groups. The results indicate that the most similar gasoline samples are from Revap and Replan. Although they present the smallest distance (15.99), no Replan sample in
Table 5. Summary of the Classification of the Test Set Samples (a)
                    Rlam    Regap    Replan    Revap    Reduc
Rlam                10      0        0         0        0
Regap               0       320      0         0        2
Replan              0       4        6         1        0
Revap               0       2        3         11       0
Reduc               0       0        0         0        40
total of samples    10      326      9         12       42
(a) Percentage of samples correctly classified in the test set = 97.0%.
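An overall percentage can hide differences between refineries, so it can help to look at each column of Table 5 separately. The sketch below computes, for every column, the share of its total that lies on the diagonal. The counts are transcribed from Table 5 under the assumption that each source line of counts is one column; the helper name `diagonal_share` is hypothetical.

```python
LABELS = ["Rlam", "Regap", "Replan", "Revap", "Reduc"]

# Counts from Table 5, one inner list per column, totals row omitted.
TABLE_5_COLUMNS = [
    [10, 0, 0, 0, 0],     # Rlam column
    [0, 320, 4, 2, 0],    # Regap column
    [0, 0, 6, 3, 0],      # Replan column
    [0, 0, 1, 11, 0],     # Revap column
    [0, 2, 0, 0, 40],     # Reduc column
]


def diagonal_share(columns):
    """Percentage of each column's total that lies on the diagonal."""
    return [100.0 * col[i] / sum(col) for i, col in enumerate(columns)]


shares = diagonal_share(TABLE_5_COLUMNS)
for label, share in zip(LABELS, shares):
    print(f"{label}: {share:.1f}%")
```

The Replan and Revap columns are both small (9 and 12 samples), so a handful of off-diagonal samples changes their shares considerably, while the much larger Regap column is barely affected.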
the test group was classified as being from Revap or vice versa. The results also indicate a similarity between Regap and Replan samples (Mahalanobis distance of 31.44). On the other hand, samples from the Rlam and Regap groups presented less similarity between them, with a Mahalanobis distance of 101.40. The misclassifications may be associated with the great similarity of gasoline produced in these refineries (Replan and Revap) to the mixture of gasoline in the gas-station tanks. The results show that PCA, HCA, and LDA techniques associated with gasoline physicochemical properties allow the segregation of gasoline by origin. It is also worth stating that the LDA technique classified 97% of the samples correctly.

Table 6. Mahalanobis Distance between Groups of Gasoline Samples
          Rlam      Regap     Replan    Revap     Reduc
Rlam      0.00      101.40    54.87     62.24     36.15
Regap     101.40    0.00      31.44     34.98     40.84
Replan    54.87     31.44     0.00      15.99     53.51
Revap     62.24     34.98     15.99     0.00      56.81
Reduc     36.15     40.84     53.51     56.81     0.00
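The Mahalanobis distance in Table 6 measures how far apart two group means are once the spread and correlation of the variables are taken into account. The sketch below computes its squared form for two variables only, using the closed-form inverse of a 2x2 covariance matrix. It is an illustration under stated assumptions: the study used 10 variables, the example means and covariances are invented, and the text does not say whether Table 6 reports the distance or its square.

```python
def mahalanobis_squared(mean_a, mean_b, cov):
    """Squared Mahalanobis distance between two 2-D group means.

    cov is a 2x2 covariance matrix [[sxx, sxy], [syx, syy]] shared by
    both groups (the pooled covariance in an LDA setting).
    """
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = mean_a[0] - mean_b[0]
    dy = mean_a[1] - mean_b[1]
    # Quadratic form d^T * inverse(cov) * d written out for the 2x2 case.
    return (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det


# Invented group means; with an identity covariance the result is the
# squared Euclidean distance.
print(mahalanobis_squared([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))
# A larger variance along the first axis shrinks that axis's contribution.
print(mahalanobis_squared([0.0, 0.0], [3.0, 4.0], [[4.0, 0.0], [0.0, 1.0]]))
```

A small value between two groups, as for Replan and Revap in Table 6, means their means are close relative to the within-group scatter, which is why such groups are the hardest to tell apart.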
4. Conclusion
This study shows that PCA, HCA, and LDA techniques were valuable in the discrimination of gasoline samples by refinery. The methodology developed, that is, physicochemical properties associated with multivariate techniques (PCA, HCA, and LDA), is a powerful tool to identify the origin of gasoline commercialized in Minas Gerais. It is also important to point out that the correct identification of the origin of gasoline allows for more reliable results.