Discriminant Analysis and Cluster Analysis of ... - ACS Publications


Article

Discriminant analysis and cluster analysis of biodiesel fuel blends based on Fourier Transform Infrared Spectroscopy (FTIR) Victor Hugo Jacks Mendes dos Santos, Eduardo do Canto Bruzza, Jeane E. L. Dullius, Rogerio Vescia Lourega, and Luiz Frederico Rodrigues Energy Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.6b00447 • Publication Date (Web): 11 May 2016 Downloaded from http://pubs.acs.org on May 24, 2016


Energy & Fuels is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Energy & Fuels

Discriminant analysis and cluster analysis of biodiesel fuel blends based on Fourier Transform Infrared Spectroscopy (FTIR) ‡

Victor Hugo J. M. dos Santos a, Eduardo Do Canto Bruzza a, Jeane E. L. Dullius a,b, Rogério V. Lourega a,b, ‡Luiz F. Rodrigues * a

a Institute of Petroleum and Natural Resources, Pontifical Catholic University of Rio Grande do Sul, Av. Ipiranga, 6681 – Building 96J, 90619-900, Porto Alegre, Brazil.

b Faculty of Chemistry (FAQUI), Pontifical Catholic University of Rio Grande do Sul, Av. Ipiranga, 6681 – Building 12B, 90619-900, Porto Alegre, Brazil.

KEYWORDS: infrared spectroscopy; multivariate data analysis; multivariate classification; biodiesel; fuel blends

ABSTRACT. In this work, a multivariate approach was used to classify diesel/biodiesel fuel blends with biodiesel contents from 0 % to 100 % through discriminant analysis and cluster analysis associated with Fourier Transform Infrared Spectroscopy (FTIR). The multivariate statistical techniques used in this work were Partial Least Squares Discriminant Analysis (PLS-DA), Principal Component Analysis (PCA), Soft Independent Modeling of Class Analogy (SIMCA), Hierarchical Clustering Analysis (HCA) and Support Vector Machine

ACS Paragon Plus Environment


(SVM). Multivariate analysis was performed on the following samples: soybean biodiesel, corn biodiesel, diesel S10 and fuel blends prepared with 0 % to 100 % (v/v) biodiesel. All multivariate statistical techniques were able to discriminate both the oil source and the ester percentage in the mixture. It was possible to develop robust multivariate models, associated with FTIR, that allow simultaneous discrimination of the type of oil used for biodiesel production and its content in fuel blends.

1. INTRODUCTION

In recent years there has been an increase in the number of studies of renewable energy sources in an attempt to reduce the world’s dependence on fossil fuels. The use of biofuels is one of the most promising alternatives for reducing this dependence on petroleum derivatives.1,2 Biodiesel is a type of fuel obtained from biomass and is defined as a mixture of monoalkyl esters of long-chain fatty acids, derived mainly from edible or non-edible oilseeds and other sources of fatty materials such as animal fat.3,4 The main raw materials available for biodiesel production are oilseeds from palm, canola and soybean, although many studies suggest other sources for the production of second and third generation biofuels, such as sugarcane bagasse and algae biodiesel.4 In Brazil, there are 55 plants authorized by the National Petroleum Agency (ANP), which collectively produce over 20000 m³ of biodiesel every day. Soybean seeds, which represent 72 % of the raw materials used, are the main source of biodiesel in Brazil.5 In 2010, Brazil became the second largest producer of biodiesel (behind only Germany), with 2.4 million m³ produced that year.6,7 Due to the increase in biodiesel production, it was necessary to develop a series of regulations that define quality control parameters for the application of biodiesel as fuel.


The standard methods are proposed mainly by the American Society for Testing and Materials (ASTM) and the European Standardization Organizations (ESOs), responsible for the European Standards (EN). In Brazil, the Brazilian Association of Technical Standards (ABNT – NBR) regulates the methods and biodiesel quality standards, and Law Nº 13.033/2014 set the mandatory blending of biodiesel into diesel at 7 %. In 2016, a new target was established through Law Nº 13.263/2016: within 36 months, the percentage of biodiesel in the fuel blend should reach 10 %. Since the introduction of the new fuel blend content definitions, different techniques have been proposed for the quality control of the percentage of biodiesel added to commercial fuel mixtures (fuel blends). Techniques based on hydrogen nuclear magnetic resonance (HNMR)8,9, fluorescence spectroscopy10, radiocarbon by liquid scintillation11,12 and accelerator mass spectrometry (AMS)13 have been proposed to ensure the proper content of the fuel blends. However, these techniques are too expensive to use as quality control measures in industry. A more cost-effective method based on infrared spectroscopy (FTIR) has also been recommended by the ASTM, EN and NBR standards and is now considered the most appropriate for implementation in industry. Besides its low cost, the infrared technique requires little or no sample processing, performs non-destructive analysis14 and may be combined with multivariate data analysis. A multivariate approach is one of the best ways to perform high-throughput data analysis. Several works have been published on predicting the properties of compounds and mixtures. Bona and Andrés estimated property values for coal samples from near-infrared spectroscopy (NIR) data and physicochemical parameters of coal by applying partial least squares regression (PLS-R),


qualitative multivariate analysis techniques, Hierarchical Clustering Analysis (HCA) and linear discriminant analysis (LDA).15 The best result in that work was obtained when the physicochemical properties and the NIR data were combined and applied in an LDA model after PCA. Balabin and Safieva studied another application of multivariate techniques for fuel classification purposes. They used NIR for the discrimination of gasoline and applied LDA, SIMCA and multilayer perceptron (MLP) to classify 382 samples arranged in three sets by source (refinery or process) and type.16 The best results for gasoline classification using only NIR were obtained by the method based on MLP. In biodiesel studies, multivariate techniques have been applied to determine the composition of diesel/biodiesel blends17–19, for adulteration control20 and for measurements of specific gravity21,22, flash point21,22 and cetane number22, among others. The infrared-based techniques established by ASTM are applied to methyl esters and require the Horizontal Attenuated Total Reflection (HATR) accessory. To determine the biodiesel percentage in the blend, three standard methods are used: ASTM D7806, based on univariate regression23; ASTM D7861, applied to portable devices that use a linear variable filter (LVF)24; and ASTM D7371, based on a multivariate approach by partial least squares regression (PLS-R).25 Few studies have applied discriminant and classificatory analysis to the evaluation of biodiesel. Mustafa and coworkers developed a chemometric approach for the classification of biodiesels based on an artificial neural network (ANN - Multilayer Perceptron). The successful recognition of biodiesels according to their feedstock (sunflower, rapeseed, corn, soybean, palm,


peanut) was performed through gas chromatography analysis and assessment of the content of five major components in the FAME: esters derived from C16:0, C18:0, C18:1, C18:2 and C18:3 acids.26 On the other hand, some cheaper methods have been proposed by authors such as Hocevar and coauthors, who used mid-infrared spectroscopy and multivariate analysis to determine the composition of used cooking oil.27 They discriminated biodiesel samples from soybean, palm and hydrogenated vegetable oil and their mixtures using PCA and developed models based on PLS-R to estimate the composition of the blends. In the same way, Mueller and colleagues studied the application of FTIR and multivariate analysis to identify six vegetable oils used in biodiesel production.28 The dendrograms obtained from Hierarchical Clustering Analysis (HCA) and the PCA were used to discriminate between oil samples. Interval Principal Component Analysis (iPCA) was also applied to determine the best spectral region for the subsequent application of SIMCA. The discriminant analysis of fuel blends may also be used to identify the raw material origin and type of biodiesel (methyl or ethyl). Mazivila and coworkers used mid-infrared (MIR) spectroscopy, combined with the supervised chemometric tool PLS-DA, to distinguish six different blends of biodiesel/diesel.29 The blends analyzed were methyl and ethyl esters from soybean, jatropha and waste frying oil. The best regions used were around 2920 cm-1 and between 1750 and 1730 cm-1, which are attributed to the symmetric stretching (C-H) of CH3 and to the stretching of the carbonyl bond (C=O), respectively. Similarly, Ruschel and coworkers applied discriminant analysis using mid-infrared and multivariate analysis to diesel S500/biodiesel blends in order to classify mixtures of methyl and


ethyl esters derived from soybean, waste frying and hydrogenated vegetable oil, where HCA and PCA were applied using a spectral range from 1800 cm-1 to 650 cm-1 and four principal components (PCs) to classify the mixtures.30 The aim of this work is first to perform an exploratory analysis, based on mid-infrared spectroscopy, of five types of oilseed biodiesel, two types of diesel with different sulfur contents (S10 and S500), and binary mixtures of diesel and biodiesel derived from soybean and corn oil. The discriminant and cluster analyses were applied only to the corn and soybean fuel blends, prepared with 0 % to 100 % of biodiesel in the mixture. This biodiesel range has not been reported in the literature before. The binary mixtures were classified through the multivariate statistical techniques Hierarchical Clustering Analysis (HCA), Soft Independent Modeling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA) and Support Vector Machine (SVM).

2. BRIEF DESCRIPTIONS OF MULTIVARIATE ANALYSIS TECHNIQUES

The multivariate analysis techniques used in this work, briefly presented in this section, are PCA, PLS-R, PLS-DA, HCA, SIMCA and SVM.

2.1. Principal Component Analysis (PCA)

The aim of Principal Component Analysis (PCA) is to project high-dimensional data into a low-dimensional space.16 PCA is a descriptive multivariate projection technique in which the PCs are obtained as linear combinations of the variables. The PCs are orthogonal to each other and are chosen to capture the maximum explained variance, resulting in the greatest spatial separation of the samples in a graphical projection.31
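As an illustration of the projection described in Section 2.1, the sketch below extracts principal components with a NIPALS iteration on mean-centered data. It is a minimal example on synthetic data and is not the implementation used by the commercial software in this work; the function name `nipals_pca`, the convergence tolerance and the simulated "spectra" are assumptions made for this example only.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Return scores T, loadings P and the explained-variance fraction of each PC."""
    E = X - X.mean(axis=0)                 # mean-center every variable (column)
    total_ss = np.sum(E ** 2)
    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start the iteration from the column with the largest variance
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _iteration in range(max_iter):
            p = E.T @ t / (t @ t)          # regress the columns of E on the score vector
            p /= np.linalg.norm(p)         # loadings are kept at unit length
            t_new = E @ p                  # project the samples onto the loading
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)             # deflate: remove the extracted component
        scores.append(t)
        loadings.append(p)
        explained.append((t @ t) / total_ss)
    return np.column_stack(scores), np.column_stack(loadings), np.array(explained)

# Synthetic data (hypothetical): 20 "spectra" of 50 variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2)) * [5.0, 1.0]
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(20, 50))

T, P, explained = nipals_pca(X, n_components=2)
print("explained variance per PC:", np.round(explained, 4))
```

The columns of `T` are the coordinates plotted in a score plot, the columns of `P` are the loadings, and `explained` is the fraction of the total variance carried by each PC.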


2.2. Partial Least Squares Regression (PLS-R)

Partial Least Squares Regression (PLS-R) is an extension of the multiple linear regression model. It is used to evaluate the degree of relationship between a set of predictive variables (X) and a set of output variables (Y). The PLS-R method is widely used in the multivariate analysis of spectral data. The main objective of PLS-R is to build calibration models, but this technique can also be applied for classification purposes.32

2.2.1. Partial Least Squares Discriminant Analysis (PLS-DA). Partial Least Squares Discriminant Analysis (PLS-DA) is applied when the objective is classification. The algorithm is similar to PLS-R, with the spectral data arranged in the X block, but the data in the Y block are replaced with the classes that will be used for model construction.29

2.3. Hierarchical Clustering Analysis (HCA)

The purpose of Hierarchical Clustering Analysis (HCA) is to emphasize the groupings and natural patterns in a dataset.28 In HCA the samples are grouped based on a measure of similarity between them, until there is only one group containing all samples.31 The result of HCA is presented in a two-dimensional graph called a dendrogram.

2.4. Soft Independent Modeling of Class Analogy (SIMCA)

Soft Independent Modeling of Class Analogy (SIMCA) is a supervised classification method. The models are developed using PCA, calibrated with samples belonging to each class to be analyzed.28 An observation is assigned to a class when its residual distance to the model is lower than the statistical threshold for the group.32

2.5. Support Vector Machines (SVM)


Support Vector Machines (SVMs) correspond to a linear method applied in a high-dimensional feature space that is non-linearly related to the input space.33 In this method, the data are mapped into this feature space and a hyperplane is then constructed from the X training vectors, using a kernel function, to obtain maximum separation between the groups of samples.31–34
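To close this overview, the class-coding idea of PLS-DA (Section 2.2.1) can be sketched in a few lines: a single-response PLS regression is fitted to a 0/1 class vector, and a sample is assigned to a class by thresholding the predicted value. This is a minimal illustration on synthetic two-class data, not the procedure implemented in the software used in this work; the function names, the 0.5 threshold and the simulated "spectra" are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS (NIPALS-style PLS1). Returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector: direction of maximum covariance with y
        t = E @ w                        # scores
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        q_a = f @ t / tt                 # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - q_a * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    return x_mean, y_mean, coef

def plsda_predict(model, X, threshold=0.5):
    """Predict the continuous response and threshold it into class 0 or class 1."""
    x_mean, y_mean, coef = model
    y_hat = (X - x_mean) @ coef + y_mean
    return (y_hat > threshold).astype(int), y_hat

# Synthetic two-class data (hypothetical): class 1 differs from class 0 in a narrow band
rng = np.random.default_rng(1)
n_per_class, n_vars = 15, 40
base = np.sin(np.linspace(0.0, 3.0, n_vars))
shift = np.zeros(n_vars)
shift[10:15] = 0.5
X = np.vstack([
    base + 0.02 * rng.normal(size=(n_per_class, n_vars)),
    base + shift + 0.02 * rng.normal(size=(n_per_class, n_vars)),
])
y = np.array([0.0] * n_per_class + [1.0] * n_per_class)   # class membership coded as 0/1

model = pls1_fit(X, y, n_components=2)
labels, y_hat = plsda_predict(model, X)
print("training accuracy:", np.mean(labels == y.astype(int)))
```

With more than two classes, one 0/1 column per class would be used in the Y block; that extension is not shown here.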

3. EXPERIMENTAL SECTION

In this work, biodiesel was produced from oilseed sources, giving five biodiesel samples from different vegetable oils: soybean, corn, canola, sunflower and olive oil. For the discriminant and cluster analyses, only binary blends of diesel S10 and one of the biodiesels (from soybean oil or corn oil) were used. In order to enrich the discussion of the multivariate data analysis, the exploratory analysis was also applied to diesel S500 and the other oils (canola, sunflower and olive oils).

3.1. Materials

The reagents used in this work were methanol 99.8 % (Anidrol), Magnesol® (Dallas Group), 85 % KOH (Merck), methyl heptadecanoate (Sigma-Aldrich), refined vegetable oils obtained from local businesses, and diesel S10 and diesel S500 from Refinaria Alberto Pasqualini (Refap) containing 10 and 500 ppm of sulfur, respectively.

3.2. Sample Preparation

3.2.1. Synthesis of biodiesel. The synthesis of biodiesel was performed using a molar ratio of alcohol to oil of 6:1, a temperature of 60 °C, 1 % KOH as a catalyst, a reaction time of 1 hour


under minimal agitation of 500 rpm.35 After completion of the reaction, the products were transferred to a separatory funnel and allowed to stand for separation of the glycerin.

3.2.2. Purification of the biodiesel. After removal of the excess alcohol, the biodiesel was cleaned with 1 % Magnesol® (w/w) at 70-80 °C for 1 hour, followed by filtration.36

3.3. Gas chromatography analysis

The gas chromatography (GC) analyses were performed on a GC-14B Shimadzu gas chromatograph equipped with a flame ionization detector (FID). The esters produced were characterized and quantified based on the EN 14103 method, using methyl heptadecanoate as internal standard.37 The capillary column was an HP-INNOWAX (30 m x 250 µm i.d. x 0.25 µm film thickness) and the injector applied a split ratio of 1:30 with nitrogen as the carrier gas (1.0 mL min-1). The detector was set at 250 °C (hydrogen flow of 35.0 mL min-1 and synthetic air flow of 350.0 mL min-1). The oven temperature started at 160 °C for 1.0 min, was increased at a heating rate of 40 °C min-1 up to 190 °C and kept at this temperature for 1.0 min, and was finally increased at a heating rate of 3 °C min-1 to 250 °C and kept at this temperature for 1.0 min.35
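The total run time implied by this oven program follows from simple arithmetic, sketched below. The helper `oven_program_minutes` is a hypothetical name introduced for this example, and the calculation assumes ideal linear ramps with no equilibration or cool-down time.

```python
def oven_program_minutes(start_temp, initial_hold, ramps):
    """Total duration of a GC oven program.

    ramps: list of (rate in degC/min, target temperature in degC, hold in min).
    """
    total = initial_hold
    temp = start_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold   # ramp time plus hold at the target
        temp = target
    return total

# Program described in the text: 160 degC (1 min), 40 degC/min to 190 degC (1 min),
# then 3 degC/min to 250 degC (1 min)
total = oven_program_minutes(160.0, 1.0, [(40.0, 190.0, 1.0), (3.0, 250.0, 1.0)])
print(f"{total:.2f} min")   # 1 + 0.75 + 1 + 20 + 1 = 23.75 min
```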

3.4. Acquisition of spectral data

Fourier Transform Infrared (FTIR) spectra were obtained using a Perkin-Elmer Spectrum One spectrometer with a Horizontal Attenuated Total Reflectance (HATR)


accessory with a zinc selenide (ZnSe) crystal. The spectral range was from 4000 cm-1 to 650 cm-1, with a resolution of 4 cm-1, 16 scans per spectrum and triplicate analyses for each sample. The spectrum of the clean and dry HATR ZnSe crystal against air was used as background, and the crystal was cleaned twice with acetone and hexane and dried under nitrogen flow after each sample. According to the recommendation of ASTM D7371, the mixtures were prepared based on the mass of diesel and biodiesel and then converted to a volumetric mixture (v/v) using the relative density of each component. A total of 90 blends were made: 45 blends of methyl esters of soybean oil and 45 blends of methyl esters of corn oil.

3.5. Multivariate analysis

All spectra were treated with multivariate analysis tools. The models were built using the software The Unscrambler X 10.4® (CAMO Software Company). No baseline correction method was applied in the FTIR instrument software; the spectra were treated subsequently in the multivariate software. The algorithms were used without prior modification, applying the default settings of the software.

4. RESULTS AND DISCUSSION

The results and discussion are organized so as to present the multivariate analysis of raw materials and of fuel blends separately.

4.1. Multivariate analysis of raw materials

First, PCA was applied to the raw material samples in order to observe the possible separation among the esters of the five vegetable oils used to produce biodiesel (soybean, corn,


canola, sunflower and olive oil) and diesel with different sulfur concentrations (Figure 1a). The ester contents obtained for each biodiesel by gas chromatography were: soybean biodiesel (86.18 %), corn biodiesel (84.04 %), canola biodiesel (80.11 %), sunflower biodiesel (86.68 %) and olive biodiesel (82.29 %). PCA was subsequently performed separately on the biodiesels and the diesels, in order to improve the discussion based on the spectral information obtained and to better understand the data relationships (Figure 1b and Figure 1c). The spectra were preprocessed by smoothing and first-order derivative using the Savitzky-Golay algorithm with an 11-point window and a polynomial order of 2; the model used the NIPALS algorithm with mean-centered data, without rotation and cross-validation.

Figure 1. PCA analysis of the raw materials for the production of biodiesel and diesel S10 and S500: a) PCA of all components; b) PCA of biodiesel; c) PCA of diesel.
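The Savitzky-Golay preprocessing described above (11-point window, polynomial order 2, first derivative) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions and not the routine of the software used in this work: the function names are hypothetical, the signal is assumed to be evenly sampled, and the five points at each edge are simply dropped rather than extrapolated.

```python
import numpy as np
from math import factorial

def savgol_coefficients(window=11, polyorder=2, deriv=1):
    """Weights that return the `deriv`-th derivative at the center of the window."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Design matrix of the local least-squares polynomial fit: offsets**0, offsets**1, ...
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of offsets**deriv
    return np.linalg.pinv(A)[deriv] * factorial(deriv)

def savgol_derivative(y, spacing, window=11, polyorder=2, deriv=1):
    """Smoothed derivative of an evenly sampled signal; window // 2 edge points are dropped."""
    weights = savgol_coefficients(window, polyorder, deriv)
    return np.correlate(y, weights, mode="valid") / spacing ** deriv

# Synthetic carbonyl-like band (hypothetical data): Gaussian centered at 1745 cm-1
wavenumber = np.arange(1700.0, 1800.0, 1.0)            # assumed 1 cm-1 point spacing
absorbance = np.exp(-((wavenumber - 1745.0) / 8.0) ** 2)
d1 = savgol_derivative(absorbance, spacing=1.0)
center = wavenumber[5:-5]                               # axis matching the trimmed output
print(center[np.argmax(d1)], center[np.argmin(d1)])     # inflection points of the band
```

In the first-derivative spectrum the band maximum appears as a zero crossing, which is why small shifts in band position become easier to compare between samples.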

After obtaining the PCA score plot results, it was possible to project graphically the properties of the analyzed samples. The PCA discrimination of five Fatty Acid Methyl Esters (FAMEs) and diesel (S10 and S500) can be observed in Figure 1a. Only two PCs were used to explain 99 % of the variance of the samples using the full spectral range (4000-650 cm-1). According to Figure 1a, the esters are arranged along the Y axis with the most unsaturated oils distributed in positive values while in Figure 1b the esters are arranged along the X axis mainly


due to the carbonyl intensity and the degree of oil saturation. Soybean and corn oils are closely grouped due to their similar chemical characteristics. Discrimination between the two different diesels evaluated can be observed in Figure 1c. The information in the spectral data is sufficient to separate the samples in a small number of dimensions. Figure 2 shows the PCA loadings of the sample groups over the infrared absorption spectrum.

Figure 2. PCA analysis: a) Loading of all components; b) Loading of biodiesel; c) Loading of diesel.

The PCA loadings elucidate the data structure in terms of the correlation between the modeled variables. In this work, four regions relevant to sample discrimination were identified that explain the variance of the model. The first spectral region, between 3050 and 2750 cm-1, corresponds to the symmetric and asymmetric stretching of the C-H bond as well as the stretching of the C-H bonds of sp2 carbons (C=C-H). This region is important for separating the methyl esters of various sources from each other (Figure 2b) and for separating the two kinds of diesel (Figure 2c). The second spectral region, between 1800 and 1650 cm-1, was the most significant and corresponds to the carbonyl (C=O) stretching vibration band and to the vibrations of C=C. Since there are no esters in diesel, the carbonyl group is a good option for discriminating between biodiesel and diesel samples (Figure 2a), and it also defines an important region for distinguishing between the types of methyl esters used in the mixture (Figure 2b).


There were small differences in the intensity and position of the carbonyl band in all esters investigated (Figure 3); these differences were detected by the principal components and used for construction of the PCA model.17 The third spectral region, between 1500 and 1350 cm-1, could also be used to discriminate diesel samples (Figure 2c); it contains the asymmetric bending (~1450 cm-1) and symmetric bending (~1380 cm-1) of CH3 and the scissor bending vibration of CH2.29 Finally, the fourth important region (Figure 2a and Figure 2b) was the spectral area between 1300 and 900 cm-1, which corresponds to the beginning of the fingerprint region of the compounds. The most important absorptions are associated with the symmetric and asymmetric stretching of the C-O bond of esters as well as of C-C.28 Figure 3 shows the derivative plot of the carbonyl region for the five esters already discussed. Slight differences in the position and intensity of the carbonyl absorption served as information for discrimination between raw materials by PCA.

Figure 3. Displacement of the first-derivative FTIR spectra in the carbonyl region (1760-1730 cm-1) of the FAMEs.

4.2. Multivariate Analysis of Fuel Blends

4.2.1. Fuel Blends. Twenty-four fuel blend points were prepared for calibration25 and twenty-one points for validation38 of the multivariate model; they are presented in Table 1. In total, forty-five points were prepared and used for calibration/validation, and were also subjected to the discriminant analysis and cluster analysis of biodiesel fuel blends.


Table 1. Points used for construction and validation of a multivariate model

The infrared spectra (Figure 4) were obtained by FTIR and then further processed for the development of multivariate statistical techniques. The fuel blends were divided into three ranges based on the volumetric concentration of biodiesel in the mixture: SET A 0-10 %, SET B 10-30 % and SET C 30-100 %.

Figure 4. Biodiesel/diesel FTIR spectra.

4.2.2. Development of the PCA of the Fuel Blends. After checking the most significant spectral regions, the PCA of the fuel blends was conducted using the spectral range from 1800 cm-1 to 650 cm-1. The spectra were preprocessed by smoothing and first-order derivative using the Savitzky-Golay algorithm with an 11-point window and a polynomial order of 2; the model used the NIPALS algorithm with mean-centered data, without rotation and cross-validation. Figures 5 and 6 show the resulting score plots obtained for the discrimination of the blends by PCA.

Figure 5. PCA analysis of corn and soybean blends


The score plots of the blends (Figure 5) show that separation of the respective sets can be achieved. The best visual representation was obtained from the PC1xPC2 plot for soybean biodiesel blends and the PC1xPC3 plot for corn biodiesel blends. Figure 5 shows the separation of the SETs, that is, a possible separation between groups with different ranges of biodiesel content (SET A, SET B and SET C). In mixtures with increasing biodiesel content there was a trend toward displacement to the positive side of the X axis (Figure 5a and Figure 5b). For soybean oil blends (Figure 5a) there was a displacement along the positive direction of the Y axis (PC1xPC2) toward higher values (SET A and SET B). However, in SET C this behavior is inverted, toward lower values. In corn oil blends (Figure 5b), there was a displacement along the negative direction of the Y axis (PC1xPC3) toward lower values (SET A and SET B), but in SET C this behavior is inverted, toward higher values. This behavior, in which both kinds of blends change their displacement tendency from SET B to SET C, means that other spectral regions become more significant for explaining the variance in samples with higher percentages of ester in the mixture. Figure 6 shows the PCA score plots of soybean x corn blends according to their respective sets. The samples in the plots were the validation blends described in Table 1, which helped to interpret the discriminant analysis results. Blends with biodiesel contents between 0 % and 10 % (v/v), prepared with corn and soybean biodiesel (SET A), could be discriminated by PCA (Figure 6a). The soybean blends were


shifted to the positive side with respect to the Y axis and mixtures with high ester content were displaced to the positive side with respect to the X axis. As the biodiesel content in the mixture increases, the PCA begins to lose its ability to discriminate between the different oils used as raw material (Figure 6b). The corn blend scores from SET B are shifted to the positive side of the Y axis, and blends with ester contents above 20 % could be displaced from their respective groups. Finally, the corn and soybean blends of SET C (Figure 6c) could not be clearly distinguished; therefore, this model was not able to classify blends with a high content of biodiesel, but was able to classify different blends with lower biodiesel content.

Figure 6. PCA analysis of corn x soy blend: a) SET A; b) SET B; c) SET C; d) All samples

The PCA showed that, for the SET containing 0-10 % (v/v) biodiesel in the mixture, there were no major problems in discriminating biodiesel blends of corn and soybean. However, for the SETs containing 10-30 % (v/v) and 30-100 % (v/v) of biodiesel in the mixture, a few points of each type of biodiesel were positioned outside their respective groups (soybean and corn blends). Figure 6d shows the distribution of the samples over the whole range from 0 % to 100 %.

4.2.3. Development of HCA model. The Hierarchical Clustering Analysis (HCA) was conducted using the spectral regions between 3100-2810 cm-1 and 1800-650 cm-1. The best results were obtained using hierarchical average linkage as the clustering method and


Spearman's rank correlation as the distance measure. The infrared spectra were preprocessed by smoothing and first-order derivative using the Savitzky-Golay algorithm with an 11-point window and a polynomial order of 2. The results are presented as dendrograms in Figure 7.

Figure 7. Dendrograms of HCA of corn x soy blend: a) SET A; b) SET B; c) SET C.
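The clustering settings reported for the HCA model (average linkage with a distance based on Spearman's rank correlation) can be sketched as below. This is a simplified NumPy illustration on synthetic data, not the routine of the software used in this work. The following are assumptions of this example: the distance is taken as 1 minus the Spearman correlation, ties in the ranking are not corrected, and the tree is cut at a fixed number of clusters instead of being drawn as a dendrogram.

```python
import numpy as np

def spearman_distance(X):
    """Pairwise distance between rows of X: 1 - Spearman rank correlation (no tie correction)."""
    ranks = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    return 1.0 - np.corrcoef(ranks)

def average_linkage(D, n_clusters):
    """Agglomerative clustering with average linkage; returns one cluster label per sample."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic "spectra" (hypothetical): two groups with a band at different positions
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 60)
group_a = np.exp(-((x - 0.3) / 0.1) ** 2) + 0.01 * rng.normal(size=(5, 60))
group_b = np.exp(-((x - 0.7) / 0.1) ** 2) + 0.01 * rng.normal(size=(5, 60))

labels = average_linkage(spearman_distance(np.vstack([group_a, group_b])), n_clusters=2)
print(labels)
```

Recording the merge distance at each step of the `while` loop would give the heights needed to draw a dendrogram.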

The visualization provided by the dendrograms reinforces the PCA results. In SET A (Figure 7a), a distinction between soybean and corn blends, which were organized into two clusters, was observed. Regarding the biodiesel concentration in the blend, it was observed that inside each cluster (corresponding to each raw material) there was a subdivision into two further clusters. These subdivisions were associated with 0 % to 3 % (v/v) and 5 % to 10 % (v/v) of biodiesel in the diesel blend. The dendrogram in Figure 7b shows the blends divided mainly into two groups: the range from 11 % to 16 % (v/v) and the range from 17 % to 30 % (v/v) of biodiesel in the diesel blend. Inside these groups, the samples were separated into two clusters, which corresponded to corn and soybean blends. The dendrogram for SET C (Figure 7c) was divided into two clusters. In the first cluster, covering mixtures with 40 % (v/v) to 70 % (v/v) of biodiesel, the samples were not arranged in different groups. In the second cluster, covering mixtures with more than 70 % (v/v) of biodiesel, the samples of each type of oil were clearly divided. The dendrogram results reinforce the results

ACS Paragon Plus Environment

17

Energy & Fuels

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 55

obtained by PCA (Figure 6c), where the biodiesel of corn and soybean blends (B40S, B40C, B50S, B50C and B55S) have the same region in graph of PCA obstructing a good distinction. The HCA facilitates the visualization of similarities between samples and the formation of clusters that are organized depending on the type of oil and the concentration of esters in the mixture, and can be used to enhance the aspects already discussed. 4.2.4. SIMCA. The Soft independent modeling by class analogy (SIMCA) is a supervised method which uses PCA performed with objects from each class as training data. The samples are later arranged as belonging to a one or multiple classes, not necessarily producing a classification without overlapping of classes.16 The SIMCA was conducted using six PCA classes, one for each set of soybeans and corn biodiesel, with two principal components in each class. The spectra were preprocessed using smoothing and first order derivate applying the Savitzky-Golay algorithm with 11 points window and a polynomial order of 2 being used NIPALS algorithm, mean centered data, without rotation and cross-validation. The PCA analysis of the fuel blends was conducted using the spectral range from 1800 cm-1 to 650 cm-1 and the SIMCA results did not classify the samples fully in their respective SETs. Almost half of the samples of SETs B and C for soybean were identified in another class (Table 2).
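The Savitzky-Golay preprocessing used throughout (11-point window, polynomial order 2, first derivative) can be sketched as follows. This is an illustration under stated assumptions, not the routine of the software used in this work: it assumes evenly spaced wavenumbers, treats the least-squares fit itself as the smoothing step, and returns interior points only. The function name and the test signal are hypothetical.

```python
def savgol_first_derivative(y, half_window=5, spacing=1.0):
    """First derivative from a local quadratic least-squares fit.

    For a symmetric window and polynomial order 2, the Savitzky-Golay
    derivative weights reduce to i / sum(i**2) for offsets i = -m..m.
    Only interior points are returned (no edge padding), so the output
    has len(y) - 2 * half_window values.
    """
    if len(y) < 2 * half_window + 1:
        raise ValueError("signal shorter than the filter window")
    offsets = range(-half_window, half_window + 1)
    # Normalisation: 110 * spacing for the 11-point window (half_window = 5).
    norm = sum(i * i for i in offsets) * spacing
    return [
        sum(i * y[center + i] for i in offsets) / norm
        for center in range(half_window, len(y) - half_window)
    ]


# Hypothetical test signal y = 0.5 * x**2 + 3, whose exact derivative is x.
signal = [0.5 * x * x + 3.0 for x in range(15)]
derivative = savgol_first_derivative(signal)
```

A quadratic signal is reproduced exactly by a second-order fit, so the filter returns the analytical derivative at the interior points x = 5 to 9. The derivative also removes constant baseline offsets, which is one common reason for applying it to infrared spectra.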

Table 2. Classification results obtained by SIMCA

The SIMCA showed that the corn blends had a higher percentage of correctly classified mixtures, which suggests that the PCA may have been impaired by some soy blends that could be identified as outliers (SET B and SET C of the PCA).

4.2.5. Partial Least Squares Discriminant Analysis. Before running the regression model by PLS-R, the spectra were preprocessed by smoothing and first-order derivative using the Savitzky-Golay algorithm with an 11-point window and a polynomial order of 2. For the development of the PLS-DA, the spectral regions used were 3699-2974 cm-1 and 1841-925 cm-1.18 The PLS-DA was conducted by placing the spectral data in the X block, which was related to the class assignments arranged in the vector Y. The class values were arranged in a single column, in which one value (+1) was used for the corn biodiesel blends and the other (0) for the soybean biodiesel blends. The number of factors for each PLS-DA model was chosen by the criterion of lowest prediction error in full cross-validation; the explained variance was also evaluated. The model efficiency was evaluated with calibration and test blends for SET A, SET B and SET C separately and for a full-range classification. The classification parameters obtained by the PLS-DA, as well as the explained variance of each model, are presented in Table 3.

Table 3. Classification parameters obtained by the PLS-DA

The PLS-DA model used 7 as the optimal number of factors for all sets (SET A, SET B and SET C). The threshold value was 0.5 for all models, and the samples are shown with their respective estimated deviations in Figure 8, according to their respective SETs. All samples whose deviation crossed the threshold line were classified as belonging to another class and were counted as false negatives or false positives. The PLS-DA model uses sensitivity and specificity as evaluation criteria, calculated according to equations 1 and 2,39 where TP, FN, TN and FP denote the numbers of true positives, false negatives, true negatives and false positives, respectively. For a more thorough discussion of these parameters, see the reviews by Ellison and Fearn39 and López and collaborators.40

$$\mathrm{Sens} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{Spec} = \frac{TN}{TN + FP} \tag{2}$$
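As an illustration of how the 0.5 threshold and equations 1 and 2 work together, the sketch below classifies predicted y values and counts TP, FN, TN and FP. The predicted values are hypothetical, and the sketch is a simplification: it thresholds the predicted value only and ignores the estimated deviations that were also taken into account in this work.

```python
def classify(predicted, threshold=0.5):
    """Assign class 1 (corn) when the predicted y exceeds the threshold, else 0 (soybean)."""
    return [1 if value > threshold else 0 for value in predicted]


def sensitivity_specificity(true_labels, assigned_labels, positive=1):
    """Return (Sens, Spec) as fractions, following equations 1 and 2."""
    tp = fn = tn = fp = 0
    for truth, assigned in zip(true_labels, assigned_labels):
        if truth == positive:
            if assigned == positive:
                tp += 1
            else:
                fn += 1
        else:
            if assigned == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: four corn (1) and four soybean (0) test blends.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_y = [0.93, 0.71, 1.08, 0.55, 0.12, 0.62, -0.05, 0.31]

assigned = classify(predicted_y)
sens_corn, spec_corn = sensitivity_specificity(true_labels, assigned, positive=1)
sens_soy, spec_soy = sensitivity_specificity(true_labels, assigned, positive=0)
```

In a two-class problem the sensitivity for one class equals the specificity for the other, because a soybean sample assigned to corn is a false negative for soybean and a false positive for corn. In this toy case the single misassigned soybean blend gives 100 % sensitivity and 75 % specificity for corn, and the reverse for soybean.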

The test-set results obtained by PLS-DA, with their respective deviations, are shown in Figure 8.

Figure 8. Discrimination by PLS-DA model: a) SET A; b) SET B; c) SET C.

The PLS-DA model obtained 100 % sensitivity in identifying corn biodiesel blends for SET A, SET B and SET C; however, because some soybean biodiesel samples were assigned to the other group, the sensitivity for the soybean blends was lower than 100 %. The specificity of the model for corn blends in SETs B and C was 75 % and 87.5 %, respectively, which reflects the false positives identified in the soybean groups. However, the specificity for soybean blends was 100 % for all concentration ranges. Lastly, all samples were classified by a single PLS-DA model, which required 10 as the optimal number of factors. The threshold value was 0.5, and Figure 9 shows the samples with their respective estimated deviations.

Figure 9. Full range classification by PLS-DA model: a) Calibration samples; b) Test samples

The PLS-DA model obtained 98.4 % sensitivity in identifying corn biodiesel blends over the whole range; however, it showed only 60.4 % sensitivity for the soybean blends. The model specificity was 60.4 % and 98.4 % for corn and soybean samples, respectively; the low specificity for corn reflects the false positives from the soybean groups that cross the threshold limit. The complete PLS-DA model lost efficiency compared with the SET-by-SET classification, especially for the soybean samples. Therefore, optimization is required for classification purposes. To identify the variables that contribute most to the modeling capability of the PLS-DA model, the loading weights of the four main factors are shown in Figure 10.

Figure 10. Loading weights of the four main factors of the PLS-DA single calibration: a) Factor 1; b) Factor 2; c) Factor 3; d) Factor 4.

The principal variables of the infrared spectra that contribute to the model lie in the region assigned to the carbonyl stretching vibration (C=O) between 1800 and 1650 cm-1, as observed in factors 1, 2 and 3 (Figure 10a, Figure 10b and Figure 10c). To a lesser extent, the fingerprint region and the stretching of the C-H bonds are also important. In the fourth factor (Figure 10d), the spectral region between 1500 and 1350 cm-1, corresponding to the asymmetric (~1450 cm-1) and symmetric (~1380 cm-1) bending of CH3 and the scissor bending vibration of CH2, becomes the most important for the discrimination between the classes.29 Therefore, the PLS-DA was capable of identifying almost all corn blends; however, the specificity of the models was compromised by the lower sensitivity for the soybean samples.

4.2.6. Support Vector Machines Classification (SVM). For the classification based on Support Vector Machines (SVM), a single model was built with two classes, corresponding to the soybean and corn fuel blends. The software “The Unscrambler X 10.4” uses a modified SVM algorithm licensed by Hsu and coworkers.41 For the SVM analysis, the spectral ranges used were 3699 cm-1 to 2974 cm-1 and 1841 cm-1 to 925 cm-1.18 The spectra were preprocessed by smoothing and first-order derivative using the Savitzky-Golay algorithm with an 11-point window and a polynomial order of 2; the NIPALS algorithm was used with mean-centered data, without rotation and cross-validation. The model was based on the radial basis function kernel (RBF kernel) and, to find the best combination of the parameters C (capacity factor) and γ (gamma), a grid search was performed. Several simulations were run with different values of C and γ, obtained from the cross-product of both factors (each ranging between 10⁻² and 10²), followed by analysis of the training and validation accuracy in each case.
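The grid search over C and γ can be outlined as below. This is a sketch under assumptions: the number of grid steps is not stated in the text, so five logarithmically spaced values per parameter are assumed, and `evaluate` stands for a caller-supplied routine that trains a C-SVM and returns its validation accuracy. The `toy_score` function is a hypothetical placeholder, not an SVM.

```python
import math
from itertools import product


def log_grid(low_exp=-2, high_exp=2, steps=5):
    """Logarithmically spaced values from 10**low_exp to 10**high_exp."""
    step = (high_exp - low_exp) / (steps - 1)
    return [10 ** (low_exp + k * step) for k in range(steps)]


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def grid_search(evaluate, c_values, gamma_values):
    """Return (best_score, C, gamma) over the cross-product of both grids.

    `evaluate(C, gamma)` is supplied by the caller (hypothetical here) and
    is expected to return the validation accuracy of a trained C-SVM.
    """
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def toy_score(c, gamma):
    # Placeholder surface peaking at C = 10, gamma = 1; not a real SVM.
    return -abs(math.log10(c) - 1.0) - abs(math.log10(gamma))


grid = log_grid()  # 0.01, 0.1, 1, 10, 100 (up to floating-point rounding)
best = grid_search(toy_score, grid, grid)
```

The kernel function shows the role of γ: larger values make the similarity between two spectra decay faster with their distance, so the decision boundary becomes more local. A coarse logarithmic grid is usually refined around the best cell, which would be consistent with non-round final parameter values.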

The best-performing combination was selected and used to develop the C-SVM model. Table 4 shows the parameters used and the resulting efficiency based on cross-validation of the calibration blends.

Table 4. Parameters obtained from cross validation by C-SVM

Table 5 shows the confusion matrix resulting from the classification of the validation samples by the SVM model. The classes to which the samples belong are arranged horizontally, and the classes to which the samples were assigned are arranged vertically. After the model was applied to all validation samples, the results were subdivided into their respective classes (SET A, SET B and SET C) for better visualization.

Table 5. Confusion matrix and classification efficiency obtained by SVM
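The classification efficiency values of Table 5 follow directly from the confusion-matrix counts through equations 1 and 2. The short sketch below recomputes them from the counts listed in Table 5; the dictionary layout and function names are illustrative choices, not part of the original work.

```python
# Counts from Table 5, stored as confusion[SET][true class][assigned class].
confusion = {
    "SET A": {"soybean": {"soybean": 23, "corn": 1}, "corn": {"soybean": 0, "corn": 24}},
    "SET B": {"soybean": {"soybean": 14, "corn": 4}, "corn": {"soybean": 6, "corn": 12}},
    "SET C": {"soybean": {"soybean": 17, "corn": 3}, "corn": {"soybean": 5, "corn": 16}},
}


def rates(matrix, positive, negative):
    """Sensitivity and specificity (in %) of `positive` against `negative`."""
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    tn = matrix[negative][negative]
    fp = matrix[negative][positive]
    return round(100 * tp / (tp + fn), 1), round(100 * tn / (tn + fp), 1)


def pooled(matrices):
    """Sum the counts of several confusion matrices cell by cell."""
    matrices = list(matrices)
    classes = ("soybean", "corn")
    return {t: {a: sum(m[t][a] for m in matrices) for a in classes} for t in classes}


total = pooled(confusion.values())
soybean_total = rates(total, "soybean", "corn")  # (Sens, Spec) for soybean
corn_total = rates(total, "corn", "soybean")     # (Sens, Spec) for corn
```

Pooling the three SETs gives 54 of 62 soybean blends and 52 of 63 corn blends correctly assigned, which corresponds to the overall sensitivities of 87.1 % and 82.5 % reported for the C-SVM.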

The C-SVM correctly classified more than 80 % of the samples, with a sensitivity of 87.1 % for the soybean blends and 82.5 % for the corn blends. The model showed a low ability to recognize patterns in the samples containing 10 % to 30 % of biodiesel (SET B). However, the convenience of processing all samples in a single model, together with the percentage of correctly classified samples, recommends deeper investigation of C-SVM for fuel blend classification.

4.2.7. Comparison of multivariate data evaluation methods. The sensitivity and specificity values were used to evaluate the global efficiency of each model and to compare the multivariate techniques (Table 6). The SIMCA and PLS-DA results correspond to the SET-by-SET classification, with the results of the individual SETs added together, whereas the *PLS-DA and C-SVM results correspond to the single-model performance.

Table 6. Comparison of the supervised multivariate classification methods

Considering only the numerical values, the best model for the approach developed in this work is the PLS-DA; however, the model lost efficiency when all the data were used at the same time. On the other hand, with a single model, the C-SVM showed great pattern recognition ability and consistent results, as reflected by the similarity between the sensitivity and specificity values for the corn and soybean fuel blends.

5. CONCLUSIONS
With many countries recently increasing the percentage of biodiesel in fuel blends, the development of techniques for tracing and identifying the different raw materials used in biodiesel production has gained importance. This paper discusses the application of five multivariate statistical techniques (PLS-DA, PCA, SIMCA, HCA and C-SVM) to the discriminant analysis and cluster analysis of diesel/biodiesel fuel blends in the range of 0 % to 100 % (v/v) using only FTIR. The multivariate models were robust in the discriminant analysis of the raw materials used for biodiesel production. All models discussed in this work were able to classify samples containing up to 10 % biodiesel, a range found in practically all countries that use fuel blends. The results presented in this work indicate the C-SVM as the best multivariate model for the proposed application because of its robustness and its ability to classify all data in a single calibration. Deeper investigation of C-SVM for fuel blend classification is required, using biodiesel from other sources and larger sample sets. In addition, it is important to highlight the possibility of combining supplementary information with the infrared spectra to develop a more robust multivariate classification model, mainly for blends with a higher biodiesel content (30-100 % v/v).

AUTHOR INFORMATION
Corresponding Author
* Tel: +55 51 3320-3689. E-mail: [email protected]

ACKNOWLEDGMENT
The authors thank REFAP for donating the diesel fuel samples, the Institute of Petroleum and Natural Resources (IPR) of the Pontifical Catholic University of Rio Grande do Sul for the infrastructure, and the National Council for Scientific and Technological Development (CNPq) for the research scholarships.

ABBREVIATIONS
FTIR, Fourier Transform Infrared Spectroscopy; HATR, Horizontal Attenuated Total Reflectance; ASTM, American Society for Testing and Materials; PLS-R, Partial Least Squares Regression; PLS-DA, Partial Least Squares Discriminant Analysis; PCA, Principal Component Analysis; SVM, Support Vector Machine; SIMCA, Soft Independent Modeling of Class Analogy; HCA, Hierarchical Clustering Analysis; C-SVM, C-Support Vector Classification; ANN, Artificial Neural Network.

REFERENCES
(1) Issariyakul, T.; Dalai, A. K. Renew. Sustain. Energy Rev. 2014, 31, 446–471.
(2) Mofijur, M.; Masjuki, H. H.; Kalam, M. A.; Ashrafur Rahman, S. M.; Mahmudul, H. M. Renew. Sustain. Energy Rev. 2015, 46, 51–61.
(3) ASTM Standard D6751. West Conshohocken, PA ASTM Int. 1991, . 2015, 1–11.
(4) Borugadda, V. B.; Goud, V. V. Renew. Sustain. Energy Rev. 2012, 16 (7), 4763–4784.
(5) Agência Nacional do Petróleo Gás Natural e Biocombustíveis - ANP. 2016, Fevereiro, 1–13.
(6) Bergmann, J. .; Tupinambá, D. .; Costa, O. Y. .; Almeida, J. R. .; Barreto, C. .; Quirino, B. . Renew. Sustain. Energy Rev. 2013, 21, 411–420.
(7) Geraldes Castanheira, É.; Grisoli, R.; Freire, F.; Pecora, V.; Coelho, S. T. Energy Policy 2014, 65, 680–691.
(8) Monteiro, M. R.; Ambrozin, A. R. P.; Lião, L. M.; Ferreira, A. G. Fuel 2009, 88 (4), 691–696.
(9) Monteiro, M. R.; Ambrozin, A. R. P.; Silva Santos, M. da; Boffo, E. F.; Pereira-Filho, E. R.; Lião, L. M.; Ferreira, A. G. Talanta 2009, 78 (3), 660–664.
(10) Caires, A. R. L.; Lima, V. S.; Oliveira, S. L. Renew. Energy 2012, 46, 137–140.
(11) Idoeta, R.; Pérez, E.; Herranz, M.; Legarda, F. Appl. Radiat. Isot. 2014, 93, 110–113.
(12) Norton, G. A.; Cline, A. M.; Thompson, G. C. Fuel 2012, 96, 284–290.
(13) Reddy, C. M.; DeMello, J. A.; Carmichael, C. A.; Peacock, E. E.; Xu, L.; Arey, J. S. Environ. Sci. Technol. 2008, 42 (7), 2476–2482.
(14) Silva, M. P. da F.; Brito, L. R. e; Honorato, F. A.; Paim, A. P. S.; Pasquini, C.; Pimentel, M. F. Fuel 2014, 116, 151–157.
(15) Bona, M.; Andres, J. Talanta 2007, 72 (4), 1423–1431.
(16) Balabin, R. M.; Safieva, R. Z. Fuel 2008, 87 (7), 1096–1101.
(17) Fernanda Pimentel, M.; Ribeiro, G. M. G. S.; da Cruz, R. S.; Stragevitch, L.; Pacheco Filho, J. G. A.; Teixeira, L. S. G. Microchem. J. 2006, 82 (2), 201–206.
(18) Oliveira, J. S.; Montalvão, R.; Daher, L.; Suarez, P. A. Z.; Rubim, J. C. Talanta 2006, 69 (5), 1278–1284.
(19) Gontijo, L. C.; Guimarães, E.; Mitsutake, H.; Santana, F. B. De; Santos, D. Q.; Borges Neto, W. Fuel 2014, 117 (Part B), 1111–1114.
(20) Mazivila, S. J.; Gontijo, L. C.; Santana, F. B. de; Mitsutake, H.; Santos, D. Q.; Borges Neto, W. Energy & Fuels 2015, 29 (1), 227–232.

(21) Ferrão, M. F.; Viera, M. D. S.; Pazos, R. E. P.; Fachini, D.; Gerbase, A. E.; Marder, L. Fuel 2011, 90 (2), 701–706.
(22) Ruschel, C. F. C.; Huang, C. Te; Samios, D.; Ferrão, M. F.; Yamamoto, C. I.; Plocharski, R. C. B. J. Am. Oil Chem. Soc. 2015, 92 (3), 309–315.
(23) ASTM Standard D7806. West Conshohocken, PA ASTM Int. 1991, . 2012, 1–8.
(24) ASTM Standard D7861. West Conshohocken, PA ASTM Int. 1991, . 2015, 1–7.
(25) ASTM Standard D7371. West Conshohocken, PA ASTM Int. 1991, . 2014, 1–10.
(26) Mustafa, Z.; Surchev, S.; Milina, R.; Sotirov, S. Pet. Coal 2015, 57 (1), 40–47.
(27) Hocevar, L.; Soares, V. R. B.; Oliveira, F. S.; Korn, M. G. A.; Teixeira, L. S. G. J. Am. Oil Chem. Soc. 2012, 89 (5), 781–786.
(28) Mueller, D.; Ferrão, M.; Marder, L.; Costa, A. da; Schneider, R. de C. de S. Sensors 2013, 13 (4), 4258–4271.
(29) Mazivila, S. J.; Santana, F. B. de; Mitsutake, H.; Gontijo, L. C.; Santos, D. Q.; Neto, W. B. Fuel 2015, 142, 222–226.
(30) Ruschel, C. F. C.; Huang, C. Te; Samios, D.; Ferrão, M. F. Quim. Nova 2014, 37 (5), 810–815.
(31) Rocha, W. F. C.; Vaz, B. G.; Sarmanho, G. F.; Leal, L. H. C.; Nogueira, R.; Silva, V. F.; Borges, C. N. Anal. Lett. 2012, 45 (16), 2398–2411.
(32) Balabin, R. M.; Safieva, R. Z.; Lomakina, E. I. Microchem. J. 2011, 98 (1), 121–128.
(33) Amendolia, S. R.; Cossu, G.; Ganadu, M. L.; Golosio, B.; Masala, G. L.; Mura, G. M. Chemom. Intell. Lab. Syst. 2003, 69 (1-2), 13–20.
(34) Chih-Wei Hsu, Chih-Chung Chang, and C.-J. L. BJU Int. 2008, 101 (1), 1396–1400.
(35) Pereira, E.; Santos, L. M. dos; Einloft, S.; Seferin, M.; Dullius, J. Waste and Biomass Valorization 2015, 6 (3), 343–351.
(36) Sabudak, T.; Yildiz, M. Waste Manag. 2010, 30 (5), 799–803.
(37) EN Standard 14103. Eur. Comm. Stand. Manag. Cent. 2011.
(38) ASTM Standard E2056. West Conshohocken, PA ASTM Int. 1991, . 2010, 1–10.
(39) Ellison, S. L. R.; Fearn, T. TrAC Trends Anal. Chem. 2005, 24 (6), 468–476.
(40) López, M. I.; Callao, M. P.; Ruisánchez, I. Anal. Chim. Acta 2015, 891, 62–72.
(41) Chang, Chih-Chung; Lin, C.-J. ACM Trans. Intell. Syst. Technol. 2011, 2 (3), 27:1–27:27.

Table 1. Blends used for construction and validation of multivariate model

Calibration Set ᵃ            Validation Set ᵇ
ID        % Biodiesel        ID        % Biodiesel
B0        0                  B0.75     0.75
B0.25     0.25               B1.5      1.5
B0.5      0.5                B3        3
B1        1                  B4.5      4.5
B2.5      2.5                B6        6
B5        5                  B7        7
B7.5      7.5                B8        8
B10       10                 B9        9
B12.5     12.5               B11       11
B15       15                 B13       13
B17       17                 B16       16
B20       20                 B19       19
B25       25                 B22.5     22.5
B30       30                 B27.5     27.5
B50       50                 B40       40
B70       70                 B55       55
B75       75                 B67.5     67.5
B80       80                 B77.5     77.5
B90       90                 B85       85
B95       95                 B92       92
B97       97                 B98       98
B99       99
B99.8     99.8
B100      100

ᵃ Recommendation of ASTM D7371.25 ᵇ 21 validation blends, the minimum necessary as recommended by ASTM E2056.38

Table 2. Classification results obtained by SIMCA

Class    Raw Material   PCs   Correctly Classified   Incorrectly Classified
SET A    CORN           2     87.5 %                 12.5 %
SET B    CORN           2     80 %                   20 %
SET C    CORN           2     100 %                  0 %
SET A    SOYBEAN        2     100 %                  0 %
SET B    SOYBEAN        2     60 %                   40 %
SET C    SOYBEAN        2     60 %                   40 %

Table 3. Classification parameters obtained by the PLS-DA

Soybean Blends
PLS-DA          SET A     SET B     SET C     SC
R²              0.9998    0.9999    0.9996    0.9721
Factors         5         6         6         10
EV % (X/y) ᵃ    99.02     99.13     99.42     97.21
RMSEC           0.0490    0.022     0.3313    0.1036
RMSEP           0.3372    0.5671    3.5824    0.0835
Sens (%) ᵇ      100       66.7      76.5      60.4
Spec (%) ᶜ      100       100       100       98.4
TV ᵈ            0.5       0.5       0.5       0.5

Corn Blends
PLS-DA          SET A     SET B     SET C     SC
R²              0.9999    1         0.9993    0.9721
Factors         5         6         6         10
EV % (X/y) ᵃ    99.02     99.13     99.42     97.21
RMSEC           0.0194    0.0005    0.4463    0.1036
RMSEP           0.1315    0.4810    2.9798    0.0835
Sens (%) ᵇ      100       100       100       98.4
Spec (%) ᶜ      100       66.7      76.5      60.4
TV ᵈ            0.5       0.5       0.5       0.5

ᵃ EV, explained variance; ᵇ Sens, sensitivity; ᶜ Spec, specificity; ᵈ TV, threshold value; SC, single calibration.

Table 4. Parameters obtained from cross validation by C-SVM

C-SVM Parameters        FTIR
Classes                 2
Training Accuracy       92.37 %
Validation Accuracy     89.31 %
SVM Type                C-SVM
C (capacity factor)     59.95
γ (gamma)               4.6116
Kernel type             Radial basis function
Transformation          Mean Centered

Table 5. Confusion matrix and classification efficiency obtained by SVM

Confusion matrix C-SVM
Samples    Class      SET A   SET B   SET C   Total
Soybean    Soybean    23      14      17      54
Soybean    Corn       1       4       3       8
Corn       Soybean    0       6       5       11
Corn       Corn       24      12      16      52

Classification efficiency
Samples    Parameters   SET A   SET B   SET C   Total
Soybean    Sens (%)     95.8    77.8    85.0    87.1
Soybean    Spec (%)     100     66.7    76.2    82.5
Corn       Sens (%)     100     66.7    76.2    82.5
Corn       Spec (%)     95.8    77.8    85.0    87.1

Sens, sensitivity; Spec, specificity.

Table 6. Comparison of the supervised multivariate classification methods

Classification Efficiency
Samples    Parameters   SIMCA   PLS-DA   *PLS-DA   C-SVM
Soybean    Sens (%)     75.8    81.3     60.4      87.1
Soybean    Spec (%)     89.2    100      98.4      82.5
Corn       Sens (%)     89.2    100      98.4      82.5
Corn       Spec (%)     75.8    81.3     60.4      87.5

Sens, sensitivity; Spec, specificity; *PLS-DA, single-calibration PLS-DA.

Figure 1. PCA analysis of the raw materials for the production of biodiesel and diesel S10 and S500: a) PCA of all components; b) PCA of biodiesel; c) PCA of diesel.

Figure 2. PCA analysis: a) Loading of all components; b) Loading of biodiesel; c) Loading of diesel.

Figure 3. Displacement of the first derivative FTIR spectra of the carbonyl (1760-1730 cm-1) of FAMEs.

Figure 4. Biodiesel/diesel FTIR spectra.

Figure 5. PCA analysis of corn and soybean blends.

Figure 6. PCA analysis of corn x soy blend: a) SET A; b) SET B; c) SET C; d) All samples.

Figure 7. Dendrograms of HCA of corn x soy blend: a) SET A; b) SET B; c) SET C.

Figure 8. Discrimination by PLS-DA model: a) SET A; b) SET B; c) SET C.

Figure 9. Full range classification by PLS-DA model: a) Calibration samples; b) Test samples.

Figure 10. Loading weights of the four main factors of the PLS-DA single calibration: a) Factor 1; b) Factor 2; c) Factor 3; d) Factor 4.
