Jul 26, 2016
Article pubs.acs.org/EF

Prediction of Kinematic Viscosity and Density of Biodiesel Using Electrospray Ionization Mass Spectrometry by Multivariate Statistical Models

Rodrigo V. Leal,*,†,§ Gabriel F. Sarmanho,† Luiz H. Leal,† Fernanda A. Silva,† Alex P. Barbosa,‡ and Peter R. Seidl§

†Organic Analysis Laboratory and ‡Fluids Laboratory, National Institute of Metrology Quality and Technology (INMETRO), Rio de Janeiro 20261-232, Brazil
§School of Chemistry, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-900, Brazil

Supporting Information

ABSTRACT: By Brazilian law, biodiesel has to satisfy certain quality requirements, with measurements established by standardized procedures, as is the case for kinematic viscosity and density. In this respect, information on the profile of methyl esters in biodiesel is very important because this profile is directly related to both parameters. The objective of this study was to determine the profile of methyl esters present in a biodiesel sample by electrospray ionization mass spectrometry and to evaluate its reliability in predicting the sample's kinematic viscosity and density. Two multivariate statistical models were used for this purpose: multiple multivariate linear regression (MMLR) and partial least squares regression (PLSR). The input variables used in the models were the relative intensities of the main methyl ester peaks, and the models were compared by their predictive behavior. Samples were randomly divided into two parts: 87% in the training or calibration set, used for the estimation of the MMLR and PLSR models, and the remaining 13% in the test or validation set, used to evaluate the predictive power of each estimated model. Although the root mean squared error and R² for the MMLR model were slightly better than those of the PLSR model (R²(PLSR) = 0.9232 and R²(MMLR) = 0.9908 for kinematic viscosity; R²(PLSR) = 0.8721 and R²(MMLR) = 0.9415 for density), both showed similar predicted values for the training and validation sets, and thus similar performance statistics, attesting to the quality of these models in predicting kinematic viscosity and density. Furthermore, the prediction of kinematic viscosity showed better performance than that of density.

1. INTRODUCTION

Research on biodiesel as an alternative source of energy has been growing because of changes associated with gas emissions and sustainable development and because it is obtained from renewable sources such as oilseed plants. Presently the use of diesel in Brazil is regulated by law¹ as a 7% blend of biodiesel in diesel. Pure biodiesel must satisfy quality requirements established by resolutions of the National Agency of Petroleum, Natural Gas and Biofuels, including acceptance limits and procedures set by technical standards. Biodiesel consists predominantly of alkyl esters of fatty acids obtained by transesterification of triacylglycerols. A good understanding of the profile of these esters is important because it is directly related to some physicochemical properties such as density, kinematic viscosity, iodine value, cold filter plugging point, and oxidative stability. Gas chromatography (GC) techniques are commonly used for the analysis of this profile and quantification of the corresponding esters.

The development of instrumental analytical methods leads to a large body of information on a sample; in the same way, the improvement of statistical techniques allows the analysis of large amounts of data. In this context, electrospray ionization mass spectrometry (ESI-MS) using direct infusion has been increasingly used to identify chemical profiles and natural markers, mainly with nonvolatile molecules. Despite the strength of the technique in the identification of chemical compounds, its use for quantification has not yet been explored very much.²,³ There are two ionization modes in electrospray, and the choice between them depends on the nature of the molecule to be analyzed. In biodiesel samples, the alkyl esters can be identified in the positive mode, ESI(+), and free fatty acids in the negative mode, ESI(−). Furthermore, the positive mode can be used to check the efficiency of the transesterification reaction through the analysis of unreacted mono-, di-, and triacylglycerols.

Electrospray ionization is considered a soft ionization technique because the ions that are generated usually have a low amount of internal energy.⁴ Thus, little or no fragmentation is usually observed in the mass spectra. In this kind of ionization, the solution containing the sample is subjected to an electrolytic spray under atmospheric pressure. A fine spray is formed in the presence of a high electric field, and drops with excess charge (positive or negative) are formed. The evaporation of the solvent, due to the action of the nebulizer gas, decreases the size of these drops and consequently increases the electrostatic repulsion between charges. The surface tension of the droplets becomes progressively weaker until the droplets break apart.

Received: April 13, 2016 Revised: July 5, 2016


DOI: 10.1021/acs.energyfuels.6b00890 Energy Fuels XXXX, XXX, XXX−XXX


Energy & Fuels

The combination of analytical techniques and statistical models provides a wide variety of information from the sample. In this study, the performance of ESI-MS by direct infusion in predicting two physicochemical properties of biodiesel, density and kinematic viscosity, was evaluated. This task was carried out by extracting the relative intensities of methyl esters from the spectra and then using multivariate statistics to construct prediction models, specifically multiple multivariate linear regression (MMLR) and partial least squares regression (PLSR).

In the literature there are models to predict some biodiesel properties. Meira et al. (2012)⁵ used fluorescence spectroscopy to predict the kinematic viscosity and density of biodiesel/diesel blends by using PLSR. Meng et al. (2014)⁶ applied an artificial neural network (ANN) to predict the kinematic viscosity of biodiesel with data collected from the literature, using mass fractions of methyl esters. Tong et al. (2011)⁷ developed a relationship between the composition of fatty acid methyl esters and cetane number (CN). The CN data were collected from the literature for different types of biodiesel, and regression analysis was performed to establish a correlation equation. However, there are only a few studies using ESI-MS data to analyze chemical properties using statistical techniques. Prates et al. (2010)⁸ combined multivariate calibration and ESI-MS to quantify the biodiesel content of soybean/tallow blends with diesel. Alves et al. (2014)⁹ applied PLS to spectra obtained by ESI-MS to determine adulteration of extra virgin olive oil with four adulterant oils (soybean, corn, sunflower, and rapeseed).

The objective of this work was the development of an appropriate statistical model to estimate two properties of interest in biodiesel samples, the kinematic viscosity and the density, through the relative intensities of methyl esters obtained by direct infusion ESI-MS. The MMLR and PLSR models were fitted to the data, and once a model was estimated, predictions could be made for new biodiesel samples. This paper is organized as follows: experimental details and part of the models' theory are described in the Experimental Section. Statistical analyses of the biodiesel data are in Results and Discussion, including (i) exploratory analysis from graphs and descriptive measurements in order to observe possible behavior patterns or other anomalies in the data set and (ii) inferential analysis, where the MMLR and PLSR models are estimated and compared by their predictive performance. The final remarks and discussion are in Conclusions.

2. EXPERIMENTAL SECTION

In order to ensure a better distribution of methyl esters and therefore achieve greater variability of kinematic viscosities and densities, samples of biodiesel from different origins were synthesized, as well as samples of their respective blends. In this process, 38 samples of biodiesel were obtained, and the independent variables were measured as relative intensities by ESI-MS for 11 representative methyl esters: C10:0, C12:0, C14:0, C16:0, C16:1, C18:0, C18:1, C18:1:OH, C18:2, C18:3, and C22:1. Here Cn:db is the general abbreviation, where C is carbon, n is the number of carbons in the carboxylic acid chain of the ester, and db is the number of double bonds present in that chain (e.g., C18:1 is methyl cis-9-octadecenoate, or methyl oleate). Only methyl ricinoleate has a hydroxyl (OH) group, which is indicated after db.

2.1. Samples and Reagents. All reagents used were analytical grade, and the water was type 1. Commercial vegetable oils were used as raw material samples. For biodiesel synthesis, the following were used: potassium hydroxide solution (0.5 mol·L⁻¹) in anhydrous methanol, as a transesterification agent; a solution of 20 g of ammonium chloride, 600 mL of anhydrous methanol, and 30 mL of concentrated sulfuric acid, as an esterification agent for free fatty acids; saturated aqueous sodium chloride solution, to avoid emulsions; and hexane, as an extraction agent. For ESI-MS analysis, the following were used: toluene, formic acid solution in methanol (0.1%), and aqueous sodium chloride solution (0.1 N).

2.2. Biodiesel Synthesis. Syntheses were based on the alkaline transesterification reaction. In each run, 300 mL of potassium hydroxide solution was added to 100 mL of each vegetable oil sample under stirring for 10 min at 60 °C. After the sample was cooled, 150 mL of esterification agent was added under stirring for 10 min more at 60 °C. Then 150 mL of sodium chloride solution and 200 mL of hexane were added. After phase separation, the upper phase was removed and the solvent evaporated, leaving only biodiesel.

2.3. ESI-MS Analysis. Sample preparation by diluting the biodiesel in two stages proved to be very effective. In the first stage, 30 μL of biodiesel was dissolved in 1 mL of toluene; in the second stage, 10 μL of the first solution was dissolved in 1 mL of formic acid solution in methanol, which is usually used for analysis in positive mode ESI. The addition of sodium chloride solution was needed to induce only sodium adducts [M + Na]+, thus eliminating the other possible forms of the same species, e.g., protonated [M + H]+, ammonium adducts [M + NH4]+, and potassium adducts [M + K]+. Relative intensities were obtained by direct infusion into a Xevo-TQ mass spectrometer (Waters) consisting of an ESI source and quadrupole analyzers. The ionization source parameters were set as follows: capillary voltage, 3000 V; sampling cone voltage, 30 V; desolvation gas flow, 300 L/h; collision energy, 1 V; extraction cone voltage, 7.5 V; and desolvation and source temperatures, 40 °C. The biodiesel samples were analyzed in the positive ionization mode, ESI(+), for detecting methyl esters in the form of sodium adducts [M + Na]+. The spectra were acquired in the m/z range between 100 and 500, thus covering all possible types of esters in the biodiesel.

2.4. Density and Kinematic Viscosity Analysis. The density and kinematic viscosity measurements, both at 40 °C, were performed with a Stabinger SVM 3000 digital viscometer (Anton-Paar).¹⁰

2.5. Multivariate Statistical Analysis. Given the data structure and the objective of the study, an appropriate statistical technique is regression analysis, which explains and quantifies a set of variables of interest (also called dependent), here denoted by $Y = (Y_1, \ldots, Y_q)$, from a set of regression variables (also called independent), here denoted by $X = (X_1, \ldots, X_p)$, based on a sample of size $n$. In this study, $q = 2$, with $Y_1$ = density (g/cm³) and $Y_2$ = kinematic viscosity (mm²/s) as dependent variables, and $p = 11$, with $X_i$ being one of the 11 relative intensities of methyl esters (%) as independent variables.

2.6. Multiple Multivariate Linear Regression. This is the natural approach when both X and Y are quantitative and multivariate. It can be viewed as an extension of multiple linear regression (MLR)¹¹,¹² to the case where Y has more than one variable simultaneously associated with the same regressors, X. The model can be described by the matrix equation $Y = XB + E$, wherein $X$ is the $n \times (p + 1)$ matrix of independent variables juxtaposed to a column vector whose elements are all equal to 1; $B$ is the $(p + 1) \times q$ matrix containing the unknown model parameters; and $E$ is the $n \times q$ matrix of random errors, with mean vector 0 ($q \times 1$) and covariance matrix $\Sigma$, usually unknown. The rows of $E$ (observations) are assumed to be independent, while the error terms associated with different responses (columns) can be correlated. Similarly to the MLR model, it is possible to estimate the parameter matrix $B$ via the least-squares or the maximum likelihood method, by using the equation $b = (X^{T}X)^{-1}X^{T}Y$. The estimated model can be evaluated by multivariate analysis of variance, and the significance of the parameters is commonly assessed by a multivariate statistical test based on the Pillai statistic. This statistic uses the eigenvalues of a matrix that is a function of the decomposition of the matrix of sums of squares and cross products of the regression; it has no known distribution, but tabulated values are available.¹³

2.7. Partial Least Squares Regression. PLSR¹⁴ is a very common technique in the chemometric literature, being used when the basic assumptions of MMLR are violated, for example, by strong linear correlation between independent variables (multicollinearity), a small sample size when compared to the number of independent variables, or missing values, among others. PLSR is similar to principal component regression (PCR), where the response variables Y are explained by some latent variables (scores or components), which are linear combinations of the independent variables, X.¹⁵ Using components as regressors reduces the dimension of the problem considerably while still explaining the variability. The main difference is that in PLSR the components are built to account for the variability of both X and Y, by maximizing the covariance between the scores and Y. Therefore, in PLSR the leading components have maximum correlation with the response variables, which are used in defining the scores and loadings. In general, PLSR requires fewer components than PCR to reach the same prediction error. The usual way¹⁶ to determine the number of components is through cross validation (CV), whose core is the evaluation of the predictive capacity of the model through sequential partitioning of the data set. At each step one part is used to fit the model and another for prediction and calculation of statistics. In the end, models with different numbers of components are compared. Because there is no analytical solution for PLSR parameter estimation, many algorithms have been developed and are available in the literature. In all of them the scores of X and Y are also estimated, allowing interpretation of correlations and other statistics.

2.8. Statistical Software. All statistical analyses were performed using the R statistical software,¹⁷ an open source free environment for statistical computing and graph creation. MMLR and PLSR models were built using the packages car¹⁸ and pls,¹⁹ respectively. The confidence level of all tests was 95%.
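As an illustration of the least-squares estimate b = (XᵀX)⁻¹XᵀY described in Section 2.6, the following minimal sketch solves the normal equations for a tiny made-up data set with two regressors and two responses. The data, the helper names (`transpose`, `matmul`, `solve`, `fit_mmlr`), and the use of plain Python lists are assumptions for illustration only; the analysis in this work was carried out in R, and this is not a reproduction of it.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Perfectly collinear regressors make X^T X non-invertible.
            raise ValueError("X^T X is singular; regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mmlr(x_rows, y_rows):
    """Estimate the (p + 1) x q coefficient matrix b = (X^T X)^-1 X^T Y."""
    x = [[1.0] + list(row) for row in x_rows]  # column of ones for the intercept
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y_rows))


# Hypothetical data: n = 5 samples, p = 2 regressors, q = 2 responses,
# generated without noise from a known coefficient matrix.
true_b = [[1.0, 0.5], [2.0, -1.0], [0.5, 3.0]]
x_rows = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
y_rows = [
    [true_b[0][j] + true_b[1][j] * r[0] + true_b[2][j] * r[1] for j in range(2)]
    for r in x_rows
]
b = fit_mmlr(x_rows, y_rows)
```

Because the example responses are exact linear functions of the regressors, `b` recovers `true_b` up to rounding. With strongly correlated regressors XᵀX becomes ill-conditioned, and with exactly collinear ones the singularity check in `solve` fails, which is one way to see why multicollinearity is a concern for MMLR.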

3. RESULTS AND DISCUSSION

3.1. Data Set. All 38 biodiesel samples were analyzed in triplicate by mass spectrometry, varying only the injections of the samples, so that the variability of the instrument was taken into account. Analogously, the measurements of kinematic viscosity and density were performed in replicates. The averages of the replicates were used to compose both the X and Y matrices required for estimation of the statistical models. Figure 1 shows the same biodiesel (soy) sample from two perspectives. The upper panel presents the mass spectrum in terms of intensity relative to the highest peak, as reported by ESI-MS, and the lower panel presents the "mean spectrum" obtained after averaging the absolute intensities of the replicates and converting them to intensities relative to the total of the peaks present in the sample, which represents the composition of each ester in the sample more closely. Samples were randomly divided into two parts: 33 (approximately 87%) in the training or calibration group, used for the estimation of the MMLR and PLSR models, and the remaining 5 (approximately 13%) in the test or validation group, used to evaluate the predictive power of each estimated model.

3.2. Exploratory Data Analysis. To determine possible general features, such as correlations, trends, or anomalous behavior, an exploratory statistical analysis was performed with all measured variables in the biodiesel samples. Table 1 shows univariate statistics of both target and regressor variables. The highest mean relative intensities are those of esters C18:2 (28.63%), C18:1 (19.67%), and C18:1:OH (15.55%). The median values of C18:2 (24.37%) and C18:1 (17.39%) confirm their high intensities in the samples, because they are very close to their mean values. In contrast, the median of C18:1:OH is zero, which means that some samples have very high intensities of this ester while many others have none. This can be explained by the fact that only some of the samples prepared as biodiesel blends contain castor oil, which is the only source of methyl ricinoleate (C18:1:OH).
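The univariate measures reported in Table 1 (mean, median, min, max, SD, and CV = SD/mean) can be sketched as follows. The intensity values are hypothetical and were chosen only to mimic an ester that is absent from most samples, as described for C18:1:OH; the function name `describe` and the use of the sample standard deviation (n − 1 denominator) are assumptions, since the text does not state which estimator was used.

```python
import statistics


def describe(values):
    """Return the descriptive measures used in Table 1 for one variable."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
    }


# Hypothetical relative intensities (%) of an ester present in only two samples.
intensities = [0.0, 0.0, 0.0, 0.0, 0.0, 20.0, 80.0]
stats = describe(intensities)

# Consistency check using values reported in Table 1 for kinematic viscosity.
cv_viscosity = round(2.7296 / 4.5436, 4)
```

In this toy case the median is zero while the mean is pulled upward by two large values, the same pattern noted above for C18:1:OH. The last line simply confirms that the reported SD and mean for kinematic viscosity reproduce the reported CV of 0.6008.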

Figure 1. (Upper panel) Biodiesel spectrum measured by ESI-MS; (lower panel) mean spectrum after conversion to intensities relative to the total of the peaks present in the sample.
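The two normalizations contrasted in Figure 1 can be sketched as below: intensities relative to the highest (base) peak, and the "mean spectrum" obtained by averaging absolute intensities over replicate injections and expressing each peak as a percentage of the total. The peak labels and intensity values are hypothetical, and the function names are illustrative; only the averaging and normalization steps come from the text.

```python
def mean_absolute(replicates):
    """Average absolute intensities over replicates (dicts of peak -> intensity)."""
    n = len(replicates)
    return {peak: sum(r[peak] for r in replicates) / n for peak in replicates[0]}


def relative_to_total(spectrum):
    """Express each peak as a percentage of the summed intensity of all peaks."""
    total = sum(spectrum.values())
    return {peak: 100.0 * value / total for peak, value in spectrum.items()}


def relative_to_base_peak(spectrum):
    """Express each peak as a percentage of the most intense peak."""
    base = max(spectrum.values())
    return {peak: 100.0 * value / base for peak, value in spectrum.items()}


# Hypothetical triplicate injections of one sample (absolute intensities).
replicates = [
    {"C16:0": 1.0e6, "C18:1": 2.0e6, "C18:2": 5.0e6, "C18:3": 2.0e6},
    {"C16:0": 1.1e6, "C18:1": 2.1e6, "C18:2": 5.2e6, "C18:3": 2.1e6},
    {"C16:0": 0.9e6, "C18:1": 1.9e6, "C18:2": 4.8e6, "C18:3": 1.9e6},
]
mean_spectrum = mean_absolute(replicates)
composition = relative_to_total(mean_spectrum)    # lower-panel style
base_scaled = relative_to_base_peak(mean_spectrum)  # upper-panel style
```

With these numbers the total-normalized values sum to 100%, so they can be read as an approximate composition, whereas the base-peak scaling always assigns 100% to the largest peak regardless of how much of the sample it represents.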

Table 1. Descriptive Statistics for the Target (Y) and the Regressor (X) Variable Sets of All Biodiesel Samples

variable                       mean      median    min      max       SD        CV
target or dependent variables
kinematic viscosity (mm²/s)    4.5436    3.7736    1.7733   13.0480   2.7296    0.6008
density (g/cm³)                0.8562    0.8535    0.8171   0.9044    0.0229    0.0267
regressors or independent variables (esters)
C10:0 (%)                      0.2338    0.1564    0.0172   1.0980    0.2226    0.9521
C12:0 (%)                      3.0442    0.0000    0.0000   21.4342   6.0986    2.0033
C14:0 (%)                      2.5160    0.2235    0.0000   16.9574   4.7375    1.8830
C16:0 (%)                      3.0383    1.7778    0.0637   11.1529   3.2732    1.0773
C16:1 (%)                      0.0742    0.0579    0.0032   0.2069    0.0517    0.6960
C18:0 (%)                      0.8843    0.0000    0.0000   5.3444    1.5946    1.8033
C18:1 (%)                      19.6664   17.3906   0.5305   54.1286   12.7861   0.6502
C18:1:OH (%)                   15.5517   0.0000    0.0000   79.9791   26.3517   1.6945
C18:2 (%)                      28.6296   24.3744   1.2206   62.1665   19.2591   0.6727
C18:3 (%)                      5.1949    4.3932    0.1702   14.1166   4.0826    0.7859
C22:1 (%)                      3.9933    0.6374    0.0130   32.7713   7.7504    1.9409

Figure 2. Box plots of the variables: targets (kinematic viscosity and density) and regressors (relative intensities of the methyl esters).

Figure 3. Correlation matrix between the target and regressor variables. In the upper triangle are the values of the Pearson correlation coefficient, and in the lower triangle is a heat map representing the respective correlation intensity, as described in the legend.

The high standard deviation values for C18:2 (19.25%), C18:1 (12.78%), C18:1:OH (26.35%), and C22:1 (7.75%) indicate high variability of the intensities among the samples. This might be explained by the intentional preparation of the blend samples, which distributed the different esters randomly and thus provided a large range of kinematic viscosity and density. The box plots in Figure 2 illustrate the behavior of the empirical values of all variables. Furthermore, C12:0, C14:0, and C16:0 have close values, which explains the similar appearance of their box plots; this was expected because these three saturated esters are typically found in the same proportions in certain types of samples, for example, in biodiesel from coconut. Visually, the variable "density" has a roughly symmetric distribution, with little variability relative to the mean, as noted by its coefficient of variation (0.0267) in Table 1. The variable "kinematic viscosity" is asymmetric, skewed to the right, with median 3.77 mm²/s and a higher mean, 4.54 mm²/s. This suggests the influence of higher values, as seen in the box plot, where four samples were found to be discrepant in relation to the others, explaining the high coefficient of variation (0.60) when compared to the variable density.

Before fitting the regression models, it is important to evaluate the possible correlations between all variables, dependent and independent. Figure 3 shows the Pearson correlation coefficients in a mixed graphical representation of a correlation matrix and a heat map. There is a strong linear correlation (0.9) between kinematic viscosity and density. This fact is important to justify the use of multivariate models such as MMLR for joint modeling of the target variables. In the opposite scenario, low correlation would suggest the adoption of univariate independent models for each one of them. Among the regression variables X, many pairs had strong linear correlation: C10:0 and C12:0 (0.81), C10:0 and C14:0 (0.81), C10:0 and C16:0 (0.75), C12:0 and C14:0 (near 1), C12:0 and C16:0 (0.93), and C14:0 and C16:0 (0.93). The heat map in Figure 3 allows easy viewing of the small cluster formed by these four esters and also between them and C18:0, with stronger colors at the intersection points. There are also negative linear correlations: C18:1 and C18:1:OH (−0.64) and C18:2 and C18:1:OH (−0.54). Thus, multicollinearity is evidently present in the matrix of regressor variables; this can be a problem in the estimation of MMLR but does not affect the PLSR model. There are high positive linear correlations between C18:1:OH and the target variables: kinematic viscosity (0.82) and density (0.73). It was the only regressor variable with a high positive correlation, and this fact can be explained by the highest kinematic viscosity values occurring for biodiesel samples from castor bean (mamona), the only raw material that contains methyl ricinoleate. Among those with negative linear correlation with the target variables, five esters (C10:0, C12:0, C14:0, C16:0, and C18:1) had values in the range (−0.68, −0.43).

3.3. Multivariate Inferential Analysis. After the biodiesel data were described, the MMLR and PLSR models were fitted to the training set in order to describe a linear relationship between the target variables (kinematic viscosity and density of the biodiesel) and the relative intensities of the esters measured by ESI-MS. At first, the MMLR model was fitted to the training data with all independent variables as regressors. The results of the hypothesis tests in this "general" or full model, by means of a multivariate analysis of variance (MANOVA) with type II sums of squares, are shown in Table 2. Considering the results for the Pillai statistic, only four esters (C12:0, C14:0, C18:1:OH, and C18:2) were statistically significant for simultaneously explaining the variability of kinematic viscosity and density, since the respective p-values of the parameters' F-tests are smaller than the predetermined significance level (5%). This might have happened because of the multicollinearity in the X regressor matrix; therefore, many variables should be eliminated to avoid redundancy. For this reason, a second model was fitted, considering only the four significant variables in the previous model as


size (here, k = 10), and in each of the k steps of the process, k − 1 subsets are used to fit the PLSR model and the remaining subset is used to predict new values and calculate prediction statistics, such as the root mean squared error (RMSE) and the coefficient of determination (R²). In Figure 4, the average RMSE and R² values (over the k CV steps) are compared according to the number of components used in the PLSR model.
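The k-fold procedure described above can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the PLSR model; this substitution, the synthetic noise-free data, and all function names are assumptions made only for illustration. The sketch shows the partitioning into k subsets, the fit on k − 1 of them, and the averaging of RMSE and R² over the k steps.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares intercept and slope (stand-in for the PLSR fit)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def rmse(obs, pred):
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination computed on the held-out fold."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=10):
    """Return the RMSE and R2 averaged over the k cross-validation steps."""
    folds = kfold_indices(len(xs), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in held_out]
        pred = [intercept + slope * xs[i] for i in held_out]
        scores.append((rmse(obs, pred), r2(obs, pred)))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k


# Synthetic, noise-free data with the same size as the training set (n = 33).
xs = [float(i) for i in range(33)]
ys = [1.5 + 0.25 * x for x in xs]
folds = kfold_indices(len(xs), 10)
mean_rmse, mean_r2 = cross_validate(xs, ys, k=10)
```

In an actual component-selection run, `cross_validate` would be repeated for each candidate number of components and the averaged statistics compared; with real, noisy data the per-fold R² on three or four held-out samples can vary considerably.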

Table 2. MANOVA for the General MMLR Model, the One with All Esters as Predictor Variables

variable    Pillai statistic   F value   DF num   DF den
intercept   0.99527            2106.14   2        20
C10:0       0.11217            1.26      2        20
C12:0       0.39067            6.41      2        20
C14:0       0.40387            6.77      2        20
C16:0       0.07871            0.85      2        20
C16:1       0.03026            0.31      2        20
C18:0       0.12389            1.41      2        20
C18:1       0.22782            2.95      2        20
C18:1:OH    0.29105            4.11      2        20
C18:2       0.42223            7.31      2        20
C18:3       0.0448             0.47      2        20
C22:1       0.22814            2.96      2        20

P-value