Sulfur Determination in Brazilian Petroleum Fractions by Mid-infrared


Article

Sulfur determination in Brazilian petroleum fractions by MIR and NIR using PLS associated to variable selection methods (iPLS, siPLS, UVE and GA)
Julia Tristao do Carmo Rocha, Lize Mirela S. L. Oliveira, Julio Cesar Magalhaes Dias, Ulysses Brandao Pinto, Maria de Lourdes S. P. Marques, Betina Pires de Oliveira, Paulo Roberto Filgueiras, Eustaquio Vinicius Ribeiro Castro, and Marcone Augusto Leal de Oliveira
Energy Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.5b02463 • Publication Date (Web): 30 Dec 2015 • Downloaded from http://pubs.acs.org on December 31, 2015

Energy & Fuels is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 30

Energy & Fuels

Sulfur determination in Brazilian petroleum fractions by MIR and NIR using PLS associated with variable selection methods (iPLS, siPLS, UVE and GA)

Julia T. C. Rocha*,†,‡, Lize M. S. L. Oliveira§, Julio C. M. Dias§, Ulysses B. Pinto§, Maria de Lourdes S. P. Marques§, Betina P. Oliveira‡, Paulo R. Filgueiras‡, Eustáquio V. R. Castro‡, Marcone A. L. de Oliveira†

† Grupo de Química Analítica e Quimiometria (GQAQ), Department of Chemistry, Federal University of Juiz de Fora, 36036-900, Juiz de Fora, MG, Brazil.

‡ Laboratory of Research and Development of Methodologies for the Analysis of Oils (LabPetro), Department of Chemistry, Federal University of Espírito Santo, Avenida Fernando Ferrari, 514, Goiabeiras, 29075-910, Vitória, Espírito Santo, Brazil.

§ CENPES/PETROBRAS, Av. Horacio Macedo 950, University City-RJ 21941-598, Brazil.

KEYWORDS Near-infrared spectroscopy; Mid-infrared spectroscopy; Petroleum derivatives; Sulfur content; Variable selection

ACS Paragon Plus Environment

ABSTRACT Chemometric tools (PLS) with variable selection (iPLS, siPLS, UVE and GA), combined with near-infrared (NIR) and mid-infrared (MIR) spectroscopy, were applied to determine the sulfur content (wt %) of petroleum derivatives. The models were evaluated using three criteria: the coefficient of determination (R2), the plot of predicted versus measured values, and the cross-validation and prediction errors. The models developed in this work gave satisfactory results. They demonstrate that both NIR and MIR, combined with chemometric tools, can be used to determine the sulfur content of petroleum fractions, with LOD and LOQ values as low as 0.0234 and 0.0781 wt %, respectively. All variable selection techniques improved the predictive ability of the model compared with the full-spectrum model. The exception was UVE, which performed better only when applied to MIR data.

INTRODUCTION

Quantitative analytical chemistry has been greatly impacted by the development of multivariate calibration techniques. These techniques make it possible to estimate a property of interest from other measurements, commonly spectra. Such measurements are obtained by analytical procedures that are simpler, faster, less costly and less dependent on sample size.1 Multivariate calibration methods treat complex data from a mathematical and statistical point of view by correlating instrumental measurements with a corresponding property of interest.2,3 In this respect, chemometrics combined with spectroscopy (near- and mid-infrared in particular) has shown potential as a tool for analytical chemistry. It provides alternative methods for characterizing and evaluating the physical and chemical properties of petroleum and its derivatives with high precision, reliability and speed.4-10 Such methods have increasingly been applied to petroleum and its derivatives, since these products are generally highly complex and their physicochemical characterization requires considerable effort. In this context, the results of certain analyses are sometimes needed urgently, and decision-making is hampered by the way the analyses are performed.11-14 There is thus a need for faster, simpler, more economical and more reliable techniques with lower environmental impact for determining these properties. The sulfur content, for example, is an important parameter for characterizing petroleum and its products. Besides being toxic, most sulfur compounds cause problems during the handling, transportation and use of the products. These include pipeline corrosion, emission of greenhouse gases during fuel combustion and reduced catalytic efficiency in refineries.15

An upper limit for the sulfur content of transportation fuels such as gasoline and diesel is set by environmental legislation around the world, and the currently allowed value is around 10 ppm.16 The products obtained from petroleum in refineries, as well as the residues, undergo a desulfurization process. The recovered sulfur is used in various industries, such as the fertilizer, cosmetic, pharmaceutical and paper industries.17 Currently, X-ray fluorescence (ASTM D4294) and ultraviolet fluorescence (ASTM D5453) spectrometry are the reference methods for quantifying the sulfur content of petroleum derivatives.18,19 Although these methods are reliable, the X-ray fluorescence method in particular is very laborious and requires vacuum. Alternative approaches, such as multivariate calibration methods, are therefore attractive for determining the sulfur content of petroleum derivatives. Several successful studies have already been published that combine spectroscopic methods with chemometric techniques to determine this property in petroleum and some of its derivatives, especially diesel fuel.20-24 Partial least squares (PLS) is the most commonly used regression method for building multivariate calibration models from first-order data. It does not require accurate knowledge of all the components present in the samples. It can also make predictions in the presence of interfering compounds, provided these are also present when the model is built.25,26 Multivariate calibration models can be built using the information in the entire spectrum, correlating it with the property of interest. However, the entire spectral range contains a large number of variables, and some of them can interfere with model building and slow down data processing. Therefore, to improve the performance of multivariate calibration techniques,
appropriate procedures have been used to select the spectral regions associated with the property of interest.26,27 Several methods for selecting spectral regions have been described in the literature. They usually improve model performance significantly compared with full-spectrum calibration. These variable selection methods choose specific spectral regions (a single wavelength or a set of wavelengths) in which collinearity is less pronounced. The resulting models can be more stable, more robust and simpler to interpret. In practice, the idea is to identify the subset of variables that yields the lowest prediction errors.25,26 The existing variable selection methods differ in the procedure used to select the spectral regions. Methods currently in use include interval partial least squares (iPLS),25,27 elimination of uninformative variables in partial least squares (UVE-PLS)28 and the genetic algorithm (GA).29 Variable selection allows the elimination of unimportant information, such as infrared bands that carry no information about the properties being analyzed or whose signal is dominated by noise.25 A previous study compared the efficiency of these variable selection techniques, combined with near-infrared spectra, for the analysis of intact tablets.30 Its author emphasized that the results were valid only for that specific data set, and that further measurements and investigations were needed before any general conclusion could be drawn about the variable selection methods. In this context, the main aim of this work is to apply chemometric tools (PLS) with variable selection, in particular iPLS, siPLS (synergy interval partial least squares), UVE and GA, combined with
spectroscopy in the mid-infrared (MIR) and near-infrared (NIR) regions, to determine the sulfur content (wt %) of petroleum derivatives.

EXPERIMENTAL

In this study, 101 petroleum fractions were used. They were obtained by distilling nine different Brazilian crude oils with API (American Petroleum Institute) gravity ranging from 12.3 to 57.7. The crude oils were distilled between 15 °C and 500 °C according to the standard methods ASTM D 2892,31 and ASTM D 5236.32

Determination of the sulfur content. The sulfur content was determined according to the corresponding standard methods. ASTM D 5453,19 was used for the light fractions, and ASTM D 4294,18 for the medium and heavy fractions. The experimental sulfur content ranged from 0.00005 to 0.94 wt %.

Instrumentation. The NIR spectra were acquired on a Thermo Fisher Nicolet 380 spectrometer equipped with a DTGS KBr detector, an XT-KBr beam splitter and a white-light source. Each spectrum was the average of 64 successive scans at a resolution of 8 cm-1 over the range 9,200 to 3,500 cm-1. The MIR spectra were acquired on an ABB Bomem FTLA2000-102 spectrometer fitted with a horizontal ATR accessory from Pike Technologies (ZnSe crystal, 45°, 80 mm long, 10 mm wide, 4 mm thick, 10 reflections). Each spectrum was the average of 32 consecutive scans at a resolution of 4 cm-1 over the range 4,000 to 630 cm-1.

Chemometric approach. The PLS multivariate regression method was used in this work to build predictive models of the sulfur content of petroleum fractions. The PLS
method uses both the analytical responses and the property of interest. Through their successive and simultaneous decomposition, it captures the variance of the data matrix X and of the vector y containing the property of interest, and correlates the two.2,33 The PLS model is thus obtained through an iterative process with two simultaneous goals. It optimizes the projection of the samples onto the loadings, which determines the scores. It also optimizes the fit of a linear function relating the scores of the matrix X to the scores of the vector y, summed over h latent variables (LVs), so as to minimize the deviations. The PLS model can be defined through outer relationships, which describe X and y individually (Equations 1 and 2), and an inner relationship, which correlates the two.2

$$X = TP^{T} + E_X = \sum_{i=1}^{h} t_i p_i^{T} + E_X \qquad (1)$$

$$y = Uq^{T} + E_y = \sum_{i=1}^{h} u_i q_i^{T} + E_y \qquad (2)$$

where X is the data matrix (instrumental measurements) and y is the response vector (containing the property of interest). T and U are the scores of X and y, respectively, and the elements of P and q are the loadings. EX and Ey are the residuals, and h is the number of LVs.

Variable selection. The variable selection methods iPLS, siPLS, UVE and GA were tested and their performances compared. The aim was to detect variables that could negatively affect model building or slow down data processing.

iPLS and siPLS. The iPLS method is an extension of PLS that builds local models on equidistant intervals of the full spectral region. Its main purpose is to provide relevant information on different subdivisions of the global spectrum, in order to remove the spectral regions whose variables appear to be less relevant
and/or interfering. A new PLS model is then built from the selected variables.27 In the iPLS method, successive partial least squares regressions are performed on equal-width subintervals of the entire spectrum. The spectrum is divided into as many parts as desired. Successive iterations are made until an optimal subinterval is found, that is, a spectral region with prediction errors smaller than those of the global model. This assessment also considers other factors, such as the coefficient of determination (R2) and the slope and offset of the plot of predicted versus measured values.27 Anomalous samples and/or anomalous PLS predictions should generally be removed before iPLS is applied. The resulting models are evaluated in the same way as a conventional PLS model. The method is intended to give an overview of the data, and it can be useful for selecting the most significant variables when building a suitable calibration model. However, although it indicates the region in which the information is contained, it evaluates each interval on its own. It therefore does not exploit synergy between the spectral regions involved.34 siPLS, an extension of iPLS, can be used to select combinations of subintervals with better predictive ability. It searches for the best combination of two, three or four subregions of the spectrum, generally providing a higher R2 and smaller prediction errors than iPLS.

UVE-PLS. In the UVE-PLS method, the algorithm detects uninformative variables and eliminates them from a PLS model. The criterion used to distinguish informative from uninformative variables is the reliability of the regression coefficients b, obtained by leave-one-out cross-validation. Each sample is left out in turn, and a PLS model is built with the remaining samples and used to predict the left-out sample. The n resulting vectors of regression coefficients bij (i = 1, ..., n) are then used to obtain estimates of the mean regression coefficient bj and of
the standard deviation of the vector of n coefficients bij, s(bj).35 The reliability criterion tj, computed for each variable j, indicates which variables can be removed. Variables whose reliability is small compared with a cut-off value are considered uninformative. The criterion is calculated with Equation 3:

$$t_j = \frac{b_j}{s(b_j)}, \qquad j = 1, \ldots, 2p \qquad (3)$$

where bj and s(bj) are the mean and standard deviation of the regression coefficients, respectively, and p is the number of variables in the instrumental responses. However, the standard deviations s(bj) of the regression coefficients b cannot be estimated directly. A matrix of random variables (simulating noise) must be appended artificially to the experimental data.35 To implement UVE-PLS, the algorithm first creates a matrix of random numbers in [0, 1] with the same dimensions as the data matrix X. These numbers are then multiplied by a small constant (e.g., 10^-10), so that they are at most of the same order of magnitude as the instrumental noise. This scaling retains the variation of the random variables but makes their influence on the model negligible.28 The new matrix is appended to the original matrix X, forming an extended matrix with twice as many variables as the original. PLS models are then built by leave-one-out cross-validation, leaving out one sample at a time. This yields a matrix of regression coefficients b with n rows (one per left-out sample) and one column for each variable, original and random. The t values are calculated with Equation 3. Finally, a cut-off limit (Equation 4) is set from the largest absolute t value obtained for the random variables. All variables whose absolute t value is equal to or lower than this limit, that is, which fall inside the cut-off range, are eliminated from the final model. This means that all random
and original variables, which are considered to contain nothing but noise, are eliminated.28,35
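
The UVE-PLS procedure described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (they used the package provided by the UVE author28). The helper names `pls1_coefficients` and `uve_select` are hypothetical. Other choices are also assumptions: the NIPALS PLS1 algorithm used to obtain the regression vectors, the random seed, and the constant k = 1 in the cut-off of Equation 4.

```python
import numpy as np


def pls1_coefficients(X, y, n_lv):
    """Regression vector of a PLS1 model (NIPALS) for mean-centred X (n x p) and y (n,)."""
    X, y = X.copy(), y.copy()
    p = X.shape[1]
    W, P, q = np.zeros((p, n_lv)), np.zeros((p, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # scores
        tt = t @ t
        P[:, a] = X.T @ t / tt          # X loadings
        q[a] = y @ t / tt               # y loading
        W[:, a] = w
        X -= np.outer(t, P[:, a])       # deflate X
        y -= q[a] * t                   # deflate y
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P^T W)^-1 q


def uve_select(X, y, n_lv, scale=1e-10, k=1.0, seed=0):
    """Return a boolean mask of informative variables and all 2p t values (Eq. 3).

    Assumptions: k = 1 and a fixed seed; both are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    noise = rng.random((n, p)) * scale  # random numbers in [0, 1), scaled down
    Xe = np.hstack([X, noise])          # extended matrix with 2p variables
    B = np.empty((n, 2 * p))
    for i in range(n):                  # leave-one-out: drop sample i
        keep = np.arange(n) != i
        Xi, yi = Xe[keep], y[keep]
        B[i] = pls1_coefficients(Xi - Xi.mean(axis=0), yi - yi.mean(), n_lv)
    t = B.mean(axis=0) / B.std(axis=0, ddof=1)   # Equation 3
    cutoff = k * np.max(np.abs(t[p:]))           # Equation 4, random variables only
    return np.abs(t[:p]) > cutoff, t
```

The reliability ratio in Equation 3 does not depend on the scale of a variable. The random columns can therefore be shrunk by 10^-10 without changing their t values, which is why they serve as a noise reference for the cut-off.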

$$\text{cutoff} = k \cdot \max\left(\left|c_{\text{noise}}\right|\right) \qquad (4)$$

where k is an arbitrary constant, $c_{\text{noise}}$ are the reliability values (t) of the artificial variables, and $\max(|c_{\text{noise}}|)$ is their maximum absolute value. A new PLS model is then built from the variables selected by UVE, and its performance is compared with that of the global PLS model.

GA. The GA is a variable selection technique widely applied in chemometrics. The starting point for using a GA is a mathematical representation (coding) of the problem. After coding, the algorithm is initialized and iteratively searches for optimal points within the search domain.29,36 In vibrational spectroscopy, each spectrum is treated as a set of genes (wavenumbers, for example) arranged in a binary code. Each variable or wavenumber receives the binary code "1" or "0" (selected or not selected).29,30 The original chromosome is randomly perturbed to create the chromosomes of the initial population. Each chromosome is then evaluated by building a PLS model that uses only the variables encoded with "1". This model is assessed by cross-validation to obtain a value that describes its quality (the fitness of the chromosome). A new population, the "next generation", is obtained from the initial one by randomly crossing genetic material from different chromosomes. Two parent chromosomes are generally cut at randomly chosen points into two or three parts. These parts are exchanged and recombined to form two daughter chromosomes, which replace the parents in the new generation. A new evaluation is then performed, and the chromosomes with the highest
fitness have a greater probability of reproduction than those with the lowest fitness.29,30,37 Mutations can also be incorporated and are sometimes necessary to overcome problems in the population. They introduce new genetic information: without them, a variable that is not selected in any of the original chromosomes could never be selected in later generations. They also prevent the population from saturating with similar chromosomes (premature convergence). A mutation is simply the inversion of a gene in the chromosome (from 1 to 0 or vice versa). The mutation rate is usually fixed between 0.001 and 0.01.29,37,38 The algorithm is repeated until a termination condition based on a convergence criterion is fulfilled. It stops when a given percentage of the chromosomes is identical or when a given number of generations is reached.30,39 The variables of the fittest chromosome are then used to build the calibration model (GA-PLS). Its performance is compared with that of the PLS model built with the entire spectral range.

Treatment and preprocessing techniques. Before the chemometric tools PLS, iPLS and siPLS were applied to the data set, the standard normal variate (SNV)40 was tested as a preprocessing step. This technique aims to reduce multiplicative effects caused by variations in scattering between samples. The derivative with Savitzky-Golay smoothing41 was also tested, using a second-order polynomial and a 7-point window. It removes not only simple additive offsets but also first-order effects such as baseline drift. All data were also mean-centered before model building.
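
The two preprocessing steps can be sketched with NumPy as below. This is an illustrative sketch, not the authors' code (they worked in MATLAB). Several details are assumptions:

• The function names are hypothetical.
• The derivative order is left as a parameter because this passage does not state it.
• Edge points are dropped rather than extrapolated.
• The derivative is taken per data point rather than per cm-1.

```python
import math

import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def savgol_weights(window=7, polyorder=2, deriv=1):
    """Savitzky-Golay convolution weights for the derivative at the window centre."""
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns: z^0, z^1, ..., z^polyorder
    # Row `deriv` of the pseudo-inverse gives the least-squares coefficient c_deriv;
    # the derivative of the fitted polynomial at z = 0 is deriv! * c_deriv.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]


def savgol_derivative(X, window=7, polyorder=2, deriv=1):
    """Apply the SG derivative to each spectrum; each row loses window - 1 edge points."""
    h = savgol_weights(window, polyorder, deriv)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # np.convolve flips its kernel, so pass the reversed weights.
    return np.array([np.convolve(row, h[::-1], mode="valid") for row in X])
```

For a 7-point window and a second-order polynomial, the first-derivative weights reduce to the familiar (-3, -2, -1, 0, 1, 2, 3)/28. A first derivative removes constant offsets. A second derivative also removes linear baseline drift. Mean centering, the final step mentioned above, is simply `X - X.mean(axis=0)` on the preprocessed matrix.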

Similarly, before the variable selection techniques GA and UVE were applied, baseline correction was performed with adaptive iteratively reweighted penalized least squares (airPLS).42 airPLS is a simple, flexible and fast algorithm for estimating baselines in analytical chemistry, and it gives fast and accurate baseline-corrected signals for both simulated and real data. All data were then mean-centered before model building.

Model-building. Before model building itself, the available samples were divided into calibration and prediction sets. About two thirds of the samples (67) were assigned to the calibration set and the remaining third (34) to the prediction set (Table 1). To do so, the samples were sorted by ascending reference value and every third sample was assigned to the prediction set. After the calibration and prediction sets had been chosen, the global models (using the entire spectral range) were built by PLS for each preprocessing method tested.

Table 1. Calibration and prediction sample sets.

Sample set    Number of samples    Sulfur content range (wt %)
Calibration   67                   0.00005 - 0.94
Prediction    34                   0.00005 - 0.92
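
The splitting rule described above (sort by reference value, send every third sample to the prediction set) can be sketched as follows. It is an illustration, not the authors' script. The function name is hypothetical. Starting the count at the lowest reference value is an assumption, chosen because it reproduces the 67/34 split of Table 1 for 101 samples.

```python
import numpy as np


def split_every_third(y_ref):
    """Split sample indices into calibration and prediction sets.

    Samples are ordered by ascending reference value and every third one
    (starting with the lowest, an assumption) goes to the prediction set.
    """
    order = np.argsort(np.asarray(y_ref), kind="stable")
    is_pred = np.arange(order.size) % 3 == 0
    return np.sort(order[~is_pred]), np.sort(order[is_pred])
```

With 101 samples this rule yields 67 calibration and 34 prediction samples. The sample with the highest reference value ends up in the calibration set, which is consistent with the ranges in Table 1.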

The "leave-one-out 5-fold" cross-validation method was used to build all models and to optimize the number of LVs. A maximum of 10 LVs was set to avoid unnecessary calculations with many components. Calibration models with iPLS and siPLS variable selection were then built. The 1764 absorbances of the original MIR spectra were split into 6, 12, 18, 24, 30 and 36 intervals. These correspond to approximately 294, 147, 98, 74, 59 and 49 variables per interval, respectively. To check for synergy between intervals, siPLS was applied with combinations of up to 3 intervals. Similarly, the 2956 absorbances of the original NIR spectra were split into 10, 20, 30, 40, 50 and 60 intervals, corresponding to approximately 296, 148, 99, 74, 59 and 49 variables per interval, respectively. This way, the MIR and NIR intervals contained approximately the same number of original variables, which makes the models selected with the two techniques easier to compare. siPLS was also applied to the NIR data, again combining up to 3 intervals. The intervals (and interval combinations) with the best performance, judged by R2 and RMSEP, were selected to build the different models. The efficiency of these models was then compared with that of the global model. The most correlated intervals were selected and used to predict the external samples.

The variable selection methods GA and UVE were also applied to the spectral data. UVE was applied with a confidence level of 99%, and the regression coefficients were obtained by the leave-one-out 5-fold cross-validation method. GA variable selection was run under the following conditions:

• Population size: 400 chromosomes for MIR spectra and 600 for NIR spectra;
• Maximum number of generations: 200;
• Convergence: 80%;
• Mutation rate: 0.005 (0.5%);
• Maximum number of variables per gene: 5;
• Number of individuals in the initial population: 10%;
• Number of splits in which the chromosomes are divided: 2.
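
A compact genetic-algorithm sketch that follows the settings above is given below. It is illustrative only: the authors worked in MATLAB, and every function name here is hypothetical. The following choices are assumptions:

• Fitness is the 5-fold RMSECV of a NIPALS PLS1 model.
• Parents are drawn with rank-based probabilities.
• The best chromosome is carried over unchanged (elitism).
• "Variables per gene" is read as a window of consecutive variables switched on or off together.
• "2 splits" is read as two-point crossover.

The default population size and number of generations are kept small for a quick demonstration.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; return (b, x_mean, y_mean) so that y_hat = y_mean + (X - x_mean) @ b."""
    xm, ym = X.mean(axis=0), y.mean()
    Xd, yd = X - xm, y - ym
    n_lv = min(n_lv, X.shape[1], X.shape[0] - 1)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xd.T @ yd
        norm = np.linalg.norm(w)
        if norm < 1e-12:                # y already fully explained
            break
        w = w / norm
        t = Xd @ w
        tt = t @ t
        p_a, q_a = Xd.T @ t / tt, yd @ t / tt
        Xd, yd = Xd - np.outer(t, p_a), yd - q_a * t
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    if not q:
        return np.zeros(X.shape[1]), xm, ym
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), xm, ym


def rmsecv(X, y, n_lv, n_folds=5):
    """RMSECV with interleaved folds (sample i goes to fold i % n_folds)."""
    folds = np.arange(len(y)) % n_folds
    sq = 0.0
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        b, xm, ym = pls1_fit(X[tr], y[tr], n_lv)
        sq += np.sum((ym + (X[te] - xm) @ b - y[te]) ** 2)
    return np.sqrt(sq / len(y))


def ga_select(X, y, n_lv=3, pop_size=30, n_gen=20, window=5, init_frac=0.10,
              mut_rate=0.005, convergence=0.80, seed=0):
    """Return (boolean mask of selected variables, RMSECV of the best chromosome)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    n_genes = -(-p // window)                           # one gene per window of variables

    def to_mask(chrom):
        return np.repeat(chrom, window)[:p]

    def fitness(chrom):                                 # higher is better
        m = to_mask(chrom)
        return -rmsecv(X[:, m], y, n_lv) if m.any() else -np.inf

    pop = rng.random((pop_size, n_genes)) < init_frac   # sparse random start
    empty = ~pop.any(axis=1)                            # give empty chromosomes one gene
    pop[empty, rng.integers(n_genes, size=empty.sum())] = True
    for _ in range(n_gen):
        fit = np.array([fitness(c) for c in pop])
        pop = pop[np.argsort(fit)[::-1]]                # best chromosome first
        if np.mean(np.all(pop == pop[0], axis=1)) >= convergence:
            break                                       # population has converged
        prob = np.arange(pop_size, 0, -1) / (pop_size * (pop_size + 1) / 2)
        children = [pop[0].copy()]                      # elitism: keep the best
        while len(children) < pop_size:
            i, j = rng.choice(pop_size, size=2, replace=False, p=prob)
            c1, c2 = np.sort(rng.choice(np.arange(1, n_genes), size=2, replace=False))
            a, b = pop[i].copy(), pop[j].copy()
            a[c1:c2], b[c1:c2] = pop[j][c1:c2], pop[i][c1:c2]  # two-point crossover
            children.extend([a, b])
        pop = np.array(children[:pop_size])
        pop[1:] ^= rng.random(pop[1:].shape) < mut_rate # mutation: flip genes
    fit = np.array([fitness(c) for c in pop])
    return to_mask(pop[np.argmax(fit)]), -fit.max()
```

The GA was run 100 times, so in practice one would call `ga_select` with different seeds and keep the variables that are selected most often. The two-point crossover assumes at least three genes.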

The GA was executed 100 times to reduce the probability of selecting uninformative variables. When the GA is run repeatedly, important variables tend to be selected again and again, whereas less important variables are selected at random. Finally, the variables selected by UVE and GA were used to build PLS models, whose performance was compared with that of the global model. The variable selection methods were then compared in terms of the improvement each brought over the global model.

Model evaluation. The models were evaluated using three criteria: the coefficient of determination (R2), the plot of predicted versus measured values, and the cross-validation and prediction errors. R2 ranges from 0 to 1, and the closer it is to 1, the better the data fit the line. A minimum R2 of 0.8 was therefore adopted in this work, with lower values taken to indicate poor predictive quality. The efficiency of the multivariate calibration models was evaluated by calculating three types of root mean square error (RMSE). The first is the root mean square error of calibration (RMSEC), defined in Equation 5, where N is the number of samples and LV is the number of latent variables:43

$$\text{RMSEC} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_{\text{predicted},i} - y_{\text{reference},i}\right)^2}{N - LV - 1}} \qquad (5)$$

where ypredicted and yreference are the sulfur contents obtained with the multivariate model and with the reference standard method, respectively. The root mean square error (RMSE), determined
in the cross-validation procedure (root mean square error of cross-validation, RMSECV) and in the prediction procedure (root mean square error of prediction, RMSEP), was calculated according to Equation 6:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_{\text{predicted},i} - y_{\text{reference},i}\right)^2}{N - 1}} \qquad (6)$$
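
Equations 5 and 6 translate directly into code. The sketch below uses NumPy, and the function names are illustrative. It keeps the denominators exactly as written above; some references divide by N instead of N - 1 when computing RMSECV and RMSEP.

```python
import numpy as np


def rmsec(y_predicted, y_reference, n_lv):
    """Equation 5: calibration error corrected for the number of latent variables."""
    r = np.asarray(y_predicted, dtype=float) - np.asarray(y_reference, dtype=float)
    return np.sqrt(np.sum(r ** 2) / (r.size - n_lv - 1))


def rmse(y_predicted, y_reference):
    """Equation 6: used for RMSECV (cross-validated predictions) and RMSEP (prediction set)."""
    r = np.asarray(y_predicted, dtype=float) - np.asarray(y_reference, dtype=float)
    return np.sqrt(np.sum(r ** 2) / (r.size - 1))
```

For example, five predictions that are each off by 0.01 wt % give, with Equation 6, an RMSE of sqrt(5 × 0.0001 / 4), or about 0.0112 wt %.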

It was also required that the RMSECV and RMSEP values not differ significantly. These parameters indicate the magnitude of the errors associated with the results. The chosen models were subjected to statistical tests. A bias test was applied to check for systematic errors, and a permutation test was used to evaluate trends in the model residuals.44 When any of these tests indicated a significant error, the model was re-evaluated with a different number of LVs. All these tests were computed with algorithms based on ASTM E1655,45 using a significance level of 0.05 set beforehand. Finally, the limit of detection (LOD) and limit of quantification (LOQ) were calculated according to Olivieri et al.46 The models were then evaluated, and those with the best predictive capacity were selected for each variable selection method and each spectroscopic technique; the MIR and NIR results were then compared. Data processing and model construction, validation and evaluation were carried out in MATLAB version 7.8 (The MathWorks, http://www.mathworks.com, USA). MATLAB was used with the "itoolbox" package, version 8, developed by Lars Norgaard,47 and the PLS Toolbox package.48 For UVE, the package provided by its author was also used.28
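
The bias test can be illustrated with the t statistic commonly used for this purpose in ASTM E1655-style validation. It compares the mean prediction residual (the bias) with its standard error. The sketch below is an illustration under assumptions, not the authors' algorithm, and the function name is hypothetical. The two-tailed critical value of Student's t (n - 1 degrees of freedom, α = 0.05) is passed in, so the example needs only NumPy.

```python
import math

import numpy as np


def bias_t_test(y_predicted, y_reference, t_crit):
    """Return (t statistic, significant?) for the mean residual of a validation set.

    t_crit is the two-tailed critical value of Student's t with n - 1 degrees
    of freedom (supplied by the caller, e.g. from a statistical table).
    """
    e = np.asarray(y_predicted, dtype=float) - np.asarray(y_reference, dtype=float)
    n = e.size
    bias = e.mean()
    sd = math.sqrt(np.sum((e - bias) ** 2) / (n - 1))  # standard deviation of the residuals
    t_stat = abs(bias) * math.sqrt(n) / sd
    return t_stat, t_stat > t_crit
```

Take the 34 prediction samples of Table 1 (33 degrees of freedom): the two-tailed 5% critical value is about 2.03, and a larger t statistic would flag a systematic error.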

RESULTS AND DISCUSSION

Figure 1 shows the NIR and MIR spectra obtained for all samples. The MIR spectra show more noise and baseline shift than the NIR spectra, which show relevant noise only in the region between approximately 4,400 and 3,500 cm-1.


Figure 1. (a) NIR and (b) MIR spectra of the petroleum fractions.

Table 2 shows the calibration and prediction R2 and RMSE values for the models built by PLS using all the variables and with the following variable selection methods: iPLS, siPLS, UVE and GA. Regarding the statistical tests, none of the selected models showed systematic errors or trends at the 5% significance level. For the iPLS and siPLS methods, SNV and the derivative with Savitzky-Golay smoothing were tested as preprocessing, and the best models were obtained with the derivative for both NIR and MIR data. In contrast, among the PLS models without variable selection, the model built from the spectra without any treatment other than mean centering performed best for both MIR and NIR.
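The preprocessing options mentioned above can be sketched in plain Python. The 5-point window and quadratic polynomial used for the Savitzky-Golay first derivative are assumptions, since the settings used by the authors are not stated in this section; SNV works on each spectrum separately, whereas mean centering works on each variable across all samples. The function and constant names are hypothetical.

```python
import statistics

# Savitzky-Golay first-derivative weights for a 5-point window and a
# quadratic polynomial with unit point spacing (assumed settings).
SG_D1_WEIGHTS = (-2, -1, 0, 1, 2)
SG_D1_NORM = 10


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    divide by its own (n - 1) standard deviation."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def sg_first_derivative(spectrum):
    """Smoothed first derivative by convolution with SG_D1_WEIGHTS.
    The two points at each end are dropped rather than extrapolated."""
    half = len(SG_D1_WEIGHTS) // 2
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out.append(sum(w * x for w, x in zip(SG_D1_WEIGHTS, window)) / SG_D1_NORM)
    return out


def mean_center(spectra):
    """Subtract each variable's mean over all samples (column-wise)."""
    col_means = [statistics.mean(col) for col in zip(*spectra)]
    return [[x - m for x, m in zip(row, col_means)] for row in spectra]


# Usage: the derivative of a straight line of slope 1 is 1 everywhere
print(sg_first_derivative([0, 1, 2, 3, 4, 5, 6]))
```

Because the first derivative removes constant offsets, it is one common way to reduce the baseline shifts seen in Figure 1.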

Table 2. Calculated values for the validation of the selected models.


Spectroscopy  Model   LV number  R2 calibration  RMSEC(a)  R2 prediction  RMSEP(a)  LOD(a)  LOQ(a)
NIR           PLS     5          0.9291          0.0702    0.9136         0.0777    0.0448  0.1493
NIR           iPLS    8          0.9764          0.0418    0.9711         0.0419    0.0606  0.2021
NIR           siPLS   7          0.9720          0.0448    0.9734         0.0421    0.0445  0.1485
NIR           UVE     8          0.9452          0.0633    0.8837         0.0872    0.0432  0.1440
NIR           GA      8          0.9813          0.0370    0.9533         0.0527    0.0365  0.1215
MIR           PLS     7          0.9814          0.0359    0.9458         0.0582    0.0974  0.3248
MIR           iPLS    6          0.9825          0.0347    0.9593         0.0517    0.0471  0.1570
MIR           siPLS   6          0.9733          0.0425    0.9430         0.0621    0.0234  0.0781
MIR           UVE     4          0.9747          0.0407    0.9583         0.0521    0.0528  0.1762
MIR           GA      5          0.9789          0.0375    0.9492         0.0587    0.0467  0.1558

(a) Values in wt %.

The results show that the sulfur content was satisfactorily predicted by both MIR and NIR in all the presented models, with low RMSEP and high R2 values. However, when the UVE variable selection method was applied to the NIR spectra, the predictive ability for external samples was poorer than that of the other methods evaluated. The PLS model without variable selection in the MIR region gave satisfactory results, with errors of the same magnitude as those of the models with variable selection. In contrast, the PLS model without variable selection in the NIR region performed worse than the models with variable selection (except UVE), probably because including the noisy region (3,500 to 4,400 cm-1) in the model increased the errors. The iPLS and siPLS methods performed slightly better on NIR than on MIR data. The GA method produced consistent models with similar results for both MIR and NIR. For the MIR region, applying variable selection considerably decreased the LOD and LOQ values compared with the PLS model. The same was not observed for the NIR models, probably because the RMSEC values of these models varied widely. The LOD and LOQ values found were higher than the lowest value determined experimentally (0.00005)
probably because of the limited sensitivity of the laboratory technique in quantifying the information related to bonds involving sulfur. Although the siPLS model built from the MIR data did not yield better RMSE and R2 values than the iPLS model, it showed the lowest LOD and LOQ, so this model may also be useful and should not be discarded.

No paper was found in the literature in which the sulfur content was predicted in petroleum fractions covering the range used in this work. However, Breitkreitz et al.24 proposed a method for determining total sulfur in diesel fuel employing NIR spectroscopy, variable selection and multivariate calibration; the techniques they used included PLS and GA variable selection. Their RMSEP values are similar to those obtained in this work, which, as already mentioned, covered a wider range of distilled oil samples (between 15 and 500 °C). Similarly, Soares et al.22 presented a methodology for determining sulfur content, also only in diesel samples, using FTIR spectroscopic data combined with multivariate calibration by PLS. Their calibration models were built using the full spectrum and the stepwise variable selection method, and again the errors found are similar to those obtained in this work.

Regarding the computation time needed for one run of each algorithm, two algorithms required far more time than the others: GA and siPLS.

Figure 2 shows how much the variable selection methods improved the accuracy of the multivariate models. In general, variable selection was more effective on the NIR data (Figure 2a), for which the RMSEP decreased by nearly 50% with the iPLS and siPLS models. With the UVE model, there was an increase of approximately 10% in the
RMSEP value, whereas the GA model reduced it by 32%. For the MIR data the error reduction was smaller, with the RMSEP reaching approximately 90% of the global PLS error for the iPLS and UVE models (Figure 2b). An F test applied to these results showed that the iPLS, siPLS and GA models built with NIR data are statistically equivalent to one another but differ from the PLS and UVE models. For the models built with MIR data, the F test showed that all of them were equivalent.
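One reason siPLS takes longer per run can be seen by counting the local models it fits. In the sketch below, iPLS fits one model per interval, whereas siPLS fits one model for every combination of k intervals. The number of variables, the 20 intervals and the combination sizes are hypothetical settings, not those used by the authors, and the GA cost (which depends on population size and number of generations) is not modelled. Function names are hypothetical.

```python
from math import comb


def split_into_intervals(n_variables, n_intervals):
    """Contiguous, nearly equal (start, stop) index ranges, iPLS-style."""
    base, extra = divmod(n_variables, n_intervals)
    bounds, start = [], 0
    for k in range(n_intervals):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def models_per_run(n_intervals, k):
    """Local PLS models fitted in one run: one per interval for iPLS and
    one per combination of k intervals for siPLS."""
    return n_intervals, comb(n_intervals, k)


# Hypothetical setting: 1,000 spectral variables split into 20 intervals
intervals = split_into_intervals(1000, 20)
for k in (2, 3, 4):
    print(k, models_per_run(len(intervals), k))
```

With 20 intervals, siPLS fits 190, 1140 or 4845 models for combinations of 2, 3 or 4 intervals, against 20 for iPLS, and each of these models is usually cross-validated as well.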

[Figure 2 plot: y axis RMSEP/RMSEP(PLS) (%, 0 to 120); x axis Model (PLS, iPLS, siPLS, UVE, GA); panels (a) and (b).]

Figure 2. Bar graphs showing the RMSEP of each variable selection method as a percentage of the RMSEP of the full-spectrum PLS model, using (a) NIR and (b) MIR.
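The bars in Figure 2 can be recomputed from the RMSEP column of Table 2. The sketch below does this in plain Python; the dictionary is simply a transcription of the tabulated values, and `relative_rmsep` is a hypothetical helper name.

```python
# RMSEP values (wt %) transcribed from Table 2
RMSEP = {
    "NIR": {"PLS": 0.0777, "iPLS": 0.0419, "siPLS": 0.0421, "UVE": 0.0872, "GA": 0.0527},
    "MIR": {"PLS": 0.0582, "iPLS": 0.0517, "siPLS": 0.0621, "UVE": 0.0521, "GA": 0.0587},
}


def relative_rmsep(table):
    """Each model's RMSEP as a percentage of the full-spectrum PLS RMSEP."""
    return {
        technique: {
            model: round(100 * value / models["PLS"], 1)
            for model, value in models.items()
        }
        for technique, models in table.items()
    }


for technique, ratios in relative_rmsep(RMSEP).items():
    print(technique, ratios)
```

For NIR this gives 53.9% (iPLS), 54.2% (siPLS), 112.2% (UVE) and 67.8% (GA); for MIR it gives 88.8% (iPLS) and 89.5% (UVE), in line with the reductions quoted in the text.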

When the NIR results are compared with those obtained by Abrahamsson et al.30, who compared the efficiency of these variable selection techniques applied to near-infrared spectra of intact tablets, the percentage reduction in RMSEP relative to the global PLS error appears to be larger for petroleum derivatives than for intact tablets. In addition, they obtained the best results by applying GA to the NIR data, whereas for petroleum derivatives the best result was obtained with iPLS.


Figures 3 and 4 show the regions from which each model was built. For both MIR and NIR, the siPLS models were created simply by adding new regions to the iPLS models. The variables selected by UVE and GA are similar, although GA selected a larger number of variables than UVE. A closer look at the regions in which each model performed best shows that, for sulfur prediction by iPLS in the NIR region, the best performance was obtained in a region containing sulfur-hydrogen, carbon-hydrogen and oxygen-hydrogen bonds, as expected. For iPLS in the MIR region, the sulfur content was predicted from an absorption region related to bonds between sulfur and oxygen. In this context, siPLS probably did not give better results because it added to the iPLS models regions that cannot be directly correlated with the sulfur content. The iPLS and siPLS methods select variables in specific spectral ranges, whereas the UVE and GA methods select variables distributed throughout the spectrum. The main purpose of variable selection is to eliminate variables unrelated to the property of interest, so that a more parsimonious model, with fewer variables, can be built. However, besides selecting important variables, the UVE and GA methods appear to also retain uninformative variables, since they selected some variables that apparently have no relation to the sulfur content.
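The UVE selection step can be sketched as below, assuming the leave-one-out PLS regression coefficients have already been computed (for example with one of the toolboxes cited above) on the spectra augmented with artificial random-noise variables. The cutoff used here, the largest |mean/SD| among the noise variables, is one common choice; some implementations use a quantile instead. `uve_select` and the toy coefficients are hypothetical.

```python
import statistics


def uve_select(coef_runs, n_real):
    """UVE-style selection from leave-one-out PLS coefficient vectors.

    coef_runs: one coefficient vector per leave-one-out model, fitted on the
        real spectral variables followed by appended random-noise variables.
    n_real: number of real variables (the noise columns come after them).
    Returns indices of real variables whose reliability |mean/SD| exceeds
    the largest reliability found among the noise variables.
    """
    columns = list(zip(*coef_runs))  # one tuple of coefficients per variable
    reliability = [statistics.mean(c) / statistics.stdev(c) for c in columns]
    cutoff = max(abs(r) for r in reliability[n_real:])
    return [j for j in range(n_real) if abs(reliability[j]) > cutoff]


# Toy coefficients: 2 real variables, then 2 noise variables, 3 LOO models
runs = [
    [1.0, 0.10, 0.02, -0.01],
    [1.1, -0.10, -0.01, 0.02],
    [0.9, 0.05, 0.01, -0.02],
]
print(uve_select(runs, n_real=2))  # -> [0]
```

Because the criterion measures coefficient stability rather than chemical meaning, a variable with no obvious link to sulfur can still pass if its coefficient happens to be stable, which is consistent with the observation above.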


Figure 3. Selected regions (highlighted) for the prediction of the sulfur content by NIR applying different variable selection methods (a) iPLS, (b) siPLS, (c) UVE, (d) GA.


Figure 4. Selected regions (highlighted) for the prediction of the sulfur content by MIR applying different variable selection methods (a) iPLS, (b) siPLS, (c) UVE, (d) GA.

CONCLUSIONS

Most of the models developed in this work gave satisfactory results, demonstrating that both NIR and MIR spectroscopy, combined with chemometric methods, can be used to determine the sulfur content in petroleum fractions, with LOD and LOQ values as low as 0.0234 and 0.0781 wt %, respectively. The results showed that variable selection was more effective when applied to the NIR data, whereas the global PLS model built in the MIR region gave results comparable to those obtained with variable selection. From this, it can be concluded that prediction by MIR was slightly more efficient than by NIR. All variable selection techniques improved the predictive ability of the model except the UVE method, which performed better when applied to MIR data,
but caused a decline in model performance when applied to NIR data. Still comparing MIR and NIR, the iPLS and siPLS methods produced a greater gain in model performance when applied to NIR data, probably because the error of the global PLS model was larger for NIR than for MIR. On the other hand, the GA method showed similar and consistent performance for both data types. Finally, comparing the four variable selection methods applied (iPLS, siPLS, UVE and GA), iPLS presented the best performance for both NIR and MIR, although for MIR the results were only slightly better than those of the global model.

AUTHOR INFORMATION

*Corresponding author. Tel.: +55 27 40097735; E-mail: [email protected]

Author Contributions

Julia T. C. Rocha was responsible for developing the PLS, iPLS, siPLS, UVE and GA algorithms, for the statistical analysis of the results and conclusions, and for the MIR analysis and interpretation of its results. Lize M. S. L. Oliveira was responsible for the distillation of the 9 petroleum samples, for the sulfur content determination in the 101 petroleum fractions, and for the NIR analysis and interpretation of its results. Julio C. M. Dias was responsible for the distillation of the 9 petroleum samples, for the sulfur content determination in the 101 petroleum fractions, and for the NIR analysis and interpretation of its results. Ulysses B. Pinto was responsible for the distillation of the 9 petroleum samples, for the sulfur content determination in the 101 petroleum fractions, and for the NIR analysis and interpretation of its results. Maria de Lourdes S. P. Marques was responsible for the distillation of the 9 petroleum samples, for the sulfur content determination in the 101 petroleum fractions, and for the NIR analysis and interpretation of its results.


Betina P. Oliveira was responsible for developing the PLS, iPLS, siPLS, UVE and GA algorithms and for the statistical analysis of the results and conclusions. Eustáquio V. R. Castro was responsible for developing the PLS, iPLS, siPLS, UVE and GA algorithms and for the statistical analysis of the results and conclusions. Paulo R. Filgueiras was responsible for developing the PLS, iPLS, siPLS, UVE and GA algorithms and for the statistical analysis of the results and conclusions. Marcone A. L. de Oliveira was responsible for developing the PLS, iPLS, siPLS, UVE and GA algorithms and for the statistical analysis of the results and conclusions.

ACKNOWLEDGMENTS

The authors would like to thank Petróleo Brasileiro S.A. (PETROBRAS/CENPES) for providing the petroleum samples, NCQP/UFES for the analyses and UFJF for all the support.


REFERENCES

1. Ferreira, M. M. C.; Antunes, A. M.; Melgo, M. S.; Volpe, P. L. O. Quimiometria I: calibração multivariada, um tutorial. Quim. Nova 1999, 22, 724-731.
2. Brereton, R. G. Chemometrics: Data Analysis for the Laboratory and Chemical Plant; John Wiley & Sons: USA, 2003.
3. Sekulic, S.; Seasholtz, M. B.; Wang, Z.; Kowalski, B. R. Nonlinear multivariate calibration methods in analytical chemistry. Anal. Chem. 1993, 65 (19), A835-A845.
4. Khanmohammadi, M.; Garmarudi, A. B.; Guardia, M. Characterization of petroleum-based products by infrared spectroscopy and chemometrics. TrAC, Trends Anal. Chem. 2012, 35, 135-149.
5. Soyemi, O. O.; Busch, M. A.; Busch, K. W. Multivariate analysis of near-infrared spectra using the G-programming language. J. Chem. Inf. Comput. Sci. 2000, 40, 1093-1100.
6. Pasquini, C.; Bueno, A. F. Characterization of petroleum using near-infrared spectroscopy: quantitative modeling for the true boiling point curve and specific gravity. Fuel 2007, 86, 1927-1934.
7. Falla, F. S.; Larinia, C.; Le Roux, G. A. C.; Quina, F. H.; Moro, L. F. L.; Nascimento, C. A. O. Characterization of crude petroleum by NIR. J. Pet. Sci. Eng. 2006, 51, 127-137.
8. Kallevik, H.; Kvalheim, O. M.; Sjöblom, J. Quantitative determination of asphaltenes and resins in solution by means of near-infrared spectroscopy. Correlations to emulsion stability. J. Colloid Interface Sci. 2000, 225, 494-504.
9. Kuptsov, A. Kh.; Arbuzova, T. V. A study of heavy oil fractions by Fourier-transform near-infrared Raman spectroscopy. Petroleum Chemistry 2011, 51, 203-211.

10. Hannisdal, A.; Hemmingsen, P. V.; Sjöblom, J. Group-type analysis of heavy crude oils using vibrational spectroscopy in combination with multivariate analysis. Ind. Eng. Chem. Res. 2005, 44, 1349-1357.
11. Speight, J. G. Handbook of Petroleum Product Analysis; Wiley-Interscience: USA, 2002.
12. Riazi, M. R. Characterization and Properties of Petroleum Fractions, 1st ed.; ASTM Stock Number: MNL50; Philadelphia, PA, 2005.
13. Simanzhenkov, V.; Idem, R. Crude Oil Chemistry; Marcel Dekker: New York, 2003.
14. Lyons, W. C.; Plisga, G. J. Standard Handbook of Petroleum & Natural Gas Engineering, 2nd ed.; Elsevier: Amsterdam, 2005.
15. Pavlova, A.; Ivanova, P.; Dimova, T. Sulfur compounds in petroleum hydrocarbon streams. Petroleum & Coal 2012, 54 (1), 9-13.
16. Ito, E.; van Veen, J. A. R. On novel processes for removing sulphur from refinery streams. Catal. Today 2006, 116, 446-460.
17. Afonso, J. C.; Pereira, K. S. Análise de compostos sulfurados em efluentes gasosos de refinarias de petróleo. Quim. Nova 2010, 33 (4), 957-963.
18. ASTM D4294-10. Standard Test Method for Sulfur in Petroleum and Petroleum Products by Energy Dispersive X-ray Fluorescence Spectrometry; ASTM International: West Conshohocken, PA, 2010.
19. ASTM D5453-12. Standard Test Method for Determination of Total Sulfur in Light Hydrocarbons, Spark Ignition Engine Fuel, Diesel Engine Fuel, and Engine Oil by Ultraviolet Fluorescence; ASTM International: West Conshohocken, PA, 2012.

20. Satya, S.; Roehner, R. M.; Deo, M. D.; Hanson, F. V. Estimation of properties of crude oil residual fractions using chemometrics. Energy Fuels 2007, 21, 998-1005.
21. Nielsen, K. E.; Dittmer, J.; Malmendal, A.; Nielsen, N. C. Quantitative analysis of constituents in heavy fuel oil by 1H nuclear magnetic resonance (NMR) spectroscopy and multivariate data analysis. Energy Fuels 2008, 22, 4070-4076.
22. Soares, I. P.; Rezende, T. F.; Fortes, I. C. P. Determination of sulfur in diesel using ATR/FTIR and multivariate calibration. Eclética Química 2010, 35 (2), 71-78.
23. Aburto, P.; Zuñiga, K.; Campos-Terán, J.; Aburto, J.; Torres, E. Quantitative analysis of sulfur in diesel by enzymatic oxidation, steady-state fluorescence, and linear regression analysis. Energy Fuels 2014, 28, 403-408.
24. Breitkreitz, M. C.; Raimundo, I. M.; Rohwedder, J. J. R.; Pasquini, C.; Filho, H. A. D.; José, G. E.; Araújo, M. C. U. Determination of total sulfur in diesel fuel employing NIR spectroscopy and multivariate calibration. Analyst 2003, 128, 1204-1207.
25. Oliveira, F. C. C.; Souza, A. T. P. C.; Dias, J. A.; Dias, S. C. L.; Rubim, J. C. A escolha da faixa espectral no uso combinado de métodos espectroscópicos e quimiométricos. Quim. Nova 2004, 27, 218-225.
26. Soares, I. P.; Rezende, T. F.; Silva, R. C.; Castro, E. V. R.; Fortes, I. C. P. Multivariate calibration by variable selection for blends of raw soybean oil/biodiesel from different sources using Fourier transform infrared spectroscopy (FTIR) spectra data. Energy Fuels 2008, 22, 2079-2083.
27. Norgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J. P.; Munck, L.; Engelsen, S. B. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000, 54 (3), 413-419.
28. Centner, V.; Massart, D. Elimination of uninformative variables for multivariate calibration. Anal. Chem. 1996, 68 (21), 3851-3858.
29. Costa Filho, P. A.; Poppi, R. J. Algoritmo genético em química. Quim. Nova 1999, 22 (3), 405-411.
30. Abrahamsson, C.; et al. Comparison of different variable selection methods conducted on NIR transmission measurements on intact tablets. Chemom. Intell. Lab. Syst. 2003, 69, 3-12.
31. ASTM D2892. Standard Test Method for Distillation of Crude Petroleum (15-Theoretical Plate Column); ASTM International: West Conshohocken, PA, 2011.
32. ASTM D5236. Standard Test Method for Distillation of Heavy Hydrocarbon Mixtures (Vacuum Potstill Method); ASTM International: West Conshohocken, PA, 2007.
33. Geladi, P.; Kowalski, B. R. Partial least-squares regression: a tutorial. Anal. Chim. Acta 1986, 185, 1-17.
34. Leardi, R.; Nørgaard, L. Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. J. Chemom. 2004, 18, 486-497.
35. Terra, L. A.; Filgueiras, P. R.; Tose, L. V.; Romão, W.; de Souza, D. D.; de Castro, E. V. R.; de Oliveira, L. M. L.; Dias, J. C. M.; Poppi, R. J. Petroleomics by electrospray ionization FT-ICR mass spectrometry coupled to partial least squares with variable selection methods: prediction of the total acid number of crude oils. Analyst 2014, 139, 4908-4916.

36. Niazi, A.; Leardi, R. Genetic algorithms in chemometrics. J. Chemom. 2012, 26, 345-351.
37. Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley: Reading, 1989.
38. Leardi, R. Application of genetic algorithm-PLS for feature selection in spectral data sets. J. Chemom. 2000, 14, 643-655.
39. Zupan, J.; Gasteiger, J. Neural Networks for Chemistry: An Introduction; VCH: Weinheim, 1993.
40. Barnes, R. J.; Dhanoa, M. S.; Lister, S. J. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc. 1989, 43, 772-777.
41. Savitzky, A.; Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36 (8), 1627-1639.
42. Zhang, Z.-M.; Chen, S.; Liang, Y.-Z. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 2010, 135, 1138-1146.
43. Brereton, R. G. Introduction to multivariate calibration in analytical chemistry. Analyst 2000, 125, 2125-2154.
44. Filgueiras, P. R.; Alves, J. C. L.; Sad, C. M. S.; Castro, E. V. R.; Dias, J. C. M.; Poppi, R. J. Evaluation of trends in residuals of multivariate calibration models by permutation test. Chemom. Intell. Lab. Syst. 2014, 133, 33-41.
45. ASTM E1655-05. Standard Practices for Infrared Multivariate Quantitative Analysis. Annual Book of ASTM Standards, Vol. 03.06; ASTM International: West Conshohocken, PA, 2005.
46. Olivieri, A. C.; Faber, N. K. M.; Ferré, J.; Boqué, R.; Kalivas, J. H.; Mark, H. IUPAC 2006, 78, 633-661.

47. itoolbox. http://www.models.life.ku.dk/ipls, 2000.
48. Wise, B. M.; Gallagher, N. B.; Bro, R.; Shaver, J. M.; Windig, W.; Koch, R. S. PLS Toolbox Version 4.0 for Use with MATLAB; Eigenvector Research Inc.: Wenatchee, 2006.
