Energy & Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.6b01958 • Publication Date (Web): 01 Dec 2016
Energy & Fuels is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Energy & Fuels
Assessment and Prediction of Lubricant Oil Properties Using Infrared Spectroscopy and Advanced Predictive Analytics

Carolina T. Pinheiro, Ricardo Rendall, Margarida J. Quina, Marco S. Reis and Licínio M. Gando-Ferreira*

CIEPQPF, Department of Chemical Engineering, Faculty of Sciences and Technology, University of Coimbra, Pólo II, Rua Sílvio Lima, 3030-790 Coimbra, Portugal
KEYWORDS
Lubricant oil properties; Fourier transform infrared spectroscopy; variable selection; latent variable; penalized regression; tree-based ensemble methods.
ABSTRACT
Multivariate methods such as Partial Least Squares (PLS), interval PLS and other variants are often the default option for predicting lubricant properties from FTIR spectra. However, other advanced analytical methodologies are also available that have not yet been properly tested and comparatively assessed. The present work compares the predictive ability of four classes of analytical methods: regression with variable selection, penalized regression, latent variable regression and tree-based ensemble methods. A dataset of 62 lubricant samples for different applications was collected in Portugal. The assessed lubricant properties were kinematic viscosity (at 40 and 100 ºC), viscosity index, density, total acid number (TAN), saponification number and the percentages of aromatics, naphthenics and paraffinics. This work showed that there is no overall superior regression method and that the choice depends on the property being predicted. Density and the percentages of aromatics, naphthenics and paraffinics were well predicted (correlation between predicted and observed values of 0.97-0.98). Elastic nets was the best method to predict naphthenics and density, although naphthenics was also well predicted by the Least Absolute Shrinkage and Selection Operator. Interval PLS provided the best prediction of aromatics and paraffinics. TAN could be reasonably predicted by support vector regression, although some clusters were observed. The saponification number and the viscosity-related properties were not satisfactorily predicted by any of the tested methods. Finally, it can be concluded that the adopted methodology is highly relevant in the field of prediction of lubricant oil properties.
1. INTRODUCTION
Lubricants are products applied to moving mechanical parts, forming a protective film that reduces friction and machinery wear, protects against corrosion and reduces energy consumption. Their functions may also include removing debris and cooling surfaces1. A wide variety of lubricants are designed and manufactured to operate over a broad range of temperatures, loads, sliding speeds and operating environments, leading to different physicochemical properties that must be properly monitored and controlled.
Monitoring physicochemical properties (density, viscosity, total acid number, etc.) is essential to ensure the proper quality of the lubricant oils placed on the market. Traditionally, routine measurements of petroleum product properties are based on American Society for Testing and Materials (ASTM) procedures. However, such methods can be quite complex to perform, time- and reagent-consuming, and require large sample sizes and specific equipment. Infrared (IR) spectroscopy combined with advanced multivariate analysis has been proposed as a powerful alternative to the classic ASTM methods, given its fast implementation and low unit cost; it requires only one drop of sample and avoids both the use of toxic reagents and the generation of undesirable wastes2.
Several studies have been conducted on the prediction of gasoline, diesel and biodiesel properties based on IR spectroscopy3–8. Regarding lubricating oils, Sastry et al. determined physicochemical properties (viscosity index and pour point) and performed carbon type analysis (paraffins, isoparaffins, naphthenes, aromatics and heteroaromatics) of base oils using IR spectral features and Partial Least Squares (PLS)9. Braga et al. developed a method for determining the viscosity index of lubricant oils based on vibrational spectroscopy and PLS. This method was applied to oils from 81 different producers/brands, covering different kinds of lubricating oils, base oil origins (mineral or synthetic) and API and SAE classifications10. IR spectroscopy has also been used to predict contaminants, oxidation products and additives, typically employing Principal Component Regression (PCR), PLS and interval PLS (iPLS)9–12. The determination of total acid and base numbers to evaluate the condition of in-service lubricants has also been reported in the literature11–13. The determination of moisture content in lubricants by infrared spectroscopy was described by Voort et al.14. Borin et al. investigated the feasibility of using a narrow spectral range in the mid-infrared (MIR) region to determine contaminants (gasoline, ethylene glycol and water) in lubricating oils using iPLS15.
An analysis of the vibrational spectroscopy studies available in the literature reveals a strong preference for standard multivariate methods based on latent variables, such as PLS, iPLS and some variants. This is usually a suitable solution, since spectroscopy data are highly collinear, a feature that this class of approaches can handle. However, in practical quality control and monitoring scenarios, one is often interested in obtaining optimal predictive performance, and therefore the search space of suitable predictive approaches must be enlarged and enriched with different modeling approaches. The selection of the proper predictive methodology may sometimes be based on prior knowledge regarding the problem under analysis. However, the theoretical information to support this decision is often very scarce, and the whole procedure is highly prone to bias towards the researcher's preferred methods. This selection approach is therefore not recommended in general, unless there are sound and well-established reasons to implement it. Instead, the present work recommends a robust and rigorous data-driven comparison framework that can estimate the methods' performance as well as the statistical significance of their differences, thus assessing their relative merits in an unbiased and consistent way. This approach was followed in this study, where a rich variety of predictive analytical methods was considered for predicting nine lubricant oil properties of paramount importance for characterizing their quality.
In this context, this work focuses on evaluating the ability of different predictive analytical methods and on identifying those with high potential to predict the desired oil properties. Methods were selected from four distinct classes of modeling approaches: regression with variable selection, penalized regression, latent variable regression and tree-based ensembles. These methods were included in the comparison study because they cover different prior assumptions regarding the underlying data structure and allow the suitability of different modeling formalisms for characterizing the relationship between predictors and response variables to be analyzed. The typical procedure for model validation consists of a single split between training and test sets. The former is used to build models, while the latter allows the assessment of the methods' prediction ability and checks for overfitting. This procedure is robust, but the effect of different data splits is not properly taken into account. To overcome this drawback, a double cross-validation procedure was adopted in this work, allowing a better and more conservative estimate of the methods' true prediction ability. This practice is highly recommended but rarely employed in comparison studies. The number of oil properties and prediction methods considered, together with the double cross-validation procedure employed, make this work, to the best of our knowledge, one of the most extensive and thorough analyses of the prediction of oil properties.
2. MATERIALS AND METHODS
2.1. SAMPLES DESCRIPTION
Lubricant samples were selected based on the most representative lubricants currently available in the Portuguese market. In this work, we studied oils from a variety of applications (including brake fluids), chemical compositions and brands. The entire dataset is composed of 62 samples collected in 2015 from 13 different producers, classified according to the categories indicated in Table 1. The largest number of analyzed samples came from the engine category (31), of which 44% were synthetic base oils, 30% semi-synthetic and 36% mineral.
Table 1. Classification categories of lubricant oil samples.
Classification category | Number of samples | Number of different producers | Sample reference
Cutting fluid | 2 | 1 | 1-2
Gear | 11 | 5 | 3-13
Brake fluid | 6 | 3 | 14-19
Hydraulic | 4 | 2 | 20-23
Engine | 31 | 8 | 24-54
Marine | 2 | 1 | 55-56
Others | 6 | 2 | 57-62
2.2. PHYSICOCHEMICAL CHARACTERIZATION
Lubricating oil properties were determined using reference ASTM methods. Kinematic viscosity was measured at 40 and 100 °C according to ASTM D7042, using an Anton Paar SVM 3000 Stabinger Viscometer. The viscosity index was calculated following ASTM D2270. Density was determined according to ASTM D4052, using a Mettler Toledo DM40 digital densimeter. Total acid number (TAN) was determined by potentiometric titration according to ASTM D664. Titrations were carried out using an alcoholic potassium hydroxide solution (0.1 M) to neutralize the acidic components of the lubricant. A mixture of toluene, isopropanol and water (50, 49.5 and 0.5% v/v, respectively) was used as the titration solvent. TAN was expressed in milligrams of KOH required to neutralize the acidic constituents per gram of oil. The saponification number (SN) was determined by potentiometric titration following ASTM D94. Oil samples were dissolved in a mixture of alcoholic KOH solution (0.5 M) and butanone (50 mL each). Given the difficulty of dissolving organic samples such as lube oil and some additives in these solvents, it was necessary to add 25 mL of White Spirit. The mixture was heated under total reflux for 30 min, and the condenser was washed with 50 mL of naphtha to remove any remaining sample. Finally, the titration was performed with HCl (0.5 M). The saponification number corresponds to the milligrams of KOH required to saponify the fatty material present in one gram of oil.
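Both titrations reduce to a blank-corrected conversion from titrant volume to milligrams of KOH per gram of oil. The sketch below assumes the usual form of the ASTM D664 (direct titration with KOH) and ASTM D94 (back-titration of the excess KOH with HCl) calculations and a KOH molar mass of 56.1 g/mol; the function names are illustrative, and the standards themselves govern endpoint detection and reporting.

```python
# Sketch of the blank-corrected titration arithmetic behind TAN and SN.
# Assumption: the usual ASTM D664 / D94 form of the calculation.

KOH_MOLAR_MASS = 56.1  # g/mol; mL x mol/L = mmol, and mmol x g/mol = mg

def total_acid_number(v_sample_ml, v_blank_ml, koh_molarity, sample_mass_g):
    """TAN (mg KOH/g): KOH consumed by the sample beyond the solvent blank."""
    return (v_sample_ml - v_blank_ml) * koh_molarity * KOH_MOLAR_MASS / sample_mass_g

def saponification_number(v_blank_hcl_ml, v_sample_hcl_ml, hcl_molarity, sample_mass_g):
    """SN (mg KOH/g) from a back-titration with HCl.

    The HCl titrates the KOH left over after saponification, so the KOH
    consumed by the oil is proportional to (blank volume - sample volume).
    """
    return (v_blank_hcl_ml - v_sample_hcl_ml) * hcl_molarity * KOH_MOLAR_MASS / sample_mass_g
```

For instance, a 2.0 g sample whose back-titration uses 10 mL less 0.5 M HCl than the blank gives an SN of 10 × 0.5 × 56.1 / 2.0 = 140.25 mg KOH/g.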
FTIR spectra were recorded in transmittance mode using potassium bromide (KBr) pellets prepared with a pneumatic press. One drop of lubricant was placed on top of a pellet, creating a thin film; the pellet was then placed in an appropriate sample holder and immediately analyzed in a Jasco FT/IR-4200 spectrometer. For each sample, 64 scans were acquired at a resolution of 4 cm⁻¹ in the 4000-500 cm⁻¹ range. Every analysis was performed using a different pellet, and before each measurement a background spectrum was obtained using a clean pellet without lubricant. The determination of aromatic (arom.), paraffinic (paraff.) and naphthenic (napht.) carbon was based on the Indian Standard Method 13155:1991.
2.3. PREDICTIVE ANALYTICAL METHODS AND COMPARISON FRAMEWORK
In order to explore the potential of different modeling approaches for predicting lubricant oil properties from FTIR spectra, the available methods were first grouped into four classes: variable selection, penalized regression, latent variables and tree-based ensembles. Then, representatives of high potential interest were selected from each class and used in this study. Each class is briefly described next, followed by the comparison framework used to assess the performance of the selected methods.
Variable selection methods. The main assumption of this class of methods is that only some predictor variables carry relevant information regarding the response (the sparsity assumption). Thus, in order to reduce model complexity and avoid overfitting, a first screening stage selects the potentially important variables, discarding those that do not contribute to predicting the response or do not bring a significant additional advantage in this regard. In this class, Forward Stepwise Regression16 (FSR) was the method included in the comparison study. Briefly, FSR starts with a model that has no predictors and gradually includes those that are deemed significant to predict the response. Variables already included in the model may be removed at a later stage if they are found to be redundant after the inclusion of others. The p-value of the partial F-test is used to assess the predictors' significance, and a Multiple Linear Regression (MLR) model structure is adopted to relate the selected predictors to the response variables.
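A minimal sketch of forward stepwise selection with a removal step, assuming an MLR model with intercept. For simplicity it uses fixed F-to-enter and F-to-remove thresholds (hypothetical defaults of 4.0 and 3.9) instead of the partial F-test p-values used in the study; for given degrees of freedom, a p-value threshold corresponds to an F threshold. All function names are hypothetical.

```python
import numpy as np

def mlr_rss(X, y, cols):
    """Residual sum of squares of an MLR model with intercept on columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def partial_f(rss_small, rss_big, dof_big):
    """Partial F statistic for one extra variable (1 numerator degree of freedom)."""
    if rss_big <= 0.0:
        return np.inf
    return (rss_small - rss_big) / (rss_big / dof_big)

def forward_stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """Return the indices of the selected columns of X."""
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the excluded variable with the largest partial F.
        rss_cur = mlr_rss(X, y, selected)
        dof = n - len(selected) - 2  # residual dof of the enlarged model
        if dof > 0:
            scores = {j: partial_f(rss_cur, mlr_rss(X, y, selected + [j]), dof)
                      for j in range(p) if j not in selected}
            if scores:
                j_best = max(scores, key=scores.get)
                if scores[j_best] > f_enter:
                    selected.append(j_best)
                    changed = True
        # Backward step: drop the weakest included variable if it became redundant.
        if len(selected) > 1:
            rss_full = mlr_rss(X, y, selected)
            dof_full = n - len(selected) - 1
            f_in = [partial_f(mlr_rss(X, y, [k for k in selected if k != j]), rss_full, dof_full)
                    for j in selected]
            worst = int(np.argmin(f_in))
            if f_in[worst] < f_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected
```

Keeping F-to-remove slightly below F-to-enter is a common guard against a variable cycling in and out; the final MLR model would then be fitted on `X[:, forward_stepwise(X, y)]`.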
Penalized regression methods. The class of penalized regression methods is characterized by imposing a penalty on the values of the regression coefficients in a least squares formulation. This results in biased versions of the MLR method that nevertheless present smaller variances and therefore lower expected errors: the decrease in variance usually outweighs the increase in bias, improving the prediction results. In this class, four methods were included in the study: Ridge Regression (RR), Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Nets (EN)17, 18 and Support Vector Regression (SVR)19, 20. EN is a general approach that contains both RR and LASSO as particular cases. The regression coefficients for an EN model, $\hat{\mathbf{b}}_{EN}$, are obtained by solving Eq. (1):

$$\hat{\mathbf{b}}_{EN} = \underset{\mathbf{b}=[b_0 \ldots b_p]^T}{\arg\min} \left\{ \sum_{i=1}^{n} \left( y(i) - \hat{y}(i) \right)^2 + \gamma \left[ \alpha \sum_{j=1}^{p} \left| b_j \right| + \frac{1-\alpha}{2} \sum_{j=1}^{p} b_j^2 \right] \right\} \qquad (1)$$

where $\hat{y}(i) = b_0 + \sum_{j=1}^{p} b_j x_j(i)$ is the MLR prediction for sample i, α (α ∈ [0, 1]) is a hyper-parameter that weights the squared (ℓ2) penalty against the absolute-value (ℓ1) norm penalty, and γ controls the bias-variance tradeoff. Both hyper-parameters are selected by cross-validation. The RR solution is obtained for α = 0 and the LASSO solution for α = 1.
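Eq. (1) can be solved by cyclic coordinate descent with soft-thresholding. The sketch below is an illustrative implementation, not the solver used in the study; it assumes that the intercept $b_0$ is not penalized (it is recovered by centering) and that the penalty is scaled exactly as in Eq. (1). Setting α = 0 then reproduces the ridge solution $(\mathbf{X}^T\mathbf{X} + \tfrac{\gamma}{2}\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$ on centered data, and α = 1 gives LASSO.

```python
import numpy as np

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0): the proximal step of the l1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, gamma, alpha, n_iter=1000, tol=1e-10):
    """Minimize sum_i (y_i - b0 - x_i.b)^2 + gamma*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering removes the unpenalized b0
    p = Xc.shape[1]
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0)
    r = yc.copy()                            # current residual yc - Xc @ b
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            r += Xc[:, j] * b[j]             # partial residual without variable j
            z = 2.0 * Xc[:, j] @ r
            denom = 2.0 * col_sq[j] + gamma * (1.0 - alpha)
            new = soft_threshold(z, gamma * alpha) / denom if denom > 0 else 0.0
            r -= Xc[:, j] * new
            max_delta = max(max_delta, abs(new - b[j]))
            b[j] = new
        if max_delta < tol:                  # stop when no coefficient moves
            break
    b0 = y_mean - x_mean @ b
    return b0, b
```

Because the ℓ1 term is handled by soft-thresholding, a large enough γ with α = 1 drives every coefficient exactly to zero, which is the variable-selection behavior that distinguishes LASSO from RR.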
Latent variable methods. The third class of regression methods takes advantage of the correlation structure in both predictors and response variables. In particular, it assumes that the observed variability is governed by a few underlying variables (also known as latent variables), which are not directly measured or observed but may be estimated using linear combinations of the original measurements. The general model for latent variable methods21 is presented in Eqs. (2) and (3):

$$\mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E} \qquad (2)$$

$$\mathbf{y} = \mathbf{T}\mathbf{c}^T + \mathbf{f} \qquad (3)$$
where P is a p×a matrix of loadings, T is an n×a score matrix, c is a 1×a vector that relates the latent variable scores to the response, and E and f are random errors with dimensions n×p and n×1, respectively. In this class of methods, four approaches were considered: Principal Component Regression (PCR)22, Principal Component Regression with a Forward Stepwise procedure (PCR_FS)23, Partial Least Squares (PLS)24, 25 and interval PLS (iPLS)26.
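To make Eqs. (2) and (3) concrete, the sketch below fits PCR and PLS with numpy. It assumes mean-centered data and a single response (PLS1 with NIPALS-style deflation); PCR_FS and iPLS are not shown, and the function names are hypothetical. With a equal to the number of linearly independent predictors, both methods reduce to ordinary least squares, which provides a simple sanity check.

```python
import numpy as np

def pcr_fit(X, y, a):
    """PCR: loadings P from the SVD of centered X, scores T = Xc P, then y ~ T c^T."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a].T                                  # p x a loadings
    T = Xc @ P                                    # n x a scores
    c = np.linalg.lstsq(T, yc, rcond=None)[0]     # relates scores to the response
    b = P @ c                                     # coefficients on the original variables
    return y_mean - x_mean @ b, b

def pls1_fit(X, y, a):
    """PLS1: scores chosen to covary with y, extracted one at a time with deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = X.shape[1]
    W, P, c = np.zeros((p, a)), np.zeros((p, a)), np.zeros(a)
    for k in range(a):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                    # weight vector
        t = Xk @ w                                # score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt                   # X loadings
        c[k] = yk @ t / tt                        # y loading
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])            # deflate X
        yk = yk - c[k] * t                        # deflate y
    b = W @ np.linalg.solve(P.T @ W, c)           # coefficients on the original variables
    return y_mean - x_mean @ b, b
```

The only structural difference between the two is how the scores T are chosen: PCR uses directions of maximum variance in X alone, while PLS uses directions of maximum covariance with y, which is why PLS typically needs fewer latent variables.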
Tree-based ensembles. The last class of methods is based on ensembles of regression trees18, 27-29, from which three approaches were selected: Bagging of Regression Trees (BRT), Random Forests (RF) and Boosted Trees (BT). BRT uses bootstrapping to generate training datasets, which are used to build many regression trees; the predicted response is the average prediction of all trees in the ensemble. RF is similar to BRT, but at each split only a randomly selected subset of the predictor variables is considered, so that the predictions of the trees in the ensemble are less correlated. Both BRT and RF often use a large number of trees to stabilize the variance of the predictions, and this number may be selected by cross-validation. Lastly, BT30-32 uses weak learners to iteratively approximate the relationship between predictors and response.
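The three ensemble strategies can be sketched with depth-1 regression trees (stumps) as base learners, a deliberate simplification of the full trees used in the study. Setting `max_features` draws a random subset of predictors for the (single) split of each stump, mimicking the per-split sampling of RF; boosting is written here as least-squares gradient boosting with shrinkage. All names and defaults are hypothetical.

```python
import numpy as np

def fit_stump(X, y, features):
    """Depth-1 regression tree: best single split over `features` by total SSE."""
    best = (np.inf, None, None, y.mean(), y.mean())
    for j in features:
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        n = len(ys)
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue  # no threshold separates tied values
            sl, sr = csum[i - 1], csum[-1] - csum[i - 1]
            sse = (csq[i - 1] - sl ** 2 / i) + (csq[-1] - csq[i - 1] - sr ** 2 / (n - i))
            if sse < best[0]:
                best = (sse, j, 0.5 * (xs[i] + xs[i - 1]), sl / i, sr / (n - i))
    return best[1:]  # (feature, threshold, left leaf mean, right leaf mean)

def predict_stump(stump, X):
    j, thr, left, right = stump
    if j is None:  # no valid split was found: constant prediction
        return np.full(len(X), left)
    return np.where(X[:, j] <= thr, left, right)

def bagged_stumps(X, y, n_trees=100, max_features=None, seed=0):
    """BRT-like (all features) or RF-like (random feature subset for the split)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap resample
        feats = range(p) if max_features is None else rng.choice(p, size=max_features, replace=False)
        stumps.append(fit_stump(X[idx], y[idx], feats))
    # The ensemble prediction is the average over all stumps.
    return lambda X_new: np.mean([predict_stump(s, X_new) for s in stumps], axis=0)

def boosted_stumps(X, y, n_rounds=200, learning_rate=0.1):
    """Least-squares boosting: each weak learner fits the current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(X, y - pred, range(X.shape[1]))
        pred = pred + learning_rate * predict_stump(s, X)
        stumps.append(s)
    return lambda X_new: f0 + learning_rate * sum(predict_stump(s, X_new) for s in stumps)
```

Bagging and RF reduce variance by averaging many independently grown learners, whereas boosting reduces bias by adding small corrections sequentially; `learning_rate` trades the number of rounds against overfitting.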
Comparison framework. The comparison framework is based on a double cross-validation procedure with an inner and an outer validation loop, providing a more conservative approach than traditional cross-validation followed by model validation on a test set. The procedure starts by randomly splitting the complete dataset into a training set, containing 80% of the samples, and a left-out set containing the remaining 20%. The training set is used for model building, and conventional cross-validation can be employed to select appropriate hyper-parameters (this cross-validation constitutes the inner loop of the comparison framework). After model building, the models are used to predict the left-out set, and the quantification of the prediction errors characterizes the performance of the regression methods. The outer loop of the double cross-validation procedure consists of generating new random splits of the complete dataset, providing different training and left-out sets, which are again used for model building and assessment, respectively. In this work, 40 iterations of double cross-validation were applied. The median value of $\tilde{R}^2_{DCV}$, which quantifies the correlation between the predictions and the measured values in the left-out set, characterizes the performance of the regression methods, while its interquartile range (the difference between the third and first quartiles) characterizes the variability across different data splits.
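The double cross-validation loop can be sketched as follows. Several details are assumptions not stated in the text: ridge regression stands in for any of the compared methods, its penalty γ is chosen by 5-fold inner cross-validation over a small grid, and $\tilde{R}^2_{DCV}$ is computed as 1 − SSE/SST on each left-out set (with SST taken around the left-out mean). The outer loop repeats random 80/20 splits 40 times and reports the median and interquartile range, as described above; all function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, gamma):
    """Ridge regression (stand-in for any compared method); returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    b = np.linalg.solve(Xc.T @ Xc + gamma * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: y_mean + (X_new - x_mean) @ b

def r2(y_true, y_pred):
    """1 - SSE/SST on held-out data (can be negative)."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def inner_cv_select(X, y, gammas, k, rng):
    """Inner loop: pick the hyper-parameter with the lowest k-fold CV error."""
    folds = np.array_split(rng.permutation(len(y)), k)
    def cv_sse(gamma):
        sse = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(len(y)), fold)
            model = ridge_fit(X[train], y[train], gamma)
            sse += np.sum((y[fold] - model(X[fold])) ** 2)
        return sse
    return min(gammas, key=cv_sse)

def double_cv(X, y, gammas, n_outer=40, train_frac=0.8, k_inner=5, seed=0):
    """Outer loop: repeated random 80/20 splits; returns median, IQR and all scores."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_outer):
        perm = rng.permutation(n)
        train, left_out = perm[:n_train], perm[n_train:]
        gamma = inner_cv_select(X[train], y[train], gammas, k_inner, rng)  # inner loop
        model = ridge_fit(X[train], y[train], gamma)
        scores.append(r2(y[left_out], model(X[left_out])))                  # outer loop
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    return median, q3 - q1, np.array(scores)
```

The key design point is that the left-out samples never influence hyper-parameter selection, so each score estimates performance on genuinely new data; with 62 samples, each split leaves 12 samples out.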
3. RESULTS AND DISCUSSION
3.1. PREDICTION OF PHYSICOCHEMICAL PROPERTIES
In this section, the framework described in Section 2.3 is applied to predict nine physicochemical properties of lubricant oils. Since the number of properties is rather large, the overall results are presented first and the most promising regression methods are identified. Then, a more detailed analysis is conducted on the properties that were successfully predicted.
In a first stage of exploratory data analysis, PCA was applied to the nine oil properties in order to observe the distribution of the samples and assess the degree of collinearity between the different oil properties. The results are presented in Figure 1, where the scores of the 1st and 2nd principal components (Figure 1.a) and their respective loadings (Figure 1.b) are shown. The presence of clusters is quite clear in Figure 1.a and can be related to the different oil types (Section 2.1, Table 1). In particular, one can observe a cluster formed by the brake fluid samples (samples 14 to 19), which are significantly distant from the remaining samples, and, in the gear category, a rather extreme sample (sample 7). In terms of oil properties, Figure 1.b shows that the first principal component (1st PC) mainly describes density, TAN, saponification number and the percentages of naphthenics and paraffinics, while the second component (2nd PC) describes the different viscosity measures, which are, as expected, correlated with each other and therefore lie close together in the loadings plot. Nevertheless, the oil properties do not present a highly collinear structure: the PCA results showed that 4 principal components are needed to describe 90% of the variability in the dataset. Compared with the total number of properties (9), this is a rather low compression ratio, which suggests that each property should be modeled individually.
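The number of principal components needed to reach a given fraction of explained variance follows directly from the singular values of the data matrix. This sketch assumes that the nine properties are autoscaled (mean-centered and divided by their standard deviation) before PCA, which the text does not state but is usual when variables have different units; the function name is hypothetical.

```python
import numpy as np

def n_components_for(Y, target=0.90):
    """Smallest number of PCs of autoscaled Y explaining at least `target` of the variance."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)   # autoscaling (assumption)
    s = np.linalg.svd(Z, compute_uv=False)
    explained = s ** 2 / np.sum(s ** 2)                 # variance fraction per PC
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target) + 1)   # first PC count reaching target
    return k, explained
```

For a 62 × 9 property matrix, a result of k = 4 corresponds to the compression ratio of 4/9 discussed above, whereas strongly collinear properties would be summarized by one or two components.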
In a preliminary study, although some samples could be considered outliers, none were removed from the dataset, and the comparison framework described in Section 2.3 was applied to assess the performance of the different regression methods. However, the results showed that some samples (references 7, 57 and 59) systematically presented higher prediction residuals for all oil properties and had a high influence on the models (these results are presented in the Supplementary Material). This observation can be explained by the chemical composition of these samples, which is very different from that of the commonly used mineral or synthetic oils. Sample 7 is a polyalkylene glycol synthetic oil applied in gears working under severe conditions. Samples 57 and 59 are used for compressor lubrication and cooling and contain diester and alkyl-benzene synthetic oils, respectively.
The tree-based ensemble methods (BRT, RF and BT) showed systematically poorer prediction performance in this case, so they were excluded from further comparisons in order to simplify the analysis of the results. This can be partially explained by the presence of clusters in the measured responses and by the underlying relationship between FTIR spectra and oil properties, which, from a theoretical standpoint, is presumably better described by a continuous linear function than by a step-wise relation. Sample clusters in the predicted properties negatively affect regression trees, since their predictions are bounded within the range of response values observed in the training set. If, during the random split, no samples from a given cluster are included in the training data, the ensemble will fail to extrapolate to the new regions.
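The bounded-prediction argument can be illustrated with a one-dimensional toy example: bagged regression stumps trained on responses in [0, 2] cannot predict beyond that range, whereas a straight line fitted to the same data extrapolates. The data are synthetic, and the stump ensemble is a simplified, hypothetical stand-in for BRT.

```python
import numpy as np

def stump_1d(x, y):
    """Best single split on a 1-D predictor; each leaf predicts a training mean."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, xs[0], ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, 0.5 * (xs[i] + xs[i - 1]), left.mean(), right.mean())
    _, thr, left_value, right_value = best
    return lambda x_new: np.where(x_new <= thr, left_value, right_value)

def bagged_stumps_1d(x, y, n_trees=200, seed=0):
    """Average of stumps fitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    boots = (rng.integers(0, len(x), len(x)) for _ in range(n_trees))
    stumps = [stump_1d(x[idx], y[idx]) for idx in boots]
    return lambda x_new: np.mean([s(x_new) for s in stumps], axis=0)

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, 40)   # training responses cover only [0, 2]
y_train = 2.0 * x_train
x_new = np.array([2.0])               # a region ("cluster") absent from training
ens = bagged_stumps_1d(x_train, y_train)
slope, intercept = np.polyfit(x_train, y_train, 1)
# The ensemble stays inside the training response range; the line reaches about 4.
print(ens(x_new), slope * x_new + intercept)
```

Every leaf value is an average of training responses, so no combination of leaves can exceed the largest response seen during training; linear models carry no such bound.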
Following these considerations, a second stage of the comparison was performed, in which the extreme samples and the methods with lower predictive ability were removed. The overall results are presented in Table 2, where $\tilde{R}^2_{DCV}$ and its interquartile range are reported for each regression method.
[Figure 1: (a) scores of the samples, labelled by sample number, on the 1st PC (41.1%) and 2nd PC; (b) loadings of the nine oil properties (viscosity at 40 and 100 °C, viscosity index, density, TAN, saponification number, arom., napht., paraff.) on the same components.]
Figure 1. PCA results obtained for the nine oil properties: (a) scores of the 1st and 2nd principal components and (b) their respective loadings.
Table 2. Prediction results for oil properties.*
Oil property | FSR | RR | LASSO | EN | SVR | PCR | PCR_FS | PLS | iPLS
Density @ 15°C (kg/m3) | 0.95 (0.04) | 0.95 (0.05) | 0.97 (0.03) | 0.98 (0.03) | 0.91 (0.07) | 0.95 (0.07) | 0.84 (0.25) | 0.97 (0.04) | 0.92 (0.11)
Viscosity @ 40°C (cSt) | [row garbled in the source; surviving values: < 0 (> 1), 0.08 (> 1), < 0 (> 1), 0.9 (0.33), 0.65 (> 1)]
Sap. Number (mg KOH/g) | < 0 (> 1) | 0.24 (0.99) | 0.36 (> 1) | 0.24 (> 1) | < 0 (> 1) | 0.56 (> 1) | 0.1 (> 1) | 0.4 (> 1) | 0.32 (0.82)
Arom. (%) | 0.9 (0.1) | 0.79 (0.29) | 0.83 (0.16) | 0.82 (0.16) | 0.79 (0.23) | 0.53 (0.51) | 0.29 (0.33) | 0.82 (0.25) | 0.94 (0.1)
Napht. (%) | 0.96 (0.03) | 0.94 (0.04) | 0.97 (0.03) | 0.97 (0.03) | 0.93 (0.04) | 0.95 (0.04) | 0.71 (0.32) | 0.96 (0.03) | 0.96 (0.04)
Paraff. (%) | 0.93 (0.06) | 0.93 (0.04) | 0.97 (0.04) | 0.97 (0.04) | 0.91 (0.06) | 0.96 (0.04) | 0.72 (0.26) | 0.95 (0.05) | 0.98 (0.03)
FSR – Forward Stepwise Regression; RR – Ridge Regression; LASSO – Least Absolute Shrinkage and Selection Operator; EN – Elastic Nets; SVR – Support Vector Regression; PCR – Principal Component Regression; PCR_FS – PCR with Forward Stepwise; PLS – Partial Least Squares; iPLS – interval PLS.
*Each cell gives $\tilde{R}^2_{DCV}$, with the respective interquartile range in parentheses.
The results in Table 2 show that not all physicochemical properties are well predicted: $\tilde{R}^2_{DCV}$ ranges from 0.98, for density or the percentage of paraffinics, to values below 0 (which is only possible because R² is evaluated by cross-validation). Nevertheless, many properties have high $\tilde{R}^2_{DCV}$ values, suggesting that they can be predicted using a suitable predictive model. In these situations, FTIR measurements could be considered a reliable alternative to the more common chemical/physical quantification. Confidence in the results is assured by the double cross-validation procedure used in the comparison framework, which produces a more conservative estimate of the performance of the predictive models under testing scenarios. Density and the percentages of aromatics, naphthenics and paraffinics are all well predicted, and the best methods obtain $\tilde{R}^2_{DCV}$ values of at least 0.94. TAN is reasonably well predicted: although its $\tilde{R}^2_{DCV}$ is high, the interquartile range is also significant. Lastly, the saponification number and the viscosity-related properties are poorly predicted, with top $\tilde{R}^2_{DCV}$ values of around 0.56. These results provide a first estimate of the suitability of different regression methods and should be viewed in the context of a given practical application, characterized by a desired accuracy. If the prediction errors are below the desired accuracy, a model can be built and used with a high degree of confidence, whereas a wide gap between the methods' performance and the desired accuracy indicates that the regression methods are not capable of modeling the relationship between the FTIR spectra and the desired oil property. If the gap is small enough, variants of the methods included in the comparison study (e.g. SVR with non-linear transformations) might improve the results.
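Negative values arise because, on held-out data, R² is computed against the left-out mean rather than obtained from a least-squares fit to those same samples. Assuming the 1 − SSE/SST definition, predictions that are worse than simply predicting the mean give R² < 0, even when they are perfectly (anti-)correlated with the observations, as this small example shows.

```python
import numpy as np

def r2_validation(y_true, y_pred):
    """R^2 on held-out data: 1 - SSE/SST, with SST taken around the held-out mean."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - sse / sst

y_true = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.1, 1.9, 3.2, 3.8])
bad = np.array([4.0, 3.0, 2.0, 1.0])  # anti-correlated predictions
print(r2_validation(y_true, good))
print(r2_validation(y_true, bad))     # -3.0
```

Here SST = 5 and the anti-correlated predictions give SSE = 20, so R² = 1 − 20/5 = −3, although their squared correlation with the observations is 1.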
15 ACS Paragon Plus Environment
Energy & Fuels
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 31
1
Another point worth noting in Table 2 is that there is no overall best regression method; the most suitable methods depend on the particular physicochemical property under analysis. This was expected, since the underlying relationship between the predictors and each response variable is not likely to be optimally represented by the same modelling formalism; instead, the information content of the predictors must be combined differently to obtain good estimates of each response. As a consequence, the comparison framework adopted in this work is critical to determine, in a rigorous and unbiased way, which methods perform better and under which conditions. Nevertheless, it is interesting to note that LASSO provides consistent predictions and is often among the top methods for this dataset.
In the next subsections, each successfully predicted physicochemical property is analyzed in detail in order to identify important predictor variables. The results for density, TAN and the percentages of aromatics, naphthenics and paraffinics are explored further, whereas the saponification number and the viscosity-related properties are not discussed. The failure to predict these last properties may have several causes. One of them could be the number of samples collected, given the number of different oil types analysed. As an example, Braga et al.10 successfully predicted the viscosity index based on 1085 representative samples from the Brazilian market. However, obtaining such a large number of samples is not always possible in practice, and no method can predict, a priori, the minimum sufficient number of samples. Thus, the number of samples may also be a limiting factor in developing good prediction models for the aforementioned properties. Another possible reason is that lubricant oils are subject to strict quality control procedures and must conform to tight specifications. One may therefore argue that the observed variability of these properties is not sufficient to develop a good predictive model, as would be possible if their values spanned a wider range.
3.1.1. DENSITY
The density of the samples is well predicted, so it is worthwhile to validate the model by analyzing the prediction results and to identify the most relevant regions of the spectrum. Since EN was the best predictive method for this property, Figure 2.a-c presents the observed density values and those predicted by the EN models in the outer loop of the double cross-validation procedure, some typical spectra from different lubricating oil types, and the regression coefficients obtained. In terms of predicted and observed density values (Figure 2.a), two clusters seem to exist: one centered at approximately 875 kg/m³ and another at 1075 kg/m³. The second cluster is composed only of brake fluid samples. The existence of clusters is known to inflate the R², even when the model fails to predict well within each cluster. Computing the R² between predicted and measured density within each cluster yields 0.72 for the first cluster and 0.14 for the second. These values are much lower than those reported in Table 2, suggesting that the models are able to discriminate between samples with low and high densities but are less reliable in making inferences within each cluster. In fact, the root mean squared error of double cross-validation (RMSEDCV) has a mean value of 10.1 kg/m³ and a standard deviation of 2.6 kg/m³, which is rather large considering the narrow ranges spanned by both clusters. Nevertheless, densities in the range 800-900 kg/m³ fall close to the 1:1 line and the predictions are generally acceptable. As a general conclusion, the results suggest that the regression models developed to predict sample density should be limited to scenarios where only approximate estimates are needed.
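The cluster effect described above can be illustrated with a minimal sketch that computes R² on the pooled data and then separately inside each cluster. The numbers below are made up for illustration and are not the paper's data; the cluster labels (e.g. oil type) are assumed to be known, and the helper names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res/SS_tot (can be negative)."""
    mean_y = statistics.fmean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def r_squared_by_cluster(measured: Sequence[float], predicted: Sequence[float],
                         labels: Sequence[str]) -> Dict[str, float]:
    """R^2 computed separately for the samples sharing each cluster label."""
    result = {}
    for lab in sorted(set(labels)):
        ys = [y for y, l in zip(measured, labels) if l == lab]
        ps = [p for p, l in zip(predicted, labels) if l == lab]
        result[lab] = r_squared(ys, ps)
    return result


if __name__ == "__main__":
    # Hypothetical densities (kg/m3): five oils near 875, three brake fluids near 1075.
    measured = [865, 870, 875, 880, 885, 1070, 1075, 1080]
    predicted = [875, 880, 868, 878, 872, 1078, 1072, 1076]
    labels = ["oil"] * 5 + ["brake fluid"] * 3
    print(round(r_squared(measured, predicted), 3))          # pooled R^2
    print(r_squared_by_cluster(measured, predicted, labels))  # per-cluster R^2
```

In this toy case the pooled R² is above 0.99 while both within-cluster values are negative: the model only separates the two groups, which is the situation the text warns about.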
The distribution of regression coefficients (Figure 2.c) is also worth analyzing, since it reveals spectral regions that are not used to predict the response and that could be suppressed to obtain a more parsimonious model. In particular, the region between 1900 cm⁻¹ and 2700 cm⁻¹ is largely ignored, with coefficients that are almost 0 in most iterations of double cross-validation. Furthermore, the region between 3100 cm⁻¹ and 3400 cm⁻¹ and wavenumbers above 3700 cm⁻¹ are seldom used for model building, which suggests a sparse structure in which penalized regression methods that implicitly perform variable selection may have an advantage. This is consistent with the good performance obtained with LASSO and EN. On the other hand, one can also remove the irrelevant spectral regions and reapply regression methods that do not perform variable selection in order to improve their predictions. For instance, PLS makes very good predictions without explicitly removing the uninformative regions, so one may test whether their removal leads to even better predictions, a scenario often observed in practice.
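In the text, the unused regions are read off Figure 2.c by inspection. One hypothetical way to automate that reading is sketched below: collect the coefficient vector from every double cross-validation iteration, compute how often each wavenumber receives a non-zero coefficient, and report contiguous ranges that are (almost) never used as candidates for removal. The tolerance `tol`, the threshold `max_freq` and both helper names are assumptions for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def selection_frequency(coef_runs: Sequence[Sequence[float]],
                        tol: float = 1e-8) -> List[float]:
    """Fraction of runs in which each coefficient is non-zero (|b| > tol)."""
    n_runs = len(coef_runs)
    n_vars = len(coef_runs[0])
    return [sum(abs(run[j]) > tol for run in coef_runs) / n_runs
            for j in range(n_vars)]


def unused_regions(wavenumbers: Sequence[float], freq: Sequence[float],
                   max_freq: float = 0.1) -> List[Tuple[float, float]]:
    """Contiguous wavenumber ranges whose selection frequency is <= max_freq."""
    regions: List[Tuple[float, float]] = []
    start: Optional[float] = None
    end: Optional[float] = None
    for wn, f in zip(wavenumbers, freq):
        if f <= max_freq:
            if start is None:
                start = wn  # open a new candidate region
            end = wn        # extend it to the current wavenumber
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:   # close a region that runs to the end of the axis
        regions.append((start, end))
    return regions
```

The resulting ranges could then be excluded before refitting a method without built-in variable selection, such as PLS, to test whether predictions improve.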
Figure 2. Results obtained with the EN method to predict density: (a) predicted and measured density values, (b) representative spectra of different sample classes and (c) EN regression coefficients in the comparison study.

3.1.2. TOTAL ACID NUMBER (TAN)
In general, TAN has a high R²DCV, but the deviations from the median value are quite substantial. This suggests that the performance is highly dependent on the split between the training data and the left-out fold obtained during the double cross-validation procedure. Figure 3.a-c presents the predicted and measured TAN values, typical spectra and the regression coefficients of the different PLS models obtained during double cross-validation. In Figure 3.a, two clusters can be seen, similar to those found when predicting density. Indeed, the second cluster is again formed by brake fluid samples. Computing the R² between predicted and measured TAN values for both regions yields 0.94 for the region below 10 mg KOH/g and a negative value for the region near 22 mg KOH/g. Furthermore, the RMSEDCV has a mean of 1.5 mg KOH/g and a standard deviation of 0.96 mg KOH/g. This is a clear indication that the developed models are unable to predict TAN reliably and can only provide reasonable estimates. In terms of regression coefficients (Figure 3.c), a sparse structure can be seen, in which only a few wavenumbers contribute to the final model. The region between 500 and 1800 cm⁻¹ and the region around 3000 cm⁻¹ appear to be the most active, and their regression coefficients tend to be larger.
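A negative R² within a cluster, and the inflated R² of pooled clusters, both follow from the usual definition, written here with measured values y_i, predictions ŷ_i, overall mean ȳ, cluster means ȳ_c and cluster sizes n_c. The second identity is the standard within/between decomposition of the total sum of squares; this derivation is added for illustration and is not taken from the original analysis.

```latex
R^2 = 1 - \frac{\mathrm{SS}_{\mathrm{res}}}{\mathrm{SS}_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2},
\qquad
\sum_{i} (y_i - \bar{y})^2
    = \sum_{c} \sum_{i \in c} (y_i - \bar{y}_c)^2
    + \sum_{c} n_c \, (\bar{y}_c - \bar{y})^2 .
```

Within a single cluster, ȳ is the cluster mean, so R² drops below 0 whenever the residual sum of squares exceeds the within-cluster spread, i.e. the model does worse than simply predicting the cluster mean. When clusters are pooled, the between-cluster term enlarges SS_tot, which can push R² close to 1 even when the within-cluster errors are large.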
In order to improve the prediction results and assess the influence of the brake fluid samples, PLS regression was applied to the dataset with and without the brake fluid samples. The first PLS model, which included the brake fluids, had a minimum RMSECV of 1.5 mg KOH/g, similar to the value obtained with the double cross-validation procedure. However, after discarding the brake fluid samples, the RMSECV decreased to 0.6 mg KOH/g, a reduction to less than half of its initial value. This indicates that, in terms of predicting TAN values, the brake fluid oil type is not consistent with the remaining oils and can be discarded if such samples are not expected in a given practical application.
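The with/without comparison above amounts to computing the same cross-validated error on two versions of the dataset. The sketch below shows that bookkeeping under loud assumptions: a one-predictor ordinary least squares fit stands in for PLS, folds are interleaved rather than randomized, and the function names, class labels and any data passed in are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence


def fit_ols_1d(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least squares with one predictor (a stand-in for PLS)."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: ybar + slope * (x - xbar)


def rmsecv(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """k-fold cross-validated RMSE with interleaved, deterministic folds."""
    n = len(xs)
    sq_errors: List[float] = []
    for f in range(k):
        test = set(range(f, n, k))
        train = [i for i in range(n) if i not in test]
        model = fit_ols_1d([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return (sum(sq_errors) / n) ** 0.5


def rmsecv_without(xs: Sequence[float], ys: Sequence[float],
                   labels: Sequence[str], excluded: str, k: int = 5) -> float:
    """RMSECV after dropping every sample whose class label equals `excluded`."""
    keep = [i for i, lab in enumerate(labels) if lab != excluded]
    return rmsecv([xs[i] for i in keep], [ys[i] for i in keep], k)
```

Comparing `rmsecv(xs, ys)` with `rmsecv_without(xs, ys, labels, "brake fluid")` reproduces the logic of the comparison; a large drop signals that the excluded class follows a different relationship from the rest.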
Figure 3. Results obtained with PLS to predict TAN: (a) predicted and measured TAN values (both in mg KOH/g), (b) representative spectra of different sample classes and (c) PLS regression coefficients in the comparison study.

3.1.3. AROMATICS
The percentage of aromatics can be predicted quite well, but a significant gap is observed between the best and worst methods. Figure 4.a-c presents the predicted and measured percentages of aromatics, some representative spectra from different lubricating oil classes, and the number of times each bin was included in the iPLS model (each bin contains 12 wavenumbers). In terms of predicted and measured values (Figure 4.a), the predictions lie close to the 1:1 line, so the majority of samples appear to be well predicted; the mean RMSEDCV is 0.5% and its standard deviation is 0.24%. Regarding important spectral regions, the bin containing the region 1592-1614 cm⁻¹ stands out, as it is selected in all 40 iterations of double cross-validation. Furthermore, bins adjacent to this region are selected far more often than the remaining spectrum, which allows one to conclude that, overall, the region 1500-1637 cm⁻¹ contains relevant predictive information on the percentage of aromatics.
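The bin bookkeeping behind the selection counts can be sketched as follows: the wavenumber axis is cut into consecutive bins of 12 points, the bins chosen in each double cross-validation iteration are tallied, and the contiguous run of frequently selected bins around the most-selected one gives a region such as 1500-1637 cm⁻¹. This is a hypothetical helper set, not the iPLS selection algorithm itself: the per-iteration lists of selected bin indices are assumed as input, and the `min_count` threshold is an illustrative choice.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def make_bins(wavenumbers: Sequence[float], width: int = 12) -> List[Tuple[float, float]]:
    """Split an ordered wavenumber axis into consecutive bins of `width` points."""
    n = len(wavenumbers)
    return [(wavenumbers[i], wavenumbers[min(i + width, n) - 1])
            for i in range(0, n, width)]


def selection_counts(selected_per_run: Iterable[Iterable[int]], n_bins: int) -> List[int]:
    """Number of iterations in which each bin index was selected."""
    counter = Counter(b for run in selected_per_run for b in set(run))
    return [counter[b] for b in range(n_bins)]


def frequent_region(bins: Sequence[Tuple[float, float]], counts: Sequence[int],
                    min_count: int) -> Tuple[float, float]:
    """Span of adjacent bins around the most-selected bin with count >= min_count."""
    best = max(range(len(counts)), key=counts.__getitem__)
    lo = hi = best
    while lo > 0 and counts[lo - 1] >= min_count:
        lo -= 1
    while hi < len(counts) - 1 and counts[hi + 1] >= min_count:
        hi += 1
    return bins[lo][0], bins[hi][1]
```

If the spectra are stored from high to low wavenumber, as is common for FTIR, the returned bin limits simply come out in descending order.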
Figure 4. Results obtained with iPLS to predict the percentage of aromatics: (a) predicted and measured percentages of aromatics, (b) representative spectra from different sample classes and (c) number of times each bin was included in the regression model.

3.1.4. NAPHTHENICS
The R²DCV for the percentage of naphthenics is high and the variation around the median value is quite small. For this property, LASSO and EN had similarly good performances. Since the LASSO solution is often sparser, Figure 5.a-c presents the measured and predicted percentages of naphthenics in the different splits of double cross-validation, some typical spectra and the regression coefficients. Figure 5.a shows that the model is indeed able to predict the response and that no clusters are present. The majority of predictions fall reasonably close to the 1:1 line, and the mean RMSEDCV is 2.2%. Another interesting feature is the sparse structure of the regression coefficients (Figure 5.c), in which many spectral regions are not used in the final model (e.g. wavenumbers from 1800 cm⁻¹ to 2700 cm⁻¹ and above 3700 cm⁻¹). In terms of relevant regions, those near 3500 cm⁻¹ and between 500 cm⁻¹ and 1700 cm⁻¹ appear to be important, since they were often included in the model and had significant regression coefficients.
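The sparsity of the LASSO coefficients comes from the soft-thresholding step of its penalty: any coefficient whose partial correlation with the residual falls below the penalty level is set exactly to zero. A minimal textbook sketch of cyclic coordinate descent is shown below. It is not the solver used in the study (which is not stated in this section); it assumes mean-centred predictors and response, and the objective scaling (1/2n)‖y − Xb‖² + λ‖b‖₁ is one common convention.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes every column of X and the vector y are already mean-centred.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # r = y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual, adding back its own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

With a large enough λ every coefficient is zero; as λ decreases, only the predictors most correlated with the response enter, which mirrors the many spectral regions left unused in Figure 5.c.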
Figure 5. Results obtained with LASSO to predict the percentage of naphthenics: (a) predicted and measured percentage of naphthenics (%), (b) representative spectra for different sample classes and (c) regression coefficients.

3.1.5. PARAFFINICS
The last successfully predicted property was the percentage of paraffinics. For this property, iPLS, LASSO and EN were the best regression methods. Both the LASSO regression coefficients and the bins selected by iPLS can be easily analyzed, but because the selected bins are simpler to interpret, the iPLS results are presented here (similar conclusions are reached with the other top methods). Figure 6.a presents the predicted and measured percentages of paraffinics, and one can observe that the iPLS models are indeed working as expected. Although there are some large errors when predicting the sample with the lowest percentage of paraffinics, the models appear much more reliable for higher percentages. Figure 6.b presents representative spectra from the different lubricant classes. Regarding important spectral bands, Figure 6.c presents the number of times each bin was selected for inclusion in an iPLS model, and the bin containing wavenumbers 707 to 728 cm⁻¹ is clearly the most important: out of 40 iterations of double cross-validation, it was selected 36 times for inclusion in the final model, whereas the other bins were selected at most 5 times. Further investigation showed that the correlation between the transmittance in this bin and the percentage of paraffinics reaches values as high as 0.98. This is in accordance with previous knowledge regarding the measurement system, which uses a specific wavelength to quantify the percentage of paraffinics. If the estimates provided by the spectrometer are reliable enough, then the developed models will also provide a good indication and can be used in scenarios where the measurement system has limitations.
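The bin-level correlation check mentioned above can be sketched as a Pearson correlation between a per-sample summary of each bin and the measured response. Averaging the transmittance over the 12 points of a bin is our assumption (the text does not say how values within a bin were combined), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def bin_correlations(spectra: Sequence[Sequence[float]], response: Sequence[float],
                     width: int = 12) -> List[float]:
    """Correlation between each bin's mean transmittance and the response.

    spectra: one list of transmittance values per sample, all sharing the
    same wavenumber axis; bins are consecutive blocks of `width` points.
    """
    n_points = len(spectra[0])
    result = []
    for start in range(0, n_points, width):
        bin_means = [sum(s[start:start + width]) / len(s[start:start + width])
                     for s in spectra]
        result.append(pearson(bin_means, response))
    return result
```

A bin whose correlation approaches 1 in absolute value, like the 707-728 cm⁻¹ bin discussed above, behaves almost like a single-band calibration for the response; bins with no variance across samples would need to be skipped to avoid division by zero.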
Figure 6. Results obtained with iPLS to predict the percentage of paraffinics: (a) predicted and observed response, (b) representative spectra for different sample classes and (c) number of times each bin was included in the iPLS model.

4. CONCLUSIONS
A comparison study was carried out in order to predict nine properties of different lubricant oils, namely: density, viscosity at 40 °C and 100 °C, viscosity index, TAN, saponification number and the percentages of aromatics, naphthenics and paraffinics. The main goal was to derive robust and accurate models that can predict the desired properties based on FTIR spectra of lubricant oil samples.
A wide variety of regression methods from four different classes, namely variable selection, penalized regression, latent variables and regression tree-based ensembles, were tested. One of the main conclusions of this comparison study is that there is no overall best regression method; the optimal choice depends on the property to be predicted. It was found that density and the percentages of aromatics, naphthenics and paraffinics can be well predicted, with high values of R²DCV (0.97-0.98 for the best methods). In general, the LASSO method is able to predict these properties, although better results can be obtained with EN for density and with iPLS for aromatics and paraffinics. Nevertheless, some clusters were observed for density, and thus predictions can only be used as rough estimates for this property. The total acid number could be reasonably predicted by the RR, SVR and PLS methods, but again some clusters were identified; indeed, samples belonging to the brake fluid type were rather different from the rest. Excluding the brake fluid samples from the regression analysis allowed the development of more reliable predictive models (RMSECV decreased from 1.5 to 0.6 mg KOH/g). The saponification number and the viscosity-related properties were not satisfactorily predicted, due to the low number of samples available, the diversity of oil types present and the reduced variation in their properties, which results from the tight specification ranges they must meet and makes model development harder.
ASSOCIATED CONTENT
Supplementary Material
• Physicochemical characteristics of lubricant oil samples used for model development;
• Prediction results for nine oil properties, including all samples and regression methods initially considered.
This material is available free of charge via the Internet at http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
*Email: [email protected]; Tel: 351 239798745; Fax: 351 239798703.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the financial support of SOGILUB – Sociedade de Gestão Integrada de Óleos Lubrificantes Usados, Lda.
ABBREVIATIONS
BT, Boosted Trees; BRT, Bagging of Regression Trees; EN, Elastic Nets; FSR, Forward Stepwise Regression; iPLS, interval Partial Least Squares; LASSO, Least Absolute Shrinkage and Selection Operator; PCR, Principal Component Regression; PCR_FS, PCR with Forward Stepwise; PLS, Partial Least Squares; RF, Random Forests; RR, Ridge Regression; SVR, Support Vector Regression; TAN, Total Acid Number.

REFERENCES
(1) Mang, T.; Dresel, W. Lubricants and Lubrication, 2nd ed.; Wiley-VCH: Weinheim, Germany, 2007.
(2) Al-Ghouti, M. A.; Al-Degs, Y. S.; Amer, M. Application of chemometrics and FTIR for determination of viscosity index and base number of motor oils. Talanta 2010, 81, 1096−1101.
(3) Marinović, S.; Krištović, M.; Špehar, B.; Rukavina, V.; Jukić, A. Prediction of diesel fuel properties by vibrational spectroscopy using multivariate analysis. J. Anal. Chem. 2012, 67 (12), 939−949.
(4) Pasadakis, N.; Sourligas, S.; Foteinopoulos, C. Prediction of the distillation profile and cold properties of diesel fuels using mid-IR spectroscopy and neural networks. Fuel 2006, 85, 1131−1137.
(5) Lira, L. F. B.; Albuquerque, M. S.; Pacheco, J. G. A.; Fonseca, T. M.; Cavalcanti, E. H. S.; Stragevitch, L.; Pimentel, M. F. Infrared spectroscopy and multivariate calibration to monitor stability quality parameters of biodiesel. Microchem. J. 2010, 96, 126−131.
(6) Zhang, W.; Yuan, W.; Zhang, Z.; Coronado, M. Predicting the dynamic and kinematic viscosities of biodiesel–diesel blends using mid- and near-infrared spectroscopy. Appl. Energ. 2012, 98, 122−127.
(7) Lira, L. F. B.; Vasconcelos, F. V. C.; Pereira, C. F.; Paim, A. P. S.; Stragevitch, L.; Pimentel, M. F. Prediction of properties of diesel/biodiesel blends by infrared spectroscopy and multivariate calibration. Fuel 2010, 89, 405−409.
(8) Pavoni, B.; Rado, N.; Piazza, R.; Frignani, S. FT-IR spectroscopy and chemometrics as a useful approach for determining chemical-physical properties of gasoline, by minimizing analytical times and sample handling. Ann. Chim. 2004, 94, 521−532.
(9) Sastry, M.; Chopra, A.; Sarpal, A.; Jain, S.; Srivastava, S.; Bhatnagar, A. Determination of physicochemical properties and carbon-type analysis of base oils using mid-IR spectroscopy and partial least-squares regression analysis. Energy Fuels 1998, 12, 304−311.
(10) Braga, J. W. B.; Junior, A. A. S.; Martins, I. S. Determination of viscosity index in lubricant oils by infrared spectroscopy and PLSR. Fuel 2014, 120, 171−178.
(11) Van de Voort, F. R.; Sedman, J.; Yaylayan, V.; Laurent, C. S. Determination of acid number and base number in lubricants by Fourier transform infrared spectroscopy. Appl. Spectrosc. 2003, 57 (11), 1425−1431.
(12) Felkel, Y.; Dörr, N.; Glatz, F.; Varmuza, K. Determination of the total acid number (TAN) of used gas engine oils by IR and chemometrics applying a combined strategy for variable selection. Chemom. Intell. Lab. Syst. 2010, 101, 14−22.
(13) Dong, J.; Van de Voort, F. R.; Ismail, A. A.; Akochi-Koble, E.; Pinchuk, D. Rapid determination of the carboxylic acid contribution to the total acid number of lubricants by Fourier transform infrared spectroscopy. Lubr. Eng. 2000, 56 (6), 12−20.
(14) Van de Voort, F. R.; Sedman, J.; Yaylayan, V.; Laurent, C. S.; Mucciardi, C. Quantitative determination of moisture in lubricants by Fourier transform infrared spectroscopy. Appl. Spectrosc. 2004, 58 (2), 193−198.
(15) Borin, A.; Poppi, R. J. Application of mid infrared spectroscopy and iPLS for the quantification of contaminants in lubricating oil. Vib. Spectrosc. 2005, 37, 27−32.
(16) Andersen, C. M.; Bro, R. Variable selection in regression - a tutorial. J. Chemom. 2010, 24 (11−12), 728−737.
(17) Hesterberg, T.; Choi, N. H.; Meier, L.; Fraley, C. Least angle and ℓ1 penalized regression: A review. Stat. Surv. 2008, 2, 61−93.
(18) Hastie, T.; Tibshirani, R.; Friedman, J. The elements of statistical learning: data mining, inference, and prediction; Springer: Berlin, 2001.
(19) Canu, S.; Grandvalet, Y.; Guigue, V.; Rakotomamonjy, A. SVM and Kernel Methods Matlab Toolbox. Perception Systemes et Information, INSA de Rouen, Rouen, France, 2005.
(20) Smola, A. J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14 (3), 199−222.
(21) Burnham, A. J.; MacGregor, J. F.; Viveros, R. Latent variable multivariate regression modeling. Chemom. Intell. Lab. Syst. 1999, 48, 167−180.
(22) Jolliffe, I. Principal component analysis, 2nd ed.; Springer-Verlag: New York, 2002.
(23) Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37−52.
(24) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109−130.
(25) Geladi, P.; Kowalski, B. R. Partial least-squares regression: a tutorial. Anal. Chim. Acta 1986, 185, 1−17.
(26) Nørgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J. P.; Munck, L.; Engelsen, S. B. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000, 54 (3), 413−419.
(27) Strobl, C.; Malley, J.; Tutz, G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 2009, 14 (4), 323−348.
(28) Breiman, L. Random forests. Mach. Learn. 2001, 45 (1), 5−32.
(29) Breiman, L.; Friedman, J.; Stone, C. J.; Olshen, R. A. Classification and regression trees; CRC Press: New York, 1984.
(30) Elith, J.; Leathwick, J. R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802−813.
(31) Freund, Y.; Schapire, R.; Abe, N. A short introduction to boosting. J. Jap. Soc. Artif. Intell. 1999, 14 (5), 771−780.
(32) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Zhang, L.-X.; Li, H.-D. The boosting: A new idea of building models. Chemom. Intell. Lab. Syst. 2010, 100, 1−11.