
Apr 18, 2016
Article pubs.acs.org/EF

Use of Near-Infrared Spectroscopy, Partial Least-Squares, and Ordered Predictors Selection To Predict Four Quality Parameters of Sweet Sorghum Juice Used To Produce Bioethanol

Cristiane C. Guimarães,†,‡ Camila Assis,† Maria Lúcia F. Simeone,‡ and Marcelo M. Sena*,†,§

† Departamento de Química, Instituto de Ciências Exatas (ICEx), Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, Minas Gerais, Brazil
‡ Embrapa Milho e Sorgo, MG 424, Km 45, 35701-970 Sete Lagoas, Minas Gerais, Brazil
§ Instituto Nacional de Ciência e Tecnologia em Bioanalítica, 13083-970 Campinas, São Paulo, Brazil

Supporting Information

ABSTRACT: Sweet sorghum juice is gaining importance as a raw material for first-generation ethanol production in the period between sugar cane harvests. Breeding programs are seeking to improve sorghum quality to increase productivity, which has generated an excessive number of samples to be analyzed. Thus, the aim of this paper was to develop rapid and low-cost methods based on partial least-squares (PLS) and near-infrared spectroscopy (NIRS) for the determination of four chemical quality parameters of sweet sorghum juice. Spectra were recorded with a transflectance accessory, and robust models were built with 500 samples obtained from more than 200 hybrids and inbred strains. Optimization by variable selection was carried out with ordered predictors selection (OPS), providing simpler, more interpretable, and more predictive multivariate calibration models. The methods were developed in the working ranges of 5.5−18.1 °Brix, 1.2−5.2%, 0.3−13.0%, and 9.8−83.0% for degrees Brix, reducing sugars, polarizable sugars, and apparent purity, respectively. Root-mean-square errors of prediction (RMSEP) of 0.3 °Brix, 0.3%, 0.6%, and 5.3% were obtained for these four parameters, respectively. Finally, a complete multivariate analytical validation was carried out, and the methods were considered linear, accurate, sensitive, and without bias.



Received: February 20, 2016
Revised: April 15, 2016
© XXXX American Chemical Society

DOI: 10.1021/acs.energyfuels.6b00408 Energy Fuels XXXX, XXX, XXX−XXX

INTRODUCTION

Biofuels represent a well-established sustainable energy source. They present a series of advantages over fossil fuels, such as reduced pollutant emissions to the atmosphere, improved air quality, lower associated costs, and the opportunity for income generation in rural areas.1 First-generation biofuels are obtained by direct fermentation of the juice extracted from sugar raw materials, such as sugar cane and beet, or from starchy materials, such as corn and wheat grains. In contrast, second-generation biofuels are obtained from lignocellulosic feedstock, such as sugar cane bagasse and corn or rice straw. Brazil is the second largest producer of first-generation ethanol in the world, after the U.S.A., and has a long and strong tradition in the production of biofuels.2,3 Despite the well-established process for producing ethanol from sugar cane, there has been increasing interest in research on alternative raw materials. One of the most promising alternatives for bioethanol production is sweet sorghum, a C4 plant of African origin that has mainly been used for animal feed. This plant has a high capacity to accumulate sugars in the stalk and presents several agronomic advantages, such as early maturity, high photosynthetic efficiency, high resistance to heat and drought, minimal fertilizer requirements, and wide adaptability.4,5 The main advantage of sorghum is its shorter harvest cycle (100−140 days), about one-third of the sugar cane cycle (12−18 months). This makes it a suitable complementary crop for diversifying sugar cane croplands in Brazil. In addition, ethanol production from sorghum juice using immobilized yeast has provided a rapid fermentation process.6

Considering the importance of feedstock quality for increasing the ethanol yield, biotechnology has been applied in sorghum breeding programs that seek to enhance productivity by selecting varieties and new hybrids with higher sugar contents and appropriate composition.7 This type of program starts from a germplasm bank and passes through steps such as field trials and compositional analysis, up to the final estimate of the bioethanol yield. One example is the sweet sorghum breeding program (SSBP) of the Brazilian Agricultural Research Corporation (EMBRAPA). A key step in this program is the chemical analysis of the extracted juices. This step generates a large number of samples that demand rapid analysis, because sorghum juice is perishable. The main parameters determined for these samples are °Brix, reducing sugars, polarizable sugars (Pol), and apparent purity. Degrees Brix is a relative density scale indicating the percentage by weight of total soluble solids (grams per 100 g of solution), expressed as sucrose. Reducing sugars represent glucose and fructose. Pol measures the sucrose concentration. Apparent purity is a general quality parameter, defined as Pol expressed as a percentage of °Brix. This last parameter is very important, because juice samples with apparent purities below 75% are considered to have low potential for producing bioethanol, and industrial units may even refuse to receive juice loads with low apparent purity.8 Nevertheless, these parameters have traditionally been determined by classical wet methods or chromatographic analysis, which are laborious, time-consuming, and relatively expensive, consuming large amounts of reagents and generating large amounts of waste. In the last few years, the joint use of vibrational spectroscopic techniques [mainly near-infrared spectroscopy (NIRS)] and multivariate calibration methods [mainly partial least-squares (PLS)] has allowed the development of direct, non-destructive, fast, and low-cost methods. These methods have been applied to the quality control of pharmaceuticals, foods, agricultural commodities, fuels, etc. Other important advantages are the reduced consumption of reagents and the reduced generation of chemical waste, in accordance with the principles of green chemistry.9 Thus, the use of NIRS and PLS is a very attractive alternative for developing methods to meet the high demand for chemical analyses generated by sorghum breeding programs, such as SSBP. In a previous paper, direct NIR methods for determining the composition of sorghum biomass were developed, aiming to evaluate the production of second-generation bioethanol.10 In the present work, the focus is on sorghum juice. The determination of sugar composition or related parameters, such as °Brix and Pol, by NIRS has been carried out in fruits11,12 and sugar cane juices.13−15 Other papers have predicted sugar content with NIRS from sugar cane skin16 and sorghum stalks.17,18 One important aspect explored in this work is the use of variable selection to improve the analytical performance of the developed methods.
The selection of a limited number of signals as the most predictive spectral variables provides models with richer information content specifically related to the properties/analytes of interest, as well as less spectral overlap with interferences.14,19 Variable selection methods can be of two general types: (1) those based on the inspection of informative vectors of the full PLS model, such as regression coefficients and variable importance in projection (VIP) scores, or (2) those based on searching for the sensors (variables) that provide minimum prediction errors. Methods of the first type and their associated algorithms tend to be significantly faster than those of the second type. Sorol et al.14 evaluated several variable selection methods on NIR spectra for predicting °Brix in sugar cane juice and observed somewhat better results with genetic algorithms, a method of the second type. An important conclusion of that paper was that the use of regression coefficients or other informative vectors for variable selection should always be complemented with some kind of window search, to appropriately select sensors (wavenumbers) for successful PLS models. In this paper, the developed models were optimized using ordered predictors selection (OPS), a method that combines the principles of the two types of variable selection mentioned above. OPS was recently proposed20 and aims at selecting the most predictive variables by systematically investigating PLS informative vectors in a cross-validation process, leading to a large reduction in the number of sensors. Its algorithm is based on bidiagonal decomposition, and there are seven options of informative vectors for starting the method: regression coefficients, correlation vector between columns of blocks X and Y, residual vector, covariance procedure vector, VIP scores, net analyte signal (NAS), and signal-to-noise ratio vector.
This method can be summarized in five sequential steps: (1) an informative vector, or a combination of more than one, is obtained from a PLS model; (2) the variables are scored by their absolute values in the informative vector; (3) they are sorted in descending order of these values; (4) starting from an initial window of the top-ranked variables, increments of variables are successively added, and a regression model is built for each variable set and evaluated by leave-N-out cross-validation; and (5) the variable sets are compared on the basis of cross-validation quality parameters, such as the root-mean-square error of cross-validation (RMSECV) and the correlation coefficient. OPS has been applied for optimizing multivariate calibration models obtained from spectrofluorimetric,20 Raman,20 and NIR20,21 spectra, chromatographic signals,22 and quantitative structure−activity relationship problems.23 The methods developed in this work were also previously optimized by a robust outlier detection process based on three parameters: leverages, spectral residuals, and prediction residuals.24 Finally, the reliability of the methods was confirmed by carrying out a full multivariate analytical validation, an issue absent from most of the NIRS models published in the literature. For this purpose, several figures of merit (FOM), such as linearity, trueness, precision, working range, selectivity, sensitivity, analytical sensitivity (γ), bias, and residual prediction deviation (RPD), were estimated on the basis of relevant recent literature.24−28 In particular, linearity was evaluated by proper statistical tests to ensure the random behavior of the residuals. In this protocol,29 the Ryan−Joiner (RJ), Brown−Forsythe (BF), and Durbin−Watson (DW) tests were sequentially used for evaluating normality, homoscedasticity (constant variance across all values), and independence, respectively. Thus, the objective of this research was to develop and validate analytical methods to meet the high-frequency demand for rapid, direct, and non-destructive determinations in juice samples from the EMBRAPA sweet sorghum breeding program.
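The five OPS steps above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' homemade MATLAB routine (ref 20). It makes several simplifying assumptions: the regression vector is the only informative vector, a plain NIPALS PLS1 replaces the bidiagonal decomposition, and leave-N-out cross-validation uses contiguous blocks. The function names (`pls1_coef`, `fit_predict`, `rmsecv`, `ops_select`) are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """PLS1 regression vector via NIPALS for mean-centered X (n x p) and y (n,)."""
    X = np.array(X, dtype=float)  # copies, so deflation leaves the caller's data intact
    y = np.array(y, dtype=float)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to model; stop adding latent variables
            break
        w /= norm
        t = X @ w  # scores of this latent variable
        tt = t @ t
        p = X.T @ t / tt  # X loadings
        qa = (y @ t) / tt  # y loading
        X -= np.outer(t, p)  # deflate X and y
        y -= qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q


def fit_predict(X_train, y_train, X_test, n_lv):
    """Mean-center with training statistics, fit PLS1, and predict X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    b = pls1_coef(X_train - x_mean, y_train - y_mean, n_lv)
    return (X_test - x_mean) @ b + y_mean


def rmsecv(X, y, n_lv, n_out):
    """RMSECV, leaving out contiguous blocks of n_out samples at a time."""
    n = len(y)
    press = 0.0
    for start in range(0, n, n_out):
        test = np.arange(start, min(start + n_out, n))
        train = np.setdiff1d(np.arange(n), test)
        pred = fit_predict(X[train], y[train], X[test], n_lv)
        press += float(np.sum((pred - y[test]) ** 2))
    return np.sqrt(press / n)


def ops_select(X, y, h_ops, n_lv, window, increment, n_out):
    """Rank variables by |b| of an h_ops-component PLS model, then grow the
    variable set (window, window + increment, ...) and keep the lowest RMSECV."""
    b = pls1_coef(X - X.mean(axis=0), y - y.mean(), h_ops)  # steps 1-2
    order = np.argsort(-np.abs(b))  # step 3: descending |b|
    best_err, best_idx = np.inf, None
    for k in range(window, X.shape[1] + 1, increment):  # step 4
        idx = order[:k]
        err = rmsecv(X[:, idx], y, min(n_lv, k), n_out)
        if err < best_err:  # step 5: compare variable sets by RMSECV
            best_err, best_idx = err, np.sort(idx)
    return best_err, best_idx
```

For example, a call such as `ops_select(X, y, h_ops=13, n_lv=6, window=50, increment=20, n_out=30)` on preprocessed spectra `X` would mirror the settings reported later for the °Brix model. The array names are hypothetical.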
For this purpose, models based on NIRS and PLS were built for predicting four quality parameters (°Brix, Pol, reducing sugars, and apparent purity) of sorghum juice used to produce first-generation bioethanol. All of the methods were optimized by variable selection with OPS, providing simpler, more interpretable, and more predictive multivariate calibration models.
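The three residual checks named above (RJ for normality, BF for homoscedasticity, and DW for independence) can be sketched in Python as follows. This is an illustration, not the protocol of ref 29. It makes three assumptions: the Ryan−Joiner statistic uses Blom normal scores, the Brown−Forsythe statistic is the two-group t version that splits residuals at the median fitted value, and the Durbin−Watson ratio is computed on residuals in their given order. Critical values are not computed, and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def ryan_joiner(residuals):
    """Req: correlation between ordered residuals and their normal scores
    (Blom plotting positions, an assumption of this sketch)."""
    e = np.sort(np.asarray(residuals, dtype=float))
    n = e.size
    dist = NormalDist()
    scores = np.array([dist.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)])
    return float(np.corrcoef(e, scores)[0, 1])


def brown_forsythe_t(residuals, fitted):
    """|tL|: two-sample t test on absolute deviations from each group's median,
    with residuals split into low/high groups at the median fitted value."""
    e = np.asarray(residuals, dtype=float)
    f = np.asarray(fitted, dtype=float)
    low = f <= np.median(f)
    d1 = np.abs(e[low] - np.median(e[low]))
    d2 = np.abs(e[~low] - np.median(e[~low]))
    n1, n2 = d1.size, d2.size
    pooled_var = (np.sum((d1 - d1.mean()) ** 2) + np.sum((d2 - d2.mean()) ** 2)) / (n1 + n2 - 2)
    return float(abs(d1.mean() - d2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2)))


def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals;
    values near 2 indicate no first-order autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

The residuals would be those of the reference-versus-predicted fit. The decision rules are the ones given in the Results: residuals are normal if Req ≥ Rcritic, homoscedastic if tL ≤ tcritic, and free of autocorrelation if d lies inside the acceptance range.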



EXPERIMENTAL SECTION

Samples. Samples originating from different sweet sorghum genotypes were obtained from the breeding program (SSBP) and the Active Germplasm Bank of the EMBRAPA Maize and Sorghum Research Center, located in Sete Lagoas, Minas Gerais, Brazil. The materials were characterized in three trials in the EMBRAPA experimental fields, from which three genetic diversity panels were built. Samples from a maturation test were also included to increase the variability contained in the model. These samples were collected from early development until the last stage of plant maturation, ensuring a wide range of sugar contents. A total of 500 juice samples were obtained from more than 200 recombinant inbred lines of sweet sorghum derived from two lines contrasting in the quality and quantity of sugars. Plant stalks of each genotype, without panicles, were first shredded and homogenized. The juice was then extracted with a hydraulic press at 250 kgf cm−2 for 1 min.30 The extracted juice was collected in polyethylene flasks and promptly analyzed by the reference methods and by NIRS.

Reagents. All reagents were of analytical grade and purchased from certified suppliers. Sucrose (99.9%, w/w), anhydrous D-glucose (99.7%, w/w), Octopol (an aluminum-based clarifying reagent), Fehling A (copper sulfate), and Fehling B (sodium potassium tartrate and sodium hydroxide) solutions were used. Deionized water was obtained from a Millipore Milli-Q system.
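The 500 samples described here are later split into calibration (333) and validation (167) sets with the Kennard−Stone algorithm (see Results). Below is a minimal NumPy sketch of that classic max−min procedure. It assumes Euclidean distances on the spectra, and the function names are hypothetical. The implementation the authors actually used (MATLAB/PLS Toolbox) may differ in details such as the distance metric or tie handling.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone max-min rule."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise Euclidean distances without building an n x n x p array
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    i, j = np.unravel_index(np.argmax(d), d.shape)  # the two most distant samples
    selected = [int(i), int(j)]
    min_dist = np.minimum(d[i], d[j])  # distance of each sample to the selected set
    min_dist[selected] = -np.inf  # never pick an already selected sample
    while len(selected) < n_select:
        k = int(np.argmax(min_dist))  # sample farthest from its nearest selected one
        selected.append(k)
        min_dist = np.minimum(min_dist, d[k])
        min_dist[k] = -np.inf
    return selected[:n_select]


def kennard_stone_split(X, n_cal):
    """Calibration indices from Kennard-Stone; the remaining samples form the validation set."""
    cal = kennard_stone(X, n_cal)
    val = np.setdiff1d(np.arange(len(X)), cal)
    return np.array(cal), val
```

For a 500 × 950 matrix of preprocessed spectra, `kennard_stone_split(spectra, 333)` would return a 333/167 split like the one used for the initial models. The name `spectra` is hypothetical.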



Table 1. Optimization of the PLS Models by Outlier Detection (Final Results in the Fourth Round)

parameter          round    Ncal(a)  Nval(b)  RMSEC(c)  RMSEP(c)
degrees Brix       first      333      167      0.7       0.7
                   second     305      167      0.6       0.8
                   third      284      167      0.5       0.8
                   fourth     284      145      0.5       0.4
reducing sugars    first      333      167      0.6       0.7
                   second     301      167      0.4       0.7
                   third      276      167      0.4       0.7
                   fourth     276      136      0.4       0.4
Pol                first      333      167      1.2       1.1
                   second     302      167      0.9       1.2
                   third      279      167      0.7       1.2
                   fourth     279      137      0.7       0.8
apparent purity    first      333      167      8.8       7.8
                   second     310      167      7.1       7.6
                   third      288      167      6.3       8.1
                   fourth     288      142      6.3       5.8

(a) Number of calibration samples. (b) Number of validation samples. (c) Reducing sugars, Pol, and apparent purity in % (w/v) and degrees Brix in °Brix.
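Table 1 tracks successive rounds in which samples with extreme leverage, large spectral (X-block) residuals, or large prediction (Y-block) residuals were removed. The sketch below shows one way to compute these three diagnostics from a NIPALS PLS1 model in NumPy. The cutoffs are simplified assumptions for illustration only, not the exact 95% confidence limits used by the authors or by PLS Toolbox: leverage above 3A/n, a spectral residual above its mean plus 1.96 standard deviations, and an absolute prediction residual above 1.96 × RMSE. The function names are hypothetical.

```python
import numpy as np


def pls1_scores(X, y, n_lv):
    """NIPALS PLS1 on mean-centered data; returns scores T, X loadings P, y loadings q."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    T, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no y variance left to model
            break
        w /= norm
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        qa = (y @ t) / tt
        X -= np.outer(t, p)  # deflation
        y -= qa * t
        T.append(t)
        P.append(p)
        q.append(qa)
    return np.array(T).T, np.array(P).T, np.array(q)


def outlier_flags(X, y, n_lv):
    """Flag samples by leverage, spectral residual (Q), and prediction residual.
    All three cutoffs are simplified assumptions, not the paper's exact limits."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    T, P, q = pls1_scores(Xc, yc, n_lv)
    leverage = 1.0 / n + np.einsum("ij,jk,ik->i", T, np.linalg.inv(T.T @ T), T)
    q_res = np.sum((Xc - T @ P.T) ** 2, axis=1)  # spectral (X-block) residuals
    y_res = yc - T @ q  # prediction (Y-block) residuals
    rmse = np.sqrt(np.mean(y_res ** 2))
    return {
        "leverage": np.where(leverage > 3 * n_lv / n)[0],
        "spectral": np.where(q_res > q_res.mean() + 1.96 * q_res.std())[0],
        "prediction": np.where(np.abs(y_res) > 1.96 * rmse)[0],
    }
```

Repeating `outlier_flags` on the remaining calibration samples, for at most three rounds, would mimic the round-by-round procedure summarized in Table 1.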

samples. The best models were obtained using the first derivative with Savitzky−Golay smoothing (15 points in the filter and second-order polynomial fit). The best PLS models for all of the parameters were selected with 6 LV, except for reducing sugars, for which 9 LV were used (Table 1). Raw spectra of 500 samples of sweet sorghum juice are shown in Figure 1. First derivatives of these same spectra are

Instrumentation and Software. The following instruments were used for the reference methods: a portable digital refractometer Atago 3810 PAL-1 (Tokio, Japan), a polarimeter with a diode laser source and birefringent prisms analyzer developed by Ribeiro et al.,31 a reducing sugars determinator Redutec TE-088 (Piracicaba, Brazil), a hydraulic press Hidraseme PHS 250 (Ribeirão Preto, Brazil), a shredder, and a homogenizer IRBI (Araçatuba, Brazil). A Büchi NIRFlex N-500 FT-NIR spectrometer (Flawil, Switzerland), equipped with a transflectance accessory, was used for spectra acquisition. MATLAB, version 7.13 (The MathWorks, Inc., Natick, MA), PLS Toolbox, version 6.7.1 (Eigenvector Technologies, Manson, WA), and a homemade routine for OPS20 were used for data handling. NIR Spectra Acquisition. Juice samples were previously filtered in cotton and transferred to Petri dishes (100 mm diameter), on where the transflectance accessory was positioned and NIR spectra were recorded. Spectra were obtained from 10 000 to 4000 cm−1 with 4 cm−1 steps, 32 scans, at 25 ± 2 °C, in triplicate. Replicate spectra (10) of the empty cell were obtained in the same conditions for estimating instrumental noise. Reference Methods.8 Degrees Brix was directly measured in the refractometer. Pol was determined on the basis of a saccharimetric reading at 589 nm; juice samples were diluted in deionized water (1:1, v/v), mixed with 7 g of the Octopol reagent, and filtered in qualitative filter paper; the filtrate was placed in a quartz cuvette; and the measure was obtained in a polarimeter previously calibrated with a 15% (w/v) glucose solution. 
Reducing sugars were determined on the basis of an adaptation of the Lane−Eynon method,32 which is recognized by the Brazilian sugar cane industry.8 This adaptation consists of replacing the Lane−Eynon indicator methylene blue by a platinum electrode using a specific apparatus (Redutec), to improve the accuracy of the end point determination.33 Samples were filtered in absorbent cotton, diluted in deionized water (1:10, v/v), and transferred to a buret; this solution was used for titrating a known volume of Fehlling’s solution previously standardized with D-glucose (0.25%, w/v), under boiling conditions; the titration was carried out in the Redutec system; and the end point was potentiometrically determined. Apparent purity (Q) was calculated from a relation between saccharimetric reading (S) and °Brix, according to Q = 100 × S/°Brix.

Figure 1. Raw NIR spectra of 500 sorghum juice samples.

shown in Figure S1 of the Supporting Information. The main sample components responsible for NIR vibrations are glucose, sucrose, fructose, and water. Four spectral regions can be highlighted. Strong peaks between 7400 and 6400 cm−1 and between 5400 and 4600 cm−1 are mainly related to the first overtone of O−H stretching and O−H combination bands of water, respectively. Spectral regions between 5800 and 5400 cm−1 and between 4600 and 4000 cm−1 are related to the first overtone of C−H stretching and C−H + C−H and C−H + C− C combination bands, respectively, both attributed to vibrations of the molecules of sugars.17,36,37 The region between 10000 and 7800 cm−1, related to second and third overtones, presented low absorption, increasing the signal-to-noise ratios. Considering the absence of significant signals in this region, it was deleted previously to the building of the models. Figure 2 shows the regression coefficients for the models. The coefficients for °Brix, Pol, and apparent purity present a great similarity among them, with the exception of the region between 5200 and 4700 cm−1. This similarity was not a surprise, because °Brix mainly reflects the total content of sugars, predominantly sucrose (2−4 times higher than the content of glucose plus fructose), while Pol is only related to sucrose, and apparent purity is a relation between the two former parameters. The most distinct spectral region is related to a C−O + O−H combination band36 attributed to sugars (a peak at 5186 cm−1),17 which is overlapped by water O−H vibrations in the raw spectra. In general, coefficients associated with water vibrations are negative, while the coefficients



RESULTS AND DISCUSSION Initial PLS Models. A total of 500 samples were analyzed, representing all of the chemical variation normally expected for routine analysis28 of sorghum juice from the breeding program. For all four parameters, the reference values were approximately normal distributed in the analytical ranges. The samples were split in 333 and 167 for the calibration and validation sets, respectively, using the Kennard−Stone algorithm.34 This algorithm aims to select representative sample spectra homogeneously distributed in the whole multivariate space. Cross-validation by contiguous blocks (6 splits) was used for choosing the best number of latent variables (LV) based on the RMSECV. Data were previously mean-centered. In addition, other proper preprocessings were tested [multiplicative scatter correction (MSC), standard normal variate (SNV), and derivatives],35 to cope with the presence of spectral baseline deviations caused by light scattering present in the turbid juice C

DOI: 10.1021/acs.energyfuels.6b00408 Energy Fuels XXXX, XXX, XXX−XXX

Article

Energy & Fuels

Figure 2. Regression coefficients of the initial PLS models for (a) °Brix, (b) reducing sugars, (c) Pol, and (d) apparent purity.

previous paper,10 in which NIR models for determining sorghum biomass parameters were developed. Variable Selection with OPS. Initial PLS models (Table 1) were constructed with full NIR spectra, in the wavenumber range of 7800−4000 cm−1. With the aim to develop simpler and predictive models, a discrete variable selection with OPS was applied. Initial conditions tested were cross validation with leave-N-out, with N = 30, about 10% of the number of samples in the calibration set,20 windows of 50 and increments of 20, 30, and 50 variables, and number of OPS components (hOPS) between 6 and 20. All seven options of prognostic vectors were tested. Results for the optimized OPS−PLS models built with preprocessed spectra (first derivative and Savitzky−Golay smoothing) were shown in Table 2. The best models were obtained with increments of 20 variables and regression coefficients as the prognostic vector, which is coherent with the literature.20 The number of OPS components used in the optimization varied from 13 (°Brix) to 19 (reducing sugars). All of the best models were obtained with 6 LV. The number of variables used to build the models was significantly reduced

associated with sugars are positive. The interpretation of regression coefficients should be carefully considered, because it strongly depends upon the specific data under analysis and their noise structure.38 Thus, further discussion about spectral attributions will be more properly performed in variable selection with the OPS section. Model Optimization by Outlier Detection. Initial PLS models were optimized by detection of outliers based on appropriate parameters.13,24,28,29 Samples with extreme leverages, large residuals in the X block (spectral outliers) or in the Y block (prediction outliers), were detected at the 95% confidence level. A limit of three rounds for detection of outliers in the calibration set was adopted, aiming to avoid the snowballing effect.13 After the calibration set, detection in the validation set was carried out in only one round. According to the Brazilian39 and international29,40 guidelines, no more than 22.2% (2/9) of the total number of samples can be detected as outliers. The results for each round of outlier detection are shown in Table 1. The number of 6 LV for °Brix, Pol, and apparent purity models was maintained, while the number of LV of the model for predicting reducing sugars decreased from 9 to 8. The number of outliers removed varied from 14% (apparent purity) to 17% (reducing sugars) in the calibration set and from 13% (°Brix) to 18.5% (reducing sugars) in the validation set. About 75% of the outliers was detected on the basis of large prediction residuals, while 17% was based on large spectral residuals and 8% was based on extreme leverages. The high number of outliers based on prediction residuals is justified by the fact that the reference values were obtained without replicates with wet chemical methods, as previously established in the routine analysis protocol of the laboratory. Percentages of detected outliers and their distribution among the three criteria were reasonably similar to the results obtained in our

Table 2. Optimization of the PLS Models by OPS Variable Selection Nvarsa hOPSb RMSECVc RMSECc RMSEPc

degrees Brix

reducing sugars

Pol

apparent purity

130 13 0.4 0.3 0.3

50 19 0.4 0.3 0.3

50 17 0.5 0.5 0.6

90 17 6.5 6.1 5.3

a

Number of variables used in the model. bNumber of components used in the OPS optimization. cReducing sugars, Pol, and apparent purity in % (w/v) and degrees Brix in °Brix.

D

DOI: 10.1021/acs.energyfuels.6b00408 Energy Fuels XXXX, XXX, XXX−XXX

Article

Energy & Fuels

Figure 3. Selected wavenumbers by OPS for (a) °Brix, (b) reducing sugars, (c) Pol, and (d) apparent purity.

showed positive regression coefficients and can be attributed to glucose and fructose vibrations, including some specific wavenumbers, such as 4434 and 4404 cm−1,17,37 located in the region of C−H + C−H combination bands. For Pol, a parameter directly related to sucrose, 50 variables were selected (Figure 3c). One-third of these variables (16) is located in the regions of water vibrations, between 7036 and 6824 cm−1 (O− H first overtone) and between 5244 and 4816 cm−1 (O−H combination band). On the other hand, two-thirds (34 variables) is located in the spectral range of C−H combination bands. A wavenumber specifically characteristic of a sucrose absorption band, 4398 cm−1 (2274 nm),37 was selected and showed a positive peak in the Pol regression vector. For apparent purity, a parameter that expresses a relation between °Brix and Pol, an intermediate number of variables was selected, 90 (Figure 3d), with the majority of them already chosen for °Brix or Pol models. The selected spectral variables were between 7228 and 7196 cm−1 (O−H first overtone), between 5316 and 4724 cm−1 (O−H combination bands), between 4468 and 4184 cm−1 (C−H + C−H combination bands), and between 4064 and 4056 cm−1 (C−H + C−C combination band).36 Multivariate Analytical Validation. The developed methods were validated, and the appropriated FOM are

from 950 (full spectra) to 50−130, depending upon each parameter. This decrease corresponds to 5.3−13.7% of the total number of original variables. All of the OPS models were improved in comparison to full spectra models, presenting lower root-mean-square errors of calibration (RMSEC) and prediction (RMSEP). This can be observed by comparing Tables 1 and 2. Figure 3 displays the selected variables for each parameter/ model. The largest number of variables (130) was selected for the °Brix model. Degrees Brix is a quantitative measurement of the total soluble solids in the juice that includes all sugars but does not provide any qualitative information about them. The selected wavenumbers (Figure 3a) include small regions between 7128 and 7076 cm−1 and between 6972 and 6924 cm−1 (first overtone of O−H stretching) and multiple variables between 5260 and 4756 cm−1 (O−H combination bands), which can be attributed to water vibrations. These regions showed negative regression coefficients in the PLS model and are inversely related to the sugar content. Other selected wavenumbers are comprised between 4588 and 4040 cm−1, showing predominantly positive correlation coefficients and attributed to C−H and C−C combination bands of sugars. For reducing sugars, OPS selected only 50 variables (Figure 3b), all of them located in the region of C−H + C−H combination bands, between 4784 and 4280 cm−1. Most of these variables E

DOI: 10.1021/acs.energyfuels.6b00408 Energy Fuels XXXX, XXX, XXX−XXX

Article

Energy & Fuels Table 3. Parameters for Evaluating FOM Used in the Analytical Validation of the Final OPS−PLS Models FOM linearity

trueness precision working range selectivity analytical sensitivity (γ) γ−1 bias RPD

parameter

degrees Brix

reducing sugars

Pol

apparent purity

Req (RJ test) tL (BF test) d (DW test) slopea intercepta correlation coefficient (r)a REPb RSDc

0.9988 1.2691 2.14 0.98 ± 0.02 0.2 ± 0.6 0.9915 2.2% 2.6% 5.5−18.1 °Brix 0.08 2.7 °Brix−1 0.4 °Brix 0.037 ± 0.339 °Brix 1.31 9.5 8.3

0.9952 2.0346 1.94 0.90 ± 0.07 0.8 ± 2.3 0.9044 9.7% 5.1% 1.2−5.2% 0.05 1.7%−1 0.6% −0.051 ± 0.371% 1.60 2.9 3.1

0.9916 1.0385 2.16 0.96 ± 0.04 0.8 ± 2.1 0.9747 7.2% 4.5% 0.3−13.0% 0.08 2.7%−1 0.4% 0.108 ± 0.797% 1.59 4.9 4.8

0.9905 1.7111 2.11 0.91 ± 0.07 8.5 ± 19.5 0.9145 7.4% 3.7% 9.8−83.0% 0.07 1.1%−1 0.9% −0.086 ± 5.811% 0.18 2.5 2.3

bias ± SDVd estimated t RPD calibration RPD validation

Values for the line fitted to the calibration samples. bMean relative prediction error for the validation set. cResults for three samples, each at three different content levels. dStandard deviation of validation errors. a

typical FOM: RMSECV, RMSEC, and RMSEP (Table 2). RMSEP of 0.3 °Brix, 0.3%, 0.6%, and 5.3% were estimated for °Brix, reducing sugars, Pol, and apparent purity, respectively, indicating a good agreement between reference and predicted values for independent samples. Particularly, RMSEP and RMSEC for °Brix, both of 0.3 °Brix, are in the same order of typical errors for refractometric measurements.25 Another FOM calculated for evaluating trueness was the mean relative prediction error for the validation set (REP), whose values varied from 2.2% for degrees Brix to 9.7% for reducing sugars (Table 3). Precision was only estimated at the level of repeatability. Intermediate precision cannot be evaluated, because the juice samples are unstable and readily degradable. Thus, they must be analyzed at the same day of collecting. Repeatability was evaluated by estimating relative standard deviations (RSD) for triplicates of three samples at three levels of content: low, medium, and high. RSD varied between 2.6% (°Brix) and 4.7% (reducing sugars). After the linearity and accuracy of the methods were established, working ranges were defined as 5.5−18.1 °Brix, 1.2−5.2% (reducing sugars), 0.3− 13.0% (Pol), and 9.8−83.0% (apparent purity). Sensitivity and selectivity for multivariate methods are estimated on the basis of the concept of NAS and as mean values for the calibration samples.24−26 Opposite to univariate methods, here there is no sense in requiring selectivity of 100%, because low values can be associated with accurate multivariate calibration models. Estimated values meant that between 5% (reducing sugars) and 8% (°Brix and Pol) of the analytical signal was used for the prediction of each parameter. Considering the pure sensitivity not adequate for comparison between different methods, analytical sensitivity (γ) was estimated by dividing the former by an estimate of the instrumental noise (0.0083). 
The inverse of γ estimates the minimum difference in parameter values that the methods can distinguish, taking random instrumental noise as the only source of error. This estimate is important for defining the number of significant digits used to express the results; thus, the γ−1 values shown in Table 3 suggested using only one decimal place in the prediction results. Bias was estimated for the validation samples according to the ASTM International guidelines.28 From the bias and the standard deviation of validation errors (SDV), t values were calculated; they were below the critical t (1.96, at 95% with infinite degrees of freedom) for all of the parameters, confirming the absence of systematic errors. RPD is a FOM used to evaluate the performance of multivariate calibration models in absolute terms.41 Models with RPD greater than 1.5 are considered acceptable, and those with RPD greater than 2.4 are considered good. As seen in Table 3, all of the models could be considered good.

shown in Tables 2 and 3. Most FOM for multivariate methods are estimated differently from those for univariate methods.24−27 Linearity was estimated by fitting lines of reference versus predicted values, as shown in Figure S2 of the Supporting Information. Correlation coefficients of these lines ranged from 0.9915 for °Brix to 0.9044 for reducing sugars. The poorer result for reducing sugars can be explained by their lower concentration in the juice (compared with sucrose) and by the less precise reference method, which is based on a hot titration. To confirm the linearity of the developed models, the fit residuals of these plots should behave randomly. This was checked by applying appropriate statistical tests to verify, in sequence, the normality, homoscedasticity, and independence of the residuals.10,29 Critical values for these tests were calculated adopting N = 288, the largest number of calibration samples among the four methods (apparent purity in Table 1). Normality of the residuals was checked by the RJ test, estimating an Req value for each model; if Req ≥ Rcritic, the residuals are normally distributed. All of the calculated Req values, except that for °Brix, were below Rcritic at the 95% confidence level (0.9966). However, Req values for all of the models were above Rcritic at 99% (0.9902). The RJ test was originally proposed for univariate models, which use far fewer samples; with the large number of samples typical of multivariate calibration, Rcritic becomes high and the test too rigorous. Thus, the model residuals were considered normally distributed at 95% for °Brix and at 99% for the other three parameters. Homoscedasticity was checked by the BF test, estimating a tL value for each model; if tL ≤ tcritic, the residuals are homoscedastic. All of the calculated values, except that for reducing sugars (tL = 2.03), were below 1.96, the critical value at 95% with infinite degrees of freedom. The reducing sugars model was considered homoscedastic at 98% (tcritic = 2.33). The absence of autocorrelation in the residuals was checked by the DW test, which gave d values all within the acceptance range at 95% (1.81−2.19). Once these assumptions about the residuals of the linear models had been checked, the fit parameters shown in Table 3 could be considered valid. Accuracy of the methods is attested by trueness and precision studies. Trueness of multivariate methods is evaluated by

DOI: 10.1021/acs.energyfuels.6b00408 Energy Fuels XXXX, XXX, XXX−XXX

Energy & Fuels
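The RJ normality check described in the text can be sketched in Python as below. This is a minimal sketch, not the authors' code. It assumes Blom-type normal scores, Φ−1((i − 3/8)/(n + 1/4)), and polynomial approximations to the Ryan−Joiner critical values. With n = 288, these approximations give 0.9966 at α = 0.05 and 0.9902 at α = 0.01, which are the critical values quoted in the text. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def ryan_joiner_req(residuals):
    """Ryan-Joiner statistic: correlation between the ordered residuals
    and their expected normal scores (Blom-type plotting positions)."""
    e = sorted(residuals)
    n = len(e)
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_e = fmean(e)
    num = sum((ei - mean_e) * zi for ei, zi in zip(e, z))
    den = math.sqrt(sum((ei - mean_e) ** 2 for ei in e) * sum(zi ** 2 for zi in z))
    return num / den


# Assumed approximations a - b/sqrt(n) - c/n + d/n**2 to the RJ critical values
_RJ_COEFFS = {
    0.10: (1.0071, 0.1371, 0.3682, 0.7780),
    0.05: (1.0063, 0.1288, 0.6118, 1.3505),
    0.01: (0.9963, 0.0211, 1.4106, 3.1791),
}


def ryan_joiner_critical(n, alpha=0.05):
    """Critical value R_critic for n residuals at significance level alpha."""
    a, b, c, d = _RJ_COEFFS[alpha]
    return a - b / math.sqrt(n) - c / n + d / n ** 2


def residuals_normal(residuals, alpha=0.05, n_critical=None):
    """True if Req >= R_critic. n_critical fixes the N used for the
    critical value instead of len(residuals)."""
    n = n_critical if n_critical is not None else len(residuals)
    return ryan_joiner_req(residuals) >= ryan_joiner_critical(n, alpha)


print(round(ryan_joiner_critical(288, 0.05), 4))  # 0.9966
print(round(ryan_joiner_critical(288, 0.01), 4))  # 0.9902
```

Passing `n_critical=288` mirrors the choice made in the text of using the largest calibration set size when computing the critical value for every model.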
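The BF and DW residual checks can be sketched the same way. Two assumptions are not stated in the text:

- The residuals are ordered by their reference values.
- For the BF test, that ordered set is split into a lower half and an upper half.

The sketch computes the Brown−Forsythe (modified Levene) statistic from absolute deviations about each half's median. The resulting tL is then compared with the normal critical value (1.96 at 95%), and d with the 1.81−2.19 acceptance range, as in the text. Function names are hypothetical.

```python
import math
from statistics import fmean, median


def brown_forsythe_tL(residuals, x):
    """Brown-Forsythe two-group t statistic.

    Residuals are ordered by x (assumed here to be the reference values)
    and split into lower and upper halves; absolute deviations from each
    half's median are compared with a pooled two-sample t statistic.
    """
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    devs = []
    for g in (ordered[:half], ordered[half:]):
        med = median(g)
        devs.append([abs(r - med) for r in g])
    d1, d2 = devs
    n1, n2 = len(d1), len(d2)
    m1, m2 = fmean(d1), fmean(d2)
    pooled_var = (sum((d - m1) ** 2 for d in d1)
                  + sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return abs(m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def durbin_watson(residuals):
    """Durbin-Watson d for residuals in a fixed order; values near 2
    indicate no first-order autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den


x = [1, 2, 3, 4, 5, 6, 7, 8]
res = [0.1, -0.2, 0.3, -0.1, 1.0, -2.0, 3.0, -1.0]  # spread grows with x
print(brown_forsythe_tL(res, x) <= 1.96)  # False: not homoscedastic at 95%
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```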
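The bias test and RPD can be sketched as follows, assuming ASTM E1655-style definitions:

- Bias is the mean validation error.
- SDV is the sample standard deviation of those errors.
- t = |bias|·√n/SDV.

RPD is assumed here to be the standard deviation of the validation reference values divided by RMSEP. This is a common definition, but this section does not spell it out. The 1.96, 1.5, and 2.4 thresholds are the ones quoted in the text; the function names and example data are hypothetical.

```python
import math
from statistics import fmean, stdev


def bias_t_test(y_ref, y_pred, t_critical=1.96):
    """Bias, SDV, t statistic, and pass/fail for a validation set.

    Errors are taken as predicted minus reference; t uses |bias|, so the
    sign convention does not change the decision.
    """
    errors = [p - r for r, p in zip(y_ref, y_pred)]
    n = len(errors)
    bias = fmean(errors)
    sdv = stdev(errors)  # sample SD of the errors (n - 1 denominator)
    t = abs(bias) * math.sqrt(n) / sdv
    return bias, sdv, t, t < t_critical


def rpd(y_ref, y_pred):
    """SD of the validation reference values divided by RMSEP."""
    rmsep = math.sqrt(fmean([(p - r) ** 2 for r, p in zip(y_ref, y_pred)]))
    return stdev(y_ref) / rmsep


def rpd_class(value):
    """Thresholds quoted in the text: > 2.4 good, > 1.5 acceptable."""
    if value > 2.4:
        return "good"
    if value > 1.5:
        return "acceptable"
    return "below acceptable"


y_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [10.2, 11.8, 14.1, 15.9, 18.0]
bias, sdv, t, no_bias = bias_t_test(y_ref, y_pred)
print(no_bias, rpd_class(rpd(y_ref, y_pred)))  # True good
```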



CONCLUSION

Breeding programs focused on biofuel production, such as SSBP, conduct many experiments in the search for more productive plant varieties. This, in turn, creates a high demand for rapid chemical analyses to characterize varieties according to their quality and sugar content. This paper developed NIRS−PLS methods for the determination of four chemical quality parameters (°Brix, reducing sugars, Pol, and apparent purity) in sweet sorghum juice samples used to produce first-generation bioethanol. Robust models were built by analyzing 500 samples obtained from more than 200 hybrids and inbred strains. These methods were optimized by variable selection with OPS. OPS reduced the number of wavenumbers used to 5−14% of the original spectra and provided simpler, more interpretable, and more predictive NIR models. The developed methods are rapid, non-destructive, and low-cost, which makes them appropriate replacements for the more laborious reference methods. The only sample pretreatment required was a simple juice filtering step, and results are available to the laboratory in less than 1 min. Finally, the methods were submitted to a complete multivariate analytical validation and were considered linear, accurate, sensitive, and without bias.





NOMENCLATURE

BF = Brown−Forsythe
DW = Durbin−Watson
FOM = figures of merit
LV = latent variables
NAS = net analyte signal
NIRS = near-infrared spectroscopy
OPS = ordered predictors selection
PLS = partial least-squares
Pol = polarizable sugars
REP = relative prediction error
RJ = Ryan−Joiner
RMSEC = root-mean-square error of calibration
RMSECV = root-mean-square error of cross-validation
RMSEP = root-mean-square error of prediction
RPD = relative prediction deviation
RSD = relative standard deviation
SDV = standard deviation of validation errors
SSBP = sweet sorghum breeding program
VIP = variable importance in projection
γ = analytical sensitivity

REFERENCES

(1) Soimakallio, S.; Koponen, K. Biomass Bioenergy 2011, 35, 3504−3513.
(2) de Moraes, M. A. F. D.; Zilberman, D. Production of Ethanol from Sugarcane in Brazil; Springer: London, U.K., 2014; DOI: 10.1007/978-3-319-03140-8.
(3) Nass, L. L.; Pereira, P. P. A.; Ellis, D. Crop Sci. 2007, 47, 2228−2237.
(4) Billa, E.; Koullas, D. P.; Monties, B.; Koukios, E. G. Ind. Crops Prod. 1997, 6, 297−302.
(5) Prasad, S.; Singh, A.; Jain, N.; Joshi, H. C. Energy Fuels 2007, 21, 2415−2420.
(6) Mei, X.; Liu, R.; Shen, F.; Wu, H. Energy Fuels 2009, 23, 487−491.
(7) Murray, S. C.; Rooney, W. L.; Mitchell, S. E.; Sharma, A.; Klein, P. E.; Mullet, J. E.; Kresovich, S. Crop Sci. 2008, 48, 2180−2193.
(8) Conselho dos Produtores de Cana-de-Açúcar, Açúcar e Álcool do Estado de São Paulo (CONSECANA-SP). Manual de Instruções; CONSECANA-SP: Piracicaba, Brazil, 2006.
(9) Moros, J.; Garrigues, S.; de la Guardia, M. TrAC, Trends Anal. Chem. 2010, 29, 578−591.
(10) Guimarães, C. C.; Simeone, M. L. F.; Parrella, R. A. C.; Sena, M. M. Microchem. J. 2014, 117, 194−201.
(11) Xie, L.; Ye, X.; Liu, D.; Ying, Y. Food Chem. 2009, 114, 1135−1140.
(12) Cozzolino, D.; Cynkar, W. U.; Shah, N.; Smith, P. Food Res. Int. 2011, 44, 1888−1896.
(13) Valderrama, P.; Braga, J. W. B.; Poppi, R. J. J. Agric. Food Chem. 2007, 55, 8331−8338.
(14) Sorol, N.; Arancibia, E.; Bortolato, S. A.; Olivieri, A. C. Chemom. Intell. Lab. Syst. 2010, 102, 100−109.
(15) O’Shea, M. G.; Staunton, S. P.; Slupecki, P. Int. Sugar J. 2011, 113, 879−887.
(16) Nawi, N. M.; Chen, G.; Jensen, T.; Mehdizadeh, S. A. Biosyst. Eng. 2013, 115, 154−161.
(17) Chen, S. F.; Danao, M. G. C.; Singh, V.; Brown, P. J. J. Sci. Food Agric. 2014, 94, 2569−2576.
(18) Wu, L.; Li, M.; Huang, J.; Zhang, H.; Zou, W.; Hu, S.; Li, Y.; Fan, C.; Zhang, R.; Jing, H.; Peng, L.; Feng, S. Bioresour. Technol. 2015, 177, 118−124.
(19) Galvão, R. K. H.; Araujo, M. C. U. Variable selection. In Comprehensive Chemometrics; Brown, S. D., Tauler, R., Walczak, B., Eds.; Elsevier: Amsterdam, Netherlands, 2009; Vol. 3, pp 233−283, DOI: 10.1016/B978-044452701-1.00075-2.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.energyfuels.6b00408. NIR spectra of 500 sorghum juice samples after preprocessing with first derivative and Savitzky−Golay smoothing (Figure S1) and plot of reference versus predicted values of the calibration (○) and validation (▼) samples for (a) °Brix, (b) reducing sugars, (c) Pol, and (d) apparent purity (Figure S2) (PDF)




AUTHOR INFORMATION

Corresponding Author

*Telephone: +55-31-34096389. Fax: +55-31-34095700. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors are grateful to Reinaldo F. Teófilo (Universidade Federal de Viçosa, Brazil) for making available the OPS routine, Rafael A. C. Parrella (EMBRAPA, Sete Lagoas, Brazil) for providing the sweet sorghum samples, and Célio Pasquini (Universidade Estadual de Campinas, Brazil) for the use of a polarimeter. Camila Assis also thanks CAPES for a fellowship.



(20) Teófilo, R. F.; Martins, J. P. A.; Ferreira, M. M. C. J. Chemom. 2009, 23, 32−48.
(21) Costa, R. C.; de Lima, K. M. G. J. Braz. Chem. Soc. 2013, 24, 1351−1356.
(22) da Silva, G. A.; Maretto, D. A.; Bolini, H. M. A.; Teófilo, R. F.; Augusto, F.; Poppi, R. J. Food Chem. 2012, 134, 1673−1681.
(23) Fresqui, M. A. C.; Ferreira, M. M. C.; Trsic, M. Anal. Chim. Acta 2013, 759, 43−52.
(24) Ferreira, M. H.; Braga, J. W. B.; Sena, M. M. Microchem. J. 2013, 109, 158−164.
(25) Olivieri, A. C.; Faber, N. M.; Ferré, J.; Boqué, R.; Kalivas, J. H.; Mark, H. Pure Appl. Chem. 2006, 78, 633−661.
(26) Valderrama, P.; Braga, J. W. B.; Poppi, R. J. Quim. Nova 2009, 32, 1278−1287.
(27) Botelho, B. G.; Mendes, B. A. P.; Sena, M. M. Food Anal. Methods 2013, 6, 881−891.
(28) ASTM International. ASTM E1655-05, Standard Practices for Infrared Multivariate Quantitative Analysis. Annual Book of ASTM Standards; ASTM International: West Conshohocken, PA, 2012.
(29) de Souza, S. V. C.; Junqueira, R. G. Anal. Chim. Acta 2005, 552, 25−35.
(30) Tanimoto, T. Hawaii. Plant. Rec. 1964, 51, 133−150.
(31) Ribeiro, L. P. D.; Rohwedder, J. J. R.; Pasquini, C. Anal. Chim. Acta 2013, 771, 1−6.
(32) Lane, H.; Eynon, L. J. Soc. Chem. Ind., London 1923, 42, 32T−37T.
(33) Horii, J.; Gonçalves, R. H. STAB Açúcar Álcool Subprodutos 1991, 10, 45−47.
(34) Kennard, R. W.; Stone, L. A. Technometrics 1969, 11, 137−148.
(35) Rinnan, A.; van den Berg, F.; Engelsen, S. B. TrAC, Trends Anal. Chem. 2009, 28, 1201−1222.
(36) Workman, J. J., Jr.; Weyer, L. Practical Guide to Interpretive Near-Infrared Spectroscopy; CRC Press: Boca Raton, FL, 2008.
(37) Rambla, F. J.; Garrigues, S.; de la Guardia, M. Anal. Chim. Acta 1997, 344, 41−53.
(38) Brown, C. D.; Green, R. L. TrAC, Trends Anal. Chem. 2009, 28, 506−514.
(39) Secretaria de Defesa Agropecuária, Ministério da Agricultura, Pecuária e Abastecimento (MAPA). Manual da Garantia da Qualidade Analítica: Resíduos e Contaminantes em Alimentos; Governo do Brasil: Brasília, Brazil, 2011; pp 227.
(40) Horwitz, W. Pure Appl. Chem. 1995, 67, 331−343.
(41) Williams, P. Implementation of Near-Infrared Technology. In Near-Infrared Technology in the Agricultural and Food Industries, 2nd ed.; Williams, P., Norris, K., Eds.; American Association of Cereal Chemists, Inc.: St. Paul, MN, 2001.
