Article pubs.acs.org/JAFC

Enhanced Single Seed Trait Predictions in Soybean (Glycine max) and Robust Calibration Model Transfer with Near-Infrared Reflectance Spectroscopy

Gokhan Hacisalihoglu,† Jeffery L. Gustin,‡ Jean Louisma,† Paul Armstrong,§ Gary F. Peter,∥ Alejandro R. Walker,∥ and A. Mark Settles*,‡

† Biology Department, Florida A&M University, Tallahassee, Florida 32307, United States
‡ Horticultural Sciences Department, University of Florida, Gainesville, Florida 32611, United States
§ Center for Grain and Animal Health Research, USDA-ARS, Manhattan, Kansas 66502, United States
∥ School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, United States

Supporting Information

ABSTRACT: Single seed near-infrared reflectance (NIR) spectroscopy predicts soybean (Glycine max) seed quality traits of moisture, oil, and protein. We tested the accuracy of transferring calibrations between different single seed NIR analyzers of the same design by collecting NIR spectra and analytical trait data for globally diverse soybean germplasm. X-ray microcomputed tomography (μCT) was used to collect seed density and shape traits to expand the number of soybean traits that can be predicted from single seed NIR. Partial least-squares (PLS) regression gave accurate predictive models for oil, weight, volume, protein, and maximal cross-sectional area of the seed. PLS models for width, length, and density were not predictive. Although principal component analysis (PCA) of the NIR spectra showed that black seed coat color had significant signal, excluding black seeds from the calibrations did not impact model accuracies. Calibrations for oil and protein developed in this study, as well as earlier calibrations for a separate NIR analyzer of the same design, were used to test the ability to transfer PLS regressions between platforms. PLS models built from data collected on one NIR analyzer had minimal differences in accuracy when applied to spectra collected from a sister device. Model transfer was more robust when spectra were trimmed from 910−1679 nm to 955−1635 nm, due to divergence of edge wavelengths between the two devices. The ability to transfer calibrations between similar single seed NIR spectrometers facilitates broader adoption of this high-throughput, nondestructive seed phenotyping technology.

KEYWORDS: seed phenotyping, near-infrared spectroscopy, chemometrics, microcomputed tomography, density, oil, protein



Received: November 18, 2015. Revised: January 12, 2016. Accepted: January 15, 2016.
Journal of Agricultural and Food Chemistry. DOI: 10.1021/acs.jafc.5b05508. J. Agric. Food Chem. XXXX, XXX, XXX−XXX

INTRODUCTION

Seeds are the primary commodity produced from soybeans, with an annual worldwide production of 319 million tons and $120 billion in economic value.1 Soybean is a major component of animal and fish feed, because soybean seeds contain significant amounts of protein (∼38%) and oil (∼20%) on a dry matter basis when compared to cereal grains.2 Soybean is also used extensively in processed foods such as soy milk, tofu, and soy sauce. Variation in seed size, density, and chemical constituents can impact industrial, food, and feed uses of soybeans. Traits such as seed weight and seed number are targets for crop improvement due to their association with agronomic performance or consumer preference.3 Development of rapid, nondestructive methods for assessment of seed phenotypes is important for breeding, quality analysis, and seed sorting.4

NIR spectroscopy is a nondestructive phenotyping technique that uses light between 780 and 2500 nm to infer chemical composition.5 NIR light is absorbed through overtone vibrations of the organic bonds: C−H, O−H, and N−H. Using chemometric approaches, NIR absorbance characteristics of a material can be used to predict the levels of organic constituents.4 Spectra are collected as either transmitted or reflected light, typically from bulk samples of ground or whole seeds. Although bulk NIR data are useful in breeding and quality assessment of seed lots, single seed analysis is necessary for seed sorting.6 Single seed NIR spectroscopy has been shown to predict chemical constituents in multiple cereal and legume seed crop species.2,4,7,8,11 Single seed NIR analysis of soybeans was initially demonstrated with transmittance spectra collected from manually positioned soybeans to predict moisture content.9 To increase throughput of spectra collection, a single seed NIR device was developed to collect a reflectance spectrum from an individual seed as it tumbles through an illuminated glass tube.8 Comparisons among different single seed NIR acquisition methods found that the glass-tube device achieved the highest level of accuracy for predictions of seed moisture, oil, and protein constituents.12

Single seed NIR can also be used to predict physical properties of seeds such as volume and density.7,8 Analytical reference data for volume and density are conventionally collected with a pycnometer, but most gas pycnometers are limited to bulk analysis of seeds.11 For single seed analysis, X-ray microcomputed tomography (μCT) can be used to image and reconstruct a three-dimensional (3D) representation of individual seeds.10 Imaging for μCT is completed with a series of radiographic slices along the long axis of the seed. Each pixel within the radiograph indicates X-ray attenuation of the sample, which is directly related to density. μCT has previously been used to measure the volume and density of maize kernels. Additional seed traits such as maximum cross-sectional area as well as length and width features can also be derived from the 3D reconstruction. Collection and analysis of μCT data is slow and exposes seeds to intense X-ray irradiation, making the method impractical for high-throughput phenotyping but useful as a reference method to develop single seed NIR calibrations.7

Transferring calibration models between devices of similar design is essential for broad application of NIR spectroscopy to seed phenotyping due to the expense and time required to generate an NIR calibration.13 Robust transfer of a calibration from a master or primary unit to other spectrometers allows trait predictions without needing to develop an independent calibration for each NIR analyzer. Ideally, calibration models can be applied directly to NIR spectra collected on similar devices.
However, spectral variations between NIR analyzers due to mechanical and environmental differences can degrade the accuracy of predictions.13,14 Spectral variation between devices can be corrected by collecting NIR spectra from a reference sample set on both the primary and secondary devices.15−19 The spectra from the secondary unit can then be standardized using least-squares regression,17,18 orthogonal signal correction,19 or Fourier transformation.20 Alternatively, the predictions from the primary calibration model applied to a new machine can be adjusted with seed samples analyzed solely on the secondary unit. Spectra are collected from representative samples with associated analytical reference values. The calibration from the primary unit is used to predict constituent values from the NIR spectra. Linear regression between predicted and reference values of these samples defines slope and bias parameters to correct future predictions on the new device.13,14 Transferring calibrations between bulk NIR analyzers of the same model using spectral variation correction methods is a cost-effective method for calibrating a large number of machines. Calibration models can even be transferred between different brands with minimal loss of accuracy.19,20 However, the accuracy of transferring calibrations between single seed NIR analyzers has not been reported. One of the primary challenges is correcting for spectral variation. For protein and starch traits, individual seeds are destroyed for constituent analysis making the seeds unavailable for spectra collection in other NIR units. While strategies for spectral correction may be more challenging for single seed devices than for bulk analyzers, the feasibility of transferring calibrations between glass-tube devices has yet to be experimentally determined. This study had two objectives. 
First, single seed NIR predictions of soybean traits were enhanced with PLS regressions for μCT derived traits including volume, air space, maximum seed area, length, width, and density. Second, single seed NIR calibration models for oil, protein, and weight were transferred between independent glass tube NIR devices to determine best strategies for cross-device predictions.
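The slope and bias adjustment described above, in which predictions from the primary calibration are regressed against reference values measured on the secondary unit, can be illustrated with a minimal Python sketch. The function names, the direction of the regression (reference values regressed on predicted values), and all numeric values below are assumptions chosen for illustration; they are not taken from this study.

```python
from statistics import mean


def fit_slope_bias(predicted, reference):
    """Fit reference ~ slope * predicted + bias by ordinary least squares.

    `predicted` holds trait values predicted by the primary calibration from
    spectra collected on the secondary device; `reference` holds the matching
    analytical reference values. Both names are hypothetical.
    """
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired values")
    p_bar, r_bar = mean(predicted), mean(reference)
    sxx = sum((p - p_bar) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((p - p_bar) * (r - r_bar) for p, r in zip(predicted, reference))
    slope = sxy / sxx
    bias = r_bar - slope * p_bar
    return slope, bias


def correct_predictions(predictions, slope, bias):
    """Apply the fitted slope and bias to new predictions from the secondary device."""
    return [slope * p + bias for p in predictions]


# Hypothetical % oil values, invented for this example only.
predicted_oil = [18.0, 20.0, 22.0, 24.0]
reference_oil = [17.5, 19.6, 21.7, 23.8]

slope, bias = fit_slope_bias(predicted_oil, reference_oil)
corrected = correct_predictions([21.0], slope, bias)
```

Because this adjustment needs only predictions and reference values from the secondary unit, it does not require the same seeds to be scanned on both devices.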


MATERIALS AND METHODS

Seed Samples. Ninety soybean genotypes were obtained from the USDA Plant Germplasm System at Urbana, Illinois to develop NIR calibrations at the University of Florida. The soybean genotypes were selected to maximize geographic diversity based on the location where the accession was originally collected. Three calibration sets of seeds were sampled for different analytical reference methods, including microcomputed tomography, oil, and protein analysis. Each set sampled three seeds from each genotype for a total of 270 seeds per analytical method. For calibration transfer experiments, 20 soybean bulk samples were reported previously for calibrations developed at the USDA-ARS in Manhattan, Kansas.12 For the Kansas oil calibration set, 24 seeds were randomly picked from the 20 soybean bulks for a total of 480 seeds. For the Kansas protein calibration set, 56 samples were removed due to inaccurate protein reference data, resulting in a final set of 424 samples.

NIR Spectroscopy and Seed Weight. The seeds for the University of Florida calibrations were individually weighed and NIR spectra collected using a semiautomated, single seed NIR analyzer as described.2,4,7,8 Each seed was weighed on a microbalance (MK4, CI Electronics, U.K.) and NIR spectra were collected as the seed fell through a 60 mm glass tube angled at 45° and illuminated by 48 halogen lamps. The NIR spectrometer (NIR-256-1.7T1, Control Development, South Bend, IN, U.S.A.) recorded a spectrum at 1 nm intervals between 907 and 1688 nm with a 20 ms integration time. Absorbance values were calculated as log(1/reflectance) and mean centered to an average absorbance value of one. Each seed was passed through the NIR device four times, and the average weight and spectrum were used in subsequent analyses. The USDA-ARS soybean seed NIR and weight data were described previously.12

Microcomputed Tomography. Seed density and size parameters were measured using a ScanCo Medical μCT 35 (Brüttisellen, Switzerland).
Individual seeds were arranged in 20 mm diameter tubes by layering six seeds between low density foam blocks. Seeds were oriented so that a radiograph through the center of the seed would be equivalent to a sagittal section. X-ray scans were taken at 20 μm steps. The seed image in each scan was contoured semiautomatically with limited manual intervention using the manufacturer’s software. The threshold to distinguish air from sample was set at 105 for all samples, and 3D reconstruction of contoured slices used default parameters. Total and material density as well as total and material volume were reported from the 3D reconstruction. Air space was measured by subtracting material volume from total volume. Seed area, length, and width were recorded from the slice that had the maximum contoured area. Fourteen seeds were dropped from the μCT data set due to incomplete μCT scans of the kernels, which caused a section of the seed to be missing from the reconstruction.

Seed Protein and Oil Measurements. Seed oil and moisture were measured on a MiniSpec mq20 NMR Instrument (Bruker BioSpin, TX, U.S.A.) according to the manufacturer’s oil and moisture protocol as described.8 Protein was measured using a C and N (CN) analysis. Single seeds were transferred to 2 mL microcentrifuge tubes and milled for 4−8 min in a MiniBeadbeater-96 (BioSpec Products, OK, U.S.A.) using 8 mm steel ball bearings. Soybean meal was dried at 60 °C for 3 days and stored under vacuum until ready for analysis. For each seed, approximately 10 mg of dry meal was analyzed by a Carlo Erba NA1500 CN elemental analyzer. Total protein was calculated as N × 6.25. Oil and protein content of individual seeds was recorded as a percentage on a dry weight basis. Outlier measurements were identified by calculating within-genotype standard deviation, and 18 consecutive oil measurements were removed from the data set due to an apparent sample tracking problem.
The six genotypes excluded were Vinton-81, Will, Williams, Williams-82, Young, and Zi-pi-dou, leaving 252 seeds for oil calibration. Collection of the protein and oil reference values from the USDA-ARS calibration set was described previously.12

PLS Regression. Each NIR spectrum was mean-centered and then transformed with multiplicative scatter correction (MSC),21 standard normal variate (SNV), first derivative (1st der.), or second derivative


Table 1. Descriptive Statistics of Soybean Germplasm for Calibration and Validation Sets

                          calibration                         validation
seed trait       CVᵃ      N     mean   sdᵇ    min    max      N    mean   sdᵇ    min    max
% oil            0.16     168   19.3   2.7    9.2    25.1     84   19.0   2.6    11     23.5
% protein        0.09     180   38.8   3.4    30.5   49.1     90   39     3.7    32     48.8
density (g/cm³)  0.05     170   1.56   0.08   1.33   1.75     86   1.55   0.09   1.26   1.68
weight (mg)      0.27     180   151    42     47     272      90   146    40     44     249
volume (mm³)     0.28     170   126    35     36     240      86   125    34     54     243
max area (mm²)   0.23     170   0.35   0.06   0.19   0.59     86   0.34   0.07   0.2    0.55
length (mm)      0.13     170   7      0.83   5.16   10.1     86   6.98   0.8    5.26   9.13
width (mm)       0.13     170   5.89   0.66   3.63   7.74     86   5.86   0.61   4.37   7.2
% air space      0.47     170   4.76   2.3    0.9    15.9     86   4.76   2      0.9    11.4

ᵃ Coefficient of variation was calculated from all soybeans sampled in this study. ᵇ Standard deviation.
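The summary columns of Table 1 (N, mean, sd, min, max, and CV) can be computed for any trait with a short Python sketch such as the one below. It assumes the usual definition of the coefficient of variation as the standard deviation divided by the mean, and it assumes the sample (n − 1) standard deviation; the text does not state which estimator was used. The function name and the seed weights are hypothetical.

```python
from statistics import mean, stdev


def describe_trait(values):
    """Return N, mean, sd, min, max, and CV for one seed trait.

    Assumes the sample (n - 1) standard deviation and CV = sd / mean.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    avg = mean(values)
    sd = stdev(values)
    return {
        "N": len(values),
        "mean": avg,
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "CV": sd / avg,
    }


# Hypothetical seed weights in mg, invented for this example only.
seed_weights_mg = [100.0, 150.0, 200.0]
summary = describe_trait(seed_weights_mg)
```

Because CV is unitless, it allows the relative variability of traits measured on different scales, such as % oil and seed weight, to be compared directly.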

Table 2. Pearson’s Correlation Coefficients (r) between Soybean Seed Traitsᵃ

trait (columns): oil, protein, density, weight, volume, area, length, width, air
trait (rows): oil, protein, density, weight, volume, max area, length, width, air

1 −0.43
1
1 1 0.87 0.66 0.87
1 0.88 0.67 0.87
1 0.90 0.79
1 0.46
1
1 0.46 0.46
0.54 −0.38
−0.60
1

ᵃ Displayed coefficients are significant at a Bonferroni corrected p-value