
May 5, 2015 - Monsanto Company, 800 North Lindbergh Boulevard, St. Louis, Missouri 63167, United States. § 150 North ... E-mail: george.g.harrigan@mo...
Article pubs.acs.org/JAFC

Application of 1H NMR Profiling To Assess Seed Metabolomic Diversity. A Case Study on a Soybean Era Population

George G. Harrigan,*,† Kirsten Skogerson,*,† Susan MacIsaac,§ Anna Bickel,† Tim Perez,† and Xin Li†

† Monsanto Company, 800 North Lindbergh Boulevard, St. Louis, Missouri 63167, United States
§ 150 North Research Campus Drive, Kannapolis, North Carolina 28081, United States

Supporting Information

ABSTRACT: 1H NMR spectroscopy offers advantages in metabolite quantitation and platform robustness when applied in food metabolomics studies. This paper provides a 1H NMR-based assessment of seed metabolomic diversity in conventional and glyphosate-resistant genetically modified (GM) soybean from a genetic lineage representing ∼35 years of breeding and differing yield potential. 1H NMR profiling of harvested seed allowed quantitation of 27 metabolites, including free amino acids, sugars, and organic acids, as well as choline, O-acetylcholine, dimethylamine, trigonelline, and p-cresol. Data were analyzed by canonical discriminant analysis (CDA) and principal variance component analysis (PVCA). Results demonstrated that 1H NMR spectroscopy was effective in highlighting variation in metabolite levels in the genetically diverse sample set presented. The results also confirmed that metabolite variability is influenced by selective breeding and environment, but not genetic modification. Therefore, metabolite variability is an integral part of crop improvement that has occurred for decades and is associated with a history of safe use.

KEYWORDS: soybean (Glycine max), NMR, metabolomics, canonical discriminant analysis, principal variance component analysis



INTRODUCTION

Soybean (Glycine max L.) is a global food and feed commodity, utilized primarily as a source of protein and oil. It also contains bioactive metabolites such as isoflavones, which are associated with health benefits.1,2 Globally, breeding programs are committed to developing new varieties with improved agronomic characteristics3,4 and nutritional profiles.5 The steady rise in yield potential observed in newer varieties over the past few decades6−8 has been driven by the need to maintain sustainable production of this important crop to meet the demands of a growing population. Studies have shown that selective breeding as well as differences in crop genetics and environment can affect soybean composition.8−11 In general, levels of protein and oil, the major value components of soybean, are inversely correlated with each other and vary depending on yield potential, genotype, and environment.5,12,13 Levels of other soybean components, such as isoflavones and antinutrients such as certain oligosaccharides, are similarly variable. A better understanding of the relationships between agronomic performance and levels of key crop nutrients and metabolites will be of value as improvements to soybean yield potential continue apace.

Metabolomics may be an effective tool in evaluating agronomic and nutritional performance.2,14−19 In recent mass spectrometry (MS)-based metabolomics studies, it has been demonstrated that soybean cultivars can be differentiated on the basis of metabolite profiles.1 Nuclear magnetic resonance (NMR) spectroscopy has several benefits as a metabolite profiling technique, including ease of sample preparation, high sample throughput, identification of a broad range of chemical compounds in a single experiment, and generation of quantitative data. 1H NMR techniques have been broadly applied in the metabolomics community for biomedical, food, and plant research,2,15,17−21 including applications to soybean.

The objectives of the study described here were to characterize metabolomic variability in soybean seed through 1H NMR profiling and multivariate statistics and to interpret the results in the context of selective breeding and the history of safe consumption of soybean. 1H NMR profiles were therefore generated for nine soybean varieties sharing a genetic lineage that represents ∼35 years of breeding (commercial launch years 1972−2008) and increasing yield potential. The selected varieties included four older conventional lines (commercial launch years 1972−1996), two newer conventional lines (1997−2008), and three glyphosate-tolerant genetically modified (GM) lines (1999−2008), all grown concurrently at two replicated field sites in the United States during the 2011 growing season. The 1H NMR analysis used standard one-dimensional profiling techniques, and a total of 27 metabolites were quantified. Analysis of the metabolite data set included canonical discriminant analysis (CDA), a multivariate approach that finds linear combinations of the quantitative variables that provide maximal separation between classes or groups. Principal variance component analysis (PVCA) was used to assess interrelationships among the seed metabolites as well as the impacts of location and pedigree (variety) on variation in metabolite levels.

© XXXX American Chemical Society

Received: February 26, 2015 Revised: April 23, 2015 Accepted: April 24, 2015


DOI: 10.1021/acs.jafc.5b01069 J. Agric. Food Chem. XXXX, XXX, XXX−XXX


Journal of Agricultural and Food Chemistry



MATERIALS AND METHODS

Biological Material. Nine soybean varieties representing a genetic lineage from Williams (1972) to A3555 (2008) were grown concurrently at two sites in Illinois [Jerseyville (ILJE) and Jacksonville (ILJA)] during the 2011 season. Varieties included six conventional and three glyphosate-tolerant (GM) lines. Variety launch years are listed in Table 1, and grouping based on genetic similarity is described in the Supporting Information. Starting seeds were planted in a randomized complete block design with six replicates. Soybean plants were treated with maintenance pesticides as necessary throughout the growing season at both sites. The three glyphosate-tolerant varieties were not treated with glyphosate. Seed was harvested at maturity, homogenized by grinding with dry ice to a fine powder, lyophilized, and stored in a freezer set to maintain −20 °C prior to analysis. Two samples (one replicate of AG3705 at ILJA and one replicate of CX375 at ILJE) were lost at harvest. Comprehensive compositional and MS-based metabolomics assessments of these study samples have been presented in Harrigan et al.11 and Kusano et al.,22 respectively. Genetic characterization of all varieties using the Illumina Infinium platform was also presented in Kusano et al.;22 a summary of those data has been incorporated into Supporting Information Figure 1.

Table 1. Launch Year and Average Yield of Each Variety

| variety | launch year | yield at ILJA (bushels/acre) | yield at ILJE (bushels/acre) |
|---|---|---|---|
| Williams | 1972 | 66.5 | 65.3 |
| A3127 | 1979 | 68.2 | 61.0 |
| CX366 | 1986 | 71.9 | 66.6 |
| CX375 | 1996 | 71.8 | 66.3 |
| A3469 | 1997 | 80.3 | 73.5 |
| AG3701 | 1999 | 72.8 | 71.1 |
| AG3705 | 2006 | 80.6 | 77.2 |
| A3555 | 2008 | 85.9 | 74.2 |
| AG3803 | 2008 | 78.8 | 76.4 |

ILJA represents the Jacksonville, IL, site, and ILJE represents the Jerseyville, IL, site.

Sample Extraction and NMR Analysis. Ground, lyophilized grain samples (50 mg) were suspended in 1.8 mL of 80% MeOH and sonicated for 15 min at room temperature. Following sonication, the sample was centrifuged and the supernatant collected. The pellet was re-extracted twice as above. Pooled supernatants were dried and resuspended in 1 mL of NMR buffer (100 mM sodium phosphate, pD 7.4, with 0.5 mM 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) chemical shift standard). Because insoluble material was present, samples were filtered before 700 μL was loaded into NMR tubes. All NMR data were collected on a Bruker AVANCE III spectrometer (600.13 MHz) at 298 K with a Bruker 5 mm TCI CryoProbe. One-dimensional 1H NMR spectra were acquired using a standard Bruker noesypr1d pulse sequence. Each spectrum consisted of 128 scans of 65536 data points with a spectral width of 7.2 kHz, an acquisition time of 4.55 s, and a recycle delay of 1 s per scan.

Data collection parameters were based on those used by Chenomx to generate the standard spectra in its library, to ensure the best possible results for quantitative fitting of the spectra. Exponential line broadening of 0.3 Hz was applied before Fourier transformation. Spectra were phase corrected manually and referenced to DSS (methyl singlet chemical shift of 0.00 ppm). Processed spectra were uploaded into the Chenomx NMR Suite for metabolite identification and quantitation using the 600 MHz library (version 7.7). The Chenomx software algorithms reconstruct each sample spectrum from a linear sum of the individual standard spectra in the library, and quantitation is based on the concentration of the DSS methyl singlet reference. Spectra were also binned (0.02 and 0.01 ppm bin sizes) from 0.0 to 10.0 ppm, with selected regions excluded to remove variations from residual solvents and the chemical shift standard: δ 3.33−3.36 (residual MeOH); δ −0.04 to 0.04, δ 0.58−0.67, δ 1.75−1.80, and δ 2.88−2.93 (DSS); and δ 4.68−4.88 (H2O). Additional regions where spectral alignment was poor were also removed prior to binning (δ 6.51−6.8, δ 7.07−7.095, δ 7.2−7.31, δ 7.52−7.60, δ 7.78−7.9, δ 8.09−8.30). Binned data were normalized to the total spectral area, excluding the regions removed to account for solvent peaks and alignment shifts. These binned data (files S1 and S2) are available as Supporting Information. Chemical shifts of metabolites identified and quantified from 1H NMR spectra of soybean extracts are listed in Table 2.

Table 2. Chemical Shifts of Metabolites Identified and Quantified from 1H NMR Spectra of Soybean Extracts

| metabolite | 1H chemical shift, ppm (multiplicity) |
|---|---|
| 2-oxoglutarate | 2.43 (t), 2.99 (t) |
| 4-aminobutyrate | 1.89 (m), 2.28 (t), 3.0 (t) |
| acetate | 1.90 (s) |
| alanine | 1.47 (d), 3.78 (q) |
| arginine | 1.64 (m), 1.72 (m), 1.89 (m), 1.92 (m), 3.23 (t), 3.76 (t) |
| asparagine | 2.85 (dd), 2.94 (dd), 3.99 (q) |
| aspartate | 2.67 (dd), 2.80 (dd), 3.89 (dd) |
| choline | 3.19 (s), 3.51 (m), 4.06 (m) |
| citrate | 2.53 (d), 2.67 (d) |
| dimethylamine | 2.72 (s) |
| formate | 8.44 (s) |
| fumarate | 6.51 (s) |
| galactose | 3.48 (dd), 3.64 (dd), 3.69 (m), 3.71 (m), 3.73 (m), 3.74 (m), 3.77 (m), 3.80 (dd), 3.84 (dd), 3.92 (d), 3.98 (d), 4.07 (t), 4.57 (d), 5.25 (d) |
| glucose | 3.22 (dd), 3.40 (m), 3.46 (m), 3.48 (t), 3.53 (dd), 3.70 (t), 3.72 (dd), 3.76 (dd), 3.82 (m), 3.84 (m), 3.89 (dd), 4.63 (d), 5.22 (d) |
| glutamate | 2.04 (m), 2.12 (m), 2.32 (m), 2.36 (m), 3.75 (dd) |
| isoleucine | 0.93 (t), 1.00 (d), 1.25 (m), 1.46 (m), 1.97 (m), 3.66 (d) |
| lactate | 1.32 (d), 4.11 (q) |
| leucine | 0.95 (m), 1.67 (m), 1.70 (m), 1.73 (m), 3.73 (m) |
| malate | 2.35 (q), 2.66 (dd), 4.29 (d) |
| O-acetylcholine | 2.14 (s), 3.22 (s), 3.72 (m), 4.54 (m) |
| p-cresol | 2.22 (s), 6.84 (m), 7.14 (m) |
| succinate | 2.39 (s) |
| sucrose | 3.46 (t), 3.55 (dd), 3.68 (s), 3.75 (t), 3.79 (m), 3.82 (m), 3.83 (m), 3.88 (m), 4.04 (t), 4.21 (t), 5.40 (d) |
| threonine | 1.32 (d), 3.58 (d), 4.25 (m) |
| trigonelline | 4.42 (s), 8.07 (t), 8.82 (dd), 9.11 (s) |
| tryptophan | 7.18 (m), 7.27 (m), 7.31 (s), 7.53 (d), 7.72 (d) |
| valine | 0.98 (d), 1.03 (d), 2.26 (m), 3.6 (d) |

1H chemical shifts used for metabolite identification and quantification were determined at pD 7.4 in 100 mM sodium phosphate buffer and are expressed relative to DSS at 0 ppm. Letters in parentheses denote the peak multiplicities: s, singlet; d, doublet; t, triplet; dd, doublet of doublets; q, quartet; and m, multiplet.

Univariate Statistical Analysis. Means, standard errors, and ranges were calculated in SAS 9.4 (SAS Institute Inc.) (Table 3). Values were combined across both sites. Pearson correlation coefficients were also determined between each of the 27 metabolites and yield.

Table 3. Least-Squares Means of Metabolites Assessed across Both Sites

| metabolite | Williams (1972) | A3127 (1979) | CX366 (1986) | CX375 (1996) | A3469 (1997) | AG3701 (1999) | AG3705 (2006) | A3555 (2008) | AG3803 (2008) |
|---|---|---|---|---|---|---|---|---|---|
| 2-oxoglutarate | 6.26 | 6.00 | 6.28 | 6.50 | 8.08 | 7.42 | 6.73 | 7.10 | 6.48 |
| 4-aminobutyrate | 9.57 | 10.90 | 10.24 | 10.09 | 10.67 | 9.45 | 9.79 | 11.05 | 9.90 |
| acetate | 33.01 | 40.73 | 41.89 | 38.86 | 38.21 | 41.48 | 42.61 | 27.73 | 38.92 |
| alanine | 7.18 | 10.18 | 9.38 | 7.66 | 8.94 | 7.95 | 7.33 | 10.47 | 9.41 |
| arginine | 21.51 | 23.29 | 25.43 | 23.56 | 25.00 | 22.74 | 22.02 | 33.14 | 23.81 |
| asparagine | 9.51 | 10.40 | 10.90 | 10.75 | 9.16 | 10.01 | 8.75 | 11.22 | 8.24 |
| aspartate | 46.49 | 46.40 | 44.14 | 50.39 | 43.72 | 56.67 | 52.39 | 48.56 | 42.79 |
| choline | 96.97 | 102.09 | 93.35 | 100.43 | 95.44 | 102.11 | 88.87 | 86.53 | 94.14 |
| citrate | 583.34 | 618.18 | 609.53 | 635.69 | 555.20 | 585.56 | 554.94 | 567.81 | 622.98 |
| dimethylamine | 66.99 | 102.29 | 94.50 | 91.47 | 80.85 | 86.35 | 90.41 | 69.82 | 83.39 |
| formate | 4.02 | 4.37 | 4.61 | 4.17 | 4.08 | 4.35 | 4.70 | 3.63 | 3.93 |
| fumarate | 2.56 | 2.36 | 2.45 | 2.48 | 3.42 | 2.82 | 2.63 | 4.34 | 3.14 |
| galactose | 19.42 | 20.70 | 20.78 | 19.99 | 20.82 | 21.54 | 19.48 | 22.79 | 24.28 |
| glucose | 22.73 | 21.27 | 23.09 | 22.61 | 23.82 | 24.63 | 23.85 | 27.81 | 28.72 |
| glutamate | 58.69 | 65.70 | 63.76 | 59.49 | 53.60 | 53.40 | 59.03 | 53.71 | 47.99 |
| isoleucine | 8.03 | 8.75 | 8.50 | 7.77 | 8.15 | 7.49 | 7.17 | 10.48 | 8.54 |
| lactate | 5.23 | 5.54 | 5.45 | 4.88 | 5.19 | 5.17 | 5.06 | 5.37 | 5.19 |
| leucine | 14.16 | 15.46 | 14.49 | 13.76 | 11.62 | 12.89 | 11.94 | 15.26 | 14.07 |
| malate | 116.78 | 113.06 | 123.83 | 111.11 | 176.29 | 155.28 | 119.64 | 179.45 | 152.08 |
| O-acetylcholine | 9.60 | 11.96 | 12.12 | 12.43 | 8.69 | 7.69 | 9.19 | 9.65 | 10.33 |
| succinate | 5.36 | 5.23 | 5.18 | 4.76 | 4.98 | 4.65 | 5.46 | 5.19 | 5.93 |
| sucrose | 4052.38 | 3393.53 | 4156.46 | 3851.73 | 4364.50 | 4512.91 | 4266.23 | 3945.86 | 4613.51 |
| threonine | 7.36 | 8.19 | 7.98 | 7.74 | 7.92 | 7.77 | 7.89 | 8.54 | 8.28 |
| trigonelline | 16.53 | 18.30 | 19.41 | 15.99 | 16.03 | 18.37 | 16.15 | 13.74 | 12.43 |
| tryptophan | 24.76 | 22.98 | 22.14 | 23.71 | 27.73 | 22.44 | 17.74 | 26.02 | 23.70 |
| valine | 8.26 | 9.60 | 9.06 | 8.79 | 9.40 | 9.46 | 8.42 | 10.83 | 9.32 |
| p-cresol | 17.78 | 23.17 | 21.11 | 21.36 | 22.66 | 25.26 | 20.27 | 22.44 | 16.12 |
| seed weight | 14.72 | 12.02 | 13.56 | 13.36 | 12.23 | 12.66 | 14.55 | 13.95 | 14.89 |
| yield | 65.87 | 64.58 | 69.26 | 69.30 | 76.87 | 71.95 | 78.73 | 80.05 | 77.57 |

Metabolite values are in micrograms per 50 mg DW; seed weight is in grams per 100 seeds; yield is in bushels/acre. Least-squares means were based on a total of 12 biological replicates, six from the Jacksonville (ILJA) site and six from the Jerseyville (ILJE) site (n = 12), except for AG3705 (n = 11), where one replicate was lost at harvest at ILJA, and CX375 (n = 11), where one replicate was lost at harvest at ILJE.

Canonical Discriminant Analysis. CDA is a dimension-reduction technique related to principal component analysis and canonical correlation. CDA finds linear combinations of the quantitative variables that provide maximal separation between classes or groups. Given a classification variable and several quantitative variables, the CANDISC procedure in SAS derives canonical variables, linear combinations of the quantitative variables that summarize between-class variation in much the same way that principal components summarize total variation.

Principal Variance Component Analysis. The procedure and rationale for applying PVCA to crop compositional data are similar to those described in detail in Harrigan et al.11 Essentially, the first step of the PVCA procedure is to normalize the responses, if necessary; in this study the responses were all approximately normally distributed. The second step is to check for linear dependencies in the data. This can be done by computing the correlation matrix of the variables and then computing the rank of that matrix. If the rank is less than the number of variables, then some variables must be dropped from the analysis. The third step is to apply PCA to the correlation matrix. In the original PVCA procedure of Li et al.,23 variance components analysis is applied directly to the principal components; we added an intermediate step, factor analysis, to assist with interpretation of the results. Factor analysis with varimax rotation derives new linear combinations of the variables from the PCA results and assists in deciding the appropriate number of factors needed to capture sufficient variability. The final step is to apply variance components analysis to each of the derived factor variables (F1, F2, ..., Fj). An ANOVA model is applied, and all of the sources of variation of interest to the researcher are modeled as random effects. In our application of PVCA,1,11 a principal component or factor score is considered strongly correlated with an individual compositional analyte if the absolute value of the linear correlation between the derived variable and the analyte exceeds 0.707, with the interpretation that at least 50% of the variation in the analyte can be explained by its linear relationship with the new variable (0.707² ≈ 0.50). Absolute correlations between 0.5 and 0.707 are described as moderate, with the amount of variation in the analyte that can be explained by the factor ranging from 25 to 50%. PVCA was conducted in SAS 9.4 (SAS Institute Inc.).
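The CANDISC computation described under Canonical Discriminant Analysis can be sketched outside SAS as a generalized eigenproblem on the between-class (B) and pooled within-class (W) sums-of-squares-and-cross-products matrices. This is a minimal NumPy/SciPy sketch, not the SAS implementation: the helper name `canonical_discriminant` is hypothetical, and the scaling (canonical variables with unit pooled within-class variance) and the use of total-sample structure correlations are assumptions chosen to mirror the CANDISC conventions as we understand them.

```python
import numpy as np
from scipy.linalg import eigh


def canonical_discriminant(X, groups):
    """Sketch of canonical discriminant analysis (hypothetical helper).

    X: (n_samples, n_variables) matrix; groups: class label for each row.
    Returns canonical scores, the proportion of between-class variation
    captured by each canonical variable, and total-sample structure
    correlations (each variable vs. each canonical score).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    grand_mean = X.mean(axis=0)

    within = np.zeros((p, p))   # pooled within-class SSCP matrix (W)
    between = np.zeros((p, p))  # between-class SSCP matrix (B)
    for label in labels:
        Xg = X[groups == label]
        centered = Xg - Xg.mean(axis=0)
        within += centered.T @ centered
        diff = (Xg.mean(axis=0) - grand_mean)[:, None]
        between += len(Xg) * (diff @ diff.T)

    # Solve B v = lambda W v; eigh returns ascending eigenvalues and
    # eigenvectors normalized so that v.T @ W @ v = 1.
    evals, evecs = eigh(between, within)
    order = np.argsort(evals)[::-1]
    k = min(p, len(labels) - 1)  # at most (number of classes - 1) canonical variables
    evals, evecs = evals[order][:k], evecs[:, order][:, :k]

    # Rescale so each canonical variable has unit pooled within-class variance.
    evecs = evecs * np.sqrt(n - len(labels))
    scores = (X - grand_mean) @ evecs
    proportion = evals / evals.sum()

    structure = np.array(
        [[np.corrcoef(X[:, j], scores[:, c])[0, 1] for c in range(k)] for j in range(p)]
    )
    return scores, proportion, structure
```

Applied to the 27-metabolite matrix with variety as the class label, the cumulative sum of `proportion` over the first three entries would correspond to the "87% of the variation" reported for Can 1 to Can 3, and `structure` to correlations such as malate with Can 1. Solving the generalized problem directly avoids explicitly inverting W, which can be ill-conditioned when metabolites are strongly correlated.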

RESULTS AND DISCUSSION

Selective plant breeding has contributed to an increase in soybean yield over time, as reflected in the varieties assessed in this study (Table 1). The genotypic diversity of these soybean varieties (see the Supporting Information) provided an opportunity to evaluate metabolite variability associated with decades of selective breeding and yield differences. 1H NMR spectroscopy was evaluated as a profiling technique because of the robustness of the platform and its quantitation capabilities, although it typically detects far fewer metabolites than mass spectrometry-based methods. In this study, 1H NMR profiling allowed quantitation of 27 metabolites, which included free amino acids (alanine, arginine, asparagine, aspartate, glutamate, isoleucine, leucine, threonine, tryptophan, valine, and 4-aminobutyrate), sugars (galactose, glucose, and sucrose), organic acids (2-oxoglutarate, acetate, citrate, formate, fumarate, lactate, malate, and succinate), choline, O-acetylcholine, dimethylamine, trigonelline, and p-cresol (Table 2). Least-squares means combined across both sites are presented in Table 3; corresponding ranges and standard errors are included in the Supporting Information. Portions of each spectrum remained unassigned because of the limited set of metabolites in the Chenomx library and because of spectral overlap. Metabolite coverage for these soybean samples could be improved by adding soybean-specific metabolites to the Chenomx library.
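The linear-sum fitting and DSS-referenced quantitation described in Materials and Methods can be illustrated with a toy example. The sketch below is not the Chenomx algorithm (which is proprietary and models full multiplet patterns); it swaps in non-negative least squares (`scipy.optimize.nnls`) over hypothetical single-Lorentzian reference lines placed at chemical shifts taken from Table 2, converts each fitted amplitude to a concentration using the 0.5 mM DSS methyl singlet (9 protons), and converts mM in the 1 mL NMR buffer to μg per 50 mg DW sample. The grid, line width, library contents, and the no-extraction-loss conversion are all assumptions. Whatever intensity the fit cannot explain (its residual) plays the role of the unassigned spectral regions noted above.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical chemical-shift grid (ppm) and line width; not the acquisition settings.
PPM = np.linspace(-0.5, 10.0, 10501)
WIDTH = 0.003


def line(center, n_protons):
    """Lorentzian line whose height scales with the number of contributing protons."""
    return n_protons / (1.0 + ((PPM - center) / WIDTH) ** 2)


# Toy "library" of 1 concentration unit per compound: single lines at Table 2 shifts
# (multiplet structure ignored for illustration).
LIBRARY = {
    "DSS": line(0.00, 9),                      # trimethylsilyl singlet, 9 H
    "acetate": line(1.90, 3),                  # CH3 singlet
    "alanine": line(1.47, 3) + line(3.78, 1),  # CH3 + alpha-CH
    "succinate": line(2.39, 4),                # 2 x CH2 singlet
    "formate": line(8.44, 1),                  # HCOO- singlet
}


def fit_concentrations(spectrum, dss_mM=0.5):
    """Fit the spectrum as a non-negative linear sum of library lines; return mM values."""
    names = list(LIBRARY)
    design = np.column_stack([LIBRARY[name] for name in names])
    amplitudes, residual_norm = nnls(design, spectrum)
    # Every library entry represents the same molar amount, so amplitude ratios
    # to the DSS reference equal concentration ratios.
    per_mM = amplitudes[names.index("DSS")] / dss_mM
    conc = {name: a / per_mM for name, a in zip(names, amplitudes) if name != "DSS"}
    return conc, residual_norm


def mM_to_ug_per_sample(conc_mM, mol_weight, volume_mL=1.0):
    """mmol/L x mL = umol; umol x g/mol = ug (assumes the whole extract is in volume_mL)."""
    return conc_mM * volume_mL * mol_weight
```

For example, a hypothetical 1 mM glucose signal in the 1 mL buffer would correspond to about 180 μg (molar mass ≈ 180.16 g/mol) per 50 mg DW sample, the unit used in Table 3.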


Statistical analysis of the data set focused on two key methods. The first, CDA, was used to determine whether linear combinations of the quantitative variables exist that can discriminate among the nine varieties. The second, PVCA, was used to estimate the percentage of the total variability in the data set attributable to each major source of variability.

Canonical Discriminant Analysis. When applied here, the first three canonical factors (Can 1, Can 2, and Can 3) accounted for 87% of the variation in the data (Supporting Information Table 1). These factors proved effective in providing discrete clustering of the nine varieties, as shown in Figure 1 and Supporting Information Figure 2. The older varieties were discriminated from the newer varieties, with the exception of AG3705.

Figure 1. Canonical discriminant analysis of metabolite data for all nine varieties. On the canonical 1 axis, the four older varieties (A3127, CX375, CX366, and Williams) can be distinguished from four of the newer varieties (A3469, A3555, AG3701, and AG3803). AG3705 clustered with the older varieties and was shown (see text) to have some metabolic features similar to these.

Notably, Can 1 proved highly effective in discriminating between the older varieties and, with the exception of AG3705, the newer varieties. Can 1 was strongly correlated with malate (r = 0.93), fumarate (r = 0.92), glucose (r = 0.86), and glutamate (r = −0.88) (Supporting Information Table 2). The least-squares mean values (combined site) for these metabolites could, at least in part, also differentiate the varieties. For example, mean malate values for the older varieties and AG3705 ranged from 111.11 to 123.83 μg/50 mg DW, in contrast to 152.08−179.45 μg/50 mg DW for the newer varieties (excluding AG3705). Corresponding values (older varieties plus AG3705 versus newer varieties excluding AG3705) for the other metabolites were as follows: fumarate, 2.36−2.63 versus 2.82−4.34 μg/50 mg DW; glucose, 21.27−23.85 versus 23.82−28.72 μg/50 mg DW; and glutamate, 58.69−65.70 versus 47.99−53.71 μg/50 mg DW. Other metabolites such as galactose (r = 0.72), trigonelline (r = −0.75), and formate (r = −0.75) also correlated well with Can 1. Within each of the two main clusters discriminated by Can 1, varieties were separated along the Can 2 axis (Figure 1). Can 2 was most highly correlated with p-cresol (r = −0.82).

Principal Variance Component Analysis. In addition to the metabolite data, PVCA included the variables yield and seed size. It was applied here to investigate the interrelationships among metabolite components as well as the impact of location and pedigree on variability in the levels of these metabolites. A total of eight factors described >80% of the variance in the study (Supporting Information Table 3). Factors F1−F5 are discussed in more detail below. The correlation matrix of components listed by factor, as determined by the varimax rotation, can be found in Supporting Information Table 4. Next, analysis of variance with only random effects was used to estimate the variance within each factor from the following sources: location, pedigree (i.e., variety), block (location) (i.e., biological replicate within location), and residual error (Figure 2). Across the study, residual error accounted for 42.3% of the total variability; pedigree, 37.8%; unexplained, 17.3%; location, 1.7%; and block (location), 0.9% (Figure 3).

Factor F1: Free Amino Acids (Alanine, Arginine, 4-Aminobutyrate, Asparagine, Isoleucine, Leucine, and Valine) and an Organic Acid (Fumarate). The first factor, F1, explained 24.4% of the total variation in the metabolomics study (Figure 2 and Table 4). It was strongly correlated with three free amino acids (arginine, isoleucine, and valine) and moderately correlated with four free amino acids (alanine, 4-aminobutyrate, asparagine, and leucine) as well as the organic acid fumarate. Analysis of variance showed that the largest source of variance in F1 was pedigree, followed by residual error. Figure 4 contains box plots that display the variation of values in F1 according to pedigree.

These results from PVCA were consistent with those from CDA. For example, in the CDA all amino acids listed above, with the exception of leucine, had correlations with Can 3 of r > 0.6, whereas fumarate, as noted earlier, correlated very strongly with Can 1 (r = 0.92). In other words, metabolites shown by PVCA to be highly variable according to pedigree also contributed to separation in the CDA plots, showing that the results of the two multivariate approaches were broadly consistent. Variety A3555, which had the highest score for factor F1 in the PVCA (see Figure 4), also had (or shared) the highest combined-site mean values for alanine, arginine, 4-aminobutyrate, asparagine, isoleucine, leucine, valine, and fumarate. This newer conventional line may therefore be a major contributor to the metabolite variation observed in these data.

Factor F2: Sugars (Galactose and Glucose), Organic Acids (Lactate and Fumarate), and Free Amino Acids (Alanine and Threonine). The second factor, F2, explained 16.6% of the total variation in the metabolomics study (Supporting Information Table 5). It was strongly correlated with two sugars (galactose and glucose), two organic acids (lactate and succinate), and the free amino acid threonine, and moderately correlated with the free amino acid alanine. The largest source of variance in F2 was residual error, and F2 also had the most notable contribution to variance from location. Supporting Information Figure 3 contains box plots that display the variation of values in F2 according to pedigree.


Figure 2. Proportions of variance explained by variance components for each factor.
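The final PVCA step, fitting an all-random-effects model (location, block within location, pedigree, residual) to each factor score as summarized in Figure 2, can be illustrated with ANOVA (method-of-moments) estimators. This sketch assumes a balanced layout stored as `y[location, block, pedigree]` with one observation per cell; the study's data were unbalanced (two samples lost at harvest) and were analyzed in SAS, where likelihood-based estimators such as REML are typical, so the numbers would not match exactly. The function name is hypothetical, and negative estimates are truncated at zero.

```python
import numpy as np


def variance_fractions(y):
    """Method-of-moments variance components for y[location, block, pedigree].

    Model (all random): location + block(location) + pedigree + residual,
    balanced, one observation per cell. Returns components and fractions.
    """
    y = np.asarray(y, dtype=float)
    a, b, p = y.shape
    grand = y.mean()
    loc_means = y.mean(axis=(1, 2))   # shape (a,)
    block_means = y.mean(axis=2)      # shape (a, b)
    ped_means = y.mean(axis=(0, 1))   # shape (p,)

    ss_loc = b * p * np.sum((loc_means - grand) ** 2)
    ss_block = p * np.sum((block_means - loc_means[:, None]) ** 2)
    ss_ped = a * b * np.sum((ped_means - grand) ** 2)
    ss_resid = np.sum((y - grand) ** 2) - ss_loc - ss_block - ss_ped

    ms_loc = ss_loc / (a - 1)
    ms_block = ss_block / (a * (b - 1))
    ms_ped = ss_ped / (p - 1)
    ms_resid = ss_resid / ((a * b - 1) * (p - 1))

    # Solve the expected-mean-square equations, truncating negatives at zero:
    # E[MS_loc] = s2_e + p*s2_block + b*p*s2_loc, E[MS_block] = s2_e + p*s2_block,
    # E[MS_ped] = s2_e + a*b*s2_ped, E[MS_resid] = s2_e.
    comps = {
        "location": max((ms_loc - ms_block) / (b * p), 0.0),
        "block(location)": max((ms_block - ms_resid) / p, 0.0),
        "pedigree": max((ms_ped - ms_resid) / (a * b), 0.0),
        "residual": max(ms_resid, 0.0),
    }
    total = sum(comps.values())
    fractions = {k: v / total for k, v in comps.items()}
    return comps, fractions
```

With two locations, six blocks, and nine varieties this mirrors the study layout; the fractions returned for one factor correspond to the "fraction of variance due to source" columns of Tables 4 and 5.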

Figure 3. Proportions of variance explained by each factor.
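The PVC contributions in Tables 4 and 5 are the within-factor fractions of variance scaled by the share of total variation explained by that factor (for example, 24.4% × 68.33% ≈ 16.67% for pedigree in F1). The sketch below reproduces that arithmetic and adds one plausible way to aggregate contributions over the retained factors, in which whatever the factors do not explain is reported as "unexplained"; that aggregation rule and the function names are assumptions, not a documented description of the SAS workflow.

```python
def pvc_contributions(factor_share_pct, source_fractions_pct):
    """Scale within-factor variance fractions (%) by the factor's share of total variation (%)."""
    return {src: factor_share_pct * frac / 100.0 for src, frac in source_fractions_pct.items()}


def overall_contributions(factors):
    """Sum PVC contributions over retained factors; the remainder is 'unexplained'.

    factors: list of (factor_share_pct, source_fractions_pct) tuples.
    """
    totals = {}
    explained = 0.0
    for share, fractions in factors:
        explained += share
        for src, value in pvc_contributions(share, fractions).items():
            totals[src] = totals.get(src, 0.0) + value
    totals["unexplained"] = 100.0 - explained
    return totals


# Values transcribed from Tables 4 (F1) and 5 (F3).
F1 = (24.4, {"pedigree": 68.33, "residual error": 31.50, "location": 0.18, "block (location)": 0.00})
F3 = (12.35, {"pedigree": 81.19, "residual error": 17.28, "block (location)": 1.07, "location": 0.46})

print({k: round(v, 2) for k, v in pvc_contributions(*F1).items()})
# {'pedigree': 16.67, 'residual error': 7.69, 'location': 0.04, 'block (location)': 0.0}
```

With only F1 and F3 transcribed, `overall_contributions` cannot reproduce the study-wide figures (residual 42.3%, pedigree 37.8%, unexplained 17.3%); that would require the fractions for all eight factors from the Supporting Information. Under this aggregation rule, a 17.3% unexplained share would correspond to the retained factors jointly explaining about 82.7% of the variance, in line with the >80% stated for the eight factors.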

Figure 5 contains box plots to display the variation of values in F3 according to pedigree. Figure 6 shows the contribution of pedigree to variation in each metabolite associated with F3. Figure 5 reveals that the five newer higher yielding soybean varieties have F3 scores above the zero reference line, whereas the four older lower-yielding lines have negative scores. This is broadly consistent with results from the CDA where Can 1 also discriminated (with the exception of AG3705) the older and newer varieties. Malate, glutamate, and fumarate all correlated highly with Can 1 in that analysis. Sucrose and O-acetylcholine had lower correlations with Can 1 (r = 0.51 and −0.55, respectively) with the exception of A3555, 4.27−4.61 mg/50 mg DW (A3555 value = 3.95) versus 3.39−4.16 mg/50 mg DW for the older varieties. Sucrose has been reported to

Table 4. Summary of Factor 1 Sources of Variation (24.4%) source of variance

fraction of variance due to source

PVC contribution

pedigree residual error location block (location)

68.33 31.50 0.18 0.00

16.67 7.69 0.04 0.00

Factor F3: Yield, Sucrose, Organic Acids (Malate and Fumarate), Glutamate, and O-Acetylcholine. The third factor, F3, explained 12.35% of the total variation in the study (Table 5). It was strongly correlated with malate and glutamate and moderately correlated with yield, sucrose, fumarate, and Oacetylcholine. The largest source of variance in F3 was pedigree. E

DOI: 10.1021/acs.jafc.5b01069 J. Agric. Food Chem. XXXX, XXX, XXX−XXX

Article

Journal of Agricultural and Food Chemistry

Figure 4. Factor F1 by pedigree. This factor was highly associated with pedigree (variety). Plotting factor 1 for each pedigree allows a visual representation of how varieties vary in levels of metabolites associated with this factor.

plots) are presented in the Supporting Information; as described there, fumarate, malate, glutamate, and O-acetylcholine had some of the highest correlations relative to that of the other metabolites. Factor F4: Acetate, Formate, and Dimethylamine. The fourth factor, F4, explained 9.1% of the total variation in the study (Supporting Information Table 6). It was strongly correlated with acetate, formate, and dimethylamine. The largest source of variance in F 4 was residual error. Supplementary Figure 4 contains box plots to display the variation of values in F4 according to pedigree. Although variance in F4 due to pedigree was relatively low, a wide range

Table 5. Summary of Factor 3 Sources of Variation (12.35%) source of variance

fraction of variance due to source

PVC contribution

pedigree residual error block (location) location

81.19 17.28 1.07 0.46

10.03 2.13 0.13 0.06

correlate positively with yield,24 and NMR profiling results are also consistent with those observed in a targeted analysis of the same sample set reported here.11 Correlation coefficients for individual metabolites and yield (including individual scatter

Figure 5. Factor F3 by pedigree. This factor was highly associated with pedigree (variety). Plotting factor 3 for each pedigree shows that the four older varieties (A3127, CX375, CX366, and Williams) can be distinguished from the newer varieties (A3469, A3555, AG3701, AG3705, and AG3803). F

DOI: 10.1021/acs.jafc.5b01069 J. Agric. Food Chem. XXXX, XXX, XXX−XXX

Article

Journal of Agricultural and Food Chemistry

Figure 6. Pedigree effect accounts for notable proportion of variation for those components that strongly correlated with factor F3. For yield, a location effect also contributed to variation.

of values for these metabolites was observed. This variation did not show a trend with launch era; in fact, the combined-site mean values, as well as the F4 scores, for the oldest variety (Williams, 1972) and one of the newest (A3555, 2008) are the most similar.

Factor F5: Aspartate, Asparagine, Oxoglutarate, and p-Cresol. The fifth factor, F5, explained 7.0% of the total variation in the study (Supporting Information Table 7). It was strongly correlated with aspartate and moderately correlated with asparagine, oxoglutarate, and p-cresol. The largest sources of variance in F5 were residual error and pedigree. Supporting Information Figure 5 contains box plots displaying the variation of F5 values by pedigree. Among these metabolites, p-cresol showed the strongest correlation with any canonical score in the CDA (Can 2, r = −0.82). Aspartate and oxoglutarate were also negatively correlated with Can 2 (r = −0.62 and −0.60, respectively). As noted earlier, Can 2 was effective in within-group separation of the two major clusters discriminated by CDA. For example, the Can 2 score groups for the newer varieties can be ordered as AG3701 < A3469 < A3555 < AG3803. The difference between AG3701 and AG3803 is reflected both in the F5 scores and in the combined-site mean values of the individual metabolites. The mean p-cresol value, for instance, is 25.26 μg/50 mg DW in AG3701 and 16.12 μg/50 mg DW in AG3803; the corresponding values for aspartate are 56.673 and 42.79 μg/50 mg DW, respectively. The wide range appears to be driven predominantly by AG3701, which has the highest p-cresol and aspartate values in the entire data set. Otherwise, the within-group variation for these metabolites is very similar for the older and newer varieties. A better understanding of the relationships between agronomic performance and levels of key crop nutrients and metabolites in soybean seed will be of value as improvements to yield potential continue apace.

This study provided a quantitative comparison of seed metabolite profiles from a soybean lineage representing decades of breeding and differing yield potential. The application of 1H NMR spectroscopy and multivariate statistics generated data that could differentiate soybean cultivars on the basis of their metabolite profiles. Notably, CDA could distinguish older, lower yielding varieties from newer, higher yielding ones, which implies that metabolite changes could potentially be associated with improved soybean yield and quality. PVCA indicated that the major source of metabolite variability in the study was variety (pedigree), although location was a notable source of variation for some components. Several metabolites were grouped in the same factor as yield, which suggests that those metabolites are correlated with yield potential. This was consistent both with the CDA, which showed that some metabolites were associated with the newer, higher yielding varieties, and with correlation analyses between each of the 27 individual metabolites and yield (Supporting Information). Overall, these analyses suggest that certain metabolite pathways may be linked to yield potential. Other metabolites were less associated with yield potential but were still highly variable across the entire data set, which suggests that some pathways are incidental to yield gain but can be expected to vary with genetic background and growing location. The study confirms that crop genetics and breeding for crop improvement can have a notable influence on metabolite variability: some metabolites may be associated with gains in yield potential, whereas others are more incidental to germplasm improvement and/or differences in growing location. In addition, the transgenic and conventional lines were not uniquely distinguishable from each other, which is consistent with numerous publications (e.g., ref 22) and supports the position that genetic modification is not a meaningful contributor to metabolite variability. In summary, metabolite variability is associated with a history of safe use and is possibly essential for crop improvement.
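The CDA results above are reported as correlations between individual metabolites and canonical scores (for example, p-cresol with Can 2, r = −0.82). As an illustration only, the sketch below shows one generic way such "structure correlations" can be computed: a Fisher-style canonical discriminant analysis solved as the generalized eigenproblem between the between-group and pooled within-group scatter matrices, followed by Pearson correlations of each variable with each canonical score. The function name `canonical_discriminant` and its interface are hypothetical; this is not the software used in the study, which may differ in scaling conventions.

```python
import numpy as np


def canonical_discriminant(X, groups):
    """Fisher-style canonical discriminant analysis (illustrative sketch).

    X: (n_samples, n_variables) array, e.g. metabolite levels.
    groups: length-n array of class labels, e.g. variety or launch era.
    Returns (scores, structure): scores holds the canonical variates
    (Can 1, Can 2, ...) and structure[j, m] is the Pearson correlation
    between variable j and canonical score m.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)

    W = np.zeros((p, p))  # pooled within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += Xg.shape[0] * (d @ d.T)

    # Solve B v = lambda W v via the symmetric form L^-1 B L^-T u = lambda u,
    # where W = L L^T (Cholesky); the discriminant directions are v = L^-T u.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    k = min(len(labels) - 1, p)            # number of canonical functions
    order = np.argsort(eigvals)[::-1][:k]  # strongest discrimination first
    V = L_inv.T @ U[:, order]

    scores = (X - grand_mean) @ V
    # Total-sample structure correlations; the sign of each column is arbitrary.
    structure = np.corrcoef(np.hstack([X, scores]), rowvar=False)[:p, p:]
    return scores, structure
```

Because eigenvectors are defined only up to sign, a correlation such as r = −0.82 in one implementation may appear as +0.82 in another; only the magnitude and the relative signs within a canonical function are interpretable.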
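The PVCA findings summarized above (variety as the major source of variability, location notable for some components, residual error and pedigree dominating F5) rest on partitioning variance among pedigree, location, and residual error. The sketch below follows the general PVCA recipe of Li et al. (ref 23): autoscale the variables, run PCA, keep the leading components that reach a cumulative variance threshold, estimate variance components on each component's scores, and average the resulting proportions weighted by eigenvalue. Several details are assumptions rather than the authors' settings: the 60% threshold, the function names `variance_components` and `pvca`, an additive pedigree + location model on a balanced design, and ANOVA method-of-moments estimators used here as a simple stand-in for the mixed-model (REML) fit typically used in PVCA.

```python
import numpy as np


def variance_components(y, factor_a, factor_b):
    """Method-of-moments variance components for a balanced, additive
    two-way random-effects model y = mu + a_i + b_j + e.

    Returns np.array([var_a, var_b, var_residual]); negative estimates are
    truncated at zero. Assumes (and only loosely checks) that every
    (a, b) cell has the same number of replicates.
    """
    y = np.asarray(y, dtype=float)
    factor_a, factor_b = np.asarray(factor_a), np.asarray(factor_b)
    a_levels, b_levels = np.unique(factor_a), np.unique(factor_b)
    a, b, n = len(a_levels), len(b_levels), len(y)
    r = n // (a * b)
    if a * b * r != n:
        raise ValueError("design must be balanced")
    grand = y.mean()
    a_means = np.array([y[factor_a == lv].mean() for lv in a_levels])
    b_means = np.array([y[factor_b == lv].mean() for lv in b_levels])
    ss_a = b * r * np.sum((a_means - grand) ** 2)
    ss_b = a * r * np.sum((b_means - grand) ** 2)
    ss_e = np.sum((y - grand) ** 2) - ss_a - ss_b
    ms_a, ms_b = ss_a / (a - 1), ss_b / (b - 1)
    ms_e = ss_e / (n - a - b + 1)
    # Expected mean squares: E[MS_A] = s2_e + b*r*s2_a, E[MS_B] = s2_e + a*r*s2_b
    var_a = max(0.0, (ms_a - ms_e) / (b * r))
    var_b = max(0.0, (ms_b - ms_e) / (a * r))
    return np.array([var_a, var_b, ms_e])


def pvca(X, pedigree, location, threshold=0.6):
    """PVCA-style proportions of variance for [pedigree, location, residual]."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale each metabolite
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative share reaches the threshold
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    pc_scores = Z @ eigvecs[:, :m]
    weighted = np.zeros(3)
    for k in range(m):
        comps = variance_components(pc_scores[:, k], pedigree, location)
        weighted += eigvals[k] * comps / comps.sum()  # eigenvalue-weighted proportions
    return weighted / weighted.sum()
```

For a design like the one described (nine varieties grown at two replicated sites), `pedigree` and `location` would be per-sample label arrays aligned with the rows of `X`; an unbalanced design would call for a proper mixed-model fit instead of these closed-form estimators.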



ASSOCIATED CONTENT

Supporting Information

Additional figures and tables as described in the text. The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jafc.5b01069.




AUTHOR INFORMATION

Corresponding Authors

*(G.G.H.) Phone: (314) 694-7432. E-mail: george.g.harrigan@monsanto.com.
*(K.S.) Phone: (636) 737-6071. E-mail: kirsten.skogerson@monsanto.com.

Notes

The authors declare the following competing financial interest(s): The authors are employed in the agricultural biotechnology business.

ACKNOWLEDGMENTS

We are very grateful for the agronomic support provided by Matt Culler of Monsanto. The excellent logistical support provided by James McCarter and his sample management team was also critical to the success of this experiment.

REFERENCES

(1) Lin, H.; Rao, J.; Shi, J.; Hu, C.; Cheng, F.; Wilson, Z. A.; Zhang, D.; Quan, S. Seed metabolomic study reveals significant metabolite variations and correlations among different soybean cultivars. J. Integr. Plant Biol. 2014, 56, 826−836.
(2) Caligiani, A.; Palla, G.; Maietti, A.; Cirlini, M. B. 1H NMR fingerprinting of soybean extracts, with emphasis on identification and quantification of isoflavones. Nutrients 2010, 2, 280−289.
(3) Mikel, M. A.; Diers, B. W.; Randall, L. N.; Smith, H. H. Genetic diversity and agronomic improvement of North American soybean germplasm. Crop Sci. 2010, 50, 1219−1229.
(4) Thompson, J. A.; Randall, L. N. Utilization of diverse germplasm for soybean yield improvement. Crop Sci. 1998, 38, 1362−1368.
(5) Clemente, T. E.; Cahoon, E. B. Soybean oil: genetic approaches for modification of functionality and total content. Plant Physiol. 2009, 151, 1030−1040.
(6) Jin, J.; Liu, X.; Wang, G.; Mi, L.; Shen, Z.; Chen, X.; Herbert, S. J. Agronomic and physiological contributions to the yield improvement of soybean cultivars released from 1950 to 2006 in Northeast China. Field Crop Res. 2010, 115, 116−123.
(7) Koester, R. P.; Skoneczka, J. A.; Cary, T. R.; Diers, B. W.; Ainsworth, E. A. Historical gains in soybean (Glycine max Merr.) seed yield are driven by linear increases in light interception, energy conversion, and partitioning efficiencies. J. Exp. Bot. 2014, 65, 3311.
(8) Specht, J. E.; Hume, D. J.; Kumudini, S. V. Soybean yield potential − a genetic and physiological perspective. Crop Sci. 1999, 39, 1560−1570.
(9) Rotundo, J. L.; Westgate, M. E. Meta-analysis of environmental effects on soybean seed composition. Field Crop Res. 2009, 110, 147−156.
(10) Harrigan, G. G.; Lundry, D.; Drury, S.; Berman, K.; Riordan, S. G.; Nemeth, M. A.; Ridley, W. P.; Glenn, K. C. Natural variation in crop composition and the impact of transgenesis. Nat. Biotechnol. 2010, 28, 402−404.
(11) Harrigan, G. G.; Culler, A.; Culler, M.; Breeze, M.; Berman, K.; Halls, S.; Harrison, J. Investigation of biochemical diversity in a soybean lineage representing 35 years of breeding. J. Agric. Food Chem. 2013, 61, 10807−10815.
(12) Morrison, M. J.; Voldeng, H. D.; Cober, E. R. Agronomic changes from 58 years of genetic improvement of short-season soybean cultivars in Canada. Agron. J. 2000, 92, 780−784.
(13) Morrison, M. J.; Cober, E. R.; Saleem, M. F.; McLaughlin, N. B.; Frégeau-Reid, J.; Ma, B. L.; Yan, W.; Woodrow, L. Changes in isoflavone concentration with 58 years of genetic improvement of short-season soybean cultivars in Canada. Crop Sci. 2008, 48, 2201−2208.
(14) Cheng, J.; Yuan, C.; Grahamn, T. L. Potential defense-related prenylated isoflavones in lactofen-induced soybean. Phytochemistry 2011, 72, 875−881.
(15) Jiao, Z.; Si, X.; Zhang, Z.; Li, G.; Cai, Z. Compositional study of different soybean (Glycine max L.) varieties by 1H NMR spectroscopy, chromatographic and spectrometric techniques. Food Chem. 2012, 135, 285−291.
(16) Kusano, M.; Saito, K. Role of metabolomics in crop improvement. J. Plant Biochem. Biotechnol. 2012, 21 (Suppl. 1), S24−S31.
(17) Silvente, S.; Sobolev, A. P.; Lara, M. Metabolite adjustments in drought tolerant and sensitive soybean genotypes in response to water stress. PLoS One 2012, 7, 1−11.
(18) Valdes, A.; Simo, C.; Ibanez, V.; Garcia-Cana, V. Foodomics strategies for the analysis of transgenic food. TrAC, Trends Anal. Chem. 2013, 52, 2−15.
(19) Hohmann, M.; Christoph, N.; Wachter, H.; Holzgrabe, U. 1H NMR profiling as an approach to differentiate conventionally and organically grown tomatoes. J. Agric. Food Chem. 2014, 62, 8530−8540.
(20) Mannina, L.; D’Imperio, M.; Capitani, D.; Rezzi, S.; Guillou, C.; Mavromoustakos, T.; Vilchez, M. D. M.; Fernandez, A.; Thomas, F.; Aparicio, R. 1H NMR-based protocol for detection of adulteration of refined olive oil with refined hazelnut oil. J. Agric. Food Chem. 2009, 57, 11550−11556.
(21) Neumuller, K. G.; Carvalho de Souza, A.; Van Rijn, J.; Appeldoorn, M. M.; Streekstra, H.; Schols, H. A.; Gruppen, H. Fast and robust method to determine phenoyl and acetyl esters of polysaccharides by quantitative 1H NMR. J. Agric. Food Chem. 2013, 61, 6282−6287.
(22) Kusano, M.; Baxter, I.; Fukushima, A.; Oikawa, A.; Okazaki, Y.; Nakabayashi, R.; Bouvrette, D. J.; Achard, F.; Jakubowski, A. R.; Ballam, J. M.; Phillips, J. R.; Culler, A. H.; Saito, K.; Harrigan, G. G. Assessing metabolomic and chemical diversity of a soybean lineage representing 35 years of breeding. Metabolomics 2015, 11, 261−270.
(23) Li, J.; Bushel, P. R.; Chu, T.-M.; Wolfinger, R. D. Principal variance components analysis: estimating batch effects in microarray gene expression data. In Batch Effect and Experimental Noise in Microarray Studies: Sources and Solutions; Wiley: West Sussex, UK, 2009; pp 141−154.
(24) Wilcox, J. R.; Shibles, R. M. Interrelationships among seed quality attributes in soybean. Crop Sci. 2001, 41, 11−12.
