Jul 28, 2017
Communication pubs.acs.org/jchemeduc

Identification of Edible Oils by Principal Component Analysis of 1H NMR Spectra

Shauna L. Anderson, David Rovnyak, and Timothy G. Strein*

Department of Chemistry, Bucknell University, Lewisburg, Pennsylvania 17837, United States

ABSTRACT: Principal component analysis (PCA) is a statistical method widely used in chemometric studies to analyze large, correlated sets of data. An undergraduate laboratory experiment involving PCA of 1H NMR spectral data is described. Students collect NMR spectra of an unknown oil sample, are provided with spectra of six oil standards (canola, corn, olive, peanut, sesame, and sunflower oil), and are asked to identify the unknown oil using score plots based on the PCA results. This laboratory experiment gives students hands-on experience in collecting NMR spectra, processing the spectral data, and using freely available, web-based software to subject the data to PCA and to prepare the resulting score plots.

KEYWORDS: NMR Spectroscopy, Chemometrics, Upper-Division Undergraduate, Analytical Chemistry, Laboratory Instruction, Hands-On Learning/Manipulatives



INTRODUCTION

Triacylglycerols (TAGs), also known as triglycerides, are the primary components of edible oils and are formed by the esterification of glycerol with fatty acids.1 The length of the fatty acid chain and the degree of saturation of common TAGs vary among different oils. The fatty acid chains of the TAGs found in common cooking oils typically have an even number of carbon atoms (e.g., 16, 18, or 20) with varied degrees of unsaturation in the linear aliphatic chain. The major unsaturated fatty acids present in the TAGs of edible oils are the 18-carbon oleic, linoleic, and linolenic acids, which have one, two, and three double bonds, respectively (Figure 1, left). Saturated fatty acids, such as myristic (C14), palmitic (C16), and stearic acids (C18), are also present in edible oils (Figure 1, right). The mixture of these fatty acids in the TAGs of edible oils varies from plant to plant. Thus, subtle differences in fatty acid chain composition and the corresponding subtle spectral differences obtained upon NMR analysis of the oils from different plants allow for the characterization of the botanical origin and authenticity of an oil sample.1−3

The analysis of oils for their fatty acid composition is commonly performed with gas chromatography (GC).4−6 While effective, GC analysis requires time-consuming conversion of fatty acids to their respective methyl esters, which have much lower boiling points. Sample oxidation during pretreatment can also result in chromatograms that have inadequate resolution to characterize complex samples fully. Consequently, there has been a growing interest in employing spectroscopic techniques that do not require derivatization, yet provide results comparable to those obtained with GC. Spectroscopic characterization of oils by IR,7−10 NIR,7,11 Raman,2,7 and NMR1,3,10,12−17 methods has been reported. Because the spectroscopic signatures of many oils are quite similar, spectral data analysis often requires a multivariate statistical approach.6−8,11,16−19

Recently, 1H NMR spectroscopy has been used to characterize olive oils3,12,13,15 because NMR spectroscopy does not require sample pretreatment and offers a relatively large amount of information in a single spectrum.1 Validating the authenticity as well as the geographic origin of high-end commercial cooking oils has become an increasingly important endeavor, particularly in the olive oil industry, where adulteration of high-value olive oils with lower-quality oils is a major problem.5,13,16 With the aid of statistical analysis, the small variations among very similar spectroscopic data can be correlated to the composition of an oil.5,6,18 Principal component analysis (PCA) is a leading technique for treating complex spectroscopic data.20,21 In brief, PCA is

Received: January 6, 2017 Revised: July 5, 2017

DOI: 10.1021/acs.jchemed.7b00012 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education


Figure 1. (left) Linear unsaturated fatty acids: oleic acid (C18H34O2), linoleic acid (C18H32O2), and linolenic acid (C18H30O2). (right) Linear saturated fatty acids: stearic acid (C18H36O2), palmitic acid (C16H32O2), and myristic acid (C14H28O2).

Multivariate analysis represents a large family of techniques (PCA, DA, hierarchical analysis, and others) where the overall goal is to divide a large set of correlated data into meaningful subgroups. PCA is a variable reduction technique that attempts to identify subsets and combinations of variables that explain most of the differences (a.k.a. variation) in the data between samples. When data consist of extremely large numbers of variables that cannot be analyzed by inspection, PCA can help to identify a much smaller and more manageable set of variables for subsequent analysis.

Briefly, each sample is an experimental observation (an NMR spectrum) that is a vector (x1, x2, ..., xn) whose elements are the signal intensities over a discrete list of chemical shifts. Each element xi is a variable, and an NMR spectrum that has been processed to n points therefore consists of n variables. Since n can be very large, even for a one-dimensional (1D) NMR spectrum, where 32k points or more are common, it is generally necessary to bin the data prior to analysis. Binning is just a reduction of the digital resolution. For example, a spectrum processed to 32k points may be reduced to 4k bins by averaging eight adjacent data points into one point. Binning the data so that a single peak is represented by just one or two bins also helps to remove redundancy in the variables. The PCA method then uses matrix algebra to discern which variables (e.g., peaks) differ the most between the samples. Formally, for nbin bins, one first forms an nbin × nbin correlation or covariance matrix from all of the observations (spectra), and this matrix is diagonalized; the eigenvectors are linear combinations of variables termed principal components (PCs) and can be viewed as composites of the original variables. Diagonalization produces nbin PCs that are ranked according to how much variation each PC accounts for in the data.
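The binning and diagonalization steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names `bin_spectrum` and `pca_by_diagonalization` are hypothetical, the data are random numbers standing in for spectra, and the covariance (rather than correlation) matrix is used. It is not the procedure implemented by any particular software package.

```python
import numpy as np

def bin_spectrum(intensities, points_per_bin):
    """Reduce digital resolution by averaging adjacent points into one bin."""
    intensities = np.asarray(intensities, dtype=float)
    n_bins = intensities.size // points_per_bin
    trimmed = intensities[: n_bins * points_per_bin]  # drop leftover points
    return trimmed.reshape(n_bins, points_per_bin).mean(axis=1)

def pca_by_diagonalization(X):
    """PCA of X (rows = spectra, columns = bins) via the covariance matrix."""
    Xc = X - X.mean(axis=0)                  # mean-center each variable (bin)
    cov = np.cov(Xc, rowvar=False)           # n_bin x n_bin covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # rank PCs by variance explained
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # fraction of variation per PC
    scores = Xc @ eigvecs                    # score of each spectrum on each PC
    return scores, eigvecs, explained

# Synthetic stand-in data (assumption: random numbers, not real NMR spectra).
rng = np.random.default_rng(0)
spectrum_32k = rng.random(32 * 1024)
binned_4k = bin_spectrum(spectrum_32k, 8)    # 32k points -> 4k bins

X = rng.normal(size=(12, 40))                # 12 hypothetical spectra, 40 bins
scores, pcs, explained = pca_by_diagonalization(X)
print(binned_4k.shape, np.round(explained[:3], 3))
```

The columns of `pcs` are the eigenvectors (the PCs), and `explained` gives the fraction of the total variation carried by each PC in ranked order.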
It is frequently the goal of principal component analysis to have the top two or three PCs describe the majority, usually >80%, of the variation in an experimental data set and to neglect the remaining PCs.19 An important aspect of PCA is that an experimental observation can be scored relative to the top few PCs. Samples that are similar to each other will give similar scores for each PC. A scatter plot of the numerical scores for one, two, or three PCs of each sample can reveal clustering of the samples into significant subgroups. In this experiment, a standard set of spectra is acquired from authentic samples of canola, corn, peanut, olive, sesame, and sunflower oil, and students acquire 1D 1H NMR spectra of unknown oils. All of the NMR data are processed, binned, and subjected to PCA.25 The experiment and data analysis can be conducted in a single 4 h laboratory period of an upper-division undergraduate analytical chemistry course either individually or in pairs of students. The experiment has now been performed for three years (2014, 2015, and 2016); in

useful for identifying effective variables, better known as principal components (PCs), that are linear combinations of subsets of the original variables (e.g., peak intensities) that change the most across different samples. The core tenet when applying PCA to data from laboratory measurements is that significant variation has an underlying physical basis, guiding researchers, for example, in identifying statistically significant differences in data that correlate with disease state.22−25 Following PCA variable reduction, many types of clustering analyses are applied to help classify samples. They include PCA score plots, which will be demonstrated here, and others such as k-means, dendrograms, heat maps, discriminant analysis (DA), partial least-squares−DA (PLS-DA), and orthogonal partial least-squares−DA (OPLS-DA).26,27 These methods can test whether unknown oil samples match with known oil samples, and have been used in the authentication of high-end olive oil samples.2,3 Importantly, PCA is unbiased (a.k.a. unsupervised) in the sense that it has no prior knowledge of the classifications of samples. Thus, if samples are observed to group in PCA score plots, the clustering can be taken to be statistically significant and to represent meaningful groups.21,28−30 Supervised methods such as DA/PLS-DA use prior knowledge of group classification to identify variables that most strongly depend on group membership, but they can be prone to overfitting and can require substantial expertise to validate the results.26 In the experiment described here, students analyze a series of known oils by 1H NMR spectroscopy and identify an unknown oil sample by performing PCA of the spectral data for both variable reduction and assessment of clustering in PCA score plots. In 2003, Rusak et al.9 described an undergraduate experiment in which students classify vegetable oils using FTIR spectroscopy and PCA. 
The work presented here is an adaptation of Rusak’s experiment using high-field NMR spectroscopy and PCA. In this experiment, students prepare samples, collect and process NMR spectral data, and perform principal component analysis on the data to identify an unknown oil sample, all within a single 4 h lab period. Several types of student experiences involving chemometrics have been reported in this Journal.9,31−35 Some involve students only in the statistical analysis of provided data,31−33 while others involve students in both data collection and analysis.9,34,35 Wanke and Stauffer35 reported a lab in which students use IR spectroscopy and chemometrics to quantify the components of a ternary mixture of alcohols. Stitzel and Sours34 reported an interesting HPLC analysis for the theobromine and caffeine content of single-origin chocolate samples, followed by the clustering technique DA to help determine the origin of the chocolate samples.


labeled as “not for human consumption” when used for laboratory purposes.

recent years each student has been given an individual unknown, while in year 1 (2014) each group was given an unknown. To date, 29 students have performed this experiment, successfully generating final PCA score plots demonstrating clustering among the known oils and correctly identifying all 24 unknown oil samples (10 students worked in pairs in 2014). The specific pedagogical goals of allowing students to learn how to (1) collect their own NMR data and (2) perform PCA were accomplished. The more general goals of increasing students’ understanding of both laboratory skills and data analysis skills also appear to have been accomplished.



RESULTS AND DISCUSSION

A plot of representative 1H NMR spectra of 10 vol % solutions of various commercial edible oils (Figure 2) shows only



EXPERIMENTAL SECTION

Samples of canola, corn, peanut, olive, sesame, and sunflower oils were purchased in a local grocery store. Samples for NMR spectroscopy were prepared with deuterated chloroform (CDCl3, 99.8%) containing 0.05% tetramethylsilane (TMS).

1H NMR Data Collection

Students are given a vial of an unknown oil (canola, corn, peanut, olive, sesame, or sunflower oil) and instructed to prepare two 10% oil by volume solutions by mixing 100 μL of oil with 900 μL of CDCl3 containing 0.05% TMS. The instructor prepared five 10 vol % samples of each known oil and performed 1H NMR spectroscopy of these known samples prior to the student experiment. 1H NMR spectra are collected on a 400 MHz NMR spectrometer at room temperature using a standard 1H 1D one-pulse sequence by accumulating four scans, applying an exponential multiplication factor of 0.5 Hz to the free induction decay, zero-filling, Fourier transformation, phasing, and baseline correction. The 1H NMR chemical shifts are expressed in relation to the internal standard TMS resonance, which is set to 0.00 ppm. After the 1H NMR spectra are processed, the region of 0.5 to 6.0 ppm is binned with a bin width of 0.005 ppm, resulting in 1100 variables per spectrum.

PCA

In order to be formatted properly for subsequent analysis, binned data are exported as “comma-separated values” (.csv) files and separated into folders: one containing the unknown data and one for each set of standard spectral data. As PCA is unsupervised, the only purpose of dividing the data into folders is that the samples can be labeled on PCA score plots. The folders are compressed into a single zip file that is uploaded into the MetaboAnalyst software.36,37 Using the “Statistical Analysis” module within MetaboAnalyst, PCA is executed. Interquartile range (IQR) is used for data filtering, and the data are normalized by sum and subjected to Pareto scaling, a technique commonly used with PCA of binned NMR signals that allows intensity changes of larger peaks to contribute somewhat more to the overall variation.23 Students utilize both two-dimensional (2D) and three-dimensional (3D) score plots to classify their unknown oils.
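The binning and scaling steps described in this section can be illustrated with a short NumPy sketch. The assumptions are substantial: the function names are hypothetical, the spectra are random numbers on a made-up chemical shift axis, bins are formed by summing intensities within each 0.005 ppm window, and the IQR filtering step is omitted. The sketch shows the general definitions of normalization by sum and Pareto scaling (mean-centering, then dividing by the square root of the standard deviation); it does not reproduce MetaboAnalyst's internal implementation.

```python
import numpy as np

def bin_region(ppm, intensity, lo=0.5, hi=6.0, width=0.005):
    """Sum intensities into fixed-width bins between lo and hi ppm."""
    n_bins = int(round((hi - lo) / width))     # (6.0 - 0.5) / 0.005 = 1100
    edges = np.linspace(lo, hi, n_bins + 1)
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    return binned

def normalize_by_sum(X):
    """Scale each spectrum (row) so that its bins sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def pareto_scale(X):
    """Mean-center each bin, then divide by the square root of its std. dev."""
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                          # avoid dividing by zero
    return centered / np.sqrt(sd)

# Synthetic stand-in data (assumption: random intensities, not real spectra).
rng = np.random.default_rng(0)
ppm = np.linspace(-1.0, 10.0, 32768)           # hypothetical shift axis
spectra = rng.random((4, ppm.size))            # four hypothetical spectra

X = np.array([bin_region(ppm, s) for s in spectra])
X_norm = normalize_by_sum(X)
X_scaled = pareto_scale(X_norm)
print(X.shape)                                 # (4, 1100)
```

Because Pareto scaling divides by the square root of the standard deviation rather than by the standard deviation itself, bins with large intensity changes retain somewhat more weight than they would under unit-variance scaling.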

Figure 2. Stacked plot of 1H NMR spectra of various edible oils. The spectra were obtained by students.



HAZARDS

Deuterated chloroform is a carcinogen and an eye and skin irritant; proper eye and skin protection should be used. It is also hazardous if ingested or inhaled. Any student with an implant or pacemaker should not be allowed in the stray field of the NMR magnet (consult manufacturer’s specifications for fringe field radii). The NMR sample tubes are constructed of thin glass and should be handled with care. Edible oils must be

minimal differences upon visual inspection, consistent with the structural similarity of the fatty acids found in these oil samples. In general, the resonances with chemical shifts of 1−3 ppm correspond to fatty acid chain protons, while those at 4−6 ppm correspond to the glycerol unit.1 By visual inspection of these spectra, it would be difficult to identify conclusively the individual oil present in any one of these samples. The most apparent difference is the intensity of the multiplet at approximately 2.75 ppm, which is prominent in the corn and sesame oil spectra. This multiplet corresponds to the methylene protons on C11 of linoleic acid and/or C11 and C14 of linolenic acid (methylene protons on carbons that bridge two double bonds). Additional complex differences are in the multiplets at ca. 1.3 ppm (from aliphatic methylene groups) and 2.0 ppm (from methylene protons adjacent to a double bond). However, using visual inspection of these signals to confidently discriminate among the six oils considered here would be very challenging.

Conducting PCA of binned NMR spectral data allowed clear identification of unknown oil sample(s). Student-generated PCA score plots (Figure 3) illustrate the clustering of data from individual oil samples in 2D (PC1 vs PC2 and PC1 vs PC3) PC space. Each datum represents two PC scores from a single spectrum, and the different symbols represent different known oils. Also plotted in each score plot are the data from two different students’ unknown samples (black diamonds). By the proximal location to the clustered data from the known samples, the two unknowns were identified, correctly, as corn and peanut oil (Figure 3). In most cases, the use of a 2D PCA score plot (PC1 vs PC2) was sufficient for unknown classification; this was clearly the case for corn and peanut oils (Figure 3). Peanut oil was nearly fully separated from sunflower and olive oils in the PC1 dimension. Sesame and canola oils were also easily identified by the PC1 score. Sunflower and olive oils were weakly distinguished by PC1 and PC2 scores and moderately separated by PC3. Additional 2D and 3D score plots with other unknowns are included in the Supporting Information.

Figure 3. Student-generated 2D PCA score plots: PC1 vs PC2 (left column) and PC1 vs PC3 (right column) where the identity of the unknown is (top row) corn oil or (bottom row) peanut oil and the bin width is 0.005 ppm. There are three black diamonds (unknown) that are nearly coincident.

Each axis of a PCA plot is labeled with the percent of the variance (among all samples, knowns and the unknown) described by that PC. For example, in Figure 3 (top row), PC1 accounted for approximately 79% of the variance, PC2 only 8%, and PC3 4.1%. While only 2D interpretations of the PCA data are presented here, a common feature of statistical packages such as MetaboAnalyst is the capability to generate 3D PCA score plots (PC1 vs PC2 vs PC3) that can be rotated by the user to examine the net discrimination by multiple PCs. Sunflower and olive oils were more straightforwardly identified when the cumulative effect of all three PCs was represented in a 3D PCA score plot (Figure S5).

The student-generated score plots (Figure 3) showed the effect of experimental scatter as well as an outlier in the data on the resultant PCA score plot, demonstrating the importance of replicate samples for identification. Student data processing of one of the standard canola oil spectra resulted in an outlier in the PC space; when the instructors returned to the student data it was noted that the associated NMR spectrum had been incorrectly referenced by about 0.01 ppm; nonetheless, the aggregate data were sufficient for PCA to give correct clustering and identification of each student unknown sample considered. When correct referencing was used for the outlying canola oil NMR spectrum, the subsequent PCA clustering improved further (Figure S7). While “known” NMR spectra were acquired by the instructors, all of the spectra and score plots presented here were generated by students, illustrating that this experiment can hold up to some of the unanticipated errors that can occur in teaching laboratories.
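The idea of identifying an unknown by its proximity to a cluster of known samples in PC space, and of labeling each axis with the percent variance described, can be made concrete with a small sketch. In the experiment the assignment is made visually from the score plots; the nearest-centroid rule below is a swapped-in numerical stand-in chosen for illustration. The function names are hypothetical, and the data are synthetic profiles that merely borrow three oil names as labels.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Return PC scores and the percent variance described by each PC."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    percent = 100.0 * s**2 / np.sum(s**2)
    return scores, percent[:n_components]

def nearest_cluster(scores, labels, unknown_label="unknown"):
    """Assign the unknown to the known group with the closest centroid."""
    labels = np.asarray(labels)
    unknown = scores[labels == unknown_label].mean(axis=0)
    best_name, best_dist = None, np.inf
    for name in sorted(set(labels.tolist()) - {unknown_label}):
        centroid = scores[labels == name].mean(axis=0)
        dist = float(np.linalg.norm(unknown - centroid))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Synthetic stand-in data (assumption: made-up profiles, not measured spectra).
rng = np.random.default_rng(1)
oils = ["canola", "corn", "peanut"]
profiles = rng.random((3, 50))                 # one hypothetical profile per oil
rows, labels = [], []
for name, profile in zip(oils, profiles):
    for _ in range(5):                         # five replicates of each known
        rows.append(profile + 0.01 * rng.normal(size=50))
        labels.append(name)
for _ in range(2):                             # two replicates of the unknown
    rows.append(profiles[1] + 0.01 * rng.normal(size=50))
    labels.append("unknown")

X = np.array(rows)
scores, percent = pca_scores(X)
identified = nearest_cluster(scores, labels)
print(identified, np.round(percent, 1))
```

As in the experiment, the knowns and the unknown are analyzed together, so the percent variance values refer to all samples in the data set.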


Table 1. Comparative Results of the ASLE Survey Statements for Student Response [a]

Statement | (A) Strongly Agree | (B) Agree | (C) Neutral | (D) Disagree | (E) Strongly Disagree | A or B (%) [b]
1. This experiment helped me develop my data interpretation skills | 6 | 10 | 3 | 0 | 0 | 84
2. This experiment helped me develop my laboratory skills | 4 | 12 | 3 | 0 | 0 | 84
3. I found this to be an interesting experiment | 15 | 4 | 0 | 0 | 0 | 100
4. It was clear to me what I was expected to learn from completing this experiment | 12 | 7 | 0 | 0 | 0 | 100
5. Completing this experiment has increased my understanding of chemometrics | 7 | 7 | 4 | 1 | 0 | 74
6. This experiment provided me with the opportunity to take responsibility for my own learning | 6 | 9 | 3 | 1 | 0 | 79

Statement | (A) Excellent | (B) Good | (C) Average | (D) Poor | (E) Very Poor | A or B (%) [b]
7. Overall, as a learning experience I would rate this experiment as | 10 | 9 | 0 | 0 | 0 | 100

[a] Students performed this experiment in year 2 (2015) and year 3 (2016). [b] N = 19.

It is a goal of most clustering analyses to identify the underlying variables that best contributed to the successful clustering of the data. In PCA, the principal components are effective variables that are linear combinations of the original variables. The weightings of the original variables that make up the principal components are termed the loadings. MetaboAnalyst provides easy access to the loadings, where it can be seen that PC1, for example, has significant contributions from the NMR signals of the aliphatic tails (ca. 1.3 ppm) as well as the methine positions (ca. 5.4 ppm); among the strongest loadings in PC1 are subtle differences in the methylene regions at 0.9, 1.4, and 2.0 ppm that would otherwise escape notice. Comparing the structures in Figure 1 with their corresponding spectra in Figure 2, it can be appreciated that the proportion and chemical shifts of the methylene signals may uniquely identify some of these common oils. Visually, such identification is difficult, as the NMR signals for methylene protons (ca. 1.3 ppm) overlap; however, PCA with normalization to the sum of all signals gives rise to fairly large loadings for these aliphatic signals, likely because the proportions of these normalized signals change with the chain length and also with the degree of saturation. Importantly, the loadings for PC1, PC2, and PC3 are not simple, with about 4 to 5 of the NMR signals making significant contributions, reinforcing that the spectra are too complex to attempt to classify these oils only by visual inspection.

When the data are binned, smaller bin widths result in larger data file sizes but also greater resolution, which can aid the performance of PCA. For this group of samples, distinguishing among sunflower, olive, and peanut oil appeared to be the largest challenge; PCA of the peanut oil “unknown” data using bin widths of 0.001 ppm (not shown) and 0.005 ppm (Figure 3) successfully clustered the unknown and authentic peanut oil samples and resulted in very similar score plots (Figure S10). However, the use of 0.01 ppm bins failed to cluster the peanut oil (Figure S11). It is therefore suggested that a bin width of 0.005 ppm or smaller be used.

Before this experiment was deployed in the teaching laboratory, the effect of sample concentration was also considered, where 1 to 10 vol % oil samples were prepared and analyzed to compare the spectral resolution and clustering of the known oils by PCA. Increasing the concentration of oil in the sample led to greater signal intensity (Figures S8 and S9) with only minimal changes in the observed spectral line widths.

In the second and third iterations of this experiment, students were surveyed at the end of each lab period, and the results of that survey, which is based on questions from the widely used ASLE instrument,38 can be found in Table 1. Clearly, the students found the experiment to be an interesting and useful learning experience.
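The loadings discussed in this section are simply the weights of the original bins in a given PC, so the bins that drive a PC can be listed by ranking the absolute loadings. The sketch below is a minimal illustration under stated assumptions: `top_loadings` is a hypothetical helper, the data are synthetic, and only two bins (near 1.3 and 2.0 ppm, chosen arbitrarily for the demonstration) are made to vary systematically between samples.

```python
import numpy as np

def top_loadings(X, ppm_centers, pc=0, n_top=5):
    """List the (ppm, loading) pairs with the largest |loading| for one PC."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[pc]                          # weight of each bin in this PC
    order = np.argsort(np.abs(loadings))[::-1][:n_top]
    return [(float(ppm_centers[i]), float(loadings[i])) for i in order]

# Synthetic stand-in data (assumption: made-up intensities, not real spectra).
rng = np.random.default_rng(2)
centers = 0.5 + 0.005 * (np.arange(1100) + 0.5)   # centers of 0.005 ppm bins
i_a = int(np.argmin(np.abs(centers - 1.3)))       # bin near 1.3 ppm
i_b = int(np.argmin(np.abs(centers - 2.0)))       # bin near 2.0 ppm

factor = np.linspace(-1.0, 1.0, 10)               # systematic sample-to-sample change
X = 1.0 + 1e-3 * rng.normal(size=(10, 1100))      # flat baseline plus small noise
X[:, i_a] += 1.0 * factor                         # strong variation near 1.3 ppm
X[:, i_b] += 0.5 * factor                         # weaker variation near 2.0 ppm

top = top_loadings(X, centers, pc=0, n_top=2)
print(top)
```

The overall sign of a loading vector is arbitrary, so only the relative magnitudes and relative signs of the loadings within one PC carry meaning.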



CONCLUSION

This experiment provided an opportunity to give students an enabling hands-on experience with both modern NMR spectroscopy and PCA using easily acquired “real world” samples in a single 3−4 hour laboratory period. Students were able to successfully identify their respective unknown edible oils from a list of “knowns” for which NMR data were supplied.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available on the ACS Publications website at DOI: 10.1021/acs.jchemed.7b00012.

Student handout, additional student-generated PCA plots, results (collected by instructors) with different concentrations of oil and different bin widths, copy of the student survey, and instructor’s notes (PDF, DOCX)

Binned data presented in the top row of Figure 3 (ZIP)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Timothy G. Strein: 0000-0002-3747-642X

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors acknowledge Molly McGuire for helpful discussions about PCA in the laboratory component, the Bucknell University Chemistry Department, and NSF MRI CHE-0521108.



REFERENCES

(1) Barison, A.; Pereira da Silva, C. W.; Campos, F. R.; Simonelli, F.; Lenz, C. A.; Ferreira, A. G. A simple methodology for the determination of fatty acid composition in edible oils through 1H NMR spectroscopy. Magn. Reson. Chem. 2010, 48 (8), 642−650.


(2) Baeten, V.; Meurens, M.; Morales, M. T.; Aparicio, R. Detection of Virgin Olive Oil Adulteration by Fourier Transform Raman Spectroscopy. J. Agric. Food Chem. 1996, 44 (8), 2225−2230.
(3) Sacchi, R.; Mannina, L.; Fiordiponti, P.; Barone, P.; Paolillo, L.; Patumi, M.; Segre, A. Characterization of Italian Extra Virgin Olive Oils Using 1H-NMR Spectroscopy. J. Agric. Food Chem. 1998, 46 (10), 3947−3951.
(4) Christie, W. W. Preparation of ester derivatives of fatty acids for chromatographic analysis. In Advances in Lipid Methodology, Vol. 2; Christie, W. W., Ed.; Oily Press: Dundee, Scotland, 1993; pp 69−111.
(5) Blanch, G. P.; Caja, M. d. M.; Ruiz del Castillo, M. L.; Herraiz, M. Comparison of Different Methods for the Evaluation of the Authenticity of Olive Oil and Hazelnut Oil. J. Agric. Food Chem. 1998, 46 (8), 3153−3157.
(6) Lee, D.-S.; Noh, B.-S.; Bae, S.-Y.; Kim, K. Characterization of fatty acids composition in vegetable oils by gas chromatography and chemometrics. Anal. Chim. Acta 1998, 358 (2), 163−175.
(7) Yang, H.; Irudayaraj, J. Comparison of near-infrared, fourier transform-infrared, and fourier transform-raman methods for determining olive pomace oil adulteration in extra virgin olive oil. J. Am. Oil Chem. Soc. 2001, 78 (9), 889−895.
(8) Gurdeniz, G.; Ozen, B. Detection of adulteration of extra-virgin olive oil by chemometric analysis of mid-infrared spectral data. Food Chem. 2009, 116 (2), 519−525.
(9) Rusak, D. A.; Brown, L. M.; Martin, S. D. Classification of Vegetable Oils by Principal Component Analysis of FTIR Spectra. J. Chem. Educ. 2003, 80 (5), 541−543.
(10) Crowther, M. W. NMR and IR Spectroscopy for the Structural Characterization of Edible Fats and Oils. J. Chem. Educ. 2008, 85 (11), 1550−1554.
(11) Christy, A. A.; Kasemsumran, S.; Du, Y.; Ozaki, Y. The Detection and Quantification of Adulteration in Olive Oil by Near-Infrared Spectroscopy and Chemometrics. Anal. Sci. 2004, 20 (6), 935−940.
(12) Knothe, G.; Kenar, J. A. Determination of the fatty acid profile by 1H-NMR spectroscopy. Eur. J. Lipid Sci. Technol. 2004, 106 (2), 88−96.
(13) Parker, T.; Limer, E.; Watson, A. D.; Defernez, M.; Williamson, D.; Kemsley, E. K. 60 MHz 1H NMR spectroscopy for the analysis of edible oils. TrAC, Trends Anal. Chem. 2014, 57 (100), 147−158.
(14) Miyake, Y.; Yokomizo, K.; Matsuzaki, N. Determination of unsaturated fatty acid composition by high-resolution nuclear magnetic resonance spectroscopy. J. Am. Oil Chem. Soc. 1998, 75 (9), 1091−1094.
(15) Vigli, G.; Philippidis, A.; Spyros, A.; Dais, P. Classification of Edible Oils by Employing 31P and 1H NMR Spectroscopy in Combination with Multivariate Statistical Analysis. A Proposal for the Detection of Seed Oil Adulteration in Virgin Olive Oils. J. Agric. Food Chem. 2003, 51 (19), 5715−5722.
(16) Rezzi, S.; Axelson, D. E.; Héberger, K.; Reniero, F.; Mariani, C.; Guillou, C. Classification of olive oils using high throughput flow 1H NMR fingerprinting with principal component analysis, linear discriminant analysis and probabilistic neural networks. Anal. Chim. Acta 2005, 552, 13−24.
(17) Hartel, A. M.; Moore, A. C. Extraction and 1H NMR Analysis of Fats from Convenience Foods: A Laboratory Experiment for Organic Chemistry. J. Chem. Educ. 2014, 91 (10), 1702−1705.
(18) Ramadan, Z.; Jacobs, D.; Grigorov, M.; Kochhar, S. Metabolic profiling using principal component analysis, discriminant partial least squares, and genetic algorithms. Talanta 2006, 68 (5), 1683−1691.
(19) Winning, H.; Larsen, F. H.; Bro, R.; Engelsen, S. B. Quantitative Analysis of NMR spectra with chemometrics. J. Magn. Reson. 2008, 190 (1), 26−32.
(20) Teng, Q. NMR-Based Metabolomics. In Structural Biology: Practical NMR Applications; Springer Science: New York, 2013; pp 311−392.
(21) Worley, B.; Powers, R. Multivariate Analysis in Metabolomics. Curr. Metabolomics 2013, 1 (1), 92−107.

(22) Bain, J. R.; Stevens, R. D.; Wenner, B. R.; Ilkayeva, O.; Muoio, D. M.; Newgard, C. B. Metabolomics applied to diabetes research: moving from information to knowledge. Diabetes 2009, 58 (11), 2429−2443.
(23) Pan, Z.; Gu, H.; Talaty, N.; Chen, H.; Shanaiah, N.; Hainline, B. E.; Cooks, R. G.; Raftery, D. Principal component analysis of urine metabolites detected by NMR and DESI-MS in patients with inborn errors of metabolism. Anal. Bioanal. Chem. 2007, 387 (2), 539−549.
(24) Cromer, M. K.; Choi, M.; Nelson-Williams, C.; Fonseca, A. L.; Kunstman, J. W.; Korah, R. M.; Overton, J. D.; Mane, S.; Kenney, B.; Malchoff, C. D.; Stalberg, P.; Akerström, G.; Westin, G.; Hellman, P.; Carling, T.; Björklund, P.; Lifton, R. P. Neomorphic effects of recurrent somatic mutations in Yin Yang 1 in insulin-producing adenomas. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (13), 4062−4067.
(25) Lanza, I. R.; Zhang, S.; Ward, L. E.; Karakelides, H.; Raftery, D.; Nair, K. S. Quantitative metabolomics by 1H NMR and LC-MS/MS confirms altered metabolic pathways in diabetes. PLoS One 2010, 5 (5), e10538.
(26) Gromski, P. S.; Muhamadali, H.; Ellis, D. I.; Xu, Y.; Correa, E.; Turner, M. L.; Goodacre, R. A tutorial review: Metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Anal. Chim. Acta 2015, 879, 10−23.
(27) Ebbels, T. M. D. Non-Linear Methods for the Analysis of Metabolic Profiles. In The Handbook of Metabolomics and Metabonomics; Lindon, J. C., Nicholson, J. K., Holmes, E., Eds.; Elsevier: Amsterdam, 2007; pp 201−226.
(28) Bartel, J.; Krumsiek, J.; Theis, F. J. Statistical methods for the analysis of high-throughput metabolomics data. Comput. Struct. Biotechnol. J. 2013, 4 (5), e201301009.
(29) Monakhova, Yu. B.; Kuballa, T.; Lachenmeier, D. W. Chemometric Methods in NMR Spectroscopic Analysis of Food Products. J. Anal. Chem. 2013, 68 (9), 755−766.
(30) Larive, C. K.; Barding, G. A., Jr; Dinges, M. M. NMR spectroscopy for metabolomics and metabolic profiling. Anal. Chem. 2015, 87 (1), 133−146.
(31) Cazar, R. A. An Exercise on Chemometrics for a Quantitative Analysis Course. J. Chem. Educ. 2003, 80 (9), 1026−1029.
(32) Besalú, E. From Periodic Properties to a Periodic Table Arrangement. J. Chem. Educ. 2013, 90 (8), 1009−1013.
(33) Horovitz, O.; Sârbu, C. Characterization and Classification of Lanthanides by Multivariate-analysis Methods. J. Chem. Educ. 2005, 82 (3), 473−483.
(34) Stitzel, S. E.; Sours, R. E. High-Performance Liquid Chromatography Analysis of Single-Origin Chocolates for Methylxanthine Compositions and Provenance Determination. J. Chem. Educ. 2013, 90 (9), 1227−1230.
(35) Wanke, R.; Stauffer, J. An Advanced Undergraduate Chemistry Laboratory Experiment Exploring NIR Spectroscopy and Chemometrics. J. Chem. Educ. 2007, 84 (7), 1171−1173.
(36) Xia, J.; Sinelnikov, I. V.; Han, B.; Wishart, D. S. MetaboAnalyst 3.0 - making metabolomics more meaningful. Nucleic Acids Res. 2015, 43 (W1), W251−W257.
(37) MetaboAnalyst Web site. http://www.metaboanalyst.ca (last accessed May 2017).
(38) Barrie, S. C.; Bucat, R. B.; Buntine, M. A.; Burke da Silva, K.; Crisp, G. T.; George, A. V.; Jamie, I. M.; Kable, S. H.; Lim, K. F.; Pyke, S. M.; Read, J. R.; Sharma, M. D.; Yeung, A. Development, Evaluation and Use of a Student Experience Survey in Undergraduate Science Laboratories: The Advancing Science by Enhancing Learning in the Laboratory Student Learning Experience Survey. Int. J. Sci. Educ. 2015, 37 (11), 1795−1848.
