Laboratory Experiment pubs.acs.org/jchemeduc

Introducing Undergraduate Students to Metabolomics Using a NMR-Based Analysis of Coffee Beans

Peter Olaf Sandusky*

Department of Chemistry, Wellesley College, Wellesley, Massachusetts 02481, United States

ABSTRACT: Metabolomics applies multivariate statistical analysis to sets of high-resolution spectra taken over a population of biologically derived samples. The objective is to distinguish subpopulations within the overall sample population, and possibly also to identify biomarkers. While metabolomics has become part of the standard analytical toolbox in many areas of chemical research, its principles and methods have not yet been generally incorporated into the undergraduate chemistry curriculum. Identification of the arabica and robusta varieties of green coffee beans using 1H NMR-based principal component analysis provides an inexpensive teaching laboratory experiment that introduces students to the methods of metabolomics. The experiment requires no expensive chemicals, no unique equipment or software, and no access to higher-field instruments. Because students are generally curious about the chemical composition of coffee, the experiment is also particularly effective at engaging their interest and imagination.

KEYWORDS: Upper-Division Undergraduate, Analytical Chemistry, Agricultural Chemistry, Bioanalytical Chemistry, Chemometrics, Food Science, NMR Spectroscopy, Hands-On Learning/Manipulatives



© XXXX American Chemical Society and Division of Chemical Education, Inc.

Received: July 25, 2016 Revised: June 3, 2017

Journal of Chemical Education
DOI: 10.1021/acs.jchemed.6b00559 J. Chem. Educ. XXXX, XXX, XXX−XXX

INTRODUCTION

Metabolomics is a statistical approach to understanding the complex organic chemistry of samples derived from biological sources, including blood, urine, animal and plant tissue extracts, and microbial cultures. A set of high-resolution spectra is taken over a population of samples. Multivariate statistics is then used to analyze the set of spectra in order to detect subpopulations within the parent sample population and to identify the variations in chemistry responsible for those subpopulations. Metabolomics has found wide and growing application in a number of areas of chemical research: in the past ten years ACS journals have published 1121 metabolomics papers, 808 of them in the last five years (Table S1). Despite this, the incorporation of metabolomics into the undergraduate chemistry curriculum has been limited. While the topic is treated in some undergraduate chemistry programs, it is ignored in most. This Journal has published a number of useful articles describing ways in which multivariate statistics may be incorporated into undergraduate chemistry curricula,1−10 but few of these papers capture the essential features of metabolomics (Instructors’ Note 1).

This paper describes a laboratory experiment used as part of an upper-level course in advanced analytical chemistry at Pomona College, in which students applied the methods of NMR-based metabolomics to distinguish between the arabica and robusta varieties of unroasted coffee beans. Almost all commercially cultivated coffee belongs to one of two species: Coffea arabica and Coffea canephora, commonly called “robusta”. These two species differ statistically in the quantities of the various metabolites found in the beans.11,12 In this laboratory experiment, the students are provided with a set of authentic unroasted coffee bean samples representing both arabica and robusta coffees from a number of different countries, plus one sample of unknown species. The students characterize the water-extractable organic components in these samples using 1H NMR spectroscopy, and then analyze the population of 1H NMR spectra using principal component analysis13,14 (PCA) to distinguish the subpopulations representing the arabica and robusta varieties. They can then assign the unknown sample to either the arabica or the robusta subpopulation.

PCA is a basic statistical tool used in metabolomics. Consider a PCA calculation on a data set in which m data variables were measured over a population of n samples. (In this experiment, each data variable is a 1H NMR integral.) The data input to the PCA computer program is an n × m matrix in which each row corresponds to one sample and each column corresponds to one data variable. The PCA calculation then determines a new set of variables, the principal components (PCs). After the PCA calculation, each sample, rather than being described as it originally was by a set of m data values $d_{ji}$, is described by a set of “scores” $s_{jk}$, one score for each principal component calculated. For sample j and principal component k,

$$ s_{jk} = l_{1k} d_{j1} + l_{2k} d_{j2} + l_{3k} d_{j3} + \cdots + l_{mk} d_{jm} = \sum_{i=1}^{m} l_{ik} d_{ji} \qquad (1) $$

Here $l_{ik}$ is the “loading” coefficient linking data variable i with the score in PC k (Instructors’ Note 2). The principal components are determined such that the variance of the sample population in PC 1 is greater than that in PC 2, which in turn is greater than that in PC 3, and so on. Because most of the variance of the sample population in principal component space is concentrated in the first few principal components, a plot of the PC 1 versus PC 2 scores will often reveal subpopulations within the parent sample population. Likewise, a plot of PC 1 versus PC 2 loadings indicates which data variables (in this experiment, which coffee metabolites) are chiefly responsible for the differences between the subpopulations.

THE EXPERIMENT

A detailed description of the experimental procedure, and the student handout used in the advanced analytical course at Pomona College in the Fall semester of 2013, are included in the Supporting Information.

Figure 1. 400 MHz 1H NMR spectra of D2O extracts of unroasted coffee beans. The population of spectra used in the PCA calculation whose results are presented in Figures 2 and 3 is shown here. Spectra are aligned with the TMSP methyl peak assigned to 0.000 ppm. Arabica and robusta spectra are overlaid separately, and the scaling in this figure is adjusted so that the height of the sucrose anomeric proton peak at 5.43 ppm is the same in all spectra. Brackets correspond to one possible set of integral buckets. Peak assignments are based on Wei et al.15

Materials

Samples of unroasted arabica and robusta coffee beans from a variety of different countries were purchased from various vendors, as detailed in Tables S2A and S2B. (Also see Instructors’ Note 3.)

Coffee Extraction

Steps 1−3 below were performed by the students during the first laboratory period.

1. Each team of students was provided with ten samples of unroasted coffee beans: four or five samples of authentic arabica beans, four or five samples of authentic robusta beans, and one sample of beans of unknown type. Samples were ground using an electric coffee bean grinder. (See Instructors’ Note 3 on coffee bean grinders.)
2. A weighed portion of ground unroasted coffee beans from each sample, approximately 0.15 g in a 2 mL Eppendorf tube, was incubated at 95 °C in 1.5 mL of D2O for 1 h. Samples were cooled on ice for 15 min, and the coffee solids were pelleted by centrifugation.
3. Supernatants were immediately lifted off the pellets and transferred to fresh 2 mL Eppendorf tubes. Samples were stored at −4 °C until the second laboratory period (usually 2 days later).

Acquiring and Processing NMR Spectra

Steps 4−7 below were performed by the students during the second laboratory period.

4. NMR tubes containing 30 mM phosphate buffer (pH 6) and 0.46 mM TMSP [3-(trimethylsilyl)propionic-2,2,3,3-d4 acid] were prepared from the coffee extract supernatants from step 3.
5. 1H NMR spectra were acquired and processed for the complete set of samples. (Instrumental parameters and representative spectra can be found in Instructors’ Note 4 and Figures S1 and S2.)
6. The spectra were aligned by assigning the TMSP methyl peak to 0.000 ppm, and the entire population of spectra was overlaid. The integral regions, or “buckets”, were chosen to include all the major peaks observed in the downfield region of the spectra, from 9.5 to 5.0 ppm (Figure 1 and Instructors’ Note 5).
7. Each individual spectrum was integrated using the integral regions determined in step 6. The resulting integral text files were then e-mailed or transferred by USB stick to a student’s own personal computer.

Principal Component Analysis

Steps 8 and 9 below were performed by the students at a time of their own choosing following the second laboratory period.

8. The integral text files were read into a spreadsheet program, and the integrals were arranged into the format of a PCA data input matrix, so that each row corresponded to one sample and each column corresponded to one NMR integral-bucket region. (Alternative procedures for constructing, calibrating, and normalizing the PCA data input matrix are described in Instructors’ Note 6 in the Supporting Information.)
9. The spreadsheet data matrix from step 8 was read into a PCA program, and PCA was performed on the data set with mean centering and unit variance weighting (Instructors’ Notes 2 and 7).
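Steps 6−8 were carried out in the spectrometer's processing software and a spreadsheet. Purely as an illustration, the NumPy sketch below performs the same arithmetic: it references a spectrum to the TMSP peak, integrates fixed ppm buckets, and divides each bucket integral by the TMSP integral so that TMSP = 1.00 in every row, which is the calibration described in the Figure 2 caption. The function names, the ±0.1 ppm window used to locate TMSP, the trapezoid-rule integration, and the synthetic Gaussian spectrum are all assumptions made for this example, not details of the published procedure.

```python
import numpy as np


def trapezoid_area(x, y):
    """Trapezoid-rule area under y(x); x must be sorted in ascending order."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def align_to_reference(ppm, intensity, window=(-0.1, 0.1)):
    """Shift the chemical-shift axis so that the tallest point inside
    `window` (assumed to be the TMSP methyl peak) lands at 0.000 ppm."""
    in_window = (ppm >= window[0]) & (ppm <= window[1])
    peak_ppm = ppm[in_window][np.argmax(intensity[in_window])]
    return ppm - peak_ppm


def integrate_bucket(ppm, intensity, lo, hi):
    """Integral of one spectrum over the bucket lo <= ppm <= hi."""
    inside = (ppm >= lo) & (ppm <= hi)
    order = np.argsort(ppm[inside])  # NMR axes usually run from high to low ppm
    return trapezoid_area(ppm[inside][order], intensity[inside][order])


def build_data_matrix(spectra, buckets, tmsp_bucket=(-0.05, 0.05)):
    """Return the n x m PCA input matrix: one row per spectrum, one column per
    bucket, each integral divided by that spectrum's TMSP integral (TMSP = 1.00)."""
    rows = []
    for ppm, intensity in spectra:
        aligned = align_to_reference(ppm, intensity)
        tmsp = integrate_bucket(aligned, intensity, *tmsp_bucket)
        rows.append([integrate_bucket(aligned, intensity, lo, hi) / tmsp
                     for lo, hi in buckets])
    return np.array(rows)


# Synthetic illustration only: Gaussian "peaks" on a descending ppm axis,
# deliberately mis-referenced by +0.02 ppm.
ppm = np.linspace(10.0, -0.5, 21001)


def peak(center, area, width=0.005):
    """Gaussian line of the given area centered at `center` ppm."""
    return area * np.exp(-0.5 * ((ppm - center) / width) ** 2) / (width * np.sqrt(2 * np.pi))


spectrum = peak(0.02, 1.0) + peak(5.45, 2.0)  # "TMSP" plus one sample peak
# Second spectrum: the same signals at 3x overall intensity (e.g. a different
# receiver gain); dividing by the TMSP integral cancels this global factor.
X = build_data_matrix([(ppm, spectrum), (ppm, 3.0 * spectrum)], buckets=[(5.38, 5.48)])
print(X)
```

Both rows of `X` come out at about 2.0, because the global intensity factor divides out against the TMSP integral of the same spectrum. In practice the bucket boundaries would come from inspecting the overlaid spectra, as in step 6, rather than from a fixed list.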

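Eq 1 and step 9 can be sketched in the same way. The version below is not the PCA program used in the course. It autoscales the data matrix (mean centering, then unit-variance weighting), takes the principal components from a singular value decomposition, and forms the scores as the loading-weighted sums of Eq 1, with d taken to be the autoscaled integrals. Two further functions are hypothetical helpers. `assign_unknown` applies a nearest-centroid rule in score space as a stand-in for judging cluster membership from the PC 1 versus PC 2 scores plot. `direction_along_pc1` reads the PC 1 loadings relative to one cluster's side of that plot. The metabolite column names and all numbers are invented for illustration.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column, then divide it by its standard deviation
    (mean centering plus unit-variance weighting, as in step 9)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(X, n_components=2):
    """PCA by singular value decomposition of the autoscaled n x m matrix Z.

    loadings[i, k] plays the role of l_ik in Eq 1, so that
    scores[j, k] = sum over i of loadings[i, k] * Z[j, i].
    Also returns the percent of total variance carried by each component."""
    Z = autoscale(X)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T          # m x n_components
    scores = Z @ loadings                   # n x n_components, Eq 1
    percent = 100.0 * s**2 / np.sum(s**2)   # non-increasing: PC 1 >= PC 2 >= ...
    return scores, loadings, percent[:n_components]


def assign_unknown(scores, labels, unknown_index):
    """Hypothetical nearest-centroid rule: return the label of the known
    cluster whose centroid in score space is closest to the unknown sample."""
    labels = np.asarray(labels)
    known = np.arange(len(labels)) != unknown_index
    distances = {
        group: np.linalg.norm(scores[unknown_index]
                              - scores[known & (labels == group)].mean(axis=0))
        for group in np.unique(labels[known])
    }
    return min(distances, key=distances.get)


def direction_along_pc1(scores, loadings, labels, group, columns):
    """Read PC 1 loadings relative to one cluster: a variable whose loading has
    the same sign as the cluster's mean PC 1 score tends to be higher in that
    cluster (meaningful only when the clusters separate along PC 1)."""
    labels = np.asarray(labels)
    side = np.sign(scores[labels == group, 0].mean())
    return {c: "higher" if side * l > 0 else "lower"
            for c, l in zip(columns, loadings[:, 0])}


# Invented data: bucket integrals for 5 "arabica", 4 "robusta", and one
# arabica-like "unknown" sample. The means were chosen only to mirror the
# direction of the trend reported in the Results; they are not measurements.
rng = np.random.default_rng(0)
columns = ["caffeine", "sucrose", "trigonelline", "CQA"]
arabica_mean = np.array([0.8, 1.5, 1.3, 0.8])
robusta_mean = np.array([1.6, 0.9, 0.8, 1.4])
X = np.vstack([
    arabica_mean + rng.normal(0.0, 0.05, size=(5, 4)),
    robusta_mean + rng.normal(0.0, 0.05, size=(4, 4)),
    arabica_mean + rng.normal(0.0, 0.05, size=(1, 4)),
])
labels = ["arabica"] * 5 + ["robusta"] * 4 + ["unknown"]

scores, loadings, percent = pca(X)
print("percent variance in PC 1, PC 2:", np.round(percent, 1))
print("unknown assigned to:", assign_unknown(scores, labels, unknown_index=9))
print(direction_along_pc1(scores, loadings, labels, "arabica", columns))
```

An SVD fixes each principal component only up to its sign, so a PC 1 axis computed this way may be mirrored relative to a published scores plot. The cluster-relative reading in `direction_along_pc1` is unaffected by such a flip, because the scores and loadings change sign together.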

HAZARDS

Water extracts of coffee beans will initially be hot and can burn.

RESULTS

The downfield half of the 1H NMR spectrum of water extracts of unroasted coffee beans is dominated by just six species: caffeine, sucrose, trigonelline, and three isomers of caffeoylquinic acid (CQA). All unroasted coffee extracts, of both the arabica and the robusta varieties, contain all of these metabolites, but in subtly different, though statistically distinguishable, relative amounts (Figure 1). The 1H and 13C NMR spectra of water extracts of unroasted coffee beans have been rigorously assigned using COSY and HSQC by Wei et al.15 However, stacking of aromatic metabolites, particularly caffeine and the aromatic rings of the CQAs, causes variations in chemical shifts due to ring current effects,16 and this complicates the comparison of 1H spectra from different samples. Nonetheless, integral-bucket regions can be defined on the downfield half of the overlaid population of spectra such that each bucket represents the relative concentration of one metabolite or, in the case of the CQAs, the combined concentration of the three CQA isomers (Figure 1). Thus, no specialized software for bucket integration is needed.

The arabica and robusta subpopulations appear as discrete clusters in the PC 1 versus PC 2 scores plots (Figure 2, Figure S3, and Instructors’ Note 8). The discrimination between the arabica and robusta sample clusters lies for the most part along the PC 1 coordinate (Figure 2 and Table 1). This allows the PC 1 versus PC 2 loadings plot to be read directly: statistically, hot water extracts of arabica samples have higher concentrations of sucrose and trigonelline, and lower concentrations of caffeine and CQAs, than those of robusta samples (Figure 3). By contrast, GABA (γ-aminobutyric acid) does not appear to differ much statistically between the two groups (Instructors’ Note 5).

Table 1. Coffee Sample Key for Figure 2

Sample   Country of Origin   Coffee Variety
A11      Tanzania            Arabica
A12      Ethiopia            Arabica
A13      Guatemala           Arabica
A14      Brazil              Arabica
A15      Mexico              Arabica
R10      Mexico              Robusta
R11      Vietnam             Robusta (1)
R13      Vietnam             Robusta (2)
R14      Philippines         Robusta
R15      (a)                 Robusta

(a) Country of origin unknown.

Figure 2. PC 2 vs PC 1 scores plot from a PCA calculation on 400 MHz NMR integral data from the spectra shown in Figure 1. Integral data from two experiments done on the same coffee bean sample set, but performed on different days, were combined in one PCA calculation. (See Table 1 for sample key.) The “b” label indicates samples run in the second experiment. The integrals were calibrated relative to the TMSP methyl peak integral, which was assigned a value of 1.00 in each spectrum. Percentages on the axes indicate the sample population variance in the corresponding PC.

Figure 3. PC 2 vs PC 1 loadings plot from the PCA calculation described in Figure 2.

DISCUSSION

This experiment was originally developed and used in a course in advanced analytical chemistry taught at Pomona College during the Fall semester of 2013. Subsequently, the author repeated the experiment several times without student involvement: first at California State University Bakersfield, to determine whether the results could be reproduced using a different sample set, and then at Eckerd College, to confirm that the experiment could be performed on a 300 MHz instrument (Instructors’ Note 9).

The Pomona course enrolled 22 students, all upper-division undergraduate chemistry or biochemistry majors. The laboratory sections of the course met twice a week for 3 h each. The students organized themselves into “project teams” for the course, with four or five students in each team (Instructors’ Note 10).17 Each team rotated through the course experiment schedule independently, so that only one team was doing the coffee bean NMR metabolomics experiment in any given week. At the beginning of the first laboratory session of the experiment, the instructor gave a 20 min overview to all members of the project team (Instructors’ Note 11). Subsequently, at each step in the experiment the instructor demonstrated any novel procedures on the first sample, and the students then performed the experiment on the remaining nine samples independently (Instructors’ Note 12). All five student teams succeeded in producing PC 1 versus PC 2 scores plots showing resolved clustering of arabica and robusta samples, and correctly identified the “unknown” sample’s membership (Figure 2 and Instructors’ Note 13).

An excellent paper by Wei et al. describes the use of NMR-based metabolomics to distinguish between arabica and robusta samples of unroasted coffee beans.18 Their method began sample preparation with frozen beans, used both 13C and 1H spectra taken on a 500 MHz instrument, employed specialized software for rigorous bucket integration of the spectra, and invoked advanced statistical methods such as orthogonal projections to latent structures discriminant analysis (OPLS-DA). These are cutting-edge techniques within NMR-based metabolomics, and they allow not only discrimination between coffee bean types but, to a degree, determination of the place of origin as well. However, the methods described by Wei et al. are too time-consuming, far too expensive, and possibly too mathematically sophisticated to be used in an undergraduate laboratory course curriculum.

In contrast, the experiment described in this paper was designed to be used as part of an upper-level undergraduate course in analytical chemistry or instrumental analysis. The experiment uses only 1H spectra taken at 300 or 400 MHz. This significantly cuts down the instrument time required, and makes the experiment potentially available at schools without access to higher-field instruments. The experiment can easily be performed in two 3 h laboratory periods. A population of unroasted coffee bean samples adequate to supply the experiment for several years can be collected for less than $100. The only piece of specialized equipment required, an electric coffee bean grinder, can be purchased for $18.
All other pieces of equipment needed, including the statistics software, are generally available at any school with a 300 or 400 MHz NMR instrument. PCA, which undergraduates often encounter in their biology and social science courses, is readily explained to upper-level chemistry students. Finally, since most chemistry students drink coffee, an experiment examining the chemical composition of coffee beans particularly engages their imagination.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available on the ACS Publications website at DOI: 10.1021/acs.jchemed.6b00559.

Table listing recent metabolomics papers published in ACS journals, tables of coffee bean samples and vendor sources, detailed experiment procedure, student laboratory handout, typical arabica and robusta NMR spectra at 400 and 300 MHz, typical instrument parameter sets, and instructors’ notes (PDF, DOCX)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Peter Olaf Sandusky: 0000-0002-9514-241X

Notes

The author declares no competing financial interest.

ACKNOWLEDGMENTS

The author thanks the Departments of Chemistry at Pomona College (Claremont, CA) and California State University Bakersfield (Bakersfield, CA) for funds used to support the development of this experiment. The author thanks David Grove of the Eckerd College Department of Chemistry (St. Petersburg, FL) for the use of the department’s NMR instrument during the development of this experiment.

REFERENCES

(1) Howery, D. G.; Hirsch, R. F. Chemometrics in the Chemistry Curriculum. J. Chem. Educ. 1983, 60 (8), 656−659.
(2) Chau, F. T.; Chung, W. H. Using Matlab to Assist Undergraduates in Learning Chemometrics. J. Chem. Educ. 1995, 72 (4), A84−A85.
(3) Ribone, M. É.; Pagani, A. P.; Olivieri, A. C.; Goicoechea, H. C. Determination of the Active Principal in a Syrup by Spectrophotometry and Principal Component Regression Analysis: An Advanced Undergraduate Experiment Involving Chemometrics. J. Chem. Educ. 2000, 77 (10), 1330−1333.
(4) Cazar, R. A. An Exercise on Chemometrics for a Quantitative Analysis Course. J. Chem. Educ. 2003, 80 (9), 1026−1029.
(5) Wanke, R.; Stauffer, J. An Advanced Undergraduate Chemistry Laboratory Experiment Exploring NIR Spectroscopy and Chemometrics. J. Chem. Educ. 2007, 84 (7), 1171−1173.
(6) Gilbert, M. K.; Luttrell, R. D.; Stout, D.; Vogt, F. Introducing Chemometrics to the Analytical Curriculum: Combining Theory and Lab Experience. J. Chem. Educ. 2008, 85 (1), 135−137.
(7) Pierce, K. M.; Schale, S. P.; Le, T. M.; Larson, J. C. An Advanced Analytical Chemistry Experiment Using Gas Chromatography-Mass Spectrometry, MATLAB, and Chemometrics To Predict Biodiesel Blend Percent Composition. J. Chem. Educ. 2011, 88 (6), 806−810.
(8) Pezzolo, A. D. L. To See the World in a Grain of Sand: Recognizing the Origin of Sand Specimens by Diffuse Reflectance Infrared Fourier Transform Spectroscopy and Multivariate Exploratory Data Analysis. J. Chem. Educ. 2011, 88 (9), 1304−1308.
(9) de Oliveira, R. R.; das Neves, L. S.; de Lima, K. M. G. Experimental Design, Near-Infrared Spectroscopy, and Multivariate Calibration: An Advanced Project in a Chemometrics Course. J. Chem. Educ. 2012, 89 (12), 1566−1571.
(10) Stitzel, S. E.; Sours, R. E. High-Performance Liquid Chromatography Analysis of Single-Origin Chocolates for Methylxanthine Composition and Provenance Determination. J. Chem. Educ. 2013, 90 (9), 1227−1230.
(11) Petracco, M. Our Everyday Cup of Coffee: The Chemistry Behind Its Magic. J. Chem. Educ. 2005, 82 (8), 1161−1167.
(12) Coleman, W. F. The Chemistry of Coffee. J. Chem. Educ. 2005, 82 (8), 1167.
(13) Basilevsky, A. Applied Matrix Algebra in the Statistical Sciences; Dover Publications, Inc.: Mineola, NY, 2005; pp 248−264.
(14) Miller, J. N.; Miller, J. C. Statistics and Chemometrics for Analytical Chemistry, 4th ed.; Pearson-Prentice Hall: New York, 2000; pp 217−221.
(15) Wei, F.; Furihata, K.; Hu, F.; Miyakawa, T.; Tanokura, M. Complex Mixture Analysis of Organic Compounds in Green Coffee Bean Extract by Two-Dimensional NMR Spectroscopy. Magn. Reson. Chem. 2010, 48, 857−865.
(16) D’Amelio, N.; Fontanive, L.; Uggeri, F.; Suggi-Liverani, F.; Navarini, L. NMR Reinvestigation of the Caffeine−Chlorogenate Complex in Aqueous Solution and in Coffee Brews. Food Biophysics 2009, 4, 321−330.
(17) Walters, P. J. Role-Playing in Analytical Chemistry Laboratories: Part I. Anal. Chem. 1991, 63 (20), 977A−985A.
(18) Wei, F.; Furihata, K.; Koda, M.; Hu, F.; Kato, R.; Miyakawa, T.; Tanokura, M. 13C NMR-Based Metabolomics for the Classification of Green Coffee Beans According to Variety and Origin. J. Agric. Food Chem. 2012, 60, 10118−10125.