Letter Cite This: J. Chem. Educ. XXXX, XXX, XXX−XXX

pubs.acs.org/jchemeduc

Reply to “Comment on ‘Identification of Edible Oils by Principal Component Analysis of ¹H NMR Spectra’”

David Rovnyak and Timothy G. Strein*


Chemistry Department, Bucknell University, Lewisburg, Pennsylvania 17837, United States

ABSTRACT: Additional student-acquired untargeted PCA data are presented, and potential variations on an NMR-based oil classification laboratory are discussed. While unambiguous sample identification is easily achieved without feature selection, the use of published peak lists is a viable alternative approach to variable reduction prior to PCA. Individual instructors can evaluate whether an unbiased or a targeted approach best fits their pedagogical goals.

KEYWORDS: Upper-Division Undergraduate, NMR Spectroscopy, Analytical Chemistry, Laboratory Instruction, Chemometrics

We are grateful for Tai-Sheng Yeh’s interest¹ in our work,² and we find the application of Vigli and co-workers’ method³ to be a potentially useful extension of the untargeted PCA student laboratory exercise that we reported.² Depending on pedagogical goals and available laboratory time, instructors might wish to consider either or both approaches to the analysis, or even other variations.

Regarding the performance of the clustering we reported, two points should be made. The first is that our work complied with the Journal policy of including only student-acquired data and student analysis. In the spirit of demonstrating the ruggedness of the untargeted PCA approach, extensive student data that included a misreferenced spectrum (canola oil) were purposefully chosen for inclusion in our initial report. Although the canola outlier somewhat degraded the clustering, correct unknown identification still resulted. With corrected referencing, the resulting PCA clustering is comparable to that obtained by Yeh’s peak-picking approach, wherein he may have also corrected the referencing of the student outlier in order to use feature selection.¹ Readers may examine Figure S7 (student data with faculty-corrected referencing) of Anderson et al.’s paper.²

Second, to clarify the efficacy of our published technique,² we include here (Figure 1) additional examples of student-acquired and student-analyzed data using the reported untargeted methodology, which achieved very good classification. Student work can sometimes lead to unanticipated results; for example, one student group in 2017 had an extreme outlier (Figure 2). While it is not always viable to have students redo an entire experiment when an issue is discovered, outliers like that seen in Figure 2a are wonderful learning opportunities. Notice in Figure 2a that PC1 is entirely concerned with the outlier, while PC2 has accomplished the majority of the clustering.
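The behavior described for Figure 2a, where a single extreme sample takes over PC1 while the later components carry the class structure, can be reproduced on synthetic data. The following is a minimal sketch under stated assumptions and is not the authors' data or the MetaboAnalyst implementation: the three classes "A", "B", and "C", the 30-bin spectra, the noise level, and the size of the artifact are all hypothetical, and PCA is done by mean-centering followed by an SVD in NumPy with no additional scaling.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

N_BINS = 30
# Hypothetical class "fingerprints": each class has signal in its own bins.
CLASS_BINS = {"A": slice(0, 5), "B": slice(10, 15), "C": slice(20, 25)}


def make_spectrum(label):
    """Return one noisy synthetic binned spectrum for the given class."""
    spectrum = np.full(N_BINS, 0.1)
    spectrum[CLASS_BINS[label]] += 1.0
    return spectrum + rng.normal(0.0, 0.02, N_BINS)


def pca(X, n_components=3):
    """Mean-center X and return (scores, explained-variance ratios) via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    ratios = s**2 / np.sum(s**2)
    return scores, ratios[:n_components]


labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
X = np.array([make_spectrum(label) for label in labels])
X[0, 25:30] += 10.0                     # one known sample carries a large artifact
X = np.vstack([X, make_spectrum("B")])  # last row: the "unknown" (truly class B)

scores, ratios = pca(X)

# PC1 is dominated by the single outlier: it has the largest |PC1 score|.
outlier = int(np.argmax(np.abs(scores[:, 0])))

# Classify the unknown in the PC2-PC3 subspace by nearest class centroid,
# leaving the flagged outlier out of the centroid calculation.
sub = scores[:, 1:3]
centroids = {}
for cls in "ABC":
    rows = [i for i, lab in enumerate(labels) if lab == cls and i != outlier]
    centroids[cls] = sub[rows].mean(axis=0)
predicted = min(centroids, key=lambda c: np.linalg.norm(sub[-1] - centroids[c]))

print(f"outlier row: {outlier}, PC1 variance share: {ratios[0]:.2f}")
print(f"unknown assigned to class {predicted}")
```

In this toy setup the nearest-centroid step stands in for the visual inspection of a scores plot that students would perform; it is a convenience for the sketch, not part of the reported exercise.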
In that case, only PC2 and PC3 need to be plotted to classify the groups successfully.

The MetaboAnalyst platform⁴ provides an environment where instructors could lead students through an array of multivariate techniques. Adaptations of this experiment, including using the published NMR peak lists for classifying selected oils that Vigli et al. reported³ (as suggested by Yeh¹), concentration profiling, analysis of variance for defined variables, and many more, could certainly be productively pursued.

One consideration when using data only from selected peaks is the possibility of missing variance in spectral regions that were not preselected as features. Our untargeted approach with full spectral data avoids this potential pitfall. While the use of a priori peak lists is a viable approach, we find it attractive to have students use untargeted PCA results in combination with the spectra to discover the features for themselves. In our report,² we included a discussion of some of the features that emerged from the unbiased PCA, and not surprisingly, these features are very similar to the features that were used by Yeh¹ and reported by Vigli et al.³

We wish to reiterate that our work focused not on optimizing clustering, which has been ably demonstrated in the research literature, but on an efficient and rugged hands-on experiment that provides students with an authentic, entry-level learning experience in multivariate analysis that could be completed in one laboratory period, has assessable learning goals, and accurately represents the initial steps in unbiased variable reduction and data clustering.² In order to model for students the general approach taken in research, we purposefully chose an untargeted analysis. The subsequent grouping, filtering, and scaling within MetaboAnalyst⁴ lead to excellent oil classification, and the efficient workflow is important in enabling students to conduct the work within the time constraints of the laboratory period.

With respect to grouping, Yeh has provided an excellent overview of the issues of binning spectral data and of grouping data.¹ Indeed, grouping is a useful approximation in the described exercise, where the use of a 0.005 ppm digital resolution in the original data supports the subsequent grouping.
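As a rough illustration of what a 0.005 ppm digital resolution means in practice, the sketch below sums a finely sampled spectrum into 0.005 ppm bins with NumPy. It is not the authors' processing workflow or MetaboAnalyst's code: the function name `bin_spectrum`, the 0 to 10 ppm window, and the single synthetic Lorentzian line at 5.30 ppm are assumptions made for illustration only.

```python
import numpy as np


def bin_spectrum(ppm, intensity, bin_width=0.005, ppm_min=0.0, ppm_max=10.0):
    """Sum intensities into equal-width chemical-shift bins.

    Returns (bin_centers, binned_intensity). Points outside
    [ppm_min, ppm_max] are ignored.
    """
    n_bins = int(round((ppm_max - ppm_min) / bin_width))
    edges = np.linspace(ppm_min, ppm_max, n_bins + 1)
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, binned


# Hypothetical spectrum: one Lorentzian line at 5.30 ppm on a fine ppm axis.
ppm = np.linspace(0.0, 10.0, 20001)   # 0.0005 ppm per point
half_width = 0.01                     # ppm, illustrative only
intensity = half_width**2 / ((ppm - 5.30) ** 2 + half_width**2)

centers, binned = bin_spectrum(ppm, intensity)

# Each row of a PCA data matrix would be one such binned spectrum.
peak_center = centers[np.argmax(binned)]
print(len(binned), "bins; largest bin centered near", round(float(peak_center), 4), "ppm")
```

Summing within bins keeps the total intensity of the spectrum unchanged while fixing the number of variables per sample, so every spectrum contributes a row of the same length to the data matrix.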
The grouping is constrained by the digital resolution (i.e., binning) of the original data, and a part of our work was to optimize the selection of the bin sizes to best work with the grouping in MetaboAnalyst.²

We find Tai-Sheng Yeh’s approach¹ to be a useful extension of the original experiment, potentially as a postlaboratory exercise or as a second laboratory period in which students use their own unbiased PCA results to conduct variable reduction in order to optimize classification. Our preference is not to give students the features at the outset, but rather to have them experience the process of using unbiased methods to discover them. We recognize and support that other instructors may have different goals.

We thank Yeh for the contribution¹ and the editors for the opportunity to discuss this activity further. We hope this discussion stimulates additional interest in the adoption and expansion of multivariate analyses in the undergraduate chemistry laboratory.

Figure 1. Additional examples of representative student data, acquired in recent semesters, using the untargeted PCA laboratory exercise reported in Anderson et al.² As the contrast between panels (a) and (b) shows, year-to-year differences in student work lead to more or less intragroup variation, but clustering still leads to unambiguous classification of the unknowns without the need for feature selection.

Figure 2. Unpredictability of student work in teaching laboratories can lead to unanticipated extreme outliers, such as in panel (a), but these should be used as teachable moments. In this example, where the unknown was sesame oil, students see that PC1 accounts solely for the outlier, while PC2 nearly fully classifies these samples, and the PC2−PC3 subspace plotted in panel (b) provides correct classification and unknown identification consistent with typical results (e.g., Figure 1).

Received: June 14, 2019
Revised: June 28, 2019

DOI: 10.1021/acs.jchemed.9b00557

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

David Rovnyak: 0000-0003-0328-5083
Timothy G. Strein: 0000-0002-3747-642X

Notes

The authors declare no competing financial interest.


REFERENCES

(1) Yeh, T.-S. Comment on “Identification of Edible Oils by Principal Component Analysis of ¹H NMR Spectra”. J. Chem. Educ. 2019, DOI: 10.1021/acs.jchemed.9b00133.

(2) Anderson, S. L.; Rovnyak, D.; Strein, T. G. Identification of Edible Oils by Principal Component Analysis of ¹H NMR Spectra. J. Chem. Educ. 2017, 94, 1377−1382.

(3) Vigli, G.; Philippidis, A.; Spyros, A.; Dais, P. Classification of Edible Oils by Employing ³¹P and ¹H NMR Spectroscopy in Combination with Multivariate Statistical Analysis. A Proposal for the Detection of Seed Oil Adulteration in Virgin Olive Oils. J. Agric. Food Chem. 2003, 51, 5715−5722.

(4) MetaboAnalyst. https://www.metaboanalyst.ca (accessed Jun 26, 2019).
