Activity pubs.acs.org/jchemeduc
Making Data Management Accessible in the Undergraduate Chemistry Curriculum
Barbara A. Reisner,†,* K.T.L. Vaughan,‡ and Yasmeen L. Shorish‡
†Department of Chemistry and Biochemistry, James Madison University, Harrisonburg, Virginia 22807, United States
‡Libraries and Educational Technologies, James Madison University, Harrisonburg, Virginia 22807, United States
Supporting Information
ABSTRACT: In the age of “big data” science, data management is becoming a key information literacy skill for chemistry professionals. To introduce this skill in the undergraduate chemistry major, an activity has been developed to familiarize undergraduates with data management. In this activity, students rename and organize cards that represent “data files” associated with experiments they completed in a previous course. This activity reveals differences between the way novices (students) and experts (librarians and faculty) organize data and highlights the need for introducing students to best practices in data management.
KEYWORDS: Second-Year Undergraduate, Upper-Division Undergraduate, Cheminformatics, Hands-On Learning

With the advent of “big data” science and the need to preserve data for long-term storage, data management has become an important topic in the research laboratory. In 2011, Science magazine published a special issue on the challenges associated with the rapid explosion of data (1). Data management makes data more accessible and easier to find and share. Federal funding agencies increasingly require researchers to develop data management plans, with a focus on curating data for reuse and sharing data with other scientists (2−5). Data management is clearly an issue that future researchers need to address, and good practices should become a standard part of professional development. As we strive to provide our students with the highest quality scientific preparation, it is important to address this topic in the undergraduate chemistry curriculum.

Currently, little is being done to introduce data management into the undergraduate chemistry classroom. The library and information science literature focuses on data management training for principal investigators, informationists, data curators, and librarians. Much of the work in data information literacy instruction has focused on graduate student research groups, which have different needs and levels of expertise than undergraduate students. The ACS Division of Chemical Information (CINF) and the Special Libraries Association (SLA) identify familiarity with reference management software as one of the competencies for chemistry undergraduates, but they make no statements about the management of laboratory data (6). The Association of College and Research Libraries (ACRL) information literacy standards for science and engineering/technology acknowledge that information, including data, may exist in many disparate forms and that specialized knowledge and data management expertise may be needed to access this information (7). The ACS Committee on Professional Training (CPT) Guidelines (8) do not mention data management, although the proposed changes to the ACS Guidelines for Bachelor’s Degree Programs identify the introduction of data management as a new skill (9). We found a single article in the chemistry education literature that discusses data management by using cloud computing (10).

At James Madison University (JMU), we have incorporated data management instruction into a one-credit Literature & Seminar course for chemistry majors. This required course is usually taken in the fall semester of the junior year. It meets for 1 h each week and is cotaught by a chemist (BAR) and a librarian (YLS or KTLV). One of the goals of this course is to promote chemical information literacy by introducing students to information resources used by professional chemists. We work together as an instructional team to develop and evaluate content and assignments for a wide range of topics in this area. For the past two years, we have devoted one class to the critical skill of data management. Our goals for this class session include understanding why data management is important and learning about strategies to make data accessible and easy to find.

Published: August 21, 2014
© 2014 American Chemical Society and Division of Chemical Education, Inc.
dx.doi.org/10.1021/ed500099h | J. Chem. Educ. 2014, 91, 1943−1946
Journal of Chemical Education
■ IN-CLASS ACTIVITY ON DATA MANAGEMENT
The first year we introduced data management in this course, we talked about the federal mandates for data management, introduced students to resources on data management selected by the librarian (11), and had students organize their files on a computer. We felt that the students did not take much away from the activity because it was an abstract exercise that bordered on busy-work, especially for those few students who already employed a reasonable organization scheme. Students rushed through the exercise and worked independently, which limited their ability to assess different management practices and resulted in a very shallow view of data management.

For the second year, we developed an in-class activity to help students improve their data accessibility in a relevant chemical context. In this activity, students renamed and organized a series of files to improve their ability to find and identify appropriate files five years in the future. We also devoted part of the class to discussing changing storage media, archiving data, and file backup methods. Although these skills may be common sense to the readers of this Journal, we found through class discussion that most of the students had not thought about these ideas and how they relate to their work. We briefly discussed metadata in the context of the activity. A larger discussion of data management, long-term preservation, and reuse by unanticipated audiences did not occur in this class due to time limitations. The intent of the activity was to expose students to the concept of file management and how it relates to the laboratory environment. Instructions were intentionally minimal to allow students to explore the concept as natively as possible.

The “files” used in this activity were data or documents that students could have generated in three experiments in the sophomore lab sequence: dehydration of an alcohol, the synthesis of ferrocene and acetylferrocene, and a structured research project (12). Because the files came from previous coursework, all students had a pre-existing understanding of the data and documents and of how they fit together in the context of the course. This activity required only paper, glue sticks, and markers and was not graded.

Students were given a series of poorly named, unorganized “data files.” These were cards that each carried an ambiguous or nondescript filename and a description of the contents of the file (Table 1 and Supporting Information). The students were instructed to rename the files and organize them into “file folders.” To create folders, students glued the renamed cards to a 2′ × 3′ piece of paper, using a marker to illustrate the hierarchical order of arrangement (folders and subfolders). Students worked in six groups of 4−5 each and were given 15 min to complete the task. After 15 min, a pair of students from each group was asked to present their new filenames and folder architecture to the class. Afterward, the instructors presented their version, discussed interesting differences and common points, and provided advice on good practices for file naming and folder architecture.
Folder Architecture
The students and instructors had similar ideas about segregating data at the global level. Every group recognized that there were three different projects and immediately created three corresponding folders, one for each laboratory experiment. Interestingly, no group separated the files by data type, even though this is a common way to store data in research groups. In this article, we discuss the data cards associated with the synthesis of ferrocene and acetylferrocene (Table 1).

Within the experiment folders, the students and instructors had dramatically different file organizations. The instructors put all files associated with a particular experiment in a single folder (Figure 1). Students tended to separate data files into many more folders (Figure 2). Specifically, the three student groups created separate folders for each compound. Two of these three groups created folders for each instrumental technique (Supporting Information). In general, the instructors’ method of organizing required fewer clicks to access data but produced folders that were more heterogeneous and held a larger number of files.

Figure 1. File architecture developed by instructors (chemistry faculty and science librarian).

There is no “absolute” structure for file organization; different folder structures may be appropriate for different laboratories. Files should be organized in a way that promotes efficiency in how the files are actually used and that is understandable to someone new to the lab. The instructors’ naming and file strategy allowed all of the files associated with a particular compound to be collocated and easily compared. With a larger number of files, it might be advantageous to produce additional folders at the compound level, that is, one folder for acetylferrocene and one for ferrocene. The students’ method was more granular but would not allow for a quick comparison of file details, such as “last accessed date”, across a compound.
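The trade-off described above (fewer clicks versus fuller, more heterogeneous folders) can be illustrated with a minimal Python sketch. Every folder name, filename, and extension below is hypothetical and chosen only for illustration; these are not the layouts the instructors or students actually produced.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical layouts: all names and extensions are illustrative assumptions.
SINGLE_FOLDER = [
    "ferrocene_synthesis/ferrocene_1H-NMR_crude.csv",
    "ferrocene_synthesis/ferrocene_1H-NMR_purified.csv",
    "ferrocene_synthesis/acetylferrocene_1H-NMR_crude.csv",
    "ferrocene_synthesis/acetylferrocene_FT-IR_KBr-pellet.csv",
]
NESTED_FOLDERS = [
    "ferrocene_synthesis/ferrocene/NMR/crude.csv",
    "ferrocene_synthesis/ferrocene/NMR/purified.csv",
    "ferrocene_synthesis/acetylferrocene/NMR/crude.csv",
    "ferrocene_synthesis/acetylferrocene/IR/KBr-pellet.csv",
]


def folder_depth(path):
    """Number of folders that must be opened to reach the file."""
    return len(PurePosixPath(path).parts) - 1


def summarize(paths):
    """Return (deepest folder depth, largest number of files in one folder)."""
    per_folder = Counter(str(PurePosixPath(p).parent) for p in paths)
    return max(folder_depth(p) for p in paths), max(per_folder.values())


print(summarize(SINGLE_FOLDER))   # (1, 4): one click, four files side by side
print(summarize(NESTED_FOLDERS))  # (3, 2): three clicks, at most two files per folder
```

Neither result is "better" in itself; the numbers only make the click count and folder size explicit so that a lab can weigh them against how its files are actually used.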
Table 1. File Names and Descriptions for Files Associated with the Synthesis of Ferrocene and Acetylferrocene

Filename                   File Description
first NMR                  ¹H NMR of crude ferrocene
pure NMR                   ¹H NMR of purified ferrocene
crude acetyl NMR           ¹H NMR of crude acetylferrocene
pure acetyl NMR            ¹H NMR of purified acetylferrocene
carbon purified acetyl     ¹³C NMR of purified acetylferrocene product
carbon purified product    ¹³C NMR of purified ferrocene product
IR ferrocene               FT-IR data of purified ferrocene collected with ATR-FTIR
IR acetyl                  FT-IR data of purified acetylferrocene collected using a KBr pellet
filenames.xlsx             list of what’s contained in each of your data files for the ferrocene lab so you can reference them in the future
File Naming Conventions
When naming files, the instructors’ format served to (i) identify the compound, (ii) identify the instrumental technique used, (iii) note any special conditions, and (iv) make sure the file could be found by search. To meet these criteria, the instructors’ filenames were relatively long, with an average of 20 characters per filename excluding the extension. However, the type of data in each file was clearly identifiable, and a separate spreadsheet was not needed to explain what was in each file.

In general, students improved the names of the files. The groups created a standardized and consistent format for naming their files. Some, but not all, groups avoided spaces and special characters. Some groups used full chemical names; others used abbreviations. One group recognized the importance of keeping metadata about the naming practices used by keeping the filenames.xlsx file (Figure 2). Although there is no single way to name chemical data files, full chemical names could be used to disambiguate the different abbreviations used by the community. In cases where this is impractical, an associated text file detailing the abbreviation scheme is necessary.

Figure 2. Example of a file architecture developed by chemistry students.

The biggest weaknesses in the student filenames were that the filenames were not unique across folders and that they incompletely specified what was in the file. Because two groups used folders to organize their data by instrumental technique, they did not include the technique in the filenames. The third group organized their folders by compound and, likewise, did not include the compound name in the filename. All of the groups had multiple files with the same name in different folders. All of the information was present when the folder path and the filename were taken together, but the contents of an individual file could not be identified out of context, for example, in a list of search results.

Another weakness in student filenames was how students chose to start their filenames. One group started all of their filenames with the date, whereas another group used initials. Though these can be important elements, neither the date nor the person collecting the data was important in this case, so they did not add value. This showed a lack of awareness of the sort function for finding files; this function only works when the most important element of a filename is consistently placed first.
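The naming criteria (i) to (iii) and the uniqueness problem noted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_filename` and `duplicate_basenames`, the underscore and hyphen separators, and the `.csv` extension are hypothetical choices, not the instructors' actual convention.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def build_filename(compound, technique, condition="", extension="csv"):
    """Compose 'compound_technique_condition.ext', most important element first.

    Separators and extension are illustrative assumptions.
    """
    parts = [compound, technique, condition]
    # Replace spaces and special characters inside each part with a hyphen.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p).strip("-") for p in parts if p]
    return "_".join(cleaned) + "." + extension


def duplicate_basenames(paths):
    """Map each filename that occurs in more than one folder to its full paths."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


print(build_filename("acetylferrocene", "13C NMR", "purified"))
print(build_filename("acetylferrocene", "FT-IR", "KBr pellet"))

# Hypothetical student-style layout in which the folder carries part of the meaning.
student_paths = [
    "ferrocene/NMR/crude.csv",
    "acetylferrocene/NMR/crude.csv",
    "ferrocene/IR/purified.csv",
]
print(duplicate_basenames(student_paths))
```

Because the compound comes first in each generated name, an alphabetical sort groups all files for one compound together, and a search hit such as `acetylferrocene_13C-NMR_purified.csv` remains interpretable without its folder path, whereas `crude.csv` does not.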
■ TAKE-HOME LESSONS ON DATA MANAGEMENT
The activity and ensuing discussion gave us a way to model good file naming practices and discuss some of the issues that chemists should consider as we manage our data. Because many of our students participate in undergraduate research, we also had a chance to discuss best practices for naming files in the research laboratory. In this case, we discussed practices in an instructor’s research lab (BAR), how she changed her file naming protocols after working with a data management expert, and the positive impact that this change has had on her group’s ability to find and use data. We also stressed the importance of understanding and conforming to a research group’s file naming and organizational methods.

Through the activity and conversation, students were introduced to several important ideas of file naming:
1. Avoid abbreviations in filenames.
2. Fully identify the contents of the file in the filename so you can access your files through search.
3. Think about how you will sort data to find files. Is it important to organize by the compound, technique, date, person collecting the data, and so on?
4. Avoid special characters and spaces.
5. When using dates, use international standards to improve sortability (YYYYMMDD, ISO 8601).
6. Different organizational methods may be required by different projects.
7. When sharing data, make sure everyone understands the naming conventions used and include any additional files (e.g., readme.txt) needed to make sense of the data.
8. Think about organization before you begin accumulating files. Developing good file names and structures at the start will make it easier to find and archive files. Reassess your organizational structure regularly to maintain access to files.

The larger landscape of data management and curation and how it relates to the research lifecycle was not addressed in this exercise. If class time permits, we would like to incorporate some of the education modules from DataONE (13) into a class session prior to the activity to provide the students with more context and a big-picture view of the topic. Regardless, we believe that addressing data management with undergraduates can only help students improve their file management practices and better prepare them to organize the data and files that they will accumulate in their professional careers.
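Two of the file naming ideas above, avoiding spaces and special characters and writing dates as YYYYMMDD, lend themselves to a short Python sketch. The allowed character set and the helper names `has_safe_characters` and `iso_basic` are assumptions made for illustration; a research group may reasonably adopt different rules.

```python
import re
from datetime import date

# Assumed "safe" character set: letters, digits, period, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def has_safe_characters(filename):
    """True if the name contains no spaces or other special characters."""
    return SAFE_NAME.fullmatch(filename) is not None


def iso_basic(day):
    """Format a date as YYYYMMDD (ISO 8601 basic format)."""
    return day.strftime("%Y%m%d")


# Hypothetical collection dates, deliberately out of order.
EXAMPLE_DATES = [date(2014, 8, 21), date(2013, 12, 1), date(2014, 1, 15)]

iso_names = sorted(iso_basic(d) for d in EXAMPLE_DATES)
us_names = sorted(d.strftime("%m%d%Y") for d in EXAMPLE_DATES)

print(iso_names)  # ['20131201', '20140115', '20140821'] -> chronological
print(us_names)   # ['01152014', '08212014', '12012013'] -> not chronological
print(has_safe_characters("ferrocene_1H-NMR_crude_20140821.csv"))  # True
print(has_safe_characters("pure acetyl NMR"))                      # False
```

The sorted lists show why the year-first format matters: an ordinary alphabetical sort of YYYYMMDD strings is also a chronological sort, which is not the case when the month comes first.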
In the end-of-semester course evaluations, some students mentioned data management as a take-away skill, but just as many mentioned that the topic did not warrant an entire class period. Students participated in this activity with limited exposure to the concept of data management within the research framework, but the activity was designed with the understanding that most third-year undergraduates are still novice researchers. To be most effective, it is important to balance the sophistication of undergraduates as researchers with the larger research and data stewardship landscape. A systematic delivery method for data information literacy has not yet been established for this institution or the broader chemistry community, but the unique environment that this course provides for the chemistry librarian and the chemistry community could, in the future, help inform wider data management education efforts in the undergraduate community.

■ ASSOCIATED CONTENT
Supporting Information
Description of class structure; cards to be used for the data management activity; instructions for preparing cards; detailed supply list; additional literature/class reading; additional examples of student filenames and file structures. This material is available via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The authors acknowledge the students who have participated in the development of this activity.

■ REFERENCES
(1) Science Staff. Science 2011, 331, 692−693.
(2) Dissemination and Sharing of Research Results. http://www.nsf.gov/bfa/dias/policy/dmp.jsp (accessed Aug 2014).
(3) NIH Data Sharing Policy and Implementation Guidance. http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm (accessed Aug 2014).
(4) Holdren, J. P. Increasing Access to the Results of Federally Funded Scientific Research. http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf (accessed Aug 2014).
(5) Bird, C. L.; Frey, J. G. Chemical information matters: an eResearch perspective on information and data sharing in the chemical sciences. Chem. Soc. Rev. 2013, 42, 6754−6776.
(6) Information Competencies for Chemistry Undergraduates. http://en.wikibooks.org/wiki/Information_Competencies_for_Chemistry_Undergraduates (accessed Aug 2014).
(7) Information Literacy Standards for Science and Engineering/Technology. http://www.ala.org/acrl/standards/infolitscitech (accessed Aug 2014).
(8) Chemical Information Skills. http://www.acs.org/content/dam/acsorg/about/governance/committees/training/acsapproved/degreeprogram/chemical-information-skills.pdf (accessed Aug 2014).
(9) ACS Committee on Professional Training. White Paper: Proposed Changes to the ACS Guidelines and Evaluation Procedures for Bachelor’s Degree Programs (Prepared January 2013). http://www.acs.org/content/dam/acsorg/about/governance/committees/training/guidelines-white-paper.pdf (accessed Aug 2014).
(10) Bennett, J.; Pence, H. E. Managing Laboratory Data Using Cloud Computing as an Organizational Tool. J. Chem. Educ. 2011, 88 (6), 761−763.
(11) Data Management. http://guides.lib.jmu.edu/data (accessed Aug 2014).
(12) Amenta, D. S.; Mosbo, J. A. Attracting the New Generation of Chemistry Majors to Synthetic Chemistry without Using Pheromones: A Research-Based Group Approach to Multistep Syntheses at the College Sophomore Level. J. Chem. Educ. 1994, 71 (8), 661−664.
(13) Education Modules|DataONE. http://www.dataone.org/education-modules (accessed Aug 2014).