Chapter 1

Introduction to Computer-Aided Data Analysis in Chemical Education Research (CADACER)

Downloaded by 80.82.77.83 on December 11, 2017 | http://pubs.acs.org Publication Date (Web): November 20, 2017 | doi: 10.1021/bk-2017-1260.ch001

Tanya Gupta*
Department of Chemistry and Biochemistry, South Dakota State University, Brookings, South Dakota 57007, United States
*E-mail: [email protected].

This chapter is an introduction to computer-aided data analysis. It also provides an overview of the computer-based data analysis tools (R, jMetrik, Apache Drill, ATLAS.ti, D2L) used in the educational research projects that form the chapters of this book. The intent is to inform a diverse group of researchers, novices and experts alike, about the availability and application of computer-based tools for qualitative and quantitative research. This book is valuable for chemical, science, and education researchers alike. The chapters can be read in any order, as the reader may see a connection between a tool portrayed in a given chapter and their own research project. That is where the journey of seeking information about, or exploring, these tools for a specific chemical education or other discipline-based education research project may begin for our reader, and hopefully culminate in applying one or more tools for digging into the data to make sense of numbers and words.

© 2017 American Chemical Society
Gupta; Computer-Aided Data Analysis in Chemical Education Research (CADACER): Advances and Avenues ACS Symposium Series; American Chemical Society: Washington, DC, 2017.

Introduction

Educators began using computers for developing and disseminating curriculum in the 1960s (a collaborative effort involving the University of Illinois and Stanford University, aimed at elementary-level education on a large scale in California- and Mississippi-based schools). The next five decades (1960-2010) revolutionized teaching and learning with the advancement of information and communications technology (ICT) and the development of several new and unique applications, including courseware (course software); reference works such as encyclopedias and dictionaries on compact discs and online; classroom aids such as interactive whiteboards; and alternative methods for assessment and testing, for example Moodle, QuestionMark, and Assessment Master. Educational research also underwent a transformation from being manual to computer based. The advancements in hardware and software focused on educational research can be attributed to advances in computer hardware during the 1990s. The graphic capacity of computers improved, the cost of manufacturing computers fell, and the Internet became popular, spurring the need to quickly harness the vast amount of data generated during this period for various research purposes (1–3).

Prior to these technological advancements, conducting research in any discipline was a tedious process. Every step of research, from planning to data collection and analysis, had to be conducted manually. Tools such as notebooks or journals were used to plan a systematic study, and gathering audio or audio-visual data involved bulky equipment (video cameras and cassette recorders). After collection, the data were cleaned and sorted in physical bins for analytic purposes. Qualitative data analysis involved cutting and pasting segments of data, sorting these segments into bins, and re-organizing them (coding qualitative data) until meaningful categories were generated. For quantitative research, the data had to be organized manually, and calculators were used to conduct statistical calculations. Whether the research methods were qualitative, quantitative, or mixed, the entire process was complicated, tedious, and time consuming.
Recent changes in technology have made it easier for researchers to gather and organize data in ways that allow systematic analysis and encourage researchers to collaborate across geographical locations for collective sense making. For people conducting quantitative research (statistical data), software programs such as SPSS, SAS, R, and Stata are extremely useful. SPSS, SAS, and Stata are commercially available programs that involve a licensing fee, whereas R is open-source software and freely available. Commercial packages in particular include pre-programmed commands that facilitate very basic to advanced forms of statistical analysis. On the other hand, free software such as R gives researchers the flexibility to develop and use their own code to conduct data analysis that is specific to the data and the project. These programs can also generate visualizations that are useful for interpreting data quickly and that make the presentation of research data and findings attractive for the audience. Researchers now have advanced tools at their disposal to engage in the research process.

This book presents tools available to chemical education researchers for engaging in qualitative, quantitative, or mixed methods research. It is not exhaustive with respect to the various software available to facilitate research using any of the above-mentioned methods. However, it is anticipated that this book will spur ideas and provide some examples to the audience for using these tools as per their research needs.

Focus of This Book

Technology, in the form of desktop software and hardware, has become a part of research analysis, as are the underlying concepts and techniques that have been with us for years. Although analysis is sometimes complex, the challenge lies not so much in understanding concepts and techniques as in applying them. The fundamental question of how to put various data analysis resources to use dominates the minds of experienced and novice researchers alike. Many people who delve into designing and conducting research, and gradually enter the realm of literature review, research design, and data analysis, wonder whether software is available that could model some of these research problems. Is such software user friendly and flexible? The software may have potential advantages for specific aspects of research; however, this can only be determined by hearing the experience of people who have put different available software to use in their research projects. The availability of software has opened doors for advanced research that was practically impossible with limited resources and the manual nature of research. The emphasis of this book is on how one might use various computer software or programs in different ways to support the management of the data and the analysis process. Research software is used to help support the research endeavor. The researcher needs to consider the type of data in order to decide whether qualitative or quantitative software will be helpful. The chapters in this book do not focus on data gathering techniques; the reference to such techniques in various chapters is minimal. Prior ACS Symposium Series volumes have provided more information in this area (4, 5).
It is possible to collect qualitative data such as interviews using a computer-aided support system, and to conduct surveys using web-based tools such as QuestionPro or a learning management system such as D2L for pre-post data. Our emphasis is mainly on the type of tools one can use once the data are collected. Different types of software are available for such analyses. Several quantitative packages are available for data analysis and management, and they offer similar capabilities (SPSS, SYSTAT, R, SAS, STATA, MS-Excel with Add-in for advanced analysis). For example, SPSS (the Statistical Package for the Social Sciences) is very popular (6–8). It is important for researchers to learn about the features and capabilities of such software packages. Commercially available packages may come in different versions and have different capabilities. Researchers need to be aware of differences in data handling and analysis procedures among software packages. Sometimes software requires front-end work by the researcher for data organization and data labeling before conducting the analysis. A data set could have missing values or might need labels. For example, SPSS needs users to provide specific labels for gender, institution, and other demographic data before doing analysis. The default outputs of various software depend on the sequence of options selected by the user in the software interface. These outputs may not include the complete analysis sought by the user. Default outputs that can be generated by clicking a few buttons in the software interface may lack additional tests that are useful. An example is the calculation of effect size, which is important in addition to significance tests and p-values for statistical analysis. Awareness of such aspects of any program is essential before adding it to one’s research toolkit.

Researchers engaged in qualitative research have to deal with a range of issues when it comes to choosing a qualitative research software or tool. Qualitative data analysis software includes features that support the process of qualitative research. These include data transcription and transcription analysis, coding and text interpretation, recursive abstraction, content analysis, and discourse analysis. The use of software can save time, increase flexibility, and improve the validity of qualitative research. There are several freeware and commercially available programs for qualitative research (ATLAS.ti, NVivo, QDA-Miner, MAXQDA, HyperResearch, XSight, Focuss On, Dedoose, etc.). Researchers have their own preferences among such packages, depending on accessibility and the features they seek (9, 10). Qualitative software provides proximity to data, helps with triangulation, and keeps the research aligned with the context of the study. When using qualitative software tools, the data need to be converted into a file type that can be uploaded or imported into the software. Some software may allow collecting interview data (audio files) and offer the possibility to transcribe within the software. However, the analysis or coding process requires transcribed data or a file that is ready to be coded. Very few packages make it easy to use raw qualitative data. These raw data could be images, scripts, or other artifacts, yet they need to be in a format that is recognized by the software.
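To make the idea of coding and content analysis concrete, the short Python sketch below tallies researcher-assigned codes across transcript segments. It is only an illustration under stated assumptions: the participant labels, code names, and helper functions are hypothetical and are not taken from any of the packages named above, which offer far richer coding features.

```python
from collections import Counter

# Hypothetical coded segments: each pair is (participant, code), where the
# code was assigned by a researcher while reading a transcript segment.
coded_segments = [
    ("P1", "algorithmic_strategy"),
    ("P1", "conceptual_reasoning"),
    ("P2", "algorithmic_strategy"),
    ("P2", "algorithmic_strategy"),
    ("P3", "conceptual_reasoning"),
    ("P3", "checks_answer"),
]


def code_frequencies(segments):
    """Count how many segments carry each code."""
    return Counter(code for _, code in segments)


def participants_per_code(segments):
    """Map each code to the sorted list of participants who exhibit it."""
    by_code = {}
    for participant, code in segments:
        by_code.setdefault(code, set()).add(participant)
    return {code: sorted(people) for code, people in by_code.items()}


frequencies = code_frequencies(coded_segments)
coverage = participants_per_code(coded_segments)

for code, count in frequencies.most_common():
    print(f"{code}: {count} segment(s), participants {coverage[code]}")
```

A tally like this does not replace interpretation; it only summarizes decisions the researcher has already made about which code applies to which segment.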
Unlike quantitative software, wherein one can press buttons to get default outputs for statistical tests, qualitative software requires the researcher to think about the data by planning and defining research parameters and codes based on the research methods, either before or during the data analysis process (for example, in grounded theory) (11–13).

The next section provides an overview of the organization of this book and a glimpse into what each chapter entails.

Organization of Book

The book has nine chapters. These chapters represent three different categories:

• Using Learning Management Systems (LMS) as analysis tools (14, 15)
• Open source tools for data organization and analysis (16–21) (R and jMetrik, Comprehensive Meta Analysis package - CMA and Apache Drill)
• Commercially available software package for qualitative data analysis (ATLAS.ti) (22, 23)

Learning Management Systems as Analysis Tools

The chapter titled Learning Management System: Education Research in the era of technology by Mehta and Kalyvaki (Chapter 2) provides an introduction to the use of LMSs for data organization and analysis, especially for novice researchers struggling with collecting, storing, and making sense of preliminary data. Mehta and Kalyvaki present a researcher-oriented perspective on the value of LMSs and invite people with access to an LMS to explore it beyond the regular surface applications of course management, content delivery, and grading. For example, the discussion boards in an LMS such as D2L (Desire2Learn) can be used to gain insights into student thinking and help researchers generate pre- and post-assessment data for specific research questions. Likewise, the survey tool in an LMS can be used for both Likert-scale and open-ended surveys.

The chapter by Hedtrich and Graulich extends the information presented by Mehta and Kalyvaki and focuses on extending the capabilities of a Learning Management System through a software application, the LMSA-Kit. The LMSA-Kit developed by Hedtrich and Graulich is a software solution that captures data on electronic learning from data logs in the LMS. Through their chapter on Crossing Boundaries in Electronic Learning (Chapter 3), Hedtrich and Graulich address a gap between the data generated in blended learning and in face-to-face instruction. Much of this potential data, which provides insight into student cognition and course capabilities, is lost due to the limitations of the LMS. The LMSA-Kit addresses this gap by establishing a stronger connection between the instruction components offered through the LMS and face-to-face instruction. The LMSA-Kit extends the abilities of a traditional LMS by incorporating timely feedback for students and by identifying at-risk students early on for instructors.
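As a rough illustration of how LMS log data might be used to spot students with little early activity, consider the Python sketch below. It does not reproduce the LMSA-Kit or any real LMS export format: the log layout, the roster, the function name `flag_low_activity`, and the threshold are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical LMS log rows: (student_id, course_week, event_type).
log_rows = [
    ("s01", 1, "quiz_attempt"),
    ("s01", 1, "content_view"),
    ("s01", 2, "discussion_post"),
    ("s02", 1, "content_view"),
    ("s03", 2, "quiz_attempt"),
    ("s03", 2, "content_view"),
    ("s03", 2, "content_view"),
]

# The roster is needed so that students with no logged events are not missed.
roster = ["s01", "s02", "s03", "s04"]


def flag_low_activity(rows, roster, last_week, min_events):
    """Return students with fewer than min_events logged up to last_week."""
    events = Counter(student for student, week, _ in rows if week <= last_week)
    return [student for student in roster if events[student] < min_events]


flagged = flag_low_activity(log_rows, roster, last_week=2, min_events=3)
print("Students to follow up with:", flagged)
```

A simple event count is a crude proxy for engagement; in practice a researcher would choose indicators and thresholds that fit the course design and validate them against outcomes.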
Chapter 3 also presents the benefits and limitations of using the LMSA-Kit and its future potential as the authors work on improving its current capabilities.

Open Source Tools for Data Organization and Analysis (R and jMetrik, Comprehensive Meta-Analysis Package - CMA and Apache Drill)

The chapters in this category provide examples and applications of open-source programs and packages that are useful tools for data management and analysis. As defined by Elluri in her chapter on Leveraging Open Source Tools for Analytics in Education Research (Chapter 4), open source refers to a program or software in which the source code is available to the general public for use or modification from its original design and is free of charge. Thus, open source software can be modified from its original design to incorporate additional functionality based on the needs of the users. It is an open platform for users to share and contribute ideas for the development of reusable software packages that can be harnessed or improved by others. Elluri discusses the importance of data analytics in educational research for gaining insights into the factors that influence students’ academic performance and conceptual understanding. The chapter begins with an introduction to the data types of interest to education researchers and the procedures usually followed for sorting and analyzing such data through the open-source tools that are available for qualitative and quantitative research. Elluri provides an overview of the analytic processes for such data and of choosing the right software for analysis, and introduces various open source tools available for handling research data (R, Python, Wrangler, Apache Drill, Weka, AQUAD & Data Applied). This chapter includes a detailed introduction to Apache Drill, an open source software that supports data-intensive distributed applications for interactive analysis of large-scale data sets. In addition, the chapter provides examples of the application of Apache Drill for exploration, data cleaning, query visualization, and data transformation.

The next chapter, by A. Leontyev, S. Pulos, and R. Hyslop, is titled Making the most of Assessment Data: Analysis of test data (Chapter 5) and covers the use of jMetrik, an open source computer program for psychometric analysis. Leontyev and co-authors provide an overview of the jMetrik program for analysis of test data. The program has a user-friendly interface, an integrated database, and a variety of statistical procedures and charts. The chapter highlights applications of jMetrik data analysis using a data set of students’ responses to the Stereochemistry Concept Inventory. Various aspects of using jMetrik, such as uploading data, scoring the test, scaling the test at the scale, item, and distractor levels, and the main steps of data analysis, are discussed along with several examples of data interpretation.

The chapter by Harshman, Yezierski & Nielson titled Putting the R in CER: How a statistical program transforms research capabilities (Chapter 6) provides information on another open-source software, R. R allows users to build, change, or adapt features and functions on their local copies. The chapter describes the capabilities of R to transform data analysis in education research.
The authors discuss several advantages (and some limitations) of using R for education research to effectively analyze and visualize data by defining custom functions and writing programmatic loops, and to enhance reproducibility and documentation of the research process via interactive notebooks. According to Harshman et al., R has several advantages over existing alternatives such as SPSS and Excel. The chapter includes several examples of the applications of R, from data organization to data analysis.

The chapter titled Survey Data Analysis by R and R-Studio (Chapter 7) by R. Komperda extends the application and benefits of using R to quantitative and survey data. Komperda provides an in-depth description of the visualization and analysis of survey data using R and RStudio. R is a programming language, and RStudio is an Integrated Development Environment (IDE) that provides comprehensive facilities to computer programmers for software development and modification. Komperda presents R as a viable alternative to traditional software such as Excel, SPSS, Stata, Mplus, and LISREL for conducting survey data analysis, including the psychometric evidence of instrument quality. The chapter demonstrates several uses of R and RStudio using a sample data set available within R. Examples of applications of R and RStudio, such as visualizations of response distributions, descriptive statistics, principal components analysis and exploratory factor analysis using the psych package, and confirmatory factor analysis using the lavaan package, are included.
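One commonly reported piece of psychometric evidence for survey instruments is Cronbach's alpha, an internal consistency estimate. The Python sketch below computes it from first principles, assuming complete Likert-type responses; the function name and the response data are hypothetical, and the chapters themselves work in R rather than Python.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert responses: one row per respondent,
# one column per survey item, with no missing values.
survey = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(f"alpha = {cronbach_alpha(survey):.2f}")
```

The formula used is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. Real survey data usually need handling of missing responses and reverse-scored items before this step.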

The focus of the next chapter in this category is on the Assessment of instructional interventions using a Comprehensive Meta Analysis (CMA) package (Chapter 8). In this chapter, the authors Leontyev, Chase, Pulos & Varma-Nelson introduce the CMA software solution to perform a meta-analysis on a set of previously published papers on the effectiveness of the Peer-Led Team Learning (PLTL) approach. Meta-analysis is a statistical procedure that combines data from multiple studies. It is applied to several research studies of the same topic in which the treatment effect (or effect size) is consistent from one study to the next. The chapter includes examples of calculating effect sizes, forest plots, and a variety of other analyses to provide an overview of data organization and of performing the meta-analysis procedures in the CMA software.
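The arithmetic behind combining effect sizes can be sketched in a few lines of Python. The example below assumes a fixed-effect model with inverse-variance weights and uses Cohen's d as the effect size; it is not the CMA software's implementation, and the study summaries are invented for illustration.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference and its approximate sampling variance."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d


def fixed_effect_summary(effects):
    """Pool (d, variance) pairs using inverse-variance weights."""
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    std_err = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * std_err, pooled + 1.96 * std_err)
    return pooled, std_err, ci


# Hypothetical studies: (treatment mean, SD, n, control mean, SD, n).
studies = [
    (78.0, 10.0, 40, 72.0, 11.0, 42),
    (81.5, 9.0, 25, 79.0, 9.5, 27),
    (75.0, 12.0, 60, 70.5, 12.5, 58),
]
effects = [cohens_d(*study) for study in studies]
pooled, std_err, ci = fixed_effect_summary(effects)
print(f"pooled d = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Studies with smaller sampling variance (typically larger samples) receive more weight. When effects are expected to vary across studies, a random-effects model is the usual alternative, and a small-sample correction such as Hedges' g is often applied to d.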

Commercially Available Software Package for Qualitative Data Analysis (ATLAS.ti)

The computer-aided data analysis tools presented in prior chapters include commercially licensed Learning Management System applications and extension kits as well as open-source software. These programs can be utilized for both qualitative and quantitative research. The last chapter in this book (Chapter 9) focuses on a commercially licensed qualitative research software, ATLAS.ti. Through her study on student problem-solving behavior, Gupta demonstrates the process of data organization and qualitative data analysis using ATLAS.ti. The chapter covers the process of transcription, data coding, memos, and the visual displays of relationships between the concepts uncovered during the analytic process in the problem-solving study.

Conclusion

This book is for researchers grappling with the challenge of deciding which specific tools they should use for data organization and analysis. With several open source and subscription-based computer software and programs available, such a decision is often difficult. This book is by no means a presentation of the applications of every computer software or tool that is available. It provides the perspectives of researchers engaged in various educational research projects who have first-hand experience of using the specific tools presented in the chapters of this book, along with the pros and cons of using these tools for research. Researchers new to education research will perhaps gather some ideas from here to conduct their own analysis specific to their project goals. Experienced researchers may find that this book offers more insights or a different perspective on using some of the analysis tools and approaches mentioned in its chapters. It is hoped that this book will foster ideas (big and small) and discussions among educational researchers on the tools that can be used most effectively to answer specific research questions using qualitative, quantitative, or mixed methods research approaches.

References

1. Seels, B. Educ. Technol. 1989, 11–15.
2. Niemiec, R. P.; Walberg, H. T. J. Res. Comput. Educ. 1989, 21, 263–276.
3. Bainbridge, W. Science 2007, 317 (27), 471–476.
4. Nuts and Bolts of Chemical Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 976; ACS Publications: Washington, DC, 2008.
5. Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Publications: Washington, DC, 2014; Vol. 1166.
6. Dayal, V. In An Introduction to R for Quantitative Economics; SpringerBriefs in Economics; Springer: New Delhi, India, 2015.
7. Gandrud, C. Reproducible Research with R and R Studio, The R Series; CRC Press, Taylor & Francis Group: New York, 2014.
8. Prvan, T.; Reid, A.; Petocz, P. Teach. Stat. 2002, 24, 68–75.
9. Creswell, J. W.; Plano Clark, V. L. Designing and Conducting Mixed Methods Research; Sage Publications: Thousand Oaks, CA, 2011.
10. Liamputtong, P. Qualitative Research Methods, 4th ed.; Oxford University Press: New York, 2013.
11. Fielding, N. G.; Lee, R. M. Computer analysis and qualitative research; Sage Publications: London, U.K., 1998.
12. Patton, M. Q. Qualitative research and evaluation methods; Sage Publications: Thousand Oaks, CA, 2002.
13. Kelle, U.; Prien, G.; Bird, K. Computer-Aided Qualitative Data Analysis: Theory, Methods and Practice; Sage Publications: Thousand Oaks, CA, 1995.
14. Watson, W. R. TechTrends 2007, 51, 28–34.
15. Ellis, R. A.; Calvo, R. A. Educ. Tech. Soc. 2007, 10, 60–70.
16. http://www.itemanalysis.com/; information on jMetrik retrieved on March 16, 2017.
17. Meyer, J. P. Applied Measurement with jMetrik; Routledge: New York, 2014.
18. Meyer, J. P.; Hailey, E. J. Appl. Meas. 2012, 13, 248–258.
19. Horton, N. J.; Kleinman, K. In Statistical Analysis and Graphics, 2nd ed.; CRC Press, Taylor & Francis Group: New York, 2015.
20. Melnik, S.; Gubarev, A.; Long, J. J.; Romer, R.; Sivakumar, S.; Tolton, M.; Vassilakis, T. Proc. of the 36th Int’l Conf on Very Large Data Bases 2010, 330–339.
21. Borenstein, M.; Hedges, L. V.; Higgins, J. P. T.; Rothstein, H. R. Introduction to Meta-Analysis; John Wiley & Sons Inc.: New York, 2009.
22. Konopásek, Z. Hist. Soc. Res. Suppl. 2007, 19, 276–298.
23. Friese, S. Qualitative Data Analysis with ATLAS.ti; Sage Publications: Thousand Oaks, CA, 2011.
