
Investigating General Chemistry Students’ Metacognitive Monitoring of Their Exam Performance by Measuring Postdiction Accuracies over Time

Morgan J. Hawker, Lisa Dysleski,† and Dawn Rickey*,‡

Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States

ABSTRACT: Metacognitive monitoring of one’s own understanding plays a key role in learning. An aspect of metacognitive monitoring can be measured by comparing a student’s prediction or postdiction of performance (a judgment made before or after completing the relevant task) with the student’s actual performance. In this study, we investigated students’ postdiction accuracies for a series of exams within a two-semester general chemistry course. The research questions addressed include (1) How accurate are general chemistry students at postdicting their exam scores? Are there gender differences in postdiction accuracy? (2) How do general chemistry students’ postdiction accuracies relate to their exam performance? (3) How do general chemistry students’ postdiction accuracies and metacognitive monitoring of their exam performance change over time? Results indicate that most general chemistry students are not accurate in their exam score postdictions and that, consistent with research conducted in other domains, higher-performing students make more accurate postdictions than lower-performing students. In addition, although students who were new to a general chemistry course appeared to improve in their metacognitive monitoring on the second course exam compared with the first, monitoring did not significantly improve after that initial adjustment. Given the importance of metacognitive monitoring for student learning of chemistry, these findings suggest that further research and development of interventions to improve the metacognitive monitoring of introductory chemistry students is warranted.

KEYWORDS: First-Year Undergraduate/General, Chemical Education Research, Testing/Assessment, Student-Centered Learning, Learning Theories

FEATURE: Chemical Education Research

Received: August 27, 2015. Revised: February 12, 2016. DOI: 10.1021/acs.jchemed.5b00705



INTRODUCTION

Metacognitive skills, which involve thinking about one’s own thinking, are increasingly being recognized for their important role in learning.1,2 For example, in one study, metacognitive skillfulness was estimated to account for about 40% of the variance in learning outcomes.3 Thus, to improve student learning in a specific content area such as chemistry, it is critical to understand both students’ metacognitive skillfulness and instructional methods that could be used to improve it in that context. Monitoring is a component of metacognition that involves assessing one’s current knowledge, understanding, and abilities, as well as the task at hand and its difficulty.1,4 One measure of metacognitive monitoring, often employed in cognitive psychology research, is the accuracy of an individual’s predictions or postdictions of performance,5−10 where a postdiction is a judgment made after completing a task. An individual’s accuracy can be expressed as the difference between his or her prediction or postdiction and his or her actual performance on a task. This is also referred to as calibration.11 In this study, we examine the accuracy of general chemistry students’ exam performance postdictions.
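For concreteness: expressed as a signed difference, a student who postdicts a score of 85% but earns 72% has a calibration of +13 percentage points (an overestimate). The short Python sketch below is ours, not the authors’; the students and scores are invented for illustration.

```python
# Illustrative only: calibration as signed difference (postdiction - actual).
# Positive values indicate overestimation; negative values, underestimation.

students = [
    {"name": "A", "postdiction": 85, "actual": 72},  # overestimates by 13 points
    {"name": "B", "postdiction": 70, "actual": 81},  # underestimates by 11 points
    {"name": "C", "postdiction": 90, "actual": 90},  # perfectly calibrated
]

for s in students:
    calibration = s["postdiction"] - s["actual"]
    if calibration > 0:
        label = "overestimated"
    elif calibration < 0:
        label = "underestimated"
    else:
        label = "perfectly calibrated"
    print(f"Student {s['name']}: calibration = {calibration:+d} ({label})")
```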

Measuring metacognitive monitoring with pre- and postdictions of task performance has been approached in a variety of ways. Judgments can be made on performance in absolute terms as described above or relative to others’ performance.12 Generally speaking, people’s abilities to pre- or postdict their performance on a task are found to be poor, although pre- and postdiction ability is related to task difficulty.13 Undergraduate students’ metacognitive skillfulness related to academic tasks has also been investigated. Studies that employ pre- or postdictions of absolute exam performance have been carried out in undergraduate educational psychology,7,10,14,15 education,5 cognitive psychology,8 developmental psychology,16 introductory physics,17 and organic chemistry courses;18 high school biology19,20 and psychology classrooms;21 and upper-elementary classrooms.22

Most of these studies conclude that many students are well-calibrated when it comes to making postdictive judgments of exam performance.10,14,19,20 A notable exception is the study carried out in organic chemistry, in which the majority of students overestimated their exam performance.18 All of the studies that explored the relationship between performance and calibration found that higher-performing students were better calibrated on exam postdiction judgments than lower-performing students.5,10,14,15 Studies that analyzed changes in students’ calibration accuracy over time indicate that students became more accurate only in some cases in which an explicit intervention designed to improve their metacognitive monitoring was implemented throughout the course.5,7,8,10,15

Although we did not find previous studies that explored gender differences in the accuracy of students’ course exam postdictions, some previous work has explored differences in male and female postdiction accuracies on informal knowledge tests. In particular, Beyer23 and Beyer and Bowden24 found that, while there were no significant differences between male and female undergraduate students in postdiction accuracy on “neutral” tasks (e.g., tests of common knowledge) or “feminine-gender-typed” tasks (e.g., trivia from movies and TV shows with primarily female audiences), females significantly underestimated their performance and exhibited poorer calibration relative to males on “masculine-gender-typed” tasks (e.g., football, basketball, and baseball trivia). In studies that asked undergraduate students to rate their performance relative to that of other students, Kruger and Dunning did not find any gender differences in postdiction accuracy on tests of humor, logical reasoning, or English grammar.12

In the context of general chemistry courses, monitoring research has primarily focused on students’ abilities to judge their performance on problems outside of exam situations. For example, in two studies, investigators asked students to report their confidence in their abilities to solve particular chemistry problems, but the students did not actually work the problems presented to them.25,26 Another study examined the relationship between students’ confidence judgments regarding individual stoichiometry questions and their performance on those questions.27 In one study that did investigate students’ judgments of performance on general chemistry exams, students were asked to judge how they performed on exams relative to the average student in the class (without any knowledge of the class average or other students’ exam scores).28 Such relative judgments are fundamentally different tasks from the absolute judgments of individual performance examined here.

In this study, we examine general chemistry students’ abilities to monitor their examination performance. We investigate students’ judgments of their own performance on exams within a two-semester general chemistry course. Our research questions are as follows: (1) How accurate are general chemistry students at postdicting their exam scores? Are there gender differences in postdiction accuracy? (2) How do general chemistry students’ postdiction accuracies relate to their exam performance? (3) How do general chemistry students’ postdiction accuracies and metacognitive monitoring of their exam performance change over time?

METHODS

We collected data from students enrolled in a two-semester general chemistry sequence (General Chemistry I and General Chemistry II) for science and engineering majors at a large, public university. Although each course included multiple sections and instructors (Table 1), common exams were administered to all students. The instructors for the spring General Chemistry II course were a subset of the instructors for the fall General Chemistry I course. For each semester-long course, five multiple-choice exams were administered at three-week intervals, including one comprehensive final exam; the first four exams were not comprehensive. Example exam questions are shown in Box 1. Students in both fall and spring semesters had the option to use their final exam percentage as a replacement for one previous exam percentage in the calculation of their final course grade (a minimal sketch of this option follows Table 1).

Table 1. Summary of Course Characteristics

  Semester   Sections   Instructors   Students Enrolled^a   Students in Study
  Fall           5           4              1075                  925
  Spring         3           2               696                  491

  ^a Total enrollment at the end of each course.
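To make the grade-replacement option concrete, the sketch below computes a course exam average under our assumption (not stated by the authors) that a student exercises the option only when the final exam percentage exceeds the lowest prior exam percentage; the function name and sample scores are invented.

```python
def exam_average(prior_exams, final_exam):
    """Average the four prior exam percentages plus the final exam percentage,
    letting the final replace the lowest prior exam when that helps.
    Illustrative assumption: students replace their lowest prior score."""
    scores = list(prior_exams)
    lowest_index = scores.index(min(scores))
    if final_exam > scores[lowest_index]:
        scores[lowest_index] = final_exam  # final replaces the lowest prior exam
    scores.append(final_exam)             # final also counts as the fifth exam
    return sum(scores) / len(scores)

# The 64% exam is replaced by the 88% final: mean of [72, 85, 88, 90, 88] = 84.6
print(exam_average([72, 85, 64, 90], 88))
```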

All exams included the postdiction question, “What percentage score do you expect to earn on this exam? (A) 100%−90%, (B) 89%−80%, (C) 79%−70%, (D) 69%−60%, (E) < 60%.” This question appeared as the last or second-to-last question on each exam,29 such that each student was asked to make a judgment of his or her own score on the exam immediately after taking the exam. We collected students’ answers to all exam questions via optical mark recognition (i.e., Scantron) forms. Each student who answered a postdiction question received credit amounting to 1−2% of the total points for the exam regardless of whether his or her postdiction was accurate. No interventions intended to improve students’ metacognitive monitoring of exam performance were implemented in these courses. Students who were missing a





Figure 1. Bubble plots of student exam postdiction score category versus exam score category for each exam in the fall semester (N = 925). Bubble size is proportional to the number of students with a given exam score category and postdiction. In each case, the dashed diagonal line indicates perfect calibration. Overall percentages of overpostdicting, underpostdicting, and perfectly accurate students are included on the right side of each panel. The percentages of perfectly calibrated students in each exam score category are included at the top of the corresponding columns (italicized).
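Plots in this style are straightforward to reproduce. The matplotlib sketch below is our reconstruction, not the authors’ code: it uses invented (exam score category, postdiction category) pairs on the 0−4 category coding described below, sizes each bubble by its student count, and draws the dashed perfect-calibration diagonal.

```python
# Minimal sketch of a calibration bubble plot; data are invented for illustration.
from collections import Counter

import matplotlib.pyplot as plt

# (actual score category, postdiction category) pairs, coded 0-4
pairs = [(2, 3), (2, 3), (3, 3), (4, 4), (1, 2), (0, 2), (3, 2), (2, 2)]
counts = Counter(pairs)

xs = [actual for (actual, _post) in counts]
ys = [post for (_actual, post) in counts]
sizes = [120 * n for n in counts.values()]  # bubble area ~ number of students

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=sizes, alpha=0.6)
ax.plot([0, 4], [0, 4], linestyle="--", color="gray")  # perfect calibration
ax.set_xlabel("Exam score category")
ax.set_ylabel("Postdiction category")
ax.set_xticks(range(5))
ax.set_yticks(range(5))
plt.show()
```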

To conduct the calibration analyses, we coded each student’s actual exam scores and exam postdictions using the categories that were available as answer choices to the postdiction questions. We converted each exam score category and corresponding postdiction to a numerical value on a 4-point scale as follows: 100%−90% = 4, 89%−80% = 3, 79%−70% = 2, 69%−60% = 1, and < 60% = 0.
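As an illustration of this coding scheme, the sketch below maps exam percentages and postdiction answer choices onto the 0−4 scale and computes the resulting calibration difference. The authors describe only the mapping, not an implementation, so the function and dictionary names here are our own.

```python
def score_category(percent):
    """Map an exam percentage to the 0-4 category scale used for coding."""
    if percent >= 90:
        return 4  # 100%-90%
    if percent >= 80:
        return 3  # 89%-80%
    if percent >= 70:
        return 2  # 79%-70%
    if percent >= 60:
        return 1  # 69%-60%
    return 0      # < 60%

# Postdiction answer choices (A-E) map directly onto the same scale.
POSTDICTION_CODES = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Calibration for one student on one exam: postdiction category minus actual
# score category (0 = perfectly calibrated, positive = over-postdiction).
actual = score_category(76)          # category 2
postdicted = POSTDICTION_CODES["B"]  # category 3
print(postdicted - actual)           # +1: over-postdicted by one category
```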