
Knowledge Surveys in General Chemistry: Confidence, Overconfidence, and Performance

Priscilla Bell*,† and David Volckmann‡

Departments of †Chemistry and ‡Psychology, Whittier College, Whittier, California 90601, United States

ABSTRACT: Knowledge surveys have been used in a number of fields to assess changes in students’ understanding of their own learning and to assist students in review. This study compares metacognitive confidence ratings of students faced with problems on the surveys with their actual knowledge as shown on the final exams in two courses of general chemistry (Chem 110A and Chem 110B). The surveys were administered at the start and end of the course and correlated with the final exam scores. The surveys and final exams were found to be reliable, and the relatively high correlations between them suggested that students’ confidence ratings on knowledge surveys were valid reflections of their actual knowledge. Students scoring high on the exams estimated their knowledge with greater accuracy than the lower-scoring students, who overestimated their knowledge (see figure). This phenomenon reflected the Dunning–Kruger effect, and the methodology of knowledge surveys isolated students’ efficacy expectations, not outcome expectations, as the likely origin of the effect. Finding remedial interventions to improve metacognitive skills for lower-scoring, overconfident students poses a continuing problem.

KEYWORDS: First-Year Undergraduate/General, Upper-Division Undergraduate, Chemical Education Research, Curriculum

FEATURE: Chemical Education Research

Published: September 02, 2011
dx.doi.org/10.1021/ed100328c | J. Chem. Educ. 2011, 88, 1469–1476

Educators have many tools available for formative and summative evaluation of student learning. In addition to standard testing formats, knowledge surveys (KS) have emerged as tools for students to analyze their understanding of specific course content and for faculty to organize their course curricula.1 On these surveys, students face questions of varying difficulty and cognitive complexity according to Bloom’s taxonomy,2 and they are prompted to assign one of three levels of confidence to each question:

a. I have confidence in answering this question.
b. I could answer 50% of the question or know where to get information quickly.
c. I have no confidence in answering the question.

By honestly assigning one of the three levels of response, students should be able to determine quickly the areas in which they excel and the areas that will need to be stressed in their review of the material. Another benefit to students comes from the transparency of the faculty expectations of students’ knowledge and skill sets required for the entire course.3 An overview of knowledge survey construction can be found at the MERLOT ELIXR Web site.4

Students’ self-assessment of their understanding prompted by knowledge surveys involves metacognition, people’s knowledge about their own knowledge.5 Rickey and Stacy noted the importance of metacognition in the problem-solving abilities of chemistry students,6 which is one of the ideas that stimulated the current study of knowledge surveys in the general chemistry course. One of the more comprehensive definitions of metacognition comes from Gourgey (quoted in refs 7 and 8), and it relates directly to the purposes of the knowledge survey outlined above:

[A]wareness of how one learns; awareness of when one does or does not understand; knowledge of how to use available information to achieve a goal; ability to judge the cognitive demands of a particular task; knowledge of what strategies to use for what purposes; and assessment of one’s progress both during and after performance.

Knowledge surveys created for this study focus on all but the first component of this definition, whereas most of the currently published chemistry studies on metacognition place a greater degree of emphasis on this first component.7,9–12 For example, Sandi-Urena, Cooper, and Stevens developed an inventory to assess the planning, evaluating, and monitoring of problem-solving skills of students, calling these areas “metacognitive skillfulness”, which are generally useful to any person solving problems in any context.10



By contrast, knowledge surveys monitor student confidence in their own specific problem-solving skills and knowledge directly related to course content. Knowledge surveys have emerged as tools for faculty curriculum development and for enhancement of student understanding of specific course content.1 The usefulness of knowledge surveys to faculty and students rests on their validity. That is, do student responses to knowledge survey questions accurately measure students’ actual knowledge of course content? The major way to answer this question is to correlate knowledge survey results with measures of student performance in the course. Few studies have reported these data. In a geology course (N = 15), Wirth and Perkins3 measured correlation coefficients in the range of r = 0.54–0.79 between survey scores and test or course grades. In contrast, Bowers,13 using a survey for five sections of an introductory biology course (total N = 336), found much lower correlations between survey scores and course grades (r = 0.21–0.46).

The current study in chemistry classes explores the validity of knowledge surveys in some detail. No published knowledge surveys are available in the field of chemistry, although they have been reported for a number of other fields, including statistics,14 geology,3,15,16 and biology.13 This study began with the development of two knowledge surveys for the two sequential semesters of general chemistry taught at a small liberal arts institution during the fall semesters of 2005–2007 (Chem 110A) and the spring semesters of 2007–2008 (Chem 110B). Initial interest in the knowledge surveys was stimulated by their potential effectiveness as instructional aids. Subsequent analysis allowed the authors to investigate the relationships between students’ responses on the knowledge surveys and their performance on course exams.

Of particular interest was the extent to which students’ scores on the knowledge survey would reflect their performance on exams. Kruger, Dunning, and their colleagues have published several papers showing how metacognitive ratings of performance are likely to vary greatly among individuals with varying skill levels.17,18 These authors, as well as others, have noted that people tend to overestimate their performance, and this error in estimation is greatest among the weakest performers.7,8,17–19 So the data collected for this study were analyzed to explore this possibility as well. The examination of data from three classes of students in Chem 110A enabled a compelling analysis for addressing the issues above. An independent study of a smaller set of data from Chem 110B served to replicate the findings from Chem 110A.

METHODOLOGY

Chem 110A

The first author, who taught all three classes of Chem 110A reported in this study, created a 126-question knowledge survey covering the same topics as the eventual test questions, though none were identical to them. The survey questions were selected from midterm examinations written by the first author with Bloom’s levels2 in mind. The resulting distribution of cognitive complexity of questions on the survey reflected that of the actual final examinations (see the Supporting Information for samples of matched questions from the knowledge survey and final exams). For the Chem 110A survey, the questions were distributed roughly as follows according to Bloom’s taxonomy:2 knowledge (12%), comprehension (21%), application (27%), analysis (19%), synthesis (6%), and evaluation (5%).

Both verbal and written instructions prompted students to circle one of three responses directly on the survey, labeled a, b, or c, indicating their level of confidence in answering the question. Response “a” should be chosen if students were confident that they could answer the question sufficiently well for graded test purposes. The “b” response would be selected if they could answer at least 50% of the question or knew precisely where they could quickly get the information (within 20 min) and then could complete the answer for graded test purposes. The “c” answer would be chosen if they were not confident they could adequately answer the question for graded purposes.

The knowledge surveys were administered in the first class meeting of the course before any instruction had taken place (preKS). Students were not timed, and most completed the survey in approximately 30 min. The instructor distributed copies of the survey to students following its first administration. Subsequently, the instructor highlighted the questions from the knowledge survey associated with each test in the course and posted answers online the day before the exam to allow students to use the knowledge survey for review. This was done for all tests, and the complete set of answers was available before the final exam. In the last class period of each course, 4 days prior to the final exam, the instructor administered the same knowledge survey (postKS), announcing that the results would be tabulated after class grades were assigned. Students were required to identify themselves on each administration of the survey but were informed that their identities would not be conveyed to the instructor until after grades were assigned. This mode of administration was implemented to decrease the likelihood of the self-enhancement motive described by Mabe and West.20 Although the form of the survey in this study was paper and pencil, it is well suited for online testing and analysis, as seen in the geology knowledge surveys developed by Wirth and Perkins.3

In the three classes of Chem 110A, 166 students completed both the pre- and postinstruction knowledge surveys (N2005 = 46, N2006 = 62, N2007 = 58). Students were assigned an identification number, and each of their responses was entered into a spreadsheet using these values: a = 100, b = 50, and c = 0. This created an average KS scale of 0–100, whose range is equivalent to the range of the final exam scores, to enable a direct comparison between the scales. Data from two Chem 110A students were eliminated from the analysis because their responses did not change throughout the survey and their level of choice was not consistent with their ability (e.g., a confident student with A grades giving nearly all KS responses as “c”), which left a sample of 164 students.

Data that had been collected on the class intake forms were also analyzed, including class standing and previous classes taken. The latter data were grouped in the following categories: no high school chemistry; one year of high school chemistry; two years of high school chemistry; advanced placement/honors chemistry; and college-level chemistry. In addition, test scores, final exam scores, and class total points were added to the spreadsheet. Statistical tests were performed using PASW Statistics 18.0. The research protocol was approved by the institution’s human subjects committee (IRB).
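For readers who wish to reproduce the scoring step, the response-to-score conversion and the 0–100 KS scale described above can be scripted in a few lines. The sketch below is illustrative only; the file name and column labels are hypothetical, and it is not the authors’ actual workflow (which used a spreadsheet and PASW Statistics).

```python
import pandas as pd

# Hypothetical file: one row per student, one column per KS question,
# each cell holding the circled response "a", "b", or "c".
responses = pd.read_csv("chem110a_postKS.csv", index_col="student_id")

# Scoring scheme described in the text: a = 100, b = 50, c = 0.
score_map = {"a": 100, "b": 50, "c": 0}
scored = responses.replace(score_map).astype(float)

# Averaging across all questions gives each student a KS score on a
# 0-100 scale, directly comparable to a percentage final exam score.
ks_score = scored.mean(axis=1)
print(ks_score.describe())
```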

Chem 110B

A parallel study was conducted with 47 students taking the second semester of general chemistry. One of the students eliminated from the first study was likewise eliminated from this study, resulting in a sample of 46 students. As before, the first author assembled a 77-question survey for Chem 110B based on topics in the second semester of general chemistry. This survey was field-tested in her colleague’s Chem 110B class prior to the year she taught the course. The questions, according to Bloom’s levels,2 were distributed as follows: knowledge (16%), comprehension (21%), application (40%), analysis (14%), synthesis (6%), and evaluation (3%). The surveys were administered, and the statistical analyses performed, in a manner analogous to Chem 110A.

RESULTS AND DISCUSSION

Chem 110A Reliability of Instruments

Reliability measures of the preKS and postKS for each of the two knowledge surveys were obtained using Cronbach’s α with an Excel program supplied by Nuhfer;21 the results reflect the internal consistency of the test items. The reliability for each administration of the survey was quite high. Cronbach’s α values for the preKS and postKS for the combined Chem 110A data were α = 0.976 (α2005 = 0.990; α2006 = 0.982; α2007 = 0.977) and α = 0.964 (α2005 = 0.927; α2006 = 0.975; α2007 = 0.941), respectively. The final exams were also examined for reliability, resulting in Cronbach’s α values of 0.886, 0.871, and 0.807 for the three sections of Chem 110A.
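Cronbach’s α itself can be reproduced outside the Excel program cited above. The following sketch applies the standard formula to a hypothetical (students × items) matrix of 0/50/100 item scores; it is an illustration under assumed data, not the calculation supplied by Nuhfer.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (students x items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 0/50/100 KS item scores for 164 students on 126 questions.
rng = np.random.default_rng(0)
fake_scores = rng.choice([0, 50, 100], size=(164, 126))
print(round(cronbach_alpha(fake_scores), 3))
```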

Chem 110A Data Analysis

The validity of the Chem 110A knowledge survey as a measure of students’ knowledge in the course was assessed by comparing the postknowledge survey scores (postKS) to the final exam for the course. First, a more targeted subset of questions on the survey was selected to match the actual questions on the final exam. (This focused subset is designated postKS*.) Correlations for the three years between postKS* and the final exam were r = 0.537, 0.666, and 0.496 (p < 0.001), respectively. The overall correlation between the postKS* and the final exam was thus fairly strong (r = +0.556, p < 0.001), indicating that the postKS* was generally a valid measure of student knowledge at the end of the course. That is, the distributions of students’ confidence ratings assessing their knowledge of the course content generally matched the distributions of their scores on the final exam taken four days later. Furthermore, the correlation between the full knowledge survey, postKS, and the final exam was nearly identical (r = +0.555, p < 0.001), no doubt owing to the fact that the entire knowledge survey is highly internally consistent.

A posttest correlation with the final exam of around r = 0.56 compares favorably with that found by Wirth and Perkins,3 who reported r values in the range of 0.56–0.68. Such values should be considered relatively high. On the basis of reviewing more than 1300 articles and books on student ratings of teaching, Cashin proposed that, in the social sciences:22

[C]orrelations between 0.20 to 0.49 are practically useful. Correlations between 0.50 and 0.70 are very useful but they are rare when studying complex phenomenon.

It is possible that the high correlations between knowledge survey and final exam scores were a result of ordinary student characteristics, such as previous chemistry knowledge or testing experience. For instance, students may respond consistently in both testing formats. If these characteristics were especially influential, both pre- and postknowledge surveys should correlate with the final exam about equally, thereby discounting the contribution of student learning through the course. The overall correlation between the preKS and the final exam, though significant, is only r = 0.190 (p < 0.05). The individual classes had r values of 0.121, 0.250, and 0.143 (p > 0.05), respectively, in contrast to the postKS overall correlation of r = 0.556 and individual class correlations of r = 0.537, 0.666, and 0.496 (p < 0.001). The dramatic increase of the correlation coefficients highlights the effect of the course on student responses on the survey. Additionally, the influence of students’ previous chemistry knowledge or testing experience, as well as response biases on the knowledge survey, may be minimized by controlling for preKS values in a partial correlation of postKS with the final exam. In our analysis, the change in the postKS correlation with the final exam was negligible (r = 0.556 versus a partial correlation of r = 0.536 controlling for preKS).

The preKS did not correlate strongly with any course midterms (r < 0.232, p > 0.05). Conversely, correlations between postKS and midterms generally increased as the course progressed and were especially high for Test 4 (r = 0.600) and Test 5 (r = 0.668). These tests were given 3 weeks and 1 week before the end of the course, respectively, and covered 40% of the topics on the knowledge survey. These higher correlations may reflect the enhancement of the students’ metacognition on these topics as a result of the feedback received on the exams.

The level of student preparation was considered to be a possible contributor to the variation in knowledge survey results. As expected, the means of the preKS were higher for those with more extensive chemistry experience (preKS means were 15, 23, 34, 35, and 39, respectively, for students with no chemistry, 1 year, 2 years, advanced placement/honors chemistry, and college chemistry experience). A nonparametric test was chosen to compare the distributions of these groups because of the large variation in group size. A Kruskal–Wallis test showed that different levels of student preparation (excluding the group of only two students who had no prior chemistry) produced significantly different preKS distributions (p < 0.001). Interestingly, the postKS distributions for the different levels of preparation were also significantly different (p < 0.05). So for the first course in general chemistry, previous familiarity with the material led to differences in confidence ratings not only at the beginning but also at the end of the course.

On the basis of Mabe and West’s20 analysis of metacognition, class standing was expected to be related to KS scores, because more mature students might have had other experiences with self-evaluation and might therefore be better at assessing their own knowledge. Their likely completion of additional courses in problem-solving skills should also have enhanced their scores.6,23 However, no significant differences among students of different class standings were obtained.
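The two checks described above, the partial correlation of postKS with the final exam controlling for preKS and the Kruskal–Wallis comparison across preparation groups, can be sketched as follows. The arrays are hypothetical stand-ins for the class data, and the snippet is not the PASW analysis the authors ran.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, control):
    """Pearson correlation of x and y after regressing out a control variable."""
    x, y, control = (np.asarray(v, dtype=float) for v in (x, y, control))
    design = np.column_stack([np.ones_like(control), control])
    # Residuals of x and y after a least-squares fit on the control variable.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# Hypothetical per-student vectors (equal length), as would come from the spreadsheet.
rng = np.random.default_rng(1)
pre_ks = rng.uniform(0, 100, 164)
post_ks = np.clip(0.2 * pre_ks + rng.normal(60, 15, 164), 0, 100)
final_exam = np.clip(0.6 * post_ks + rng.normal(25, 12, 164), 0, 100)

print(stats.pearsonr(post_ks, final_exam))        # zero-order correlation
print(partial_corr(post_ks, final_exam, pre_ks))  # controlling for preKS

# Kruskal-Wallis comparison of preKS across (hypothetical) preparation groups.
groups = [pre_ks[:20], pre_ks[20:80], pre_ks[80:130], pre_ks[130:]]
print(stats.kruskal(*groups))
```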

Student Self-Estimates and the Dunning–Kruger Effect

One of the goals of this paper was to explore the correlation between students’ subjective confidence ratings on the knowledge survey and the more objective measure of student knowledge on the final exam. In the discussions that follow, it is assumed that there is a rough equivalency between these scales, which have the same 0–100 ranges. The choice of the values for the response alternatives on the knowledge survey (0, 50, or 100) was made based upon how particular degrees of knowledge might be evaluated on a summative test.

Chem 110A Discussion

Figure 1, representing data from 164 students, shows the correspondence between each student’s Chem 110A final exam score (ranked from low to high, creating a monotonically increasing line) and his or her associated postKS* score. The postKS* scores are joined with straight lines to make it easier to see the student-to-student variations in confidence.

Figure 1. Chem 110A 2005–2007 postKS* scores and final exam scores ranked by students’ final exam score.

Inspecting the figure, one can see that the between-student variability in postKS* scores is relatively uniform across the range of final exam scores. The postKS* standard deviations for students in the bottom, middle, and top third of final exam scores were 14.3, 11.9, and 10.7, respectively. These decreasing levels of variability could easily be accounted for by the depressive ceiling effect on scores as they approach the maximum of 100. It seems, then, that students with the highest exam scores are just about as variable at gauging their course knowledge as students with the lowest exam scores.

More interesting in Figure 1 is the progressive change in students’ abilities to gauge their own knowledge over the entire range of final exam scores. For example, among the roughly one-third of the students who scored lowest on the exam, only eight underestimated their performance; the rest of that third overestimated their knowledge as they approached the final exam, many by a wide margin. On the other hand, the top third of the students produced much more accurate estimations of their ability, although the majority of them (28) underestimated their ability relative to the exam.

To test this progressive change of the postKS* as a measure of student learning over the range of final exam scores, we established a variable that measured students’ subjective knowledge estimates relative to the objective knowledge measurement (EstΔ*) by calculating the difference between the postKS* and the final exam score for each student. The correlation of this variable with the final exam score was very strong (r = −0.586), reinforcing the notion that there is a systematic improvement of self-assessment from weaker to stronger students. This negative correlation reflects the inverse relationship between EstΔ* and the final exam score: weak students overestimated their knowledge by a large margin, while stronger students estimated relatively more accurately or even underestimated it.

A simplified view of the relationships among postKS* scores and final exam scores (with preKS for comparison) is presented in Figure 2, in which the students are arbitrarily divided into three groups depending on their level of performance on the final exam. The overestimation on the postKS* by weaker students and the more accurate estimation by the strongest students is now quite obvious.

Figure 2. Chem 110A preKS, postKS*, and final exam scores sorted by student groupings.

The relationships shown are nearly identical to those repeatedly displayed in the research of Dunning and Kruger and their colleagues and others.8,17–19 The relationship, the so-called Dunning–Kruger effect, has been replicated many times under various experimental and real-world conditions. Generally, people with less competence will have a positive bias, rating themselves or their performance as above average. In fact, more than half of them will tend to rate their competence or their performance well “above average”, a notion that does not make obvious sense. The more incompetent the individual, the greater this positive bias becomes; that is, the more incompetent people are, the greater the difference between their self-assessment and their actual ability. Extremely competent people, on the other hand, will have more accurate self-assessments and may even show a negative bias.17

The mean EstΔ* is essentially the average over- or underestimation bias of the students. Figure 3 focuses on the significant change of EstΔ* as a function of the arbitrarily divided three groups of final exam scores (F(2, 161) = 26.5, p < 0.001). Students in the lower third of the test scores overestimated their knowledge by the largest margin (by an average of 18%), indicating that they are the least effective in judging their own ability accurately. The middle third also overestimated their knowledge, but to a much lesser degree (by an average of 7.5%). The students scoring in the top third were the most realistic, with a slightly pessimistic underestimation (by an average of 0.11%). These data indicate that, for the upper two-thirds of the students, their average postKS* score in Chem 110A estimated their understanding of course content on the final exam within one grade variation (10%).

Figure 3. Chem 110A 2005–2007 data showing the relationship between mean EstΔ* values and final exam scores.
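The EstΔ* variable and the tercile summaries above amount to a short calculation. A minimal sketch, again with hypothetical arrays in place of the matched postKS* and final exam scores, is:

```python
import numpy as np
from scipy import stats

# Hypothetical matched per-student scores on the common 0-100 scale.
rng = np.random.default_rng(2)
final_exam = np.sort(rng.uniform(30, 100, 164))
post_ks_star = np.clip(final_exam + rng.normal(8, 14, 164), 0, 100)

# EstDelta* = subjective estimate (postKS*) minus objective score (final exam).
est_delta = post_ks_star - final_exam

# Negative correlation: weaker students overestimate more than stronger ones.
print(stats.pearsonr(est_delta, final_exam))

# Mean bias within the bottom, middle, and top thirds of final exam scores.
order = np.argsort(final_exam)
for name, idx in zip(("bottom", "middle", "top"), np.array_split(order, 3)):
    print(name, round(est_delta[idx].mean(), 1))
```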

Figure 4. Chem 110B 2008 postKS* and final exam scores ranked by final exam score.


Figure 5. Chem 110B 2008 preKS, postKS*, and final exam scores sorted by student groupings.

Chem 110B Reliability of Instruments

Cronbach’s α values for preKS and postKS for combined data of students in Chem 110B were again quite high at α = 0.972 (α2007 = 0.963, and α2008 = 0.978) and α = 0.957 (α2007 = 0.957 and α2008 = 0.962). Cronbach’s α for the final exam in 2008 was α = 0.857.

Chem 110B Data Analysis

For Chem 110B, taught by the first author, the correlation between postKS* and the final exam score was slightly higher than that of Chem 110A reported above (r = 0.571 versus r = 0.556), replicating the close association between student confidence ratings on the knowledge surveys and summative assessments of student knowledge on the exam. The correlation between the overall postKS (which includes all 77 questions) and the final exam score was even higher (r = 0.583), confirming again the internal consistency of the knowledge survey.

Unlike the Chem 110A survey, neither the preKS nor the postKS scores were significantly related to students’ various levels of preparation for Chem 110B according to a Kruskal–Wallis analysis of distributions. These results can be explained by the fact that the spring topics (kinetics, equilibrium, and electrochemistry) are less extensively covered in high school and in college preparation courses and would therefore have much less influence on students’ knowledge about the course topics. Again, class standing did not relate significantly to survey variables.

Figure 4 demonstrates again considerable student-to-student variation in postKS* scores over the range of student abilities in Chem 110B. Additionally, as before, the students showing the least ability on the final exam showed the greatest tendency to overestimate their competence on the postKS*, whereas students showing moderate or excellent command of the subject matter of the course had postKS* scores closer to their eventual final exam scores. This time the negative correlation of EstΔ* with the final exam score (r = −0.605) was even stronger than the Chem 110A result, presumably illustrating again the strength of the Dunning–Kruger effect in these data.

Figures 5 and 6 show the simplified views of the effect, comparing the postKS* and EstΔ* scores of students exhibiting low, medium, and high levels of ability on the final exam (preKS are included for comparison). These data are remarkably similar to the data not only from the three sections of Chem 110A, but also from nearly all of the replicated research studies conducted by Dunning, Kruger, and their colleagues.17,18,24 The lowest third of the students overestimated their performance by appreciably more than a letter grade on average.

Figure 6. Chem 110B data showing the relationship between mean EstΔ* values and final exam scores.

Mabe and West predicted that students would become better estimators of their knowledge with increasing familiarity with knowledge survey testing.20 Our results did not support this prediction. In fact, students’ ability to assess their knowledge grew worse from Chem 110A to Chem 110B. Because the distribution of EstΔ* for Chem 110A had large skew and kurtosis, the comparison of EstΔ* between the two courses required nonparametric statistics. The median EstΔ* increased from 4.9 to 10.8 from Chem 110A to Chem 110B for the 44 students who had taken both courses, a marginally significant increase in overconfidence as they moved from the lower-level course to the higher-level course according to a Wilcoxon signed-rank test (p = 0.082). The mean postKS* scores decreased from 76.4 to 67.2, but the mean final exam scores decreased even more, from 70.7 to 57.3, so the increase in EstΔ* is due to the lower final exam scores, amplified, no doubt, by the Dunning–Kruger effect.

It is instructive to compare the above results for Chem 110B taught by the first author (2008) with data obtained from the same course offered in the previous year. It was noted above that the knowledge survey written by the first author for Chem 110B had been originally field-tested in a course taught by another faculty member in the chemistry department (N = 41). Although the original course covered identical topics in the same order with the same textbook, there were obvious differences in the administration of the course, especially with regard to the use of the knowledge surveys. First of all, the surveys were not integrated into the course as fully as was the case in the 2008 course discussed above. The preKS was delivered during the first class period, but throughout the course students never had access to the survey questions, answers, or discussion of the answers. The survey was again delivered on the last day of the course, 4 days before the final exam. Furthermore, owing to differences in the teaching and examination styles of the two instructors, the forms of the final exam in the two courses bore no resemblance to one another. The format and type of questions asked on the final exam in the first author’s course were similar to what would be found on the knowledge survey. The other instructor’s final exam included far fewer and much more global questions. (No measure of the reliability of this instrument was available.)

Given these major differences between the courses, one might expect results involving the knowledge surveys to be different as well. In fact, the correlations between postKS and the final exam were quite different (r = 0.374 for the original field-tested course versus r = 0.583 for the course reported above conducted by the first author). Thus, the scores on the knowledge surveys were much less reflective of student performance on the final exam in the original course. However, more similarities than differences were found in the data between the two courses. In the original course, the means for preKS and postKS were 18.0 and 63.3; in the subsequent course, they were 23.9 and 64.5. The mean final exam scores were even closer for the two courses: 57.7 and 57.5. Finally, as Figure 7 shows, plotting the mean knowledge survey scores against the mean final exam scores for the lowest-, middle-, and highest-scoring student groups produces the familiar pattern of the Dunning–Kruger effect, in which the less competent individuals exaggerate their ability relative to their final performance evaluation.

Figure 7. Chem 110B 2007 preKS, postKS, and final exam scores sorted by student groupings.

It is interesting that the students scoring highest on the final exam in the original Chem 110B course show in their knowledge survey scores such a dramatic underestimation of their competence, over one letter grade lower than their ultimate final exam average. At least three possible reasons might explain this extreme discrepancy: (i) the format of the more global final exam had little resemblance to the format of the more detailed knowledge surveys; (ii) the students in this course had minimal experience with the knowledge surveys, and so the most discerning students were reluctant to express confidence in their knowledge of the material on the postKS; and (iii) the students who eventually scored highest on the final exam, having taken the postKS, discovered what their greatest weaknesses were and studied the areas that required the most review during the 4 days they had to study for the final exam.
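The paired, nonparametric course-to-course comparison of EstΔ* described above corresponds to a Wilcoxon signed-rank test on matched student values. A minimal sketch with hypothetical paired data (not the 44 students’ actual scores) is:

```python
import numpy as np
from scipy import stats

# Hypothetical paired EstDelta* values for students who took both courses.
rng = np.random.default_rng(3)
est_delta_110a = rng.normal(5, 12, 44)
est_delta_110b = est_delta_110a + rng.normal(5, 10, 44)

# Paired, nonparametric comparison (appropriate for skewed EstDelta* distributions).
print(stats.wilcoxon(est_delta_110a, est_delta_110b))
print(np.median(est_delta_110a), np.median(est_delta_110b))
```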

CAUSES OF THE DUNNING–KRUGER EFFECT

Why would the poorest performers display such glaring overconfidence? In their original analysis, Kruger and Dunning17 suggested, first, that people with little skill or knowledge are not in a position to know how to answer questions and, second, that they do not have the metacognition to recognize and gauge how deficient they are. These authors argue that the skills necessary to make an accurate metacognitive judgment about whether one’s answers or analyses are correct are the same skills required to come up with the correct analyses in the first place. People incapable of arriving at correct answers would also be incapable of exercising any metacognitive discrimination that could help distinguish which answers are correct and which are incorrect.

Two potential alternatives to the above argument are that people with little ability produce inaccurate self-evaluations either because they desire to save face (for themselves or for others) or because they simply do not care much about their own self-evaluations. However, research by Ehrlinger et al.18 demonstrates that neither of these two explanations has merit. In one experiment, subjects were motivated by substantial extrinsic rewards ($100) whenever their self-evaluations were completely accurate, and in another experiment students were threatened with a personally embarrassing situation (an interview by an expert professor) if they submitted inaccurate self-evaluations. In neither case did the sizable overconfidence in their knowledge disappear, and in fact, their naturally inflated self-evaluations tended to become even more magnified in the face of strong opposing motivational pressures! It appears, then, that students who do not know much about a subject are much too confident about the material for their own good. The ordinary feedback that they should be obtaining by completing a knowledge survey is in great part wasted on them. Therefore, even though knowledge surveys might be useful in identifying students according to their competence in the class, it is hypothesized that knowledge surveys may not be helpful instructional resources for the majority of students in the lowest third of the class.

In previous research, very competent individuals have tended to underestimate their performance (see ref 24 for a review). In most of this research, subjects had been asked to estimate their performance in comparison to others completing the same task. It is likely that the underestimation arises from competent individuals’ erroneous impression that most everyone else is as competent as they are at the task. So when asked, for example, in what percentile their performance would fall, they are reluctant to indicate the highest percentiles. However, whenever subjects were asked to estimate their raw performance, the underestimation diminished or disappeared altogether. In the present research, it should be remembered that subjects were asked on their knowledge surveys neither to compare their performance with others nor to guess at their raw scores. Rather, on knowledge surveys, students are simply asked to gauge the extent to which they might be able to answer questions (regardless of how well other students might perform). The minimal underestimation found among the best students who had the knowledge survey for review in Chem 110A and Chem 110B is likely because they were not making judgments about their ability in comparison to other students. The slight underestimation found could easily be accounted for by individual error variance around relatively accurate self-evaluations.

One further intriguing idea highlights the differences between the methodology of knowledge surveys and the typical methodology used to illustrate the Dunning–Kruger effect. In Bandura’s25,26 conception of personal control, two types of expectations relate to personal action: efficacy expectations and outcome expectations. The central metacognitive question for efficacy expectations is, “Can I do it?” For example, a student responding to a knowledge survey question might ask, “Can I solve this problem?” or “Do I have the skill to set up and perform this titration?” The central question for outcome expectations is, “Will what I do work?” For example, “Will I get credit for the right answer for this problem?” or “Will the indicator change when it should?”

Typically, in research that illustrates the Dunning–Kruger effect, subjects are first expected to perform a task (such as take a test or engage in a debate). Only afterward are they asked how they believe their performance will measure up to some standard and how it will rank compared to the performance of other subjects on the same task. In this customary methodology, then, subjects have always been asked about both their efficacy and their outcome expectations after the performance has been completed. Therefore, according to Bandura’s model, either type of expectation (or both) might be the basis for the over- or underevaluation of subjects in the typical research design; the two types of expectation are theoretically confounded in the design of the research. In the methodology for research on knowledge surveys, however, only one kind of expectation is being prompted: efficacy expectation (“Can I do this?”), not outcome expectation (“How will my answer be scored?” or “How will my answer compare to others in the course?”). The fact that the Dunning–Kruger effect can be elicited using the methodology of knowledge surveys supports the interesting idea that the effect is largely derived from subjects’ metacognition about efficacy expectations and not outcome expectations.

CONCLUSIONS

The replicated data in this study show several important general characteristics of knowledge surveys used in a yearlong sequence of courses in general chemistry. First of all, the knowledge surveys created for both semesters were shown to be reliable measures of student responses according to an analysis using Cronbach’s α. Furthermore, the significant correlations between students’ knowledge survey scores and their final exam scores show that the surveys are valid indicators of student knowledge of the course content and of their skills at handling problems of general chemistry by the end of the course. Variations in student scores on the knowledge surveys were relatively consistent across all levels of student performance on the final exam. Furthermore, the average knowledge survey score for the upper two-thirds of the students in both semesters was within 10% (one grade level) of the final exam score.

On average, the best students estimated their knowledge on the final exam almost perfectly in classes where knowledge surveys were always available for review, but there was a progressive tendency for weaker and weaker students to overestimate their knowledge. This result fits the description of the Dunning–Kruger effect, which has been shown in other recent research studies to be a robust effect across several domains of self-evaluation. The finding of overestimation by the weaker students is expected to be a general phenomenon that affects the results of knowledge surveys in any discipline. Although better students, who tend to estimate more accurately, appear to receive instructional aid from the use of knowledge surveys for review, it is hypothesized that other students who persist in overestimating their own competence are incapable of benefiting from the ordinary use of knowledge surveys and require other models of intervention (e.g., see ref 10). The unique methodology of knowledge surveys makes it clear that the Dunning–Kruger effect found in this study stems directly from students’ judgments about their own mastery of the problems and assignments of the course and not from their expectations about the grades that they might be assigned on their work. That is, this study suggests that the Dunning–Kruger effect is based upon efficacy expectations independent of outcome expectations.

Users of knowledge surveys would want to develop a way to counsel individual students. One way might be to determine a score below which students should be concerned about their potential difficulty in the class. Although this might alarm some fraction of the better students, it would enable the group who tend to overestimate their knowledge to be aware that they are in jeopardy. For example, if a postKS score of 74 were used for Chem 110A, approximately 77% of the upper and lower thirds of the class would be correctly advised about their performance on the final exam. Administering the knowledge survey midway through the course could permit a score to be determined that would prompt students to make changes in their approach to reviewing for the course.

Other authors have proposed methods for early identification of overestimating students to trigger intervention.7,8 Once these students have been identified, it is not at all clear what types of interventions will be successful with them. Sandi-Urena, Cooper, and Stevens, for example, have recently proposed a multistep intervention that increases not only metacognitive sensitivity in students but their problem-solving skills as well.15 Additional research needs to be performed to determine whether this kind of intervention would help this group of students. Evaluating the effect of online administration of the survey, as well as determining the value of more frequent delivery (before midterms), would be a worthy topic of additional study, as these changes might enhance the effectiveness of the students’ metacognition. The role of instructional modes and testing styles, as well as the influence of survey authorship, would also be areas for additional study.
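The cutoff-score advising idea mentioned above can be evaluated by counting how many upper- and lower-third students a given postKS threshold would flag correctly. The sketch below uses hypothetical score arrays and a hypothetical helper, advising_accuracy; it illustrates the bookkeeping only and is not the analysis behind the 77% figure reported above.

```python
import numpy as np

def advising_accuracy(post_ks, final_exam, cutoff):
    """Fraction of upper- and lower-third students correctly flagged by a KS cutoff."""
    post_ks = np.asarray(post_ks, dtype=float)
    final_exam = np.asarray(final_exam, dtype=float)
    thirds = np.array_split(np.argsort(final_exam), 3)
    lower, upper = thirds[0], thirds[2]
    # Correct advice: lower-third students fall below the cutoff (flagged as at risk),
    # upper-third students fall at or above it (not flagged).
    correct = (post_ks[lower] < cutoff).sum() + (post_ks[upper] >= cutoff).sum()
    return correct / (len(lower) + len(upper))

# Hypothetical data; in practice these would be the Chem 110A postKS and final exam scores.
rng = np.random.default_rng(4)
final = rng.uniform(30, 100, 164)
ks = np.clip(final + rng.normal(8, 14, 164), 0, 100)
print(round(advising_accuracy(ks, final, cutoff=74), 2))
```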

ASSOCIATED CONTENT

Supporting Information. Table of matched questions from Chem 110A final exams and knowledge surveys; knowledge surveys for Chem 110A and Chem 110B. This material is available via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].




ACKNOWLEDGMENT

The authors wish to thank Lori Smith for processing the raw knowledge surveys, as well as the Faculty of Whittier College for a Research Grant. We thank Edward Nuhfer for his invaluable help in calculating and verifying Cronbach’s α and for his constructive assistance with the manuscript.

REFERENCES

(1) Nuhfer, E. B.; Knipp, D. The Knowledge Survey: A Tool for All Reasons. To Improve the Academy 2003, 21, 50–78.
(2) Taxonomy of Educational Objectives; The Classification of Educational Goals: Handbook 1, Cognitive Domain; Bloom, B. S., Ed.; Longmans, Green: New York, 1956.
(3) Wirth, K. R.; Perkins, D. Knowledge Surveys: An Indispensable Course Design and Assessment Tool. Presented at Innovations in the Scholarship of Teaching and Learning at Liberal Arts Colleges, St. Olaf, MN, 2005.
(4) MERLOT ELIXR Home Page. http://elixr.merlot.org/ (accessed Aug 2011).
(5) Weinert, F. E. In Metacognition, Motivation, and Understanding; Weinert, F. E., Kluwe, R. H., Eds.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1987; p 8.
(6) Rickey, D.; Stacy, A. M. J. Chem. Educ. 2000, 77, 915–920.
(7) Cooper, M. M.; Sandi-Urena, S.; Stevens, R. Chem. Educ. Res. Pract. 2008, 9, 18–24.
(8) Potgieter, M.; Ackermann, M.; Fletcher, L. Chem. Educ. Res. Pract. 2010, 11, 17–24.
(9) Cooper, M. M.; Sandi-Urena, S. J. Chem. Educ. 2009, 86, 240–245.
(10) Sandi-Urena, S.; Cooper, M. M.; Stevens, R. Int. J. Sci. Educ. 2010, 1, 1–18.
(11) Schraw, G.; Brooks, D. W.; Crippen, K. J. J. Chem. Educ. 2005, 82, 637–640.
(12) Tsai, C. J. Chem. Educ. 2001, 78, 970–974.
(13) Bowers, N.; Brandon, M.; Hill, C. Cell Biol. Educ. 2005, 4, 311–322.
(14) Jordan, J. J. Stat. Educ. 2007, 15 (2); http://www.amstat.org/publications/jse/v15n2/jordan.html (accessed Aug 2011).
(15) Knipp, D. Knowledge Surveys: What Do Students Bring To and Take From a Class? http://web.archive.org/web/20100529084647/http://www.isu.edu/ctl/facultydev/KnowS_files/KnippUSAFA/KSKNIPPUSAFA.html (accessed Aug 2011).
(16) Nuhfer, E. B. J. Geosci. Educ. 1996, 44 (4), 385–394.
(17) Kruger, J.; Dunning, D. J. Pers. Soc. Psychol. 1999, 77, 1121–1134.
(18) Ehrlinger, J.; Johnson, K.; Banner, M.; Dunning, D.; Kruger, J. Organ. Behav. Hum. Decis. Process. 2008, 105, 98–121.
(19) Isaacson, R. M.; Fujita, F. J. Scholarship Teach. Learn. 2006, 6, 39–55.
(20) Mabe, P. A.; West, S. G. J. Appl. Psychol. 1982, 67, 280–296.
(21) Nuhfer, E. Private communication, June 4, 2010.
(22) Cashin, W. E. Student Ratings of Teaching: A Summary of the Research; IDEA Technical Report No. 20; Center for Faculty Evaluation and Development, Kansas State University: Manhattan, KS, 1988; pp 1–6.
(23) Antonietti, A.; Ignazi, S.; Perego, P. Br. J. Educ. Psychol. 2000, 70, 1–16.
(24) Dunning, D. Self-Insight: Roadblocks and Detours on the Path to Knowing Thyself (Essays in Social Psychology); Psychology Press: New York, 2005.
(25) Bandura, A. Self-Efficacy: The Exercise of Control; W. H. Freeman: New York, 1997.
(26) Bandura, A. Psychol. Rev. 1977, 84, 191–215.

