Research: Science and Education

Chemical Education Research, edited by Diane M. Bunce, The Catholic University of America, Washington, D.C. 20064

The Significance of Accurate Student Self-Assessment in Understanding of Chemical Concepts Susan D. Wiediger and John S. Hutchinson* Department of Chemistry, Rice University, Houston, TX 77251-1892; *[email protected]

An important difference between an expert and a novice in chemistry is the ability to use the language and symbolic representations of chemistry accurately (1). A novice or intermediate student of chemistry may have difficulties in the proper use of terms and in the proper descriptions of concepts, in part reflective of an incomplete understanding of those concepts. These difficulties often manifest themselves in significant differences in student responses when questions are asked in different ways. Recent studies demonstrate that different methods of examination reveal different levels of understanding by students (2). This seems intuitively clear. What is less obvious is how these difficulties with language affect students’ ability to self-assess their understanding of new material as a precursor to deepening their understanding. In this paper, we present our observation of students’ self-assessment capability. As part of a series of pretests administered in the first semester of General Chemistry at Rice University (Wiediger, S. D.; Cramer, R. D.; Hutchinson, J. S. J. Chem. Educ., in preparation), a new interactive question type provided an opportunity to look directly at this ability and its relationship to success on content questions. We argue that to move beyond rote memorization of any topic, development of the ability to evaluate one’s own work (and that of others) is essential.

The Three-Stage Questions

We developed a series of pre-instruction diagnostic quizzes for our General Chemistry course at Rice, with the primary goal of determining our students' knowledge of chemical concepts on entry to the course. To avoid overwhelming or intimidating students on the first day of class, quizzes were 10–15 minutes in length and given before each section of the course (nine in all). Most of the questions on these quizzes are either multiple choice or free response, and we requested that students do the quiz before starting the reading for the next section. The quizzes, given in a Web-based format, were written at Rice using a commercial software package¹ for Web-based data collection. Because our quizzes are created at Rice, we were able to develop a novel "three-stage question" format that reveals a great deal about the depth of student comprehension of chemical concepts. An example of a three-stage question is shown in Figure 1, and the box lists all 14 questions of this style that were asked. We will use data from three of the questions to illustrate our analysis.

The first stage in each question was a free-response question such as "Define electronegativity" or "Complete the Lewis structure for methanol." For the 5 questions marked with asterisks (see box), students were asked to indicate their familiarity with the item and to provide a free-response answer only if they felt able to (i.e., if they did not feel able to define the term, we did not ask them to). This approach was used for term identification, where we were interested in whether students had been exposed to the term at all, even if they could not define it, as well as for a molecular orbital energy diagram that we anticipated would be unfamiliar to most, and we wanted more than just blank answers. The other 9 questions included an option in the second stage for students to indicate that they had left the first-stage question blank.

The second stage presented students with five to eight options and asked which option most closely resembled their free-response answer. In other words, we asked the students to find their previous free-response answer in a multiple-choice list. Students did not see the multiple-choice options until after the free-response answer had been submitted on a previous Web page. These options were usually developed from free-response answers to similar questions the year before; "none of the above" was generally one of the options. One illustration of this self-assessment is shown in the second stage in Figure 1.

Figure 1. Example of a three-stage question. This question about entropy is the type that asks about "familiarity". Students who answered at one of the other three levels of familiarity were not required to give a free-response answer; they were still asked to choose a definition from a list on the follow-up screen, but with a different prompt and without the blue column (here represented in gray). The answers depicted are those of a student who had an incorrect free-response answer, a correct multiple-choice answer, and positive agreement (as discussed in the "Considering Self-Assessment" section of this paper). These questions would be part of a "diagnostic quiz" of approximately 5 to 10 questions.

Topics of Three-Stage Questions Asked

Definitions
  Valence*
  Electronegativity*
  VSEPR*
  Entropy*†

Skills
  Lewis structures
    Methanol
    Boron trifluoride†
    Hydrogen cyanide
    Carbamic (or aminoformic) acid
  Molecular geometries
    Tetrahedral†
    Trigonal pyramidal
    Octahedral
    Trigonal bipyramid
  Molecular orbital diagrams
    Elemental oxygen (O2)*

Concepts
  Relationship between electronic structure and lamp color (Ar vs Ne)

NOTE: All of the diagnostics are available online at http://chemed.rice.edu, for those interested in the actual appearance of these questions. *Questions asking students' level of familiarity are marked with asterisks. †Daggers mark questions whose data are used for illustrations in the text.

In the third stage, the students were asked to select what they thought was the correct answer from the same five to eight options. They had access to their free-response answers during the second and third stages; rare cases where students modified their first-stage answer after seeing the choices offered were detected and discarded. Students' answers were confidential. Although participation earned extra credit, the correctness of students' answers had no effect on their grades.
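Before turning to the analysis, it may help to picture the data generated by each three-stage question. The following sketch is a minimal, hypothetical representation in Python (not the Web software described in the Note) of one student's record, showing one way the correctness of each stage and the "agreement" measure introduced later in the paper could be derived; all field names and the example answer text are ours, for illustration only.

    # Illustrative sketch only: a possible record of one student's three-stage
    # response. Field names, the example answer text, and the scoring rule for
    # the free response are hypothetical, not the authors' actual implementation.
    from dataclasses import dataclass

    @dataclass
    class ThreeStageResponse:
        free_response: str     # stage 1: student's own written answer
        self_category: str     # stage 2: option the student says best matches stage 1
        chosen_answer: str     # stage 3: option the student believes is correct
        expert_category: str   # expert's categorization of the stage-1 answer
        correct_option: str    # the keyed correct option for this question

        @property
        def fr_correct(self) -> bool:
            # one reasonable scoring rule: the free response counts as correct
            # if the expert places it in the keyed category
            return self.expert_category == self.correct_option

        @property
        def mc_correct(self) -> bool:
            return self.chosen_answer == self.correct_option

        @property
        def positive_agreement(self) -> bool:
            # student and expert categorize the stage-1 answer the same way
            return self.self_category == self.expert_category

    # Example mirroring the student depicted in Figure 1: wrong free response,
    # correct multiple choice, positive agreement (answer text invented).
    student = ThreeStageResponse(
        free_response="Entropy is the energy stored in chemical bonds.",
        self_category="C", chosen_answer="A",
        expert_category="C", correct_option="A",
    )
    assert not student.fr_correct and student.mc_correct and student.positive_agreement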

Analysis

First Pass

A basic comparison of the multiple-choice and free-response questions is to check whether the two are correlated. The results, illustrated in Figure 2, suggest the following unsurprising conclusions:

1. The majority of students answer consistently: if their free-response answer is correct, then their multiple-choice answer is correct; if one is wrong, then so is the other.

2. For all but two questions, more students get the multiple choice correct, in many cases by large margins, suggesting that the options trigger recognition of the correct answer. For the terms valence and entropy, more students get the free response correct, suggesting that the multiple-choice options are distracting.

3. Overall performance is poor; many students can answer few of the questions correctly.

4. Statistical analysis (chi-square) shows that the free response and multiple choice are dependent to better than the 99% confidence level (with the exception of VSEPR and the MO diagram of O2, which are difficult questions and have small sample sizes of fewer than 30 students). A sketch of this kind of chi-square comparison follows the list.
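As a minimal illustration of this chi-square comparison (not our actual analysis code), the following sketch tests whether free-response and multiple-choice correctness are independent for a single question; the counts are invented placeholders, not the class data shown in Figure 2.

    # Hedged sketch: chi-square test of whether free-response and multiple-choice
    # correctness are related for one question. Counts are placeholders only.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: free response correct / incorrect; columns: multiple choice correct / incorrect.
    observed = np.array([[40,  5],
                         [20, 35]])

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, p = {p:.4g}, dof = {dof}")
    # A p-value below .01 corresponds to better than 99% confidence that
    # performance on the two versions of the question is not independent.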

An initial conclusion to be drawn from these comparisons is that, even though some students correctly answered only one of the question versions, overall the multiple-choice results are strongly correlated with free-response results. This suggests that, as one would hope, the two types of questions test the same knowledge. This would be reassuring for those instructors who, owing to class size or other constraints, use multiple-choice tests. However, delving further into differences between the questions suggests a subtle but significant weakness in this reasoning.

Figure 2. Comparisons of free-response and multiple-choice correctness for three questions: defining entropy, drawing a tetrahedral molecular geometry, and completing the Lewis structure for boron trifluoride. Each matrix sums to the number of students taking a particular diagnostic; note that the total numbers are smaller for entropy because only students who provided a definition are included. P-values less than .01 for chi-square calculations indicate better than 99% confidence that the questions are related—i.e., that such a distribution of numbers could not have occurred by chance. The tiny p-values shown here indicate strong correlation between the two forms of question, as might be anticipated.

Considering Self-Assessment

Recall that in the second stage of these questions, students were asked to select from a multiple-choice list the answer that most closely resembled their free-response answer—in other words, to assess their answer. When the expert grades the free response using the same categorization process, any difference between the choice made by the student and the choice made by the expert reveals a difference in assessment, potentially highlighting the difference in viewpoint between the expert and the novice. Where the expert and the student categorize the student's free-response answer the same way, we call this "positive agreement"; "negative agreement" indicates that expert and student chose differently in stage two. This is not the same as the correctness of the free response. For example, the sample answer shown in Figure 1 has a wrong free-response answer, correct multiple-choice answer, and positive agreement. Had the student made any choice other than "C" as the best match to his or her answer, then Figure 1 would illustrate negative agreement.

Figure 3. Comparison of free-response and multiple-choice answers when split by agreement (see definition in the text). For the other 11 three-stage questions, all positive agreement comparisons have p values less than .005 (except for VSEPR and the MO of O2), whereas all negative agreement comparisons have p values greater than .25 (except valence, which could not be calculated).

Figure 3 shows the results of using agreement to split the data in Figure 2. The bottom right box in each triad shows the correlation of multiple-choice and free-response questions for positive agreement cases. The correlation as measured by chi-square is extremely good, again suggesting that for these students, as it appeared for the whole class, the paired questions are testing the same information. On the other hand, the results for negative agreement cases are shown in the bottom left box of each triad. This is generally a smaller but still significant number of students. P values for most of the questions are in the .8 to .9 range, indicating that the free-response and multiple-choice versions are uncorrelated. This is extremely counter-intuitive. How could this be true?

One possible interpretation of the data in Figure 3 might suggest that most of those in the negative-agreement category also get the answer wrong. This would imply that there is a connection between students' ability to assess themselves "as experts" and their knowledge of the content; this may also be confounded with their ability to express themselves in the language of chemistry. Thus, these students may be essentially blank slates that do not know the content, do not know how to express what knowledge they do have, and/or do not know how to assess what they have written down. However, if this were the case, then we would predict a correlation between the agreement and the correctness of student answers. Table 1 collects the results of such an analysis; for most of the three-stage questions, there is no such correlation.

If agreement is a measure of "chemistry language ability", then it seems consistent that more free-response questions are correlated to agreement, since generating a correct answer presumably requires a higher language skill level than recognizing one. Arguably, one can also identify the problems where correctness and agreement are correlated as more basic problems that are generally covered in any curriculum. Perhaps these are the questions where student understanding is solid, the knowledge assimilated into students' conceptual framework.

While it might be expected that the ability to communicate in the language of chemistry should be related to content knowledge, our data do not support a close connection. In order for language and content to show a close correlation, most students who know the language would also need to correctly answer the question. This suggests that the ability to communicate in the language of chemistry, while not independent of content knowledge, is not necessarily closely related to that knowledge—at least not correct content knowledge. Understanding chemistry language is a necessary but not sufficient skill. Neither content nor the ability to communicate is sufficient alone.
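The splitting behind Figure 3 can be sketched in the same spirit: group the students by agreement and repeat the chi-square test within each group. The records below are hypothetical placeholders; only the procedure is meant to mirror the analysis described above.

    # Hedged sketch of the Figure 3 analysis: split students by agreement, then
    # test free-response vs multiple-choice correctness within each group.
    # The response records here are hypothetical placeholders.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Each record: (free_response_correct, multiple_choice_correct, positive_agreement)
    records = [
        (True,  True,  True),  (False, True,  True),  (False, False, True),
        (True,  False, False), (False, True,  False), (False, False, False),
        # ... one tuple per student for a given question
    ]

    def contingency(group):
        # Build the 2x2 matrix of free-response vs multiple-choice correctness.
        table = np.zeros((2, 2), dtype=int)
        for fr, mc, _ in group:
            table[0 if fr else 1, 0 if mc else 1] += 1
        return table

    for label, agree in (("positive agreement", True), ("negative agreement", False)):
        group = [r for r in records if r[2] == agree]
        table = contingency(group)
        if (table.sum(axis=0) > 0).all() and (table.sum(axis=1) > 0).all():
            chi2, p, dof, _ = chi2_contingency(table)
            print(f"{label}: n = {table.sum()}, p = {p:.3f}")
        else:
            print(f"{label}: chi-square not computable for this group")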


Discussion

Comparing the students' choices in the second stage of the questions to an expert's choices provides insight into students' self-assessment and chemistry language skills. This is logical, and corresponds with the differences seen between experts and novices in other types of tasks, such as grouping physics problems (3). However, what we have focused on here is the correlation between having an expert-like view (positive agreement) and correct content knowledge. This comparison leads to a less simple interpretation.

For all but two of the questions asked, the number of students incorrectly answering the free-response version of the question but correctly answering the multiple-choice version was larger than the opposite combination. What might be the cause of this difference? Some possible reasons why students might fare better on multiple-choice questions are:

1. Blind luck and test-taking strategies are more likely to result in a correct answer when choosing among options rather than filling in a blank.

2. The options given in a multiple-choice question help a student to index a forgotten bit of information in long-term memory.

3. A student has a general idea of the correct answer and cannot accurately express it in the language or symbolic representations used by an expert chemist, but can recognize it when given a list.

4. A student with poor understanding may nonetheless recognize something that "looks right"—i.e., looks like the kind of answer she or he has seen before.

All four of these causes apply to any test. Additionally, causes 3 and 4 depend on a student's understanding of the "language" of chemistry. In cause 3, students lack the appropriate language to express what they know, but can recognize it, akin to being able to read but not speak a foreign language. In cause 4, the student can make an educated guess based on a cursory familiarity with the language itself, even though the conceptual understanding is lacking. Consequently, it is of great significance to understand the role of language and representation in students' understanding of and response to exam questions.

The results presented above show clearly that although there is no consistent correlation between agreement and correctness, separating the students by whether they have positive or negative agreement also separates them into a group for whom free-response and multiple-choice questions appear correlated and a group for whom they do not. While it may seem farfetched that questions differing only in format could be unrelated, we believe that this happens for these students because they do not speak the language required to understand or answer the questions. In other words, relating answers to exam questions to a student's knowledge of a subject depends on the student's already knowing enough of the language to adequately convey that knowledge. For students who know enough of the language of chemistry to communicate what they know, multiple-choice and free-response questions do indeed probe the same area of knowledge. However, students who are not fluent with the language cannot consistently express themselves. As a consequence, there exists no correlation between the answers they write and the answers they select from a list. Their answers might as well be random guesses.

For students with positive agreement, free-response and multiple-choice questions appear correlated. This suggests that the two versions of the question are testing the same information. The student might answer incorrectly, but they know what they are saying, and presumably could be persuaded of their error. For students with negative agreement, there is no correlation between their free-response answers and their multiple-choice answers on the same question. This lack of consistency in answering most probably reflects a lack of understanding of the language or symbolic representations of chemistry. A student with such a lack of understanding will not even recognize his or her own answer in a list. This weakness in self-assessment capabilities therefore characterizes students who may be unable to know whether their answer is correct. This has profound implications for a student's ability to monitor his or her learning, both in and out of the classroom.

Table 1. Agreement vs Correctness for Free-Response and Multiple-Choice Questions

    Question              p
    VSEPR                 .984
    Ar/Ne                 .980
    Electronegativity     .940
    Octahedral            .885
    Trigonal bipyramid    .861
    Trigonal bipyramid    .806
    Valence               .798
    Octahedral            .685
    HCN                   .639
    O2 MO                 .579
    VSEPR                 .473
    O2 MO                 .471
    Entropy               .437
    Valence               .325
    Carbamic acid         .288
    Carbamic acid         .209
    Trigonal pyramid      .176
    Electronegativity     .087
    Entropy               .067
    Ar/Ne                 .067
    Methanol              .048
    BF3                   .038
    Tetrahedral           .022
    Trigonal pyramid      .008
    BF3                   .001
    Methanol              .000
    HCN                   .000
    Tetrahedral           .000

NOTE: Each question appears twice, once for its free-response version and once for its multiple-choice version; questions are ranked by chi-square p-value. Questions for which p ≤ .05 (the last eight entries) show correlation between the correctness of the students' answers and agreement (whether the students' choices in stage two match an expert's choices); for p > .05, agreement and correctness are statistically independent.
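A ranking such as Table 1 could be assembled along the following lines: for each version of each question, build a 2 × 2 table of agreement against correctness, compute a chi-square p-value, and sort. The tallies and labels below are invented placeholders, not the published data.

    # Hedged sketch of how a Table-1-style ranking could be produced.
    # Tallies are placeholders; question labels are illustrative only.
    from scipy.stats import chi2_contingency

    # question -> 2x2 counts: rows = agreement (positive, negative),
    #                          columns = answer (correct, incorrect)
    tallies = {
        "Entropy (placeholder)":     [[12, 30], [3, 10]],
        "Tetrahedral (placeholder)": [[45, 10], [5, 25]],
        "VSEPR (placeholder)":       [[8, 9], [4, 5]],
    }

    results = []
    for question, table in tallies.items():
        chi2, p, dof, _ = chi2_contingency(table)
        results.append((question, p))

    # Rank from largest to smallest p-value, as in Table 1.
    for question, p in sorted(results, key=lambda item: item[1], reverse=True):
        flag = "correlated" if p <= 0.05 else "independent"
        print(f"{question:28s} p = {p:.3f}  ({flag})")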


Further Implications

These results strongly suggest the need for teaching approaches that develop students' ability to critically self-assess their own answers and provide opportunities for practicing the languages of chemistry. Students who cannot recognize their own answers have not thought critically about their understanding of those answers. In conventional instruction, there are many opportunities for students to have graders examine and find their errors, but few requirements for students to examine and find their own errors.

Much has been written about the utility of active-learning and cooperative-learning approaches. Our data reinforce these arguments. Students who have opportunities to speak in class or to speak to one another should more readily develop the appropriate self-assessment, by challenging their own understanding at a more critical level.

Learning a second language provides a useful analogy. Students with only a beginning knowledge of a new language may easily confuse similar questions; when asked, "What is your name?" they may answer with "It's two-thirty in the afternoon." Such an answer arises from memorizing a script of the language without understanding it. By contrast, a more advanced student may have no problem recognizing the question and answering "My name is Joe," but might still make errors in grammar or spelling that may or may not be corrected when given options in a multiple-choice question. The lesson of the foreign language analogy is that constant demands to function actively in a language quickly integrate and extend one's capabilities. Effective teaching in foreign languages, as may also apply in chemistry, involves more practice and experience via more conversation. Chemistry classes are rather like immersion language programs, in which students are given the basics such as Lewis structures or definitions and then expected to become fluent through use in homework and more advanced topics.

This level of demand would best be integrated in a chemistry course through the many techniques developed and grouped under the umbrella of active learning. Opportunities for the students to speak and interact provide needed real-time feedback that is not possible with written homework and exams. In a passive mode of education, students have little opportunity to improve communication skills (4). Opportunities for students to speak (and listen to other students) are, in our experience, particularly effective in revealing to students topics where their comprehension is weak. Anyone who has taught is familiar with, and learns to recognize, the anxiety that comes with trying to discuss a subject when on intellectual thin ice. This anxiety is a powerful motivating force for critical self-assessment. When forced to discuss chemistry orally, students can learn to use those sensations of anxiety as indications of a need for self-assessment. This can be accomplished by interactive approaches in the classroom, via encouragement of group study, and via many other active-learning approaches.

Experts and novices in chemistry differ in their command of the language and representations in chemistry. Developing the skills of self-assessment, to know when one doesn't know, is essential to making the transition from novice to expert.

Note

1. ColdFusion 4.5, a product of the Allaire Corporation, is a Web application server that allows dynamic generation of Web pages and handles database connections.

Literature Cited

1. Bransford, J. D.; Brown, A. L.; Cocking, R. R. How People Learn: Brain, Mind, Experience, and School; National Academy Press: Washington, DC, 1999.
2. Jones, M. G.; Carter, G.; Rua, M. J. J. Res. Sci. Teach. 2000, 37, 139–159.
3. Chi, M. T. H.; Feltovich, P. J.; Glaser, R. Cognitive Science 1981, 5, 121–152.
4. Wright, J. C.; Millar, S. B.; Kosciuk, S. A.; Penberthy, D. L.; Williams, P. H.; Wampold, B. E. J. Chem. Educ. 1998, 75, 986–992.

Journal of Chemical Education • Vol. 79 No. 1 January 2002 • JChemEd.chem.wisc.edu