Comparing Chemistry Faculty Beliefs about Grading with Grading Practices

J. Chem. Educ. 2012, 89, 326−334. dx.doi.org/10.1021/ed1000284. Publication Date (Web): January 26, 2012.

Jacinta Mutambuki† and Herb Fynewever*,‡

†Department of Chemistry, Western Michigan University, Kalamazoo, Michigan 49008, United States
‡Department of Chemistry and Biochemistry, Calvin College, Grand Rapids, Michigan 49546, United States




ABSTRACT: In this study, we seek to understand the beliefs that chemistry faculty hold when grading student solutions in problem-solving situations. We are particularly interested in examining whether a conflict exists between chemistry faculty beliefs and the scores they assign to students' solutions. The three categorical values identified in a similar physics education study were evident as themes in our study: (i) a desire to see students' explanations of their reasoning; (ii) a reluctance to deduct points from a student's answer that might be correct; and (iii) projection of correct thought processes onto a student solution, even when the student does not explicitly show those thought processes. The scoring of each student solution depended significantly on the weight an instructor gave to each theme. In situations where a participant expressed all three themes, the conflict was resolved by laying a burden of proof on either the student or the instructor. In this study, a sizable minority of the participating faculty acted inconsistently: they stated that they valued students showing their reasoning when solving problems, but they graded student work in a way that would discourage students from showing their reasoning.

KEYWORDS: First-Year Undergraduate/General, Chemical Education Research, Testing/Assessment

FEATURE: Chemical Education Research



INTRODUCTION

Although grading may be perceived as a tedious activity, it serves as an important tool in both teaching and learning. The publication of a large number of articles in this Journal is indicative of the seriousness with which chemical educators take grading. In particular, there have been many articles on how to use technology to provide valid grades even in the face of a large grading load;1 how alternative grading schemes (e.g., pass−fail versus traditional letter grades) can affect student achievement;2 and how to use rubrics and common grading schemes to achieve grading that is consistent with learning goals and consistent between instructors.3 Our work builds on this foundation by interviewing faculty about what they value in student work and then having them think aloud as they assign grades to purposefully constructed, archetypical "student" solutions. In this way, we probe what instructors explicitly tell their students they value in a problem solution and then examine whether the messages sent by their grading practices reinforce these values.

Scriven4 identifies six reasons why grading is important:
1. To describe unambiguously the worth, merit, or value of the work accomplished.

2. To improve the capacity of students to identify good work, that is, to improve students' self-evaluation or discrimination skills with respect to work submitted.
3. To stimulate and encourage good work by students.
4. To communicate the teacher's judgment of the student's progress.
5. To inform the teacher about what students have and have not learned.
6. To select students for rewards or continued education.

Haagen5 argues that an instructor's grading practice is of interest and importance not merely to the student himself, but also to parents, college administrators, bodies that award scholarships, professional and graduate schools, potential employers, and researchers associated with any of these groups. In the grading process, an individual instructor evaluates a variety of aspects of student work, including the quality of thinking, grasp of factual detail, integrative ability, evidence of student preparation, and understanding.6 The practice of grading requires judgments; the instructor's judgments serve the student as an aid to learning and also as a source of self-knowledge.5


Table 1. Description and Comparison of the Students' Solutions^a

Characteristic | A | B | C | D | E
Qualitative Reasoning | No evidence | Evidence | Evidence | Evidence | No evidence
Organization | Disorganized | Somewhat logical | Logical | Logical | Unclear organization
Final Answer | Incorrect | Incorrect | Incorrect | Correct | Correct
Reasoning | Unclear | Unclear | Clearly incorrect | Clearly incorrect | Unclear
Evaluation of the Final Answer | No | No | Yes | No | No
Correct Mathematics | Yes | No | Yes | Unclear | Yes
Evidence of Hierarchical Knowledge Organization | None | Unclear | Solution could be based on fundamental principles | Solution based on fundamental principles | None
Evidence of Metacognition | None | Categorizes problem | Planning via subproblems, reflection on progress | Planning via subproblems | None
Avg. Score Given | 2.5 | 5.6 | 6.6 | 7.8 | 5.1
Standard Deviation | 1.6 | 1.7 | 2.8 | 1.8 | 2.7

^a Solution characteristics adopted from Table I of ref 20.

The instructor's judgments may be influenced by the beliefs they hold, especially about the purpose of student work and what should be valued in student work.7 We are mindful that the term "belief" is a loaded one with an extensive literature in teacher education research.8−21 For the purposes of our study, we directly compare what instructors say they believe about the purposes of grading and what should be valued in student work with how they go about grading in practice.

While many studies investigate teacher beliefs about teaching and learning, very few studies of teacher beliefs and grading practices have been documented. Henderson et al.22 indicated the existence of a gap between physics faculty beliefs and their scoring of student solutions. The authors identified three themes that guide grading decisions:22
1. A desire to see student reasoning in order to know whether a student really understands.
2. A: A reluctance to deduct points from solutions that might be correct; and B: a tendency to deduct points from solutions that are explicitly incorrect.
3. A tendency to project correct thought processes onto a student solution even when the thought processes are not explicitly expressed.

Their data analysis revealed an inconsistency between the stated beliefs and the scores assigned to the students' solutions among the physics faculty.22 This inconsistency arises mainly when the stated belief manifest in Theme 1 is not followed in grading practice because Themes 2 and 3 are favored instead. For example, instructors state that they want students to show their reasoning (Theme 1) but then take off points for explicit mistakes (Theme 2B) and give credit for work that might be correct even when thought processes are not made explicit (Theme 3). Henderson et al. use the "burden of proof" construct to explain how the instructors, consciously or unconsciously, resolved the conflict when an instructor expressed at least two of these values.22 The construct assigns the burden of proof either to the student or to the instructor. A burden of proof on the instructor, as defined by these authors, means that, in order to deduct points while grading a student's solution, the instructor needs explicit evidence that the student used incorrect knowledge or followed incorrect procedures. A burden of proof on the student means that there must be explicit evidence that the student used correct knowledge and followed correct procedures in order to earn points.22 Their data analysis revealed that most physics instructors placed the burden of proof on themselves, a practice likely to discourage students from showing their reasoning.22

The present study replicates the study of Henderson et al. with faculty in chemistry. The purpose of this study is to examine whether a conflict exists between the chemistry faculty's stated beliefs about what students should show in their work and their observed grading practices. The specific research questions that guided this study are:
• How do chemistry faculty resolve the conflict, if any, arising from their expressed beliefs when assigning a score to a student's solution?
• Are chemistry faculty more likely to place the burden of proof on themselves or on the student when assigning a score?

To elicit the faculty beliefs, five students' solutions were presented to the instructors for grading. The five students' solutions were constructed from actual student work on a final semester exam in general chemistry and were designed to replicate the students' solutions in Henderson et al.'s study.22 A description of the solutions is given in Table 1. The results presented in this paper discuss two of the five students' solutions presented to the chemistry faculty. As was done in the physics literature, we focus on two students' solutions: student solution D (SSD) and student solution E (SSE). The basic principle behind this choice is one of isolation of variables. SSD and SSE are the only two solutions in which the students may have done exactly the same work and where both have come up with the correct final answer. The only differences between the two are the degree to which the students make their process clear (e.g., explicitly including the chemical equation, labeling steps, including units, etc.). And so, the extent to which an instructor values SSD above SSE reveals the extent to which the instructor values students making their reasoning clear. The converse is also true: the extent to which an instructor values SSE over and above SSD reveals the extent to which the grader does not value explicit work if it has some errors in it. On the whole, contrasting SSD with SSE asks the question: given that the final answers are the same and all of the calculations are the same, does the instructor put the burden of proof on the student (and penalize SSE for not showing his or her reasoning) or take the burden of proof on himself or herself (and give SSE the benefit of the doubt)? As we shall see, however, assigning the burden of proof was difficult to do in some cases.


Figure 1. Interview problem and the two student solutions (SSD and SSE) discussed in this paper. Boxes in the right margin of SSD indicate errors in this solution.

Context: Formative Assessment Paradigm

Our two research questions are based upon the context provided by research in formative assessment (sometimes synonymous with classroom assessment or assessment for learning). The premise of formative assessment is that teaching and learning are best accomplished within the context of two-way communication between instructor and students.23−25 This communication typically includes defining the learning target, measuring learning while it is happening, giving feedback to both students and instructor during the learning process, and making timely adjustments of teaching and learning activities based on the feedback received. Grading practices are common elements of this formative assessment cycle in that the instructor can learn about student understanding while grading. Also, students can get feedback on their learning through the comments given and grades assigned by the grading instructor. Furthermore, students may receive mixed messages if instructors' grading practices do not line up with the expectations instructors communicate to students.

METHODOLOGY

Research Site and Participants

This study was conducted in the chemistry department of a regional research university in the midwestern United States. The department offers a typical two-semester introductory general chemistry sequence. The first semester of this sequence is taken by approximately 10% of the first-year class, along with a small number of upper-division students, and is geared toward science (including health sciences) and engineering majors. It is taught by faculty in a mostly lecture format, with several sections of approximately 200−250 students per lecture section and a co-requisite laboratory of 20−30 students per section, each laboratory section taught by teaching assistants. Most faculty instructors in the course assign weekly Web-based homework assignments, which include many numerical problems similar to the end-of-chapter problems in a general chemistry textbook. Quizzes and written examinations resemble these assignments, but are often in a multiple-choice format to ease the grading load, as the grading for the large lecture sections is the responsibility of the faculty instructors.

Volunteer participants were recruited from among faculty who had taught an introductory chemistry course for at least five years. Ten participants took part in this study. All were tenured or tenure-track faculty, and five subdisciplines of chemistry were represented: organic, inorganic, physical, analytical, and biochemistry. Teaching experience ranged from assistant professors with five years of teaching experience to full professors with over 30 years. Subjects were invited to participate through person-to-person visits (i.e., knocking on their office door), at which time the method of the study was described and, if the subjects were willing, they were given the prompt (the problem shown in Figure 1). Each participant was asked to develop a key before a mutually arranged interview date approximately one week later.

The problem in the prompt is a stoichiometry problem typical of the sort that would be assigned in the first semester of the introductory sequence. We note that faculty would not typically grade this sort of question by hand, because the homework is Web-based and quizzes and exams are usually multiple choice. We found, however, that when presented with the student solutions during the interview, none of the faculty expressed any hesitance in grading the solutions for the purpose of this study. This is probably not surprising, as the faculty typically have experience evaluating student work in other contexts, including previous work in smaller class settings, interactions with students during office hours, and smaller upper-division classes. The research was conducted during the summer, when the instructors' teaching obligations were minimal.
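The specific quantities given in the Figure 1 prompt are not reproduced in this text, but the errors the instructors discuss later (unit conversion, the precipitation reaction, the limiting reactant, and the theoretical and percentage yield) indicate the kind of calculation being graded. The following minimal Python sketch works through a precipitation problem of that general form; the volumes, concentrations, and actual yield shown are hypothetical stand-ins, not the values from the actual prompt.

```python
# Hypothetical worked example of the kind of calculation the interview prompt asks for:
# a precipitation reaction with a limiting reactant and a percentage yield.
# BaCl2(aq) + 2 AgNO3(aq) -> 2 AgCl(s) + Ba(NO3)2(aq)

MOLAR_MASS_AGCL = 143.32  # g/mol (Ag 107.87 + Cl 35.45)

def percent_yield(vol_bacl2_mL, conc_bacl2_M, vol_agno3_mL, conc_agno3_M, actual_yield_g):
    mol_bacl2 = (vol_bacl2_mL / 1000) * conc_bacl2_M  # convert mL to L before using molarity
    mol_agno3 = (vol_agno3_mL / 1000) * conc_agno3_M
    # Each mole of BaCl2 can produce 2 mol of AgCl; each mole of AgNO3 produces 1 mol of AgCl.
    # The smaller of the two possible AgCl amounts identifies the limiting reactant.
    mol_agcl = min(2 * mol_bacl2, mol_agno3)
    theoretical_g = mol_agcl * MOLAR_MASS_AGCL
    return 100 * actual_yield_g / theoretical_g

# Hypothetical numbers: 50.0 mL of 0.100 M BaCl2 mixed with 50.0 mL of 0.150 M AgNO3,
# with 0.95 g of AgCl actually collected.
print(round(percent_yield(50.0, 0.100, 50.0, 0.150, 0.95), 1))  # -> 88.4 (percent)
```

A solution like SSD would show each of these steps explicitly (and could make an explicit error in any one of them), whereas a solution like SSE might report little more than the final percentage.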

Data Collection

Data were collected via one-on-one, 30−60 min, semistructured think-aloud interviews.26−28 The think-aloud process was useful in this study because it helped capture the participants' thoughts and views as they carried out the grading during the interview. An audio recorder and a video camera were used to capture the participants' verbal and visual responses. The interview consisted of (i) discussion of beliefs and purposes in grading students' solutions and (ii) ranking the students' solutions from best to worst and grading them on a scale of 0−10.

During the interview, the participants were first asked to explain their purposes in grading student problem solutions. This part of the interview was quite open-ended (see the online Supporting Information for the script) and did not ask the subjects to discuss the extent to which they require students to show their work. This was intentional, as we did not want to bias the grading itself one direction or another regarding this issue. The instructors were then presented with five student solutions (A, B, C, D, and E) and asked to rank them in order of the grade they would receive (from best to worst), assuming the interview problem was meant for a quiz. This was followed by assigning an actual grade for each student solution on a scale of 0−10. In the process of ranking and grading, the participants were asked to assume that the students were familiar with the instructor's normal grading practices.

The five students' solutions presented to the participants were designed based on actual student solutions to a test given by one of the authors to over 100 students. The solutions were modified, however, in order to replicate the physics study of interest. The characteristics of each solution were reported in the physics study20 and are reproduced in Table 1. These solutions reflect different approaches that students apply when problem solving and their perceived reasoning, as well as differences between novice and expert problem-solving strategies.20,29,30 Taking our lead from the physics literature, we indicated the student's errors on each solution and boxed them, as shown in Figure 1 for SSD. This practice was used simply as a timesaving device, with the intention of helping the participants evaluate each student solution more easily. Although this choice sacrifices some authenticity, we felt it was more important to replicate the physics study methodology and implement this timesaving device for the participants than to leave the solutions completely unmarked. We note that we also boxed the final answer on student solutions A−D. This had the potential to confuse participants; close examination of the transcripts, however, reveals no evidence that any of the participants were misled by this inconsistency.

Student solution D (SSD) has a somewhat detailed explication of the thought processes and reasoning, which is vital in problem solving. SSD arrives at the correct answer fortuitously, despite explicit mistakes, such as using the incorrect formula for Ba(NO3)2, not balancing the chemical equation, and writing an explicitly incorrect equation for percentage yield. On the other hand, student solution E (SSE) has the correct answer, but very little explication of the reasoning is displayed in this solution. It is important to note that the student writing SSE may or may not have made all of the same mistakes shown in SSD, but just did not show this work. In this way, SSE is quite ambiguous: the student may have made several mistakes or none at all, and it is impossible to tell with certainty from the work shown.

Data Analysis

Each participant's interview was transcribed and the transcript broken into statements that addressed each interview question.31 The transcripts were then read through to obtain a general sense of the data and to reflect on their meaning.32 This was followed by the coding process. HyperRESEARCH, a qualitative data analysis software package, was used to facilitate organization of the codes.33 The coding process began with 4 of the 10 transcripts.34 These four transcripts were independently read and coded by each of the two authors. While coding, we kept in mind the themes found in the physics literature while also allowing new themes to emerge, but we limited ourselves to those portions of the transcripts that were within the focus of our research (i.e., only those dealing with SSD and SSE). At the conclusion of the independent analysis, the authors compared their codes, noting any discrepancies, which were resolved through discussion.

The established codes from the four transcripts were then applied by both authors independently35 to the relevant sections of the remaining six interview transcripts, while still allowing for new, emerging codes to be included. All the codes were then categorized into common themes. Again, discrepancies were settled by discussion. Through this discussion, the authors merged similar themes and concluded that the emergent codes were neither prevalent enough nor different enough from those found in the physics literature to warrant inclusion in our discussion of the results.

From our independent analyses, we calculated an inter-rater reliability, Cohen's κ.36 For these calculations, we counted the number of agreements on the presence (or absence) of each theme in each interview and on where the burden of proof was assigned (on the student or on the instructor) in each interview. This analysis was performed on the data before resolving any discrepancies through discussion. The resulting κ values for the themes and for the burden of proof assignments were 0.81 and 1.00 (perfect agreement), respectively. It should be noted, however, that assigning the burden of proof was not as straightforward as this agreement might suggest. Several instructors made statements consistent with placing the burden of proof on students at one point in the interview yet later made statements placing the burden of proof on the instructor. In the end, the burden of proof was assigned based on which direction was emphasized more by the interviewee.
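As a rough illustration of the inter-rater reliability calculation described above, the sketch below computes an unweighted Cohen's κ for two coders' presence/absence judgments. This is not the authors' analysis code, and the coding vectors are hypothetical placeholders, not the study's actual codes.

```python
# Minimal sketch: unweighted Cohen's kappa for two coders who each record whether a
# theme is present (1) or absent (0) in each of 10 interviews. Hypothetical data only.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement).

    Undefined (division by zero) only if both coders assign one identical label to every item.
    """
    assert len(coder_a) == len(coder_b) and len(coder_a) > 0
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    chance = sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)
    return (observed - chance) / (1 - chance)

# Hypothetical presence/absence codes for one theme across the 10 interviews.
coder_1 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
coder_2 = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(round(cohens_kappa(coder_1, coder_2), 2))  # -> 0.8 for this made-up example
```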



RESULTS

Scoring Student Solutions

The scores assigned to the two student solutions (SSD and SSE) by the 10 instructors are shown in Figure 2. The scores assigned to these two solutions varied widely. A majority of the instructors, 8 out of 10, gave SSD a higher grade than SSE. Two instructors scored both solutions equally, and 1 of the 10 instructors gave both solutions full credit. The transcripts of the 10 participants were then analyzed to determine the reasons behind this diversity.

Figure 2. Scores assigned to the two student solutions (SSD and SSE) by the 10 instructors. The gold bars represent student solution D (SSD); the green bars represent student solution E (SSE).

Interview Analysis

It was evident from analysis of the transcripts that the scoring decisions of the 10 instructors confirm Henderson et al.'s three common themes:22
1. Instructors want to see a student's reasoning reflected in the solutions so that they can evaluate whether a student really understands the concepts learned.
2. Instructors indicate a reluctance to deduct points from a student solution that might be correct, but deduct points from a student solution that is explicitly incorrect.
3. Instructors tend to readily project correct thought processes onto a student solution even when the student does not express her (or his) thought processes explicitly.
Every instructor expressed at least one of the three themes as they graded the students' solutions. For purposes of clarity in our discussion of themes, we have numbered the first part of Theme 2 as 2A, that is, "Instructors indicate a reluctance to deduct points from a student solution that might be correct"; the second part, "Instructors deduct points from a student solution that is explicitly incorrect", is referred to as 2B.

Theme 1

All the instructors expressed this theme. They pointed out that explication is important in problem solving because it could help them diagnose whether or not the student understood the concepts learned, as well as provide guidance in correcting a student's mistakes. This theme was often expressed when comparing each student's approach to solving the problem in question. For instance, Instructor 7 said:

I appreciate student solution D [SSD] because it does give me a chance to better understand what the student was thinking as they did the problem. It doesn't necessarily change how I would grade it, but at least my ability to interpret whether they are in need of some guidance, I think is much easier. For student E, I wouldn't and it's not so much from these students' solutions but I think if I was looking at an overall student performance, I would be able to much more readily provide guidance to students A, B, C, and D as to where they might have gone wrong, and not at all to E in being able to say this I believe is where you made a mistake, let's go back and look at your other, the other, homework or quizzes that you've done. Where you've done this, are you making the same mistake?

Theme 2A

All the participants seemed to want specific evidence of the student's lack of understanding in order to take off points. Although the 10 instructors acknowledged that SSE had few explications, the five instructors (1, 2, 4, 5, and 8) who expressed Theme 2A had the smallest differences between the grades they assigned SSD and SSE. Indeed, Theme 2A is the only theme that appears, without exception and exclusively, in those instructors whose difference in score was less than 2 (delta = SSD − SSE; see Table 2). This makes Theme 2A the most predictive theme for the delta values. Instructor 4 gives an example of this theme:

Student solution E has got the correct answer and he used a very simple way to write the solution, but all the stages are right; all the conversions are correct, so I give him 10. I try to give them more credit as long as they write something that seems right...or I will give them some credit.

Theme 2B

Most instructors appeared to deduct points from a student solution with incorrect reasoning. In particular, although SSD had explicitly expressed her reasoning and had the correct answer, the mistakes in her solution served as evidence for 9 of the 10 instructors to deduct points (Table 2). Instructor 8 demonstrates an example of Theme 2B:

For this one they made a mistake converting milliliters to liters so that's minus one,...then they messed up the calculation of the theoretical yield...so they lost two points for that and so they got a 7 out of 10.

Theme 3

This theme was often projected when grading SSE. Although half of the instructors scored SSE less than five, 6 of the 10 instructors felt that student E had the correct thought processes, only she did not display her reasoning. For example, Instructor 7:

This student [SSE], I think this student knew what they were doing; they actually had the ability to do all of the detail work but I think they chose not to and used their calculator [instead]...they have the knowledge because they clearly indicate what they know about stoichiometry and solutions at the top, but I just think that they felt like they didn't have to write down any details.

Table 2. Direction of the Burden of Proof, Themes Expressed, and the Assigned Scores by Each Faculty Member^a
^a Sorted by direction of burden of proof and then by ascending SSE score. ^b Grades were assigned on a scale of 0−10. ^c Delta values reflect the difference of the grade for SSD minus the grade for SSE for each faculty member.

DISCUSSION

From the analysis of the transcripts, it was evident that all the instructors faced the challenge of deciding what grade to assign, given that each expressed at least two conflicting themes (Table 2). They resolved that conflict, however, by giving preferential weight to one of the three themes. For instance, Instructor 10 acknowledged that SSE had the correct answer (Theme 2A), but also believed that explication of reasoning is what is important in problem solving (Theme 1). Consequently, by resolving the conflict in favor of Theme 1, the instructor gave SSE a score of 3.5 out of 10. The instructor said:

E [SSE] has some really big gaps; showing me hardly anything. In fact, I gave them almost 4 points but it's possible I could even give them less because they don't show me the equation, they don't say anything. They don't say it's a limiting reactant problem, they don't say it's a precipitation reaction, they even make some wrong statements; I know they're saying the moles equals the moles, but that's not really what they are. Really weren't descriptive there, and so it's very cryptic and it is not a satisfactory method of analysis, and it shows me it's possible they may be missing some key concepts. On the other hand, I think the student probably does understand how to do it but they're not telling me they understand how to do it.

Use of the phrase "on the other hand" shows that this instructor is weighing two conflicting sides. On the one hand, the student does not show her work, as the instructor would want her to do. On the other hand, the instructor indicates that they assume "the student probably does understand how to do it". It is clear, however, that this instructor resolves the conflict by not giving much credit to the student because she "show[s] me hardly anything" and because "it is not a satisfactory method of analysis". The relative strength of each of the aforementioned themes with respect to each individual participant seemed to be the key determinant of the score assigned to the solution.

Henderson et al. argue that the weighing of the three themes can be modeled using a "burden of proof" construct.22 In resolving the conflict arising from at least two themes, instructors direct the burden of proof onto either the student or themselves. Even though the instructors do not use the exact words "burden of proof", we can determine from their statements which way they assign this burden. For example, while grading SSE, an instructor may emphasize that the student certainly knew what he or she was doing but did not show the work. Lacking explicit evidence of incorrect thinking, the instructor assumes that the student knew what he or she was doing and thereby takes the burden of proof upon himself or herself rather than placing it on the student. On the other hand, an instructor may emphasize that he or she will not assume student E knows what he or she was doing, because the student did not clearly show the thinking involved. Lacking explicit evidence of correct thinking, the instructor would then be placing the burden of proof on the student rather than taking it on himself or herself. We will illustrate these two conflict resolution strategies below. The direction of the burden of proof from analysis of the transcripts is shown in Table 2. We note, however, that while the burden of proof assignment was somewhat predictive of the difference in scores given to SSD and SSE (delta = SSD − SSE), the presence or absence of Theme 2A was even more so. We discuss this further below.

Four of the 10 participants (2, 4, 5, and 7) appeared to place the burden of proof on the instructor. For instance, Instructor 4 expressed all three themes, but ended up resolving the conflict in favor of Themes 2 and 3 and assigning SSE full credit. In scoring SSE, the instructor said:

E [SSE] got the answer correct but sometimes too simple, you really wonder whether he really understands because I wish here...he is maybe very smart. But you have to let others understand where you come from, where you get the numbers from. I don't like E although he or she may be smart to get the correct answer and everything right but from a simple writing you cannot check his thinking, you know. I don't want to take any credit off but I will just tell him directly that he should give people a little more writing to enhance understanding just in case the final result is wrong.

Here we see the instructor resolving the conflict between all three themes. Theme 1 is clearly expressed: "you have to let others understand where you come from, where you get the numbers from" and "sometimes too simple [lacking in detail], you really wonder whether he really understands". Given these statements, we might expect that Instructor 4 would not give SSE much credit. On the other hand, Instructor 4 seems quite confident that the student is competent: "he is maybe very smart" and "he or she may be smart to get the correct answer and everything right". In this way, Instructor 4 is expressing Theme 3 and projecting correct thought processes onto the student, faulting her only for not exposing her thinking in more detail. Finally, the instructor resolves the conflict by putting the burden of proof on himself in that he does not take any points off. This is consistent with Theme 2A, a reluctance to deduct points from a student solution that may be correct. So, in this case, and similarly for two of the other instructors who took the burden of proof on themselves, SSE was not penalized much, if at all, relative to SSD. The student is assumed to have enough ability to solve the problem even though he or she had not shown as much work as the instructors would like students to show.

On the other hand, 6 of the 10 instructors seemed to place the burden of proof on the students, with statements that emphasize Theme 1 over the other themes. They pointed out that exposing reasoning through writing is a vital element in problem solving. An example of this theme is demonstrated by Instructor 6, who gave SSE the lowest score:

This one has no reaction, no limiting reactant. Okay, the amount of...he calculates the moles of barium chloride correctly, okay, so he calculates the moles of silver chloride correctly and then divides... Okay, so actually this is correct, this is correct, this is correct [instructor referring to the steps written by SSE in solving the problem], gets everything correct. I just don't see how these were thought out. Yeah, so this formula here is correct and the result is correct. Therefore, E gets 10% [1 out of 10] and in this case, these things are correct but since there's no reaction written, there's no explanation how it was done, I cannot see if this was actually... if the student knew this or if it was just copied from somewhere. So this student might actually be better than this one [SSD] but since the method of solving the problem is not exposed correctly, I cannot grade that work.

Here again we can clearly see that the instructor sees a conflict: the student "gets everything correct", but because "there's no explanation how it was done" the instructor does not accept it. In this way, Instructor 6 and five other instructors clearly place the burden of proof on the student. Unless they clearly show their thinking (Theme 1), a student has not proven that she or he understands how to solve the problem.

In two cases (Instructors 7 and 8), the burden of proof assignment was not a good predictor of the difference between the SSD and SSE scores given. We note that both of these instructors seemingly contradict themselves by placing the burden of proof on the students at one point in the interview but then placing it on themselves at another point. For example, with Instructor 7, at first it seems clear that he will only give student E credit for parts of the problem that were explicitly correct: "They [SSE] probably would receive a point for each part where they did actually come up with the correct answer and probably they would receive a total of four points for this." Later, however, Instructor 7 is ready to credit the student with knowledge even without explicit proof: "It's clear with student solution E that there was a basic knowledge even though the thought process wasn't written down at all." This statement, together with other similar statements, led us to code Instructor 7 as placing the burden of proof on himself, even though he gave SSE significantly fewer points than SSD. This inconsistency was also significant for Instructor 8, but in the other direction: statements made by Instructor 8 led us to code him as placing the burden of proof on the students, even though he granted SSE only one point less than SSD.

Looking at Table 2, we note that a better predictor of the difference in score between SSD and SSE (delta = SSD − SSE) is the presence or absence of Theme 2A, a reluctance to take points off from a student solution that may be correct. All of the instructors who expressed Theme 2A have the smallest delta values, while all of the instructors who did not express Theme 2A have the largest delta values. This indicates that perhaps the most decisive value behind grading decisions is the one expressed in Theme 2A.

For the purposes of comparison, we rely upon the burden of proof classification as published in Henderson et al.'s study.22 A majority of the physics faculty (5 of the 6 participants) directed the burden of proof onto themselves, whereas a slim majority of chemistry faculty (6 of 10) in our study placed the burden of proof on the student (Table 2). Furthermore, while the physics study indicates that 50% of the faculty graded SSE higher than SSD, none of the chemistry faculty graded SSE higher than SSD. It is important to recall that the two students' solutions presented in this paper, SSD and SSE (Figure 1), had characteristics similar to those of the students' solutions used by Henderson et al.22 Although the two studies have different results, further investigation of the direction of the burden of proof in grading would be necessary to confirm whether there are cultural differences between the two communities of physics and chemistry.
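To make the delta comparison concrete, the sketch below shows how one might group the per-instructor score differences by whether Theme 2A was expressed. Because the individual scores in Table 2 are not reproduced in this text, the rows below are hypothetical placeholders; the study's finding is that the Theme 2A group showed uniformly smaller deltas.

```python
# Minimal sketch of the delta analysis: delta = (score for SSD) - (score for SSE) per grader,
# grouped by whether the grader expressed Theme 2A. All rows below are hypothetical.
hypothetical_table_2 = [
    # (instructor_label, expressed_theme_2A, score_SSD, score_SSE)
    ("I-a", True, 9.0, 8.5),
    ("I-b", True, 10.0, 10.0),
    ("I-c", False, 8.0, 3.5),
    ("I-d", False, 9.0, 1.0),
]

for expressed in (True, False):
    deltas = [ssd - sse for _, t2a, ssd, sse in hypothetical_table_2 if t2a == expressed]
    print(f"Theme 2A expressed = {expressed}: deltas = {deltas}")
```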



CONCLUSION AND IMPLICATIONS

The goal of this study was to examine whether a gap exists between the beliefs of chemistry faculty and their scoring of students' solutions. Our data analysis leads us to conclude that, for a slim majority of the chemistry faculty in this study, beliefs are consistent with grading practices. All the participants expressed a desire to see students' explication of their reasoning in solving the problem in question, and eight of the ten participants scored SSE lower than SSD. In other words, students' failure to communicate their reasoning was the deciding factor in assigning the final grade.

Implications for Research

Our results provide further evidence for, but also differ in some respects from, the conclusions reached in the study by Henderson et al.22 with physics faculty. Although participants in both studies express the same themes, the study with physics faculty indicated more inconsistency between their beliefs (expressed in Theme 1) and their grading practice (giving more weight to Themes 2 and 3 when assigning the burden of proof). Given the small sample size, however, it would be inappropriate to generalize: future work would be needed to investigate whether this difference persists with a larger sample of instructors from a variety of institution types.

Future research could extend our protocol to explicitly ask the instructors to reflect on the cognitive dissonance that occurs if they ask students to explicitly show their work and then penalize them for doing so by subtracting points for explicit mistakes. Although this extension runs the risk of "putting words in the mouths" of the participants, it may serve as a useful prompt for self-reflection and therefore for faculty development purposes. This explicit prompt may be helpful in probing whether a persistent difference exists between how the physics and the chemistry communities settle on grading decisions.

Additionally, future research could investigate whether the order in which faculty grade papers has an effect on the grade assigned. When the interview started, the participants in this study were simply asked to rank the five solutions from best to worst. Without exception, the participants considered the solutions in alphabetical order, probably because they had no motive to scramble them. It could be that, if the labels were scrambled, some ordering effect would appear; the authors have no data to support or refute this possibility. We can only say that the instructors were all treated equally and that the order matched that in the physics study to which we were comparing. We also note that, after ranking the solutions, the participants were asked to return to the solutions three more times: (i) to give each a numerical score between 1 and 10; (ii) to speculate on what the students were thinking; and (iii) to give overall impressions of each student's approach. This repetitive cycling through all of the solutions ensured that each was considered more than once, with each solution being considered again after the grader was familiar with all of the solutions. Still, it remains an open question whether an ordering effect could have been at play in this study, and this could be investigated in future research.

Future research could also investigate the impact that using a rubric has on grading practices. For example, researchers could compare scores assigned by faculty using a rubric to those assigned by faculty grading without a rubric. It would also be interesting to see how faculty respond if they were asked to use a rubric that conflicts with their beliefs about grading. This could be very useful in advancing the research agenda put forth in this paper if the rubric were designed to force the instructors to place the burden of proof on themselves or on the students.

Implications for Practice

If instructors think it is important for students to show their work, then they should put the burden of proof on the students, and they should not grade in a way that encourages students to hide what they are thinking. A significant fraction of the physics instructors and all of the chemistry instructors stated that they want students to show their work, yet they do not necessarily keep the burden of proof on the students. In our study, a slim majority of chemistry faculty (60%) directed the burden of proof onto the students, a practice that would encourage students to show their reasoning explicitly in the future. Instructors may send mixed messages to students when they ask them to show their work, but then deduct points for showing work that is incorrect, or give points for a correct answer when the work is not shown. If indeed it is important for students to be able to express their reasoning in writing (and all of the participants in our study thought it was), then it is better to lay the burden of proof on the students than on the instructor.

Faculty need to make their expectations clear when assigning student written work, especially when a significant fraction of their colleagues (40% in this study) do not place the burden of proof on students when grading. Given that this is a relatively common phenomenon, many students might not know at the outset that they will be graded on showing their work, and they may consider it unfair if the instructor places the burden of proof on them. Students may not necessarily show their work in writing; instead, they may work out some of the steps in their heads and write a simplified process, assuming that the instructor will recognize the implicit steps as part of a correct answer. Faculty need to inform students about their grading practices and the key features that they would like to see in students' solutions, and to note that these practices might differ from what the students have encountered in other classes.

As has been suggested by others, the inconsistent messages sometimes present in instructor grading can be diminished through the use of grading rubrics.37,38 Rubrics have been viewed as a vital tool to help educators establish a fair mechanism for assigning a numerical grade, as well as to provide immediate feedback to the instructor on the level of student understanding.37 Grading rubrics help instructors by providing "a quantitative or numerical way of evaluating assignments, and help ease some of the difficulty in scoring authentic or performance-based tasks".38 Simply sharing these rubrics with the students is a particularly powerful way of letting the students know what is expected and that they are required to show their work to get credit for their solutions.

ASSOCIATED CONTENT

Supporting Information

Interview questions used in the data collection. This material is available via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ACKNOWLEDGMENTS

We would like to express our sincere gratitude to the chemistry faculty who were involved in this study for their time and willingness to participate. We would also like to thank colleagues HP and CH, the Associate Editor, and anonymous reviewers for comments that helped to make this a better paper.



REFERENCES

(1) Michener, J. M. J. Chem. Educ. 1925, 2 (6), 488. Mann, J. A. Jr.; Zeitlin, H.; Delfino, A. B. J. Chem. Educ. 1967, 44 (11), 67. Jones, D. E.; Lytle, F. E. J. Chem. Educ. 1973, 50 (4), 28. Connolly, J. W. J. Chem. Educ. 1972, 49 (4), 26. Frigerio, N. A. J. Chem. Educ. 1967, 44 (7), 413. Altenburg, J. F.; King, L. A.; Campbell, C. J. Chem. Educ. 1968, 45 (9), 615. Myers, R. L. J. Chem. Educ. 1986, 63 (6), 507. Johnson, R. C. J. Chem. Educ. 1973, 50 (3), 223. Munn, R. J.; Stewart, J. M.; Pagoaga, M. K.; Munn, T. C. I. J. Chem. Educ. 1981, 58 (9), 69. Bath, D. A.; Hughes, B. G. J. Chem. Educ. 1983, 60 (9), 734. Smith, S. R.; Schor, R.; Donohue, P. C. J. Chem. Educ. 1965, 42 (4), 224. Wellman, K. M. J. Chem. Educ. 1970, 47 (2), 142. Gutz, I. G. R.; Isolani, P. C. J. Chem. Educ. 1983, 60 (11), 9. Yaney, N. D. J. Chem. Educ. 1971, 48 (4), 276. Wartell, M. A.; Hurlbut, J. A. J. Chem. Educ. 1972, 49 (7), 508. Hamilton, J. D.; Hiller, F. W.; Thomas, E. D.; Thomas, S. S. J. Chem. Educ. 1976, 53 (9), 564. Olivier, G. W. J.; Herson, K.; Sosabowski, M. H. J. Chem. Educ. 2001, 78 (12), 1699.
(2) Smith, D. D. J. Chem. Educ. 1980, 57 (2), 15. DeWitt, C. B. J. Chem. Educ. 1942, 19 (3), 12. Lippincott, W. T. J. Chem. Educ. 1973, 50 (7), 449. Suslick, K. S. J. Chem. Educ. 1985, 62 (5), 408. Goodwin, J. A.; Gilbert, B. D. J. Chem. Educ. 2001, 78 (4), 490. Bishop, R. D. J. Chem. Educ. 1991, 68 (6), 492. Dinsmore, H. J. Chem. Educ. 1987, 64 (6), 51. Monts, D. L.; Pickering, M. J. Chem. Educ. 1981, 58 (1), 43. Rondini, J. A.; Feighan, J. A. J. Chem. Educ. 1978, 55 (3), 18. Pinkus, A. G.; West, J. V. J. Chem. Educ. 1980, 57 (1), 89. Wimpfheimer, T. J. Chem. Educ. 2004, 81 (12), 1775.
(3) Kandel, M. J. Chem. Educ. 1988, 65 (9), 782. Getzin, D. R. J. Chem. Educ. 1978, 55 (12), 794. Kovacic, P. J. Chem. Educ. 1978, 55 (12), 791. Kandel, M. J. Chem. Educ. 1986, 63 (8), 706. Pierce, W. C.; Haenisch, E. L. J. Chem. Educ. 1934, 11 (10), 565.
(4) Scriven, M. Evaluation of Students (unpublished), 1974. Cited in Davis, B. G. Tools for Teaching; Jossey-Bass Publishers: San Francisco, CA, 1993; http://teaching.berkeley.edu/bgd/grading.html (accessed Jan 2012).
(5) Haagen, C. H. J. Higher Educ. 1964, 35, 89−91.
(6) Warren, J. College Grading Practices: An Overview; Educational Testing Service: Princeton, NJ, 1971.
(7) Dumon, A.; Pickering, M. J. Chem. Educ. 1990, 67 (11), 959.
(8) Ashton, P. J. Teacher Educ. 1990, 41, 2.
(9) Ashton, P. Making a Difference: Teachers' Sense of Efficacy and Student Achievement; Longman: New York, 1986.
(10) Buchmann, M. Am. J. Educ. 1984, 92, 421−439.
(11) Brookhart, S. M.; Freeman, D. J. Rev. Educ. Res. 1992, 62, 37−60.
(12) Clark, C. M. Educ. Researcher 1988, 17, 5−12.
(13) Munby, H. Instructional Sci. 1982, 11, 201.
(14) Nespor, J. J. Curriculum Stud. 1987, 19, 317.
(15) Pajares, M. F. Rev. Educ. Res. 1992, 62, 307.
(16) Hancock, E.; Gallard, A. J. Sci. Teach. Educ. 2004, 15, 281.


(17) Johnson, M. J.; Hall, J. L. Impact of One Science Teacher's Beliefs on His Instructional Practice. J. Educ. Hum. Dev. 2007, 1 (1).
(18) Levitt, K. E. Sci. Educ. 2002, 86, 1−22.
(19) Wallace, C. S.; Kang, N.-H. J. Res. Sci. Teach. 2004, 41, 936−960.
(20) Henderson, C.; Yerushalmi, E.; Heller, K.; Heller, P.; Kuo, V. H. Phys. Rev. ST Phys. Educ. Res. 2007, 3, 020110.
(21) Haney, J.; McArthur, J. Sci. Educ. 2002, 86, 783.
(22) Henderson, C.; Yerushalmi, E.; Kuo, V. H.; Heller, P.; Heller, K. Am. J. Phys. 2004, 72, 164−169.
(23) Wiliam, D.; Lee, C.; Harrison, C.; Black, P. Teachers Developing Assessment for Learning: Impact on Student Achievement. Assessment in Education: Principles, Policy and Practice 2004, 11, 49−65.
(24) Shepard, L. A. The Role of Assessment in a Learning Culture. Educational Researcher 2000, 29, 4−14.
(25) Turner, M.; VanderHeide, K.; Fynewever, H. Motivations for and Barriers to the Implementation of Diagnostic Assessment Practices: A Case Study. Chem. Educ. Res. Pract. 2011, 12, 142−157.
(26) Tingle, J. B.; Good, R. J. Res. Sci. Teach. 1990, 27, 671−683.
(27) Cohen, J.; Kennedy-Justice, M.; Pai, S.; Torres, C.; Toomey, R.; DePierro, E.; Garafalo, F. J. Chem. Educ. 2000, 77, 1166−73.
(28) Lochhead, J.; Whimbey, A. New Direct. Teach. Learn. 1987, 30, 73−92.
(29) Smith, M. U.; Good, R. J. Res. Sci. Teach. 1984, 21, 895−912.
(30) Camacho, M.; Good, R. J. Res. Sci. Teach. 1989, 26, 251−272.
(31) Hycner, R. H. Human Studies 1985, 8, 279−303.
(32) Creswell, J. W. Research Design: Qualitative, Quantitative, and Mixed Method Approaches, 2nd ed.; Sage Publications: Thousand Oaks, CA, 2003.
(33) HyperRESEARCH Web Site. http://www.researchware.com/ (accessed Jan 2012).
(34) Stains, M.; Talanquer, V. J. Res. Sci. Teach. 2008, 45, 771−793.
(35) Stemler, S. An Overview of Content Analysis. Pract. Assess. Res. Eval. 2001, 7 (17). Retrieved January 2012 from http://PAREonline.net/getvn.asp?v=7&n=17.
(36) Cohen, J. Psychol. Bull. 1968, 70, 213−220.
(37) William, C. D.; Linda, L. R.; Jeffrey, W.; Danny, E. J. Chem. Educ. 2000, 77, 1511−1516.
(38) Quinlan, M. A. A Complete Guide to Rubrics: Assessment Made Easy for Teachers, K−College; Rowman and Littlefield Education: Lanham, MD, 2006.
