Article pubs.acs.org/jchemeduc

The Development of the Redox Concept Inventory as a Measure of Students’ Symbolic and Particulate Redox Understandings and Confidence

Alexandra R. Brandriet and Stacey Lowery Bretz*

Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States

ABSTRACT: This article describes the development of the Redox Concept Inventory (ROXCI) as a measure of students’ understandings and confidence of both the symbolic and particulate domains of oxidation−reduction (redox) reactions. The ROXCI was created using a mixed-methods design in which the items were developed based upon themes that emerged from the analysis of semi-structured student interviews. The final version of the ROXCI was administered to first- and second-semester general chemistry students post-instruction and post-assessment on redox chemistry. The results include qualitative and quantitative evidence for the content, response process, and association types of validity of the data, as well as individual item function and descriptive statistics for each student sample. In conjunction with the validity evidence, reliability was evaluated via a test−retest study. This included the results of the stability of (i) the total scores and average confidence; (ii) the difficulty and discrimination indices per item; and (iii) the students’ multiple-choice responses. In addition to the evidence for validity and reliability, results from the ROXCI regarding students’ understandings and confidence about redox concepts are discussed, including students’ misconceptions about oxidation numbers, the surface features of chemical equations, electron transfer, the role of the spectator ion, the dynamic reaction process, electrostatic interactions, and bonding. Implications for teaching and learning are discussed.

KEYWORDS: High School/Introductory Chemistry, First-Year Undergraduate/General, Chemical Education Research, Misconceptions/Discrepant Events, Testing/Assessment, Oxidation/Reduction

FEATURE: Chemical Education Research



© 2014 American Chemical Society and Division of Chemical Education, Inc.

INTRODUCTION

In the classroom, chemistry concepts are often presented to students in different representational forms as suggested by Johnstone’s three domains: symbolic, particulate, and macroscopic.1−3 While experts move effortlessly among these representational domains, students may struggle.1−8 Various studies have identified students’ incorrect understandings within a variety of representational domains and content areas.4−14 Of particular interest are students’ understandings of oxidation−reduction (redox) reactions.14−19 A majority of the previous redox literature focuses on eliciting students’ understandings at the symbolic level through the use of chemical equations and symbols.15−18 As a result, a majority of this literature has described students’ misconceptions about redox definitions, surface features of chemical equations, and the application of oxidation numbers. For example, Garnett and Treagust described misconceptions in which students applied oxidation numbers to entire molecules or polyatomic ions.16 In the reaction CO32− + 2H+ → H2O + CO2, students described that the polyatomic anion, carbonate, is oxidized because the charge on carbonate changes from −2 to 0 on carbon dioxide. Contrary to this idea, other students identified carbonate as reduced because it loses oxygen and the hydrogen ion as

oxidized because it gains oxygen. Despite the abundance of literature on students’ symbolic understandings of redox reactions, only minimal literature exists on students’ understandings of the particulate nature of redox processes (the dynamic interaction of particles, electron transfer, etc.).19 In one study, students’ understandings were elicited in an interview setting using a particulate animation of AgNO3(aq) with Cu(s). Various misconceptions were identified, including that the reactants react in a 1:1 stoichiometric ratio, that the nitrate is the driving force of the reaction because it is more attracted to the copper than to the silver, and that cations and anions are bonded in aqueous solutions.19 Despite the lack of literature on students’ particulate understandings of redox concepts, the literature that exists on students’ understandings of electrochemical cells focuses predominantly on the particulate level.10,20−25 Within this literature, misconceptions have been identified including the ideas that electrons can move through a solution by attaching themselves to ions, that electrons can flow through a solution without help from ions, and that protons flow in metallic conductors.10,20−25 Because students struggle to make connections across the symbolic, macroscopic, and particulate domains of chemistry, more research is needed to investigate the particulate redox ideas that students bring as prior knowledge to electrochemistry instruction.

Published: July 29, 2014
dx.doi.org/10.1021/ed500051n | J. Chem. Educ. 2014, 91, 1132−1144

lecture course would be nearly impossible during a single semester. In order to measure the knowledge of a large number of students in a short period of time, concept inventories have been created in many areas of chemistry.39−46 However, no concept inventories currently exist to measure students’ understandings of redox reactions. This article assists in bridging the gap in the literature between students’ understandings of symbolic redox concepts and particulate electrochemical concepts through the development and use of a robust multiple-choice instrument, the Redox Concept Inventory (ROXCI), as a measure of students’ symbolic and particulate understandings of redox reactions. This quantitative research tool can be used to measure students’ understandings, and their confidence about those understandings, in a quick and efficient manner.



THEORETICAL FRAMEWORKS

Constructivist views of learning consider students’ abilities to build viable understandings founded in their own experiences.26−29 This is in contrast to views of learning based upon the idea that knowledge can be transferred directly to the learner from the mind of the instructor. Constructed knowledge is thought to be adaptive to fit our own personal understanding so that the new information can be used to explain the experiential world.27 In order for knowledge to be meaningfully constructed rather than memorized by rote, there must be an integration of students’ thoughts, feelings, and actions.30 However, this does not necessarily mean that the constructed ideas are always scientifically viable. Chi and Roscoe describe naïve knowledge as having two properties: it is scientifically incorrect, and it can impede the deep understanding of scientifically correct ideas.31 In terms of instruction, some naïve ideas can be easily altered or reconstructed through formal instruction, while others are highly resistant to change and are robust in students’ cognitive frameworks.31,32 The latter ideas are described as “misconceptions.” Robust misconceptions are deeply rooted in elaborate and interrelated networks of other concepts, which lend them plausibility and intelligibility to the student.4,32 Therefore, students hold firmly to their beliefs and often lack awareness that their understandings are incorrect.31,33 Because of the deeply rooted and stable nature of student misconceptions, they are often thought of as highly resistant to change.
Competing views of students’ understandings focus on student knowledge as fragmented or in pieces.34,35 diSessa established the view of phenomenological primitives (p-prims), in which knowledge is viewed as elements that can evolve, be modified, and/or enlarge over time.34,35 This framework argues that some misconceptions may be robust and stable constructs but that not all misconceptions are stable and resistant to change. Appropriately designed pedagogy can result in rapid and deep conceptual change, possibly within a relatively short period of time.36,37 Since it is likely that both coherent and fragmented ideas exist within students’ mental models, this study was motivated by what is common to both theories of learning: students’ misconceptions can impede the learning process. Since redox concepts are pivotal to learning about electrochemical cells, these theories highlight the importance of assessing students’ understandings of redox concepts prior to learning electrochemistry in the classroom.



PSYCHOMETRIC ANALYSIS

When designing assessments, two important constructs must be considered during the development process: validity and reliability.45,47−49 The validity and reliability of the data produced by an assessment are analogous to the concepts of accuracy and precision, and both must be considered when analyzing student responses as elicited by an assessment.49 Given that several types of validity and reliability exist, careful decisions were made to determine which types would be best suited for the development of the ROXCI.

Validity

Validity is referred to as “the degree to which evidence and theory support the interpretation of test scores entailed by proposed uses of tests (p. 9)”.47 The Standards for Educational and Psychological Testing describe several types of validity including content, response process, internal structure, association with other variables, and consequence of use.45,47,48 However, the Standards caution that “evaluating the acceptability of a test or test application does not rest on the literal satisfaction of every standard in this document and cannot be determined by using a checklist (p. 4),” and that “the appropriateness of a test or testing application should depend heavily on professional judgment...(p. 2).”47 The authors have no reason to expect that an internal structure could be identified from students’ responses because students’ ideas are often considered fragmented34,35 and may change based upon the representation used to prompt understanding.32 Additionally, factor analytic methods require that students’ robust and distinct ideas (e.g., responses A, B, C, and D within one question) be reduced into dichotomous (correct/incorrect) variables. Such data manipulation would ignore the complex categorical nature of students’ distractor choices. Validity involving the consequences of use would be best explored in future studies that investigate how instructors use concept inventories in the classroom. For these reasons, this article focuses on the well-documented forms49,50 of content, response process, and association types of validity because these make the most sense for the validation of the data produced by the ROXCI. Semi-structured interviews are not only important for creating assessment items, but the use of post-administration validation interviews is equally important to ascertain that the items are measuring the response process as described by the students.
Adams and Wieman caution that statistical analysis is secondary in importance to student validation interviews because interviews provide more in-depth evidence for the



PURPOSE AND SIGNIFICANCE

A conceptual understanding of redox (beyond that which can be achieved by rote memorization and application of oxidation numbers) is imperative because it provides a foundation for learning many advanced chemical and biological processes. Concepts that require an understanding of redox are recommended for all of the foundational chemistry courses in an American Chemical Society approved bachelor’s degree curriculum.38 Instructors need a way to assess what their students know about redox chemistry in a valid and reliable manner, and conducting individual student interviews in a large

Table 1. Major Misconception Themes Assessed by the ROXCI

ROXCI Themes               | Description of Themes                                                                        | Items^a
Oxidation numbers          | Application and/or understanding of charges and/or oxidation numbers                         | 1/2, 3/4, 5/6, 7, 8, 9, 13, 14, 15
Surface features           | Using surface features of a chemical equation to identify whether or not a reaction is redox | 1/2, 3/4, 5/6, 9, 12, 14
Electron transfer          | Role of electron transfer in a redox reaction                                                | 3/4, 9, 10, 16, 17, 18
Spectator ions             | Role of the spectator ions in single-displacement redox reactions                            | 9, 10, 11, 17, 18
Dynamic reaction process   | Dynamic nature of particles                                                                  | 10, 11, 16, 17, 18
Electrostatics and bonding | Bonding, charge attractions, or replacing charges between charged species in redox reactions | 10, 11, 16, 17, 18

^a Items 1−6 are two-tiered answer/reason paired items. The answer/reason pairs are 1/2, 3/4, and 5/6.

validity of an item.41 The ROXCI was developed, and the data was validated, using methods described by Adams and Wieman,41 and many of these methods have also been used to develop other concept inventories found in the literature.39−46

knowing the precision of students’ total scores may not be as useful as identifying consistent response choices. For this reason, this study investigates the stability of the individual item response choices to ensure that, if stability exists in students’ total scores, it is a result of consistency in the response choices. Additionally, the stability of the items’ function in terms of item difficulty and discrimination were explored in order to examine the consistency of item function over time.
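The stability checks described here (a correlation of total scores plus per-item consistency of response choices) can be sketched in code. This is a minimal illustration, not the authors' actual analysis; the function names and the use of NumPy arrays of option labels and total scores are our assumptions.

```python
import numpy as np

def response_stability(choices_t1, choices_t2):
    """Fraction of students who selected the same multiple-choice option
    on both administrations, computed separately for each item.
    choices_t1, choices_t2: (n_students, n_items) arrays of option labels."""
    a, b = np.asarray(choices_t1), np.asarray(choices_t2)
    return (a == b).mean(axis=0)  # one stability value per item

def total_score_stability(scores_t1, scores_t2):
    """Pearson correlation between test and retest total scores."""
    return np.corrcoef(scores_t1, scores_t2)[0, 1]
```

A high total-score correlation alone does not show that students chose the same distractors on both administrations; the per-item fraction above captures that directly.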

Reliability

The Standards describe reliability as the consistency of a measurement when the testing procedure is repeated on a population of individuals.47 Reliability is often assessed using three methods: alternate-forms, internal consistency, and test−retest.47 Alternate-forms reliability requires two parallel forms of the same instrument. While creating a second version for the purposes of assessing reliability would be time-consuming, it is also important to note that methods of internal consistency are not necessarily appropriate for all assessments.41,51 Although methods of internal consistency such as Cronbach-α are widely used in the assessment literature for evaluating reliability because they require only one test administration, they do have limitations. An assessment with high internal consistency suggests that a relationship exists among the items based upon students’ responses, and therefore, the items correlate. However, because students’ ideas are often fragmented, it cannot be expected that the items on the ROXCI would correlate as methods of internal consistency require.41,51 Some researchers argue that if a concept inventory has a high Cronbach-α, the items may actually be redundant.41 Similar to factor analysis, the students’ responses to the items must be transformed into dichotomous variables (i.e., correct/incorrect) in order to calculate Cronbach-α. Given that students’ responses to the carefully constructed distractors are what will be used to inform instruction, this study sought additional means of evaluating the reliability of the ROXCI data.
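For dichotomously scored items, Cronbach-α can be computed from an (n_students, n_items) matrix of 0/1 scores. The sketch below (assuming NumPy; this is an illustration, not the authors' code) makes the dichotomization step explicit:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha from an (n_students, n_items) matrix of
    dichotomous (0/1) item scores: alpha = k/(k-1) * (1 - sum(item variances)/variance(total))."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```

The 0/1 recoding is exactly the information loss noted above: which distractor a student chose disappears from the calculation.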
Test−retest or test stability methods of reliability require administering the test twice to the same group of students to examine the consistency of examinee responses across implementations.41,52 When students repeatedly take an assessment, there will always be some variation in students’ scores because no examinee is completely consistent.47 Because of this variation, as well as any possible subjectivity in the scoring process, the score an individual earns will always reflect some amount of measurement error different from the examinee’s true score.47,52 A measure of relationship, such as a correlation, is usually used to determine the stability of students’ total scores across the implementations. However, this method has a few limitations. Students who earn the same scores on both test and retest implementations may not necessarily be answering the same questions correctly on both implementations. Additionally, even if a student consistently responds in an incorrect manner to an item, this does not mean that the student is choosing the same distractor. Because the specific distractors are what will be used to inform instruction,



RESEARCH QUESTION

This article describes the development of the ROXCI and the validity and reliability of the data produced by it. Therefore, the research question that guides this study is: What evidence exists for the validity and reliability of the data produced by the ROXCI? By assessing the quality of the ROXCI as a measure of students’ redox understandings, the ROXCI can then be used in future studies to detect the prevalence of students’ misconceptions about redox concepts. Therefore, in addition to the validity and reliability evidence, examples of the students’ responses are presented to show the utility of the ROXCI in the classroom.



METHODS

Inventory Development and Structure

Institutional Review Board approval was obtained before collecting any student data for this study. The ROXCI was created based upon themes that emerged from a rigorous qualitative study that investigated students’ understandings of symbolic, macroscopic, and particulate redox concepts.53,54 The semi-structured interviews in the qualitative study were designed to guide students through the symbolic, macroscopic, and particulate domains of Johnstone’s triangle.1−3 The prompts included symbolic equations, macroscopic demonstrations, student-generated particulate drawings, and a static particulate textbook representation. First-semester general chemistry (n = 14) and organic chemistry (n = 16) students were individually interviewed. The interviews were transcribed verbatim, and the transcripts were analyzed using the constant comparative method.55 Six major misconception themes (Table 1) emerged from the analysis, including oxidation numbers, the use of surface features of chemical equations, electron transfer, the role of the spectator ion, the dynamic reaction process, and electrostatics and bonding. (The detailed results of this qualitative study are beyond the scope of this article and will be disseminated in future publications.) These themes were used to inspire the development of the items. Several of the misconceptions that emerged during the semi-structured interviews are also reported in previous studies, such as misconceptions about the spectator ion as the driving force of the reaction, cations and anions being bonded in an aqueous solution, and the use of charges rather than oxidation numbers to identify redox

reactions.16,18,19 As shown in Table 1, several of the items cut across multiple themes given that the items assess the interrelationships between multiple redox concepts. For example, question 16 (shown in Figure 1) focuses on students’

authors. After these revision cycles, the inventory was piloted with samples of general chemistry and organic chemistry students. (This version of the ROXCI will be described as ROXCI-α.)

Pilot Study Sample

ROXCI-α was first administered at a mid-sized liberal arts university in the fall of 2012 to first-semester general chemistry students (N = 232), first-semester organic chemistry for science (nonchemistry) majors (N = 172), and first-semester organic chemistry for chemistry majors (N = 34). Data was collected after the general chemistry students had been formally taught and assessed on redox concepts in both the lecture and the laboratory. After the ROXCI-α was administered to the students, 15 post-administration validation interviews were conducted with students.

Validation Interviews and Revisions

On the basis of the statistical evidence from the pilot administration, the results of 15 post-administration validation interviews with students, and feedback from 2 additional content experts, revisions were made to the ROXCI-α to create the final version. Validation interviews were conducted with students after the ROXCI was administered in both the pilot and full studies. The purpose of the validation interviews was twofold: to verify that students interpreted the questions as intended and to verify that students chose the right answers for the right reasons.41 Items that did not meet these criteria were either removed or revised based upon student feedback. As an example, response A in question 16 (Figure 1) originally used the word “bump” rather than “push” because the word “bump” appeared frequently in the semi-structured qualitative interviews. However, during post-administration validation interviews, students explained that this option seemed feasible but that the word “bump” seemed too unscientific for the response to be correct. This concern regarding “unscientific language” did not surface again in the second round of post-administration validation interviews that were conducted with students who responded to the final version of the ROXCI. Items were also removed when they were considered tangential to the focus of redox. As an example, an item on the ROXCI-α used multiple student-generated particulate diagrams of an aqueous solution. While understanding the particulate nature of an aqueous solution is essential to understanding the particulate nature of many reactions, this item was removed in order to focus on the assessment of the features unique to redox reactions, such as electron transfer.
After the final revisions to the ROXCI-α were made, 8 additional post-administration validation interviews were conducted with first-semester general chemistry students in order to identify how students interpreted the questions and responses in the final version. The authors concluded that students were interpreting the inventory as expected (response process validity). For example, with questions 3 and 4 in Figure 2, Julie described why the reaction cannot be a redox reaction: “I don’t think this is an oxidation-reduction reaction, because there’s only one react-er- product. I’m pretty confident about that [marks 90% confident for question 3]. I remember this from class. None of our reactions had just one product on the right [marks 95% confident for question 4].” As a second example, Susan chose answer C for question 16 (Figure 1) and marked that she was 75% confident, explaining:

Figure 1. A one-tier item with the confidence scale on the Redox Concept Inventory.

understandings of electrostatic attractions, electron transfer, and the dynamic process of a single-displacement redox reaction of Cu(s) and AgNO3(aq). All ROXCI items include symbolic and/or particulate representations to prompt students’ understandings. Representations of the macroscopic domain were not incorporated in order to focus on students’ understandings of the symbolic domain, the particulate domain, and the connections between the two. Individual item response options were drawn from student quotes during the semi-structured interviews. In question 16, response option C was based upon the responses of several students, much like the following quote from Meredith: “When this copper is let off [Cu2+(aq)], there’s a negative, 2 electrons [left on the solid], and I guess that’s what’s attracting these silver [Ag+(aq)] to bond to the copper [Cu(s)] when these are released [Cu2+(aq)].” Because ROXCI items are grounded in students’ responses from the qualitative study (using direct quotes whenever possible), the instrument is inherently designed to measure students’ understandings. Items on the ROXCI went through several rounds of revisions, including content validation with 8 chemistry faculty (5 who teach general chemistry and 3 who teach organic chemistry). Revisions were also made based upon several rounds of feedback from 8 chemistry education research graduate students and multiple discussions between the

interviews shown previously. The fact that the same ideas emerged in multiple samples both before and after the development of the ROXCI instrument points to the robustness of such misconceptions and indicates that the ROXCI indeed detects the ideas that it was designed to detect. The final version of the ROXCI consists of 18 items, is scored from 0 to 18, and requires approximately 10−15 min to administer. Of the 18 items, 12 are single-tiered items such as question 16 (Figure 1). The remaining 6 items are two-tiered: students choose an answer for the first question and then elaborate with their reason in the second question (e.g., see Figure 2). Each of the 18 items also asks students to indicate their confidence in their responses from 0% (just guessing) to 100% (absolutely certain),44 as shown in Figures 1 and 2. The confidence scale was added in order to help instructors understand the robustness of the students’ misconceptions and to indicate whether or not students are thoughtfully choosing distractors that represent their ideas. The 0−100% scale was used (rather than a Likert scale) in order to have interval values for the analyses of students’ confidence. Because each of the 18 ROXCI items has an associated confidence tier (and student confidence for two items within a pair may or may not be equivalent), total scores were calculated from 0 to 18.

Full Study Samples

Data was collected with the ROXCI in spring 2013 at the same mid-sized liberal arts university with a different cohort of first-semester general chemistry students (GC1) enrolled in a lecture course consisting of three 50 min lectures per week. Approximately 1.5−2 lectures were used to present first-semester redox concepts (i.e., electron transfer and identifying changes in oxidation numbers) in the context of discussing major classes of aqueous reactions. The course text was the third edition of Chemistry: The Science in Context by Gilbert, Kirss, Foster, and Davies, which includes symbolic, macroscopic, and particulate representations of chemical reactions but primarily macroscopic and symbolic representations in the section specific to redox.56 In the laboratory, the students recreated the activity series based upon the results of mixing different solid metals with aqueous metallic solutions. Students completed the ROXCI within the first 15 min of the laboratory course, and the students were not given any course credit or extra credit for taking the ROXCI. All of the students were asked to complete the ROXCI, but students had the option to consent or not to consent for their data to be used in the study. Students were asked to answer the ROXCI to the best of their ability, and the students appeared to respond carefully to the assessment.

Figure 2. A two-tier item with the confidence scale on the Redox Concept Inventory.

“Once the copper leaves, they [Ag+(aq)] are not replacing him [Cu(s)]. They’re [Ag+(aq)] not going in that spot where he [Cu2+(aq)] left. They’re [Ag+(aq)] attracted to the electrons that are left um because of that copper leaving [Cu2+(aq)], and then it [Ag+(aq)] needs to pick up two more so they’re [Ag+(aq)] attracted to those electrons.”

Interestingly, Susan’s response was very similar to the idea described by Meredith from the semi-structured qualitative

Table 2. Descriptive Statistics for GC1 and GC2

           | GC1 (Test 1) (N = 83)  | GC1 (Test 2) (N = 83)  | GC2 (N = 510)
Scores     | Items^a | Confidence^b | Items^a | Confidence^b | Items^a | Confidence^b
Mean       | 5.4     | 54.3         | 5.1     | 51.8         | 6.2     | 57.5
Std. Dev.  | 3.1     | 20.2         | 2.7     | 23.2         | 2.9     | 21.6
Minimum    | 2.0     | 6.8          | 1.0     | 0.1          | 1.0     | 0.0
Median     | 4.0     | 55.7         | 4.0     | 55.4         | 6.0     | 60.6
Maximum    | 15.0    | 94.1         | 12.0    | 96.2         | 14.0    | 100.0
Cronbach α | 0.69    | 0.94         | 0.57    | 0.97         | 0.62    | 0.96
Ferguson δ | 0.91    | 0.97         | 0.92    | 0.98         | 0.95    | 0.99

^a Each correct item response awarded students 1 point for a total possible score of 0−18. ^b Summary statistics based on the distribution of students’ average confidence (%) responses for the 18 items.

The students were administered the ROXCI twice for a test−retest study, with a two week period elapsing between the two administrations (subsequently referred to as test 1 and test 2). Between test 1 and test 2, the students received formal lectures on Lewis structures and VSEPR models. The sample consisted of 83 students after removing both those students who chose not to consent and those students with missing data. After test 2, 8 additional students participated in a second round of validation interviews, for a total of 23 post-administration validation interviews. The ROXCI was also administered in an online format to a sample of second-semester general chemistry students (GC2) at a large research university in spring 2013, with 554 students consenting to participate. Students answered the ROXCI in a department computer laboratory near the end of a second-semester general chemistry course, immediately before instruction about electrochemistry concepts. Students had carried out both an activity series and a redox titration experiment in the laboratory but had not yet done the planned experiments on electrochemistry and electrolysis. A sample of 510 students remained for analysis after missing data was removed.



Figure 4. Distribution of GC2 students’ average confidence responses.

about redox concepts. The students’ high confidence but low performance is indicative of the widely documented construct in psychology known as the Dunning-Kruger effect.57 The central tenet of this argument is that individuals who are unsuccessful at a task lack the metacognitive skills that would enable them to recognize their poor performance.57,58 Students’ high confidence in their responses, even when those responses are incorrect, provides evidence that students believe strongly in the responses they are choosing and provides response process validity for the ROXCI response options. Two other statistics can be found in Table 2: Cronbach-α59 and Ferguson-δ.60 The generally accepted standard for high reliability of a measure is α ≥ 0.70; however, α is not necessarily the most appropriate measure of reliability for all assessments, including concept inventories.41,44,51 Therefore, additional measures of reliability were examined. Ferguson-δ is a measure of the discrimination of the overall test scores (i.e., it reflects the extent to which students’ scores are spread across the range of total possible scores) and is expected to be δ ≥ 0.90.61 Given that δ was large for both samples, it can be said that students earned a variety of scores across the possible ranges for both the total score (0−18) and average confidence (0%−100%). This also can be seen in Figures 3 and 4. (An instrument with a low δ would have many students clustered within just a small portion of the total possible range of scores; such low discrimination would be less useful to educators.)
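Ferguson-δ can be computed from the frequency of each total score. A minimal sketch follows the standard definition for a test scored 0 to n_items; the function name is illustrative and this is not the authors' code:

```python
from collections import Counter

def ferguson_delta(total_scores, n_items):
    """Ferguson's delta for a test scored 0..n_items:
    delta = (N^2 - sum(f_i^2)) / (N^2 - N^2/(n_items + 1)),
    where f_i is the number of students earning score i."""
    n = len(total_scores)
    sum_f2 = sum(f * f for f in Counter(total_scores).values())
    return (n * n - sum_f2) / (n * n - n * n / (n_items + 1))
```

δ equals 1 for a perfectly uniform spread of scores and 0 when every student earns the same score, which matches the interpretation given above.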

RESULTS AND DISCUSSION

Descriptive Statistics

Table 2 shows the descriptive statistics for both the GC1 and GC2 student samples. Inspection of the minimum and maximum scores reveals neither a ceiling effect nor a floor effect in either sample, i.e., few students earned scores of 18 (which would suggest the questions were too easy) or 0 (which would suggest the questions were too hard). However, the median scores for both GC1 and GC2 students are below the theoretical midpoint, suggesting that the ROXCI items are difficult for students. This is also reflected in the right skew of the GC2 students’ histogram (Figure 3). On the basis of a Kolmogorov−Smirnov test of

ROXCI Item Function

Each item on the ROXCI was evaluated using a mixture of both qualitative and quantitative data to examine its function. Figure 5 depicts both the item difficulty (p) and discrimination (D) indices for the GC2 students. Item difficulty (p) is calculated as the fraction of students who answered an item correctly.41,62 Items with high difficulty indices are considered easier items, and items with low difficulty indices are considered more difficult items. As can be seen in Figure 5, the ROXCI items span a wide variety of difficulty indices for the GC2 students. (Similar results for the GC1 students can be found in the Supporting Information.) For both the GC1 and GC2 students, many items had difficulties below 0.30, while no items had difficulties above 0.80, again suggesting that the ROXCI was difficult for both groups of students. Association types of validity are often assessed by correlating the measurement with a theoretically related criterion.49 Carmines and Zeller argue that validity related to the association of variables is limited regarding its applicability in the social sciences because many assessments represent abstract theoretical concepts where there are no known criteria variables

Figure 3. Distribution of GC2 students’ total correct responses.

normality, both GC2 students’ total scores (D = 0.123, p < 0.01) and average confidence (D = 0.067, p < 0.01) distributions lack normality. (The histograms and normality statistics for the GC1 student sample can be found in the Supporting Information for this article.) Despite the difficulty of the instrument, many students felt confident about their responses, as indicated by the left skew shown in GC2 students’ average confidence scores (Figure 4). Students’ high confidence, but low scores, demonstrates that the ROXCI measures incorrect and strongly held student ideas 1137

dx.doi.org/10.1021/ed500051n | J. Chem. Educ. 2014, 91, 1132−1144

Journal of Chemical Education

Article

Figure 6. Item response curve for responses A, B, C, and D (correct answer) to question 16.

Figure 5. Difficulty and discrimination indices for each of the 18 items for the GC2 student sample with numerical values indicating item numbers.

incorrect answer, again suggesting the response is functioning as expected. As shown in Figure 5, question 16 is a difficult item for students and has a low discrimination index. This suggests that both the high and low achieving students often chose incorrect responses for this item. This can also be seen by the competition in the response choices for the high achieving students. This indicates that question 16 can detect misconceptions held by both the strong and weak students. However, caution should be taken when interpreting the results of an IRC at the extreme ends of the total point range because a single student response has a greater influence on the curve when there are only a few students who received a specific total score. (e.g., only 3 students received a score of 14, so a single student represents 33.3% of the students who scored 14). The IRCs of all the items were examined, and no unusual response behaviors (e.g., negative slope on a correct response) were detected. (IRCs for questions 3 and 4 can be found in the Supporting Information.) Although IRC analysis was conducted for the GC1 sample, the small sample made it difficult to detect the relative slope of the curves. Therefore, postadministration validation interviews were conducted with GC1 students in order to assess the validity of the items based on the students’ responses.

in which to compare.49 At this time, there is little known about what existing theoretically viable criteria should be correlated to students’ misconceptions of redox reactions. However, when comparing Figure 5 with that of Figure S5 and S6 (Supporting Information), a slight shift is observed in the difficulties, with the items being more difficult for the GC1 students. Given that the GC2 students have had more chemistry instruction, it can be expected that the items would be easier for these students, and therefore, this signifies evidence for the association validity of the ROXCI data. As shown in Figure 5, most of the ROXCI items have discrimination indices above 0.30. Item discrimination is the extent to which an item distinguishes between strong and weak students.41,62 It can be calculated from the difference in percentages of correct responses between the students who scored at the top and bottom 27% of the student distribution.41,62 Indices can range from −1 to +1 with positive scores indicating that higher achieving students answered a question correctly more often than the lower achieving students. Values above 0.30 are considered to highly discriminate between high and low achieving students.62 For this study, the concurrent evaluation of difficulty and discrimination indices was used to further understand students’ response process for the ROXCI items. Some items cluster in the bottom left-hand corner of the plot, suggesting low discrimination but also high difficulty. This means that not only the low achieving students but also the high achieving students find these items to be difficult (thus, low discrimination). Items that can detect misconceptions in both strong and weak students are vital for educators, especially if the educators believe that the students hold a strong grasp of the concept.41 The ROXCI items were also evaluated using item response curves (IRCs). 
IRCs relate the percentage of students at each possible total score with their response choices for an individual item.63 Figure 6 depicts the IRC for question 16 for the GC2 students. For example, of the students who earned a score of 4 out of 18 on the ROXCI, 13% of the students chose “A”, 46% chose “B”, 40% chose “C”, and 1% chose “D” (the correct answer). In Figure 6, the generally positive slope of the line representing response D can be attributed to the fact that students with higher scores were more likely to choose response D than the poorer performing students. Because D is the correct answer, the response is functioning as expected. The generally negative slope associated with response B indicates that the lower performing students chose this
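The item statistics described above are straightforward to compute from a scored response matrix. The following is an illustrative sketch only, using synthetic data and hypothetical variable names rather than the ROXCI data set; it computes the per-item difficulty (p) and discrimination (D) indices and Ferguson's δ for the total-score distribution:

```python
import numpy as np

# Synthetic data: scores[i, j] = 1 if student i answered item j correctly.
rng = np.random.default_rng(7)
n_students, n_items = 200, 18
scores = (rng.random((n_students, n_items)) < 0.35).astype(int)

# Item difficulty p: fraction of students answering each item correctly
# (higher p = easier item).
difficulty = scores.mean(axis=0)

# Item discrimination D: difference in proportion correct between the top 27%
# and bottom 27% of students, ranked by total score (range -1 to +1).
total = scores.sum(axis=1)
order = np.argsort(total)
k = int(round(0.27 * n_students))
discrimination = scores[order[-k:]].mean(axis=0) - scores[order[:k]].mean(axis=0)

# Ferguson's delta: discrimination of the whole score distribution.
# With K items there are K + 1 possible total scores (0..K), and
# delta = (n^2 - sum(f_i^2)) / (n^2 * (1 - 1/(K + 1))),
# where f_i is the frequency of each possible total score.
freq = np.bincount(total, minlength=n_items + 1)
delta = (n_students**2 - np.sum(freq**2)) / (n_students**2 * (1 - 1 / (n_items + 1)))
```

The 27% split used for D follows the convention cited in the article; other splits (e.g., top/bottom thirds) appear in the literature and would change the numerical values slightly.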

Test−Retest Reliability

The stability of the GC1 students' results was evaluated across two ROXCI implementations. Much like an organic chemist may use multiple spectra to characterize the structure of a synthesized molecule, the stability of students' responses was examined through multiple lenses: overall test scores and average confidence, individual item functioning, and consistency of responses. By triangulating multiple analyses, a more compelling argument for the reliability of the data can be made. Stability of Students' Total Scores and Average Confidence. Traditionally, test−retest reliability is assessed through the use of a correlation. Figure 7 shows the relationship between the GC1 students' total scores on test 1 and test 2. Because the ROXCI is difficult for students, a skew exists in the total scores that violates the assumptions of the parametric Pearson product−moment correlation (r). Therefore, a Spearman rank correlation coefficient (also known as Spearman's rho, ρ) was used for the test−retest analyses.64,65 Because there are no generally accepted benchmarks for how strong a test−retest correlation must be for data to be considered reliable,66 this study used Cohen's statistical cutoff values to evaluate the strength of the associations: 0.10 indicates a small association, 0.30 a medium association, and 0.50 a large association.67
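Spearman's rho is simply the Pearson correlation applied to rank-transformed data, which is why it tolerates the skew described above. In practice a library routine such as scipy.stats.spearmanr (which also returns a p-value) would be used; the minimal sketch below, with hypothetical score lists, shows the idea:

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, assigning tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):          # average the ranks of tied values
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    return np.corrcoef(ra, rb)[0, 1]

test1 = [4, 7, 6, 3, 9, 5]   # hypothetical total scores, first administration
test2 = [5, 8, 5, 4, 10, 6]  # hypothetical total scores, second administration
rho = spearman_rho(test1, test2)
```

Because only the rank order enters the calculation, monotone but nonlinear relationships (and skewed marginals) still yield high ρ values.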



Figure 7. Students’ total scores for test 1 and test 2.

Figure 9. Item discrimination for test 1 and test 2 with numerical values indicating item numbers.

In Figure 7, the correlation between students’ test 1 and test 2 total scores on the ROXCI was ρ = 0.563, p < 0.0001, indicating a large association and suggesting stability in terms of students’ total scores. Similarly, a large association was found between students’ average confidence responses for test 1 and test 2 (ρ = 0.873, p < 0.0001), as shown in Figure 8, suggesting the students’ confidence was stable across test 1 and test 2.

Figure 10. Item difficulty for test 1 and test 2 with numerical values indicating item numbers.

Figure 8. Students' average confidence for test 1 and test 2.

In both Figures 7 and 8, some variability can be seen between students' responses on test 1 and test 2. Such variation is not surprising, since students continue to construct new understandings. Between test 1 and test 2, students received formal lectures on Lewis structures and VSEPR models. Because some ROXCI items focus on particulate concepts such as bonding and charges, it is likely that this instruction contributed to the variation in students' understandings.

Stability of the Difficulty and Discrimination Indices. In addition to examining the test−retest reliability of the total score and average confidence for the overall instrument, the stability of the students' responses per item was examined using the difficulty (p) and discrimination (D) indices. Figures 9 and 10 present the difficulty and discrimination indices for test 1 and test 2. A strong relationship exists between the item difficulties (ρ = 0.898, p < 0.0001) and between the item discrimination indices (ρ = 0.837, p < 0.0001), suggesting stability in item function from test 1 to test 2. Not only is each set of results strongly correlated, but the values for test 1 are also highly similar to those for test 2. However, some caution should be taken when interpreting these results. The item discrimination values were calculated from the fraction of students in the top and bottom 27% (based upon total scores) who correctly responded to the item. Since some variation exists in students' total scores (Figure 7), the top and bottom 27% may not be the same students for each implementation. Additionally, the difficulty and discrimination values consider only whether students respond correctly or incorrectly. Given that the distractors are what inform instruction, further analyses were conducted to investigate the consistency of students' individual item responses.

Consistency of Students' Responses. In order to probe the consistency of students' individual responses, a chi-square goodness-of-fit test was used to investigate the distribution of consistent and inconsistent student responses. A chi-square goodness-of-fit test evaluates whether an observed distribution differs significantly from a hypothesized distribution.64 In this analysis, the observed distributions were based on the number of students who chose the same response choice on both test 1 and test 2 (consistent) and the number of students who chose different responses (inconsistent). These frequencies were compared to the hypothetical frequencies that would exist if students chose responses at random. The null hypothesis (H0) for this test states that the observed frequencies (consistent/inconsistent) equal the hypothesized frequencies (random chance). Because students' response patterns were not expected to mimic chance, a significant difference was expected for a majority of the items. To determine the size of the difference, a Cohen's w effect size was calculated.67
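The chi-square goodness-of-fit test and Cohen's w can be sketched in a few lines. The example below uses the question 3 frequencies discussed with Figure 11 (62 consistent, 22 inconsistent) against a 50/50 chance split; it is illustrative only, and the per-item statistics in Table 3 may differ slightly if the underlying per-item sample sizes differ. A library routine such as scipy.stats.chisquare would also supply the p-value:

```python
import math

def chi_square_stat(observed, expected):
    """Chi-square goodness-of-fit statistic over category counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def cohens_w(chi_square, n):
    """Cohen's w effect size for a chi-square test: w = sqrt(chi2 / N)."""
    return math.sqrt(chi_square / n)

# 62 consistent vs 22 inconsistent students; chance model = 50/50 split.
observed = [62, 22]
n = sum(observed)
expected = [n / 2, n / 2]
chi2 = chi_square_stat(observed, expected)
w = cohens_w(chi2, n)
```

Under Cohen's conventions, w values of roughly 0.1, 0.3, and 0.5 correspond to small, medium, and large effects, mirroring the cutoffs used for the correlations above.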



As an example of this analysis, Figure 11 shows students' responses to question 3 on test 1 and test 2. The y-axis specifies students' responses on test 1 and test 2 (e.g., AB indicates that a student chose A on test 1 and B on test 2). Green bars represent students who were consistent across implementations, while blue bars represent students who changed their responses and were therefore inconsistent. For question 3, the observed frequencies were 62 consistent students and 22 inconsistent students. This distribution was compared to a random distribution in which 50% of the students would be consistent and 50% inconsistent. A significant result indicates that students responded in a manner different from chance.

Figure 11. Students' responses to question 3 on test 1 and test 2. Blue indicates inconsistency, and green indicates consistency in students' responses.

The results of the chi-square goodness-of-fit tests are shown in Table 3. For all 18 items, the number of students who were consistent exceeded the random expectation, and for 17 items the difference was significant. Of these 17 items, the difference was large (large effect size) for 8 items, moderate (medium effect size) for 8 items, and small (small effect size) for 1 item. This suggests that students' responses were generally stable across implementations, especially for the 8 items where the effect size was large. One item, item 11, lacked statistical significance. This item assesses students' understanding of the role of the spectator ion and requires a conceptual understanding of solutions and bonding. Because students received formal instruction on bonding, Lewis structures, and VSEPR models between test 1 and test 2, it is possible that this instruction contributed to the variation in students' responses.

The final column in Table 3 shows the number of students who chose the same incorrect response on both test 1 and test 2. The purpose of presenting these frequencies is to show that the consistent students were not only the students who responded correctly, as was shown in previous analyses. In fact, many of the consistent students chose the same distractor across testing implementations. For 15 of the items, at least 50% of the consistent students chose a distractor, and for 9 of the items, at least 75% did so. These findings suggest not only that the ROXCI can detect misconceptions but also that the misconceptions it detects are often stable.

Measuring Students’ Understandings and Confidence

Given the evidence discussed above, the ROXCI generates both valid and reliable data regarding students' understandings of redox reactions. Therefore, analyses of students' misconceptions were conducted for both the GC1 and GC2 samples. For example, the results for two-tiered items 3 and 4 (shown in Figure 2) can be found in Figure 12. In Figure 12, students'

Table 3. Chi-Square Goodness-of-Fit Consistency Analyses

Question  Consistent Students^a  χ2     p-Value  Cohen's w  Consistent and Incorrect
1         63                     22.28
2         47                     44.28
3         62                     20.25
4         38                     19.12
5         33                     9.64
6         38                     19.12
7         31                     6.75
8         45                     37.79
9         37                     16.97
10        46                     40.97
11        24                     0.68
12        38                     19.12
13        40                     23.81
14        49                     51.28
15        46                     40.97
16        36                     14.94
17        39                     21.40
18        37                     16.97