
Development of the Flame Test Concept Inventory: Measuring Student Thinking about Atomic Emission

Stacey Lowery Bretz* and Ana Vasquez Murata Mayo
Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States

ABSTRACT: This study reports the development of a 19-item Flame Test Concept Inventory, an assessment tool to measure students' understanding of atomic emission. Fifty-two students enrolled in secondary and postsecondary chemistry courses were interviewed about atomic emission and explicitly asked to explain flame test demonstrations and energy level diagrams. Analysis of students' explanations offered insight into students' alternative conceptions and was used to design the items and distractors of the inventory. Results from a pilot study with first-year university chemistry students and with upper-division chemistry students were analyzed to create a final version of the inventory that was administered to both secondary students (N = 459) and first-year university students (N = 100) who had completed formal instruction and course assessment about atomic emission. Analysis of student responses indicated the inventory generated valid and reliable data. Common alternative conceptions about atomic emission that remain postinstruction, and their prevalence, are discussed.

KEYWORDS: High School/Introductory Chemistry, First-Year Undergraduate/General, Upper-Division Undergraduate, Analytical Chemistry, Chemical Education Research, Demonstrations, Misconceptions/Discrepant Events, Testing/Assessment, Atomic Spectroscopy, Atomic Properties/Structure

FEATURE: Chemical Education Research

Received: August 25, 2017. Revised: October 27, 2017.
DOI: 10.1021/acs.jchemed.7b00594



INTRODUCTION

Students can construct coherent understandings of phenomena that do not match accepted scientific views. These alternative conceptions, if not challenged through instruction and assessment, can become integrated into students' cognitive structures and may interfere with subsequent learning of a new concept.1 One way that instructors can elicit their students' alternative conceptions is by administering a concept inventory to characterize students' understandings of a particular concept compared to expert thinking.2,3 When developing a concept inventory, one goal is to design an instrument that is easy to use, can be administered in a short period of time, and can accurately identify students' alternative conceptions by generating reliable and valid data.4 A variety of methods for developing concept inventories exist, ranging from questions, responses, and distractors drafted through expert mapping of the content domain5 to items generated through clinical interviews with students.6 The National Research Council7 recommends the use of student interviews in the development of assessment instruments because interviews generate qualitative data that can be used to formulate attractive distractors written in the natural language of students, thus ensuring that the test elicits common student ideas and reasoning difficulties.8 Such distractor-driven multiple-choice tests combine the richness of qualitative research with the power of quantitative assessment, thereby creating a tool to measure the prevalence of alternative conceptions with larger numbers of students.

Concept inventories have been reported in the literature for a variety of chemistry concepts including the particulate nature of matter,9−11 bonding,12,13 light and heat,14−17 equilibrium,18,19 inorganic qualitative analysis,20 ionization energy,21 organic acids,22 phase changes and solutions,23,24 quantum chemistry,25 redox reactions,26 and biochemistry concepts.27,28 A majority of these concept inventories consist of questions that ask students to interact with either symbolic or particulate representations of matter. None of these concept inventories, however, asks students to interact with macroscopic representations of chemistry such as visual observations in a laboratory experiment or chemical demonstration. This paper reports the development of a concept inventory that requires students to interact with macroscopic observations, namely, an inventory to measure students' understandings about atomic emission in the context of flame test observations.

The flame test is a longstanding demonstration in chemistry classrooms. From 1928 to 2015, the Journal has published 32 different procedures for conducting flame test demonstrations.29−60 Flame tests are a colorful, visually interesting demonstration of the anchoring concept61−63 that matter consists of atoms that have an internal structure that dictates their properties, often as part of the evidence that led to the development of the quantum model. Despite the showy nature of this demonstration, there is scarce evidence that it promotes student learning: "students move on, graduate, and [when] I cross paths with them, they remember that flame test that I did for them; they may not remember the chemistry, but they remember the demo."64 While limited research has been reported regarding students' understandings of atomic line spectra,65 there are no reports published to date of investigations into student thinking about flame tests and atomic emission.

THEORETICAL FRAMEWORKS

In order to conceptualize the interview guide to investigate students' understandings about flame tests and atomic emission, two different theoretical frameworks were employed.


Meaningful Learning


Ausubel and Novak have written extensively about the construct of meaningful learning which is the process where students make substantive connections between what they need to learn and what they already know, i.e., their prior knowledge.66,67 Meaningful learning stands in contrast to the process of rote memorization where information is stored without forming connections to prior knowledge. Ye and Lewis explained rote learning as involving tasks such as “a first-year chemistry student [being asked] to recall the color of a particular metal when it is put in a flame test. As the information that is being recalled has no meaningful association with existing content knowledge, the information must be learned through rote learning.”68 Ausubel and Novak’s construct of meaningful learning has informed several chemistry education research studies including research leading to the development of concept inventories regarding bonding representations,12 redox reactions,26 and challenges with learning organic chemistry.69,70




Johnstone’s Domains

To guide our choices for investigating students’ knowledge of atomic emission, the interview guide was constructed to intentionally explore Johnstone’s domains. According to Johnstone, the chemistry to be learned by students falls across three domains: the macroscopic domain (the tangible or visible), the particulate domain (the invisible), and the symbolic domain (formulas or representations).71 Johnstone has argued that one reason that science is so hard to learn is that, in formal science classrooms, students are expected to simultaneously understand the connections among all three domains whereas they should learn about dyads between two of the domains rather than have all three inflicted upon them at once.72 Therefore, the interview guide in this research study that was constructed to purposefully explore the connections (or lack thereof) in students’ understandings about atomic emission was designed using prompts from two domains, namely, energy level diagrams (symbolic) and flame tests (macroscopic).



METHODS

This research employed a mixed-method, sequential design.73 Semistructured interviews74,75 were conducted to elicit student thinking about atomic emission. These findings were then used to create items and distractors for a concept inventory that was then used to quantify the prevalence of these understandings with a larger sample of students. Details on each of these procedures are provided below.

Interview Sample

The sample included students in advanced placement chemistry (AP, N = 14) and secondary students in a first-year chemistry course (SC, N = 12), with both groups enrolled at a large, suburban, public secondary school in the midwestern United States. The university sample consisted of undergraduate students in first-year chemistry (FYC, N = 14) and undergraduate students in upper-division chemistry (UD, N = 12), with both groups enrolled at a large, public, and predominantly undergraduate institution in the midwestern United States. The UD students were enrolled in either an instrumental analysis or an analytical chemistry course. All data collection was approved by the Institutional Review Board with regard to protection of the rights of human subjects with respect to informed consent and minimization of risk. Each student was assigned a pseudonym.

Interviews

To develop a "thick description"76 of students' understandings about atomic emission, students were interviewed individually in an instructional laboratory setting so that flame tests could be conducted. Secondary student interviews lasted an average of 60 min and were conducted at their school. University student interviews lasted an average of 45 min. Interviews were both audio and video recorded in order to capture student interactions with the prompts described below. The interviews consisted of four phases. Phase I asked open-ended questions about atomic emission, and students were asked to create representations77 to explain how atoms release energy. Students were asked to think aloud78 about the topic and were provided with a periodic table, a digital Livescribe Pulse Smartpen,79 and paper to draw on if desired. Phase II focused on flame test demonstrations, asking students to respond to predict−observe−explain80 questions regarding what would happen to three chloride salts in a flame test, conducted one salt at a time. Of the 52 students interviewed, 48 reported being familiar with the flame test before it was conducted in the interview. The flame tests were used to prompt student thinking about atomic emission in the macroscopic domain of Johnstone's triangle and to elicit what connections, if any, the students identified between the macroscopic domain and atomic structure. In Phase III, students were asked to explain the conventions and symbols found in an energy level diagram for a hydrogen atom, such as the meaning of n (principal quantum number), the horizontal lines (energy levels), and the numbers (with a maximum of zero) indicating energy. The energy axis was labeled "Energy × 10^20 (J/atom)". Phase III explored the students' understanding of the symbolic domain and its connections to atomic structure with regard to electronic transitions. Lastly, in Phase IV, two additional energy level diagrams were shown to students: one with an arrow pointing from n = 2 to n = 4 and the other with an arrow pointing from n = 4 to n = 2. The students were asked to choose whether one, both, or neither of these diagrams might represent the flame tests they had carried out in Phase II of the interview. The intent of Phase IV was to elicit conceptions related to connections between the symbolic and the macroscopic domains.
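As a concrete bridge between the Phase II flame tests (macroscopic) and the Phase III/IV energy level diagrams (symbolic), the sketch below computes the energy and wavelength of the hydrogen n = 4 to n = 2 transition that appears in the Phase IV diagrams. It is illustrative only; it is not part of the interview protocol or the FTCI, and it uses the textbook Bohr expression E_n = −2.178 × 10^−18 J / n^2 rather than any value taken from this study.

```python
# Illustrative only: hydrogen energy levels and emission wavelength, using the
# textbook Bohr expression E_n = -2.178e-18 J / n^2 (not data from this study).
H_PLANCK = 6.626e-34      # Planck constant, J s
C_LIGHT = 2.998e8         # speed of light, m/s
E_N1 = -2.178e-18         # energy of the hydrogen n = 1 level, J/atom

def level_energy(n: int) -> float:
    """Energy of the hydrogen level n in J/atom."""
    return E_N1 / n**2

def emission_wavelength_nm(n_upper: int, n_lower: int) -> float:
    """Wavelength (nm) of the photon emitted for the n_upper -> n_lower transition."""
    delta_e = level_energy(n_upper) - level_energy(n_lower)   # positive for emission
    return H_PLANCK * C_LIGHT / delta_e * 1e9

if __name__ == "__main__":
    for n in (1, 2, 4):
        # Printed on the same scale as the interview diagram axis, "Energy x 10^20 (J/atom)"
        print(f"E(n={n}) = {level_energy(n) * 1e20:7.1f} x 10^-20 J/atom")
    print(f"n = 4 -> n = 2 emission: {emission_wavelength_nm(4, 2):.0f} nm")
```

The result, roughly 486 nm, is the blue-green hydrogen line; reversing the arrow (n = 2 to n = 4) corresponds to absorption of the same energy, which is exactly the distinction the Phase IV prompt probes.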

Data Treatment

All interviews were transcribed verbatim using both the audio and video recordings. Data management was facilitated using QSR NVivo 8.81 The interview transcripts were analyzed for alternative conceptions using the Constant Comparison Method (CCM), which is an inductive data coding process used to categorize and compare qualitative data for analysis purposes.82 In CCM, a unit of data (i.e., an interview transcript) is analyzed and broken into codes based on emerging themes, which are then organized into categories that reflect a logical understanding of the codes. Then, this analyzed unit of data is systematically compared with another unit of data. Ideally, this iterative process continues until theoretical saturation is reached, which occurs when no new categories or new information emerge from subsequent interviews. In this study, a purposeful sample of secondary and postsecondary chemistry students was used to include a diverse range of student expertise, with the goal of reaching theoretical saturation. A detailed analysis of students' explanations has been reported.83

The alternative conceptions that emerged through the analysis of student interviews formed the basis for creating questions and distractors for the flame test concept inventory (FTCI) in the form of multiple-choice items. For example, Question 8 on the FTCI is shown in Box 1, and the correct answer is indicated with an asterisk.

Box 1. Question 8 on the Flame Test Concept Inventory

Q8. Copper(II) chloride releases energy in a flame test when...
A. losing valence electrons
B. gaining valence electrons
C. breaking bonds in the compound
*D. valence electrons return to the ground state

All the distractors on the FTCI were inspired by multiple explanations offered by students during the clinical interviews. For example, responses A, B, and C in Question 8 were crafted from students' explanations about losing/gaining electrons and breaking bonds during the flame test:

"In the flame test the valence electrons on the substances were lost to make them more stable that's why the color of the flame stopped and when back to normal (referring to initial color) after a while, so this chart (energy level diagram with arrow pointing from n = 4 to n = 2) shows that after a while, after the flame test loses electrons, it became more stable, the energy decreases." (Alex, SC)

"When atoms lose energy, (pause), when is releasing energy, one of them is gaining an electron and one of them is losing an electron (referring to arrows in energy level diagrams), just trying to figure out [pause], to go up in energy level (indicates energy level diagram with arrow going from n = 2 to n = 4), you have to gain an electron, but this is going to coincide with gaining an electron and this one (points at energy level diagram with arrow pointing from n = 4 to n = 2) is going to be losing an electron." (Rachel, UD)

"When the copper chloride burns (during flame test) there are bonds breaking, one way the energy's being released is in that certain wavelength of light that it is being emitted." (Arthur, AP)

"Energy is stored in the bonds between atoms, not in the actual atoms itself, and when you break the bonds that's when energy is released... The energy release by the bonds breaking through the heat (during flame test), and depending in what wavelength is the visible light spectrum, wherever that lands, that's the color you see." (Waldo, AP)

"The flame (in flame test) is required to have the reaction take place and to break the bonds apart and to make the electrons jump." (Michael, FYC)

Expert Content Validation

A preliminary version of the FTCI was a paper-and-pencil inventory of 18 multiple-choice items. Each copy of the FTCI included a handout with color pictures of three chloride salts and their respective flame tests. Seven faculty members at the researchers' institution who were experienced first-year chemistry and/or analytical chemistry instructors were asked to review the chemistry content of the FTCI for accuracy. Specifically, the faculty were asked to answer these three questions:

1. Which item(s), if any, do you find confusing? Why?
2. Which item(s) best represent atomic emission? Why?
3. Which topic(s) were omitted that you feel need to be included to best represent atomic emission? Why?

The faculty provided several comments and suggestions to improve this preliminary version of the FTCI. For example, one expert suggested that an item that included a student-generated Bohr atomic model to represent a multielectron substance be followed by an item that explicitly asked students about the limitations of that student-generated Bohr model. The item containing a Bohr atomic model was not deleted because that model was the one most commonly drawn by secondary students in their explanations of how atoms release energy. The expert was agreeable to retaining the item with the model as long as an additional item was added to explore students' comprehension of the limitations of that model for multielectron atoms. Other experts suggested changes in wording for some items to more accurately reflect the chemistry that takes place in flame tests. For example, rather than talk about the ions in the flame, the word "ions" was changed to "atoms" because atomization more accurately describes the current scientific understanding of the multiple processes taking place in a flame. A careful analysis of expert suggestions, balanced against the need to keep the essence of the alternative conceptions found in the interviews, led to revisions in wording and the addition of two new items, resulting in the pilot-test version of the FTCI with 20 items.


Concept Inventory Administration

The 20-item version of the FTCI was pilot tested both with FYC students (N = 222) and with UD students (N = 40) enrolled in analytical chemistry courses. Seven of these students (5 FYC, 2 UD) subsequently participated in individual student validation interviews (described below), resulting in the deletion of two items, the addition of one item, and revisions in wording to others. The revised, final version of the FTCI, which consists of 19 items, was then administered to secondary chemistry students (SC, N = 308 and AP, N = 151) from 15 schools in 9 states across the United States, and to an additional FYC class (N = 100) that gave consent for their results to be used in our research. Three weeks later, three students from the FYC class of N = 100 participated in validation interviews; no additional modifications were deemed necessary. The 19-item FTCI was then administered for a second time in this same classroom in order to conduct a test−retest reliability study. After omitting the responses from the three students who participated in the validation interviews, the responses of N = 80 students were used for the test−retest reliability study. In all cases, directions were given to students by their instructors regarding voluntary participation and student consent for the use of results in research. Students required 10−15 min to respond to the FTCI, and the FTCI was administered after students had been formally assessed by their instructors on the topic of atomic emission in order to identify the most firmly held alternative conceptions. Excel and SPSS84 were used to analyze the data.

Student Validation Interviews

The student validation interviews offered insights as to how to further improve the quality of the data generated by the FTCI by identifying problems or ambiguities in the content and clarity of items that could lead to confusion for students.85 Students who reported being unfamiliar with the flame test agreed that the FTCI handout with color pictures of the flame tests was clear and self-explanatory. The first round of validation interviews, each of which lasted approximately 45 min, was conducted 3 weeks after the administration of the 20-item FTCI. This waiting time was chosen because it has been shown that explicit memory does not affect responses in a test−retest situation if the retest is administered at a 3 week interval.86 The validation interviews followed a think-aloud protocol where students reanswered all items, providing information about how well they understood the items and which strategies (elimination, guessing) they used to choose an answer. If a student was not guessing and chose a distractor, this indicated the distractor was appealing and/or rooted in the alternative conceptions identified in the four-phase flame test interviews.86

RESULTS AND DISCUSSION

Concurrent Validity

The scores for the pilot test of the 20-item FTCI for both FYC (N = 222) and UD (N = 25 and N = 19) students can be found in Table 1. Concurrent validity was examined to determine if the 20-item version of the FTCI could successfully distinguish between populations.87 The FYC mean (M = 6.61; SD = 2.77) was significantly lower than the mean for either class of UD students. The more experienced UD chemistry students scored higher than the less expert FYC students. Two one-tailed t-tests established that the two UD classes were equivalent,88 resulting in the subsequent merger of both courses' data for analysis (n = 44; M = 11.5; SD = 2.83). A t-test indicated that the UD pilot test FTCI scores were significantly higher than the FYC scores (t(264) = −11.50, p = 0.01), with eta squared (η2 = 0.33) indicating a large effect that accounts for 33% of the variability between groups, thereby providing a form of concurrent validity. It bears noting that, even though the UD students outperformed the FYC students by a statistically significant amount, the average UD student answered only 11.5 out of 20 items (57.5%) on the 20-item FTCI.

Table 1. Student Scores for the 20-Item Pilot-Test Version and the Final 19-Item Version of the FTCI

Version | Scoring (Students) | N | Mean | SD | Median | Min | Max
20 items | 1-tier (FYC) | 222 | 6.61 | 2.770 | 6 | 1 | 17
20 items | 1-tier (UD analytical) | 25 | 12.56 | 3.150 | 13 | 7 | 18
20 items | 1-tier (UD instrumental) | 19 | 12.31 | 3.130 | 13 | 6 | 18
19 items | 1-tier (SC) | 308 | 6.01 | 2.910 | 6 | 1 | 15
19 items | 2-tier (SC) | 308 | 4.29 | 2.344 | 4 | 0 | 11
19 items | 1-tier (AP) | 151 | 8.87 | 5.008 | 7 | 0 | 18
19 items | 2-tier (AP) | 151 | 6.79 | 3.888 | 5 | 0 | 14
19 items | 1-tier (FYC) | 100 | 8.48 | 4.011 | 8 | 1 | 18
19 items | 2-tier (FYC) | 100 | 6.09 | 3.220 | 6 | 0 | 14
19 items test−retest | 1-tier test (FYC) | 80 | 8.53 | 4.081 | 8 | 1 | 18
19 items test−retest | 1-tier retest (FYC) | 80 | 8.68 | 3.871 | 9 | 1 | 17
19 items test−retest | 2-tier test (FYC) | 80 | 6.08 | 3.352 | 6 | 0 | 14
19 items test−retest | 2-tier retest (FYC) | 80 | 6.18 | 2.647 | 6 | 0 | 12
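For readers who want to run this kind of comparison on their own class data, the sketch below shows an independent-samples t-test and the eta-squared effect size, using η² = t²/(t² + df); the score arrays are simulated placeholders, not the study's raw data.

```python
# Sketch of a concurrent-validity check: compare one-tier pilot scores of two
# populations with an independent-samples t-test and an eta-squared effect size.
# fyc_scores and ud_scores are simulated placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fyc_scores = np.clip(rng.normal(6.6, 2.8, 222).round(), 0, 20)   # hypothetical FYC scores
ud_scores = np.clip(rng.normal(11.5, 2.8, 44).round(), 0, 20)    # hypothetical merged UD scores

t_stat, p_value = stats.ttest_ind(fyc_scores, ud_scores)   # equal-variance t-test
df = len(fyc_scores) + len(ud_scores) - 2
eta_squared = t_stat**2 / (t_stat**2 + df)                 # proportion of variance explained

print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3g}, eta^2 = {eta_squared:.2f}")
```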

Descriptive Statistics

The final version of the FTCI consists of 19 items, including four two-tier answer−reason pairs.5 Student responses were scored both as a one-tier FTCI with 19 points possible ("1" for correct, "0" for incorrect) and again as a two-tier FTCI with 15 points possible ("1" for correct answers to both the answer and the reason tiers, "0" for all other responses). Table 1 and Figures 1−3 summarize the final version scores for secondary students (SC), secondary students enrolled in an AP chemistry course (AP), and FYC students, including test−retest data. All SC, AP, and FYC score distributions were skewed to the left, that is, scores fell below the theoretical median. The Kolmogorov−Smirnov (K−S) test was used to assess the normality of the distributions in Figures 1−3 as the student sample sizes were above 50.89 In the K−S test, the scores of interest are compared to a normal distribution and, therefore, p-values exceeding 0.05 indicate normality. Only the FYC scores were normally distributed, as shown in Table 2. One unusual feature of the AP scores in Figure 2 is the prominent number of students who scored 17/19. The AP data set includes responses from students with 5 different teachers in 4 states, but it would be incorrect to presume that the peak at 17 suggests there were only two (or three) questions that were particularly difficult for the AP students. In fact, there were six questions (nearly one-third of the items) on the FTCI for which 30% or more of the AP students chose incorrect distractors. The discriminatory power of the entire inventory was measured by Ferguson's delta (δ), where δ ≥ 0.9 indicates a broad score distribution and good discrimination among all students.90 Both one-tier and two-tier scorings of the final version of the FTCI yielded satisfactory results for all students (Table 2).

Figure 1. FTCI one-tier and two-tier score distributions for SC (N = 308) students.

Figure 2. FTCI one-tier and two-tier score distributions for AP (N = 151) students.

Figure 3. FTCI one-tier and two-tier score distributions for FYC (N = 100) students.

Table 2. Results for Test of Normality and Discrimination for FTCI Final Version Scores

Scoring (Student Categories) | K−S Statistic | df | p-Value | Ferguson's δ
1-tier (SC) | 0.121 | 308 | 0.000 | 0.94
2-tier (SC) | 0.144 | 308 | 0.000 | 0.92
1-tier (AP) | 0.170 | 151 | 0.000 | 0.95
2-tier (AP) | 0.181 | 151 | 0.000 | 1.00
1-tier (FYC) | 0.088 | 100 | 0.052a | 1.00
2-tier (FYC) | 0.920 | 100 | 0.035a | 0.95
a These p-values were found to be normally distributed.
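The normality check and whole-test discrimination reported in Table 2 can be approximated as sketched below. The score vector is a simulated placeholder, the Ferguson's delta formula follows Ding and Beichner (ref 90), and SPSS's one-sample K−S routine applies a Lilliefors-type correction, so the statistics and p-values here would not match the published values exactly.

```python
# Sketch of the Table 2 calculations: one-sample K-S test against a normal
# distribution and Ferguson's delta for whole-test discrimination.
# "scores" is a simulated placeholder, not the FTCI data set.
import numpy as np
from scipy import stats

N_ITEMS = 19   # one-tier scoring of the final FTCI

def ferguson_delta(scores: np.ndarray, n_items: int) -> float:
    """Ferguson's delta, following Ding and Beichner (ref 90); delta >= 0.9 is satisfactory."""
    n = len(scores)
    freqs = np.bincount(scores.astype(int), minlength=n_items + 1)   # f_i for scores 0..K
    return (n**2 - np.sum(freqs**2)) * (n_items + 1) / (n**2 * n_items)

scores = np.random.default_rng(1).integers(0, N_ITEMS + 1, size=308)  # placeholder SC scores
ks_stat, p_value = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))

print(f"K-S statistic = {ks_stat:.3f}, p = {p_value:.3f}")
print(f"Ferguson's delta = {ferguson_delta(scores, N_ITEMS):.2f}")
```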

Internal Consistency

The internal consistencies of SC, AP, and FYC scores on the final 19-item version of the FTCI were measured by calculating Cronbach's α, where values of α ≥ 0.7 are considered acceptable and, thus, indicate that all items closely measure the same construct.91 Kuder−Richardson (KR-20) indices were also calculated as a measure of internal consistency to examine the dichotomous scoring, either right or wrong, again with an accepted criterion value of 0.7 or greater. Both one-tier and two-tier FTCI scores yielded acceptable α values and KR-20 indices for both the AP and the FYC students (Table 3), indicating that all items closely measured the same construct.90 The scores of the less experienced SC students, however, did not exceed either threshold, reflecting a more fragmented knowledge about the concepts assessed by the FTCI.92

Internal consistency was also examined by conducting a test−retest analysis. Scores were available for N = 80 of the FYC students (see Table 1) and were used to calculate a stability coefficient as a measure of consistency in responses using one-tier scoring. While the stability coefficient as measured by the Pearson correlation resulted in a strong correlation of 0.591, it did not meet the recommended threshold of 0.7. A lower correlation, however, is not cause for concern, as the appropriateness of internal consistency thresholds for CIs and alternative conceptions has been recently questioned given the challenges associated with measuring incomplete and incorrect student understandings.3,22,92 Examining the correlation between scores in a test−retest design does not account for the fact that students with identical scores may not, in fact, hold the same misconceptions. While internal consistency can be artificially inflated by asking questions that iterate on the same concept, when instruments are designed to detect misconceptions of students that result from fragments of knowledge and incorrect relationships among concepts, the consistency among students' responses is likely to be lower. Most importantly, Streiner93 has cautioned against stringent application of 0.7 as a threshold given that concept inventories are not unidimensional. The FTCI measures students' knowledge of flame tests, but the constructs required for understanding here include electronic structure of the atom and properties of light.

Table 3. Internal Consistency for Final Version FTCI Scores

Student Categories | Scoring | Cronbach's α | KR-20
SC | 1-tier | 0.55 | 0.55
SC | 2-tier | 0.50 | 0.50
AP | 1-tier | 0.87 | 0.87
AP | 2-tier | 0.83 | 0.83
FYC | 1-tier | 0.76 | 0.77
FYC | 2-tier | 0.73 | 0.74
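A minimal sketch of the internal-consistency measures in Table 3 and of the test−retest stability coefficient is given below, computed from a students × items matrix of dichotomous (0/1) item scores; the matrices are simulated placeholders, not FTCI responses.

```python
# Sketch of the internal-consistency measures: Cronbach's alpha, KR-20, and the
# test-retest stability coefficient (Pearson r between total scores).
# All matrices here are simulated placeholders, not FTCI responses.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x items) score matrix."""
    k = item_scores.shape[1]
    sum_item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def kr20(item_scores: np.ndarray) -> float:
    """Kuder-Richardson 20 for dichotomously scored (0/1) items."""
    k = item_scores.shape[1]
    p = item_scores.mean(axis=0)                       # proportion correct per item
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - np.sum(p * (1 - p)) / total_var)

rng = np.random.default_rng(2)
test = rng.integers(0, 2, size=(80, 19))     # placeholder 0/1 responses, first administration
retest = rng.integers(0, 2, size=(80, 19))   # placeholder responses three weeks later

print(f"alpha = {cronbach_alpha(test):.2f}, KR-20 = {kr20(test):.2f}")
stability = np.corrcoef(test.sum(axis=1), retest.sum(axis=1))[0, 1]
print(f"test-retest stability r = {stability:.2f}")
```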




Item Analysis

Classical test theory was used to evaluate item difficulty, discrimination, and reliability. Item difficulty, p, is defined as the proportion of students answering that item correctly. The acceptable range of item difficulty is 0.30 < p < 0.80.90 A high p-value (p > 0.8) may indicate that the item was too easy for the test's intended population and may not be appropriate for inclusion in an inventory designed to elicit students' alternative conceptions. Note that a low p-value does not necessarily indicate a malfunctioning item. A valid item could be answered incorrectly by a large number of students because of the very fact that it addresses a deep-rooted alternative conception.

Item discrimination was calculated to determine how well each item differentiated between students who scored in the top 27% and those whose scores were in the bottom 27% of all scores.94 Discrimination index values of D > 0.3 are considered ideal.90 A low item discrimination index can be measured when an item is either too difficult or too easy, because either extreme of difficulty corresponds to having all students getting an item either correct or incorrect. Discrimination index versus item difficulty plots for SC, AP, and FYC students can be found in Figure 4. The majority of the items functioned acceptably for all students. A few items were answered correctly by fewer than 25% of the SC students, indicating that the items were particularly difficult for those students and that students either resorted to guessing (25% marks the guessing threshold for a four-option item) or found another distractor particularly attractive. Note that the most difficult items also poorly discriminated, given that students in both the top and bottom 27% found them difficult.

Individual item reliability in the form of a point biserial (ρbis) was calculated as the correlation between each item's score (correct = "1" or incorrect = "0") and the overall test score. Satisfactory values for ρbis equal or exceed 0.2.90 The majority of the items functioned acceptably for all students. Notably, items 9 and 10 (Box 2; correct answers are marked with an asterisk), which asked students about representing absorbance and emission using a Bohr atomic model and about the limitations of doing so, both had low ρbis values for both SC and AP students, indicating the strong appeal of distractors in both of those items. The ρbis was low for FYC students for item 10, which is the item that was strongly recommended during expert content validation. Student validation interviews confirmed that students were not explicitly guessing the answer, that they understood the intention of items 6 and 10, but that they were still strongly attracted to distractors.

Figure 4. Discrimination index versus item difficulty with one-tier scoring. (The number next to each dot indicates the FTCI item.)
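The classical-test-theory item statistics described above (difficulty p, discrimination D between the upper and lower 27%, and the point biserial ρbis) can be computed from a students × items matrix of 0/1 responses as sketched below; the response matrix is a simulated placeholder, and the cutoffs are those cited in the text.

```python
# Sketch of the item-level statistics: difficulty, 27%-group discrimination,
# and point biserial correlation with the total score.
# "responses" is a simulated placeholder, not the FTCI data set.
import numpy as np

def item_statistics(responses: np.ndarray):
    """Return (difficulty, discrimination, point_biserial), one value per item."""
    n_students, n_items = responses.shape
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = int(round(0.27 * n_students))
    lower, upper = order[:n_group], order[-n_group:]          # bottom and top 27% of scorers

    difficulty = responses.mean(axis=0)                        # proportion correct, p
    discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)   # D index
    point_biserial = np.array([np.corrcoef(responses[:, j], totals)[0, 1]
                               for j in range(n_items)])       # item score vs total score
    return difficulty, discrimination, point_biserial

responses = np.random.default_rng(3).integers(0, 2, size=(308, 19))   # placeholder 0/1 matrix
p, d, rho = item_statistics(responses)
flagged = np.where((p < 0.30) | (p > 0.80) | (d < 0.30) | (rho < 0.20))[0] + 1
print("Items outside the cited acceptable ranges:", flagged.tolist())
```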

Alternative Conceptions as Measured by the FTCI

The FTCI items elicited alternative conceptions about atomic emission. Each FTCI distractor is directly tied to an alternative conception revealed through the interviews and falls into one of the seven categories provided in Table 4.

Table 4. FTCI Items as They Correspond to Alternative Conceptions about Atomic Emission

Category | Alternative Conception | Corresponding FTCI Item(s)
a | Misrepresentations of atomic emission | 9, 10, 11, and 12
b | Atomic properties affecting atomic absorption and emission | 4, 15
c | Breaking and/or forming bonds affect absorption and emission | 3, 5
d | Losing and/or gaining electrons | 2, 6, 8
e | Process related alternative conceptions | 7, 13, and 14
f | Heuristics | 16 and 17; 18 and 19
g | Terminology used out of context | 1

Not one SC student correctly answered both items 4 and 15 (category b), and only 8% correctly answered items 9, 10, 11, and 12 (category a). Figure 5 shows that 94% of SC students held at least six common alternative conceptions about atomic emission. It bears repeating that these data were collected after students had completed instruction and been tested by their instructor on these ideas.

Only 4% of AP students correctly answered items 9, 10, 11, and 12 (category a), while 45% of students correctly answered both items 4 and 15 (category b). Distractors in category a were the most commonly selected by AP students, while distractors in categories b and f were the least selected by these students. Figure 5 shows that 75% of AP students held at least four common alternative conceptions about atomic emission.

Only 4% of FYC students correctly answered items 9, 10, 11, and 12 (category a), while 46% of students correctly answered both items 4 and 15 (category b). Distractors in category a were the most commonly selected by FYC students, while distractors in categories b and f were the least selected by these students. Figure 5 shows that 85% of FYC students held at least four common alternative conceptions about atomic emission.

Figure 5. Students holding one or more alternative conceptions about atomic emission in the context of a flame test.

The distractors can be further categorized according to the students' difficulties with connecting Johnstone's domains (Table 5). For example, "macroscopic/particulate" refers to the difficulties that students had with connections between the macroscopic and particulate domains. Students' alternative conceptions regarding the connections between the macroscopic and particulate domains are evidenced by their selection of distractors across five different FTCI items (Table 5). Students' alternative conceptions at the "macroscopic/symbolic" interface are evident from the popularity of distractors in Table 5 regarding the colors of the flame test and their connection to energy level diagrams. It was particularly challenging for students to connect the energy level diagrams to a color such as "red". During the validation interviews, students were focusing on the direction of arrows and the terms "absorption" and "emission", rather than explaining how the size of the gap between energy levels would result in the generation of different colors being observed. Their confusion about the relationship between the symbolic/particulate domains was also revealed by their combined responses to a two-tier question, where the first tier asked them to choose a symbolic representation of emission and the second tier asked them to choose a reason for their symbolic selection.

Table 5. Comparison of Students Choosing FTCI Distractors about Atomic Emission as They Correspond to Johnstone's Domains
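The per-student tallies behind Figure 5 can be approximated from the Table 4 item-to-category mapping as sketched below. The operationalization (a category is counted as held if the student misses at least one of its items) and the response matrix are assumptions for illustration, not the authors' stated scoring rule.

```python
# Sketch of a Figure 5-style tally: how many alternative-conception categories
# (Table 4) each student shows evidence of holding. The "missed any item in the
# category" rule and the response matrix are assumptions for illustration.
import numpy as np

CATEGORY_ITEMS = {                     # Table 4: FTCI item numbers per category
    "a": [9, 10, 11, 12],
    "b": [4, 15],
    "c": [3, 5],
    "d": [2, 6, 8],
    "e": [7, 13, 14],
    "f": [16, 17, 18, 19],
    "g": [1],
}

def categories_held(correct: np.ndarray) -> np.ndarray:
    """Per-student count of categories with at least one incorrectly answered item."""
    counts = np.zeros(correct.shape[0], dtype=int)
    for items in CATEGORY_ITEMS.values():
        cols = [i - 1 for i in items]                 # item numbers are 1-based
        counts += (correct[:, cols] == 0).any(axis=1)
    return counts

correct = np.random.default_rng(4).integers(0, 2, size=(308, 19))   # placeholder 0/1 matrix
held = categories_held(correct)
print(f"{(held >= 6).mean():.0%} of simulated students hold at least six of the seven categories")
```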



CONCLUSIONS AND IMPLICATIONS FOR TEACHING

The Flame Test Concept Inventory generates reliable and valid data for instructors' use in assessing their students' alternative conceptions about atomic emission. The FTCI is an easy-to-administer CI and requires only 10−15 min, but if an instructor has limited class time, s/he may select individual items for in-class discussion or for use as formative assessment in classroom response systems, for example, as "clicker" questions.

The flame test is a colorful demonstration that easily captures the attention of students. However, the results from student interviews and responses to the FTCI suggest that there is a large gap between the positive affective response garnered by the flame test and the cognitive understanding of what takes place and what these observations suggest to chemists about atomic structure. Instructors could engage students in their classroom by using the flame test demonstration accompanied by the "predict, observe, and explain" tasks used in the interview protocol and then holding a class discussion. Instructors who teach upper-division courses may wish to formatively assess what prior knowledge their students bring with them as residual from earlier coursework in chemistry.

The development of the FTCI shed light on alternative conceptions related to challenges with understanding representations of the Bohr atomic model and energy level diagrams. Interviews with secondary students indicated their preference for the Bohr atomic model to explain how atoms release energy. The FTCI includes both of these representations because students bring these preferences to their first-year chemistry classrooms. Instructors can assess understanding of the limitations of these models and start classroom discussions. The FTCI may also be administered in a classroom after formal instruction, similar to how the data were collected in the current study. Colleagues interested in obtaining a copy of the FTCI (including the color handout of the flame tests) for classroom use or additional research should contact the corresponding author.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Stacey Lowery Bretz: 0000-0001-5503-8987

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant 0733642. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We thank the students and instructors who made this study possible.



REFERENCES

(1) Treagust, D. F. Diagnostic Assessment in Science as a Means to Improving Teaching, Learning and Retention. In Proceedings of the Assessment in Science Teaching and Learning Symposium; The University of Sydney; UniServe Science: Sydney, Australia, 2006; pp 1−9. (2) Libarkin, J. C. Concept Inventories in Higher Education Science. Manuscript prepared for the National Research Council Promising Practices in Undergraduate STEM Education Workshop 2, Washington, DC, Oct 13−14, 2008. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_072624.pdf (accessed Oct 2017). (3) Adams, W. K.; Wieman, C. E. Development and Validation of Instruments To Measure Learning of Expert-Like Thinking. Int. J. Sci. Educ. 2011, 33 (9), 1289−1312. (4) Krause, S.; Birk, J.; Bauer, R.; Jenkins, B.; Pavelich, M. J. Development, Testing, and Application of a Chemistry Concept Inventory. Paper presented at the 34th ASEE/IEEE Frontiers in Education Conference, Savannah, GA, Oct 20−23, 2004. Institute of Electrical and Electronics Engineers; Piscataway, NJ, 2004. DOI: 10.1109/FIE.2004.1408473. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1408473 (accessed Oct 2017). (5) Treagust, D. F. Development and Use of Diagnostic Tests To Evaluate Students' Misconceptions in Science. Int. J. Sci. Educ. 1988, 10, 159−169. (6) Bretz, S. L. Designing Assessment Tools To Measure Students' Conceptual Knowledge of Chemistry. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; American Chemical Society: Washington, DC, 2014; pp 155−168. (7) National Research Council. Knowing What Students Know: The Science and Design of Educational Assessment; The National Academies Press: Washington, DC, 2001; pp 1−14. DOI: 10.17226/10019. https://www.nap.edu/catalog/10019/knowing-what-students-know-the-science-and-design-of-educational (accessed Oct 2017). (8) Sadler, P. M. Psychometric Models of Student Conceptions in Science: Reconciling Qualitative Studies and Distractor-Driven Assessment Instruments. J. Res. Sci. Teach. 1998, 35 (3), 265−296. (9) Mulford, D.; Robinson, W. An Inventory for Alternate Conceptions among First-Semester General Chemistry Students. J. Chem. Educ. 2002, 79 (6), 739−744. (10) Othman, J.; Treagust, D. F.; Chandrasegaran, A. L. An Investigation into the Relationship between Students' Conceptions of the Particulate Nature of Matter and Their Understanding of Chemical Bonding. Int. J. Sci. Educ. 2008, 30 (11), 1531−1550. (11) Nyachwaya, J. M.; Mohamed, A.-R.; Roehrig, G. H.; Wood, N. B.; Kern, A. L.; Schneider, J. L. The Development of an Open−Ended Drawing Tool: An Alternative Diagnostic Tool for Assessing Students' Understanding of the Particulate Nature of Matter. Chem. Educ. Res. Pract. 2011, 12, 121−132. (12) Luxford, C. J.; Bretz, S. L. Development of the Bonding Representations Inventory To Identify Student Misconceptions about Covalent and Ionic Bonding Representations. J. Chem. Educ. 2014, 91 (3), 312−320. (13) Peterson, R. F.; Treagust, D. F.; Garnett, P. Identification of Secondary Students' Misconceptions of Covalent Bonding and Structure Concepts Using a Diagnostic Instrument. Res. Sci. Educ. 1986, 16, 40−48. (14) Artdej, R.; Ratanaroutai, T.; Coll, R. K.; Thongpanchang, T. Thai Grade 11 Students' Alternative Conceptions for Acid−Base Chemistry. Res. Sci. Technol. Educ. 2010, 28 (2), 167−183. (15) Chandrasegaran, A. L.; Treagust, D. F.; Mocerino, M. The Development of a Two-Tier Multiple-Choice Diagnostic Instrument for Evaluating Secondary School Students' Ability To Describe and Explain Chemical Reactions Using Multiple Levels of Representation. Chem. Educ. Res. Pract. 2007, 8, 293−307. (16) Linke, R. D.; Venz, M. I. Misconceptions in Physical Science among Non-Science Background Students: II. Res. Sci. Educ. 1979, 9, 103−109. (17) Wren, D.; Barbera, J. Gathering Evidence for Validity during the Design, Development, and Qualitative Evaluation of Thermochemistry Concept Inventory Items. J. Chem. Educ. 2013, 90 (12), 1590−1601. (18) Banerjee, A. C. Misconceptions of Students and Teachers in Chemical Equilibrium. Int. J. Sci. Educ. 1991, 13 (4), 487−494.


(19) Voska, K. W.; Heikkinen, H. W. Identification and Analysis of Student Conceptions Used To Solve Chemical Equilibrium Problems. J. Res. Sci. Teach. 2000, 37 (2), 160−176. (20) Tan, K. C. D.; Goh, N. K.; Chia, L. S.; Treagust, D. F. Development and Application of a Two-Tier Multiple-Choice Diagnostic Instrument To Assess High School Students’ Understanding of Inorganic Chemistry Qualitative Analysis. J. Res. Sci. Teach. 2002, 39 (4), 283−301. (21) Tan, K.-C. D.; Taber, K. S.; Goh, N.-K.; Chia, L.-S. The Ionization Energy Diagnostic Instrument: A Two-Tier MultipleChoice Instrument To Determine High School Students’ Understanding of Ionization Energy. Chem. Educ. Res. Pract. 2005, 6, 180− 197. (22) McClary, L. M.; Bretz, S. L. Development and Assessment of a Diagnostic Tool To Identify Organic Chemistry Students’ Alternative Conceptions Related to Acid Strength. Int. J. Sci. Educ. 2012, 34 (15), 2317−2341. (23) Adadan, E.; Savasci, F. An Analysis of 16−17-Year-Old Students’ Understanding of Solution Chemistry Concepts Using a Two-Tier Diagnostic Instrument. Int. J. Sci. Educ. 2012, 34 (4), 513−544. (24) Linke, R. D.; Venz, M. I. Misconceptions in Physical Science among Non−Science Background Students. Res. Sci. Educ. 1978, 8, 183−193. (25) Dick-Perez, M.; Luxford, C. J.; Windus, T. L.; Holme, T. A. A Quantum Chemistry Concept Inventory for Physical Chemistry Classes. J. Chem. Educ. 2016, 93 (4), 605−612. (26) Brandriet, A. R.; Bretz, S. L. The Development of the Redox Concept Inventory as a Measure of Students’ Symbolic and Particulate Redox Understandings and Confidence. J. Chem. Educ. 2014, 91 (8), 1132−1144. (27) Villafañe, S.; Heyen, B. J.; Lewis, J. E.; Loertscher, J.; Minderhout, V.; Murray, T. A. Design and Testing of an Assessment Instrument to Measure Understanding of Protein Structure and Enzyme Inhibition in a New Context. Biochem. Mol. Biol. Educ. 2012, 44, 179−190. (28) Bretz, S. L.; Linenberger, K. J. Development of the Enzyme− Substrate Interactions Concept Inventory. Biochem. Mol. Biol. Educ. 2012, 40 (4), 229−233. (29) Oser, J. I. Flame Tests. J. Chem. Educ. 1928, 5 (2), 192. (30) Clark, A. R. The Test-Tube Method for Flame Testing. J. Chem. Educ. 1935, 12 (5), 242−243. (31) Clark, A. R. Test-Tube Flame Test Applied to the Rarer Elements. J. Chem. Educ. 1936, 13 (8), 383−384. (32) Kiplinger, C. C. Paper for Paltinum in Flame Tests. J. Chem. Educ. 1941, 18 (6), 297. (33) Anderson, H.; Corwin, J. F. A Simple Method of Demonstrating Flame Tests. J. Chem. Educ. 1947, 24 (9), 443. (34) Brown, J. A. Lacquer Color Filters for Qualitative Flame Tests. J. Chem. Educ. 1953, 30 (7), 363−364. (35) Strong, III F.C. Improving Potassium Flame Tests. J. Chem. Educ. 1969, 46 (3), 178. (36) Smith, D. D. Producing Flame Spectra. J. Chem. Educ. 1979, 56 (1), 48. (37) Pearson, R. S. An Improved Calcium Flame Test. J. Chem. Educ. 1985, 62 (7), 622. (38) Bouher, J. H. Capillary Tube Flame Test. J. Chem. Educ. 1986, 63 (2), 158. (39) Ager, D. J.; East, M. B.; Miller, R. A. Vivid Flame Tests. J. Chem. Educ. 1988, 65 (6), 545−546. (40) Gouge, E. M. A. Flame Test Demonstration Device. J. Chem. Educ. 1988, 65 (6), 544−545. (41) Peyser, J. R.; Luoma, J. R. Flame Colors Demonstration. J. Chem. Educ. 1988, 65 (5), 452−453. (42) Mattson, B. M.; Snipp, R. L.; Michels, G. D. Spectacular Classroom Demonstration of the Flame Test for Metal Ions. J. Chem. Educ. 1990, 67 (9), 791. (43) Barnes, Z. K. Alternative Flame Test Procedures. J. Chem. Educ. 1991, 68 (3), 246.

(44) Ragsdale, R. O.; Driscoll, J. A. Rediscovering the Wheel: The Flame Test Revisited. J. Chem. Educ. 1992, 69 (10), 828−829. (45) Thomas, N. C.; Brown, R. A Spectacular Demonstration of Flame Tests. J. Chem. Educ. 1992, 69 (4), 326−327. (46) McRae, R. A.; Jones, R. F. An Inexpensive Flame Test Technique. J. Chem. Educ. 1994, 71 (1), 68. (47) Li, J.; Peng, A.-Z. Multiple Burning Heaps Of Color − An Elegant Variation of a Flame Test. J. Chem. Educ. 1995, 72 (9), 828. (48) Dalby, D. K. Bigger and Brighter Flame Tests. J. Chem. Educ. 1996, 73 (1), 80−81. (49) Bare, W. D.; Bradley, T.; Pulliam, E. An Improved Method for Students’ Flame Tests in Qualitative Analysis. J. Chem. Educ. 1998, 75 (4), 459. (50) McKelvy, G. M. Flame Tests That Are Portable, Storable, and Easy To Use. J. Chem. Educ. 1998, 75 (1), 55. (51) Dragojlovic, V.; Jones, R. F. Flame Tests Using Improvised Alcohol Burners. J. Chem. Educ. 1999, 76 (7), 929−930. (52) Johnson, K. A.; Schreiner, R. A Dramatic Flame Test Demonstration,. J. Chem. Educ. 2001, 78 (5), 640−641. (53) Sanger, M. J. Flame tests: Which Ion Causes the Color? J. Chem. Educ. 2004, 81 (12), 1776A−1776B. (54) Sanger, M. J.; Phelps, A. J.; Banks, C. Simple Flame Test Techniques Using Cotton Swabs. J. Chem. Educ. 2004, 81 (7), 969− 970. (55) Mortier, T.; Wellens, A.; Janssens, M.−J. Inexpensive Alcohol Burners for Flame Tests Using Aluminum Tea Light Candle Holders. J. Chem. Educ. 2008, 85 (4), 522. (56) Vitz, E. Demonstration Extensions: Flame Tests and Electrolysis. J. Chem. Educ. 2008, 85 (4), 522. (57) Landis, A. M.; Davies, M. I.; Landis, L.; Thomas, N. C. Magic Eraser” Flame Tests. J. Chem. Educ. 2009, 86 (5), 577−578. (58) Maines, L. L.; Bruch, M. D. Identification of Unkown Chlroide Salts Using a Comination of Qualitative Analysis and Titration with Silver Nitrate: A General Chemistry Laboratory,. J. Chem. Educ. 2012, 89 (7), 933−935. (59) Neel, B.; Crespo, G. A.; Perret, D.; Cherubini, T.; Bakker, E. Camping Burner-Based Flame Emission Spectrometer for Classroom Demonstrations. J. Chem. Educ. 2014, 91 (1), 1655−1660. (60) Yu, H. L. L.; Domingo, P. N., Jr.; Yanza, E. R. S.; Guidote, A. M., Jr. Making a Low-Cost Soda Can Ethanol Burner for Out-ofLaboratory Flame Test Demonstrations and Experiments. J. Chem. Educ. 2015, 92 (1), 127−128. (61) Murphy, K.; Holme, T.; Zenisky, A.; Caruthers, H.; Knaus, K. Building the ACS Exams Anchoring Concept Content Map for Undergraduate Chemistry. J. Chem. Educ. 2012, 89 (6), 715−720. (62) Holme, T.; Murphy, K. The ACS Exams Institute Undergraduate Chemistry Anchoring Concepts Content Map I: General Chemistry. J. Chem. Educ. 2012, 89 (6), 721−723. (63) Holme, T. A.; Luxford, C. J.; Murphy, K. L. Updating the General Chemistry Anchoring Concepts Content Map. J. Chem. Educ. 2015, 92, 1115−1116. (64) Price, D. S.; Brooks, D. W. Extensiveness and Perceptions of Lecture Demonstrations in the High School Chemistry Classroom. Chem. Educ. Res. Pract. 2012, 13, 420−427. (65) Körhasan, J. D.; Wang, L. Students’ Mental Models of Atomic Spectra. Chem. Educ. Res. Pract. 2016, 17, 743−755. (66) Ausubel, D. Educational Psychology: A Cognitive View; Holt, Rinehart and Winston: New York, 1968. (67) Novak, J. D. Human Constructivism: A Unification of Psychological and Epistemological Phenomena in Meaning Making. Int. J. Pers. Constr. Psych. 1993, 6 (2), 167−193. (68) Ye, L.; Lewis, S. E. Looking for Links: Examining Student Responses in Creative Exercises for Evidence of Linking Chemistry Concepts. Chem. Educ. Res. Pract. 2014, 15, 576−586. 
(69) Bretz, S. L. Human Constructivism and Meaningful Learning. J. Chem. Educ. 2001, 78 (8), 1107. (70) Grove, N. P.; Bretz, S. L. A Continuum of Learning: From Rote Memorization to Meaningful Learning in Organic Chemistry. Chem. Educ. Res. Pract. 2012, 13, 201−208.


(71) Johnstone, A. H. You Can’t Get There from Here. J. Chem. Educ. 2010, 87 (1), 22−29. (72) Johnstone, A. H. Why Is Science Difficult To Learn? Things Are Seldom What They Seem. J. Cmptr. Assist. Lrng. 1991, 7 (2), 75−83. (73) Towns, M. H. Mixed Methods Designs in Chemical Education Research. In Nuts and Bolts of Chemical Education Research; Bunce, D. M., Cole, R. S., Eds.; American Chemical Society: Washington, DC, 2008; pp 135−148. (74) Bretz, S. L. Qualitative Research Designs in Chemistry Education Research. In Nuts and Bolts of Chemical Education Research; Bunce, D. M., Cole, R. S., Eds.; American Chemical Society: Washington, DC, 2008; pp 79−96. (75) Phelps, A. J. Qualitative Methodologies in Chemical Education Research: Challenging Comfortable Paradigms. J. Chem. Educ. 1994, 71 (3), 191−194. (76) Geertz, C. Thick Description: Toward an Interpretive Theory of Culture. In The Interpretation of Cultures: Selected Essays; Geertz, C., Ed.; Basic Books: New York, 1973; pp 3−30. (77) Linenberger, K. J.; Bretz, S. L. A Novel Technology To Investigate Students’ Understanding of Enzyme Representations. J. Coll. Sci. Teach. 2012, 42 (1), 45−49. (78) Bowen, C. W. Think-Aloud Methods in Chemistry Education: Understanding Student Thinking. J. Chem. Educ. 1994, 71 (3), 184− 190. (79) Livescribe. http://www.livescribe.com/en-us/ (accessed Oct 2017). (80) White, R.; Gunstone, R. F. Probing Understanding; Falmer: London, 1992. (81) QSR International. http://www.qsrinternational.com/product (accessed Oct 2017). (82) Creswell, J. W. Qualitative Inquiry & Research Design: Choosing among Five Approaches; Sage Publications: Thousand Oaks, CA, 2007. (83) Mayo, A. V. Atomic Emission Misconceptions As Investigated through Student Interviews and Measured by the Flame Test Concept Inventory. Doctoral dissertation. Miami University, Oxford, OH, 2012. https://etd.ohiolink.edu/pg_10?0::NO:10:P10_ACCESSION_ NUM:miami1362754897 (accessed Oct 2017). (84) SPSS. https://www.ibm.com/analytics/us/en/technology/spss/ (accessed Oct 2017). (85) Leighton, J. P.; Heffernan, C.; Cor, M. K.; Gokiert, R. J.; Cui, Y. An Experimental Test of Student Verbal Reports and Teacher Evaluations as a Source of Validity Evidence for Test Development. Appl. Measurement in Educ. 2011, 24 (4), 324−348. (86) McKelvie, S. J. Does Memory Contaminate Test−Retest Reliability? J. Gen. Psych. 1992, 119 (1), 59−72. (87) Trochim, W. M. K. Measurement Validity Types. Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/ measval.php (accessed Oct 2017). (88) Lewis, S. E.; Lewis, J. E. The Same or Not the Same: Equivalence as an Issue in Educational Research. J. Chem. Educ. 2005, 82 (9), 1408−1412. (89) Razali, N. M.; Wah, Y. B. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J. Stat. Model. and Analytics 2011, 2 (1), 21−33. (90) Ding, L.; Beichner, R. Approaches to Data Analysis of MultipleChoice Questions. Phys. Rev. ST Phys. Educ. Res. 2009, 5 (2), 020101− 02010317. (91) Cronbach, L. J. Coefficient Alpha and the Internal Structure of Tests. Psychometrika 1951, 16 (3), 297−334. (92) Bretz, S. L.; McClary, L. M. Students’ Understandings of Acid Strength: How Meaningful Is Reliabilty When Measuring Alternative Conceptions? J. Chem. Educ. 2015, 92 (2), 212−219. (93) Streiner, D. L. Starting at the Beginning: An Introduction to Coefficient Alpha and Internal Consistency. J. Pers. Assess. 2003, 80 (1), 99−103. (94) Crocker, L. M.; Algina, J. 
Introduction to Classical and Modern Test Theory; Holt, Rinehart, and Winston: New York, 1986.
