Response Process Validity Studies of the Scale Literacy Skills Test

Cite This: J. Chem. Educ. 2019, 96, 1351−1358

pubs.acs.org/jchemeduc

Jaclyn M. Trate,† Victoria Fisher,† Anja Blecking,† Peter Geissinger,‡ and Kristen L. Murphy*,†

†Department of Chemistry and Biochemistry, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin 53211, United States
‡College of Science, Technology, Mathematics, and Health Sciences, Eastern Oregon University, La Grande, Oregon 97850, United States





ABSTRACT: Assessment and evaluation tools and instruments are developed to measure many things, from content knowledge to misconceptions to student affect. The standard validation processes for these are regularly conducted and provide strong evidence for the validity of the measurements that are made. As part of the suite of validation tools available to researchers and practitioners, response process validity studies are an important component for the development and study of multiple-choice or forced-response items. An assessment was developed to measure the scale literacy skills of students in General Chemistry. The initial validation of this instrument was reported earlier, with the response process validity work now presented here to complement the previous study. In addition to the validity work with students in General Chemistry, an additional study was conducted with students in Anatomy and Physiology, as part of the validation studies as the assessment was moved into this new discipline. The process used, threats identified, and implications of these identified threats are reported.

KEYWORDS: First-Year Undergraduate/General, Testing/Assessment, Curriculum, Chemical Education Research

FEATURE: Chemical Education Research



INTRODUCTION

The development and dissemination of instruments for measuring student cognitive processes related to chemistry tasks has long been an integral part of this field, with many of the pioneering assessments, such as the Chemical Concepts Inventory (CCI),1 CHEMX,2 and others,3−5 supported by robust reliability and validity evidence. As stated in the Standards for Educational and Psychological Testing, validity is “the degree to which evidence and theory support the interpretation of test scores entailed by proposed uses of tests”6 and is established through the use of multiple evidentiary examples such as test content, internal structure, relations to other variables, and response processes.7 Compared to many of the commonly used methods for establishing validity, evidence based on response processes is relatively under-reported.8−12 Described in the Standards as “questioning test takers about their performance strategies or responses to particular items”,6 evaluating the response processes of student respondents can provide necessary information about the cognitive processes used by students as they solve assessment items.

The incorporation of themes into science instruction, while not new,13−16 has gained increasing interest since the 2013 release of the National Research Council’s Next Generation Science Standards.17 One theme in this framework, scale, identified as a crosscutting concept along with proportion and quantity, has been investigated in a variety of settings18−23 but has been comparatively understudied in the postsecondary General Chemistry population.24−26

Scale has been defined broadly as “any quantification of a property that is measured”20 and more specifically aligned to a series of concepts and skills such as quantity, distance, measurement, estimation, proportion, and perspective,20 and it is widely agreed that students are readily confronted with issues of scale at the General Chemistry level. We previously reported the development of a 45-item multiple-choice assessment designed to measure student ability in selected scale concepts, the scale literacy skills test (SLST).24 The initial sample used to investigate this instrument included 812 Preparatory Chemistry students and 1393 General Chemistry I students at a large, research-intensive midwestern university. While this instrument was subjected to rigorous testing, through trial testing, expert content validation, and classical test theory, to ensure reliability and validity for measuring conceptions of scale held by students, the original publication did not include evidence based on the response processes of the student respondents; we report those data here. As many claims regarding student ability in scale have been built around performance on this assessment, identifying any items that pose a threat to the validity of those claims was of critical importance.

Received: November 30, 2018
Revised: May 2, 2019
Published: May 23, 2019


Figure 1. Schematic of experimental method: samples and resulting action (for Scale Literacy Skills Test, see ref 24).

Table 1. Response Codes with Codes and Descriptions

Intended Processes
- Totally Correct (TC): Student selected the correct multiple-choice option and provided a correct reason for selection of that response.
- Totally Incorrect (TI): Student selected an incorrect multiple-choice option and provided an incorrect reason that supported selection of that response.
- Supported Misconception Response (SMR): Student selected a response related to a misconception and demonstrated in their process that they held that misconception.

Potential Threats
- Correct for the Wrong Reason (CW): Student selected the correct response but does not provide reasoning that supports selection of that choice.(a)
- Incorrect for the Wrong Reason (WR): Student selected an incorrect response but provides reasoning that is correct and should have resulted in the selection of the correct response.
- Used a Test-Taking Strategy (TTS-E, TTS-TE, TTS-NM): Student was able to apply a test-taking strategy to increase their odds of answering correctly, further categorized by the type of test-taking strategy applied: elimination (E), trial and error (TE), or number matching (NM).(b)
- Did Not Support Misconception Response (DSMR): Student selected a misconception response but did not demonstrate in their process holding that misconception.

(a) For an example of student language coded this way, see the description of items 2/3 and 2*/3*. (b) For an example of student language coded this way, see the description of items 7, 8, 30, and 31.

Research Questions


Following the response process study published by Jack Barbera and co-workers,12 response processes of General Chemistry I and Anatomy and Physiology I students on the scale literacy skills test were collected and analyzed in order to answer the following research questions:



1. To what extent does response process evidence exist for the scale literacy skills test in both the discipline for which it was developed and a related discipline?
2. What threats to the validity of the scale literacy skills test can be identified through the response process, and what implications does the identification of these threats hold?

METHODS

General Chemistry I

General Chemistry I is a 16-week traditional 5-credit laboratory/lecture/discussion course taken by science majors. The university prerequisite for this course is a passing grade in intermediate algebra (or demonstrated algebra proficiency on a math placement test) or a passing grade in a Preparatory Chemistry course. Additionally, students are required to earn a passing score of 50% or above on a chemistry placement test (ACS Toledo Test) to maintain enrollment in the course; students who do not score above that threshold are directed to a Preparatory Chemistry course.

General Chemistry I interview participants (n = 38) were solicited during three semesters, with the final semester solicitation opened only to male students to ensure equal sampling of each sex. Students received a copy of the ACS General Chemistry Study Guide or a gift card for their participation in the study. Interview participants were digitally presented with each item of the assessment and instructed to verbalize their problem-solving process as they worked to arrive at an answer and select their chosen answer. Students were provided a calculator and scratch paper but were given the caveat that, if used, the student must verbalize everything they were writing down or entering into the calculator. The interviewer asked follow-up questions as needed to ensure the process used by the student was thoroughly captured. Example items were presented to the student in which the interviewer first demonstrated the expected process to the interviewee, and the interviewee then completed an example of their own with feedback from the interviewer. Each interview was video recorded and the audio transcribed for further analysis. All student data were obtained via signed consent under protocols approved by the University of Wisconsin–Milwaukee Institutional Review Board (IRB approvals 09-047 and 14-404). Because the first set of response process interviews (with students in General Chemistry I, Figure 1) revealed potential threats, items were evaluated and changed prior to the second set of response process interviews (with students in Anatomy and Physiology I).

Anatomy and Physiology I

Anatomy and Physiology I is a 16-week traditional 4-credit laboratory/lecture course taken primarily by nursing, biomedical science, and kinesiology majors. There are no university prerequisites for students enrolling in the course. On the basis of the results of the response process validity study in Chemistry, the scale literacy skills test was adjusted to address the threats found through the response process; this involved changing the stem (two items), changing the distractors (two items), or adding clarifying language (two items) for six of the items on the assessment. These changes were made to the assessment prior to the start of class-wide data collection in Anatomy and Physiology I. Students enrolled in this same semester of class-wide testing in Anatomy and Physiology were solicited to participate in the response process study of the scale literacy skills test (n = 20).



Students were seated at a table and given a copy of the 45-item test and an answer sheet with work space. Each item was given the same amount of work space, regardless of the type of item, so as not to cue students to any expected length of the problem-solving process. Students in this interview set were likewise instructed to verbalize their problem-solving process as they worked (either on paper or on a calculator) to arrive at an answer and to verbalize anything written down in the work space along with their selected answer. The interviewer again asked follow-up questions as needed to clarify the student’s process.


Protocol

Following previous literature precedent for a response process validity study of this type,12 each interview response was coded on the basis of the process used by the student. As described in Table 1, response codes fell under the category of either “intended process” or “potential threat”. A final category of “other” was applied to students whose process either could not be elucidated from the interview transcripts or who explicitly stated they were guessing. Items were coded by independent raters (3 for the Chemistry study and 5 for the Anatomy and Physiology study), and any discrepancies in coding were discussed until agreement was reached. Upon final assignment of the codes, any item for which 2 or more students reported using the same discrepant reasoning process was identified as posing a threat.



RESULTS

Response Process Support for Content Validity

In total, 58 students participated in the interviews from General Chemistry I (n = 38) and Anatomy and Physiology I (n = 20) and completed all 45 items of the scale literacy skills test during the interview period.

Table 2. Number of Items in Which Intended Processes Were Used by Students

Threshold, %    General Chemistry (n = 38)    Anatomy and Physiology (n = 20)
≥50             45                            43
≥70             42                            33
≥80             33                            25
≥90             11                            14

As seen in Table 2, for General Chemistry I students, intended processes were used at least 50% of the time on all items. With an increase of the threshold to 70% and 80%, the number of items drops only to 42 (93% of items) and 33 (73% of items), respectively. In Anatomy and Physiology, 43 items demonstrated an intended-process use rate of greater than or equal to 50%; this number drops to 33 (73% of items) and 25 (56% of items), respectively, when considering items with intended-process use rates of greater than or equal to 70% and 80%.




Identification of Response Process Threats

As described in Table 3, the response assignments in General Chemistry I were first used to determine the items for which potential threats existed. Using a 2.5% threshold occurrence rate (at least one student using one or more discrepant reasoning processes), 19 items were identified as potential threats. Raising the threshold to 5% (at least two students using one or more discrepant reasoning processes) led to the identification of six items as potential threats. Further inspection of each item and the type of processes used by the students narrowed the number of threats (two or more students using the same discrepant reasoning process) existing on the assessment to five. The remaining item was determined not to pose a threat to validity, as no two students reported choosing the same incorrect answer while using the same process.

Using the same method as with the Chemistry students, a 5% threshold occurrence rate (at least one student using one or more discrepant processes) was chosen for the initial assessment of items posing a potential threat in Anatomy and Physiology (displayed in Table 4). Of the 16 items initially identified, raising the threshold to 10% (at least two students using one or more discrepant reasoning processes) reduced the potential threats to six items. Further inspection of each item and the type of processes used by the students narrowed the number of threats existing on the assessment to four. The remaining two items were determined not to pose a threat to validity because, while more than two students did use a discrepant process to select a correct response, no two students chose that response while using a similar process.

No common threats between courses were found on any items, and several of the items were found to function very differently in each of the disciplines sampled. See the Supporting Information for a table showing the raw coding of each item for both courses.
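For readers who want to apply the same screening to their own coded interview data, the sketch below illustrates the flagging logic just described. It is a hypothetical implementation: the data structure (item number mapped to a list of (code, process) assignments), the function name, and the constant are ours, not part of the published study.

```python
from collections import Counter

# Hypothetical sketch of the threat-screening logic described above.
# `item_codes` maps an item number to the list of (code, process description)
# assignments given by the raters for each interviewed student on that item.
THREAT_CODES = {"CW", "WR", "TTS-E", "TTS-TE", "TTS-NM", "DSMR"}

def screen_items(item_codes, n_students, threshold):
    """Return (suspect_items, threat_items) for one course.

    An item is 'suspect' when the fraction of students assigned a potential-threat
    code meets the threshold (e.g., 0.025 or 0.05); it is a 'threat' when, in
    addition, two or more students used the same discrepant reasoning process.
    """
    suspect, threats = [], []
    for item, assignments in item_codes.items():
        discrepant = [a for a in assignments if a[0] in THREAT_CODES]
        if len(discrepant) / n_students >= threshold:
            suspect.append(item)
            if any(count >= 2 for count in Counter(discrepant).values()):
                threats.append(item)
    return suspect, threats
```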

Table 3. Suspect Items Identified through Response Process in General Chemistry I

CW:     2.5% threshold: 12 items (2, 3, 9, 11, 17, 19, 21, 23, 27, 29, 31, 39); 5% threshold: 2 items (3, 9); threats: 2 items (3, 9)
TTS-E:  2.5% threshold: 2 items (10, 16); 5% threshold: none; threats: none
TTS-TE: 2.5% threshold: 2 items (12, 14); 5% threshold: 2 items (12, 14); threats: 2 items (12, 14)
TTS-NM: 2.5% threshold: 1 item (30); 5% threshold: 1 item (30); threats: 1 item (30)
WR:     2.5% threshold: 2 items (5, 24); 5% threshold: 1 item (24); threats: none
DSMR:   2.5% threshold: none


Table 4. Suspect Items Identified through Response Process in Anatomy and Physiology I

CW:     5% threshold: 10 items (1, 2, 3, 11, 19, 20, 21, 29, 31, 39); 10% threshold: 4 items (2, 20, 21, 29); threats: 2 items (2, 20)
TTS-E:  5% threshold: 4 items (7, 8, 20, 31); 10% threshold: 1 item (8); threats: 1 item (8)
TTS-TE: 5% threshold: 2 items (8, 13); 10% threshold: 1 item (8); threats: 1 item (8)
TTS-NM: 5% threshold: 2 items (16, 31); 10% threshold: 1 item (31); threats: 1 item (31)
WR:     5% threshold: 1 item (12); 10% threshold: none; threats: none
DSMR:   5% threshold: 1 item (5); 10% threshold: none; threats: none

Figure 2. Items 2 and 3 of the Scale Literacy Skills Test administered in Chemistry.

IDENTIFICATION OF RESPONSE PROCESS THREATS

Items 2 and 3 (Figure 2) were developed with several common student errors in mind and demonstrated very interesting results in both response process studies. The regions of the number line (specifically the boundary between region A and region B) were selected to identify those students who only use the given metric unit and not the value and metric unit together. Those students who correctly take into account both parts of the value in item 2 should correctly arrive at answer B, while those who do not will choose answer A. Item 3 allowed many students to catch the missed value in item 2 and to go back and correctly identify choice A. These results were verified from the percent chosen of each distractor in both the class-wide and interview data sets (shown in Table 5).
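The intended process for items of this kind is to convert the full quantity (value and metric prefix together) to meters and then locate its power-of-ten region on the number line. A minimal sketch of that process follows; the specific quantity (350 μm) and the function name are invented for illustration, since the actual item values appear in Figure 2, which is not reproduced in this text.

```python
import math

# Hypothetical illustration of the intended process for items like 2 and 3:
# convert value AND metric prefix to meters, then find the power-of-ten region.
PREFIX_TO_METERS = {"nm": 1e-9, "um": 1e-6, "mm": 1e-3, "cm": 1e-2, "m": 1.0}

def region_exponent(value, unit):
    """Return floor(log10) of the quantity expressed in meters."""
    meters = value * PREFIX_TO_METERS[unit]
    return math.floor(math.log10(meters))

# Using the full quantity vs using only the unit (the error described above)
# places the quantity in different regions of the number line:
print(region_exponent(350, "um"))  # -4  (3.5 x 10^-4 m, hypothetical value)
print(region_exponent(1, "um"))    # -6  (unit alone)
```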

On the basis of the item statistics of the interview set, shown in Table 5, item 2 appears to warrant further investigation, as both item difficulty and discrimination fall below the recommended thresholds of 0.35 for item difficulty and 0.25 for item discrimination.27 During the interviews it became apparent that if a student confused micrometer as equal to 1 × 10^−9 m, that student would choose an incorrect response to item 2 (1 × 10^−7, option A) in a manner consistent with their reasoning, but would choose the correct response to item 3 (1 × 10^−9 m, option A) for the wrong reason, thus supporting a possible threat to item 3 instead of item 2. This is supported through partial transcripts of students:

GC 1 (responding to item 2): “I was looking for the range that micrometers was in and that is 10^−9 meters so I would have to pick [region] A.”

Given that the threat to item 2 only existed due to the chosen unit, it was determined that changing the item to include a more familiar unit like millimeter could eliminate the threat posed by this item, as students would be less likely to confuse its conversion with that of another unit.

Table 5. Item Statistics and Response Frequency by Percentage for Items 2 and 3 as Administered in Chemistry

Item 2:               Interview (n = 38)   Class-Wide (n = 2034)
  Difficulty          0.342                0.544
  Discrimination      0.100                0.464
  Percent chosen A    47.4                 32.3
  Percent chosen B    34.2                 54.4
  Percent chosen C    13.2                 8.2
  Percent chosen D    5.3                  5.1

Item 3:               Interview (n = 38)   Class-Wide (n = 2034)
  Difficulty          0.789                0.688
  Discrimination      0.500                0.454
  Percent chosen A    78.9                 68.8
  Percent chosen B    2.6                  21.8
  Percent chosen C    10.5                 5.9
  Percent chosen D    7.9                  3.5
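The difficulty and discrimination values in Tables 5−7 are classical test theory item statistics. The sketch below shows one common way such statistics are computed; the article does not state which discrimination index was used, so the upper-minus-lower 27% convention here is an assumption, not a description of the authors' exact procedure.

```python
# A sketch of classical test theory item statistics like those in Tables 5-7.
# Difficulty is the proportion of students answering the item correctly; discrimination
# is computed here as the upper-minus-lower 27% group difference (an assumed convention).
def item_statistics(item_correct, total_scores, fraction=0.27):
    """item_correct: list of 0/1 for one item; total_scores: total test scores, same order."""
    n = len(item_correct)
    difficulty = sum(item_correct) / n
    ranked = sorted(range(n), key=lambda i: total_scores[i])  # indices from low to high score
    k = max(1, round(fraction * n))
    lower, upper = ranked[:k], ranked[-k:]
    discrimination = (sum(item_correct[i] for i in upper) / k
                      - sum(item_correct[i] for i in lower) / k)
    return difficulty, discrimination
```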

Figure 3. Items 2/3 of the Scale Literacy Skills Test administered in Anatomy and Physiology. For clarity, these items are marked with an asterisk (*).


The number line and value in question were altered (Figure 3) to account for this change prior to presentation of the item to the Anatomy and Physiology students. Upon consideration of the item statistics for items 2* and 3* in Table 6, it appears as though the modifications to items 2 and 3 have increased the students’ likelihood of getting item 2* correct while decreasing the likelihood of getting item 3* correct, the expected change given the removal of the threat identified in Chemistry.

Table 6. Item Statistics and Response Frequency by Percentage for Items 2*/3* as Administered in Anatomy and Physiology

Item 2*:              Interview (n = 20)   Class-Wide (n = 538)
  Difficulty          0.850                0.591
  Discrimination      0.000                0.341
  Percent chosen A    5.0                  13.8
  Percent chosen B    85.0                 59.1
  Percent chosen C    5.0                  15.8
  Percent chosen D    5.0                  11.3

Item 3*:              Interview (n = 20)   Class-Wide (n = 538)
  Difficulty          0.500                0.385
  Discrimination      0.500                0.348
  Percent chosen A    50.0                 38.5
  Percent chosen B    35.0                 42.6
  Percent chosen C    0.0                  5.2
  Percent chosen D    15.0                 13.8

However, through the response process it was revealed that, as opposed to the Chemistry students, who only paid attention to the unit, the Anatomy and Physiology students only paid attention to the value itself. As the region between 10^−2 m and 10^3 m encompasses both the correct value of 1 × 10^−1 m and the original value of 100 (or 10^2), 5 (25%) students in the interview set reported a process that resulted in the selection of the correct answer for item 2* while using a process other than that intended by the item.

A&P 1 (responding to item 2*): “I’d say it’s in [region] B because it’s between 10... I think it’s a −2 and 10^3 and 100 is 10^2 so it would be in that [region].”

Table 7. Item Statistics and Response Frequency by Percentage for Items 7/8 as Administered in Anatomy and Physiology

Item 7:               Interview (n = 20)   Class-Wide (n = 538)
  Difficulty          0.450                0.414
  Discrimination      0.667                0.548
  Percent chosen A    30.0                 38.8
  Percent chosen B    20.0                 16.7
  Percent chosen C    5.0                  3.0
  Percent chosen D    45.0                 41.4

Item 8:               Interview (n = 20)   Class-Wide (n = 538)
  Difficulty          0.500                0.377
  Discrimination      0.667                0.622
  Percent chosen A    30.0                 35.7
  Percent chosen B    15.0                 22.9
  Percent chosen C    5.0                  3.7
  Percent chosen D    50.0                 37.7


Items Displaying Test-Taking Strategy Threats

In each course, items were identified in which the distractors chosen allowed students to increase the odds of selecting a correct answer through use of a test-taking strategy. One test-taking strategy used by both groups was trial and error. This strategy appeared in items where students were able to use the distractors to attempt to back-calculate an original value in order to determine the correct answer choice. In one particularly interesting example (Figure 4) that emerged as a threat in Anatomy and Physiology, students were able to correctly select “one order of magnitude” while demonstrating the use of a strategy that should have led to the selection of an incorrect answer choice.

Upon consideration of the item statistics for each of these items in Table 7, it appears that performance on both of these items is similar and might lead to the assumption that students were consistent in their answer selections (i.e., students selecting 10^1 in item 7 choosing 10^−1 in item 8). However, upon further analysis, 40% of the students (class-wide) who were incorrect on the first question changed to a correct response on the second question. Response process interviews provide a possible explanation for this observation, based on the common strategies used by students who answered item 7 incorrectly.

A&P 2 (responding to item 7): “10^0 is 1, 10^1 is 10, 10^2 is 100 and to get from 10^1 to 10^2 you have to times your answer by 10 so I’m going to say it’s 10”

A&P 3 (responding to item 7): “10^0 is just I think that’s just 10. I don’t remember. Okay 10^1 should just be um, should just be 10 and then this is 100 and this is 1000. So um, to like check the order of magnitude I think you just take the biggest one and divide whatever is before that one. So the answer is 10.”

If the student uses the same strategy in item 8 and attempts to find the distractor which relates to the common multiplier between the values on the number line, the student does not find the “correct” value. Because “10” was not included as a distractor for the item, the student using this strategy is inadvertently given an advantage over other students not using this strategy.
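As a point of reference for the intended process on items 7 and 8, the number of orders of magnitude separating two values is the base-10 logarithm of their ratio; the values used below are those suggested by the student excerpts above.

```python
import math

# Orders of magnitude between adjacent number-line values (values from the excerpts above):
print(math.log10(100 / 10))      # ~1 -> "1 order of magnitude" (item 7 style values)
print(math.log10(0.01 / 0.001))  # ~1 -> "1 order of magnitude" (item 8 style values)
```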

Figure 4. Items 7 and 8 of the Scale Literacy Skills Test.


Figure 5. Item 30 of the Scale Literacy Skills Test.


This is further evidenced by the fact that, of the 60 students who changed to selecting “1 order of magnitude” in item 8, all but 4 chose “10” or “10^1” in item 7.

A&P 4 (responding to item 8): “That would be 10. No, that would be 0.1. Cause if you were to take 0.001 and you were to multiply that by 0.1 [doing math on calculator] No, it would have to be 1. If you were to take 0.01 and multiply that by 1 [doing math on calculator] No, what am I talking about? 10^−1 [converting 10^−1 to decimal notation] so the decimal would be here. Well it would be... so it would be an order of magnitude then cause A [10^−1] and B [0.1] would be equivalent to each other and that doesn’t give you the right answer. C [1] doesn’t give you a right answer so it would have to be D [1 order of magnitude].”

Two additional items, one in each course, were found to exhibit a threat due to number matching. In each case, a student was able to discern the correct multiple-choice response through selection of the only distractor containing a certain number sequence. For example, in Chemistry, item 30 (Figure 5) featured only one distractor containing two “4”s in succession. As demonstrated by the student excerpt below, if a student did not convert, or incorrectly converted, either value given in the problem but still correctly divided the thickness of the foil by the diameter of the atom, the student was able to identify the correct choice simply by matching the “44” of their calculated answer with the answer choices.

GC 2 (responding to item 30): “So the thickness in atoms. You’d have to do conversion of 250 picometers to millimeters and that’s well 100 picometers is 10^−9 millimeters so 2.5 × 10^−10 I think and then I would have to calculate 0.11 millimeters divided by 2.5 × 10^−10 and I get 44 hundred million, but that’s not up there so I did something wrong but since it is 44 and then a string of zeros I will just do 440,000.”

Comparable results emerged in Anatomy and Physiology for a similar item (item 31), which asked students to determine the number of pages in a book given the thickness of both the book (0.0235 m) and an individual page (57 μm). As demonstrated by the student interview excerpt for this example, the student was able to identify the correct answer through matching the “41” from their calculated value with the response choices.

A&P 5 (responding to item 31): “I’m gonna divide um 0.0235... yeah gonna divide 0.0235 by 57 [doing work on calculator] which gives the answer of 0.0041 approximately and the only one that’s close to that is [choice] B, 410.”
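For concreteness, the arithmetic that the intended process requires for these two items is sketched below, using the quantities that appear in the student excerpts above (0.11 mm and 250 pm for item 30; 0.0235 m and 57 μm for item 31). Converting both quantities to a common unit before dividing is the step the quoted students skipped or performed incorrectly.

```python
# Intended arithmetic for items 30 and 31, using values from the student excerpts above.
# The key step is expressing both quantities in the same unit before dividing.
foil_thickness_m = 0.11e-3       # 0.11 mm expressed in meters
atom_diameter_m = 250e-12        # 250 pm expressed in meters
print(foil_thickness_m / atom_diameter_m)    # ~4.4e5 -> 440,000 atoms thick

book_thickness_m = 0.0235        # book thickness in meters
page_thickness_m = 57e-6         # 57 um expressed in meters
print(book_thickness_m / page_thickness_m)   # ~412 -> closest choice is 410 pages
```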

Treatment of Identified Threats

As student performance on the scale literacy skills test has been used to partially predict student performance on the final exams used in both General Chemistry I24 and II, upholding the validity of the claims made based on student performance on this assessment is of critical importance. Therefore, performance subscores with the five identified threats removed were calculated for the scale literacy skills test for two semesters of General Chemistry I students. Using a procedure developed by Meng, Rosenthal, and Rubin,28 comparison of student performance on the assessment to final exam performance showed a nonsignificant change between the computed correlations (Table 8).

Table 8. Comparative Results of Meng’s Test of Correlated Correlation Coefficients

Correlated Item                              r (Pearson)   Z Value   p Value
SLST (all items), paired final               0.582         0.644     0.520
SLST (threats removed), paired final         0.569
SLST (all items), conceptual final           0.651         0.277     0.782
SLST (threats removed), conceptual final     0.645

(Each Z and p value compares the pair of correlations listed with it: all items versus threats removed.)

These results provide evidence that exclusion of these 5 items from analysis of the assessment is not necessary and that the predictive model from which these scores were built24 is still valid. As this research is only in its preliminary stage in Anatomy and Physiology, the four items identified through the response process will be considered further as necessary.
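For readers who wish to reproduce this style of comparison, the sketch below implements the Z test of Meng, Rosenthal, and Rubin (ref 28) for two correlated correlations. The sample size and the correlation between the full and threats-removed SLST scores are not reported in this article, so the values in the example call are placeholders and will not reproduce the published Z and p exactly.

```python
import math

# A sketch of the Meng-Rosenthal-Rubin (1992) Z test for comparing two correlated
# correlation coefficients (ref 28): corr(x1, y) = r1 vs corr(x2, y) = r2, where the
# two predictors are themselves correlated with corr(x1, x2) = r12.
def meng_z_test(r1, r2, r12, n):
    z1, z2 = math.atanh(r1), math.atanh(r2)            # Fisher z transforms
    r_sq_bar = (r1**2 + r2**2) / 2
    f = min(1.0, (1 - r12) / (2 * (1 - r_sq_bar)))     # f is capped at 1
    h = (1 - f * r_sq_bar) / (1 - r_sq_bar)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r12) * h))
    p = math.erfc(abs(z) / math.sqrt(2))               # two-tailed p value
    return z, p

# Illustrative call: r1 and r2 are the paired-final correlations from Table 8;
# n and r12 are placeholders, so this will not reproduce the published Z = 0.644, p = 0.520.
print(meng_z_test(r1=0.582, r2=0.569, r12=0.95, n=800))
```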



LIMITATIONS

The response process data collected for the scale literacy skills test demonstrated that response process validity exists for 80% of the items in General Chemistry and for 64% of the items in Anatomy and Physiology. While this difference could be accounted for by the relatively small sample of students interviewed from the same semester of Anatomy and Physiology, another explanation could be differing levels of discipline-specific content knowledge held by students within the chemistry and biological science disciplines. As these items were written for an assessment with chemistry students in mind, the items might hold an inherent bias toward knowledge thought to be held by chemistry students.



CONCLUSIONS AND IMPLICATIONS

In response to research question 1, validity evidence exists for the scale literacy skills test in both disciplines in which it was investigated, with greater than 50% of items demonstrating greater than 80% response process support in both samples.





Noticeably, the response process support is greater for the General Chemistry I sample than for the Anatomy and Physiology I sample, an observation that can possibly be attributed to the smaller sample size of students interviewed in Anatomy and Physiology I or, more likely, to the difference between the disciplines. While the scale literacy skills test was written for students in Chemistry, the overarching theme of the assessment should be measured consistently across disciplines; however, as described by the course demographics, inherent differences in the type of student enrolled in each course can likely account for the differences seen between disciplines.

In response to research question 2, response process data collected during these studies identified five items in Chemistry and four items in Anatomy and Physiology that are threats to the validity of the scale literacy skills test. Two of the five items identified in Chemistry that were changed prior to the response process study in Anatomy and Physiology were revealed to no longer pose a threat in the new discipline. Two of the remaining three items were written in such a way that changing the stem or distractors was not likely to eliminate the threat, and these were anticipated to remain a threat in the new discipline. Interestingly, the four items identified as threats in Anatomy and Physiology are distinct from those identified in Chemistry. This observation is key to understanding the implications for assessments developed in one discipline and adapted for use in other disciplines.

More broadly, the implications of using response process validity relate to a more informed practice during assessment development and construction. Notably, when developing an assessment of this type, one that uses a multiple-choice format, consideration of distractor selection should not be ignored. Caution should be exercised when faced with an item for which anecdotal evidence suggests use of an obscure process, as the likelihood of widespread use in a class-wide setting is low. Adoption of a distractor to which a low percentage of students will be attracted would likely change the percentage of students choosing the remaining distractors and diminish the validity of the item. Likewise, as student performance on assessment items is often used to make judgments about the skills or knowledge that students possess, distractor selection must support these judgments. As time or resources do not always allow for timely response process validity interviews, this work serves to better inform the method of test design and construction.



Notes

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We would like to thank the many students who participated in this study for their willingness and openness in sharing their understandings and conceptions. This material is based upon work supported in part by the National Science Foundation under Grants DUE-1140610 and 1432090.




ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available on the ACS Publications website at DOI: 10.1021/acs.jchemed.8b00990.




Table of the distribution of codes by item (PDF)
Table of item statistics from each set of response process interviews (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Kristen L. Murphy: 0000-0002-7211-300X


REFERENCES

(1) Mulford, D. R.; Robinson, W. R. An Inventory for Alternate Conceptions among First-Semester General Chemistry Students. J. Chem. Educ. 2002, 79 (6), 739−744.
(2) Grove, N.; Bretz, S. L. CHEMX: An Instrument to Assess Students’ Cognitive Expectations for Learning Chemistry. J. Chem. Educ. 2007, 84 (9), 1524−1529.
(3) Bauer, C. F. Beyond “Student Attitudes”: Chemistry Self-Concept Inventory for Assessment of the Affective Component of Student Learning. J. Chem. Educ. 2005, 82 (12), 1864−1870.
(4) McClary, L.; Bretz, S. L. Development and Assessment of a Diagnostic Tool to Identify Organic Chemistry Students’ Alternative Conceptions Related to Acid Strength. Int. J. Sci. Educ. 2012, 34 (15), 2317−2341.
(5) Examinations Institute of the American Chemical Society Division of Chemical Education; University of Wisconsin–Milwaukee: Milwaukee, WI, 2018.
(6) Standards for Educational and Psychological Testing; American Educational Research Association: Washington, DC, 1999; http://www.apa.org/science/programs/testing/standards.aspx (accessed Apr 2019).
(7) Arjoon, J. A.; Xu, X. Y.; Lewis, J. E. Understanding the State of the Art for Measurement in Chemistry Education Research: Examining the Psychometric Evidence. J. Chem. Educ. 2013, 90 (5), 536−545.
(8) Adams, W. K.; Wieman, C. E.; Perkins, K. K.; Barbera, J. Modifying and Validating the Colorado Learning Attitudes about Science Survey for Use in Chemistry. J. Chem. Educ. 2008, 85 (10), 1435−1439.
(9) Stains, M.; Escriu-Sune, M.; Molina Alvarez De Santizo, M. L.; Sevian, H. Assessing Secondary and College Students’ Implicit Assumptions about the Particulate Nature of Matter: Development and Validation of the Structure and Motion of Matter Survey. J. Chem. Educ. 2011, 88 (10), 1359−1365.
(10) Wren, D.; Barbera, J. Gathering Evidence for Validity During the Design, Development, and Qualitative Evaluation of Thermochemistry Concept Inventory Items. J. Chem. Educ. 2013, 90 (12), 1590−1601.
(11) Schwartz, P.; Barbera, J. Evaluating the Content and Response Process Validity of Data from the Chemical Concepts Inventory. J. Chem. Educ. 2014, 91 (5), 630−640.
(12) Brandriet, A. R.; Bretz, S. L. The Development of the Redox Concept Inventory as a Measure of Students’ Symbolic and Particulate Redox Understandings and Confidence. J. Chem. Educ. 2014, 91 (8), 1132−1144.
(13) American Association for the Advancement of Science, Project 2061. Science for All Americans: A Project 2061 Report on Literacy Goals in Science, Mathematics, and Technology; Washington, DC, 1989.
(14) American Association for the Advancement of Science, Project 2061. Benchmarks for Science Literacy; Oxford University Press: New York, NY, 1993.
(15) National Research Council. A Framework for K−12 Science Education: Practices, Crosscutting Concepts, and Core Ideas; The National Academies Press: Washington, DC, 2012.
(16) National Science Teachers Association. Scope, Sequence, and Coordination of Secondary School Science: A Project of the National Science Teachers Association; National Science Teachers Association: Washington, DC, 1992.
(17) National Research Council. Next Generation Science Standards: For States, by States; National Academies Press: Washington, DC, 2013.
(18) Tretter, T. R.; Jones, M. G.; Andre, T.; Negishi, A.; Minogue, J. Conceptual Boundaries and Distances: Students’ and Experts’ Concepts of the Scale of Scientific Phenomena. J. Res. Sci. Teach. 2006, 43 (3), 282−319.
(19) Tretter, T. R.; Jones, M. G.; Minogue, J. Accuracy of Scale Conceptions in Science: Mental Maneuverings Across Many Orders of Spatial Magnitude. J. Res. Sci. Teach. 2006, 43 (10), 1061−1085.
(20) Jones, M. G.; Taylor, A. R. Developing a Sense of Scale: Looking Backward. J. Res. Sci. Teach. 2008, 46 (4), 460−475.
(21) Trend, R. D. Deep Time Framework: A Preliminary Study of U.K. Primary Teachers’ Conceptions of Geologic Time and Perceptions of Geoscience. J. Res. Sci. Teach. 2001, 38 (2), 191−221.
(22) Jones, M. G.; Tretter, T.; Taylor, A.; Oppewal, T. Experienced and Novice Teachers’ Concepts of Spatial Scale. Int. J. Sci. Educ. 2008, 30 (3), 409−429.
(23) Jones, M. G.; Paechter, M.; Yen, C.; Gardner, G.; Taylor, A.; Tretter, T. Teachers’ Concepts of Spatial Scale: An International Comparison. Int. J. Sci. Educ. 2013, 35 (14), 2462−2482.
(24) Gerlach, K.; Trate, J.; Blecking, A.; Geissinger, P.; Murphy, K. Valid and Reliable Assessments to Measure Scale Literacy of Students in Introductory College Chemistry Courses. J. Chem. Educ. 2014, 91 (10), 1538−1545. The scale literacy skills test is available upon request from the authors.
(25) Gerlach, K.; Trate, J.; Blecking, A.; Geissinger, P.; Murphy, K. Investigation of Absolute and Relative Scaling Conceptions of Students in Introductory College Chemistry Courses. J. Chem. Educ. 2014, 91 (10), 1526−1537.
(26) Swarat, S.; Light, G.; Park, E. J.; Drane, D. A Typology of Undergraduate Students’ Conceptions of Size and Scale: Identifying and Characterizing Conceptual Variation. J. Res. Sci. Teach. 2011, 48 (5), 512−533.
(27) Eubanks, I. D.; Eubanks, L. T. Writing Tests and Interpreting Test Statistics: A Practical Guide; American Chemical Society (ACS), Division of Chemical Education, Examinations Institute: Washington, DC, 1995.
(28) Meng, X. L.; Rosenthal, R.; Rubin, D. B. Comparing Correlated Correlation Coefficients. Psychol. Bull. 1992, 111 (1), 172−175.
