
Developing an Array Binary Code Assessment Rubric for Multiple-Choice Questions Using Item Arrays and Binary-Coded Responses

Elizabeth K. Haro† and Luis S. Haro*,‡

†Department of Biology, Stanford University, Stanford, California 94305, United States
‡Department of Biology, The University of Texas at San Antonio, San Antonio, Texas 78249, United States




ABSTRACT: The multiple-choice question (MCQ) is the foundation of knowledge assessment in K−12, higher education, and standardized entrance exams (including the GRE, MCAT, and DAT). However, standard MCQ exams are limited with respect to the types of questions that can be asked when there are only five choices. MCQs offering additional choices test an examinee's knowledge of complex concepts more extensively, and thus an examiner can obtain a more accurate assessment of the examinee's comprehension. We introduce an assessment paradigm in which the choices of an MCQ are items in an array, each encoded with a number that can be expressed in binary notation on the standard five-choice MCQ answer sheet. Numbers are represented by combinations of zeroes (unfilled bubbles) and ones (filled bubbles) at each of the five numerical place-holder positions. A scanner that recognizes five filled/unfilled "bubbles" per horizontal line of the answer sheet makes it possible to represent 31 options as possible answers. The rubric provides a more comprehensive assessment of a student's knowledge than a standard five-choice MCQ, and it has the potential to revolutionize MCQ testing. We present examples of MCQ assessment of chemical concepts at all levels of Bloom's Taxonomy, employing item arrays and binary-coded choices.

KEYWORDS: General Public, Biochemistry, Inorganic Chemistry, Organic Chemistry, Physical Chemistry, Testing/Assessment, Standards National/State



INTRODUCTION

The multiple-choice question (MCQ) is the foundation of standardized testing programs in higher education and in classroom settings. Standardized tests employ MCQs to assess knowledge of biochemistry, organic chemistry, and other branches of chemistry. Examples include the chemistry subject exams created by the American Chemical Society,1 the Medical College Admission Test developed by the American Association of Medical Colleges,2 and both the general and subject Graduate Record Examinations produced by the Educational Testing Service,3 as well as a multitude of exams for entrance into health professions such as dentistry, nursing, optometry, and pharmacy. The format is used pervasively because (i) it permits inexpensive and objective scoring, (ii) such questions can be answered rapidly, allowing broad topic coverage within a testing session, and (iii) a sophisticated statistical technology has been developed to support automated analysis and interpretation of test results. Hence, for both large and small class sizes, it is a manageable method of assigning test scores and grades. Although the use of MCQ testing is widespread in science and other disciplines, there is constant debate concerning its role as an evaluation tool.4−13

Assessment is a critical component of an educational system, and to fulfill this requirement MCQ exams have become ubiquitously employed appraisal instruments in large lecture classes. Many science classes enroll 100−300 students per class. With cuts to education budgets we can expect increased class sizes together with increased use of MCQ exams that utilize automated grading, because decreased funding forces the educational system to downsize the personnel needed to grade exams. Hence, assessing understanding of complex scientific concepts, the ability to perform topic-relevant calculations, and the capacity for logical reasoning through constructed-response questions or questions requiring quantitative answers is not feasible when there is a shortfall of person-hours needed to score these types of exams. The cost of operating an assessment program composed of constructed-response questions is substantially higher than that of a program that employs MCQ exams.14 Scoring the College Board's Advanced Placement Test in Chemistry costs approximately $30.00 per constructed-response item versus about $0.01 per MCQ item; in other words, assessment of constructed-response exams costs 3000-fold more than assessment of MCQ exams. MCQ exams are less costly and more efficient because they are scored with automated systems such as the ParScore system (Scantron Corporation, Eagan, MN, USA). This technology uses MCQ answer sheets, a scanner, and software to carry out the grading and item analysis of MCQs.


Furthermore, statistics are provided for every MCQ response, for both the entire class and individual students. Book publishers have contributed to the widespread use of MCQ exams in science and other disciplines by developing extensive MCQ test banks, in the standard 4- or 5-choice formats, that are supplied to instructors who use their textbooks, e-books, and other published works. However, MCQ exams are limited with respect to the types of questions that can be asked when there are only 5 choices. MCQs that offer more choices can test a student's knowledge of complex concepts more extensively, and thus an instructor can obtain a more accurate assessment of the student's understanding of the concepts. The expansion of MCQ answer pools has been developed as matching and extended-matching test formats.15−17 Expanded MCQ answer pools are widely used in many areas of medical testing, as evidenced by a plethora of books dedicated to the format.18−28 Expanded MCQ exams in which the correct answer is chosen from a list of 7−26 possible answers were found to be appropriate for a final test of clinical knowledge in medical education.29 They were also reported to measure an aspect of student achievement that is linked to clinical diagnostic ability.30−32 The format was useful in monitoring progress in gross anatomy courses, where grades on final examinations were higher in the extended-matching group than in the 5-choice MCQ group.33 A missing component of the expanded MCQ format is the ability to automate the grading, and this communication addresses that shortcoming.

Interestingly, a meta-analysis of 80 years of multiple-choice testing endorsed the use of the 3-option MCQ to improve content coverage in most settings. However, it was noted that the findings do not necessarily conflict with earlier recommendations, including those of the author, to write as many plausible distractors as possible.34−36 Chemistry lends itself to the generation of MCQs with numerous plausible distractors. For example, in assessing knowledge of the properties, acronyms, or structures of the 20 common amino acids, each of the 20 amino acids is a plausible distractor. In testing knowledge of the elements of the periodic table there are 118 plausible distractors. Metabolic pathways likewise possess numerous plausible distractors, the enzymes and metabolites, that pertain to testing of that subject matter.

To overcome the limitations of the 5-choice MCQ format and the nonautomated scoring of the expanded MCQ format, we introduce a novel assessment rubric, the Array Binary Code (ABC) Assessment Rubric. In this method the options for an MCQ are items listed in an array, and each item is associated with a unique numerical identifier. The MCQ option array comprises numerous distractors and a correct answer. The number associated with the chosen answer is then recorded in binary notation37 on the standard 5-choice MCQ answer sheet. Using this blueprint, the architecture of the MCQ can be changed from four distractors and one correct answer to 30 distractors and 1 correct answer. The ability to have more than 5 options associated with a question, along with automated scoring of exams, opens the door to creativity in assessment. Examples are presented to illustrate use of the rubric to assess knowledge of chemistry.
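To make the encoding concrete, here is a minimal Python sketch, written for this presentation rather than taken from the article: it converts an option number (1−31) into the set of answer-sheet bubbles to fill, with position A as the most significant bit (A = 16, B = 8, C = 4, D = 2, E = 1), the convention implied by the worked examples that follow.

```python
# Minimal sketch (our illustration, not the article's software) of the
# ABC encoding: option numbers 1-31 map onto the five answer-sheet
# bubbles A-E, with A as the most significant bit.

BUBBLES = "ABCDE"

def encode_option(n: int) -> str:
    """Return the bubbles to fill for option number n (1-31)."""
    if not 1 <= n <= 31:
        raise ValueError("option number must be between 1 and 31")
    bits = format(n, "05b")  # e.g. 8 -> "01000"
    return "".join(b for b, bit in zip(BUBBLES, bits) if bit == "1")

if __name__ == "__main__":
    # Reproduce the paper's worked examples: 8 -> B, 5 -> CE, 6 -> CD, 7 -> CDE
    for n in (8, 5, 6, 7):
        print(n, "->", encode_option(n))
```

Printing the patterns for all of 1−31 with this helper regenerates the full binary-code table that the article provides as Figure 1 of the Supporting Information.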



IMPLEMENTING THE ABC ASSESSMENT RUBRIC IN THE KNOWLEDGE DOMAIN OF CHEMISTRY

Bloom's Taxonomy38 and the revised taxonomy39 were devised to encourage higher forms of thinking in education, such as analyzing and evaluating, rather than just recalling facts. In addition to assessing knowledge and comprehension, MCQ exams can also be designed to assess the higher levels of cognition (application, analysis, and synthesis) described in Bloom's Taxonomy.40−43 Considering this, the ABC Assessment Rubric can expand evaluation of higher learning by integrating it with established MCQ test-item development strategies.17,44−48 In the Taxonomy of Educational Objectives,38 a classification of educational objectives dealing with recall or recognition of knowledge as well as the development of intellectual abilities and skills was assembled into a schema referred to as the cognitive domain. The hierarchical classification scheme comprises six levels, from the least to the most complex level of thinking: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. The ABC Assessment Rubric can be used to assess learning at all levels of the cognitive domain.

Knowledge Assessment

Knowledge, the lowest level of the cognitive scheme, requires students to recall information. In the following example, knowledge of a word referent that corresponds to a given symbol is assessed. The student is asked to recall the symbol for the chemical element sodium from Table 1, with the correct option corresponding to ⑧. The number 8 is then recorded on the answer sheet in binary code as ○●○○○ (B), since this pattern corresponds to the binary code for the number 8 shown in Figure 1 of the Supporting Information.

Table 1. Symbols of Selected Chemical Elements
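The decoding step that a scoring system performs is equally simple. Below is a hedged Python sketch (our illustration; the names are not from ParScore or the article's software) that recovers the option number from the set of filled bubbles and checks it against the key, using the sodium example above (⑧ bubbled as B).

```python
# Hedged sketch of the scoring side: recover the option number from
# the filled bubbles read off a scanned sheet and compare to the key.

PLACE_VALUES = {"A": 16, "B": 8, "C": 4, "D": 2, "E": 1}

def decode_bubbles(filled: str) -> int:
    """Recover the option number from filled bubbles, e.g. 'B' -> 8."""
    return sum(PLACE_VALUES[b] for b in filled.upper())

def score_item(filled: str, correct_option: int) -> bool:
    """True if the bubbled pattern decodes to the keyed option."""
    return decode_bubbles(filled) == correct_option

if __name__ == "__main__":
    print(decode_bubbles("B"))   # 8, the sodium example above
    print(score_item("B", 8))    # True
    print(score_item("CE", 8))   # False (CE decodes to 5)
```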

Comprehension Assessment

Comprehension, the next higher level of the cognitive domain, can be assessed by judging the faithfulness and accuracy of translation of one form of communication into another. In this next example we test the student's ability to understand nonliteral statements by translating chemical nomenclature into a chemical formula. The student is asked to translate the chemical name butane into a chemical formula by selecting an option from Table 2. The correct translation corresponds to option ⑤, reported on the answer sheet in binary code as ○○●○● (CE).

Table 2. Chemical Formulas of Various Compounds

Application Assessment

Application is the next more advanced level in the cognitive domain. In this type of assessment the student is given a new problem and asked to apply the appropriate abstraction without being shown how to use it in the given situation. In this next example the student is asked to apply his or her knowledge of balancing chemical reactions by carrying out a stoichiometric analysis. The student provides the numbers needed in the blanks to balance an unfamiliar chemical reaction (the unbalanced and balanced reaction schemes are not reproduced here). The answers are 2, 6, and 1, entered on the 5-item answer sheet in binary code as ○○○●○ (D), ○○●●○ (CD), and ○○○○● (E), respectively.
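Because each blank is its own answer-sheet row, the same illustrative helper from the earlier sketch (an assumption, not the article's software) maps the three coefficients to their three bubble patterns:

```python
# Each numeric blank in the balanced equation occupies its own row on
# the answer sheet; encode_option is the same illustrative helper
# defined in the earlier sketch, repeated here so this snippet runs
# on its own.

def encode_option(n: int) -> str:
    bits = format(n, "05b")  # A = 16 down to E = 1
    return "".join(b for b, bit in zip("ABCDE", bits) if bit == "1")

# Coefficients 2, 6, and 1 -> rows D, CD, and E, matching the text above.
print([encode_option(n) for n in (2, 6, 1)])  # ['D', 'CD', 'E']
```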


Analysis Assessment

Analysis, the subsequent higher rung in the cognitive hierarchy, requires an individual to break down a communication into its constituent parts such that the relations between the ideas expressed are made explicit. In this example, we ask the student to break down an idea, the titration curve of histidine shown in Figure 1, into the component chemical species that exist as a function of pH, in order to identify specific species at a point along the titration curve. The student is asked to identify the point on the curve where [histidine0] = [histidine−]; the correct answer corresponds to the point labeled ⑥ and is documented as ○○●●○ (CD).

Figure 1. Titration curve of histidine. Points labeled ① through ⑦ on the curve represent answer options to the stem question asking for identification of the point on the curve where [histidine0] = [histidine−].

Synthesis Assessment

Synthesis, the next higher step in the cognitive hierarchy, requires an individual to put together pieces and parts so as to form a whole. In this example, a multistep chemical synthesis problem, shown in Figure 2, in which a starting compound, phenylmethanol, is used to synthesize a desired product compound, 2-(benzyloxy)acetaldehyde, serves as a vehicle to assess the ability of a student to put together the components and sequences of reactions that lead to the synthetic product.

Figure 2. Organic synthesis reaction scheme showing missing reagents and reactants in the synthesis of 2-(benzyloxy)acetaldehyde.

The approach to this synthesis is for the student to observe that there is a difference of C2H4O between the formulas of the starting reactant phenylmethanol (C7H8O) and the intermediate (C9H12O2). An inspection of the reactant compounds of Figure 3 reveals that only acetic anhydride (③) and ethylene oxide (⑥) can provide the two-carbon chemical unit needed to synthesize the intermediate. In the product we see that the starting material is joined to a two-carbon chain by an ether linkage. An ether linkage results from the reaction of an alcohol with an epoxide under either acid or base catalysis. Hence, we can hypothesize that the intermediate, C9H12O2, is formed by addition of oxirane (⑥) to the starting reactant phenylmethanol. To carry out such a reaction, a catalytic acid or base must be employed; the student selects the catalytic reagent from Figure 3. A reaction of phenylmethanol with the strong base NaH (⑤) will generate an appropriate nucleophile, the conjugate base phenylmethanolate, C7H7O−, which can carry out an SN2 attack on one of the carbons of the epoxide ring of oxirane. In the reaction mechanism there is a simultaneous opening of the epoxide ring and formation of a bond between a carbon of the opened epoxide and the oxygen of the nucleophile. The intermediate alcohol, C6H5CH2−O−CH2CH2−OH, formed after acidification in step 3, is then oxidized with the pyridinium chlorochromate reagent corresponding to option ② of Figure 3. Consequently, to carry out this organic synthesis, the student would select reagent ⑤ for step 1, reactant ⑥ for step 2, and reagent ② for step 4, entered on the 5-item answer sheet in binary code as ○○●○● (CE), ○○●●○ (CD), and ○○○●○ (D), respectively.

Figure 3. Reactant and reagent options for organic synthesis of 2-(benzyloxy)acetaldehyde.

Evaluation Assessment

Evaluation, the highest step of the cognitive domain, requires qualitative and quantitative judgments about the extent to which materials and methods satisfy criteria. To assess the evaluation skills of the student, in this hypothetical example the student's biotechnology company is contracted to purify a metric ton of a therapeutic substance using aqueous cation-exchange chromatography with a 20 mM buffer at pH 6.9. The buffer must be either neutral or cationic so that it does not adhere to the ion exchanger. The student must evaluate the compounds listed in Table 3 and make a judgment as to which compound to select for use in the purification of the therapeutic substance. The student must evaluate each buffer for the appropriateness of its physicochemical properties (charge, pKa, and solubility) for the procedure, as well as for consumer safety and cost. To make a choice, the student must arrive at multiple conclusions about each buffer and then make an overall judgment. Evaluations of each compound's suitability for the chromatography are simply a series of conclusions (yes, no, or more information is needed), as shown in Table 4. After assessing each compound with respect to the statements and comparing the conclusions, the student makes a final judgment. Upon evaluating each of the compounds for solubility, safety, buffering capability at the appropriate pH, charge properties, and cost, the best choice from Table 3 is the compound corresponding to option ⑦, noted as ○○●●● (CDE).

Table 3. Buffer Compounds

dx.doi.org/10.1021/ed400509d | J. Chem. Educ. 2014, 91, 2064−2070

Journal of Chemical Education

Article

Table 4. Graphic Organizer for Compound Evaluation

Compound: ⑦
Statements Relevant to Compound Selection (conclusion choices: Yes / No / More information needed to decide)
The compound is soluble in water at the required concentration? Yes
The compound is safe to use for this purpose compared to other choices? Yes
The compound will buffer adequately at the required pH? Yes
The compound has the appropriate charge at the required pH? Yes
The compound is cost-effective compared to other choices? Yes



DISCUSSION

The ABC Assessment Rubric plays a role in reducing guessing by augmenting the number of options for a given stem question. The effect of guessing on multiple-choice questions has been investigated. A statistical model that analyzed the uncertainties in test reliabilities resulting from guessing in multiple-choice and true/false tests was reported.49 The model showed that in a 60-question exam, assuming the students knew the answers to 30 questions, the expected score decreased from 45 to 36 as the number of choices per stem increased from 2 to 5. Increasing the number of answer choices decreased test scores toward a score that coincided with a student's actual knowledge of the subject matter. A study of the effects of blind guessing on passing multiple-choice and true/false tests revealed that as the number of options per question decreased, the number of passing scores increased;50 hence, the effect of blind guessing diminishes as the number of options increases. It was established that, with respect to guessing, test reliability could be improved by increasing the number of options associated with the stem.7 Furthermore, when guessing is present, the reliability of multiple-choice tests is an increasing function of the number of item choices.51

In practice, students and teachers may encounter certain time limitations and issues relating to cognitive burden in implementing this method; however, these can be readily mitigated. One issue is that unfamiliarity with the binary code could add cognitive burden for students taking the exam. To reduce this burden, in courses ranging from freshman level to senior level, including Contemporary Biology I for nonmajors, Biochemistry, and Endocrinology, we have implemented several strategies. Students are introduced to the ABC Assessment Rubric through an online practice test accompanied by a table of the binary code and the correct answers bubbled into the ParScore form; with this practice exam the students can compare their bubbled-in responses to the key of correctly bubbled-in responses. Practice quizzes are also given using the binary code, so that students practice bubbling in their answers and feel comfortable with the process of using the binary code to select a response and bubble the answer into the ParScore form before taking an exam. In this way the cognitive effort goes into selecting the correct item response, as it should, while filling in the bubbling pattern for the number associated with the selected response is "mechanical".

Another issue or limitation is that students require more time to fill in their answers using the ABC Assessment Rubric. Teachers implementing the method should be cognizant of the extra time required to fill in multiple bubbles compared with a 1-bubble response and should factor in time for this in designing their exams. Another issue is the waste involved in printing a table of the binary code for each student for every exam. This is easily resolved by using laminated sheets of the binary-code bubbling pattern of Figure 1 of the Supporting Information, which can be reused indefinitely. A further concern is that students occasionally mis-bubble an answer. Students can circle or write the number of their selected response on the exam so that mis-bubbling concerns can be addressed with the teacher after scoring; in such cases the circled or written number on the exam takes precedence over the bubbled-in answer.

When students have not previously been exposed to the assessment rubric, their initial reactions are naturally ones of uneasiness, as would be expected of any new situation. However, students show a willingness to engage with the novel testing rubric. During the first exam students become acclimated to the format; after filling in the score sheet with their first few responses they are comfortable bubbling in their answers. They adapt, show satisfaction in their ability to respond to questions using the new format, and express acceptance of it after the first exam. Some students have voiced a preference for the format because of its ability to test one's knowledge of the subject thoroughly; others like it because they feel it can better differentiate rankings among them. Instances of negative evaluations of the assessment rubric are rare.
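The numbers from the cited guessing model can be reproduced with a one-line expectation: a student who knows K of N answers and blind-guesses the rest among c options scores K + (N − K)/c on average. The short Python sketch below is our illustration, not the cited authors' code; it recovers the reported 45 and 36 and shows where a 31-option format lands.

```python
# Back-of-the-envelope model of blind guessing: a student knows
# n_known of n_items answers and guesses the rest uniformly among
# n_options choices, so each guessed item contributes 1/n_options
# to the expected score.

def expected_score(n_items: int, n_known: int, n_options: int) -> float:
    return n_known + (n_items - n_known) / n_options

for c in (2, 3, 4, 5, 31):
    print(f"{c:>2} options: expected score {expected_score(60, 30, c):.1f} / 60")

# 2 options -> 45.0 and 5 options -> 36.0, matching the cited model;
# with the 31-option ABC format the expectation falls to about 31.0,
# close to the student's true knowledge of 30.
```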



CONCLUSION

In summary, to redress limitations of the 5-choice MCQ format and the nonautomated expanded MCQ format, we have devised the ABC Assessment Rubric. It incorporates up to a 31-option array for a 5-choice answer sheet, from which respondents are asked to select the best possible answer for a given stem. A standard 5-choice score sheet is used to record a respondent's choices in binary notation, and objective grading of the score sheet is automated using the equipment and software currently employed for scoring 5-choice answer sheets. With the ABC Assessment Rubric, blind guessing is drastically diminished: the chance of guessing correctly among 31 options is about 3%, compared with 20−25% for the traditional 4- or 5-choice MCQ. The method provides enhanced flexibility in the construction of objective examinations. Test questions can be fashioned in different formats to assess learning at different levels of Bloom's Taxonomy. Additionally, assessing and scoring constructed-response tasks, such as producing a numerical answer to an arithmetic question, is possible with this setup. The rubric is a novel measuring instrument intended to describe numerically the degree of learning under uniform, standardized conditions, and it represents an evolution from the simple 5-choice MCQ. We hope that the adoption of the ABC Assessment Rubric will bring new creativity to the design of examinations and thus improve the quality of educational and standardized testing.



ASSOCIATED CONTENT

Supporting Information

Binary code; a collection of examples of the ABC Assessment Rubric used in MCQ formats that include composite answers to multiple questions, multiple true/false format, numerical calculated response, multiple-multiple choice, matching/extended matching, and the incomplete-stem format, as well as other examples. This material is available via the Internet at http://pubs.acs.org.
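As one illustration of how the composite formats listed above could work, the sketch below treats a five-statement multiple true/false item as a single answer-sheet row, with each bubble recording the truth value of one statement. This encoding is our assumption, offered to illustrate the idea; it is not a specification of the Supporting Information's formats.

```python
# Hedged sketch: one plausible encoding for a five-statement
# true/false item scored as a single row. Bubble A records statement
# 1, B statement 2, and so on; filling a bubble asserts "true". This
# convention is our assumption, not taken from the article.

def tf_row(truth_values: list[bool]) -> str:
    """Return the bubbles to fill for five true/false judgments."""
    assert len(truth_values) == 5, "one bubble per statement"
    return "".join(b for b, t in zip("ABCDE", truth_values) if t)

key = tf_row([True, False, True, True, False])        # "ACD"
response = tf_row([True, False, True, False, False])  # "AC"
print(key, response, key == response)                 # ACD AC False
```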



AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We thank the following providers of support for Elizabeth K. Haro: Alamo Council of the Blind, American Chemical Society Scholars Program, The Jewish Guild for the Blind.

REFERENCES

(1) American Chemical Society Exams. http://chemexams.chem.iastate.edu/materials/exams.cfm (accessed May 2013).
(2) American Association of Medical Colleges Medical College Admission Test. https://www.aamc.org/students/applying/mcat/ (accessed May 2013).
(3) Educational Testing Service Graduate Record Examination. http://www.ets.org/gre/ (accessed May 2013).
(4) Ashford, T. A. A Brief History of Objective Tests. J. Chem. Educ. 1972, 49 (6), 420−423.
(5) Berk, R. A. A Humorous Account of 10 Multiple-Choice Test-Item Flaws That Clue Testwise Students. J. Excel. Coll. Teach. 1998, 9 (2), 93−117.
(6) Bleske-Rechek, A.; Zeug, N.; Webb, R. M. Discrepant performance on multiple-choice and short answer assessments and the relation of performance to general scholastic aptitude. Assess. Eval. High. Educ. 2007, 32 (2), 89−105.
(7) Burton, R. F. Quantifying the effects of chance in multiple choice and true/false tests: question selection and guessing of answers. Assess. Eval. High. Educ. 2001, 26 (1), 41−50.
(8) Cohen, D. J.; Rosenzweig, R. No Computer Left Behind. Chron. High. Educ. 2006, 52 (25), B6−B8.
(9) Considine, J.; Botti, M.; Thomas, S. Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian 2005, 12 (1), 19−24.
(10) Hartman, J. R.; Lin, S. Analysis of Student Performance on Multiple-Choice Questions in General Chemistry. J. Chem. Educ. 2011, 88 (9), 1223−1230.
(11) Mujeeb, A. M.; Pardeshi, M. L.; Ghongane, B. B. Comparative assessment of multiple choice questions versus short essay questions in pharmacology examinations. Indian J. Med. Sci. 2010, 64 (3), 118−124.
(12) Newble, D. I.; Baxter, A.; Elmslie, R. G. A comparison of multiple-choice and free-response tests in examinations of clinical competence. Med. Educ. 1979, 13, 263−268.
(13) Wormer, F. B. Reexamination of multiple-choice testing. Education 1970, 90 (4), 285−289.
(14) Wainer, H.; Thissen, D. Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Appl. Meas. Educ. 1993, 6 (2), 103−118.
(15) Case, S. M.; Swanson, D. B. Extended-matching items: A practical alternative to free-response questions. Teach. Learn. Med. 1993, 5 (2), 107−115.
(16) Case, S. M.; Swanson, D. B.; Ripkey, D. R. Comparison of items in five-option and extended-matching formats for assessment of diagnostic skills. Acad. Med. 1994, 69 (10), S1−S3.
(17) Case, S. M.; Swanson, D. B. Constructing Written Test Questions for the Basic and Clinical Sciences; National Board of Medical Examiners: Philadelphia, PA, 2002.
(18) Akkad, A.; Habiba, M.; Konge, J. EMQs in Obstetrics and Gynaecology; Radcliffe Publishing Ltd.: Oxford, U.K., 2006; p 176.
(19) Alcolado, J. C.; Mir, M. A. Extended-Matching Questions for Finals, 2nd ed.; Elsevier Health Sciences: London, U.K., 2007; p 428.
(20) Ayoub, T.; Badiei, S.; Ionides, A. Extended Matching Questions in Ophthalmology; TFM Publishing Ltd.: Harley, Shrewsbury, U.K., 2014; p 300.
(21) Barakat, N. G. Best of Five and Extended Matching Questions for MRCPCH: Pt. 1; 123Doc Medical: London, U.K., 2011; p 210.
(22) Coales, U. F. PLAB: 1000 Extended Matching Questions; Royal Society of Medicine Press Ltd.: London, U.K., 2000; p 292.
(23) Feather, A.; Domizio, P.; Hayes, K.; Knowles, C. H.; Lumley, J. S. P. EMQs for Medical Students: Practice Papers; PasTest: Knutsford, U.K., 2008; Vol. 1, p 304.
(24) Feather, A.; Domizio, P.; Hayes, K.; Knowles, C. H.; Lumley, J. S. P. EMQs for Medical Students: Practice Papers; PasTest: Knutsford, U.K., 2008; Vol. 2, p 304.
(25) Feather, A.; Domizio, P.; Hayes, K.; Knowles, C. H.; Lumley, J. S. P. EMQs for Medical Students: Practice Papers; PasTest: Knutsford, U.K., 2008; Vol. 3, p 384.
(26) Irfan Syed, I.; Keshtgar, M. EMQs in Surgery, 2nd ed.; Taylor & Francis Ltd.: London, U.K., 2012; p 320.
(27) Reilly, M.; Raju, B. Extended Matching Items for the MRCPsych Part 1; Radcliffe Publishing Ltd.: Oxford, U.K., 2004; p 224.
(28) Roddie, I. C.; Wallace, W. F. M. MCQs & EMQs in Human Physiology: With Answers and Explanatory Comments, 6th ed.; Taylor & Francis Ltd.: London, U.K., 2004; p 192.
(29) Beullens, J.; Van Damme, B.; Jaspaert, H.; Janssen, P. J. Are extended-matching multiple-choice items appropriate for a final test in medical education? Med. Teach. 2002, 24 (4), 390−395.
(30) Beullens, J.; Struyf, E.; Van Damme, B. Diagnostic ability in relation to clinical seminars and extended-matching questions examinations. Med. Educ. 2006, 40 (12), 1173−1179.
(31) Beullens, J.; Struyf, E.; Van Damme, B. Do extended matching multiple-choice questions measure clinical reasoning? Med. Educ. 2005, 39 (4), 410−417.
(32) Struyf, E.; Beullens, J.; Van Damme, B.; Janssen, P.; Jaspaert, H. A new methodology for teaching clinical reasoning skills: problem solving clinical seminars. Med. Teach. 2005, 27 (4), 364−368.
(33) Lukic, I. K.; Gluncic, V.; Katavic, V.; Petanjek, Z.; Jalsovec, D.; Marusic, A. Weekly quizzes in extended-matching format as a means of monitoring students' progress in gross anatomy. Ann. Anat. 2001, 183 (6), 575−579.
(34) Haladyna, T. M.; Downing, S. M.; Rodriguez, M. C. A review of multiple-choice item-writing guidelines for classroom assessment. Educ. Meas.: Iss. Prac. 2002, 15, 309−334.
(35) Haladyna, T. M.; Downing, S. M. A taxonomy of multiple-choice item writing rules. Appl. Meas. Educ. 1989, 2, 37−50.
(36) Haladyna, T. M.; Downing, S. M. Validity of a taxonomy of multiple-choice item writing rules. Appl. Meas. Educ. 1989, 2, 51−78.
(37) Leibniz, G. W. Explication de l'arithmétique binaire, qui se sert des seuls caractères 0 et 1, avec des remarques sur son utilité et sur ce qu'elle donne le sens des anciennes figures chinoises de Fohy. Mém. Acad. R. Sci. Paris 1703, 3, 85−89.
(38) Bloom, B.; Englehart, M.; Furst, E.; Hill, W. Taxonomy of Educational Objectives: The Classification of Educational Goals; McKay Publishers: New York, 1956.
(39) Anderson, L. W.; Krathwohl, D. R. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives; Addison Wesley Longman: New York, 2001.
(40) Cheesman, K. L. Writing/using multiple-choice questions to assess higher-order thinking. In College Science Teachers Guide to Assessment; Lord, T. R.; French, D. P.; Crow, L. W., Eds.; National Science Teachers Association Press: Arlington, VA, 2009; pp 35−41.


(41) Kim, M.-Y.; Patel, R. A.; Uchizono, J. A.; Beck, L. Incorporation of Bloom's Taxonomy into Multiple-Choice Examination Questions for a Pharmacotherapeutics Course. Am. J. Pharm. Educ. 2012, 76 (6), 1−7.
(42) Stupans, I. Multiple choice questions: Can they examine application of knowledge? Pharm. Educ. 2006, 6 (1), 59−63.
(43) Morrison, S.; Free, K. W. Writing Multiple-Choice Test Items That Promote and Measure Critical Thinking. J. Nurs. Educ. 2001, 40 (1), 17−24.
(44) Haladyna, T. M. Developing and Validating Multiple-Choice Test Items, 3rd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, 2004.
(45) Haladyna, T. M.; Rodriguez, M. C. Developing and Validating Test Items; Routledge: New York, NY, 2013.
(46) Nitko, A. J.; Brookhart, S. M. Educational Assessment of Students, 6th ed.; Prentice Hall: Lebanon, IN, 2010.
(47) Osterlind, S. J. Constructing Test Items: Multiple-Choice, Constructed-Response, Performance and Other Formats, 2nd ed.; Kluwer Academic Publishers: Boston, MA, 1998; Vol. 47.
(48) Campbell, D. E. How to write good multiple-choice questions. J. Paediatr. Child Health 2011, 47, 322−325.
(49) Burton, R. F.; Miller, D. J. Statistical Modelling of Multiple-Choice and True/False Tests: ways of considering, and of reducing, the uncertainties attributable to guessing. Assess. Eval. High. Educ. 1999, 24 (4), 399−411.
(50) Wang, J. Critical values of guessing on true-false and multiple-choice tests. Education 2001, 116 (1), 153−158.
(51) Zimmerman, D. W.; Williams, R. H. A New Look at the Influence of Guessing on the Reliability of Multiple-Choice Tests. Appl. Psychol. Meas. 2003, 27 (5), 357−371.
