
Validation of an Assessment Rubric via Controlled Modification of a Classroom Activity

Christopher F. Bauer*,† and Renée Cole‡,§

†Department of Chemistry, University of New Hampshire, Durham, New Hampshire 03824, United States
‡Department of Chemistry and Physics, University of Central Missouri, Warrensburg, Missouri 64093, United States




ABSTRACT: A rubric that embodies the key features of the process-oriented, guided-inquiry learning (POGIL) model was subjected to a systematic study of validity and reliability. Nearly 60 college instructors used the rubric to evaluate four intentionally modified versions of an established POGIL activity. The modifications strengthened or weakened key characteristics of the activity. Results indicate that the rubric was sufficiently sensitive and reliable to distinguish the structures of the four versions, even in the hands of those inexperienced with POGIL or the rubric, and that the evaluation was consistent with the design characteristics. The rubric should continue to be useful for reviewing existing classroom materials, for efficient authoring and revising of new materials, and for introducing others to POGIL pedagogy. This article also exemplifies a rigorous process by which a rubric may be evaluated.

KEYWORDS: First-Year Undergraduate/General, Chemical Education Research, Curriculum, Inquiry-Based/Discovery Learning, Collaborative/Cooperative Learning

FEATURE: Chemical Education Research

Disseminating the physical artifacts of a curriculum is much easier than disseminating the beliefs on which it was designed. Artifacts, such as copies of classroom activity handouts, PowerPoint slides, and laboratory procedures, are easily exchanged. Beliefs, a tacit network of understandings and commitments regarding learning, are not. Acquisition of instructional materials without embracing their underlying beliefs is a suggested reason for curriculum drift and dilution: drift in form and goals, and dilution of outcomes.1−6 Dancy and Henderson7 describe how curriculum innovations grounded in recent physics education research are often inappropriately assimilated by new users. These new users fully intend to adopt particular strategies or materials, but in practice they make adaptations they feel are necessary to adjust to their context or expectations. These adaptations may compromise effectiveness, leading them to conclude that the strategy does not work. Furthermore, independent observations of faculty who claim to be implementing innovations suggest that faculty over-report the amount of innovation they actually implement.

The steering committee of the process-oriented, guided-inquiry learning (POGIL) project8 recognized that this issue had to be addressed explicitly as the dissemination of project materials expanded. POGIL instruction is implemented through structured classroom materials that students work through in groups, under the watchful eye of an instructor who monitors progress and facilitates discussion. The activities have an intentional learning-cycle structure that includes exploration of a model or data, development of an important idea from that exploration, and then application of that idea. The materials and classroom facilitation also support development of cognitive process skills (information processing, problem solving, critical thinking) and group process skills (management, communication, teamwork).8,9



RUBRIC DEVELOPMENT

To communicate the design philosophy and provide a guide for faculty developing new materials, a rubric was developed that embodied these characteristics. The first version was designed by a group of analytical chemists who are developing materials for that area of chemistry.10 Having a rubric helped with both the development and review of materials to ensure consistency and adherence to the POGIL philosophy. It also provided a mechanism for reviewers to give consistent feedback to authors. In addition, engaging faculty not directly involved in writing activities increases their involvement in the course development process, making it more likely that they will continue their own implementation.11 Early versions of the rubric were tested at workshops, and feedback was gathered on its performance, allowing continual refinement.

A rubric is an explicit, hierarchical rating scheme for qualitative evaluation or assessment, or both.12−15 A single global criterion (holistic rubric) or a set of criteria (analytic rubric) is divided into categorical levels of quality or frequency. The object of the assessment is then rated on those criteria.




Many applications of rubrics have been published. A few recent examples include assessment of lab reports,16 engineering design projects,17 student portfolios,18 student presentations,19 concept maps,20,21 and classroom performance.22,23 Our literature review found numerous studies of inter-rater reliability, in which rubric users score the same objects to establish the extent of agreement. Much less frequent24 are studies of validity: Does the rubric measure what it purports to measure? Fay et al.25 and Buck et al.26 used a modified rubric to distinguish levels of inquiry in published chemistry lab experiments, all of which were identified by their authors in print as an “inquiry experiment”. Rubric scores indicated that the level of inquiry was not uniform despite the label. Validity in this case was based on a quasi-experimental design, taking advantage of an existing population of printed experiments, which were selected to represent a diversity of inquiry levels. Docktor and Heller27 reviewed students’ written solutions to physics problems using a rubric for describing problem-solving processes. Validity was established by independently interviewing students to compare their thinking with what they wrote down. Lastly, two studies feature video clips. In these, 3 operating room nurses28 and 13 middle school mathematics teachers29 were used as objects for performance assessment. The nursing videos were scored directly by experts to evaluate the validity and reliability of different rubric forms. The teaching videos were short classroom case studies about which teachers viewing the videos recorded observations and inferences; their written comments were then analyzed with a rubric. In both the nursing and teaching cases, the video clips were selected to represent a range of performances, and validity was inferred from ratings that agreed with author expectations.

This article describes an experimental test of a rubric designed to provide feedback to authors regarding the quality and design of guided-inquiry activities. The test materials consisted of four intentionally modified versions of an established activity. The research questions are:

1. To what extent does this rubric provide a valid evaluation of the guided-inquiry and process characteristics of written POGIL activities?
2. Does this rubric identify the intended features in the four versions?
3. Can this rubric be used reliably by users with different levels of experience?

Box 1. Design Characteristics of Four Versions of The Nuclear Atom Activity

Version 1: Focuses on symbols, rules, and practice. Still includes a learning cycle. Removes questions or instructions that provide conceptual connections and reinforcement of ideas. Leaves one item about working in a group and self-assessment.

Version 2: Presents all concepts first as a verbal explanation of the model, then includes the entire set of critical thinking questions and exercises as in the original.

Version 3: Same sequence of questions as the original, but none that represent “exploration” of the model. In other words, it moves directly to questions requiring inferences, with no questions involving redescription of the model. No cues or suggestions regarding group collaboration.

Version 4: Original with additional cues to direct group activity and reflection on learning. Very explicit in aspects of process skills and self-assessment.



PROCEDURE

Sixty science faculty members (predominantly chemists) attending the POGIL National Meeting in June 2008 used the rubric to evaluate four modifications of the same printed classroom activity before arriving at the meeting. Materials were provided in paper form as a packet, with instructions and informed-consent information on the top sheet, the four versions of the activity, and four copies of the rubric. No training was involved. Respondents checked consent or nonconsent and provided a private code name, which was used to identify their data anonymously in the records. The activity was The Nuclear Atom, the first activity in Chemistry: A Guided Inquiry.30


Table 1. Text and Category of Characteristic Items in the POGIL Rubric

Item  Category        Statement
1     Objectives      If content learning objectives are not explicitly stated, a reviewer can still get a clear sense of what the content objectives are from the activity.
2     Objectives      If process learning objectives are not explicitly stated, a reviewer can still get a clear sense of what the process objectives are from the activity.
3     Objectives      The content objectives (stated or inferred) are constrained to no more than three key concepts.
4     Objectives      The activity is likely to lead to the accomplishment of the process and content objectives.
5     Learning Cycle  There is a clear learning cycle structure of exploration, critical thinking leading to concept development, and application or practice of that concept.
6     Learning Cycle  Sufficient structure is provided in written materials so that students are likely to be able to work through the activity with minimal intervention by the facilitator.
7     Process         The instructions provided to students cue them to work collaboratively.
8     Process         Questions engage students in developing a shared description of the model provided for exploration.
9     Learning Cycle  The model being explored constitutes a reliable set of data or exemplars that give students enough evidence from which to develop clear inferences.
10    Learning Cycle  Questions are sequenced in a logical manner that facilitates building up of the concepts of interest and avoids conceptual leaps.
11    Learning Cycle  Questions engage students in interpreting, synthesizing, predicting, or explaining the results of the exploration, and in producing a written articulation of those understandings.
12    Learning Cycle  There is a point at which a synthesis or organization of previous thinking guides students to some central idea.
13    Learning Cycle  Materials provide applications or practice that extend the central idea in a meaningful and logical manner.
14    Process         Students assess what they have learned in terms of content.
15    Process         Students assess what they have learned in terms of cognitive process skills and/or group process skills.





This activity has been used extensively with students in general chemistry and with faculty at workshops as an introduction to POGIL. It is therefore regarded by the originators as having a strong set of characteristics exemplifying the POGIL model. For this study, the activity was intentionally modified to strengthen, weaken, or change certain characteristics, resulting in four versions. The modifications did not change the initial illustrations of nuclear structure, and did not change the wording of the critical thinking questions or exercises. The changes involved reordering questions, removing questions, or adding certain questions to the text. Box 1 lists the design characteristics of the four versions of The Nuclear Atom. Copies of the four activities are included in the Supporting Information.

The rubric consists of 15 item statements (Table 1) describing desired characteristics of activities. (The complete form is included in the Supporting Information.) Using the rubric means judging whether a printed classroom activity explicitly incorporates each characteristic. The form requests a numerical rating (0, 1, 2, or 3, with 3 being highest) and written comments to explain the rating. The numerical rating provided an evaluation of quality, while the comment box was used to provide feedback to the authors on how to make improvements to reach a higher quality level. The term “indicator” has been used as a synonym for “characteristic”. The numerical scale is anchored in the written instructions as shown in Box 2.
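For readers who wish to tabulate reviews electronically, the sketch below (in Python) shows one way the rubric's structure (Table 1) and the anchored 0−3 scale (Box 2) might be encoded; the variable names, the grouping of items, and the example ratings are illustrative assumptions, not part of the study's instruments.

```python
# Sketch of the rubric structure: 15 items, each tied to a category (Table 1)
# and scored on the anchored 0-3 scale (Box 2). Names are illustrative only.

CATEGORIES = {
    "Objectives": [1, 2, 3, 4],
    "Learning Cycle": [5, 6, 9, 10, 11, 12, 13],
    "Process": [7, 8, 14, 15],
}

SCALE = {
    0: "No explicit evidence regarding this indicator.",
    1: "Some evidence of meeting expectation; significant improvement needed.",
    2: "Satisfactory evidence; improvements should be considered.",
    3: "Substantial or exemplary evidence; improvements not essential.",
}

# One reviewer's ratings for one activity version (invented numbers),
# with an optional free-text comment per item for feedback to authors.
review = {
    "version": 4,
    "ratings": {item: 3 for items in CATEGORIES.values() for item in items},
    "comments": {7: "Explicit cues for group roles are present."},
}
```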


RESULTS

The ratings given by each participant were entered into a spreadsheet. Graphs and statistical tests were produced with Excel or SPSS. Table 2 shows descriptive statistics for the four versions, including average scores for all participants, and scores segregated to compare: (i) the number of times participants had used the rubric (more or fewer than seven uses, including for this experiment); (ii) the amount of experience with The Nuclear Atom activity (at least once vs never); and (iii) whether they performed the reviews in numerical order (yes or no). Figure 1 extends the comparison, illustrating the average score for each rubric characteristic for each of the four versions of the activity; error bars represent a p = 0.1 confidence range (±0.2 score units) based on Kruskal−Wallis (group) and Mann−Whitney (pairwise) nonparametric statistical comparisons.

Figure 1. Item-by-item averaged rubric scores for The Nuclear Atom versions. Error bars are 0.4 units, representing approximately p = 0.1 significance level based on Kruskal−Wallis rank sum test. For clarity, error bars are shown only in places where significant overlap exists among version ratings.

Box 2. Numerical Scale for Scoring the Four Versions of The Nuclear Atom Activity

0 = No explicit evidence regarding this indicator.
1 = Some evidence of meeting expectation. Significant improvement needed.
2 = Satisfactory evidence of meeting expectation. Improvements should be considered.
3 = Substantial or exemplary evidence of meeting expectation. Improvements not essential.




The numerical sum of items 2−15 was used as an overall indicator of quality; item 1 was not always answered, so it was left out of the sum. Thus, the maximum possible score was 42. Based on the intended design differences, the anticipated quality ranking of the versions was 4 > 1 > 2 ≈ 3. Participant responses also included the number of times they had used the rubric in the past, the number of classes in which they had used the POGIL model, the number of POGIL workshops attended or led, and the sequence in which they rated the four activities.
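As a rough illustration of the analysis described above, and assuming each reviewer's item ratings are available as 0−3 scores, the overall score and the version comparisons could be reproduced with SciPy's nonparametric tests; this is a sketch with placeholder data, not the authors' Excel/SPSS workflow.

```python
import numpy as np
from scipy import stats

def overall_score(ratings: dict[int, int]) -> int:
    """Sum of items 2-15 (item 1 omitted); the maximum possible score is 42."""
    return sum(score for item, score in ratings.items() if item >= 2)

# Placeholder data: for each version, a list of reviewers' item ratings
# (random 0-3 values standing in for the ~60 participants' actual responses).
rng = np.random.default_rng(0)
reviews = {v: [{item: int(rng.integers(0, 4)) for item in range(1, 16)}
               for _ in range(50)] for v in (1, 2, 3, 4)}
totals = {v: [overall_score(r) for r in rs] for v, rs in reviews.items()}

# Group comparison of overall scores across the four versions
# (Kruskal-Wallis rank-sum test, as used for Figure 1 and Table 2).
h, p_group = stats.kruskal(*totals.values())
print(f"Kruskal-Wallis H = {h:.2f}, p = {p_group:.3f}")

# Pairwise follow-up (Mann-Whitney U), e.g., version 4 vs version 1,
# judged at the slightly liberal p ~ 0.1 level adopted in the paper.
u, p_pair = stats.mannwhitneyu(totals[4], totals[1], alternative="two-sided")
print(f"version 4 vs 1: U = {u:.0f}, p = {p_pair:.3f}")
```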

DISCUSSION

Independence

Correlations between the average scores of the four versions of The Nuclear Atom activity were significant in only two cases: versions 1 and 4 were correlated at r ∼ 0.4, as were versions 2 and 3. Because versions 1 and 4 are more similar in quality, as are 2 and 3, parallel scores in these two cases could be expected. The general lack of significant correlations suggests that participants were able to provide independent assessments of the four versions, despite the large amount of overlap in their content.
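A sketch of this independence check, assuming each reviewer's overall score for each version is available in aligned arrays, is given below; the data are placeholders and, since the paper does not specify the correlation coefficient, Pearson's r is assumed.

```python
import numpy as np
from itertools import combinations
from scipy import stats

# scores[v][i] = overall rubric score that reviewer i gave to version v
# (placeholder random data; the real data set has roughly 50-56 reviewers).
rng = np.random.default_rng(1)
scores = {v: rng.integers(10, 43, size=50).astype(float) for v in (1, 2, 3, 4)}

# Pairwise correlation of reviewers' scores between versions. The paper
# reports significant correlations (r ~ 0.4) only for versions 1 & 4
# and versions 2 & 3; Pearson's r is assumed here for illustration.
for a, b in combinations(sorted(scores), 2):
    r, p = stats.pearsonr(scores[a], scores[b])
    print(f"versions {a} & {b}: r = {r:+.2f}, p = {p:.3f}")
```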

Table 2. Mean Rating and Standard Deviation for Faculty Applying the Rubric To Evaluate Four Intentionally Modified Versions of The Nuclear Atom

Version(a)  Average    Experienced       Novice            The Nuclear    The Nuclear   In Sequence  Out of Sequence
            Score(b)   Rubric User (SD)  Rubric User (SD)  Atom Veteran   Atom Novice   Review       Review
4           37.2(d)    38.5 (3.3)        36.8 (5.2)        37.8           37.0          37.1(g)      39.7(g)
1           29.4(d)    32.3 (5.2)        28.1 (8.2)        31.0           29.6          29.8         32.1
2           20.9(d)    18.6(e) (3.0)     21.8(e) (6.7)     23.0(f)        19.3(f)       21.8         21.3
3           18.5(d)    16.2 (6.4)        19.4 (6.9)        18.2           15.8          18.8         19.2
n(c)        49−56      10−14             39−42             20−25          11−13         31−34        6−10

(a) Versions are listed from highest to lowest mean rating. (b) Maximum score is 42. (c) Number is a range because some respondents did not rate every item. (d,e,f,g) Common superscript letters indicate which sets of values show a significant difference at p ≈ 0.1 by the nonparametric Mann−Whitney pairwise test. A slightly liberal p was chosen to show a few marginal differences. For superscript d, each score differs significantly from the others except that versions 2 and 3 do not differ. For superscripts e, f, and g, the shared letter indicates a statistically significant difference between the two values.





Validity

The average scores (Table 2) indicated that each version was rated significantly differently and ranked in quality in the order 4 > 1 > 2 ≈ 3, which matches the expected order. Figure 1 illustrates how these differences break down according to individual rubric items.

Participants rated version 4 (average score = 37.2) highest on all rubric items. This version was the original published activity, with the addition of explicit cues to direct group activity (process skills) and self-assessment. It was intended to score strongly in all categories, and the participants observed that. The next highest scoring version was version 1 (average score = 29.4), which incorporated the learning cycle structure but was weakened in process skills and conceptual connections. The results show that participants in fact gave lower ratings on items 2, 7, 8, and 15 (process skills) and items 5, 10, 11, and 12 (perceived strength of structure).

Version 2 (average score = 20.9) was a lecture in print: information was presented, and then practice problems followed. Participants saw it that way. This version was similar to version 4 on items 3 (limited number of concepts involved), 6 (minimal instructor intervention), 9 (reliable set of data), and 13 (a central idea is developed). On the other hand, version 2 was rated much lower on process skills (items 2, 7, 8, 14, 15), which were absent, and much lower on learning-cycle structure (items 5, 11, 12), which was also absent. Version 3 (average score = 18.5) removed the exploratory portion of the learning cycle but kept the higher-order inferential questions and applications. Again, participants identified these differences. Besides being very weak on process skills, this version was rated lower than the other versions on items 6 (minimal instructor intervention), 10 (avoiding conceptual leaps), and 13 (meaningful applications).

The process skill items (2, 7, 8, 14, 15) contributed about seven points to the differentials for versions 2 and 3 versus version 1. The learning cycle components (5, 6, 10, 12, 13) contributed about three points to the differentials for versions 2 and 3 versus version 1, and more for version 4. Overall, the intended design differences in process and inquiry were made explicit by the rubric, such that both expert and novice users were able to distinguish structural differences in these printed materials. Furthermore, participants believed that learning goals are more likely to be achieved when process skill development is explicit, as indicated by scores for item 4.

Reliability

Various factors that might contribute variability to the results were investigated. The four activities had been distributed as pre-meeting homework for participants, who were free to review them in whatever order they chose. We realized afterward that review order might bias results, so we subsequently asked whether participants went in numerical order and, if not, what order they used. Most used increasing numerical order. The few who did not followed no common alternative sequence, so they were grouped into one category. Table 2 (right two columns) shows that there was only a small order effect, for version 4. (In this and other comparisons in Table 2, nonparametric statistical tests were used because the score distributions were often nonnormal and the variances were not uniform.)

Another concern was whether the experience of the reviewer might affect scoring. We looked at this in three ways: experience with using this rubric, experience with POGIL instruction, and experience with The Nuclear Atom activity. Table 2 shows that experienced and novice users of the rubric obtained very similar scores and an identical ranking, as did those who were familiar with The Nuclear Atom and those who were not. The differences noted for version 2 (experience with the rubric and experience with the activity) are at the margin of significance. When the scores for individual rubric items were compared, experienced rubric users were found to be slightly more negative on items 4 (activity likely to achieve objectives) and 5 (clear learning cycle structure). We speculate that experienced rubric users are very familiar with learning cycle structure, recognized its absence (absent by design in version 2), and judged that the absence of this structure would lower the chance that learning objectives would be achieved. In terms of activity experience, veteran users provided slightly more positive ratings for version 2 on items 10 (logical sequence), 12 (central idea developed), and 13 (application practice provided). Version 2 is identical in its written text to the original published activity; veteran users may have recognized this similarity and tended to keep ratings on these items the same as for the other versions. Furthermore, the amount of experience as a POGIL instructor did not make a difference (results not shown).

These results suggest that the rubric is robust with respect to level of experience. Novice rubric users did tend to show larger divergence in scoring (standard deviations, Table 2), which was expected. This suggests that even new users have a good chance of making distinctions that are as valid as those of experienced users, and that experience with this assessment tends to increase consistency in rating.
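The item-level subgroup checks described above (for example, experienced vs novice rubric users on items 4 and 5) could be reproduced with per-item Mann−Whitney tests; the sketch below assumes ratings are stored by reviewer group and uses placeholder data, not the study's responses.

```python
import numpy as np
from scipy import stats

# item_ratings[group][item] = array of 0-3 ratings given to one rubric item
# by reviewers in that experience group (placeholder data, illustrative only).
rng = np.random.default_rng(2)
item_ratings = {
    "experienced": {item: rng.integers(0, 4, size=12) for item in range(1, 16)},
    "novice":      {item: rng.integers(0, 4, size=40) for item in range(1, 16)},
}

# Item-by-item comparison of the two groups at the p ~ 0.1 level,
# mirroring the check that flagged items 4 and 5 in the Reliability section.
for item in range(1, 16):
    u, p = stats.mannwhitneyu(item_ratings["experienced"][item],
                              item_ratings["novice"][item],
                              alternative="two-sided")
    marker = "*" if p < 0.1 else ""
    print(f"item {item:2d}: U = {u:5.1f}, p = {p:.3f} {marker}")
```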



IMPLICATIONS

We expect this rubric to be useful in a number of ways, both for the POGIL project itself and as a model for other curriculum development projects. First, for authors creating POGIL materials, the rubric provides clear guidance about the desired structure of an activity, supporting the inquiry goal while emphasizing process goals at the same time. The rubric has evolved further among different subgroups of the POGIL project, but its key components are the same. The ANA-POGIL group still uses the form of the rubric described here as a template for authoring materials. It provides a common means for reviewing drafts of activities and gives authors guidance on changes for subsequent drafts. The evolution of quality is apparent from subsequent rubric reviews of the same activity after revision.

Second, learning how to use the rubric is a valuable component of instructor facilitator development. We have found that conducting a rubric review of a common set of materials, and then using the individual data as a starting point for discussion, is a valuable experience to incorporate into workshops. It helps people focus on the meaning of the rubric items as well as the structure of activities, getting them away from casual comparisons. It also provides a framework to help participants better understand the structure of the activities and the philosophy of the project. This in turn prepares participants to more appropriately assimilate the pedagogy and implement the materials in a manner consistent with the underlying philosophy of the project.




ASSOCIATED CONTENT

Supporting Information

Rubric; four altered versions of The Nuclear Atom activity from Chemistry: A Guided Inquiry, 3rd ed.30 This material is available via the Internet at http://pubs.acs.org.







AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Present Address
§Department of Chemistry, University of Iowa, Iowa City, Iowa 52242, United States



ACKNOWLEDGMENTS

This work was supported through the POGIL Project as part of NSF Grant 0618746. Andy Bressette helped in the design of The Nuclear Atom versions. Karen Andersen helped in the evolving design of the rubric. Megan Grunert and Mary Emenike provided helpful manuscript reviews. Portions of this manuscript were presented at the ACS National Meeting, Salt Lake City, 2009, paper 1158.



REFERENCES

(1) Anderson, R. D. Study of Curriculum Reform; U.S. Dept. of Education, Office of Educational Research and Improvement: Washington, DC, 1996.
(2) Anderson, T. R.; Rogan, J. M. Bridging the Educational Research-Teaching Practice Gap: Curriculum Development, Part 1: Components of the Curriculum and Influence on the Process of Curriculum Design. Biochem. Mol. Biol. Educ. 2011, 39, 68−76.
(3) Hargreaves, A., Ed. Extending Educational Change: International Handbook of Educational Change; Springer: New York, 2005.
(4) Ferrini-Mundy, J.; Floden, R. E. Educational Policy Research and Mathematics Education. In Second Handbook of Research on Mathematics Teaching and Learning; Lester, F. K., Jr., Ed.; Information Age Publishing: Charlotte, NC, 2007.
(5) Clements, D. H. Linking Research and Curriculum Development. In Handbook of International Research in Mathematics Education; English, L. D., Ed.; Lawrence Erlbaum: Mahwah, NJ, 2002.
(6) Jones, M. G.; Carter, G. Science Teacher Attitudes and Beliefs. In Handbook of Research on Science Education; Abell, S. K., Lederman, N. G., Eds.; Lawrence Erlbaum: Mahwah, NJ, 2007.
(7) Dancy, M.; Henderson, C. Pedagogical Practices and Instructional Change of Physics Faculty. Am. J. Phys. 2010, 78, 1056−1063.
(8) Process Oriented Guided Inquiry Learning Home Page. http://www.pogil.org/ (accessed Jun 2012).
(9) Process-Oriented Guided-Inquiry Learning (POGIL); Moog, R., Spencer, J., Eds.; ACS Symposium Series 994; American Chemical Society: Washington, DC, 2008.
(10) ANA-POGIL Project. http://pogil.org/post-secondary/anapogil (accessed Jun 2012).
(11) Henderson, C.; Finkelstein, N.; Beach, A. Beyond Dissemination in College Science Teaching: An Introduction to Four Core Change Strategies. J. Coll. Sci. Teach. 2010, 39 (5), 18−25.
(12) Stevens, D. D.; Levi, A. J. Introduction to Rubrics; Stylus Publishing: Sterling, VA, 2005.
(13) Allen, M. J. Assessing Academic Programs in Higher Education; Anker/Jossey-Bass Publishing: San Francisco, CA, 2004.
(14) Moskal, B. M. Scoring Rubrics Part I: What and When; ERIC Clearinghouse on Assessment and Evaluation, ED446110: College Park, MD, 2000; http://www.eric.ed.gov/PDFS/ED446110.pdf (accessed Jun 2012).
(15) Moskal, B. M. Scoring Rubrics Part II: How?; ERIC Clearinghouse on Assessment and Evaluation, ED446111: College Park, MD, 2000; http://www.eric.ed.gov/PDFS/ED446111.pdf (accessed Jun 2012).
(16) Oliver-Hoyo, M. T. Designing a Written Assignment To Promote the Use of Critical Thinking Skills in an Introductory Chemistry Course. J. Chem. Educ. 2003, 80, 899−903.
(17) Diefes-Dux, H. A.; Zawojewski, J. S.; Hjalmarson, M. A. Using Educational Research in the Design of Evaluation Tools for Open-Ended Problems. Int. J. Eng. Educ. 2010, 26, 807−819.
(18) Adamchik, C. F., Jr. The Design and Assessment of Chemistry Portfolios. J. Chem. Educ. 1996, 73, 528−531.
(19) Hafner, O. C.; Hafner, P. Quantitative Analysis of the Rubric as an Assessment Tool: An Empirical Study of Student Peer-Group Rating. Int. J. Sci. Educ. 2003, 25, 1509−1528.
(20) Moni, R. W.; Beswick, E.; Moni, K. B. Using Student Feedback To Construct an Assessment Rubric for a Concept Map in Physiology. Adv. Physiol. Educ. 2005, 29, 197−203.
(21) Stoddart, T.; Abrams, R.; Gasper, E.; Canaday, D. Concept Maps as Assessment in Science Inquiry Learning: A Report of Methodology. Int. J. Sci. Educ. 2000, 22, 1221−1246.
(22) Moskal, B. M. Developing Classroom Performance Assessments and Scoring Rubrics, Part I; ERIC Clearinghouse on Assessment and Evaluation, ED481714: College Park, MD, 2003; http://www.eric.ed.gov/PDFS/ED481714.pdf (accessed Jun 2012).
(23) Moskal, B. M. Developing Classroom Performance Assessments and Scoring Rubrics, Part II; ERIC Clearinghouse on Assessment and Evaluation, ED481715: College Park, MD, 2003; http://www.eric.ed.gov/PDFS/ED481715.pdf (accessed Jun 2012).
(24) Reddy, Y. M.; Andrade, H. A Review of Rubric Use in Higher Education. Assess. Eval. High. Educ. 2010, 35, 435−448.
(25) Fay, M. E.; Grove, N. P.; Towns, M. H.; Bretz, S. L. A Rubric To Characterize Inquiry in the Undergraduate Chemistry Laboratory. Chem. Educ. Res. Pract. 2007, 8, 212−219.
(26) Buck, L. B.; Bretz, S. L.; Towns, M. H. Characterizing the Level of Inquiry in the Undergraduate Laboratory. J. Coll. Sci. Teach. 2008, 38, 52−58.
(27) Docktor, J.; Heller, K. Assessment of Student Problem Solving Processes. Phys. Educ. Res. Conf. 2009, 133−136.
(28) Nicholson, P.; Gillis, S.; Dunning, A. M. T. The Use of Scoring Rubrics To Determine Clinical Performance in the Operating Suite. Nurse Educ. Today 2009, 29 (1), 73−82.
(29) Kersting, N. B.; Givvin, K. B.; Sotelo, F. L.; Stigler, J. W. Teachers' Analyses of Classroom Video Predict Student Learning of Mathematics: Further Explorations of a Novel Measure of Teacher Knowledge. J. Teacher Educ. 2010, 61, 172−181.
(30) Moog, R. S.; Farrell, J. J. Chemistry: A Guided Inquiry, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, 2006.
