
Design, Implementation, and Evaluation of a Flipped Format General Chemistry Course

Gabriela C. Weaver*,† and Hannah G. Sturtevant‡

†Department of Chemistry and Institute for Teaching Excellence and Faculty Development, University of Massachusetts, Amherst, Massachusetts 01003, United States
‡Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States

ABSTRACT: Research has consistently shown that active problem-solving in a collaborative environment supports more effective learning than the traditional lecture approach. In this study, a flipped classroom format was implemented and evaluated in the chemistry majors' sequence at Purdue University over a period of three years. What was formerly lecture material was presented online outside of class time using videos of PowerPoint lectures with a voice-over. In-class time was devoted to collaborative problem solving based on the Student Centered Active Learning Environment with Upside-Down Pedagogies (SCALE-UP) approach, which has been shown to increase student exam scores and passing rates. The purpose of this proof-of-principle study was to look at quantitative measures of student performance in the flipped versus the traditional lecture approach of this chemistry course. Three years of results using ACS standardized exams showed that students' ACS general chemistry exam scores in the flipped class were significantly higher by almost one standard deviation when compared with the students' previous scores in the traditional class. Analysis of open-ended surveys given to students at the end of the course showed that the majority of students responded positively to the format, listing a variety of benefits, drawbacks, and ways to improve that can inform future implementations.

KEYWORDS: First-Year Undergraduate/General, Chemical Education Research, Distance Learning/Self Instruction, Collaborative/Cooperative Learning, Internet/Web-based Learning, Testing/Assessment, Problem Solving/Decision Making

FEATURE: Chemical Education Research



INTRODUCTION

Maximizing the Impact of In-Class Time

STEM (science, technology, engineering, and mathematics) education research studies have repeatedly concluded that lecture-based approaches to teaching are not as effective as student-centered methods (such as collaborative problem solving, inquiry, and project-based learning) at helping students develop deep and lasting understanding of the subject matter.1,2 Yet, in higher education, introductory chemistry courses continue to be most commonly taught in a lecture format. In such a setting, it is rare for students to have a great deal of interaction with the instructor or with more than one or two of their peers.

A foundational premise for our work is that student-centered learning environments enhance students' motivation and engagement. In turn, increased levels of student motivation and engagement correlate with increased attainment of discipline-specific learning outcomes.3−6 The premise is rooted in self-determination theory,7,8 a theory of motivation.9 Self-determination theory has been used extensively to describe and measure student motivation for undertaking learning activities and to explain the success or failure of learning environments in catalyzing student learning.10−13 Student-centered teaching approaches engage learners in constructing knowledge through activities such as problem-based learning and collaborative problem solving. These types of practices enhance student success, result in more effective learning than traditional methods, and are not discipline dependent.14,15 Because student-centered teaching approaches are characteristic of the flipped environment, we hypothesize that students in this environment will perform better on objective measures of learning.

“Flipped” Instruction

We define flipping the classroom as moving lecture material outside of class (with a mechanism for pacing) so that collaborative problem solving and other active learning activities may be done in class. The course design reported here is one way to conduct a flipped classroom; as Bergmann and Sams16 point out, there is no one way to flip a class. According to Tucker,17 the flipped classroom uses "technology to invert the traditional teaching environment by delivering lectures online as homework and opening up the class period for interactive learning." Traditionally cited as the first reference to the inverted classroom, Lage et al.18 employed the format of lecturing outside class via technology so that teaching directed toward multiple learning styles could be accomplished in class. They reported that students generally preferred the inverted format and suggested that the format may attract female students, which could increase diversity in the field.


Also cited as a first use of the flipped classroom, Baker19 similarly inverted the traditional class by conducting lecture outside class and active learning activities in class. Baker found that students felt they experienced increased collaboration, personal attention, and engagement in critical thinking. The term flipped classroom did not become well known until its popularization by the Khan Academy20 and, separately, by Jonathan Bergmann and Aaron Sams of Woodland Park High School in Colorado.16

More recently, "blended courses" have been described that use some online elements of flipped teaching with additional in-class activities. Blended courses, broadly, are classes employing a combination of online (or computer-mediated) and face-to-face instruction.21,22 In contrast, flipped courses specifically move lectures online so that more student-centered activities and collaborative work can be done in class; blended is a much broader term for, arguably, any combination of online and face-to-face activity. One of the components blended courses may incorporate is asynchronous instruction, which has been suggested as an answer to the problem of effectively teaching large-enrollment courses.23 In one of the first incorporations of asynchronous computer work as part of a regular face-to-face English class, students were required to e-mail an entry to the class electronic conference once a week and then discussed the entry material in class.24 Qualitative analysis of student journal entries showed that students felt a greater sense of community and trusted and valued one another. In addition, the professor felt better prepared for class, knowing what the students already thought.

In chemistry education, a few studies on the flipped classroom at the university level have been released recently. Smith25 looked at student attitudes toward various flipped elements and found that the majority of students felt the flipped format was more effective for them than a traditional course despite the burden of extra work. Another study26 examined the effects of employing the flipped approach in undergraduate laboratory courses; students in the flipped lab sections appeared to develop a deeper understanding of the theories behind laboratory experiments, to grasp complicated procedures, and to be less anxious about lab. More recently, Christiansen27 described an inverted organic chemistry course with a small sample size in which students in the inverted group showed no statistically significant differences from the traditional group. Most students did prefer the inverted course by the end of the year, after taking both types of courses, though there was an adjustment period. Flynn28 also released a study of four flipped courses in organic chemistry and spectroscopy (three large and one small), finding that student grades increased and that withdrawal and failure rates decreased by statistically significant margins. In surveys, students appeared satisfied overall with the format, with the few suggestions for improvement relating to the cramped desks in the large lecture classes and to video quality. The vast majority of students responded positively to the format, finding the in-class problems, online lecture content, and preclass tests helpful.

In-Class Collaborative Work

Certainly common to all definitions of "flipped" classes is the idea that students engage in work outside of class using online resources, allowing the in-class interactions to change from an information "transmission" model to a more engaged model of learning. The in-class approach must therefore change in some way to a student-centered style. One approach for student-centered teaching is the Student-Centered Activities for Large Enrollment Undergraduate Programs (SCALE-UP) method. Designed for classes of up to about 100 students, SCALE-UP arranges students into three groups of three at large circular tables for collaborative, in-class work.29 The method has been evaluated extensively, with SCALE-UP students scoring significantly better than students in traditional lecture classes in nearly every study.30 Over a period of five years, with data from over 16,000 North Carolina State University (NCSU) students, traditional students were three times more likely to fail the course than SCALE-UP students.31 On the Force Concept Inventory (FCI) for physics, SCALE-UP students outperformed traditionally taught students, with gain scores between 0.35 and 0.55 compared to gain scores of 0.00 to 0.22.32

SCALE-UP has also been applied to the chemistry classroom, referred to as cAcL2, with positive results. Oliver-Hoyo and Allen33 describe the cAcL2 program at NCSU as using the SCALE-UP classroom environment with SCALE-UP management techniques, fully integrating lecture and lab. When a class section taught in the traditional lecture format was compared alongside a cAcL2 section, the bottom 25% of the cAcL2 population performed statistically significantly better in the class than their traditionally taught counterparts on three out of four exams, and the cAcL2 students scored statistically significantly higher on two of the four exams. In an attitudinal study of the classes,34 77% of the students in the cAcL2 class showed positive attitude gains in a pre-post survey analysis relative to the traditional class students.

Because of the extensive research on this approach to collaborative learning, we based the in-class portion of our flipped course on the SCALE-UP model. We specifically seek to answer the following questions in this proof-of-principle study:
(1) In what ways, if at all, are students' scores on the standardized ACS exam (paired conceptual-algorithmic form) affected when comparing a flipped format with the traditional lecture format?
(2) What are students' perceptions of the benefits, drawbacks, and ways to improve the flipped approach?
    (a) Do students believe the delivery helps their learning and, if so, which components of it?
    (b) What course components do students consistently rank highest?



METHODOLOGY

Population Studied

The study was carried out over three academic years in a two-semester sequence, first-year chemistry course for chemistry majors. Students required to take general chemistry for other technical majors generally enroll in a different course. The course consists of a single section for the "lecture" component and several laboratory sections of 20−24 students each. The lecture and laboratory components are taken concurrently under the same course number. Students in this course sequence have the general characteristics summarized in Table 1, with a near-equal mix of males and females, an average math SAT score of 643, and a 68% majority identifying as white.

The flipped approach was carried out in the spring semesters of 2011, 2012, and 2013, with each of the immediately preceding fall semesters taught in the traditional lecture format by a professor experienced in undergraduate teaching (Prof X), who regularly receives high student evaluation scores.




Table 1. Demographics of Population Studied over Three Years (N = 164)

  Gender                Male: 49%; Female: 51%
  Age                   Average: 19.6; Range: 19−25
  Race/Ethnicity        White: 68%; Asian: 17%; Hispanic/Latino: 4%; Black: 1%; Other/Mixed: 3%
  Entering Math SAT     Average: 643; Std Deviation: 75; Range: 480−800
  % in Honors Program   2%

The instructor of the spring 2011 and 2013 flipped course (Prof A) was the designer of the flipped format and the developer of the online materials used (described below). In spring 2012, the course was taught by a different professor (Prof B), using the same online materials developed for the 2011 implementation. Prof B, a distinguished professor, was inexperienced at teaching with a flipped or collaborative approach or with online materials but had expressed an interest in learning these methods. (Prof A was on leave during spring 2012; Prof B therefore taught the course independently.)

Historically, the majority of students who enroll in the first semester of the course continue on to the second semester in the same academic year. A small number of students, however, leave the sequence at the semester break each year, and some students enter the second semester of the sequence from other courses. Box 1 describes the design of the study, including the total population of each course and the "paired" population, representing only those students who were enrolled in both semesters of a given academic year. In this manuscript, "CHM 1-T" will be used to describe the first semester of the course, which was taught using the traditional (T) format, and "CHM 2-F" for the second semester, when the flipped (F) approach was used.

Flipped Course Format

The flipped course consisted of the same individual components found in the traditional course (including lab, homework, recitation, and exams), but the face-to-face lecture component was replaced by a flipped approach comprising in-class sessions, online lecture materials, and associated online quizzes. All other components of the course (laboratory, recitation sessions with teaching assistants, written weekly homework, and in-class exams) remained unchanged from the traditional format. The written weekly homework in both the traditional and flipped courses consisted of problems selected from the end of each chapter of the textbook (same book, same length of assignment, same variety of problem types) and was graded using the same policy both semesters.

The traditional course in the fall semesters met three times per week in a typical auditorium-style classroom, with class meetings each lasting 50 min. The room was equipped with a chemical demonstration bench at the front, as well as a blackboard, lectern computer, overhead projector, and screen projector. The instructor utilized all of these resources during his CHM 1-T lectures.

Because it was of interest to examine the effectiveness of the course design on students' performance, we controlled for the amount of time students would spend in "contact hours". Because the lecture material was delivered online outside of class time, that time was counted toward "contact hours", and we therefore met less frequently in a face-to-face setting for collaborative problem solving. The flipped CHM 2-F course was therefore initially set up to meet once every other week for in-class sessions, which were 75 min long in spring 2011. The online time was based on the known length of each video and an assumed amount of time that students would spend on the associated quizzes. Each video aligned with the sequential lecture material used when the course was taught in the traditional format in previous years. In practice, additional optional in-class sessions were offered that semester in response to midsemester feedback from students. The content and structure of those optional sessions varied, as did attendance, though attendance was generally about 20% of the enrolled students. During spring 2012 and 2013, the required in-class sessions were carried out weekly for 90 min, based in part on tracking data showing that students' online time was less than we had originally calculated.

The in-class sessions were structured and taught in a format similar to that described in the SCALE-UP literature.24−29 Students were grouped in teams of three, with three teams sitting at each round table, for a group size of nine. The learning spaces used were equipped with tables that allowed such seating, as well as large electronic whiteboards (spring 2011 and 2013) or portable "huddle boards"35 (spring 2012) available to each team to share their work with each other and with the class as a whole. Problems were developed by the instructor for students to solve collaboratively during each in-class session. The problems were designed to challenge students by providing extensions of the concepts introduced in the online videos prior to the in-class sessions, demonstrating applications of and connections among concepts.

Box 1. Design of the Study, including Number of Students, Course Format, and Professor

aN-paired represents the number of students who were the same in both semesters of the course.


As much as possible, the problems were ones that would best be solved by three students contributing different perspectives, rather than simply being very challenging problems that could be solved by a single individual given enough time or skill. The problems were modeled on materials available at the SCALE-UP site (http://scaleup.ncsu.edu/).

The online component of the flipped course consisted of videos, with an online quiz associated with each video lesson. The videos were created by the instructor of the spring 2011 course on a tablet laptop using Camtasia software, which allows multiple types of digital resources to be edited into a movie. The videos consist of a combination of PowerPoint slides with on-screen writing, videos of chemical demonstrations, simulations, and animations, all with a voice-over done by Prof A. The on-screen writing enabled the instructor to show solutions to problems and to highlight or emphasize specific aspects of the slides in real time as part of the videos. The videos did not include any footage of the professor's face; thus, they were designed for online, self-guided learning rather than simply being a video of a professor giving a lecture at a blackboard or on a document camera. Although the PowerPoint presentations upon which the videos were based contained the same number and content of slides as they would have had for the equivalent 50 min lecture, the videos were only 17−35 min long, with an average of 21 min per lesson, when presented in recorded form with no interruptions.

The videos and quizzes were available through the Blackboard course management site, which served as the central repository for all materials associated with the flipped course. The quizzes were due at specified times each week, three times per week to match the three class meetings per week that students would have had in a traditional format. (In practice, this resulted in 38 online lessons when accounting for exam days and holidays on which online lessons were not due.) The quizzes were thus designed as a pacing mechanism, with fairly straightforward content, generally at the lower levels of Bloom's taxonomy,36,37 that included both simple calculations and recall items from the video lessons. Students could earn up to 10 points per quiz, for a total of one-fourth of the points available for the course over the semester. Correct quiz responses were made available only after the due date so that students could use them as a study resource. The videos were posted before the quizzes were due but remained available through the end of the course so that students could continue to view them at any time.

There were various online communication tools available to students and the instructor in the flipped version of the course. The course Blackboard site has both a discussion board, for threaded asynchronous discussions, and a chat room, for synchronous ("live" or "real time") discussions. The chat room can be used by anyone registered in the Blackboard course (students, instructors, and teaching assistants) who is online at the same time. The discussion board saves conversations ("threads"), which consist of an initial posting about a topic and any comments posted in response. In addition, the flipped course utilized an online tool called Mixable, developed by the instructional technology unit of the university.38,39 Mixable is a discussion platform that allows users from a course to connect with any other registered users from that course. It can function as an add-in to Facebook so that users do not have to go to a separate platform to communicate once they have registered on Mixable. Users can enter text or share a variety of document types, and discussions can be separated into threads started by any user. Of course, in addition to the Blackboard and Mixable tools, e-mail between the students and instructor was available as always, as were traditional in-person, weekly office hours.

It was important to ensure that students were not spending significantly more or less time on the course than they would have with the traditional format, so that it could be taught with the same credit hour designation originally assigned to the course. Box 2 shows the balance sheet of time spent on "contact hour" activities, the activities that are related to or replace the lecture portion of the course. The data for time spent in online activities were analyzed at the end of the first year of this experiment, from information automatically logged by the system when students log in to their Blackboard course.
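To illustrate the kind of bookkeeping behind Box 2 (whose published figures are not reproduced here), the following is a minimal Python sketch. The lesson count (38), average video length (21 min), 2011 session format (8 biweekly meetings of 75 min), and semester length (15 weeks) come from the text above; the per-quiz time and the number of weekly meetings in 2012/2013 are assumptions for illustration only.

```python
# Back-of-the-envelope "contact hour" balance in the spirit of Box 2.
# Values marked "assumed" are illustrative, not the published figures.

WEEKS = 15                                    # semester length (Box 2 footnote)

# Traditional format: three 50 min lectures per week.
traditional_min = 3 * 50 * WEEKS              # = 2250 min

# Flipped format: online lessons plus face-to-face problem sessions.
lessons = 38                                  # online lessons per semester (from text)
video_min = 21                                # average recorded lesson length (from text)
quiz_min = 15                                 # time per online quiz (assumed)

meetings_2011, length_2011 = 8, 75            # biweekly sessions, spring 2011 (from text)
meetings_weekly, length_weekly = 12, 90       # weekly 90 min sessions, 2012/2013 (count assumed)

online_min = lessons * (video_min + quiz_min)                   # = 1368 min
flipped_2011 = online_min + meetings_2011 * length_2011         # = 1968 min
flipped_weekly = online_min + meetings_weekly * length_weekly   # = 2448 min

print(traditional_min, flipped_2011, flipped_weekly)
```

Under these assumptions, the biweekly 2011 format falls short of the traditional total, which is consistent with the decision, described above, to move to weekly sessions once tracking data showed students' online time was lower than originally calculated.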

Instruments and Analyses Used

In order to compare student understanding of course content, we utilized the ACS Paired Exams for First Semester (exam #GC05PQF) and Second Semester (exam #GC07PQS) General Chemistry.40,41 Using these exams allowed us to measure student performance against national norms from data made available for each exam by the ACS Exams Institute. The exams are designed to be given in a 55 min time frame; therefore, they were administered at the end of each semester during the first 55 min of the 2 h final exam period. Upon collection of the ACS exam, students were given the instructor-developed exam, which included calculation and short-answer questions written by the instructor each semester. Only the ACS exam scores were used in the data analyses described here, though both exam components were used in the calculation of students' final grades each semester. The regular course exams developed by each of the instructors were not used as a data source in this study.

We also administered a face-validated42 survey that included both Likert-scale and short-answer questions to probe students' opinions about the course format and its impact on their learning. The survey was administered via Qualtrics,43 an online survey tool, in the middle and at the end of each spring semester. Students received a small number of points for completing the survey (about 1/4 of a percent of the total course grade), with an alternative method to earn the same points if they chose not to participate. All data collection was approved by the Purdue University Human Subjects Review Board (IRB).



DATA AND RESULTS

For the percentile analysis by year, 1-tailed paired t tests were used. For the overall comparison of all traditional course students to all flipped course students, 1-tailed independent-samples t tests were used.

Overall ACS Exam Score

The raw score on each ACS exam has a maximum possible value of 40. Table 2 shows the average raw ACS exam score for each semester. The analysis to determine whether student performance differs between the two formats cannot simply utilize the distributions of raw scores, because the two semesters have different content and a different ACS exam. However, each exam has a published distribution of scores from a nationally representative sample. Discussions with the ACS Exams Institute,44 which develops, administers, and analyzes the exams, indicated that the scores shared with the Institute are provided voluntarily; the Institute does not publish score distributions until it has a sufficiently large sample to be representative.


Box 2. Balance Sheet Comparing Time Spent on Lecture-Related "Contact Hour" Activities

aThe 8 classes are in a 15 week semester with no in-class meetings during exam weeks or the Martin Luther King, Jr., holiday.

Table 2. Mean Score on ACS Exams for Each Semester, Expressed as Raw Score and Percentile on National Scale

                            AY 10/11                 AY 11/12                 AY 12/13
                       CHM 1-Ta   CHM 2-Fa      CHM 1-Ta   CHM 2-Fa      CHM 1-Ta   CHM 2-Fa
N (paired)b                  46                       60                       58
Raw Score,c           31.9 (5.5)  29.4 (5.7)   31.5 (5.0)  30.1 (6.5)   31.4 (6.0)  29.6 (6.4)
  Mean (S.D.)
Percentile               84th       87th          82nd       88th          81st       87th

aT = Traditional; F = Flipped. bN-paired represents the number of unique students who were enrolled in both semesters of the course. cRaw score out of 40 points.

The ACS Exams Institute has looked at its data and believes that, in general, the data shared with it are skewed toward higher scores. The Institute compared publicly available 75th-percentile entering math ACT scores for the schools that returned data on the two exams used in this paper, finding 25.54 for the first-semester test and 25.66 for the second-semester test, values estimated to not be statistically different. By relating the raw score data back to those national distributions for each ACS exam, separately, the traditional versus flipped course performance can be compared.

We compared the percentile ranks for each year separately, between the traditional (fall) and flipped (spring) semester. The percentile corresponds to the average score in relation to the published percentile curves of the specific exam given in that particular semester. Although the raw score appears to drop slightly between first and second semester, the percentile, in fact, rises because the two exams have different national distributions (Table 2). In order to carry out a statistical comparison based on percentile scores, we first obtained all (deidentified) raw data from the ACS Exams Institute for the two exams we were using and calculated the percentiles of the students in our study against these scores. The ACS exam samples we used consisted of 3073 and 3557 respondents for the first- and second-semester exams, respectively. We then compared our calculated percentile scores by year using one-tailed, paired t tests. Table 3 shows the statistical comparison across percentiles; the increases were statistically significant for all three years, with small effect sizes (Cohen's d).

Comparison on the basis of percentile scores makes it difficult to conceptualize how big a change really is or whether the changes are being measured on equivalent scales. Therefore, we also converted the raw scores to standard scores relative to the mean of each exam (Table 4). This allows all scores to be represented on the same scale and compared to one another on the basis of the number of standard deviations. The standard score is obtained by taking the difference between a given score (x_i, from respondent i in the test sample) and the mean used for comparison (μ_o, from the national sample) and dividing by the standard deviation of the comparison distribution (σ_o):

    z_i = (x_i − μ_o) / σ_o

Standard scores then range from −y to +y, with y representing the number of standard deviations above or below the mean of the comparison distribution.
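The percentile and standard-score computations above are straightforward to express in code. The following is a minimal Python sketch using synthetic stand-in arrays (the real inputs were the deidentified national raw scores and the paired course scores). The one-tailed paired t test, the Cohen's d convention (mean difference over the standard deviation of the differences), and the across-year ANOVA call are illustrative choices, not the authors' exact analysis scripts.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real data (raw ACS scores on a 0-40 scale):
national = np.clip(rng.normal(24, 8, 3073), 0, 40)   # national sample for one exam
fall = np.clip(rng.normal(31.5, 5.0, 46), 0, 40)     # CHM 1-T raw scores (paired students)
spring = np.clip(rng.normal(30.1, 6.5, 46), 0, 40)   # CHM 2-F raw scores (paired students)

def percentile_vs_national(scores, national_sample):
    """Percentile rank of each student against a national raw-score sample."""
    return 100.0 * np.mean(national_sample[:, None] < scores[None, :], axis=0)

def standard_score(scores, national_sample):
    """Standard score z_i = (x_i - mu_o) / sigma_o against the national sample."""
    return (scores - national_sample.mean()) / national_sample.std(ddof=1)

# In the study, each semester was compared against its own exam's national
# distribution; a single synthetic sample is reused here for brevity.
fall_pct = percentile_vs_national(fall, national)
spring_pct = percentile_vs_national(spring, national)

# One-tailed paired t test (flipped > traditional); scipy >= 1.6 API.
t, p = stats.ttest_rel(spring_pct, fall_pct, alternative="greater")

# Cohen's d for paired data: mean difference over the SD of the differences.
diff = spring_pct - fall_pct
d = diff.mean() / diff.std(ddof=1)

print(f"t = {t:.2f}, one-tailed p = {p:.4f}, Cohen's d = {d:.2f}")

# The across-year comparison within one format (reported below as ANOVA)
# would follow the same pattern, e.g.:
# stats.f_oneway(z_flipped_2011, z_flipped_2012, z_flipped_2013)
```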


Table 3. Percentile Scores Relative to ACS Exams Institute Data for the Paired First and Second Semester Exams

                            AY 10/11                  AY 11/12                  AY 12/13
                       CHM 1-Ta    CHM 2-Fa      CHM 1-Ta    CHM 2-Fa      CHM 1-Ta    CHM 2-Fa
N (paired)b                  46                        60                        58
Percentile (S.D.)    80.0 (21.8)  84.6 (16.5)  78.3 (23.0)  84.8 (17.4)  78.9 (19.9)  83.8 (19.3)
Sig.                       p < 0.01                  p < 0.01                  p < 0.01
Effect Size                  0.27                      0.24                      0.26

aT = Traditional; F = Flipped. bN-paired represents the number of unique students who were enrolled in both semesters of the course.

Table 4. Standard Score Relative to ACS Exams Institute Data for the Paired First and Second Semester Exams

                            AY 10/11                  AY 11/12                  AY 12/13
                       CHM 1-Ta    CHM 2-Fa      CHM 1-Ta    CHM 2-Fa      CHM 1-Ta    CHM 2-Fa
N (paired)b                  46                        60                        58
Standard Score (S.D.) 0.99 (0.78)  1.28 (0.84)  0.94 (0.72)  1.34 (0.92)  0.93 (0.86)  1.31 (0.94)
Sig.                       p < 0.001                 p < 0.001                 p < 0.001
Effect Size                  0.43                      0.41                      0.42

aT = Traditional; F = Flipped. bN-paired represents the number of unique students who were enrolled in both semesters of the course.

The standardized scores for the flipped approach were compared to those of the traditional course using a paired t test within each year, including data only for the students who were in the course both semesters. Within each academic year, the standardized score increase between the traditional format and the flipped format is statistically significant (p < 0.001), with medium effect sizes (Table 4). Across the three years, there are no statistically significant differences among the three scores for the traditional course format or among the three scores for the flipped course format (Figure 1); these within-format scores were compared using ANOVA.

Figure 1. Overall standard score on the ACS exam for the traditional and flipped formats for each of the three years. Standard scores of flipped versus traditional within each year are statistically different at the p < 0.001 level. Values within each format of the course, across three years, are not statistically different.

Item Response Theory Analysis

Because 164 students were in both the fall and spring course over the three implementations, it was possible to perform item response theory (IRT) analyses on the data. In IRT,45 the invariance property asserts that the ability of the person is not dependent on the particular set of items administered, and the difficulty of the items is not dependent on the sample of people responding to the items. Because of this invariance property, ability estimates for one group of students can be estimated for different tests and are automatically on the same scale. This also means that the comparisons no longer depend on the national norm tests and the assumptions of normality that must be made about those distributions. A variety of models can be used to model the item responses, and the data requirements of the models vary; in most cases, the required sample sizes are quite large (500 or more). Given the sample size available in this study, the one-parameter logistic model was chosen. This model provides the probability of a correct response on item i for a student of ability level θ, where b_i represents the difficulty level of item i:

    P_i(θ) = e^(θ − b_i) / (1 + e^(θ − b_i))

Because this process required students to have responses on all items on both the fall and spring tests, this analysis was conducted only for the students who took both semesters of the course. For these students, IRT was used to estimate student scores on the fall and spring exams. Because there were two sets of items that the students responded to (fall and spring), it was necessary to estimate an ability parameter for each set of items. To ensure these ability estimates were placed on the same scale, the items for both tests were estimated at the same time, which automatically placed the resulting estimates on a common scale. The scale used for the ability parameters is set to have a mean of zero and a standard deviation of one, so the scores can be interpreted like standardized scores. The mean score for the fall test was 1.10, with a standard deviation of 0.75, whereas the mean score for the spring test was 1.45, with a standard deviation of 0.75. The difference of 0.35 between the two means corresponds to approximately half of a standard deviation; the scores are statistically different at the p < 0.01 level, with a moderate effect size of 0.47. This result supports the results obtained in the analysis of the standard scores. For this reason, and given the alignment between results obtained with standard and percentile scores, additional analyses of our data were carried out with the standard score values.
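A concurrent-calibration sketch of the one-parameter logistic (Rasch) analysis described above is shown below, in Python with NumPy only. This is a minimal joint maximum-likelihood illustration, not the authors' estimation procedure (which is not specified beyond the model choice): the column-wise stacking of fall and spring items into one response matrix mirrors the "estimated at the same time" step, the response data are synthetic, and the simple gradient and Newton updates are assumptions.

```python
import numpy as np

def fit_rasch(responses, n_iter=2000, lr=0.01):
    """Joint maximum-likelihood fit of the 1PL model
    P_i(theta) = exp(theta - b_i) / (1 + exp(theta - b_i)).
    responses: (n_students, n_items) 0/1 matrix; fall and spring items are
    stacked column-wise so both tests are calibrated at the same time."""
    n, m = responses.shape
    theta = np.zeros(n)                    # person abilities
    b = np.zeros(m)                        # item difficulties
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = responses - p              # observed minus expected
        theta += lr * resid.sum(axis=1)    # log-likelihood gradient in theta
        b -= lr * resid.sum(axis=0)        # gradient in b has the opposite sign
        theta -= theta.mean()              # fix the location of the scale
    return theta, b

def ability_for_items(x, b, n_iter=25):
    """Newton-step MLE of one student's ability on a fixed item subset.
    (Diverges for all-correct/all-wrong patterns; a real analysis would
    handle those cases explicitly.)"""
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        theta += np.sum(x - p) / max(np.sum(p * (1 - p)), 1e-9)
    return theta

# Toy usage: 164 students, 40 fall items + 40 spring items.
rng = np.random.default_rng(1)
resp = (rng.random((164, 80)) < 0.7).astype(float)
theta, b = fit_rasch(resp)
fall_ability = [ability_for_items(row[:40], b[:40]) for row in resp]
spring_ability = [ability_for_items(row[40:], b[40:]) for row in resp]
```

Because the difficulties are calibrated jointly, the two per-test ability estimates land on one common scale, and the fall-versus-spring difference can then be tested with the same paired comparison used for the standard scores.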

Conceptual vs Algorithmic Scores

The specific ACS exams used in this study consist of paired questions for each concept, with one being a conceptual question and the other an algorithmic question.


Table 5. Overall Average Conceptual and Algorithmic Standard Scores, with All Years Combined (N = 164), for Flipped versus Traditional Course Formats

              Traditional Format   Flipped Format   Significance   Effect Size
Conceptual        0.79 (0.86)        1.20 (1.01)      p < 0.001        0.44
Algorithmic       0.97 (0.74)        1.18 (0.81)      p < 0.001        0.27

Significance by Year: Conceptual: 2011 p < …, 2012 p < …, 2013 p < …; Algorithmic: 2011 p < …, 2012 p < …