Research: Science and Education

Using the First Exam for Student Placement in Beginning Chemistry Courses

Pamela Mills and William Sweeney*
Department of Chemistry, Hunter College, New York, NY 10021
Department of Urban Education, Graduate Center of the City University of New York, New York, NY 10065
*[email protected]

Sarah M. Bonner
Department of Educational Foundations and Counseling Programs, Hunter College, New York, NY 10065

Understanding student success and failure in introductory chemistry courses is an active area of research (1–3). Some research efforts attempt to identify causal relations, such as between a student's cognitive ability or skill and course success (4–7). As others have pointed out, causal relationships are difficult to ascertain, and they can depend upon particular theoretical frameworks (3). Identifying correlations between predictor variables and student outcomes without reference to causality is a complementary approach with pragmatic outcomes: while such correlations may suggest causal associations, causality need not be assumed.

Correlations between myriad predictor variables and outcome variables, usually course grades, have been studied, including ACT or SAT math scores (8, 9), demographic information (10), and prior chemistry (11) or mathematics knowledge. In chemistry departments with significant numbers of underprepared students, poor reading ability was the second most commonly cited weakness, after math preparation (7). A recent study correlated several attributes of high school chemistry classes, prior course work, and demographic factors with college chemistry grades (1, 2); a large number of students at 12 different institutions of higher education were surveyed. Specific pedagogical strategies in high school chemistry, as well as the level of mathematics completed by the student, correlated positively with course outcomes. In all of these studies, the correlations could account for, at most, 40% of the variability.

Effective use of diagnostic examinations requires correlations between diagnostic test scores and course outcomes. These exams are routinely used for placement and to provide a validated external measure of student performance. A wide range of placement tests is currently used for general chemistry, but perhaps the most common are the two distributed by the American Chemical Society, namely the California Chemistry Diagnostic Test (12, 13) and the Toledo Chemistry Placement Examination (14). In a California system-wide validation study (13), a comparison of the California Chemistry Diagnostic Test with final chemistry grades showed a correlation coefficient of 0.42, meaning that less than 20% of the variability was captured by the test. Similar results were found for the Toledo exam (14). A recent test designed to improve the identification of students at risk of failing general chemistry is the Student Pre-Semester Assessment (SPSA) (10). The SPSA was more effective at predicting failure than any other existing diagnostic test, correctly identifying potential failures 60% of the time.


In institutions with many at-risk students, the diagnostic test is often used to place students into a preparatory or remedial chemistry course (7). The effectiveness of the remedial course depends in part on the accuracy of the diagnostic test. Bentley and Gellene (15), in an extensive six-year study of the preparatory course used at Texas Tech University, found that their placement remediation program provided little or no significant academic benefit, and they raised inaccurate student placement as a possible explanation for its ineffectiveness.

We describe here a simple diagnostic protocol that correlates highly with course outcome. In our implementation, at-risk students are identified and placed in our preparatory course. While we do not currently use this diagnostic for student advising (student placement is mandatory), we also report the results of a logistic regression analysis to demonstrate its potential utility for student advisors. The correlation between first exam performance and final course outcome is robust, and it holds for a number of other New York commuter colleges and for a large Midwestern university. These results are not meant to state or imply causality for student failure; the article focuses solely on the prediction of student failure, not on its underlying reasons.

The Setting

City University of New York (CUNY) is the nation's largest urban public university, with 230,000 students and 23 institutions, including eleven four-year colleges, six community colleges, and a range of professional schools. Hunter College, in common with other CUNY four-year colleges, is a moderately large urban commuter institution. The student body of approximately 20,000 is highly diverse, with a high proportion of students from immigrant families. Over 40 languages are spoken by the entering first-year students (fall 2006), who represent 71 countries. Most first-year students were previously enrolled in the New York City public schools (67%), but a significant number come from parochial schools and from elsewhere in New York State (24%). Hunter College is moderately selective: the 2006 first-year class had an average SAT of 1060 and a minimum high school average of 85%. As is typical of urban commuter colleges, a large proportion of our full-time students hold part-time or full-time jobs. Slightly fewer than half of the students transferred into the college.

General chemistry sections are taught by different faculty who employ similar teaching styles, with some variability depending upon the size of the section. Sections range in size from 40 to 300 students; the typical section is 88 students. The course meets four hours per week, with one hour devoted to recitation or workshop. Most instructors lecture for three hours and follow with a recitation period; some have substituted graded, group workshops for the recitation. Some instructors have incorporated personal response systems in their lectures, some use PowerPoint presentations, and some use the chalkboard. A common text and a common scope and sequence are used in all sections. Historically, approximately 60% of the students enrolled in general chemistry have successfully completed the course with a C or better. The exams are not departmental and vary from instructor to instructor; most include both multiple-choice and problem-solving questions, with the balance depending upon the instructor and class size.

Over the past several years we attempted to identify students at risk of failing general chemistry using a range of diagnostic tests. None of the tests we administered was sufficiently predictive for our student population. We therefore sought another methodology and tested the hypothesis that the first exam is a powerful predictor of overall course performance.

Correlation between the First Exam and the Final Course Performance

The first exam is typically given at the end of the section on stoichiometry, after about four weeks of class in the fifteen-week semester. We studied seven sections taught by five faculty members between fall 2000 and fall 2005. To standardize data from different sections and different instructors, we defined student performance on the first exam as the number of standard deviations above or below the mean in each section. Similarly, the outcome variable, the student's final course performance, was defined in terms of standard deviations rather than a course grade in each section. The correlation between the first exam and course outcome was determined for each section; in addition, the data were pooled and an overall correlation was computed, with students' performance standardized separately for each course. Students who did not take the final examination were excluded from the data set. Thus students who officially withdrew from the class, and those who simply stopped attending, are not part of the measured correlation.

The standardized course performance scores reflect achievement in general chemistry and have the advantage over course grades of lying on an approximately interval-level scale. They can therefore be used to assess the general strength of the relationship between performance on the first exam and overall performance in the first semester of general chemistry. The correlation coefficient between the first exam and overall course performance for data pooled from all seven sections was 0.81 (n = 667). In the individual classes the correlations were 0.82, 0.84, 0.80, 0.78, 0.83, 0.82, and 0.80. These correlation coefficients are almost twice as high as those for the commercially available exams (13, 14), and they are relatively consistent among instructors even though the seven classes were taught by five different instructors.¹
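As an illustration of this standardization, the following minimal Python sketch computes within-section z-scores and the pooled correlation. The section sizes, point scales, column names, and synthetic numbers are assumptions for illustration, not the study's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: three sections of different sizes and point scales.
frames = []
for section, (n, scale) in enumerate([(90, 100), (60, 150), (120, 500)]):
    ability = rng.normal(size=n)  # latent achievement driving both scores
    exam1 = scale * (0.5 + 0.1 * (ability + 0.7 * rng.normal(size=n)))
    total = scale * (2.5 + 0.5 * (ability + 0.7 * rng.normal(size=n)))
    frames.append(pd.DataFrame({"section": section, "exam1": exam1,
                                "course_total": total}))
df = pd.concat(frames, ignore_index=True)

# Standardize each variable within its own section (z-scores), then pool.
for col in ("exam1", "course_total"):
    df["z_" + col] = df.groupby("section")[col].transform(
        lambda s: (s - s.mean()) / s.std())

pooled_r = df["z_exam1"].corr(df["z_course_total"])  # pooled Pearson r
print(f"pooled r = {pooled_r:.2f}")
```

Standardizing within sections before pooling removes differences in exam difficulty and grading scale across instructors, which is what makes the pooled correlation meaningful.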

Figure 1. Scatter plot showing the correlation between the first exam and final course performance (n = 667). To examine the influence of the first exam point score on the correlation between first exam and final course performance, the score on the first exam was subtracted from the final course total; these point totals were renormalized and plotted vs the normalized first exam results. Axes: "Exam 1 Score Compared to Mean" vs "Course Performance (Exam 1 Removed) Compared to Mean", both in standard deviations; R² = 0.533.

Part of the strong correlation between first exam and overall course performance is inherent—the score on the first exam is part of what determines the overall course performance. However, if the first exam is eliminated from the final course score, the correlation coefficient is still 0.73 (n = 667), and the median correlation among the individual classes is also 0.73. A scatter plot with the influence of the first exam removed from the final course performance is shown in Figure 1. This demonstrates that the strong correlation between first exam and final course performance is real and not simply a direct influence of the first exam score on the overall course performance.

It should be emphasized that the results described here come from a range of general chemistry sections and instructors. No effort was made to standardize the first exam, and over time two different textbooks were used. Even the style of test was not consistent: some instructors included multiple-choice questions in their examinations, while others did not, and one instructor employed an oral exam in place of the midterm. Grading practices also varied significantly among instructors, with the final exam counting between 30% and 70% of the final grade, depending on the instructor. Furthermore, section size varied from 42 to 236 students. The only common variable among sections was the content of the material covered; all first exams included the unit on basic stoichiometry, limiting reactants, and percent yield.

High correlations among multiple measures of the same construct (such as achievement in chemistry) are a common phenomenon, even when instructors use varied assessment methods and assess different aspects of the content domain. Our study demonstrates a way to capitalize on the correlations among repeated measures by using early exam results for advisement or placement.
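A minimal sketch of the exam-removal check behind Figure 1, using hypothetical point totals (the point values and weights are illustrative assumptions, not the study's grading schemes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical point totals: exam 1 contributes directly to the course total.
exam1 = rng.normal(70, 12, size=n)
other_points = 2.8 * exam1 + rng.normal(0, 45, size=n) + 120
course_total = exam1 + other_points

def z(a):
    """Standardize to mean 0, SD 1 (the renormalization step)."""
    return (a - a.mean()) / a.std()

r_full = np.corrcoef(z(exam1), z(course_total))[0, 1]
# Subtract exam 1's points from the total, renormalize, correlate again.
r_removed = np.corrcoef(z(exam1), z(course_total - exam1))[0, 1]
print(f"r with exam 1 included: {r_full:.2f}")
print(f"r with exam 1 removed:  {r_removed:.2f}")
```

The drop from r_full to r_removed isolates how much of the observed correlation is a bookkeeping artifact of the first exam counting toward the total, as opposed to a genuine association with later performance.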



Implementation: Use of the First Exam as a Diagnostic Tool

In fall 2002, we used the first exam in one section of first-semester general chemistry² as a diagnostic placement tool. At the beginning of the semester students were told that the first exam would serve as a diagnostic exam: students who scored one or more standard deviations below the mean would be required to transfer to a newly created preparatory chemistry course. The preparatory chemistry course carries the same number of credits, and both courses were scheduled to meet at the same time. (Our students have little flexibility in their schedules and are often receiving financial aid; they must therefore be able to move seamlessly to the preparatory course without loss of credit and without a change in schedule.) Students who were enrolled in the general chemistry laboratory remained in the laboratory even if they were transferred to the preparatory chemistry course. Working with the registrar's office, we transferred students one month into the semester; their transcripts reflected enrollment in the preparatory course, not the regular general chemistry course. Students who were transferred into the preparatory course then began anew—the first exam grade in general chemistry did not count toward their grade in the preparatory course. For those students remaining in general chemistry, the first exam functioned as a typical first exam.

Our rationale for requiring students to transfer into the preparatory course was based on two factors. First, in the past we have run an alternative, slower version of general chemistry for students who self-select based on their experience in the first two weeks of general chemistry. Most of the students who self-selected were post-baccalaureate students who had been away from school for several or more years, a particularly successful and self-sufficient cadre in our experience. Rarely did undergraduates choose to transfer, regardless of their performance on an early quiz. Similarly, in our current situation undergraduates in the linked courses who perform poorly on the first exam are often extremely resistant to transferring into the preparatory chemistry course. Thus it is our experience that many students at our institution are reluctant to "slow down" even when confronted with likely failure. While a forced transfer may seem harsh, the general benefit to transferred students differs little from the benefits for students placed in preparatory courses on the basis of traditional pre-semester diagnostic tests, provided that the forced transfer decision reflects a generally accurate prediction of failure without such a transfer.

The accuracy of our decision to transfer students who scored at or below one standard deviation below the class mean on the first exam can be assessed from the complete pool of data from students in the seven sections taught between fall 2000 and fall 2005, where, owing to scheduling constraints, the preparatory course was not offered. This data set offers a unique opportunity to test the accuracy of predicting failure from the first exam score, since students with low first exam scores who would have been counseled out had the preparatory chemistry course been offered remained in the course and have known course outcomes.

Table 1. Comparison of Prediction of Failure to the Actual Binary Outcome

Predicted Outcome   Actually Unsuccessful   Actually Successful   Total
Failure             98 (14.2%)              12 (1.7%)             110 (15.9%)
Success             160 (23.1%)             422 (61.0%)           582 (84.1%)
Total               258 (37.3%)             434 (62.7%)           692 (100%)

Note: Percentages are of the total sample. A successful outcome was defined as a final grade of C or better. A cutscore of –1.0 SD on the first exam was used. Official and unofficial withdrawals were included as unsuccessful outcomes.

The data set is larger (n = 692) than the one discussed in the previous section because it includes students who officially or unofficially withdrew from the course. We assessed the accuracy of our prediction that students scoring one or more standard deviations below the mean on the first exam were likely to have an unsuccessful course outcome by tabulating the numbers of students in each of four categories: (i) those at or below one standard deviation from the mean (defined as a –1.0 SD cutscore) with a successful outcome; (ii) those at or below the –1.0 SD cutscore with an unsuccessful outcome; (iii) those above the cutscore with a successful outcome; and (iv) those above the cutscore with an unsuccessful outcome. As before, first exam results were standardized within each class and pooled.

In our sample, the overall error rate was 24.8% (23.1% + 1.7%; see Table 1). Most of our prediction errors lay in predicting success, that is, in the false positive rate (students allowed to remain in the course based on the exam 1 score who eventually failed). We want to transfer those students who are highly likely to fail while transferring the fewest who would otherwise succeed; thus our main goal was to minimize false negatives rather than false positives. The false negative rate in this data set was 1.7% of the entire sample. Among the students who would be forced to transfer because they scored at or below the cutscore, 11% (12 of 110) would have passed the course with a C or better.
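The tabulation behind Table 1 is straightforward to reproduce. The following sketch computes the same cross-tabulation and error rates for any cutscore; it is applied here to simulated data, and the data-generating numbers are illustrative assumptions, not the study's sample.

```python
import numpy as np

def cutscore_table(z_exam1, successful, cut=-1.0):
    """Cross-tabulate a cutscore rule against actual outcomes (cf. Table 1).

    successful: True for a final grade of C or better.
    """
    z_exam1 = np.asarray(z_exam1, dtype=float)
    successful = np.asarray(successful, dtype=bool)
    predicted_fail = z_exam1 <= cut
    n = len(successful)
    # False negatives: predicted to fail (would be transferred) but passed.
    fn = np.sum(predicted_fail & successful)
    # False positives: allowed to remain in the course but eventually failed.
    fp = np.sum(~predicted_fail & ~successful)
    tp = np.sum(predicted_fail & ~successful)   # correctly predicted failures
    tn = np.sum(~predicted_fail & successful)   # correctly predicted successes
    print(f"predicted failure: {tp} unsuccessful, {fn} successful")
    print(f"predicted success: {fp} unsuccessful, {tn} successful")
    print(f"overall error rate:  {100 * (fn + fp) / n:.1f}%")
    print(f"false negative rate: {100 * fn / n:.1f}% of the sample")

# Hypothetical standardized exam 1 scores and outcomes for illustration.
rng = np.random.default_rng(2)
z1 = rng.normal(size=692)
passed = rng.random(692) < 1.0 / (1.0 + np.exp(-(0.6 + 1.8 * z1)))
cutscore_table(z1, passed)
```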

Logistic Regression for Advising

Diagnostic tests are commonly used to place students in particular courses—to diagnose their current state of knowledge and enroll them in the appropriate class. It is also possible to use diagnostic tests to advise students without a concomitant course placement; students are then expected to make their own decisions based on the best available data. Legg et al. suggested using logistic regression to construct a metric (16) that is accurate and readily understood and that would therefore enable students to make more informed decisions about course choice. While we have chosen a forced transfer process rather than an advisement process, based on our experience, we recognize that at other institutions advisement may be the preferred option.

In logistic regression, the log odds that student i has a successful course outcome (with probability P_i) are modeled as a linear function of the student's standardized first exam score X_i:

\[ \ln\!\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta X_i + \varepsilon_i \]

where α is the constant, β is the logistic regression coefficient, and ε_i represents random error.
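To illustrate how such an advising metric can be constructed, here is a minimal Python sketch that fits the model above to simulated data and reports estimated passing probabilities at several first exam levels. The simulated α and β values and the use of scikit-learn are assumptions for illustration; they are not the fitted values reported in Table 2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: standardized exam 1 scores and pass/fail outcomes.
rng = np.random.default_rng(3)
x = rng.normal(size=700)
p_true = 1.0 / (1.0 + np.exp(-(0.6 + 1.8 * x)))  # assumed true model
y = rng.binomial(1, p_true)

# Large C effectively disables sklearn's default L2 regularization.
model = LogisticRegression(C=1e6).fit(x.reshape(-1, 1), y)
alpha = model.intercept_[0]
beta = model.coef_[0, 0]

def p_pass(x_sd):
    """Estimated P(pass) for an exam 1 score x_sd SDs from the mean."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * x_sd)))

for sd in (-2.0, -1.5, -1.0, 0.0, 1.0):
    print(f"exam 1 at {sd:+.1f} SD -> P(pass) ~ {p_pass(sd):.2f}")
```

A table of p_pass values at different exam levels is exactly the kind of probability metric an advisor could hand to a student after the first exam.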

More details of logistic regression can be found in Legg et al. (16) and in references therein. The extent to which β differs from zero is a measure of the goodness of the model, that is, of whether exam 1 is a predictor of course outcome. The values of α and β are reported in Table 2, and the β value was significantly different from zero. From the fitted model, the probability of passing the course can be estimated for any given level of performance on the first exam (Table 3).

To test whether these results generalize beyond Hunter College, correlations between first exam and overall course performance were also computed for other CUNY institutions and for a large Midwestern university. Based on a χ² test of the equality of these correlation coefficients, using Fisher's Z-transformation, we rejected the null hypothesis of no differences among the r's (χ² > 23.69, the 95th percentile of the χ² distribution with 14 degrees of freedom). This indicates that at least two of the correlations are different, or that some contrast of them is significantly different from zero. We repeated this analysis using data from only the CUNY institutions, including Hunter, on the grounds that placement practices at the large Midwestern university introduced a different source of variability. Based on the χ² test of the 13 r's from the CUNY institutions, again using Fisher's Z-transformation, we did not find statistically significant differences among the r's and failed to reject the null hypothesis of no differences (χ² = 11.48 < 21.03, the 95th percentile of the χ² distribution with 12 degrees of freedom). This indicates that among the CUNY institutions, despite different instructors and instructional practices, the relationships between first exam and overall course performance are quite consistent.

The large Midwestern university uses a diagnostic examination to place many less able students in an alternative preparatory course, thus restricting the range of both the first exam and final course performance variables. A restricted range would be expected to reduce the magnitude of the correlation, providing a possible explanation for the difference between its correlation coefficient and those of the CUNY institutions.

There was also some variability in the logistic regression results among institutions. For example, in the combined Hunter College data a normalized score of –1.5 SD on the first exam predicts a 10% chance of passing the course (see Table 3), but the same –1.5 SD for the large Midwestern university predicts a 44% chance of passing the course (data not shown). This difference is a consequence of different passing rates at different institutions: Hunter College has an approximate 60% passing rate, compared with the much higher passing rate of the large Midwestern university (~90%). Thus, while relative performance on the first exam is a robust predictor of overall course performance relative to other students, the estimated probabilities of passing clearly depend on the course's overall passing rate. Predicted probabilities of passing based upon different levels of performance on the first exam should therefore be estimated within institutions where the passing rate is stable over time, in order for those probabilities to be most reliable and useful for advising.
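For completeness, here is a minimal sketch of the homogeneity test used above to compare correlations across institutions. The correlations and sample sizes shown are hypothetical, not the study's values.

```python
import numpy as np
from scipy.stats import chi2

def correlation_homogeneity(rs, ns, alpha=0.05):
    """Chi-square test that k independent correlations are equal,
    using Fisher's Z-transformation; Var(z_i) is approx. 1/(n_i - 3)."""
    z = np.arctanh(np.asarray(rs, dtype=float))  # Fisher's Z-transformation
    w = np.asarray(ns, dtype=float) - 3.0        # inverse variances as weights
    z_bar = np.sum(w * z) / np.sum(w)            # weighted mean of the z's
    stat = np.sum(w * (z - z_bar) ** 2)          # ~ chi2(k - 1) under H0
    dof = len(z) - 1
    crit = chi2.ppf(1.0 - alpha, dof)
    return stat, dof, crit

# Hypothetical correlations and sample sizes for several institutions.
stat, dof, crit = correlation_homogeneity(
    [0.82, 0.84, 0.80, 0.78, 0.83], [88, 120, 64, 95, 150])
print(f"chi2 = {stat:.2f} on {dof} df; 95th percentile = {crit:.2f}")
```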
Observations

This study provides statistical evidence for a strong correlation between a student's early performance in general chemistry and his or her final grade. In courses with high failure rates, the strong correlation also translates into a fairly accurate predictor of course success. The study does not, however, attempt to address why such a correlation exists. The first exam in the college chemistry course includes a substantial amount of stoichiometry and nomenclature memorization, the same factors found by Tai et al. (1) to be the most important predictors of success in introductory college chemistry courses. The analytical thinking and effective use of algorithms required for success in these areas may be central skills for succeeding in introductory college courses. Thus process skills, rather than the stoichiometry itself, may be an important factor in predicting success.

Confidence issues may also contribute to the predictive power of the first exam. It is well known that many students enter their first science class with self-doubt, and that this attitude can significantly affect their performance. Dweck and Leggett (17) have described two major behavioral patterns in response to achievement challenges: a maladaptive "helpless" response and a more functional "mastery oriented" response; which response is elicited depends on the situation. Diener and Dweck (18) have reported that when students are presented with a performance situation³ and given feedback that they are of low ability (for example, by scoring well below the class average on the first exam), they exhibit a "helpless" response. Helpless students tend to view their difficulties as failures and to see those failures as insurmountable. In other words, students who start with low confidence and then find that assessment reinforced by poor test performance may find it difficult to recover the confidence needed to become effective learners. More recently, there has been a striking demonstration that even a brief 15-minute intervention intended solely to improve students' sense of personal adequacy can have a powerful effect on student performance (19).

Conclusions

Our results suggest that student performance on a first exam covering nomenclature and stoichiometry is a robust predictor of subsequent student success in general chemistry. The first exam can be used as a placement tool or to create a probability metric for use in student advising. The effectiveness of the first exam as a predictor raises important questions about preparing students for general chemistry, and possibly for other problem-solving courses in the sciences. Most importantly, what are the causes underlying the observed correlation? To what extent are these results applicable to other science disciplines? How do our results compare with those in other liberal arts disciplines? To what extent do students "come back" after a failing experience? While these are questions chemistry educators have sought to address for years, this study identifies the first month of the chemistry course as critical for subsequent success.

Notes

1. The classes with correlations of 0.82, 0.84, and 0.80 were taught by three different instructors, while two other instructors each taught two classes (0.78 and 0.83; 0.82 and 0.80).

2. This section of general chemistry was not one of the seven sections used in the preceding analysis. The data used to study correlations between first exam and course performance came from seven sections that had no preparatory chemistry course linked to them. Because of space and financial limitations, we are not always able to associate a section of the preparatory course with a general chemistry section. Thus between fall 2002 and fall 2005 we had many sections of general chemistry (some of which were included in this study) that had no linked preparatory chemistry section. These unlinked sections act as a control group for evaluating the effectiveness of the preparatory course intervention.

3. A performance situation could be a test used to evaluate performance in the course. The alternative could be viewed as a task intended to develop learning, rather than to evaluate learning that had previously occurred.

Acknowledgments

This work was supported in part by a Math and Science Partnership grant from the National Science Foundation, EHR-0412413. The authors would also like to thank one of the reviewers for providing data essential to this article.

Literature Cited

1. Tai, R. H.; Sadler, P. M.; Loehr, J. F. J. Res. Sci. Teach. 2005, 42, 987–1012.
2. Tai, R. H.; Sadler, P. M.; Mintzes, J. J. J. Coll. Sci. Teach. 2006, 36, 52–51.
3. Lewis, S. E.; Lewis, J. E. Chem. Educ. Res. Pract. 2007, 8, 32–51.
4. Brooks, D. W.; Albanese, M.; Day, V. W.; Koehler, R. A.; Lewis, J. D.; Marianelli, R. S.; Rack, E. P.; Tomlinson-Keasey, C. J. Chem. Educ. 1976, 53, 571–572.
5. Fanzio, F.; Zambotti, G. J. Coll. Sci. Teach. 1977, 6, 154–155.
6. Bunce, D. M.; Hutchinson, K. D. J. Chem. Educ. 1993, 70, 183–187.
7. Kotnik, L. J. J. Chem. Educ. 1974, 51, 165–167.
8. Ewing, G. ERIC Document Reproduction Services ED281776, 1986.
9. Deal, W. J. J. Coll. Sci. Teach. 1983, 13, 154–156.
10. Wagner, E. P.; Sasser, H.; DiBiase, W. J. J. Chem. Educ. 2002, 79, 749–755.
11. Pienta, N. J. J. Chem. Educ. 2003, 80, 1244–1246.
12. Russell, A. A. J. Chem. Educ. 1994, 71, 314–317.
13. Karpp, E. R. Validating the California Chemistry Diagnostic Test for Local Use; Paths to Success, Vol. III; Glendale Community College Planning and Research Office: Glendale, CA, 1995.
14. Gaither, L. ERIC Document Reproduction Services ED025256, 1968. http://www.eric.ed.gov:80/ (accessed Feb 2009).
15. Bentley, A. B.; Gellene, G. I. J. Chem. Educ. 2005, 82, 125–130.
16. Legg, M. J.; Legg, J. C.; Greenbowe, T. J. J. Chem. Educ. 2001, 78, 1117–1121.
17. Dweck, C. S.; Leggett, E. L. Psychol. Rev. 1988, 95, 256–273.
18. Diener, C. I.; Dweck, C. S. J. Pers. Soc. Psychol. 1978, 36, 451–462.
19. Cohen, G. L.; Garcia, J.; Apfel, N.; Master, A. Science 2006, 313, 1307–1309.

Supporting JCE Online Material

http://www.jce.divched.org/Journal/Issues/2009/Jun/abs738.html

Abstract and keywords
Full text (PDF)
Links to cited URL and JCE articles

