Research: Science and Education

Chemical Education Research

Predicting Students at Risk in General Chemistry Using Pre-semester Assessments and Demographic Information

Eugene P. Wagner,*† Department of Chemistry, University of Pittsburgh, Chevron Science Center, Pittsburgh, PA 15260; [email protected]
Howell Sasser, Dickson Institute for Health Studies, Carolinas HealthCare System, Charlotte, NC 28201
Warren J. DiBiase, Department of Middle, Secondary, and K–12 Education, University of North Carolina at Charlotte, Charlotte, NC 28223-0001

† The work described here was conducted while the author was affiliated with the University of North Carolina at Charlotte.

Prediction of student performance in the classroom is an important area of educational research. Its impact on educators' ability to identify students who may have difficulty grasping the material presented in a course is potentially very large. General chemistry courses, as well as other large service courses, serve an eclectic mixture of students with different social and educational backgrounds. Identifying students at the beginning of the semester who are likely to have difficulties in such a course is an important yet difficult task.

Background

Interest in predicting student performance in general chemistry dates back to the very early years of this Journal (1–6), and the ability of the Toledo Exam (7), the California Chemistry Diagnostic Test (8), the SAT (9, 10), and the ACT (11, 12) to predict student performance in general chemistry courses is well documented. High school GPA and grades achieved in high school chemistry and mathematics classes have been used with some success (13). Logical reasoning ability (14), cognitive style (15–17), and level of intellectual development (18, 19) have also been investigated as predictors of scholastic performance. Most of this work reports the correlation of predictive factors for all participants in the study with the grade earned for the course. The results have led to some generally accepted conclusions, such as the correlation of mathematical ability and SAT scores with success in general chemistry.

Despite such findings, there remains much to resolve in this area. Low correlations of a given prediction instrument with academic success or grade achieved are typically accepted as adequate. Further, what may have been a good predictor of success in the past may not apply today. The background knowledge, demographics, and environment of the students enrolling in general chemistry today have changed from just 10 years ago and certainly have changed over the generations. The specific topics taught in prior general chemistry and mathematics courses and the teaching methods employed influence students and potentially affect their success in subsequent chemistry courses. This is a very dynamic system, and prediction instruments need to be responsive to changing conditions if a high correlation to academic performance is to be achieved and maintained.

The purpose of this work is to move the state of the art in this direction by proposing a means of predicting which students are at risk of failure in first-semester general chemistry for science majors. It is distinct from past research in several respects. First, the instrument focuses on prospective identification of at-risk students to facilitate the introduction of intervention programs designed around this population's needs. Second, it uses in concert aspects of many instruments that have been shown to correlate with performance, such as mathematics ability, conceptual ability, chemistry ability, and demographics, to create a comprehensive prediction instrument. This work identifies and compares the significance of these variables in predicting students at risk of failure in general chemistry for science majors. Suggested future directions for this area of research are given, based on the interpretation of the results and conclusions reported here.

The Setting

The university at which this research was conducted is a comprehensive four-year institution with a student population of approximately 17,000, expanding at an annual rate of 3–4%. The ethnic makeup of the first-semester general chemistry course for science and engineering majors (Chem1251) is similar to that of the university: 16.92% African-American, 0.43% American Indian/Alaskan Native, 4.36% Asian/Pacific Islander, 1.67% Hispanic, 3.14% nonresident alien, and 73.40% Caucasian. Chem1251, like the campus generally, is split nearly equally by gender. Approximately 800 students enroll in Chem1251 each year. The average class size for daytime sections is 84, with a range of 32–132 over the past seven years.

For the past six years, the chemistry department has experienced a steady—but alarmingly high—number of D and F grades assigned to Chem1251 students. Consequently, there has been considerable interest in identifying students likely to do poorly and developing intervention programs that address their needs at the beginning of the semester, before any graded assignments are given. To ensure that the department has not set unrealistic standards for Chem1251, the ACS exam for first-semester general chemistry has been employed as the final exam in the course, and the final exam grades have been curved to correspond to the national percentiles reported for the ACS exam.



In the fall 1998 semester, Chem1251 students averaged in the 47th percentile of the nation on the ACS exam, and the average grade point for the course was 1.78 on a 4.0 scale. While at first glance this "C" average does not appear to show a serious deviation in student performance from other introductory science and mathematics courses across the campus, the distribution of grades has typically been bimodal or skewed, with a relatively large number of D and F grades during the past five years. Although the chemistry department does not wish to lower its standards, it does want to increase the student success rate in Chem1251. Steps to achieve this aim include the initiation of course-associated interventions such as required problem-solving sessions, basic mathematics skills review, and programs designed to increase logical reasoning skills. For the past three years, approximately 80% of the students who failed the first exam in Chem1251 failed the course. Therefore, accurate identification of potentially low-performing students at the start of the semester is vitally important to trigger interventions early, and in any case before the first exam. On a global level, identification of these at-risk students is also important to the maintenance of quality and rigor in course teaching.

Background

The overall correlation between grades for a course and pre-assessment scores has been reported to be only in the range of .30–.60 (7, 8, 10, 14), and the correlation for a specific population of students can be even lower. We believe that the overall correlation between the grades earned in Chem1251 and a pre-semester skills assessment is inconsequential compared to the development of an accurate predictor that identifies students at risk while minimizing the false identification of students who will pass the class despite predicted poor performance.

In the fall semesters of 1993–1995, the Toledo exam was used as a pre-semester assessment, and the results were used to divide the students into two categories: pass Chem1251 with a grade of A–D, or fail Chem1251. The Toledo exam correctly predicted 75.1% of the students overall. However, after the optimal cutoff score on the Toledo exam between the pass and fail categories was determined, the exam correctly predicted 87.0% of the students in the pass category but only 27.7% of the students in the fail category. In an attempt to better predict which students would fail, the third portion of the Toledo exam, dealing with specific chemical concepts, was eliminated from the pre-semester assessment administered in the fall of 1995. This adjustment did not change the predictive capability of the Toledo exam in either the pass or fail category. Since the Toledo exam correctly identified only one of four students who subsequently failed Chem1251, the department ended its use after the spring 1995 semester.

While the Toledo assessments of mathematical ability and chemical knowledge were useful in predicting performance in Chem1251 for students in the pass category, predicting which students would fail appeared to be much more complex. This led us to investigate the variables that affect performance in Chem1251 and help predict which students will most likely receive a failing grade without intervention.
The goal was to develop a pre-semester assessment for students enrolled in Chem1251 that would identify at-risk students (sensitivity) without triggering unnecessary interventions for students who would ultimately have been successful in the course without them (specificity) (20).
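As an illustration of this cutoff-selection idea, the following Python sketch scans candidate cutoffs and keeps the lowest one at which roughly 90% of the students predicted to pass actually passed (the criterion used later in Methods). This is our reconstruction, not the authors' code; the function and array names and the NumPy representation are assumptions.

```python
import numpy as np

def choose_cutoff(scores, passed, target=0.90):
    """Return the lowest cutoff such that, among students scoring at or
    above it (i.e., predicted to pass), at least `target` actually passed.

    scores : array of pre-assessment scores, one per student (assumed)
    passed : boolean array, True if the student passed the course (assumed)
    """
    for c in np.sort(np.unique(scores)):
        predicted_pass = scores >= c
        if predicted_pass.any() and passed[predicted_pass].mean() >= target:
            return c  # lowest qualifying cutoff flags the fewest students
    return None  # no cutoff reaches the target accuracy in the pass group
```

Students scoring below the chosen cutoff would then be flagged as at risk and offered the intervention.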

Methods

Development of the student pre-semester assessment (SPSA) for the Chem1251 course was directed by the following factors:

1. The connection between mathematical ability and success in general chemistry has been well documented and must be part of the assessment exam.
2. Although Chem1251 does not have any chemistry prerequisites, approximately 95% of the students enrolled have had at least one chemistry course. Therefore, evaluation of student knowledge of very basic chemistry principles should be included in the assessment.
3. The results of the Toledo exam verified that chemistry knowledge and mathematical ability are important in predicting success in Chem1251, but other factors such as verbal scores on the SAT, student environment data, and student academic background should be investigated as possible predictors of achievement in Chem1251.
4. Research has shown that critical thinking skills are correlated with success in general chemistry. Therefore, the pre-semester assessment should include questions that allow students to use these skills to solve the problem presented.
5. The validity of the assessment exam must be comparable to that of other well-established assessment exams in current use, such as the SAT and the Toledo exam.

The preliminary version of the SPSA was a multiple-choice exam consisting of 10 mathematics questions, 10 chemistry questions, and 8 demographic questions (Appendix). The mathematics questions dealt mainly with algebra and conversion between units and were designed to be answered without the aid of a calculator. The chemistry questions were designed so that students with minimal previous chemistry knowledge might still be able to analyze them and determine the correct answers. Ultimately, the goal of the SPSA was that mathematics skills, analytical ability, and chemistry background knowledge be used in concert to arrive at correct answers. The demographic questions assess the following areas of a student's environment and background:

1. Highest level of mathematics taken in the past or currently
2. Number of semesters of chemistry taken prior to Chem1251
3. Year in college (freshman, sophomore, etc.)
4. Age
5. Population of high school town
6. Involvement in university-sponsored activities such as a sports team or club
7. Declared major
8. Anticipated number of hours at work per week for the current semester

The demographic questions were presented in a multiple-choice format with five categorical classifications for each question. For example, the groupings for age were 15–18, 19–21, 22–24, 25–28, and older than 28.


The SPSA was administered during the first week of each semester from spring 1998 through fall 1999. Use of calculators was not allowed—all computational questions were designed to be answerable without complex computations. SAT scores and predicted grade point indices (PGI) were obtained from the registrar's office. The PGI is a multivariate regression model developed by the mathematics department at the study institution; it uses SAT score, high school GPA, high school class rank, and GPA at the university to predict student success in course work at the university.

The SPSA was compared to the SAT, PGI, and Toledo exam for ability to predict the students who would fail Chem1251 while minimizing the number of students who passed Chem1251 but were predicted to fail. The goal was not to predict the grade a student might obtain in the class, but rather to simply and accurately identify students at high risk of failing the course so that an intervention could be properly integrated within the first week of the semester. The SAT, PGI, SPSA, and Toledo exam were first evaluated and compared to each other for their predictive value in Chem1251. A cutoff score between pass and fail for each instrument was selected to yield approximately 90% correct prediction in the passing category (i.e., no more than 10% of those predicted to pass actually failed).

The relationship of the demographic data to course performance was analyzed through a two-step process. First, each question was compared to course grade through simple (one predictor variable) ordinal logistic regression (SOLR) analysis (21). This technique is an extension of conventional logistic regression, permitting use of a response variable with more than two categories. It is assumed that the categories are on a gradient (i.e., ordinal), and the measures of association are interpreted to represent the odds of a one-unit (or level) change in the response variable given a one-unit change in the predictor variable. The demographic variables that were significantly associated individually with course performance were then analyzed as a group using multiple (two or more predictor variables) ordinal logistic regression (MOLR) analysis. Those that remained significantly associated with course performance in the multiple model (i.e., after adjusting for the other variables in the model) were then combined with each of the objective instruments (PGI, SAT, SPSA) to create regression models for the final comparison of prediction accuracy. Since the data for the Toledo exam were taken from previous semesters, when the specific demographic data were not collected, the Toledo exam was not included in this portion of the evaluation.

The regression model comparison was conducted in a cross-validation format in which each semester's data were used to construct a regression model, which was then validated using data from the other three semesters (22). For example, the spring 1998 data were used to create a regression model, and then the performance of the model was tested on data from the fall 1998, spring 1999, and fall 1999 semesters. Again, a cutoff score between pass and fail was selected to achieve approximately 90% correct prediction in the passing category. The results of the regression models for the four semesters created for each instrument were combined to give overall average values, by instrument, of the proportions correctly or incorrectly predicted to pass or fail. These four proportions were used to assess the likelihood that:

1. The tests would correctly predict failure for a student who eventually failed (high sensitivity).
2. The tests would correctly predict passing for a student who eventually passed (high specificity).
3. A student who was predicted to fail actually did so (predictive value of a positive test).
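The three quantities above follow directly from the two-by-two table of predicted versus actual outcomes. A minimal sketch (our naming; the boolean NumPy representation is an assumption), treating "predicted to fail" as a positive screening result:

```python
import numpy as np

def screening_metrics(predicted_fail, actual_fail):
    """Sensitivity, specificity, and positive predictive value.

    predicted_fail, actual_fail : boolean NumPy arrays, one entry
    per student (assumed representation).
    """
    tp = np.sum(predicted_fail & actual_fail)    # flagged, and did fail
    fp = np.sum(predicted_fail & ~actual_fail)   # flagged, but passed
    fn = np.sum(~predicted_fail & actual_fail)   # missed failure
    tn = np.sum(~predicted_fail & ~actual_fail)  # correctly left alone
    sensitivity = tp / (tp + fn)   # item 1: failures caught
    specificity = tn / (tn + fp)   # item 2: passers left alone
    ppv = tp / (tp + fp)           # item 3: predictive value of a positive
    return sensitivity, specificity, ppv
```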

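Before turning to the results, the two-step SOLR/MOLR screen described above can be sketched with the ordinal-regression model in statsmodels. This is an illustration under assumptions: the file name, column names, the 1–5 ordinal coding of the demographic responses, and the 0.05 significance threshold are ours, and the semester-by-semester cross-validation is omitted for brevity.

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# One row per student; the file and column names are hypothetical.
df = pd.read_csv("chem1251.csv")
df["grade"] = pd.Categorical(df["grade"],
                             categories=["F", "D", "C", "B", "A"],
                             ordered=True)

# Demographic responses coded 1-5 per the five-level categories (assumed).
candidates = ["math_background", "chem_semesters", "year_in_college",
              "age_group", "hs_town_size", "activities", "major",
              "work_hours"]

# SOLR: fit each demographic predictor alone against course grade.
keep = []
for var in candidates:
    res = OrderedModel(df["grade"], df[[var]],
                       distr="logit").fit(method="bfgs", disp=False)
    if res.pvalues[var] < 0.05:   # individually significant
        keep.append(var)

# MOLR: re-fit the surviving predictors together; variables still
# significant here enter the final models alongside PGI, SAT, or SPSA.
molr = OrderedModel(df["grade"], df[keep],
                    distr="logit").fit(method="bfgs", disp=False)
print(molr.summary())
```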
Results and Discussion

SPSA scores and Toledo exam scores were available for all students in Chem1251 because these pre-semester assessment instruments were administered by the chemistry department during the first week of the course. The data in Table 1 show that the SPSA most accurately predicted which students would fail Chem1251 (compared to the next best, PGI, χ2 = 7.58, p = .006) and had the third best overall prediction percentage. The Toledo exam had slightly better overall reliability but did not perform as well as the SPSA for the target subjects. The PGI, which was based on the SAT and was designed to identify high-risk students applying for admission to the university, was more accurate than the SAT alone. Although its accuracy overall and in identifying students likely to fail still fell short of the SPSA's, the added value of factoring in high school GPA, high school class rank, and college GPA is evident. Unfortunately, PGI data were available for only 67% of the students enrolled in Chem1251. SAT scores were available for 85%, and the MSAT proved quite reliable both overall and in predicting which students would pass. The total SAT (TSAT) and verbal component (VSAT) performed similarly in predicting passing, but were relatively poor predictors of at-risk students and of outcomes overall.

Analysis of each question on the SPSA is shown in Figure 1. The goal was to obtain a discrimination index greater than 0.20 for each question. The discrimination index measures performance on the item for students who did well on the test overall relative to those who did poorly. A relatively high discrimination index for a specific question indicates that most test-takers who answered the question correctly did well on the test as a whole and that most who answered incorrectly did poorly overall. The ACS standardized exams attempt to optimize the discrimination index for each question between 0.30 and 0.50, discarding any questions below a 0.20 value (23). Our results indicate that all SPSA questions achieve the minimum 0.20 standard except question 19.
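The paper does not state which formula it used for the discrimination index; one common classical variant compares item performance in the top and bottom 27% of scorers on the whole test. A sketch under that assumption:

```python
import numpy as np

def discrimination_index(item_correct, total_scores, frac=0.27):
    """Proportion correct among top scorers minus proportion correct
    among bottom scorers. The upper/lower-27% split is a common
    convention; the exact variant used in the paper is not stated.

    item_correct : boolean array, True if the student got this item right
    total_scores : array of overall test scores for the same students
    """
    order = np.argsort(total_scores)
    k = max(1, int(frac * len(total_scores)))
    low, high = order[:k], order[-k:]
    return item_correct[high].mean() - item_correct[low].mean()
```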

Table 1. Pre-semester Assessment Comparison

Instrument Name | No. of Students | Scores Available^a (%) | Cutoff Score | Specificity^b | Sensitivity^c | Predictive Accuracy^d
SPSA            | 1356            | 100                    | 45%          | 87.7          | 40.8          | 74.1
PGI             |  904            |  67                    | 2.16         | 84.0          | 29.4          | 69.9
Toledo Exam     | 1024            | 100                    | 51%          | 87.0          | 27.7          | 75.1
TSAT            | 1149            |  85                    | 860          | 89.6          | 16.8          | 69.2
MSAT            | 1149            |  85                    | 440          | 91.3          | 23.1          | 75.0
VSAT            | 1149            |  85                    | 400          | 90.9          | 10.4          | 67.4

^a Percentage of total Chem1251 population. ^b Percentage correctly predicted to pass Chem1251. ^c Percentage correctly predicted to fail Chem1251. ^d Overall prediction percentage.


[Figure 1. Analysis of the student pre-semester assessment (SPSA) questions: for each of the 20 questions, the percentage of students answering correctly (avg = 64.2%, range = 20.1–89.5%) and the question's discrimination index (avg = 0.42, range = 0.10–0.71). These data were obtained from 618 students in the fall 1998 and spring 1999 semesters. Refer to the text for the revisions made to question 19.]

Question 19 was phrased: "Which of the following is most likely to be an ionic compound? (a) NF3 (b) NaCl (c) CO2 (d) CN (e) CH4". In the fall 2000 semester, this question was revised before administration to students in the general chemistry course for science majors at the principal author's current institution. The revised question was phrased: "An ionic compound is created by joining a metal and nonmetal together. Which of the following is most likely to be an ionic compound? (a) NF3 (b) FeCl3 (c) CO2 (d) HCl (e) CH4". This revision increased the discrimination index from 0.10 to 0.52; results for all other questions were very similar to those obtained at the original test institution. This indicates that the added information helped students to analyze the question or to recall relevant ideas learned in previous chemistry courses. Regardless of the manner in which the revised question assisted the students, it still tested their knowledge of the periodic table and elemental symbols. These types of skills are essential for successful completion of a general chemistry course.

SOLR analysis of the demographic data showed that mathematics background, chemistry background, year in school, and age were the only independent variables significantly predictive of performance in Chem1251 (Table 2). These four variables were combined in MOLR analysis, which indicated that mathematics background was most strongly predictive of outcome in the course, followed by chemistry background and age. As might be expected, increasing experience and age (presumably a marker for experience) were predictive of a better outcome. Year in college was not a significant predictor when these variables were combined in the analysis (Table 3).

The three statistically significant demographic variables were then entered into a set of MOLR models, each containing a variable for one of the objective assessments. The response variable in all cases was the grade received for the course.

The cutoff score between pass and fail was based on a 0–4 grade-point scale; grades of 1.0 (D) or lower were considered failing. These regression models were used to predict grades and place students into either the pass or fail category (Table 4).

It is possible that the modest improvement in the SPSA's predictive power when demographic data were included reflects the tool's comparative strength when analyzed alone. As an assessment's ability to predict at-risk students with objective questions increases, it might be presumed that the qualities represented indirectly by the demographic variables (such as problem-solving capacity as represented by age) are also measured with increasing precision. For example, the number of previous chemistry courses and age were significant predictors of grade outcome until placed in a model with the SPSA. After accounting for the objective factors measured directly by the SPSA (mathematics knowledge, chemistry knowledge, and analytical ability), previous chemistry courses and age no longer contributed meaningfully to the model. Age, mathematics background, and chemistry background remained significant variables in the SAT and PGI MOLR models. This suggests that some aptitude factor subsumed by the demographic variables is not adequately measured by the SAT or PGI.

The advantage of not using background and environmental data to predict academic performance is that misclassification with respect to the qualities of real interest (knowledge, analytical skill) on the basis of imprecise or confounded categories is minimized. Also, reliance on information of this sort might be perceived as stereotyping, leading to discouragement among students and the prospect that a proportion of students would fail merely because they had been identified as being at risk of failure (Hawthorne effect). Although a similar phenomenon could arise from the use of more objective assessments such as the SPSA, SAT, or Toledo exam, it arguably would be less severe.
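To make concrete how an ordinal model yields a pass/fail call under this grade cutoff, here is one possible sketch, continuing the hypothetical `molr` fit from the earlier example. The 0.5 threshold is a placeholder of ours; the paper instead tuned cutoffs to the approximately 90% pass-category criterion.

```python
# Predicted probabilities for each grade level; with the categories
# ordered F < D < C < B < A, columns 0 and 1 correspond to F and D.
probs = molr.predict(df[keep])        # shape: (n_students, 5)
p_fail = probs[:, 0] + probs[:, 1]    # P(grade of D or lower) = failing

# Flag students whose predicted risk of failing exceeds a threshold.
# 0.5 is a placeholder; the paper chose cutoffs so that roughly 90%
# of students predicted to pass actually passed.
at_risk = p_fail > 0.5
```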


Table 2. SOLR Analysis of the Demographic Data from the Spring and Fall 1999 Semesters

Variable        | Odds Ratio | p
Math background | 1.71       |