Assigning Grades to Absent Students-Revisited - Journal of Chemical Education

April 1, 1991. Keywords: Curriculum; Testing / Assessment; Administrative Issues.

Assigning Grades to Absent Students-Revisited

John F. Beck and J. Christopher Beck
St. Francis Xavier University, Antigonish, NS B2G 1C0, Canada

In a recent paper in this Journal¹ an apparently sound statistical method (the Z-score method) was presented to assign a grade for a student who, for valid reasons, misses written tests or laboratory work. We were immediately interested in this method because the freshman chemistry courses at this university have used, for a number of years, a mark estimation procedure developed by one of us (JFB). The purpose of both these efforts is to take into account the relative difficulty of the missed test. Rather than simply averaging the scores of other tests written by the student or rearranging the mark weightings, these procedures attempt to produce a more equitable estimated score for the missed test. Our procedure (the average of ratios method) averages the ratios of the student's score to the class average for tests written and then applies that mean ratio to the class average of the missed test. This procedure is not as statistically refined as the Z-score method and may not be rigorous in all cases. In fact, its accuracy has not been tested other than that the results usually "seem" fair. However, its purpose is readily appreciated by students who have no experience with statistical mathematics.

In order to determine the absolute and relative worth of these two methods and their comparison to simple averaging (the average method), we constructed a test based on the records of test scores from first-year chemistry courses at this university. Nine such records were available to us for courses taught by three professors from 1986 to 1990. These data sets are of varying completeness; the earlier ones record only the scores of the two midterm (1-h) and two end-of-term (2- and 3-h) exams, while the most recent ones contain the scores of all tests, six to eight depending on the course.
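As an illustration, the two estimation procedures and simple averaging can be sketched in Python. This is a hypothetical rendering for clarity only; the function names, signatures, and the numeric example below are ours, not part of either published method.

```python
from statistics import mean

def z_score_estimate(scores, class_means, class_sds, missed_mean, missed_sd):
    """Z-score method: average the student's z scores on the tests
    actually written, then convert that mean z back to a raw score
    on the scale of the missed test."""
    zs = [(s - m) / sd for s, m, sd in zip(scores, class_means, class_sds)]
    return missed_mean + mean(zs) * missed_sd

def ratio_estimate(scores, class_means, missed_mean):
    """Average-of-ratios method: average the ratios of the student's
    score to the class average for the tests written, then apply that
    mean ratio to the class average of the missed test."""
    return mean(s / m for s, m in zip(scores, class_means)) * missed_mean

def average_estimate(scores):
    """Simple averaging of the tests the student did write."""
    return mean(scores)
```

For example, a student who scored 60 and 66 on tests whose class averages were 50 and 55 has a mean ratio of 1.2, so a missed test with a class average of 70 would be estimated at 84 by the average of ratios method, but at 63 by simple averaging.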
Four further mark sets were produced by extracting from the 1988/89 and 1989/90 records the results of homogeneous-format tests given in any one course. These are of two types: two sets of four 30-min multiple-choice class tests and two sets of four 1-h, written, largely problem-solving tests given outside of class periods. The enrollments vary from 58 to 143 students and include students with various levels of chemistry and mathematical experience.

The format of the test was to consider those students who had taken all of the tests in any course and to use (n - 1) of such a student's scores to predict the nth score. This was done for each method, through each mark set. Over 4000 test results from 772 students were analyzed. The predicted scores were compared to the actual scores in two ways: 1-by determining the mean absolute difference between the predicted and actual scores, and 2-by determining the mean difference, with standard deviations, between predicted and actual scores. The test was done on a single basis for each student, i.e., randomly choosing one of his/her scores as the missed test, and additionally, for the 1988/89 and 1989/90 records, on a multiple basis, i.e., estimating each of a student's test scores in turn based on the remaining scores. The entire set of calculations was carried out on a MicroVAX 3400 computer using a program written in MODULA-2. The comparison of the variances, using the F-test, of the results from the three methods was carried out using a SAS² statistical package (SAS VMS 5.18) on the same computer.

The results³ show that the Z-score method usually produces the lowest mean absolute differences (comparison method 1 above) between the predicted and actual scores (7.5 to 10.7), while the averaging method produces the largest (8.9 to 13.2). However, there is generally only a spread of a few points among the means of the three methods for any mark set. When comparing mean differences (comparison method 2 above) for the mark sets with over 200 predictions per set, the Z-score and the average of ratios methods both produce zero mean differences. That is, the average of the predicted scores for the set equals the average of the actual scores. With smaller sets, these zero mean differences are not seen, and no method is consistently superior. In several cases, simple averaging produces the smallest mean difference. The standard deviations range from 10.4 to 15.0, regardless of set size, indicating a far-from-acceptable precision. The lack of precision is evident regardless of comparison method or characteristics of the set. Neither sets with larger student numbers, nor those with larger numbers of tests, nor those of the homogeneous-format tests demonstrate better precision. Thus, individual predictions are completely untrustworthy.

Statistical analyses of the intermethod relationship of the predictions reveal that the outputs of the three methods, for each of the data sets, are not significantly different from each other at the α = 0.05 level. Thus, once again, it is evident that the more complex Z-score and average of ratios methods do not produce any more accurate score predictions than does simple averaging. In any case, the chances are that a predicted score will be substantially different from what a student would have actually earned.

In trying to understand the negative verdict of these results, we believe that the fault may lie in the underlying premise of the whole operation. That is, it is assumed that on every test a student will generally score in the same range in relation to his/her classmates.
In as heterogeneous a course as introductory chemistry, it is probably not correct to assume that a student's approach to, and understanding of, such diverse topics as balancing equations, quantum mechanics, reaction kinetics, and organic functional groups will occur in the same way or to the same extent each time. Thus the average student's performance on tests of these topics will vary, and not necessarily in line with the rest of the class. Indeed, if the above assumption were valid, one could imagine that, after building up a database of perhaps five or six test scores, future tests could be administered to one designated test writer and all other student scores in the class calculated from that result. Clearly, it is absurd to contemplate such a strategy, even in fairly homogeneous courses.

In conclusion, it would seem that an accurate test score estimation method is not currently available and that such a procedure would require far more complex considerations than student and class averages. Therefore, the present choice of a mark estimation method, if a mathematically based system is desired, should be based on ease of implementation and the students' perception of fairness. In our experience, simple averaging or altering the weighting of other tests is not popular, but our average of ratios method has been well received.
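The leave-one-out evaluation described above can be sketched as follows. This is a hypothetical Python rendering: the original program was written in MODULA-2, the helper estimator shown is only one of the three methods compared, and the SAS variance comparison is reduced here to the bare F statistic, with the distributional lookup left to tables or software.

```python
from statistics import mean, variance

def ratio_estimate(scores, class_means, missed_mean):
    """Average-of-ratios estimator (illustrative helper)."""
    return mean(s / m for s, m in zip(scores, class_means)) * missed_mean

def leave_one_out(score_matrix, estimator):
    """For each student (one row of scores per student) and each test,
    predict the held-out score from the remaining (n - 1) scores, and
    return the mean absolute difference and the mean signed difference
    between predicted and actual scores."""
    class_means = [mean(col) for col in zip(*score_matrix)]
    abs_diffs, diffs = [], []
    for scores in score_matrix:
        for k, actual in enumerate(scores):
            rest = scores[:k] + scores[k + 1:]
            rest_means = class_means[:k] + class_means[k + 1:]
            predicted = estimator(rest, rest_means, class_means[k])
            diffs.append(predicted - actual)
            abs_diffs.append(abs(predicted - actual))
    return mean(abs_diffs), mean(diffs)

def f_statistic(sample_a, sample_b):
    """Ratio of sample variances (larger over smaller) with its degrees
    of freedom, to be judged against F-distribution critical values."""
    va, vb = variance(sample_a), variance(sample_b)
    if va >= vb:
        return va / vb, len(sample_a) - 1, len(sample_b) - 1
    return vb / va, len(sample_b) - 1, len(sample_a) - 1
```

For a toy mark set in which every student scores at a constant ratio of the class average, the average-of-ratios predictions are exact and both summary differences are zero; real mark sets, as reported above, are far less cooperative.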


¹ Adams, P. R. J. Chem. Educ. 1989, 66, 717.

² McClave, J. T.; Dietrich, F. H., II. Statistics, 4th ed.; Dellen/Macmillan: San Francisco, 1988; Chapter 9.
³ A summary of the results in tabular form is available from the first author.

Volume 68 Number 4 April 1991
