M. Pickering1 and S. L. Goldstein2
Columbia University, New York, New York 10027

1 Current address: Department of Chemistry, Princeton University, Princeton, New Jersey 08540.
2 Current address: Department of Geology, Harvard University, Cambridge, Massachusetts 02138.
The Educational Efficiency of Lab Reports
Traditionally the success of a curriculum change has been measured by the improvement of student achievement. The administrative efficiency and student reaction are considered in a minor way, but the major criterion for optimization has been the "amount learned." In this paper we propose that the evaluation of a curriculum change should not be based solely on the change of achievement level, but also on the "educational efficiency." If one method can teach the same amount of material in 10 hours of student labor, while another uses 15, it is clear which is better. This point is often forgotten. Frequently curriculum changes that produce documentable improvement in achievement do so at the cost of vastly increasing student time input. (This is true for most Keller plan courses.)

How can we make this concept of educational efficiency less vague? When we discuss the efficiency of a curriculum change, we are asking if the extra time put in by the student gave a proportionate amount of extra achievement. We cannot, of course, measure either the time input or the real achievement level of the student, and must replace these imponderables by observables. The time input can be approximately measured by a well designed questionnaire. It is important to realize that this report may not be accurate, and to distinguish the real time input from that reported, we will call the latter the "reported workload" W. As for achievement, we are forced to fall back on test scores, and to assume that the achievement level is in some way proportional to the test scores S.

To evaluate the efficiency of a curriculum change we must compare two ratios. The first ratio R1 is the ratio of test scores to reported workload in the control group. The second is the ratio of these quantities produced as a side effect of the curriculum change.
Denoting the control group with subscript c and the experimental group with subscript x, the first ratio R1 will be

    R1 = Sc/Wc

and the second ratio R2 will be

    R2 = Sx/Wx

Dividing R2 by R1 gives a "success factor" p

    p = R2/R1 = (Sx Wc)/(Sc Wx)
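The computation is trivial; as a concrete illustration, it can be sketched in a few lines of Python (the function and variable names are ours, for illustration only):

```python
def success_factor(s_c, w_c, s_x, w_x):
    """Success factor p = R2/R1, where R1 = Sc/Wc is the score-to-workload
    ratio of the control group and R2 = Sx/Wx that of the experimental group."""
    r1 = s_c / w_c  # control group: test score per hour of reported workload
    r2 = s_x / w_x  # experimental group: test score per hour of reported workload
    return r2 / r1  # p > 1: scores changed more than proportionally to workload
```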
If the "success factor" so defined is greater than one, it indicates that R2 is greater than R1, and hence that the improvement of test scores is more than proportional to the change in reported workload. In this case, the gain in achievement is greater than would be anticipated from the change in reported workload. The use of the "success factor" reduces demands for accuracy in the data, because only extreme cases are of interest. All changes which have success factors of about one are "neutral" from an efficiency point of view, and no clear choice of alternatives can be made on an efficiency basis.

If we are to optimize educational efficiency, lab courses offer a good place to start. The actual lab session is generally considered to be a less concentrated learning experience than a lecture course; the report writing is tedious and time consuming. Traditionally this problem has either been ignored, or "fill in the blank" type reports have been used. We feel that the traditional report is a useful learning tool. It forces the
student to think about the experiment, and to practice an essential part of the scientific experience: the reduction of raw data to a conclusion. In this paper we will show that the inefficiency of lab report writing arises not from the intrinsic nature of the reports, but rather from their use as a grading tool. We will also show that test based grading can do as well as lab report grading, without so much student agony and time wastage.

We report an example of an efficiency evaluation of a curriculum change involving lab report grading. Although one might prefer not to think of grading changes as curriculum changes, it is clear that a change in grading system requires that a different set of goals be met, and therefore massively affects how students spend their time. The change made was relatively minor; yet it had drastic effects in terms of efficiency and workload. This study has only local validity: it would have to be repeated at other institutions, and the results might be quite different. Nevertheless it is an interesting example where traditional achievement based evaluation and the new efficiency method give strikingly different answers about whether a curriculum change is worthwhile.

The Curriculum Change
Columbia offers two freshman lab courses, a large day session of 500 students and a smaller night session of 50-100 students. The two courses are under different teaching assistants but do exactly the same experiments, and take exactly the same exams. For the first semester both groups followed the experimental grading system to be explained below, but for the second semester, the night session returned to a more traditional system. It was during this semester that the comparison was made.

The experimental grading system used lab reports solely as a learning tool. The TA's continued to check them, but this consisted only of marking errors, rather than putting scores on papers. Students were expected to correct errors in reports, attaching an appendix if necessary, and resubmit them under penalty of loss of one or more grade steps in the course grade. Often the errors would be quite minor, and the time taken for the recheck minimal.

For the group that returned to the traditional lab report based grading system, the lab report grades determined 40% of the course grade. All reports were graded piecemeal and assembly line fashion by the TA's. For example, one TA might specialize in grading graphs according to a standard set of criteria. The total points assigned by each TA were added at the end as for an examination. This procedure was used simply to remove section differences that might obscure the results. No regrades were allowed.

In both groups, at the end of both semesters, an open book lab exam was used to measure a student's comprehension of what he had done. We chose the open book format to make the test more like real chemical research, and to emphasize the contrast with the lecture course, which has closed book exams. Students were expected to bring their corrected reports, lab handouts, and lab notebook as a "text." Questions asked students to do such things as order of magnitude problems, error analyses, and data reduction, but most questions were of the sort, "if you used dark beer instead of light beer in step three, what effect would it have on the result?" A number of questions asked about properties of substances that they had seen and recorded in their lab notebooks. We do not claim this test measured all of the skills derivable from the lab; the manipulative skills were tested by two practical exams.3 The written exam counted as one half the grade for the experimental group and one third of the grade for the control group.

3 Pickering, M., and Kolks, G., J. CHEM. EDUC., 53, 313 (1976).
Reported workload for both groups was compared by asking the students in each course how much time they had put in on
each report. This was done at the end of the term, anonymously, so that there would be no obvious conflict of interest that would lead to exaggeration. From previous attempts to measure student time input, we had discovered that such a questionnaire would be useful only if very detailed and task related. It is necessary to ask the amount of time needed for each report separately, and in the case of the experimental group to ask about time needed for regrades. Such information shows immediately the relative difficulty of one report compared to another. Since the workload questionnaires are anonymous, we cannot match reported workload to achievement on a student by student basis. This means that small differences in reported workload may not be significant and should be ignored in studies of marginal efficiency.

For the purposes of data treatment, each of the students in the group that had returned to the traditional curriculum was matched with a student from the larger group remaining on the experimental system. The matching was based on first semester scores on both written and practical exams (one way such a pairing might be carried out is sketched after the table). The results of the workload study for the whole class and the test scores for the matched sample at the end of both semesters are shown in the table.

Workload and Scores

                                                  Traditional Grading       Experimental Grading
                                                  System (Control)          System
Mean test scores at end of first semester         57.5                      57.4
Mean test scores (Sc and Sx)                      47.1                      41.1
Mean time spent on reports at the
  end of second semester                          54 hrs/student semester   20 hrs/student semester (a)
Mean time spent on preparation and study          17 hrs/student semester   10 hrs/student semester
Mean total outside work for course (Wc and Wx)    71 hrs/student semester   30 hrs/student semester
Students in sample                                54                        54

(a) Includes time spent on reports to be regraded.
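As an illustration only, a pairing of this sort can be sketched with a greedy nearest-neighbor rule (not necessarily the exact rule used here):

```python
def match_students(control_scores, experimental_scores):
    """Pair each control student with the closest unused experimental
    student by first semester score (greedy sketch; the scores could be
    combined written-plus-practical totals)."""
    available = list(experimental_scores)  # candidate pool from the larger group
    pairs = []
    for score in control_scores:
        nearest = min(available, key=lambda s: abs(s - score))  # closest match
        available.remove(nearest)  # each experimental student is used only once
        pairs.append((score, nearest))
    return pairs
```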
Discussion

As seen in the table, the change in workload Wx - Wc was 30 - 71 = -41 hr, and the change in exam score Sx - Sc was 41.1 - 47.1 = -6.0. The corresponding ratios are R1 = 47.1/71 = 0.66 and R2 = 41.1/30 = 1.37, giving a success factor p of about 2. The experimental group had a lower achievement level, but there was a striking gain in efficiency because of a drop in workload. Thus, the efficiency criterion rates this curriculum experiment as "successful," the achievement criterion taken alone as "unsuccessful."

The data on which the evaluation is based are not completely unambiguous. The achievement improvement in the traditional group may reflect the additional study and preparation time reported by the group. It could also reflect a placebo effect because of the change of grading systems at the end of the first semester, experienced only by the traditional group. It may reflect differences in temperament or motivation. It may well be that the achievement gain is more related to these items than to the change in grading system, in which case the true increase in efficiency is even higher.

The workload figures must also be interpreted with some caution. They are based on the whole class undergoing the experimental program, not that reported by the matched sample group. The differences between those in the two courses have caused relatively minor differences in reported workload between the two classes in past years, although achievement levels have always been strikingly similar. But in this case the differences in reported workload are much greater, and focussed on the reports.
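In code, using the function sketched earlier with the table's values (our arithmetic):

```python
p = success_factor(s_c=47.1, w_c=71, s_x=41.1, w_x=30)
print(round(p, 2))  # ~2.07: well above one, so the change rates as efficient
```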
We must prove that a test of any sort can be a valid measure of achievement in a lab course. One cannot measure achievement in the abstract, so this question has a metaphysical quality. But individual test scores can be compared with corresponding lab report grades. We found a close relationship between the two, with a correlation coefficient of 0.441. This indicates at the 99.5% confidence level that the grades are correlated. We also examined the differences between z scores (mean = 0, standard deviation = 1) on the lab reports and on the exam. The distribution was normal with the exception of a half dozen students who did very much better on the exam than on the reports. The median difference in z score was 0.5, not large enough to make a real difference in course grade, and the lab reports and the exam grades clustered closely. These arguments show that most students will get the same grade in the course no matter which system is used. They do not show that the two measure the same thing. This is not the place to argue the philosophical and normative questions of what a student is to learn from the course. Rather we conclude that if reports are traditionally considered to be a satisfactory grading method, then the exam must also be satisfactory.

It is, however, strange that the exceptions were all on the side of high exam grade. Most of these cases were bright but easygoing people who did not put the effort into the reports. It is our feeling that report grading favors the "plodder" who impresses the TA with volume, and the exam is a better measure of scientific comprehension. A few people seem to be able to understand the experiments without making a massive effort on the reports. This is probably satisfactory from an educational point of view. It is worrisome that these people, who are probably brighter than average, would be hurt under a grading system where reports are heavily weighted.
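These checks are simple enough to repeat at any institution; a sketch of the two comparisons follows (the score lists are placeholders for per-student data, which we do not reproduce here):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def z_scores(xs):
    """Standardize a score list to mean 0, standard deviation 1."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

# r = pearson_r(report_grades, exam_scores)   # report_grades, exam_scores: placeholders
# diffs = [ze - zr for ze, zr in zip(z_scores(exam_scores), z_scores(report_grades))]
# statistics.median(diffs) gives the median z-score difference quoted above
```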
Student reaction was interesting. Our students were not enthusiastic about either system. In the first semester, it seemed to them to be heresy to have an exam in a lab course, and it was because of the protests that the evening group was returned to a grading system based partially on reports. However, as soon as they received their first reports back and saw that the grades based on the lab reports were not going to turn into sure A's, the reversal in student opinion was virtually unanimous in favor of the experimental system as the lesser of two evils.

The workloads reported were very high for the traditional group. However, the reports escalated in length to such a degree that the reported workload is not unreasonable. The difference in the reports was visible even to a casual observer. The extra time did not go into learning science; it went into such time consuming things as second drafts, typing, and multicolor graphs. Students never seemed to learn that these things were not rewarded. If the purpose of the report is to learn how to reduce data to a conclusion, these things are clearly tangential.

There are a number of decided administrative advantages in a test based grading system. One is that cheating is easily controlled in a test setting. The instructor is not called on to make the subtle judgement of when collaboration between students writing reports turns into cheating. Also one does not have to worry that students who hand in reports late get an unfair advantage. The differences between TA's do not affect the course grades, and no section corrections are needed. The TA who is unable to give a low grade even to a blatant incompetent is not tempted. The test produces a uniform and readily observable standard of achievement throughout the course. Also the fact that students must fix up reports with errors forces them to heed TA comments. TA's are less likely to overlook errors, since uncaught errors might hurt a student during the exam. It takes time to learn to grade well; one is not born knowing how. The quality and maturity of TA's is so variable that the course would not have uniform objectives, or a uniform standard of achievement, without the test. The TA's seem to like the experimental system better, in spite of the extra work made by the rechecks. They felt that they knew more of each student, and they disliked the agonizing decisions over half points that were inevitable in the traditional system.

Lab reports are good learning tools, but if they are used for grading, the result is a poor use of student time. A test based system offers a way of assessing scientific prowess and achievement that is efficient, fair, nondiscriminatory, and emphasizes objective criteria of achievement. While the amount learned declined slightly under a test based system in this study, the saving in time offsets the reduction in knowledge. More work could be scheduled in the time saved, to bring up the achievement level.

Our conclusion, that test based systems are more efficient in lab courses, should not be extended in a blanket way to other institutions. But the results would probably be the same wherever academic pressure is high and students highly motivated. This study is cited as an example, and as a method that could be applied to other educational problems. It is the method that is important, not the conclusion in this particular case.

Many past studies have been done in which curriculum and grading changes have been evaluated solely by looking at test scores. This can be seriously misleading. All of the effects must be examined, and one of the most important effects is the change in efficiency. In any particular case the efficiency may not be important, but nevertheless it should be checked. There is some evidence that it may be safe to ignore efficiency in at least some curriculum changes. In a recent article Bent and Power4 conclude that "you can't win, but you can't lose either." This is perhaps a statement that the majority of changes are neutral from an efficiency point of view. But our study shows at least one case in which efficiency is a major factor and must be taken into account in planning. Helping students to learn the material with the minimum of labor is what teaching is all about, and this requires a study of the efficiency whenever an "improvement" is undertaken.
Acknowledgment

This manuscript was immensely improved by the suggestions of James Hagen and David Monts.

4 Bent, H. A., and Power, J. D., J. CHEM. EDUC., 52, 448 (1975).