Ronald R. Martin
and Kam Srikameswaran
University of Western Ontario
London, Ontario, Canada
Correlation between Frequent Testing and Student Performance
Since the first experiments of Keller (1), a large number of colleges and universities have attempted self-paced, modularized teaching techniques. More often than not, the technique seems to have satisfied teacher and student alike. While small student groups seem ideally suited to this method, some efforts have also been made to use it on larger ones. In chemistry teaching the method was put to use on a group of 81 at the University of Texas at Austin (2), and there may be other, larger groups of which we are unaware.

In the fall of 1972, at the University of Western Ontario, we had little experience with the Keller method; hence we were reluctant to try it here on groups of 150 to 200 students taking freshman chemistry. Instead we sought alternatives to motivate and improve the performance of large classes, not necessarily equivalent in principle to the Keller method. Eventually we decided to experiment with a frequent testing method, modified to reduce negative reinforcement effects.

Our experiment was based on the premise that frequent testing should motivate students to repetitive and intensive study habits; as a consequence they should perform better. Such a conclusion was arrived at by Dustin (3) in a college psychology course at the State University of New York at Plattsburg. He concluded that students examined weekly did better than those examined monthly, that students tested frequently tend to have less anxiety about examinations, and that they performed better because of the greater amount of time spent on study material.

In our experiment with frequent testing, we minimized possible negative reinforcement from failure in these tests by providing more than one opportunity to pass each test type. The experiment turned out to be simple, inexpensive, and, above all, worth trying out in large classes. The features of our method and the results obtained are described in the following sections.

Method
Two groups of students enrolled in the first-year introductory chemistry course (Chem 20) were the subjects of this experiment. A group of 134 students formed the experimental group, which was subject to the frequent testing method outlined below, and a second group of 170 students formed the control group, on which the method was not used. The material covered in the course (5) and the tutorial and laboratory sessions were identical for both groups. A common lecturer¹ held classes for the control group in the mornings on Mondays and Wednesdays and held classes for the experimental group in the early evenings² on the same days. A conscious effort was made by the lecturer to keep his presentations identical.

The frequent testing procedure was conducted in the following way. Close to the end of each Wednesday evening lecture, a four-item (occasionally three-item) multiple-choice test was administered to the class on the material covered in the previous week. Invariably these tests were evaluated by the computer program MARKEX (6) on the same day and the results announced the following day. Students who obtained 75% or better were declared passed and were not tested again until the following Wednesday's test on new material. Those scoring below 75% were not penalized but were tested on the same material on the following Monday, at the end of the Monday evening lecture session, with a different four-item test. Again this test was evaluated similarly and the results announced the next day. Students of this rewrite group who scored 75% or better were passed on to the subsequent Wednesday evening test on new material. Those who scored less than 75% were tested again on Wednesday on the same material and evaluated. This re-rewrite group wrote their third and final test (as well as the first test on new material) on the same Wednesday. Those who failed to achieve a grade of 75% were considered to have failed that test. Each student, therefore, effectively had three opportunities to improve his mark before he was penalized with a failure.

It was found possible to conduct 14 sets of these Wednesday evening tests; the total number of tests conducted in the year (September 1972 to April 1973) was 42. The control group was not subjected to these tests. One of the authors³ was available to students of the experimental group for tutorial assistance on these tests three afternoons a week. Such help was not denied to students of the control group (B) if they needed it.

Each of the experimental and control groups was tested in 12 laboratory sessions, 10 tutorial sessions, 3 term examinations (in November, January, and March), and a final examination at the end of April. The final mark was weighted thus:

Testing Mode                       Maximum Marks
Lab work                                 15
Tutorials                                15
Best 2 of the 3 examinations             30
Final examination                        40
                                        ---
Total                                   100

To provide a "mark" incentive to write the evening tests, the average mark for the fourteen test sets was used to make up 15% of the total mark, after reweighting the lab work, tutorial, and examination marks proportionately. This procedure was, of course, used only for the experimental group. To increase the impact of the frequent testing method, students were told that a failure in a test would count as a zero in that test; at the end of the year, however, the policy was relaxed and the mark actually obtained in each test was used in the computation.

The results obtained would have been more significant if we had had access to matched groups to experiment with; however, the mechanics of obtaining matched groups were too formidable for us to attempt. We assume that the random processes of registration gave us uncorrelated groups.

¹ Ronald Martin.
² Classes were scheduled by the registrar on the basis of a student's total timetable, the result being that the students were unable to select which class they would enter, which helped ensure random groups.
³ Kam Srikameswaran.
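The Method section treats the two groups as uncorrelated rather than matched, so a comparison of their mean scores calls for an unpaired two-sample t statistic. The following is a minimal sketch of such a calculation, not a record of the computation actually carried out. The function names are hypothetical, and the two-tailed probability uses a large-sample normal approximation in place of a printed t table. The example inputs are the final-examination means and standard deviations from the table in the Results section, with the approximate group sizes of 130 and 170 quoted there.

```python
import math


def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unpaired t statistic for two independent groups with separate variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def two_tailed_probability(t):
    """Two-tailed probability from the normal approximation (assumes large groups)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    # Final examination: Group A (experimental) versus Group B (control).
    # Group sizes are the approximate figures quoted with the table.
    t = two_sample_t(68.18, 17.06, 130, 63.85, 15.31, 170)
    p = two_tailed_probability(t)
    print(f"t = {t:.2f}, approximate two-tailed probability = {p:.3f}")
```

With these inputs the statistic comes out near 2.3 and the approximate probability near 0.02, in line with the tabulated values for the final examination.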
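The mark weighting described in the Method section can be illustrated with a short calculation. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, every component score is assumed to be expressed as a percentage, and "reweighting proportionately" is read as scaling the four original components by one common factor so that, together with the 15% test component, the weights again total 100.

```python
# Maximum marks for the control group, as listed in the Method section.
BASE_WEIGHTS = {
    "lab_work": 15.0,
    "tutorials": 15.0,
    "best_two_term_exams": 30.0,
    "final_exam": 40.0,
}

# Share of the total mark given to the evening tests (experimental group only).
TEST_SHARE = 15.0


def experimental_weights(base=BASE_WEIGHTS, test_share=TEST_SHARE):
    """Scale the base weights by a common factor and add the evening-test share.

    Assumption: 'proportionately' means one common scale factor for all
    four original components.
    """
    scale = (100.0 - test_share) / sum(base.values())
    weights = {name: value * scale for name, value in base.items()}
    weights["evening_tests"] = test_share
    return weights


def final_mark(percent_scores, weights):
    """Weighted total out of 100 from component scores given in percent."""
    return sum(weights[name] * percent_scores[name] / 100.0 for name in weights)


if __name__ == "__main__":
    weights = experimental_weights()
    # Hypothetical student scores, for illustration only.
    student = {
        "lab_work": 80.0,
        "tutorials": 70.0,
        "best_two_term_exams": 60.0,
        "final_exam": 65.0,
        "evening_tests": 75.0,
    }
    print(weights)
    print(final_mark(student, weights))
```

Under this reading the laboratory and tutorial components each carry 12.75 marks, the term examinations 25.5, and the final examination 34, with the evening tests supplying the remaining 15.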
Results
Figures 1-4 show the frequency distributions of marks obtained in each of the November, January, March, and April examinations for the experimental (A) group and the control (B) group. In each of these figures, computed values of skewness (g1) and kurtosis (g2) are given along with values of the mean and standard deviation (σ). It is seen that g1 and g2 do not deviate greatly from the values expected for a normal distribution, g1 = 0 and g2 = 3, even though g1 is indeed different from zero and g2 is smaller than 3. Hence, in our calculations we assumed that these distributions are normal, even if apparently multimodal.

Figure 1. Frequency distribution of marks for Test 1.
Figure 2. Frequency distribution of marks for Test 2.
Figure 3. Frequency distribution of marks for Test 3.
Figure 4. Frequency distribution of marks for Test 4.

Comparison of Performance between Frequently Tested Group A (~130 students) and Control Group B (~170 students)

                         Mean Percentage Score(a)                                Probability of     Interpretation: that the two
Measurement              Group A         Group B         Diff (A-B)  t value(b)  insignificance(c)  groups, A and B, are different
Test 1 (November '72)    66.50 (16.80)   64.05 (16.93)   +2.45       1.27        0.26               74% probability
Test 2 (January '73)     44.53 (19.67)   40.64 (16.51)   +3.89       1.79        0.07               93% probability
Test 3 (March '73)       59.46 (19.19)   56.48 (17.00)   +2.17       1.37        0.16               84% probability
Final Exam (April '73)   68.18 (17.06)   63.85 (15.31)   +4.33       2.27        0.02               98% probability

(a) The numbers in parentheses are standard deviations (σ).
(b) $t = \dfrac{\text{Mean of A} - \text{Mean of B}}{\sqrt{(\sigma_A^2/N_A) + (\sigma_B^2/N_B)}}$
(c) Obtained from Table XI, p. 75 of reference (7).

Journal of Chemical Education, Volume 51, Number 7, July 1974, pp. 485-486

The table summarizes the results and some analysis of our experiment on student performance by the frequent testing method. It is seen that the mean percentage scores of the experimental (A) group consistently exceed those of the control (B) group in each measurement. In order to check whether these differences in performance were indeed meaningful, t values were computed for each measurement from the respective standard deviations and the difference between the mean percentage scores of the two groups. These t values were used to obtain, from standard tables (7), "probabilities of insignificance" for the difference between the two groups. The last column interprets these probabilities and shows that it is meaningful to state that the experimental group performed better than the control group.

It is seen that differentiation (column 4 of the table) of the two groups occurs throughout the academic year. The closeness of the mean scores in the earlier tests probably indicates an initial "high school" effect; that is, comparable skills in course material covered at the Grade 13 level. A good proportion of the first two tests did consist of such material, which supports this premise. Hence, the divergence (4.33%) in the final examination seems to be the best indicator of the effectiveness of our method. The most significant (98%) separation between the two groups occurs in this examination.

We would have been even more pleased if the t values had increased regularly from Test 1 to the final examination, in chronological order. However, the t value for Test 3 shows a decrease to 1.37 from that of Test 2. An untested hypothesis that could account for this decrease is that the performance in Test 2 was poor for all groups in the freshman chemistry course, and this result motivated all groups to put in extra effort to achieve a better score in Test 3. Consequently, the additional motivation was as effective as that provided by frequent testing at that time.

Except in a small number of cases, there is a good correlation between individual mean percentage scores on the frequent tests and the final cumulative percentage scores, which means that students who perform well on these tests perform well overall, a very good reason to stimulate students into performing well in the frequent tests.

In general these tests were well received by a majority (79.3%) of the experimental group. One advantage claimed is that these multiple-choice, computer-marked tests prepare students for the more important grade-determining multiple-choice examinations, which are also computer marked. In addition to being convenient, quick, and objective in evaluation, these multiple-choice items enabled us to test a wider area of the course material and to demand exactness of answers as good practice. The latter was achieved by introducing plausible but incorrect distractors. Furthermore, repetitive testing enabled us to repeat test items with subtle changes, such as a word or a number, so that the correct answers were quite different. An understanding student should have no difficulty in spotting the right answers with these changes, whereas one who relied on the memory of the right answer to the corresponding earlier test item should do badly. We did find that a majority of students depended on such memory work; however, these students were obviously from the rewrite group, who were generally not well prepared.

The above are the most significant results of our experiment.

Conclusions
We conclude that group A performed better than group B, most likely because frequent testing motivates the majority of students to do some extra work. The performance of group A would probably be even better if the weight of these test marks were higher in the final total mark; such a move could promote greater interaction between instructor and student, aiding the learning process.

Literature Cited

(1) Keller, F. S., J. Appl. Behav. Anal., 1, 79 (1969).
(2) White, J. M., Close, J. S., and McAllister, J. W., J. CHEM. EDUC., 49, 772 (1972).
(3) Dustin, D. S., The Psychological Record, 21, 409 (1971).
(4) Martin, R. R., and Srikameswaran, K., to be published.
(5) Baird, N. C., Bidinosti, D. R., Bolton, J. R., Brand, J. C. D., Martin, R. R., and Payne, N. C., "Notes for Chem 20," 1st Ed., University of Western Ontario, London, Ontario, 1972.
(6) Ellwood, D. J., "Markex-A Method of Using the Computer to Score Multiple Choice Examinations," 1st Ed., University of Western Ontario, London, Ontario, 1972.
(7) Smith, G. Milton, "A Simplified Guide to Statistics in Psychology and Education," 3rd Ed., Holt, Rinehart and Winston, Inc., Toronto, 1962, p. 75.