Standard tests for the evaluation of student achievement - Journal of

Presents the results of a study designed to determine how closely the percentile norms of standardized tests coincide with an individual instructor's ...

STANDARD TESTS FOR THE EVALUATION OF STUDENT ACHIEVEMENT

GEORGE E. F. BREWER
Marygrove College, Detroit, Michigan

If it is assumed that the letter-grade distribution in large classes of undergraduate students should approach 10 per cent A, 20 per cent B, 40 per cent C, 20 per cent D, and 10 per cent F, then we have to recognize the fact that the deviations from any such curve must be large when the number of students in the class is small, since these percentages are arbitrarily proportioned to the various letter grades. The problem of yearly fluctuation in the class grade average particularly faces teachers in smaller schools. Even in large schools class achievements may vary to such an extent that they can stay in the mind of a teacher for as long as 30 years. For example, Dr. William T. Hall wrote in 1944, when questioned about certain analytical chemical methods: ". . . In 1914, Noyes, Blanchard and I introduced a new procedure in qualitative analysis at The Massachusetts Institute of Technology. The results of the session were remarkable. In a class of about 50, about 5 failed. . . . We had a great many remarkably good reports . . . with respect to the interpretation of results on a practical basis, the class was terrible. . . ."

It is the right and duty of the instructor to allow for yearly fluctuations in class achievement according to his experience, but many teachers would like to compare their evaluation with some standard from outside their own classroom. Definitely, some powerful tools for the evaluation of class achievement are the A. C. S. cooperative tests.¹ Committees of well known teachers of chemistry develop these multiple-choice machine-scoreable tests every one or two years in many fields of instruction. These cooperative tests, when they are of the five-choice single-answer type, are scored by the "right minus one-fourth wrong" (R - W/4) formula, to prevent the influence of wild guessing, which, by the law of averages, would yield one right answer at every fifth attempt. The Examinations Committee requests all instructors to report the grades of all the students; these are then pooled and evaluated statistically. Percentile norms are then published setting the average R - W/4 score at 50 per cent, allowing the evaluation of an individual student in terms not only of his own score but in comparison to students from a group of schools. Immediately the important problem arises as to what extent such intercollegiate comparisons can be made and how closely the test results expressed in the percentile norms will coincide with an individual instructor's appraisal of a particular student. The present study has been undertaken in a four-year liberal arts women's college over a number of years

¹ Tests obtainable from the Examinations Committee, American Chemical Society, St. Louis University, St. Louis, Missouri.
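The "right minus one-fourth wrong" rule described above can be illustrated with a short sketch. The function names and the sample numbers are illustrative assumptions, not part of the original article; the only thing taken from the text is the R - W/4 formula for five-choice items and the observation that blind guessing yields about one right answer in five.

```python
def corrected_score(right: float, wrong: float, choices: int = 5) -> float:
    """Guessing-corrected score: R - W/(choices - 1).

    For five-choice items this is the R - W/4 formula quoted in the text.
    Omitted items count neither as right nor as wrong.
    """
    return right - wrong / (choices - 1)


def expected_guess_score(items: int, choices: int = 5) -> float:
    """Expected corrected score of a student who guesses blindly on every item."""
    right = items / choices   # one right answer in every `choices` attempts
    wrong = items - right     # all remaining guesses are wrong
    return corrected_score(right, wrong, choices)


if __name__ == "__main__":
    # Hypothetical student: 40 right, 8 wrong, remaining items left blank.
    print(corrected_score(40, 8))
    # Blind guessing on a hypothetical 50-item test.
    print(expected_guess_score(50))
```

Under these assumptions a student who guesses on every item is expected to gain nothing from the guesses, which is the purpose the text gives for the correction.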

JULY, 1954

and includes all members of the classes in qualitative and quantitative analysis. Each of these classes was in session for one semester (16 weeks of instruction: two 3-hour laboratory periods and two lecture-recitations per week). The classes were given five or six written 45-minute tests during a semester. About one-half of each test was devoted to newly presented material, while the other half reviewed the field. The tests were evaluated in letter grades, which were translated into percentiles later for the purpose of the study, setting (arbitrarily but according to usual practice) A at 100-90; B, 90-70; C, 70-30; D, 30-10; F, 10-0.

Figures 1 and 2 show the average grade received by each student plotted versus the percentile score made in the A. C. S. cooperative test, which was given as part of a final examination. In a case of ideal correlation all the points in Figures 1 and 2 would fall on the diagonal line. Large deviations from the diagonal mean that the instructor has assigned a certain letter grade to the student but that the score on the A. C. S. cooperative test indicates a different letter grade. For certain small deviations from ideality the letter-grade limits may not be overstepped. These limits are indicated in Figures 1 and 2 by the two bell-shaped curves on each side of the diagonal line. In other words, we assume that a student's grade is determined by the instructor with a precision of one-half of a letter grade, a precision of ±5 per cent on abscissa and ordinate for A, ±10 per cent for B, and so on. Of a total of 104 points in Figures 1 and 2, only 13 points are outside these precision limits and an 87 per cent correlation can be concluded.

[Figure: Quantitative Analysis (5 consecutive years). Legend: symbol, test norm; years legible in the legend include 1948, 1950, and 1951. Total: 51.]

[Figure: horizontal axis, time. Legend: qual. analysis co-op. test; instructor's evaluations; instructor's average; 90% confidence limit.]

A more individual comparison between the A. C. S. cooperative test results and the instructor's appraisal can be attempted from the following considerations: suppose student R received during a semester the following marks: B-, B+, B, B+, B+ (average 77 per cent); while student M showed C, C, C+, B-, B-, A- (average 65 per cent). In the former case we would be fairly justified in thinking that any other test would bring a result close to 77 per cent and the same prediction should hold true for the A. C. S. cooperative test. In the case of student M the result in

a

JOURNAL OF CHEMICAL EDUCATION

the 65 per cent average. These considerations lead into the statistical field of the confidence limits of averages. These confidence limits can be computed as $\pm ts/\sqrt{n}$ [. . .] estimate or $\sqrt{\Sigma d^2/(n-1)}$, where n is the number of observations and d is the deviation of an observation from the average.

Figures 3 and 4 show test results for 1951 classes. The plot also contains the averages and 90 per cent confidence limits of the averages. In the case of the above-mentioned student R (see quantitative analysis class, Figure 4) there are nine chances out of ten, according to the statistical computation, that in any number of tests the average result will be within the limits 77 ± 4 per cent or within the brackets of 73 to 81. Similarly, for student M we are now 90 per cent confident that the average would be between the limits 65 ± 12 per cent (see qualitative analysis class, Figure 3). Since we have about three times more precise information regarding student R than we have on M, we should be more rigorous in comparing the cooperative test result with this student's grade average. The 90 per cent confidence limits of the averages were therefore computed for all of the 104 observed cases, and the findings are reported in the table.

[Table: columns are Year, Test no., Cases, and A. C. S. test within confidence limits. The individual rows are not recoverable. First section total: correlation 87%. Quantitative Analysis total: 41 (adjacent figure illegible); correlation 81%.]

In a total of 104 cases the cooperative test result was 87 times within the expected limits, indicating an 84 per cent correlation. This is exceptionally good and about as much as could be expected, for two reasons: first, the cooperative tests attempt to measure the material "common" to all the courses throughout the country. The "uniqueness" of each course is, of necessity, ignored. Second, the tests are limited to rec
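The confidence-limit computation discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the usual Student's t form of the limits of an average (t times the standard deviation estimate, divided by the square root of n), the t values are the two-sided 90 per cent entries from standard tables, and the function names and sample marks are hypothetical. The article does not give the numerical value assigned to each letter mark, so the sketch does not attempt to reproduce the 77 ± 4 or 65 ± 12 figures.

```python
import math

# Two-sided 90 per cent Student's t values from standard tables,
# keyed by degrees of freedom (n - 1). Covers 2 to 10 observations.
T_90 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
        6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833}


def confidence_limits_90(marks):
    """Return (average, half_width) of the 90 per cent confidence limits."""
    n = len(marks)
    average = sum(marks) / n
    # Standard deviation estimate: sqrt(sum of d^2 / (n - 1)),
    # where d is the deviation of each observation from the average.
    s = math.sqrt(sum((x - average) ** 2 for x in marks) / (n - 1))
    half_width = T_90[n - 1] * s / math.sqrt(n)
    return average, half_width


def within_limits(test_percentile, marks):
    """True if an outside test result falls inside the student's 90 per cent limits."""
    average, half_width = confidence_limits_90(marks)
    return abs(test_percentile - average) <= half_width


if __name__ == "__main__":
    # Hypothetical percentile marks for one student over a semester.
    marks = [70, 80, 90]
    average, half_width = confidence_limits_90(marks)
    print(f"{average:.0f} +/- {half_width:.1f}")
    print(within_limits(85, marks))
```

A student with tightly grouped marks gets narrow limits, so the outside test result is held to a stricter comparison, which is the point the text makes about students R and M.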