Examination practice in general college chemistry. Quality of questions

Examination practice in general college chemistry. Quality of questions. B. Clifford Hendricks. J. Chem. Educ., 1944, 21 (2), p 85. DOI: 10.1021/ed02...
Examination Practice in General College Chemistry
Quality of Questions
B. CLIFFORD HENDRICKS
University of Nebraska, Lincoln, Nebraska

Some time ago an attempt was made to characterize college chemistry examinations by an analysis of the questions which teachers were actually using.¹ That analysis included types of objectives probed by the questions, the degree of emphasis of each objective, a comparison of the stress of the same objectives in five widely used textbooks in general chemistry, and a survey of the types of questions used.

It is not possible to know the quality of examinations from an inventory such as that described above. Examination questions, like personal excellences, reveal their qualities by their outcomes: "by their fruits you shall know them." One should look for evidence of a question's successful performance in the responses made to it by the students who try to answer it.

In response to a request, published² in 1940, a number of institutions³ volunteered cooperation in an effort to get at the quality of examination questions by the analysis of students' answers to the questions used in examinations. Dr. John C. Flanagan has described desirable test items by saying, "There are two very important considerations in any selection of test items. The first is item validity, or discriminating power, and the second is item difficulty."⁴ Through an analysis of the answer sheets made available by the institutions just mentioned, the answers were studied for these considerations.

The procedure used in determining the validity of a given question was as follows: All papers of the given examination were piled in an order determined by the total grade upon the paper. From this pile the upper 27 per cent of the papers were removed and the median grade⁵ for answers for each question for that upper section was found. In a similar manner the papers for the lower 27 per cent of the grades were removed and the median for the grades on answers for each question of that group found. In the tabulation of these findings all scores are on a ten-point scale. If the median score on answers for a question for the lower 27 per cent differs from that on the same question for the upper 27 per cent by one point or less, that question's validity is listed as "low." If the median for the answers for the lower group was found to be higher than that of the upper, the question is classified as an "inversion." A question is cataloged as "good" if its median for difficulty for the whole set of papers was found to be five and the median score of the upper 27 per cent was two or more points higher than that for the lower 27 per cent. All questions whose median for difficulty was found to be very near ten are listed as "too easy."

By the sort of analysis just described, answer papers for 1300 or more students from the institutions just named were studied and the validity and difficulty of each question of the examinations were determined. The number of papers for any one examination varied from 34 in the smallest class to 315 in the largest. In all, a total of 108 questions passed through the procedure and are characterized in the table below.

¹ HENDRICKS AND HANDORF, "Examination practice in general college chemistry," J. CHEM. EDUC., 15, 178 (1938).
² HENDRICKS AND SMITH, "Better new examinations from old," J. CHEM. EDUC., 17, 583 (1940).
³ Coe College (Iowa); Marshall College (West Virginia); Middlebury College (Vermont); Montana State College; North Texas A. and M. College; Omaha University (Nebraska); Rutgers College (New Jersey); and William and Mary (Virginia).
⁴ FLANAGAN, "General considerations in the selection of test items," J. Educ. Psychology, 29, 874 (1939).
⁵ HENDRICKS AND HANDORF, "New examinations from old," J. CHEM. EDUC., 16, 332 (1939).

TABLE 1
QUALITY OF GENERAL COLLEGE CHEMISTRY EXAMINATION QUESTIONS

Kinds of Questions                         Per Cent of Total Questions

Too easy
Low validity
No validity, i.e., no differentiation
Total faulty questions
Usable but not "good"
"Good"
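The sorting and labeling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's own tabulation method: the function name `classify_questions`, the cutoff of 9.5 for "very near ten," the tolerance of one point around a median of five, the order in which the labels are tested, and the sample scores are all hypothetical. The total grade on a paper is assumed to be the sum of its question scores.

```python
from statistics import median

def classify_questions(papers, fraction=0.27, easy_cutoff=9.5, good_band=1.0):
    """Label each question from per-student scores on a ten-point scale.

    papers: list of papers, each a list of question scores (0 to 10).
    easy_cutoff and good_band are assumed thresholds, not taken from the article.
    """
    # Pile the papers in order of total grade, highest first.
    ranked = sorted(papers, key=sum, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    upper, lower = ranked[:k], ranked[-k:]

    labels = []
    for q in range(len(ranked[0])):
        difficulty = median(p[q] for p in ranked)  # median over all papers
        gap = median(p[q] for p in upper) - median(p[q] for p in lower)
        if difficulty >= easy_cutoff:
            label = "too easy"
        elif gap < 0:
            label = "inversion"            # lower group outscored upper group
        elif gap == 0:
            label = "no validity"          # no differentiation at all
        elif gap <= 1:
            label = "low validity"
        elif gap >= 2 and abs(difficulty - 5) <= good_band:
            label = "good"
        else:
            label = "usable but not good"
        labels.append(label)
    return labels

# Hypothetical scores for ten students on six questions.
sample_papers = [
    [10, 9, 3, 5, 6, 10],
    [10, 8, 3, 5, 6, 10],
    [10, 8, 3, 5, 6, 10],
    [10, 6, 4, 5, 5, 8],
    [10, 5, 4, 5, 5, 8],
    [10, 5, 4, 5, 5, 8],
    [10, 5, 4, 5, 5, 8],
    [10, 2, 6, 5, 5, 7],
    [10, 2, 6, 5, 5, 7],
    [10, 1, 6, 5, 5, 7],
]
labels = classify_questions(sample_papers)
print(labels)
```

With these made-up scores the six questions fall, in order, into the classes "too easy," "good," "inversion," "no validity," "low validity," and "usable but not good." A teacher applying the sketch to real answer sheets would need to choose the two thresholds deliberately, since the article gives them only in words.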

Thus, in terms of the criteria used, 44 per cent of the questions used were either faulty or at most did not contribute greatly in giving the information which the examiner was presumably seeking.

Faulty questions may be made more efficient by revision. Evidence has shown that attention to the quality of questions does improve them. The following report from an attempt to "do something about it" by one division of general college chemistry is offered. Prior to 1939 those in charge of the course paid no particular attention to the quality of questions judged as in Table 1. Analysis of answer sheets from the students for that period prior to 1939 gave the results listed in column I of Table 2. Since 1939 a more careful editing of examination items used has brought about a quality change indicated in column II.

TABLE 2
EFFECT OF REVISION ON THE QUALITY OF EXAMINATION QUESTIONS

                                   Per Cent of Total Questions
Kinds of Questions                      I          II
Too easy                               27          17
Low validity                           18           2
Total faulty questions                 45          19
Items usable but not "good"            14           9
"Good"                                 41          72

Methods for achieving revision have been suggested in a paper previously cited. More attention to improved essay examinations for chemistry has been presented in another paper.⁶ Dr. Foster's⁷ description of the construction of the Cooperative Chemistry Tests illustrates the thoroughness with which such an instrument has the nondiscriminatory items edited out of it. That paper also gives suggestions for making cooperation between many individuals effective in test construction. In it the idea of an "[…]" for use in a local institution as a means of improving the validity or controlling the degree of difficulty of a proposed examination is implicit.

⁶ FRUTCHEY AND HENDRICKS, "The essay examination in chemistry," J. CHEM. EDUC., 16, 491 (1939).
⁷ FOSTER, "The 1939-40 college testing program," J. CHEM. EDUC., 18, 159 (1941).

Attention should be called to some limitations to the […] which have been used to locate the […] in this analysis. To some enthusiastic teacher a desirable index of validity may be, mistakenly, assumed to be a ticket to test-item perfection. It should be remembered that the dependability of an index is no more than that of the basis used in grouping the students for its determination. In the procedure previously described² the assumption was made that the grouping of students by means of scores on the whole test is a valid method for grouping students into best and poorest for that part of chemistry which any individual item is seeking to test. A little thought raises doubts concerning this assumption. Most chemistry examinations are made up of many items, among them arithmetic problems and descriptive paragraphs. Any experienced teacher knows that a grouping of students into best and poorest in terms of their success in the arithmetic of chemistry would not be duplicated if the basis of grouping were the descriptive sections of the examination. The careful teacher will have this element of the validity's uncertainty in mind as he uses it in passing judgment upon the quality of test items. He will know that the validity will be much more significant if the basis or norm for student grouping has all of its elements of high comparability with the elements in the item under analysis.

Many teachers are in disagreement with the usual practice, in test construction, of discarding items for which all students make correct responses. Some become emotionally eruptive when told that "an item is more likely to have high discriminative power if it has an average or median score of five on a ten-point scale." Their answer to that statement is, "We do not care about the discriminative power of some of our items. We want to know whether the students do or do not know the answer." In other words, these teachers set for their students an ideal of mastery in certain phases of their course and are not satisfied by less than perfect responses for items having to do with those topics.

It is not presumed, in this presentation, that data from eight or nine institutions represent an adequate sampling from which valid conclusions concerning examination practices for colleges and universities throughout our nation may be made. However, just as it is possible to estimate the approximate direction of the wind from a few simple tests, so this sampling, even though inadequate, may point to useful inferences concerning examinations. Meanings so inferred, however, should be interpreted with the knowledge that those who participated in this project are probably more sensitive to examination needs than the many others who were too busy with other matters to give much attention to improved examinations. On the other hand, may it not be assumed that the quality of these examinations which were analyzed is above that of the larger number which were not in the study? If so, then there is still much to be done if college chemistry examinations are to have a large per cent of their questions eligible for the label "good."

There are, to be sure, other characteristics of examination questions such as objectivity, practicality, and reliability. These, however, are more intimately connected with the form or administration of the test than with the inherent qualities of the individual items. For definite help in revising or editing individual items of a test the most helpful single quality of the item is its validity, which in turn is dependent on item difficulty. Limited evidence available points to a need for more attention to this quality of chemistry examination items now in use. There is reason to think that individual teachers may find rather gratifying results coming from their conscious attention to these two qualities of their test items. They may at least find and discard those space- and time-consuming items which contribute nothing to the test's objectives.
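The caution raised earlier about the basis of grouping can be made concrete with a small Python sketch. It computes the upper-minus-lower median for one item three times, grouping the same students by an arithmetic subscore, by a descriptive subscore, and by their sum. The function name `discrimination_gap` and every score below are hypothetical, invented only to show that the index can change sign when the grouping basis changes; nothing here comes from the data of the study.

```python
from statistics import median

def discrimination_gap(item_scores, basis_scores, fraction=0.27):
    """Upper-group median minus lower-group median for one item.

    Students are ranked by basis_scores (the grouping norm), highest first,
    and the top and bottom `fraction` of them form the two groups.
    """
    order = sorted(range(len(basis_scores)),
                   key=lambda i: basis_scores[i], reverse=True)
    k = max(1, round(fraction * len(order)))
    upper = [item_scores[i] for i in order[:k]]
    lower = [item_scores[i] for i in order[-k:]]
    return median(upper) - median(lower)

# Hypothetical subscores for ten students (ten-point scale).
arithmetic = [10, 9, 9, 7, 6, 5, 4, 3, 2, 1]
descriptive = [3, 5, 2, 9, 10, 1, 8, 6, 4, 7]
whole_test = [a + d for a, d in zip(arithmetic, descriptive)]

# Hypothetical scores on a single arithmetic item.
item = [9, 9, 8, 7, 6, 5, 4, 3, 2, 2]

gap_by_arithmetic = discrimination_gap(item, arithmetic)
gap_by_descriptive = discrimination_gap(item, descriptive)
gap_by_whole_test = discrimination_gap(item, whole_test)
print(gap_by_arithmetic, gap_by_descriptive, gap_by_whole_test)
```

In this invented case the item separates the groups by seven points when students are ranked on the arithmetic sections, by five points when ranked on the whole test, and shows a two-point inversion when ranked on the descriptive sections. The same item can therefore look "good" or faulty depending on how comparable the grouping basis is to the item itself.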