Better new examinations from old - American Chemical Society

Better New Examinations from Old¹
B. CLIFFORD HENDRICKS, University of Nebraska, Lincoln, Nebraska
OTTO M. SMITH, Oklahoma Agricultural and Mechanical College, Stillwater, Oklahoma

Recently fifty or more teachers of college chemistry have favored the authors of this paper with comments and criticisms on a plan of: (1) collecting from many institutions their best examination questions; (2) assembling them in a central depository; (3) determining their difficulty and differentiating ability; (4) classifying and duplicating them for ready availability for use by teachers in service.

These teachers are working in the smaller colleges. Their classes in general chemistry vary all the way from 15 members to 566, the median number being 85. The reason for seeking counsel from these teachers is that they are less likely to have facilities or time for the study and evaluation of their test questions than do workers in larger institutions. Further, groups of students fewer in number than 190 give less dependable indices of difficulty and differentiating ability than do groups of larger numbers when studies are made of their answers to tests.

Some uncertainties regarding the project as expressed in comments by these interested teachers warrant attention. In brief these "fears" may be labeled: their loyalty to essay questions, their belief that objectives of the local course may not agree with those of other institutions, that items in the collection may not cover all topics, that students might get access to the collection of questions, that such a procedure might lead to fixation of course content, and that it is too much like a "sermon barrel." Almost without exception, these comments indicate a failure of the proponents of the project to make themselves understood. There has been no statement that the questions to be studied have to be of any particular type.²

¹ Presented before the Division of Chemical Education at the 99th meeting of the American Chemical Society, Cincinnati, Ohio, April 8, 1940.
² HENDRICKS AND HANDORF, "New examinations from old," J. CHEM. EDUC., 16, 330 (1939).

Further, previous statements have said that this project would make possible an adherence to the objectives of the local teacher that is not possible in standardized long-period examinations such as the annual releases from the Cooperative Test Service. The teacher who is interested in the use of the service contemplated by this project inspects its offerings of questions and chooses only those items for inclusion in his examination which are in harmony with his objectives. In the event that there are items which he deems desirable in the examination that he is constructing which are not yet in the collection, that teacher would, for a part of his test, use his own items to meet the requirements.

If the procedure suggested above is followed in making up examinations, the authors see less likelihood of course content fixation than is now occurring because of the use of a given text or a given manual. In fact, the plan attracted the authors because it would enable teachers better to control the direction of their courses by this method of examination than by use of forms of examination now published.

True, there is a resemblance between a magazine of "guinea pig" questions and a "sermon barrel," but in the judgment of the authors the danger is mostly in the attitude of the teacher who uses the "barrel." The term "guinea pig" has come to symbolize a biological tool for use in experimentation. If our teachers may be induced to look upon their examination questions or questions from any other source as tools for use in experimentation, the objectionable connotation of "sermon barrel" will be lost.

There are two ways of considering the possibility of students securing copies of questions from the magazine in advance of the examination. The first is that the likelihood of that occurring for this project should be no greater than for other published test forms.
Second, one of the authors has taken the position that if his students get and learn the answers to all of his 1000 or more test items those students probably have an acceptable mastery of his course. One wonders if this fear could not be allayed, at least partially, if the total examination spread were increased, i. e., if a wider sampling of course content were always insisted upon. By so doing the teacher could minimize the effect of any single item upon the total score.

The use of the adjective "better" in the title of this paper is considered from two points of view: first, better in the sense of being more useful to the teacher; and second, better in the sense of being more dependable as tools of evaluation.

In considering the first of these, it is not intended that a survey of test uses³ be made, but rather to present some brief comments concerning the service of old examinations in improving the usefulness of new questions made from the old. In considering this service of old examination questions the authors are assuming that the major value of an examination is its contribution to student counseling.

There may be little that is new to the experienced teacher in the statement: "The most usable phrasing of an examination question often comes through a sort of evolution." To illustrate: This question was used by one of the authors in an examination last fall:

CASE I, 1. Phrasing for its first use.
Define each of the following by a formal definition: (1) An acid. (2) An indicator. (3) A salt. (4) A double decomposition reaction.

While scoring the answers from his students the reader was impressed by the fact that low scores on the answers were often due to a failure to know the elements of a formal definition. A check indicated this to be the case for over fifty per cent of the class of 137 students. In an examination some weeks later he tried the following question upon the same class, in which specific attention was called to the meaning of the term formal definition:

CASE I, 2. A rephrasing for second use.
Write formal definitions for each of these four terms: (1) Formula. (2) Mol. (3) Neutralization. (4) Allotropes. (A formal definition has three parts: the name of the term defined, its classification, and the distinction between it and others of its class.)

A later attempt to recognize the same weakness of the statement of the first question above was to state it:

CASE I, 3. A third rephrasing.
Give formal definitions of: (1) Definition. (2) A base. (3) Electrolysis. (4) An equation.

In this case calling for the definition of a definition prompts more pointed attention to what constitutes an adequate formal definition before the student starts definitions of any of the other terms. This discussion of the term, definition, should bring to mind many other terms⁴ much used in essay type examination such as: contrast, illustrate, interpret, justify, etc. Their influence upon test results deserves attention similar to that given the word "definition" above.

Most teachers of chemistry consider the habit of recognizing use in the relation to the properties of a substance an important objective and will often include questions like the following:

CASE II, 1. First phrasing.
Briefly describe three uses of oxygen. Name the property involved in each case and show how that property makes the use possible.

In reading answers to this question, the problem of what constituted acceptable uses emerged. Oxygen is used chemically in any reaction in which oxygen is a reactant. Should the student who said, "Oxygen may be used to rust iron," be given as much credit for his answer as the one who told of its use in the acetylene torch? The decision reached was that the next time this question is used it will be in the revised form:

CASE II, 2. A rephrased statement.
Briefly describe three uses of oxygen which are of economic importance. Name the property involved in each case, and show how that property makes the use possible. To be of economic importance, there should be a cash market for oxygen for such use.

It was found that reading answers to the questions stated as the last two was time-consuming, so an effort was made to modify the form in order to reduce the time for reading. The result for such a question for hydrogen is as follows:

CASE II, 3. A more practical statement.
In five only of the following blanks give the property of hydrogen or its use as the case may require. The use must depend on the property or the property must determine the use in each case.

    Property                              Use
    (1) High heat of burning              (1) ____
    (2) ____                              (2) In balloons
    (3) Combines with some oils           (3) ____
    (4) Reacts with oxygen in oxides      (4) ____
    (5) ____                              (5) In preparing ammonia
    (6) ____                              (6) In preparing wood alcohol

When a comparison of the answers obtained in the two latter forms was made, there was no significant difference in validity or difficulty; the former took three hours to score, and the latter but two hours. The latter question-form merits favorable consideration from the standpoint of practicability.

³ [first author illegible] AND SMITH, "Service tests for chemistry," Sch. Sci. Math., 35, 488-91 (1935); HENDRICKS AND FRUTCHEY, "The uses of examinations," J. CHEM. EDUC., 15, 237-40 (1938).
⁴ HENDRICKS AND FRUTCHEY, "The essay examination in college chemistry," J. CHEM. EDUC., 15, 179 (1938).
⁵ HENDRICKS AND HANDORF, "Examination practice in general chemistry," J. CHEM. EDUC., 16, 493 (1939).

Numerical problems⁵ are a frequent item in college chemistry examinations. A study of answer papers for such items sometimes reveals reading practices which are hard to defend if the purpose of these questions is to help the teacher find student need for remedial instruction. To illustrate, the following problem may be used:

CASE III, 1. Usual type of chemical problem.
Hydrogen has a density of 0.09 gram per liter at standard conditions. Phosphoric acid is 3.1 per cent hydrogen. Enough sodium (amalgam) was put into phosphoric acid to produce 25 liters (S.C.) of hydrogen. What weight of acid is needed?

An analysis of this problem reveals some six or eight elements of difficulty for the beginning student, among which may be listed: the reaction involved, knowledge of chemical composition, and working concepts of density, per cent, decimals, division, multiplication, and subtraction. Such a question is often scored purely upon the basis of the correct numerical result, which is 72.6 grams. A zero grade may mean lack of understanding of any one or more of the above listed elements of difficulty. The score on such a question, to be useful to teacher and pupil, should attempt, at least, to localize the successes and difficulties of the student. After such a "post-mortem" over answers to the above problem, the following form was devised and tried out with satisfactory results.

CASE III, 2. A diagnostic form.
Hydrogen has a density of 0.09 gram per liter at standard conditions. Phosphoric acid is 3.1 per cent hydrogen. Enough sodium is put into phosphoric acid to produce 25 liters (S.C.) of hydrogen.
Read the following statements about this problem and by calculation, if necessary, decide their correctness. Mark the correct ones C, and mark the incorrect ones X. For those marked X, place a word, number, or phrase at the right which will, when substituted for the italicized phrase, make the statement correct. Note how it is done in the sample.

    Statements                                                      Place X or C here    Place correction here
    Sample A. Hydrogen is 0.31% of the acid.                        X                    A. 3.1%
    (1) The reaction is: sodium + phosphoric acid =
        oxygen + sodium phosphate.                                  (1) ____             ____
    (2) The weight of hydrogen needed is 3.1 X 0.09 g.              (2) ____             ____
    (3) 3.1% of the acid = weight of hydrogen desired.              (3) ____             ____
    (4) Twenty-five liters of hydrogen (S.C.) weigh 2.15 g.         (4) ____             ____
    (5) The weight of acid needed is the weight of the
        hydrogen divided by 3.1.                                    (5) ____             ____
    (6) The weight of acid needed is 72.6 g.                        (6) ____             ____

It has been shown that better new examination questions may be edited from the old ones if the teachers will be alert to the suggestions for revision which are to be found in the answers of students. This sort of use of old examination questions can be made by teachers of groups of any size.

The second use of "better" is in the sense of being better both as to validity and difficulty. As phrased by one writer,⁶ "Does (the test item) discriminate between persons having much of the quality being measured and persons having only a relatively small amount of the quality? And is the difficulty level of the test suited for the group for which it is intended?"

⁶ FLANAGAN, "General considerations in the selection of test items and a short method of estimating the product-moment coefficient from data at the tails of the distribution," J. Educ. Psychol., 30, 675 (Dec., 1939).

To measure the degree of differentiation of an item, some basis for comparison must be used. Usually the scores of the papers on the examination as a whole are taken as a basis, on the assumption that those who make the high scores on the test are the successful students in chemistry, and vice versa. Therefore, if one determines the per cent of these best students who answer a given question correctly and the per cent of the poorest who likewise answer these same items correctly, two percentage values will be obtained which may and should vary widely, if the item has any validity or differentiating value.

To obtain the validity index in the quantitative manner, the procedure is briefly as follows: The examination papers are arranged in order of the total score and separated into quarters, or upper and lower fractions each of 27 per cent and a middle group of 46 per cent. Then the percentage of correct replies is determined on the item for each of the fractions of the group. By a statistical calculation this may be combined into a single figure referred to as the Biserial Correlation. The overall difficulty of the items may be obtained by getting the percentage of correct or incorrect responses of the whole group on the item.

A number of cases are given to illustrate the use of these percentage figures in determining the differentiating value of items. These have all been taken from examinations given by one of the authors.
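The tabulation described in this paper (papers ranked by total score, divided into quarters or into 27 per cent tails, and the per cent of correct replies found for each group) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the handling of tied scores, and the eight-paper class are assumptions made for the example and do not come from the paper. Difficulty is taken here as the per cent of the whole group answering correctly.

```python
def quarter_analysis(total_scores, item_correct):
    """Per cent correct on one item in each quarter of the class, plus overall.

    total_scores: total examination score of each paper.
    item_correct: 1 (or True) where that paper answered the item correctly.
    Returns ([Q1, Q2, Q3, Q4], difficulty), Q1 being the lowest-scoring quarter.
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("one item result is needed per paper")
    if n < 4:
        raise ValueError("at least four papers are needed to form quarters")
    # Arrange the papers in order of total score, lowest first.
    # sorted() is stable, so tied papers keep their input order (an assumption).
    order = sorted(range(n), key=lambda i: total_scores[i])
    quarters = []
    for k in range(4):
        group = order[k * n // 4:(k + 1) * n // 4]
        right = sum(1 for i in group if item_correct[i])
        quarters.append(100.0 * right / len(group))
    # Overall difficulty: per cent of the whole group answering correctly.
    difficulty = 100.0 * sum(1 for c in item_correct if c) / n
    return quarters, difficulty


def tail_percents(total_scores, item_correct, fraction=0.27):
    """Per cent correct in the lower and upper tails (27 per cent each by default)."""
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("need one item result per paper and at least one paper")
    size = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])

    def pct(group):
        return 100.0 * sum(1 for i in group if item_correct[i]) / len(group)

    return pct(order[:size]), pct(order[-size:])


# Hypothetical class of eight papers (invented data, not from the paper).
scores = [10, 20, 30, 40, 50, 60, 70, 80]
correct = [0, 0, 0, 1, 1, 0, 1, 1]
print(quarter_analysis(scores, correct))  # ([0.0, 50.0, 50.0, 100.0], 50.0)
print(tail_percents(scores, correct))     # (0.0, 100.0)
```

With quarters of equal size the overall difficulty is simply the mean of the four quarter percentages, which is a quick check on any hand tabulation.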

CASE IV. Good differentiation and sufficiently difficult.
Complete and balance: Se + [reactant illegible] (heated)

    Quarter    Per cent
    Q4         83
    Q3         61
    Q2         56
    Q1         22
    Difficulty 56; Bis. r 0.56

CASE V. Excellent differentiation except in the middle half.
1.18 g. of a metal reacts with acid to give up 0.02 g. of hydrogen. What is the equivalent weight? Assume atomic weight H as 1.

    Quarter    Per cent
    Q4         90
    Q3         45
    Q2         53
    Q1         1
    Difficulty 47; Bis. r 0.88

CASE VI. No differentiation, too easy.
The general term for a process which changes the properties and structure of a substance is (1) sublimation, (2) evaporation, (3) physical, (4) chemical, (5) sub-atomic.

    Quarter    Per cent
    Q4         88
    Q3         82
    Q2         80
    Q1         83
    Difficulty 83; Bis. r 0.10

CASE VII. Slight differentiation, too hard.
According to the electron theory of valence NaH should exist.

    Quarter    Per cent
    Q4         36
    Q3         24
    Q2         15
    Q1         16
    Difficulty 23; Bis. r 0.24

The following question represents an attempt to test a student's application of the four controls: activities of metal, acids, oxidizing properties of the acids, and the solubilities of other products, in hydrogen generation:

CASE VIII. Good differentiation and acceptably difficult.
Preparation of hydrogen: Write a condensed statement for that pair of substances which will produce hydrogen most successfully from each of the following: (See notes below.)
(1) Potassium with either hydrochloric, nitric or (conc.) sulfuric acids.
(2) Magnesium with acetic, phosphoric, or nitric acids.
(3) Barium with steam or sulfuric acid.
(4) Potassium with (conc.) sulfuric acid or carbonic acid.

NOTES: Activity series of metals: potassium, barium, magnesium, zinc, lead, hydrogen, silver. Activity series of acids: hydrochloric, nitric, sulfuric, phosphoric, acetic, carbonic. Soluble: All compounds of sodium and potassium; hydroxides of sodium, potassium, barium; all nitrates, acetates, most chlorides. Insoluble: Most other salts and hydroxides.

The results obtained from 125 students gave these values:

    Quarter    Per cent
    Q4         65
    Q3         45
    Q2         40
    Q1         25
    Difficulty 45

Teachers of college chemistry often express the desire to learn whether their students have developed facility in the use of the scientific method. One aspect of that acquisition is a recognition of the relationship of observations to the inferences to which they lead. The next question illustrates an attempt to get evidence of ability in that direction.

CASE IX. Acceptable differentiation, but too easy.
Experiment. Below is a list of statements concerning the action of steam on hot iron or hydrogen on hot iron oxide. You are to classify them by placing: D before those which are directions, O before those which are useful observations for this experiment, X before those observations which are not useful in this experiment, C before those which are conclusions related to useful observations, and A before statements which are assumptions.

Statements.
____ The iron filings were darker after the experiment than before.
____ There was steam mixed with the hydrogen produced.
____ Anhydrous copper sulfate turned blue when dry hydrogen passed over hot iron oxide and then over this sulfate.
____ Pass the hydrogen through calcium chloride.
____ Steam reacted with the hot iron.
____ There was less noise after a time but there was always some noise in the pneumatic trough.
____ Hydrogen formed water by reacting with oxygen in the iron oxide.
____ There was no air in the steam.
____ The cork in the gas pipe turned black.
____ Some of the steam mixed with the collected hydrogen had been formed by a chemical change.

An analysis of the answers 120 students gave shows:

    Quarter    Per cent
    Q4         90
    Q3         80
    Q2         70
    Q1         60
    Difficulty 80
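The "Bis. r" figures quoted with several of the cases summarize how strongly success on an item goes with the total score. The paper does not say exactly how its values were computed (the Flanagan reference suggests an estimate from the tails of the distribution), so the sketch below swaps in the ordinary textbook biserial coefficient computed from raw papers: the difference between the mean total scores of those right and those wrong, divided by the standard deviation of all totals, times pq/y, where y is the normal ordinate at the point dividing the proportions p and q. The function name and the eight-paper class are assumptions for illustration only.

```python
from statistics import NormalDist, mean, pstdev


def biserial(total_scores, item_correct):
    """Textbook biserial correlation between one item and the total score.

    Not necessarily the authors' procedure; a stand-in using the standard formula
    r = (M_right - M_wrong) / sd_total * (p * q / y).
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("one item result is needed per paper")
    right = [s for s, c in zip(total_scores, item_correct) if c]
    wrong = [s for s, c in zip(total_scores, item_correct) if not c]
    if not right or not wrong:
        raise ValueError("the item needs both correct and incorrect answers")
    sd = pstdev(total_scores)  # standard deviation of all total scores
    if sd == 0:
        raise ValueError("total scores must vary")
    p = len(right) / n          # proportion answering correctly
    q = 1.0 - p
    std_normal = NormalDist()
    # Height of the unit normal curve at the point that cuts off proportion p.
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(right) - mean(wrong)) / sd * (p * q / y)


# Hypothetical class of eight papers (invented data, not from the paper).
scores = [10, 20, 30, 40, 50, 60, 70, 80]
correct = [0, 0, 0, 1, 1, 0, 1, 1]
print(round(biserial(scores, correct), 2))
```

A value near zero corresponds to the "no differentiation" pattern, and a value near one to an item answered correctly almost only by the high-scoring papers.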

Of these nine questions only those should be reused which show the proper distribution between the four quarters or the upper and lower quarters and the middle group of the students. Or the question should be rewritten in the light of the students' answers in an effort to improve its differentiating value. Better examinations should include only those items which have, by use, shown high selectivity and an acceptable difficulty for the group to be measured.

If all questions considered were of the true-false type, the per cent correct or some function of that per cent could be used as an index of its difficulty. However, for those teachers who prefer to use questions whose answers may represent intermediate values between absolutely wrong and absolutely right, or questions of the essay type, it is suggested that the answers be scored on a scale of ten or twenty intervals which may then be handled in the manner customarily used in the determination of difficulty or the per cent of correct response in the various quarters.

A question which may have come to many readers is, "Do these indexes of difficulty and of differentiating capacity continue to be the same for a given question when it is used a second or third time?" Dr. Magnus Olson of the zoology department of the University of Minnesota, who is assembling a file of indexed zoology questions, states:⁷ "The second use does not alter their validity (i. e., differentiating capacity) nor does it have any appreciable effect upon the index of difficulty. This latter seems rather discouraging to me." In the Minnesota studies, Dr. Anderson⁸ reports, "Items do retain their discriminative power to a marked degree."

Dr. Olson and Dr. Anderson both consider the results of analysis of answers to old questions of such importance that in the language of the latter, "It is well to type each examination question that receives analysis upon a card. The percentage values (for differentiation capacity) in the case of objective items and the mean scores in case of essay questions can be entered directly upon the card and the questions filed away for future use. It then becomes possible to select items or questions for examinations in terms of their discriminative value."

The authors of this paper would say with Dr. Olson, in summary, "that making examinations is not a task that one can complete and put aside." Old examination questions need to have their student answers studied either with a view to their assets for future use or, if they are not satisfactory, with a view to modification of their phrasing or form with the intent of further trial before discarding them completely or filing for future reuse. In other words, an examination question has to become old, i. e., used at least once, before its better new can be produced and its worth demonstrated.

⁷ In a private communication to one of the authors.
⁸ ANDERSON, "Evaluation of test items," Studies in College Examinations, University of Minnesota, 116-20 (1934).
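The card file recommended above, with each analyzed question carrying its quarter percentages and difficulty so that items can later be chosen for their discriminative value, might be kept as a small record type. The sketch below is one possible arrangement and not the authors' system: the `ItemCard` type, the use of the upper-minus-lower quarter spread as a rough stand-in for differentiating value, and the cut-off values are all assumptions chosen for illustration. The quarter figures are those tabulated for Cases IV to VII in this paper.

```python
from dataclasses import dataclass


@dataclass
class ItemCard:
    label: str
    quarters: tuple    # per cent correct, highest-scoring quarter first (Q4, Q3, Q2, Q1)
    difficulty: float  # per cent correct for the whole group

    def spread(self):
        """Upper quarter minus lower quarter: a crude index of differentiation."""
        return self.quarters[0] - self.quarters[-1]


def essay_percent(mean_score, scale_max=10):
    """Put a mean essay score on a 0..scale_max scale onto the per cent scale."""
    return 100.0 * mean_score / scale_max


def select(cards, min_spread=30, min_difficulty=30, max_difficulty=70):
    """Labels of cards that differentiate and are neither too easy nor too hard.

    The three thresholds are hypothetical; a teacher would set his own.
    """
    return [
        card.label
        for card in cards
        if card.spread() >= min_spread
        and min_difficulty <= card.difficulty <= max_difficulty
    ]


cards = [
    ItemCard("Case IV", (83, 61, 56, 22), 56),
    ItemCard("Case V", (90, 45, 53, 1), 47),
    ItemCard("Case VI", (88, 82, 80, 83), 83),
    ItemCard("Case VII", (36, 24, 15, 16), 23),
]
print(select(cards))       # ['Case IV', 'Case V']
print(essay_percent(6.5))  # 65.0
```

Under these assumed thresholds the selection agrees with the verdicts given in the case headings: Cases IV and V are retained, Case VI is dropped as too easy and undifferentiating, and Case VII as too hard.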