JOURNAL OF CHEMICAL EDUCATION, JULY, 1928
NEW-TYPE TESTS VERSUS OLD-TYPE TESTS: A COMPARISON
KENNETH D. DODDS, BELLEVUE, PENNSYLVANIA
Many readers of THIS JOURNAL were undoubtedly pleased to learn, from papers and comments in recent issues,¹ that the question of testing is not a closed book, completed and laid away for all time. Much has been written, and much more said, pro and con, by adherents of each camp, but there seem to be a few points which, although almost self-evident, have not yet been touched.
To begin: one of the arguments most frequently advanced against the old-fashioned test is that different observers do not always obtain comparable scores. On this point, the literature of standard testing seems to indicate that the pioneer investigators of testing methods thought they had made a discovery when they ran into this situation; whereas, on the contrary, chemists, physicists, and artillerists (to mention only a few lines of endeavor) have long known that different operators, more often than not, fail to obtain identical results even when they follow precisely the same methods. Those who have been trained in science, rather than in "education," therefore do not regard this state of affairs as an indication of faulty method. In fact, "it is one of the most embarrassing things we can meet, when experimental results agree too closely."²
Again, standardized tests are claimed to be superior to tests of the older type in that they are more easily scored and, moreover, scored on a logical basis; but their proponents cannot claim a monopoly of these features, at least so far as chemistry is concerned. Leaving essays out of consideration, since they are not tests, users of the so-called old-type test have yet to be convinced that anything could be more easily, accurately, or justly scored than, for instance, a group of equations or a numerical chemical problem of individual manufacture.
Further, a test can be devised to serve any purpose: for example, to diagnose difficulties, or to check up on progress, accomplishment, or what not. Whether the test is constructed by an individual teacher and written on the board, or by someone farther removed from the classroom, printed by the bale, and called a standardized test, seems to be beside the point. Standard tests may indeed serve any of the above-mentioned purposes admirably (although not to the exclusion of any other kind of test), and yield reliable information, provided that their norms are established from the results of thousands of applications to unselected groups; but there is no guarantee (in fact, according to established mathematics,³ there is considerable doubt) that safe conclusions can be drawn from the results obtained when the same tests are given to small groups. No chemist would care to be credited with the idea that a shovel of coal has the same analysis as the carload from which it was taken.
¹ THIS JOURNAL, 4, 1414-7 (Nov., 1927); ibid., 4, 1418-23 (Nov., 1927); ibid., 4, 1459 (Dec., 1927).
² Mellor, "Higher Mathematics," Longmans, Green & Co., New York, 1919, p. 510.
³ Mellor, loc. cit., p. 504.
Advocates of standard tests could frequently be charged with stacking the cards by throwing out all measurements that do not fit their preconceived ideas, for many of them give these tests with the firm conviction that the distribution of scores will be normal, and indeed must be, without regard to the number of cases; and they attempt to explain away, or even jettison, scores that fail to fall on the normal curve (low ones, in most cases). Conformity to the normal frequency curve cannot safely be taken as an infallible indication of accuracy, nor does non-conformity constitute irrefutable evidence that something is wrong (presumably with everything but the test).⁴
Field artillery firing is a highly developed art, based on experience, the theory of probabilities, and the normal frequency curve. The commander of a battery may know that, in a great number of shots, the overs and shorts, and rights and lefts, will approximate the ideal curve, but there is, unfortunately, no rule by which he can determine the exact spots on which his next salvo will burst. He must take his variables as they come.
Tests of the completion type, true or false, yes or no, and others of like nature, are particularly objectionable to those teachers whose primary object is to guide their pupils to the acquisition of some chemical knowledge, because, as has often been stated, these tests seem to be yardsticks for the measurement of memory rather than of genuine understanding.
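A short simulation can make the coal-and-carload comparison concrete. The sketch below assumes a hypothetical population of 10,000 normally distributed scores (mean 60, standard deviation 15; the numbers are invented for illustration and come from no published norms) and measures how widely the average of a randomly chosen group wanders from one group to the next. Ordinary sampling theory predicts a spread of roughly σ/√n, so averages of groups of 10 should scatter about ten times as widely as averages of groups of 1,000. The function name `spread_of_group_means` is hypothetical.

```python
import random
import statistics

def spread_of_group_means(population, group_size, trials=500, seed=1):
    """Draw `trials` random groups of `group_size` scores (without
    replacement) from `population` and return the standard deviation
    of the group means, i.e. how far one group's average may stray."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, group_size))
             for _ in range(trials)]
    return statistics.stdev(means)

# Hypothetical "carload": 10,000 scores, normally distributed with
# mean 60 and standard deviation 15 (illustrative numbers only).
rng = random.Random(0)
population = [rng.gauss(60, 15) for _ in range(10_000)]

small = spread_of_group_means(population, 10)     # a "shovelful"
large = spread_of_group_means(population, 1000)   # a large norming group
print(f"groups of 10:   group means scatter by about {small:.1f} points")
print(f"groups of 1000: group means scatter by about {large:.1f} points")
```

Running it should show the ten-pupil averages wandering by several points and the thousand-pupil averages by well under one point, even though every group is drawn from the same population.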
Probably every teacher of any experience has had pupils who can repeat paragraphs describing the properties of a substance, but cannot utilize the information, for instance, in the laboratory; who can reproduce an equation, but cannot write one when put on their own resources; who can state the law of mass action, but cannot predict a shift in equilibrium. These pupils would score high on a new-type test covering these points, but there can be no question that their knowledge is superficial.
In regard to guessing, the common expedient of a double penalty on incorrect answers is undoubtedly somewhat of a curb, but it is nevertheless open to a few objections. In the first place, the assumption that, of any number of guesses, half will be correct and half incorrect is true only when a great number of guesses are taken.⁵ On a limited number of questions, the chances of beating the game must be as good as those of losing. In this connection, penalizing a bad guess will not assist a pupil in understanding chemistry, but may serve as an incentive to a little honest effort, or to a great deal of memorizing. Second, no method known at present enables the scorer to recognize a winning guess in a correct response, or to know that an incorrect one is not a slip. On the other hand, it is difficult to imagine that guessing would avail much in writing a list of equations, or in finding the percentage composition of a number of compounds from formulas or analyses.
⁴ Mellor, loc. cit., p. 515.
⁵ Mellor, loc. cit., p. 504.
Old-type tests are said by their opponents to be not only unamenable to comparable scoring, but also liable to be too easy or too difficult. Such, it must be admitted, is often the case, and hence in a series of tests (such as a teacher would give in the course of a semester or of a year, for instance) faulty appraisals arising in this way would cancel each other, as positive and negative errors have a habit of doing. This feature cannot be claimed for those tests wherein double penalties are assessed without admitting that some correct answers are guesses, and the reverse.
The numerical chemical problem has been attacked on all sides, but is still a favorite with those who adhere to old styles in testing. Whether the ability to solve a problem is rightly regarded as being of more importance than correct answers to a score of questions on minute details may be debatable on academic grounds, but we fancy that a prospective employer would have no difficulty in choosing between the two qualifications. Objectors to problems seem to have overlooked the fact that problems possess many of the advantages claimed for the objectors' own methods, not the least of which is diagnostic value. To solve a problem, the pupil must ordinarily have: knowledge of the properties of the materials involved, ability to write equations, command of the laws stating quantitative relationships, and ability to perform the operations of arithmetic and elementary algebra.
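The guessing argument made earlier in this section can be checked with exact arithmetic. The sketch below assumes what the double-penalty rule itself assumes: each blind guess on a true-false item is an independent fifty-fifty chance, and the score is rights minus wrongs, so the number of right guesses follows a binomial distribution. The function name `guessing_odds` and the 70%/30% threshold for a "lopsided" result are illustrative choices, not anything taken from the article.

```python
from math import comb

def guessing_odds(n):
    """Exact odds for a pupil who guesses blindly on all n true-false
    items, scored as rights minus wrongs (the 'double penalty').
    Returns (P(net score > 0), P(net score < 0),
             P(70% or more, or 30% or less, of the guesses are right))."""
    p_gain = p_loss = p_lopsided = 0.0
    for right in range(n + 1):
        prob = comb(n, right) / 2 ** n   # each guess is an independent 50/50 chance
        net = right - (n - right)        # rights minus wrongs
        if net > 0:
            p_gain += prob
        elif net < 0:
            p_loss += prob
        # |right/n - 1/2| >= 0.2, kept in integers to avoid rounding at the boundary
        if 5 * abs(2 * right - n) >= 2 * n:
            p_lopsided += prob
    return p_gain, p_loss, p_lopsided

for n in (5, 20, 100):
    gain, loss, lopsided = guessing_odds(n)
    print(f"{n:3d} items: P(gain)={gain:.3f}  P(loss)={loss:.3f}  "
          f"P(far from half right)={lopsided:.4f}")
```

For five items, `guessing_odds(5)` returns `(0.5, 0.5, 0.375)`: the blind guesser is exactly as likely to come out ahead as behind, and more than a third of the time his share of right answers is 70% or more, or 30% or less. On a hundred items such a lopsided result becomes very rare, which is the sense in which "half right, half wrong" holds only for a great number of guesses.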
These items of necessary mental equipment mean, of course, that a problem is composed of several elements, of perhaps unequal difficulty; and advocates of new-type tests would charge that personal opinion as to their relative weights enters into the scoring, with inevitable error and disagreement. These charges must be admitted, but they do not necessarily constitute an unanswerable indictment of this means of finding out what a pupil knows, for at least two reasons.
First, since the subject being tested is chemistry, and not arithmetic, algebra, or English (equipment the pupil is supposed to have already), it seems perfectly proper to assign greatest weight to the chemical phases of the problem. The distinction between chemical and non-chemical aspects is not so difficult a matter, once one recognizes that much that is chemistry sometimes appears in other dress: for example, an equation for Boyle's Law is a mathematical statement of a chemical (or physical) fact, and hence, if written as an inverse proportion, is mathematically correct, but chemically grotesque.
Second, when the question of personal judgment arises, adherents of old-type tests believe that the elements of a problem cannot be arranged in order of their difficulty with the exactness of the electromotive series, even by aid of "educational" methods, norms, or otherwise; they therefore feel safer in their own estimates, as chemists, than in those of someone whose specialty is test making, or anything other than chemistry. On the whole, those who find the old method of testing still serviceable regard the standard test more as an instrument for obtaining agreement among scorers than as a superior testing device.
Regardless of the objections advanced against standardized tests, the efforts of those who are attempting to improve existing methods of testing, by application of psychological principles and otherwise, should not be hastily condemned; for it must be remembered that the educational psychologists are working in a comparatively new field, without the advantages of a rich heritage of experience and accumulated facts such as that enjoyed by the scientists. However, the advocates of new methods often invite bitter criticism⁶ when they heap contumely on time-tried and proved procedures; when they adhere, apparently blindly, to the curve of normal distribution; and when they draw sometimes debatable conclusions from perhaps faultily gathered statistics.
While it is questionable whether any form of test, new or old, can be made foolproof and free from chemical, psychological, or other faults, it is certain that neither newness nor oldness is in itself a criterion by which to establish the worth, or worthlessness, of a method. There are teachers who prefer old-type tests, yet to whom the new styles are not wholly bad; and there are likewise others who are convinced that new-type tests are superior, but to whom the old are not anathema. After all, there is not much reason for dispute between supporters of the two methods, for each is at liberty to go his own way, rejoicing that he will probably get there as soon as the other fellow.
The bail of a bucket is neither longer nor shorter when horizontal than when vertical, and the same ends are reached by either route.
⁶ Cf. Bernard DeVoto, Harper's Magazine, Jan., 1928, p. 182.