Theodore A. Ashford University of South Florida Tampa, Florida 33620

A Brief History of Objective Tests

The origin of the objective test movement may be traced to the intelligence tests pioneered by Alfred Binet around the turn of the century in France. In 1910 the Binet tests crossed the Atlantic with the translation and adaptation done by Dr. Henry Goddard, director of a training school for retarded children in Vineland, N. J. (1). Soon afterward, Dr. L. Terman at Stanford University gave to one of his graduate students, Arthur Otis, the assignment of adapting the Binet test to the children in America. The Binet test consisted of free-response exercises or observation of responses to assigned activities and had to be administered individually. Otis replaced the individual test items with multiple-choice items in which the subject was to pick the correct answer from among several alternatives. This modification permitted the test to be given to large groups of children. With this adaptation the "Stanford-Binet" test quickly became the standard American "intelligence test" and retained its popularity through many editions, even to the present time (2). There are several significant points in this adaptation of the Binet test: (1) the test could be given to a large group; (2) it was objective; (3) it could be scored reliably; and (4) collection of the data and statistical treatment made possible the study of groups and inter-group comparisons.

Major Impetus: Necessity to Test Large Numbers

Army Alpha Test

The next impetus to objective tests came as a result of World War I. The U.S. Army, faced with the problem of testing millions of servicemen quickly and reliably, asked the help of the American Psychological Association. The committee appointed by the APA worked along the lines of Otis in developing what came to be called the Army Alpha Test (3). This test turned out to be a far better instrument than had hitherto been devised for assigning recruits to jobs with different intellectual demands, for picking out promising officer candidates, and for rejecting those who appeared to lack sufficient mental ability to complete military training successfully.

Expansion of Educational System in 1920's

The factors which gave the initial impetus to the development of objective tests were present to an even larger extent in the educational system after World War I. America experienced a rapid expansion in its educational system, especially at the secondary school level, and this expansion had several consequences. First, the size of the individual classes increased and the resulting loss of personal contact between teacher and student made frequent and reliable testing necessary. Second, the influx of the population altered the population mix, and an ever-increasing number of students were not "scholars" in the hitherto understood sense. Not every person attending high school in the early 20's had the ability, preparation or interest to go to college for a learned profession. As a result, the philosophy and practices of the educational system were altering rapidly.

Journal of Chemical Education

Establishment of National Testing Organizations

Within this educational environment, educators in many fields in the schools were experimenting with and using objective-type tests. It was at this time also that nationwide testing organizations were founded, including the Cooperative Testing Service; its parent body, the American Council on Education, which for a long time gave the Scholastic Aptitude Tests with two parts, verbal and quantitative; the College Entrance Examination Board (CEEB); the Carnegie Foundation for the Advancement of Teaching; the Graduate Record Examination; and the American College Testing Board. The Educational Testing Service was founded in 1947 by the merging of the American Council on Education, the Carnegie Foundation and the CEEB. The only new organization is the American College Testing Program, which began operations in 1959. In addition, many government agencies, including the U.S. Civil Service Commission, established offices to construct tests for the selection of job applicants or for the certification of professionals, such as doctors or lawyers. Similarly, many other organizations, such as the American Medical Association, the American Dental Association, and others, began using objective tests for the accreditation of their members.

Effect of World War II

Still further impetus was received in World War II, when approximately eleven million servicemen and women were tested for their special knowledge and abilities for assignment to duties or for further training. At the same time the United States Armed Forces Institute (USAFI) was established with the idea of providing educational opportunities to all servicemen. It was perhaps the largest "university" ever conceived; over two million servicemen had taken courses in it by the end of the War. Following the War the influx of veterans into the educational institutions under the GI Bill produced additional pressure for the adoption and development of objective tests. In addition, the need for teachers brought into the teaching profession those who were new and receptive to trying the objective-type tests.

Thus, one of the primary factors for the development of the objective tests in the United States has been the ever-increasing pressure to test large numbers of individuals. This may be looked upon as a necessary evil or, at least, as a fact to be reckoned with. However, the development of the objective tests has also been fostered by more positive elements.

Essay Versus Objective Tests

Shortcomings of Essay Exams

From ancient times up to about the beginning of this century it was universally assumed that the tests given by a mentor or professor individually were a valid assessment of a student's ability. Indeed, this concept lingers on in many parts of the world. The assessment was done almost entirely on the basis of either written essay questions or oral examination supplemented by anecdotal observations. Since the beginning of this century, this method has been seriously questioned. The anecdotal observations, for instance, are sporadic, and hence are hardly a systematic or reliable sampling of the student's competence; the oral examination is as much a social encounter as it is a rigorous evaluation; and the most serious criticism leveled at the written examinations is whether the essay question is a reliable means of assessing student ability. Early in the century began a series of conferences and studies on examinations. These were summarized by Sir P. Hartog and E. C. Rhodes in a paper entitled "Examination of Examinations" (4).

As a result of these conferences some of the serious shortcomings of the essay examinations emerged. One of these is that the essay questions are typically too general, too broad, unclear, and indefinite. Typical of the questions considered too broad are: Write all you know about the atomic theory; discuss the chemistry of the halogens and give four methods of preparation; describe the periodic system and give three examples of group relationships.

A second major shortcoming of the essay examination to emerge is the unreliability of the scoring. There were indications that the score a student's paper received depended on whether it was graded before or after dinner! In systematic studies, evidence accumulated that examinations read by different readers received widely different scores. Attempts to improve this by asking the writer of the examination to provide a sample perfect paper have met with varying success. One case has been reported in which the "model" paper was mixed up with the students' papers and subsequently received a score of 10% from one of the readers!

A third limitation of the essay examination is that the sampling error is inherently large. No more than a few questions can be asked in a 1-hr examination; and if these questions are specific enough to be searching in depth, they could hardly provide a wide sample of the entire field at the same time. The unreliability of the essay test was clearly recognized as a serious and unacceptable defect.

Objective Tests¹

Forms of Items. The new objective tests were also under severe criticism; however, these criticisms immensely aided in improving their quality. In any discussion of the development of objective tests, two aspects were early distinguished: one is the form of the objective items and the other is the content of the items. With respect to form, the objective examination began typically with true-false questions, widely used during the early 20's and 30's. However, the limitations of this form of item were often recognized and pointed out. A basic difficulty is that most of the statements about the real world cannot be classified as definitely true or false; in other words, the world is mostly neither black nor white, but is usually grey. The second objection to the true-false form is that on the basis of guessing alone a student may be expected to get half of the items right. A wide variety of forms was developed (multiple-choice, rank order, analogy, diagrams, tables, etc.), notably by a group of examiners under L. L. Thurstone at the University of Chicago. This group put out an excellent manual entitled "Manual of Examination Methods" (6). Of these many forms, the multiple-choice form is so versatile, and so easy to construct, to score and to treat statistically, that it has superseded all other forms and is the only one that is really used extensively. Anything that can be asked with other forms can be asked with the multiple-choice form. For example, one can use a diagram, a photograph or a table and still ask a question with four or five responses. There are several varieties of the multiple-choice items;² of these the best-answer or one-answer variety, either as a direct question or a completion item, has been found, at least in our Committee, to be the most effective. It forces the examiner as well as the student to concentrate on one specific point, and hence it is capable of searching at any desired depth, especially by the control of the responses. This may be shown by an example from general information. To the question, "Which is nearest Chicago?" the responses may be given as (a) San Francisco, (b) New York, (c) Miami, (d) Philadelphia, and (e) Detroit. A general knowledge of the geography of the United States is sufficient to get the correct answer. However, to the same question, one might use the following responses: (a) Evanston, (b) Peoria, (c) Springfield, (d) Wheaton, and (e) Milwaukee. Obviously this question requires a much more detailed knowledge of the geography surrounding Chicago. Nor is the versatility of the objective test limited to depth or difficulty. It was gradually recognized that test items can be written that test not only memory and information but reasoning and the other "higher" mental processes as well. The long experience with objective tests has demonstrated that there is hardly any of the so-called "higher" mental processes that cannot be tested with objective tests.

¹ Several studies in this area were reported by Ruch (5).
² Forms of test items receive excellent coverage in Lindquist (7).

Volume 49, Number 6, June 1972

Content of Items. More significant is the development which took place with respect to the content of the items. While the form refers to "how" something is measured, content refers to "what" is measured. At least two dimensions of content are recognized. The first, and most obvious, is the subject matter content.

The very nature of the objective test requires the examiner first to state explicitly the subject matter of a given course. He must make an outline of the course and list the important terms, principles, concepts, theories, and so on which are contained in it and which he presumably wants to sample. This led to a careful examination of courses of study and a listing of what were later called subject matter objectives. A detailed and carefully constructed course of study makes it possible to select items and to assemble a test which is an adequate and fair sampling of the course. The subject matter content is the most obvious dimension and was aimed at very early. However, it was also early recognized that there are other dimensions in addition to subject matter. It is clear that the entire subject matter of a field can be sampled by questions of terminology alone, or factual information alone, but it is also clear that these would not represent a fair sample of what is expected of the course. The subject matter of chemistry contains not only information and vocabulary but also problems which are to be understood as well as applied; theories to be understood, applied, and distinguished from observable facts; and so on. In other words, it became clear that dimensions other than subject matter are an integral part of the course and are also to be measured.

One of the first distinctions commonly made was between memory and reasoning. This dichotomy was expanded naturally into a continuum of levels of attainment. One such version was developed by the author and used for the construction of the USAFI General Chemistry test (8). Another continuum was expressed in terms of "types" or "levels" of human behavior. This led eventually to Bloom's taxonomy, which included knowledge or information, comprehension, application, analysis, synthesis and evaluation (9). Another way of expressing these dimensions is Thurstone's primary mental abilities (10). There are many other ways of developing other dimensions. Thus we see that objective test items can not only test more reliably but can also be used to test subject matter content at any desired depth as well as at various levels of attainment and other dimensions of learning.

Limitations of Objective Tests. While the objective tests can have all these virtues, they also have limitations which have been pointed out repeatedly and have been widely recognized. By their very nature objective tests have been most suitable for testing in the cognitive domain; at any rate they have been most successful and may be limited to testing in this domain. So far we have not learned how to test successfully in the affective or psychomotor domains by use of multiple-choice questions. It is recognized that objective tests cannot be used effectively, if at all, to assess a person's ability to organize material, to present an argument either in writing or orally, or to carry out laboratory operations. In the ACS Testing Committee, for example, the best we have been able to measure is "knowledge of" acceptable laboratory procedures and techniques. But these do not measure the skill of the individual in actually carrying out the operations in the laboratory. Similarly, satisfactory objective tests have not yet been developed to measure the motivation of an individual nor his drive nor his personality nor the force of his personality. So far we have not learned how to test these very important elements by multiple-choice questions and, indeed, it may be questioned whether these attributes can be tested by any kind of paper and pencil test. While these limitations of multiple-choice questions are recognized, we must not "hold it against them," so to speak. A test is but an instrument and all instruments have limitations. In addition, in any assessment the subjective element of judgment must come in. The fact that the physician must ultimately make a subjective judgment does not imply that it is preferable that the temperature of the patient be "subjectively" taken by touching his forehead with the hand instead of "objectively" using an accurate clinical thermometer.

Development of ACS Testing Program

[Table not recoverable from the source. Surviving column headings: Year; No. tests featured; No. tests sold; Ans. sheets sold. Surviving year labels: 1935, 1940, 1945, 1955, 1960, 1965. Surviving test titles: General; Qualitative; Organic; Quantitative; Physical; Biochemistry; High School; Inorganic; Graduate Level Tests; Advanced High School; Instrumental Analysis.]

Establishment of the Committee

It was within this context, sketched above, that the American Chemical Society Testing Program was developed and flourishes today (see the table). Testing remains an important part of the process of continuous assessment, which is an integral function of teaching. It is inconceivable that the teaching profession will ever give up this right, which they consider as fundamental. In fact, most of the nationally standardized testing instruments are used at the discretion of the teacher, school principal, or supervisor within the realm of his own authority. While the weight accorded to standardized tests varies from time to time and from place to place in relation to the purpose for which they are used, objective tests prevail, as they have in the American Chemical Society since 1934, as important auxiliary devices for the evaluation of students.

Literature Cited

(4) Hartog, Sir P., and Rhodes, E. C., "An Examination of Examinations," International Institute Examinations Enquiry, 1935.
(5) Ruch, G. M., "The Objective or New-Type Examination," Scott, Foresman and Co., Chicago, 1929.
(6) "Manual of Examination Methods," The University of Chicago Bookstore, Chicago, 1937.
(7) Lindquist, E. F. (Editor), "Educational Measurement," American Council on Education, Washington, D. C., 1966, pp. 193-212.
(8) Ashford, T. A., J. Chem. Educ., 21, 386 (1944).
(9) Bloom, Benjamin S. (Editor), "Taxonomy of Educational Objectives. Handbook I: Cognitive Domain," Longmans, Green and Co., New York, 1956, pp. 89-119.
(10) Thurstone, L. L., "Primary Mental Abilities," University of Chicago Press, 1957.
