David W. Brooks, Paul B. Kelter, and Thomas J. Tipton, University of Nebraska-Lincoln, Lincoln, NE 68588
Student Evaluation versus Faculty Evaluation of Chemistry Teaching Assistants
Several years ago we published a study describing a lack of discrimination by students when evaluating teaching assistants (TAs) whose assignment included laboratory instruction only (1). At that time one of us (DWB) developed a prejudice against the use of student evaluations of TAs even though student evaluations of recitation TAs seemed discriminating. This article reports the results of a test designed to discover relationships, if any, between student evaluations and faculty evaluations of TA performance in the classroom. Our subjects were TAs in a large, multisection general chemistry program with considerable training opportunities (2) as well as opportunities to receive feedback on teaching via videotape (3).

Three years ago we began a program of unannounced faculty visits to the classrooms for the purpose of TA evaluations. Faculty evaluators use a standardized form on which to record observations and make comments as well as to record evaluative scores. This technique was judged valid in terms of interobserver reliability. Two of us (DWB, TJT) have by now observed over 500 sessions each. During the 1978-79 academic year we trained over one fourth of our faculty, including most members of our Departmental Graduate Committee, to make observations and evaluations. The scheme has had good effects on our program. Many faculty gained firsthand knowledge of the recitation classroom and some of the problems faced in general chemistry. As the evaluation literature indicates, however, this method of direct observation is very unreliable (4). Some very good TAs are evaluated on bad days, and vice versa.

The supervising faculty (the lecturers to whom the TAs are assigned) provide another source of TA evaluation. TAs are expected to cooperate with supervising faculty, and their performance is reported by these faculty each semester. Using two inputs, the supervisor rating and the classroom rating resulting from unannounced faculty visits, a faculty rating of performance was determined during one fall semester. After taking the classroom observation rating to the supervisor and arriving at an initial rating for each TA, complete lists of TAs and ratings were circulated for two additional iterations. This was intended to bring supervisors' ratings into better agreement with one another. When consensus was reached, this rating was translated into $0, $10, $25, and $40 awards which were distributed just before semester break.

At the same time this faculty rating process was being conducted, we administered a questionnaire to the 1,500 students of these TAs. This instrument used items from a well-studied collection (5). The questionnaires were all introduced, administered, and collected by the same person during lecture classes when no TAs were present. Students were asked to identify their questionnaires by TA name and section number only. Large differences among the scores of TAs were found.

Using ANOVA techniques, we sought to discover relationships between the faculty rating of performance and the student ratings. Few statistically significant correlations were found. We grouped the TA student scores into quartiles. The frequency with which a faculty rating and quartile coincide is displayed in the table. Although each TA received only one faculty rating of performance, each usually received two student ratings.
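As an illustration of this grouping step, a minimal sketch follows. The mean ratings in it are hypothetical, and the Python implementation is ours; neither is taken from the study's data.

```python
# A minimal sketch of the quartile-grouping step, assuming hypothetical
# mean student ratings; the study's actual scores are not reproduced here.
import statistics

# One hypothetical mean student rating per TA class.
mean_ratings = [3.1, 4.5, 2.8, 3.9, 4.2, 2.2, 3.5, 4.8, 3.0, 2.6, 4.0, 3.7]

# Quartile cut points taken from the observed distribution itself.
q1, q2, q3 = statistics.quantiles(mean_ratings, n=4)

def quartile(score):
    """Map a mean student rating to quartile 1 (low) through 4 (high)."""
    if score <= q1:
        return 1
    if score <= q2:
        return 2
    if score <= q3:
        return 3
    return 4

for score in sorted(mean_ratings):
    print(f"mean rating {score:.1f} -> quartile {quartile(score)}")
```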
[Table: Frequency Relationship between Quartile Based upon Student Rating and Award Amount Based upon Faculty Rating. Rows: quartile based on student rating, 1 (low) through 4 (high), plus a Total row. Columns: faculty rating of TA performance. Cell frequencies are illegible in the source.]
A total of 68 were studied. We deleted data for 3 classes taught by undergraduates who were not eligible for awards over $10. Also, we deleted data for 6 other classes because faculty ratings reflected problems the students were not aware of. (For 2 TAs this involved the teaching of importantly erroneous chemistry, and for one there was a severe problem of not following supervisor instructions.) Of the remaining 59 cases shown, 2 TAs had one class rate them in the 2nd quartile and another in the 4th. All other student ratings led to identical or adjacent quartile assignments.

Throughout any evaluation system, fairness and perceived fairness are important issues. When student and faculty ratings coincide, there is an implied fairness (which may or may not survive the test of a more thorough evaluation). Of the 59 ratings, 26 fell on the diagonal, and 25 fell one category away from the diagonal. In 6 classes, students rated TAs substantially higher than teachers, and in 2 classes the reverse situation occurred. In an area as "political" as teaching evaluations, this agreement appears reasonably good. (χ² for these data is 25.55. With nine degrees of freedom, p < 0.01; more than 99% of the time, when the rows and columns are independent, the value of χ² will be smaller. The Spearman rank correlation coefficient is +0.38.)
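For readers who wish to see how such statistics are computed, a minimal sketch follows. Because the table's cell frequencies are not reproduced here, the counts below are hypothetical, invented only to be consistent with the totals just described (59 ratings, 26 on the diagonal, 25 one category off, 6 cases of students rating much higher, and 2 of faculty rating much higher); the Python/SciPy implementation is ours, not part of the original study.

```python
# A minimal sketch of the chi-square and Spearman computations,
# assuming hypothetical cell counts (the study's actual frequencies
# appear in the printed table, not here).
import numpy as np
from scipy.stats import chi2_contingency, spearmanr

# Rows: student-rating quartile, 1 (low) to 4 (high).
# Columns: faculty rating expressed as award amount ($0, $10, $25, $40).
table = np.array([
    [6, 4, 1, 0],
    [4, 7, 4, 1],
    [2, 4, 6, 4],
    [2, 2, 5, 7],
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f} with {dof} degrees of freedom, p = {p:.4f}")

# Spearman rank correlation on the paired (student quartile, faculty
# rating) observations implied by the contingency table.
student, faculty = [], []
for i in range(4):
    for j in range(4):
        student.extend([i + 1] * table[i, j])
        faculty.extend([j + 1] * table[i, j])
rho, p_rho = spearmanr(student, faculty)
print(f"Spearman rho = {rho:+.2f} (p = {p_rho:.4f})")
```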
Our faculty evaluations have made a big difference to TAs; as much as $600 has been at stake (a $900 versus a $300 annual raise). Naturally, there is some complaining. TAs are offered the opportunity of a reevaluation any time they believe that a rating has been too low and that their typical performance is better. Only a handful of TAs have asked for this, a clear indication that they feel threatened by "the system." Because the unannounced faculty evaluation is unreliable (subject to sampling error), many complaints are justified.

The favorable pragmatic outcomes are two. The faculty know much better what goes on between TAs and their students in the classroom. Also, with this show-the-flag scheme for evaluation, TAs are left with the impression that the faculty are really concerned about a TA's teaching.

In addition to examining data for TA evaluations, we also studied data from other "instruments" administered to TAs. For example, no correlations were found between either rating scheme and TAs' classroom apprehension as measured by the Personal Report of Communication Apprehension score (6), TAs' cognitive style as measured by the Group Embedded Figures Test score (7), or TAs' reasoning skills as measured by modified written Piagetian tests (8).

One might infer from this study that faculty evaluation of TAs' classroom teaching is a justifiable practice, at least as justified as student evaluation. Before one begins serious contemplation of such a program, however, the provocative opinion by Pickering should be consulted (9). His alternative requires no large-scale systematic approach to evaluation, and is therefore probably much more in tune with current practice than is ours.

Literature Cited
(1) Levenson, H., and Brooks, D. W., J. Coll. Sci. Tchg., 6, 85 (1975).
(2) Project TEACH Staff, J. CHEM. EDUC., 53, 209 (1976).
(3) [Entry illegible in the source.]
(4) Centra, J. A., Jossey-Bass, 1977. [Title illegible in the source.]
(5) Hildebrand, M., Wilson, R. C., and Dienst, E. R., "Evaluating University Teaching," Center for Research and Development in Higher Education, University of California, Berkeley, 1971.
(6) Seiler, W. J., et al., J. CHEM. EDUC., 55, 170 (1978).
(7) Witkin, H. A., et al., "Field-Dependent and Field-Independent Cognitive Styles and Their Educational Implications," Educational Testing Service, Princeton, NJ, 1975.
(8) Fuller, Robert G., personal communication. Dr. Fuller scored the test results.
(9) Pickering, M., J. CHEM. EDUC., 55, 511 (1978).