C. D. Cornwell, University of Wisconsin, Madison, 53706

Statistical Treatment of Data from Student Teaching Evaluation Questionnaires

In the spring of 1969, a student-faculty committee, of which I was a member, was established in the Department of Chemistry at the University of Wisconsin-Madison to consider the matter of course and teaching evaluation. The committee prepared a questionnaire to be made available for use in all departmental courses, with the results to go to the individual professor and to the departmental chairman. The committee's objective in creating the questionnaire was to improve courses and teaching by providing feedback to the professor and information to the departmental chairman. The questionnaire, with minor changes, has continued in use over a four-year period. Although its use was optional, it received the sanction of the departmental faculty and has been used in a high proportion of lecture course sections. After the questionnaire had been in use for several semesters, a number of faculty members expressed interest in having a statistical study made. The results of this study, covering a three-year period (six semesters and three summer sessions), are presented in this paper.

In 1972, the Committee on Undergraduate Teaching of the Division of Chemical Education of the American Chemical Society prepared a student questionnaire and arranged to have it used by colleagues in a number of college and university chemistry departments across the country. Data from statistical studies of the results of this questionnaire in two semesters will also be summarized here.

The objectives of our statistical analysis were:
1) To determine whether the differences in results for lecture course sections were statistically significant.
2) To examine the effect of several factors other than the teaching skill of the lecturer, in order to learn the extent to which such factors might have to be taken into account in interpreting questionnaire results.
3) To develop, for each question used, "baseline" data with which results for an individual lecturer could be compared.

Wisconsin Questionnaire

The questionnaire in its present form is shown in Figure 1. Most of these questions were used throughout the three-year period covered by this study, although a few were added during the first or second year. The number of lecture sections for which data were obtained ranged from 163 (for questions used over the entire period) to 59 (for those added last). The questionnaire was given out and completed during a class period in the last week of each semester in order to obtain a reasonably high proportion of replies. Students were invited to detach the comments page and turn it in separately at a later time if they wished more time for that part. The X column (no basis for opinion) was added at the beginning of the third year. It had a negligible effect on the responses except in the case of Questions 7 and 8, and even for these questions the overall mean of the responses was not affected appreciably. Although the scale used introduces some skewness, it will be assumed that the mean for each lecture section provides a fair representation of the results for that section.

Presented at a Symposium on "Student Evaluations of Courses and Instructors via Questionnaires" at the 165th National ACS Meeting, Dallas, Texas, April 9-13, 1973.
¹ Hays, W. L., "Statistics," Holt, Rinehart and Winston, New York, 1963, Chapter 12.

Comparison of Variations between and within Sections

The first question we ask is: are the differences in means for the various lecture sections statistically significant? The question is bound to arise, because the scatter in lecture section means may be comparable to, or even less than, the spread of responses within typical sections. To examine this question quantitatively, a one-way analysis of variance¹ was performed on the data for one semester. This type of analysis compares the amount of variation between groups (in this case, lecture sections) with the amount of variation within groups. Each question of the questionnaire is analyzed separately. The definitions of the statistical quantities used are given in the Appendix.

CHEMISTRY LECTURE COURSE QUESTIONNAIRE
Indicate the strength of your agreement or disagreement with the statements below by circling the appropriate number according to the following scale:

If you feel that you have little or no basis for an opinion about a statement, do not circle any number, but mark [X] in the space at the right.

1. The lecturer is generally well prepared
2. The lectures are well organized
3. The lecturer makes the course interesting
4. The lecturer seems willing to answer questions during his lecture
5. The lecturer speaks clearly
6. The lecturer writes legibly
7. The lecturer makes effective use of visual aids
8. The lecturer is sufficiently available for help outside class hours
9. The lecturer is an effective teacher
10. The exams are reflections of the material

[Response columns (1 to 5, and X) for each statement. Statements 11 to 20 are not legible in this copy, except for the fragment "...probably be valuable to you in the future."]

Figure 1. Wisconsin questionnaire.

Volume 51, Number 3, March 1974 / 155

Table 1. Results of Analysis of Variance Testing Significance of Differences Among Lecture Section Means. Wisconsin Questionnaire(a)
Question | MS Between | MS Within | F-ratio
[Numerical entries not legible in this copy.]

Table 2. F-ratios from Analysis of Variance Testing for Effects of Semester-Year, Enrollment, Subject Area, and Course. Wisconsin Questionnaire
Question(a) | F-ratio: Semester-Year | Enrollment | Subject Area | Course
1 Well prepared
2 Well organized
3 Makes course interesting
4 Questions during lecture
5 Speaks clearly
6 Writes legibly
7 Effective use of visual aids
8 Available outside class
9 Effective teacher
10 Exams reflect material covered
11 Sufficient time allowed
12 Types of questions appropriate
13 Test understanding
14 Graded fairly
15 Problem work beneficial
16 Assigned reading beneficial
17 Probable future value
18 Background assumed
19 Pace of course
20 Amount of problem work
[F-ratio entries not legible in this copy.]

(a) See Figure 1 for statements of questions.
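The between-versus-within comparison described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure. The function name `one_way_anova` and the sample ratings are hypothetical. The sketch assumes that X (no basis for opinion) answers have already been dropped, so that each lecture section is simply a list of numeric ratings.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MS_between, MS_within, F) for a one-way analysis of variance.

    Each group is a list of numeric responses to one question from one
    lecture section (hypothetical data; X answers assumed already removed).
    """
    k = len(groups)                          # number of groups (sections)
    n_total = sum(len(g) for g in groups)    # total number of responses
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more responses than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    # Between groups: spread of section means about the grand mean,
    # weighted by the number of responses in each section.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of individual responses about their own section mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return ms_between, ms_within, ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical ratings from three small lecture sections.
    sections = [[4, 5, 4, 5, 4], [2, 3, 2, 3, 2], [3, 4, 3, 4, 3]]
    ms_b, ms_w, f = one_way_anova(sections)
    print(f"MS between = {ms_b:.2f}, MS within = {ms_w:.2f}, F = {f:.2f}")
    # prints "MS between = 5.00, MS within = 0.30, F = 16.67"
```

With no real differences, MS between and MS within estimate the same variance, so F should be near 1. Here the made-up section means (4.4, 2.4 and 3.4) differ far more than the responses vary within a section. Judging the significance of an F value still requires the F distribution with (k - 1, N - k) degrees of freedom, which this sketch does not compute.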

The results of the analysis for several questions are shown in Table 1. If there are no real differences between sections, then the mean square between groups (MS between) and the mean square within groups (MS within) should be equal except for random error, and their ratio F should be unity except for random error. If there are real differences between sections, these differences contribute to MS between but not to MS within, and F is then expected to be greater than unity. The F-values in Table 1 are certainly greater than unity, and far larger than can reasonably be attributed to chance. Indeed, for every one of the twenty questions, the probability of obtaining by chance an F value as large as the one found is less than 1 in 1000. The inescapable conclusion is that the differences among lecture section means are significant.

Obviously, we have not shown that the differences in means reflect only differences in the performance of the lecturers. The differences could in part be due to other factors, such as the nature of the course or the size of the class. What has been established beyond any reasonable doubt is that real differences exist among the lecture section means. The proper question now becomes: what do these differences mean? To what extent are they affected by factors, such as those just mentioned, which are not directly related to the teaching skill of the lecturer?

Study of Factors Other Than Lecturer's Performance

We turn now to an examination of some of the factors, other than the lecturer's performance, which might systematically affect the lecture section means. Again we use the one-way analysis of variance, but now the data are the means of responses for individual lecture sections. For each analysis, the sections are grouped according to some characteristic whose effect is being tested. Each lecture section is given equal weight, regardless of its size, but data not meeting the following criteria were rejected:
1) Number of questionnaires tallied ≥ 50% of end-of-semester enrollment
2) Number of responses for given question ≥ 50% of number of questionnaires tallied
3) Number of questionnaires tallied ≥ 5
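The three rejection criteria above can be written as a single filter. The code also includes the strength-of-effect estimate ω² that the paragraph below introduces. This is a hedged sketch. The `SectionResult` record and both function names are hypothetical. The paper's own Appendix formula for ω² is not reproduced here, so the code assumes the standard fixed-effects estimate for a one-way analysis of variance: est ω² = (k - 1)(F - 1) / [(k - 1)(F - 1) + N]. In this formula, k is the number of groups and N is the number of lecture sections. It is algebraically equivalent to (SS between - (k - 1) MS within) / (SS total + MS within).

```python
from dataclasses import dataclass


@dataclass
class SectionResult:
    """Hypothetical per-section, per-question record (not from the paper)."""
    enrollment: int   # end-of-semester enrollment
    tallied: int      # questionnaires tallied for the section
    responses: int    # numeric (non-X) responses to this question
    mean: float       # section mean for this question (used in later grouping)


def passes_criteria(s: SectionResult) -> bool:
    """Apply the three rejection criteria listed in the text."""
    return (
        s.tallied >= 0.5 * s.enrollment      # 1) at least 50% of enrollment tallied
        and s.responses >= 0.5 * s.tallied   # 2) at least 50% answered this question
        and s.tallied >= 5                   # 3) at least 5 questionnaires tallied
    )


def omega_squared_from_f(f: float, k: int, n: int) -> float:
    """Assumed fixed-effects estimate of omega^2 from a one-way ANOVA F-ratio.

    k: number of groups (e.g. subject areas); n: number of lecture sections.
    Negative estimates (which occur when F < 1) are clipped to 0.
    """
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 groups and n > k sections")
    excess = (k - 1) * (f - 1)
    return max(excess / (excess + n), 0.0)


if __name__ == "__main__":
    sections = [
        SectionResult(enrollment=120, tallied=80, responses=76, mean=3.9),
        SectionResult(enrollment=120, tallied=40, responses=40, mean=4.2),  # under 50% tallied
        SectionResult(enrollment=8, tallied=4, responses=4, mean=4.5),      # fewer than 5 tallied
    ]
    kept = [s for s in sections if passes_criteria(s)]
    print(len(kept))                                    # 1
    print(round(omega_squared_from_f(3.0, 5, 60), 3))   # 0.118
```

The estimate shows why ω² can be small even when F clears the critical value. For example, with five groups of sixty sections, an F of 3 attributes only about 12% of the variance in section means to the grouping. Clipping negative estimates to zero is a common convention, not something the text specifies.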

156 / Journal of Chemical Education

For each question, analyses of variance were carried out examining separately the role of four factors: semester-year, enrollment, subject area, and course. The importance of subject area, for example, was determined by an analysis of variance in which the lecture sections were grouped by subject area. For each question, the analysis compared the amount of variation in lecture section means between groups with that within groups. The F-ratios obtained are presented in Table 2. These ratios provide a basis for deciding whether there are real effects associated with the groups. For the number of groups and lecture sections occurring here, F-ratios exceeding a critical value of about 2 support, at a 95% confidence level, the hypothesis that real differences exist among the groups. In many cases F meets this criterion, providing evidence for real effects.

Our objective at this point, however, is not simply to determine whether demonstrable effects exist but, rather, to estimate their importance. For this purpose, a more pertinent quantity than F is ω², which measures the strength of an effect. The total variance is due partly to effects associated with the groups and partly to other factors, assumed random. ω² is the fraction of the total variance in the lecture section means which is due to real effects associated with the groups. Although ω² is not known, an estimate can be calculated from F (see Appendix). We now examine the estimated strengths of effects for the four groupings, which are presented in Table 3.

Table 3. Estimated Strengths of Effects (ω²). Wisconsin Questionnaire
Question(a) | Est. ω²: Semester-Year | Enrollment | Subject Area | Course
1 Well prepared
2 Well organized
3 Makes course interesting
4 Questions during lecture
5 Speaks clearly
6 Writes legibly
7 Effective use of visual aids
8 Available outside class
9 Effective teacher
10 Exams reflect material covered
11 Sufficient time allowed
12 Types of questions appropriate
13 Test understanding
14 Graded fairly
15 Problem work beneficial
16 Assigned reading beneficial
17 Probable future value
18 Background assumed
19 Pace of course
20 Amount of problem work
[Numerical entries not legible in this copy.]

(a) For complete statement of question, see Figure 1.

Semester-Year

The estimated values of ω² for semester-year groupings never exceed 0.10 and are most often below 0.05. In other words, the means are very stable and change little from semester to semester. It is possible that the variation would