How Students Reconcile Discordant Data: A Study of Lab Report Discussions

Miles Pickering¹ and David L. Monts²
Princeton University, Princeton, NJ 08544

Suppose a student does a lab experiment and ends up with two pieces of data which are not consistent within experimental uncertainty. Suppose further that this discrepancy violates the Law of Definite Proportions. What sort of explanations will students invoke? Will they affirm the primacy of their data or their pre-established beliefs? For many years we have used test questions asking students about systematic errors in our lab ("If one had used acid instead of base in step three, would the result be higher or lower?"). Do the scores on this sort of question provide an objective measure of the proficiency shown by the students' arguments about systematic errors in lab reports? Are there more general grade correlations?

The experiment on which this study is based has been published elsewhere (1). It is an oxidation-reduction experiment in which W(VI) is reduced to "mineral blue," a nonstoichiometric oxide with a formula that, unknown to the student, depends on the mode of preparation. This oxide can be prepared by natural gas reduction of WO3, in which case the formula is determined gravimetrically. Alternatively, it can be prepared by Zn reduction and reoxidized by KMnO4 titration, as in the well-known vanadium experiment (2-5). In both cases students were expected (and most were able) to carry out the uncertainty calculations and thereby to show that the difference in mole ratios was not due to random error. The range of propagated uncertainty was usually large enough to include some sort of simple whole-number formula, but in some cases this would have to involve mixed oxidation states of W. When assigning this experiment, we expected that the students would simply develop two different formulas for the products of the reductions and state that there were at least two blue oxides of tungsten.
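The consistency check the students were asked to make can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab manual's procedure: the function names, the finite-difference error propagation, and every number below are hypothetical. It computes the O:W ratio x in WOx for each preparation, propagates balance and titration uncertainties to first order (treating them as independent), and then asks the lab manual's question of whether the two ± ranges overlap. The titration relation follows from redox stoichiometry: W in WOx has oxidation state 2x, so reoxidizing it to W(VI) takes (6 - 2x) electrons per W, and each permanganate ion accepts 5.

```python
import math

# Standard molar masses in g/mol (rounded)
M_W = 183.84
M_O = 15.999
M_WO3 = M_W + 3 * M_O


def propagate(f, values, sigmas, rel_step=1e-6):
    """First-order propagation of independent random errors through f.

    Partial derivatives are estimated by central differences and the
    individual contributions are combined in quadrature.
    """
    result = f(*values)
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = rel_step * max(abs(v), 1.0)
        up, down = list(values), list(values)
        up[i], down[i] = v + h, v - h
        slope = (f(*up) - f(*down)) / (2 * h)
        variance += (slope * s) ** 2
    return result, math.sqrt(variance)


def ratio_gravimetric(m_wo3, m_product):
    """O:W ratio x in WOx from the mass of WO3 taken and the mass of reduced product (g)."""
    n_w = m_wo3 / M_WO3                  # mol W, all assumed to come from the WO3
    n_o = (m_product - n_w * M_W) / M_O  # mol O left in the product
    return n_o / n_w


def ratio_titration(m_wo3, c_kmno4, v_kmno4_ml):
    """O:W ratio x from KMnO4 reoxidation of Zn-reduced tungsten.

    5 n(MnO4-) = (6 - 2x) n(W), i.e. x = 3 - 5 n(MnO4-) / (2 n(W)).
    """
    n_w = m_wo3 / M_WO3
    n_mno4 = c_kmno4 * v_kmno4_ml / 1000.0
    return 3.0 - 5.0 * n_mno4 / (2.0 * n_w)


def ranges_overlap(x1, s1, x2, s2):
    """True if the intervals x1 +/- s1 and x2 +/- s2 share at least one point."""
    return abs(x1 - x2) <= s1 + s2


# Hypothetical data: masses in g, KMnO4 concentration in mol/L, volume in mL
x_grav, s_grav = propagate(ratio_gravimetric, [0.5000, 0.4965], [0.0002, 0.0002])
x_titr, s_titr = propagate(ratio_titration, [0.5000, 0.02000, 21.57], [0.0002, 0.0002, 0.05])

print(f"natural gas (gravimetric): x = {x_grav:.3f} +/- {s_grav:.3f}")
print(f"Zn / KMnO4 (titration):    x = {x_titr:.3f} +/- {s_titr:.3f}")
print("uncertainty ranges overlap:", ranges_overlap(x_grav, s_grav, x_titr, s_titr))
```

With these invented numbers the gravimetric ratio comes out close to 2.90 and the titrimetric one close to 2.50, each with a propagated uncertainty below 0.01, so the ranges do not overlap: the kind of discordant pair the students had to reconcile.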
Surprisingly, this did not happen, in spite of leading questions in the lab manual which, we hoped, would push the student toward some resolution of the discrepancy. To get further information on student explanations, and on how these correlated with other measures of student performance, we studied a large number of student lab reports. This paper reports the correlation between the style of explanation invoked in the lab report discussion and other measures of student performance.

¹ Author to whom correspondence should be addressed.
² Present address: Department of Chemistry, University of Arkansas, Fayetteville, AR 72701.
³ Our TAs do not grade reports but rather check them over. In turn, students are expected to fix up the deficiencies. The grades are based on exams, as previously described in J. CHEM. EDUC., 54, 315 (1977), and J. CHEM. EDUC., 55, 511 (1978).

Journal of Chemical Education

The Study

This study was carried out on the large freshman advanced chemistry lab at Princeton (240 students, math SAT 6W00), but the results are of general applicability to any class where mastery of uncertainty analysis and detailed discussion of data are expected. Student lab reports were collected from teaching assistants immediately after the report deadline. To make the study easier, students were asked to present data on a standardized form but were allowed full freedom in their discussions. The lab manual asked two leading questions: (a) "Can you find a common mole ratio for both mineral blues? Do the ranges of uncertainty overlap?" (b) "Does the composition of mineral blue depend on how it is made?" The reports were all photocopied for this study and the originals returned to the teaching assistants for checking.³ No TA comments were on the copies used for this study, so extraneous influence from this source was excluded. Reports were first screened for calculational errors (this excluded 33 of the original 168 reports) and then for confessions of specific experimental blunders in the discussions (15 reports). Thirty-two reports were excluded because of poor titration technique (the students were unable to standardize the KMnO4 to within 3% relative). The remaining 88 reports were sorted by the senior author into the strategy groups described below. The reports were read twice, two weeks apart, and only a few classifications (about 6) were changed as a result of the second reading.

Table 1. Basic Discussion Strategies

Strategy                                                               Number of papers   % of papers
Two compounds are different according to data:
  A. Concluded formula depended on mode of preparation
  B. Explained discrepancy by rational argument about blunder or systematic error
  C. Avoided internal discrepancy by choosing one of the two values
  D. Explained discrepancy by fallacious argument about blunder or systematic error
  E. Explained discrepancy with vague argument about blunder
  F. Insisted on consistency without worrying about differences; discarded data, wrote experiment off as a failure
  G. No discussion
  H. Waffled completely; deliberate agnosticism about results
  I. Looked up formula in handbooks and tried to reconcile
Two compounds are the same according to data:
  J. Conclusion so states                                              3
  K. Conclusion states formula inconsistent with data
Total                                                                  88

Table 2. Test Scores for Strategy Groups

Group   Score on systematic error questions(a)   Total lab exam   Lecture course final
A       23.1 ± 6.9                               88.1 ± 10.9      70.0 ± 15.8
B       26.4 ± 8.5                               93.8 ± 14.7      82.7 ± 10.5
C       22.5 ± 9.5                               90.2 ± 15.9      77.9 ± 14.3
D       20.7 ± 7.3                               76.2 ± 14.6      76.0 ± 10.7
E       21.0 ± 6.7                               73.0 ± 11.9      69.5 ± 17.8
F       21.4 ± 6.0                               83.2 ± 13.5      76.2 ± 14.6
(a) Total possible points: 36.

The strategies used by students are summarized in Table 1, together with the percentage of the class using a given strategy. Group A comprises students who accepted the experimental facts and concluded that, yes, the method of preparation did make a difference to the formula. This was often, although not always, hedged in some way, for example, "much as it goes against everything I've been taught about chemistry, my data forces me to conclude . . . ." About one quarter of the papers used this strategy. Group B consisted of students who explained the facts by reference to a systematic error of one sort or another and who argued this consistently, e.g., "If the reduction with zinc were not complete, this would make the apparent mole ratio seem too high, and thus it may be really the same as that for the natural gas reduction." Group C was composed of students who emphasized one result or the other as correct. In most cases, they were people whose procedural errors caused the Zn reduction to be incomplete, and who therefore got formulas best rounded to WO3. Knowing this to be the formula of the starting material, they discarded the result as nonsensical without a detailed explanation. Group D is similar to B, but in this case the systematic error arguments were in the wrong direction. Group E consisted of students whose arguments were vague but still framed in terms of systematic error or blunder. Typically, a group E student would state a possible systematic error ("human error") without arguing why it should cause the result to be wrong in the observed direction. Group F contained people who wrote the experiment off as a failure because it did not produce the expected results. Typically, the group F student affirmed the Law of Definite Proportions and insisted that all results inconsistent with it were "wrong" but gave little in the way of detailed explanation as to what might have happened. It seemed to bother them that "mineral blue" might be several different compounds. Groups A through F contain 83% of the papers, and the remaining groups are very small. Group G is composed of students who did not write a discussion. Group H is composed of students who waffled, or who took a position of total agnosticism: "It's impossible to tell from the experiment," or "My technique is too poor to give accurate results anyway."
Group I contains students who tried to explain why their results were not the same as those in the "Handbook of Chemistry and Physics" (which gives two formulas, W4O11 and W2O5, for mineral blue). Groups J and K were composed of students whose mole ratios for the two preparative methods appeared, for various reasons, to be consistent. Usually the student had made a mistake in the uncertainty calculation which led to a range that was too wide. Group J adopted a formula for mineral blue which was consistent with their data, while group K chose
formulas that were inconsistent. Both of these groups contained very few students.

Of course, the groups described above do not represent "pure" intellectual styles; such a classification is artificial and somewhat arbitrary at best. The full pattern of student individuality emerged in the arguments about the data, so borderline cases are common; for these we have tried to classify the paper according to the major thrust of the student's argument.

At the end of the semester, all of our students take an open-book lab written exam, which largely determines their laboratory grade. As part of this exam, we asked two multipart questions about the experiment. The first posed a number of systematic errors concerning the Zn reduction and the permanganate standardization; for each systematic error, the student was expected to indicate whether the apparent mole ratio of oxygen to tungsten would be higher, lower, or unchanged. The second posed a pair of discordant mole ratios (with uncertainties) for the two reduction products. Possible explanations were suggested, and for each the student had to indicate whether it could or could not possibly explain the discrepancy between the two values. The average scores on these two systematic error questions are listed for each strategy group in Table 2, together with the average total lab exam score and the average final exam grade for those members of the group in the lecture part of the course. Because groups G, H, I, J, and K contain so few students, we have not analyzed them further; any conclusions might well be misleading.

Discussion

The results of this study can be discussed from several viewpoints.

(1) Do systematic error questions work? It is clear from Table 2 that the scores on the systematic error questions for the large groups (A-F) were very nearly identical except for group B. A t test of the average score for B (26.4) versus the average score of the other large groups combined (22.0) gives a t value of 1.75.
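The first exam question asked whether a given systematic error would make the apparent O:W ratio higher, lower, or unchanged. For the Zn/KMnO4 route this can be checked mechanically. The sketch below assumes the relation x = 3 - 5 n(MnO4-)/(2 n(W)), which is our reconstruction from the redox stoichiometry rather than a formula quoted from the lab manual; the function names and the two error parameters are hypothetical illustrations.

```python
import math


def apparent_ratio(x_true, reduced_fraction=1.0, conc_ratio=1.0):
    """Apparent O:W ratio from the titration in the presence of systematic errors.

    reduced_fraction: fraction of the W actually reduced by Zn (1.0 = complete).
    conc_ratio: KMnO4 concentration used in the calculation / true concentration.
    The titrant actually consumed is proportional to reduced_fraction * (3 - x_true),
    and the calculated moles of MnO4- are further scaled by conc_ratio.
    """
    return 3.0 - conc_ratio * reduced_fraction * (3.0 - x_true)


def direction(x_true, **errors):
    """Classify the apparent ratio as 'higher', 'lower', or 'unchanged'."""
    x_app = apparent_ratio(x_true, **errors)
    if math.isclose(x_app, x_true, abs_tol=1e-12):
        return "unchanged"
    return "higher" if x_app > x_true else "lower"


print(direction(2.5, reduced_fraction=0.9))  # incomplete Zn reduction -> higher
print(direction(2.5, conc_ratio=1.05))       # KMnO4 taken as 5% stronger than it is -> lower
print(direction(2.5))                        # no systematic error -> unchanged
```

The first case reproduces the group B argument quoted earlier: an incomplete zinc reduction consumes less permanganate and so makes the apparent mole ratio too high. A group D argument would be one that gets this sign wrong.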

… on lab reports will probably do well on test questions involving these ideas. The other groups do not appear to be very different in average score. This means that systematic error questions test only the specific skills required to make a detailed argument and provide no ranking of the less sophisticated approaches. Thus, the test questions work, but they probe very narrowly.

(2) Are there other important correlations between grades and strategy?

While the systematic error questions are important, they test only a small part of the scientific judgment skills that we are trying to inculcate. The lab test as a whole and the final exam in the associated lecture course may give more information. On the lab test, group B has the highest score, but groups A and C are not far behind. Group F is intermediate, and groups D and E have almost identical low scores. If these groups are combined into larger "supergroups," as in Table 3, then the t test gives statistical significance for all combinations of supergroups. The students' choice of the discussion strategy is not arbitrary but is related to something measured by the test. Second, there seems to us to be a correlation between the scientific maturity displayed by the lab discussion strategy and the student's test score. Since scientific maturity cannot be reduced to a numerical variable, this idea is difficult to pursue, but it is suggestive nonetheless. On the final exam in the associated lecture course, group E again is at the bottom and group B at the top. However, groups A and C move downward. We have been unable to demonstrate any statistically significant differences between the various groups on the course final. Therefore, it is likely that whatever causes the student to choose a particular strategy is much less related to general chemical knowledge (tested on the closed-book lecture final) than to a variable measured by the open-book lab exam. Unfortunately, pure statistical analysis can give us no clues as to what this variable might be, and speculation is futile and likely to be misleading.

Volume 59, Number 9, September 1982

Table 3. Comparison of Scores on Open-Book Lab Written Exam

Groups              Raw score difference    t
A+B+C versus F      6.8                     1.52
A+B+C versus D+E    15.1                    4.17
F versus D+E        8.3                     1.65

(3) Does the student who is "open-minded" do better than the "closed-minded" student? It is also interesting to compare group A, an "open-minded" strategy in which a student is willing to have his mind changed by his data, with group F, the strategy of affirming the Law of Definite Proportions above all else and essentially discarding the data, the ultimate "closed-minded" strategy. Yet there appear to be no statistically significant differences between groups A and F on any of the tests or in performance on the systematic error questions. The only interesting feature is the reversal in position between the open-book lab exam, where "open-minded" group A does better, and the closed-book lecture final, in which "closed-minded" group F does better. Scores of the A and F groups have been normalized by subtracting the mean for all groups.

[Table: raw score, standard deviation, and normalized score on the lab written exam and on the course final, for groups A and F]
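The two statistical operations mentioned here, a two-sample t test on group means and normalization by subtracting the mean for all groups, can be sketched from summary statistics. The paper does not say which t-test variant was used; this sketch assumes Student's pooled-variance form. The group means come from Table 2 and its ± values are treated as standard deviations, but the group sizes are hypothetical (chosen only so that group A is about one quarter of the 88 papers and groups A-F together hold about 83% of them), so the printed numbers are illustrative and are not the paper's results.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


def center(group_means, group_sizes):
    """Subtract the size-weighted mean of all groups from each group mean."""
    total = sum(group_sizes.values())
    grand = sum(group_means[g] * group_sizes[g] for g in group_means) / total
    return {g: m - grand for g, m in group_means.items()}


# Group means from Table 2; group sizes are HYPOTHETICAL
lab_means = {"A": 88.1, "B": 93.8, "C": 90.2, "D": 76.2, "E": 73.0, "F": 83.2}
final_means = {"A": 70.0, "B": 82.7, "C": 77.9, "D": 76.0, "E": 69.5, "F": 76.2}
sizes = {"A": 22, "B": 10, "C": 9, "D": 8, "E": 12, "F": 12}

norm_lab = center(lab_means, sizes)
norm_final = center(final_means, sizes)
t_lab = pooled_t(88.1, 10.9, sizes["A"], 83.2, 13.5, sizes["F"])

print(f"A vs F, lab exam: t = {t_lab:.2f}")
for g in ("A", "F"):
    print(f"group {g}: normalized lab {norm_lab[g]:+.1f}, normalized final {norm_final[g]:+.1f}")
```

With these assumed sizes, group A sits above the overall mean on the lab exam and below it on the lecture final, while group F does the reverse, which is the pattern described in the text; the magnitudes depend on the invented group sizes.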
It appears that the major effect is that group A did significantly worse on the closed-book exam, while group F maintained its position in the class. It is not clear what caused this striking effect. It may be that some members of group A accept the idea that the formula is dependent on modes of preparation simply because they are chemically unsophisticated and unaware of the contradiction with established theory. Clearly a student who does not know about the Law of Definite Proportions is going to do poorly in the lecture course.

(4) The value of systematic error questions. This study reinforces our feeling that grading of discussions by individual teaching assistants suffers from the following problems:
