The Development of Statistical Concepts in a Design-Oriented Laboratory Course in Scientific Measuring

Martin J. Goedhart and Adri H. Verdonk
Centre of Mathematics and Science Education, University of Utrecht, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Learning how to measure is an objective of laboratory courses in most university chemistry faculties. Emphasis is frequently on learning the practical skills of a measurement, such as operating the instruments and processing the data obtained from them. Far less attention is normally paid to the design and evaluation of a measurement and to the interpretation of measurement results. However, scientific measuring includes design and evaluation as much as the performance of measurements. At present, a curriculum that includes the design and evaluation of measurements is not available. This article offers a contribution to the development of such a curriculum. We consider the role measurements play in chemical research and present some results of an empirical educational investigation performed in a first-year laboratory course in chemistry at the University of Utrecht (the Netherlands). Finally, we discuss some implications for the curricula of chemistry faculties and for a new laboratory course. A more elaborate description of the research project is given in a PhD thesis by Goedhart (1).

Theory and Practice in the Curriculum

The current curriculum of the chemistry faculty of the University of Utrecht is similar to the chemistry curricula at many other universities. It is divided into lecture courses and laboratory courses. Theory is presented in lectures; laboratory courses aim at practical skills. The first year offers lectures in elementary physical chemistry (thermodynamics, chemical kinetics, electrochemistry, chemical bonding), organic chemistry, inorganic chemistry, biochemistry, analytical chemistry, physics, and mathematics. Lectures are in most cases combined with tutorials, which train students in applying the theory to problems. First-year students spend about 30% of their time on a laboratory course in general chemistry that includes organic and inorganic syntheses, physicochemical measurements, and chemical analyses.
The dichotomy between lectures and laboratory courses in the curriculum raises questions about the integration of theory and practice. This is a fundamental problem in the curricula of many chemistry faculties. The laboratory course is mainly prescription-oriented. This means that students' operations are prescribed by detailed instructions. In such a course students can obtain a good practical result without having to use theory. This means that a prescription-oriented laboratory course may contribute to a gap between theory and practice. On this basis several authors in this Journal have questioned the usefulness of prescription-oriented laboratory courses (2-7).

Connecting theory and practice in laboratory courses is commonly attempted by including questions in the laboratory manual. An example of such a question is: "Why are conductometric measurements performed in a thermostatic bath?" Such questions are supposed to make students aware of the fact that in a measurement some variables should be controlled in order to minimize their influence on the measured quantity. However, we found that our students paid little attention to these questions (see also (8)),
simply because answering them is not a prerequisite for performing the experiment. The procedure is determined by the instructions, and the questions are merely an addition.

Another attempt to include theory in practical work involves the introduction of "blank spaces" into instructions. In the volumetric determination of acetic acid in vinegar, for example, students themselves may be instructed to decide on the amount of vinegar to be used in a single titration, if the approximate concentration of acetic acid in the sample is given. This strategy is sometimes called a problem-oriented (9) or a research-oriented approach (5). We decided to concentrate on this possibility. We chose the name design-oriented laboratory course, as we wanted our students to design selected parts of the practical work themselves using theory learned in the lecture courses. In this way we hoped to integrate theory into practical work. This article reports on the analysis that enabled us to design such a course. It also offers a brief sketch of the course.

Developing a design-oriented course in scientific measuring raises several questions. First, design-oriented practical work demands a new kind of instructions. Complete instructions do not fit in with the new approach. We have to select appropriate design tasks on measurements. These tasks have to fit in with students' knowledge, skills, and interests. Second, we have to establish which fields of theory are needed for designing and evaluating measurements. In order to make a selection we have to look at the role measurements play in the research activities of chemists.

Measurements in Chemical Research
Measurements play a role in almost every field of chemical research, albeit in different ways. Chemical analysis and theory development are two of the contexts in which measurements are important.

In a chemical analysis, statements are made concerning the amounts of certain constituents in a sample. It is important to note that chemical analyses are performed not only by analytical chemists and analysts; they are a daily routine for many chemists. Synthetic chemists identify synthesized products. Biochemists analyze the amino acid composition of an isolated protein fraction, etc. There is a similarity in the methods and instruments used in analytical measurements, but there is a difference in the purpose of the analysis. These differences result in different requirements concerning the reliability of the measurements. In the case of food products or soil samples this reliability is demanded by legal provisions, such as the Food and Drugs Act or environmental legislation. In the case of identification of a synthesized product a rough measurement is often satisfactory. Choosing the sampling method and the most adequate separation method and analytical technique to reach the demanded reliability is of major importance in analytical measurements. Chemometrics is the area in which these topics are discussed.

Volume 68 Number 12 December 1991
Theory development occurs in all chemical disciplines, since this is the ultimate purpose of scientific research. Physical chemistry is the chemical discipline in which theory development is most prominent. Activities are aimed at testing hypotheses with the goal to confirm, refine, adjust, or reject theories or models. Research workers in physical chemistry often use special instruments, sometimes self-built, operating under controlled conditions.

We distinguish between different contexts because they differ in the extent to which conditions, methods, and instruments are considered as a problem. For instance, in a chemical analysis an absorption spectrum is recorded in order to determine the absorption maximum of a certain solution with a certain instrument. Calibration of the wavelength scale is not necessary as long as measurements are carried out at the same wavelength with the same spectrometer. The properties of the instrument are not considered as a problem if the method produces reliable (i.e., accurate and reproducible) results in relation to the intended use of the measurement result. On the other hand, for a physical chemist investigating a theory about the electronic structure of a certain compound, it is important to know the wavelength of the absorption maximum as exactly as possible. Calibration of the wavelength scale is essential. Moreover, it is important to know the influence of the adjustments of the instrument and of the properties of the other compounds present in the sample on the observed wavelength. The influence of measurement conditions on the measured quantity in this context is very important and these conditions must be controlled within a certain range.

From the examples above, it becomes clear that the extent to which measurement conditions are considered as a problem depends on the context and the purpose of the measurement. This also applies to the desired quality of the measurement result.
In some cases a high quality is demanded; in other cases a relatively low quality is sufficient. It also means that error analysis plays different roles in different chemical research areas. Physical chemists frequently use error analyses, while synthetic organic chemists, although they measure, mostly do not.

To reiterate, scientific measuring includes not only the performance of measurements, but also their design and evaluation. Designing a measurement means choosing measurement conditions, instruments, and methods. Evaluating a measurement means a critical discussion of these choices. The design of a measurement takes place in view of a desired quality of the measurement result derived from the context and the purpose of the measurement. The performance (the actual measurement itself) is aimed at reaching this desired quality. In the evaluation one tries to check whether this quality has really been obtained. Therefore the quality of the measurement result is the connecting factor between design, performance, and evaluation of a measurement. Consequently, the role of error analysis becomes important, since error analysis is the tool used in the design (in an error analysis a priori), the performance (in an error calculation), and the evaluation (in an error analysis a posteriori) of a measurement. This means that error analysis is to play a major role in our new course and that we have to investigate the possibility of integrating the theory of error analysis into practical work.
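The role of an error analysis a priori in the design of a measurement can be illustrated with a small sketch. It assumes, purely for illustration, a titration whose result is a product and quotient of independently measured quantities, so that relative standard deviations combine in quadrature. The function names, the reading uncertainties, and the 0.5% target are hypothetical and are not taken from the course described here.

```python
import math


def relative_uncertainty(volume_ml, volume_sd_ml, other_relative_sds=()):
    """Relative standard deviation of a result of the form c = k * V / (...).

    For products and quotients of independent quantities, the relative
    standard deviations add in quadrature.
    """
    terms = [volume_sd_ml / volume_ml, *other_relative_sds]
    return math.sqrt(sum(t * t for t in terms))


def smallest_adequate_volume(candidates_ml, volume_sd_ml, target,
                             other_relative_sds=()):
    """Return the smallest candidate titrant volume meeting the target, or None."""
    for v in sorted(candidates_ml):
        if relative_uncertainty(v, volume_sd_ml, other_relative_sds) <= target:
            return v
    return None


if __name__ == "__main__":
    # Hypothetical contributions: 0.05 mL burette standard deviation,
    # 0.1% from the pipette, 0.2% from the titrant concentration.
    others = (0.001, 0.002)
    for v in (5.0, 10.0, 20.0, 40.0):
        rel = relative_uncertainty(v, 0.05, others)
        print(f"{v:5.1f} mL -> {100 * rel:.2f} %")
    print(smallest_adequate_volume((5.0, 10.0, 20.0, 40.0), 0.05, 0.005, others))
```

Under these assumed numbers the design question ("how much titrant should a single titration consume?") is answered before any measurement is made, which is the sense in which the analysis is a priori.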
Journal of Chemical Education

Error Analysis in a Traditional Laboratory Course

The role intended for error analysis in a new laboratory course made us decide to investigate the use of error analysis in the present course. Within this laboratory course a series of lectures on statistics and error analysis is given with the goal of preparing students to use statistical procedures in the laboratory. The lecture course is taught as part of the laboratory course and the grade scored for the exam is included in the overall assessment of the laboratory course. This means that at the level of organization the statistics course is integrated in the laboratory course.

Problems with students' use of statistical procedures in the laboratory course had been known to exist for a long time. In our investigation we tried to find out the nature of these problems. We analyzed the laboratory manual used by the students and students' reports of some of the measurement experiments. Complete prescriptions for the performance of the measurements were given in the manual. In most experiments the final task in the instructions was: "Calculate the best measurement result and give an error analysis." A prescription for the performance of the error analysis, however, was lacking. From the assistants' instructions it appeared that in an error analysis students were expected to calculate a measure for the quality of the best result (e.g., as a standard deviation or a confidence interval). The indication of sources of error and proposals for improving the measurement method were not mentioned as belonging to the task. Consequently, error analysis in the instructions seemed to mean just error calculation and nothing else. Error calculation was an isolated task in this experiment, not related to an error analysis a posteriori and not resulting in an evaluation of the measurement. In scientific research, however, the significance of an error calculation is found in connection with an error analysis a priori or a posteriori in the design or in the evaluation of a measurement. In addition, we found that an error analysis a priori was hardly attempted by students, because of the prescription-oriented nature of the course.

From the analysis of lab reports it appeared that students experienced large difficulties in performing error calculations.
These difficulties concern the choice of an adequate calculation technique. For instance, students chose the calculation of an unweighted mean where a weighted mean was appropriate. In other experiments students used linear regression to establish a relationship between variables where the application of this technique was doubtful. Problems were also encountered in the mathematical elaboration, e.g., in the use of Gauss' formula for the propagation of errors. In one report we found a value for the dissociation constant of acetic acid of 1.42 x 10" with a calculated standard deviation of 1.87 x 10-''!

We conclude that the objective of the lecture course in statistics was not achieved. We think that this was not caused by the teacher's possible incapability or by the contents of the lectures, since we know that in other faculties there are similar problems. In this Journal several authors have indicated problems with the use of statistics in the laboratory (10, 11). In our opinion there is a gap between theory and practice. In our current course theory and practice of error analysis are integrated at the level of organization, but obviously this does not mean that error analysis is integrated by the students into practical work.
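The first difficulty mentioned above, the choice between an unweighted and a weighted mean, can be made concrete with a short sketch. It assumes the usual inverse-variance weighting (each weight equal to one over the squared standard deviation) for independent values with known standard deviations. The data are invented for illustration and do not come from the students' reports.

```python
import math


def unweighted_mean(values):
    """Plain arithmetic mean: every value counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, sds):
    """Inverse-variance weighted mean and its standard deviation.

    Assumes independent values whose standard deviations are known.
    """
    weights = [1.0 / (s * s) for s in sds]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Two hypothetical results for the same quantity; the first is
    # twice as precise as the second.
    values = [10.0, 10.4]
    sds = [0.1, 0.2]
    print(unweighted_mean(values))
    print(weighted_mean(values, sds))
```

When the individual results differ in precision, the weighted mean is pulled toward the more precise value, whereas the unweighted mean treats both alike. Which of the two is appropriate is a design decision, not a matter of calculation technique alone.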
Errors and Students

Investigations were continued along two lines. First, we wanted to obtain a better understanding of students' interpretations of some concepts used in error analysis (from now on we shall call these statistical concepts). Second, we wanted to know what problems can be found by analyzing the contents of textbooks on error analysis and statistics.

Listening to students in the laboratory revealed that students' interpretations of statistical concepts differed from ours. By interviews, questionnaires, analysis of laboratory notebooks and lab reports, and listening to audio recordings of discussions between students and assistants we gained insight into students' interpretations of statistical concepts. The table lists errors mentioned by one student in his lab notebook in response to a question in the manual ("Write all possible errors that can occur in this measurement and that may influence the measurement result") after having performed a potentiometric pH measurement.

Errors Mentioned by a Student after Having Performed a pH Measurement
used wrong substance
millivolt/pH meter not correctly calibrated / read off
forgotten to adjust right temperature
pH meter wrongly connected
used inaccurate glassware
used dirty glassware
weighed acetic acid incorrectly
too much diluted or too little
electrodes not cleaned
normal H2O used instead of demineralized H2O

It seemed that students interpret errors as personal errors, as mistakes made by themselves. They seemed to scan the lab instructions for every action mentioned and then to regard any opposite action as an error. It was remarkable that in discussions students (and frequently assistants!) used the word "error" in combination with the words "to make".¹

Further research made clear that students did not interpret standard deviation as a statistical parameter, but as a quantity that delimits an area in which the real measurement result lies. So, if $x_m$ is the mean value and $s$ is the standard deviation, students think that the real result lies between $(x_m - s)$ and $(x_m + s)$. In statistics, however, the standard deviation indicates a chance of about 68% (for normally distributed values) that the real result lies in that region. Students interpret errors in a personal context: they think they are responsible themselves for measurement errors. This interpretation was not changed after the lectures in statistics, even if attention was explicitly drawn to the meaning of error in statistics. In other words, students' preconceptions were very persistent.

¹ It is important here to note that "error" and "mistake" are both translations of the Dutch homonym fout.

Error in Textbooks

In view of students' interpretations of the concept error it is interesting to examine how this concept is introduced in some textbooks on error analysis. Most textbooks introduce statistical concepts in the same way. Skoog and West (12), Taylor (13), Squires (14), and Rabinowicz (15) first introduce a so-called "true value". Other names such as "correct quantity" and "accepted value" are also used, although their meanings are somewhat different. The true value is sometimes thought to consist of an infinite number of figures. The value of a single measurement will always deviate from the true value or the accepted value. Error is then introduced as the difference between the true value or accepted value and a measured value. This error is sometimes indicated as the absolute error (e.g., by Skoog and West (12, p 41)):

"The accuracy of a measurement is often described in terms of absolute error, E, which can be defined as the difference between the observed value $x_i$ and the accepted value $x_t$: $E = x_i - x_t$"

Leaving systematic deviations aside, the absolute error is considered as a resultant of a large number of fluctuations caused by unknown forces. Since these fluctuations are thought to be random, statistics can be used to give a distribution of measurement values.

It is important to note that the true value is a theoretical construct. It is constructed in order to make a statistical treatment of measurement values possible. In most textbooks the nature of the true value remains obscure. Taylor is one of the few authors who is clear about the nature of the true value, but his elucidation is made many pages after the true value has been introduced (13, p 109-110):

"What is 'the true value' of a physical quantity? This is a hard question, to which there is no satisfactory, simple answer. Since it is obvious that no measurement can exactly determine the true value of any continuous variable (a length, a time, etc.), it is not even clear that the true value of such a quantity exists. Nevertheless, it will be very convenient to assume that every physical quantity does have a true value; and we will make this assumption. We can think of the true value of a quantity as that value to which one approaches closer and closer as one makes more and more measurements, more and more carefully. As such, the 'true value' is an idealization, similar to the mathematician's point with no size or line with no width; and like both of these it is a useful idealization."
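The textbook picture of the absolute error as the resultant of many small random fluctuations, and the statistical reading of the standard deviation, can be illustrated with a simulation. This is a sketch under stated assumptions: the "true value" of 7.00, the 50 uniform fluctuations of at most 0.01 each, and the function names are all invented for illustration. What it shows is that the fraction of individual values within one standard deviation of the mean is close to 68%, not 100% as the students' interpretation would imply.

```python
import random
import statistics


def simulate_measurements(true_value, n, n_fluctuations=50, size=0.01, seed=1):
    """Each value is the assumed true value plus many small random fluctuations."""
    rng = random.Random(seed)
    return [
        true_value + sum(rng.uniform(-size, size) for _ in range(n_fluctuations))
        for _ in range(n)
    ]


def fraction_within_one_sd(values):
    """Fraction of individual values lying between (mean - s) and (mean + s)."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    inside = sum(1 for x in values if mean - s <= x <= mean + s)
    return inside / len(values)


if __name__ == "__main__":
    values = simulate_measurements(7.00, 5000)
    absolute_errors = [x - 7.00 for x in values]  # E = x_i - x_t, with x_t assumed known
    print(round(statistics.fmean(absolute_errors), 4))
    print(round(fraction_within_one_sd(values), 3))
```

In a real measurement the true value is not available, so the absolute errors computed here exist only inside the model, which is consistent with the description of the true value as a theoretical construct.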
After the introduction of the true value and the absolute error the textbooks switch to the presentation of a set of measurement values, construct a frequency histogram, and make plausible that this histogram takes a regular shape when the number of measurements is increased. Subsequently a limiting distribution (mostly the Gauss function) is presented as a means to describe the empirical distribution at large numbers of measurements. Some authors point to the fact that the limiting distribution should be considered as a model (see 13, p 105):

"It should be noted that the limiting distribution is a theoretical construct which can never itself be measured exactly."

So, the limiting distribution is generated from statistical theory. Its nature becomes clear from the fact that it is a continuous mathematical function, whereas the empirical distribution is discontinuous. Some textbooks suggest incorrectly that the Gauss function is an empirical distribution found from the frequency histogram if the number of measurements becomes very large, such as in Levitt (16, p 10):

"It can be shown that if n is very large the frequency distribution curve (corresponding to the frequency histogram in Fig. 1A.2) is given by the equation [...]"
The limiting distribution is a very useful model. The Gauss function, for example, gives a criterion for the best value of a set of values, i.e., the population mean $\mu$. This value can be estimated for a finite number of measurements by the arithmetic mean. The model also gives a quantitative measure for the measurement error as the population variance $\sigma^2$ that can be estimated by the quadratic standard deviation or sample variance

$$s^2 = \frac{\sum_i (x_i - x_m)^2}{n - 1}$$
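The claim that the sample variance with (n - 1) in the denominator estimates the population variance can be checked numerically. The sketch below assumes a Gaussian model with an invented population mean of 10.0 and standard deviation of 0.2 (so a population variance of 0.04), draws many small samples, and averages two candidate estimates. The function names and parameters are hypothetical.

```python
import random


def arithmetic_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    m = arithmetic_mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def variance_divided_by_n(values):
    """Same sum of squares, divided by n instead of (n - 1)."""
    m = arithmetic_mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def average_estimates(n=5, repeats=20000, mu=10.0, sigma=0.2, seed=0):
    """Average both estimates over many simulated samples of size n."""
    rng = random.Random(seed)
    total_n1 = 0.0
    total_n = 0.0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total_n1 += sample_variance(sample)
        total_n += variance_divided_by_n(sample)
    return total_n1 / repeats, total_n / repeats


if __name__ == "__main__":
    with_n_minus_1, with_n = average_estimates()
    print(round(with_n_minus_1, 4), round(with_n, 4))
```

On average the (n - 1) version comes out near the assumed population variance, while division by n gives a systematically smaller value. This difference can only be argued from the statistical model; the data of a single series cannot show it.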
In our opinion most textbooks follow a somewhat curious route to introduce statistical concepts. Following the introduction of the true value and the absolute error a totally different approach is chosen in considering an empirical distribution of measurement values. The whole point of introducing the true value and the absolute error lies in the possibility of developing a statistical model for the distribution of measurement values. This connection is not very clear in most textbooks. In the books error as absolute error is first related to an individual measurement. Later error is related to a set of measurements as a standard deviation. Consequently, our content analysis of the textbooks yielded two interpretations of error. The first interpretation considers error by means of the concept true value. We call this an error in a statistical context. The second interpretation considers the deflection of empirical measurement values. This is error in an empirical context. Both contexts differ from the personal context in which students operate.

Another characteristic of these textbooks is that they pay attention mainly to the procedures used in error calculations, but hardly any attention is paid to the way these calculations can be used in error analyses. Consequently, an error calculation is not presented as an activity within the context of scientific work. Both characteristics show that in learning scientific measuring (including design and evaluation) the textbooks considered are not appropriate as teaching materials.
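The two textbook readings of error described above, error attached to an individual measurement and error attached to a set of measurements, can be contrasted in a short sketch. The pH-like values and the accepted value of 4.76 are invented, and the choice of range and mean deviation as deflection measures is an illustrative assumption. The point is only that the first reading needs a true or accepted value, while the second needs nothing beyond the data.

```python
import statistics


def absolute_errors(values, accepted_value):
    """Error per individual measurement; requires an accepted value."""
    return [x - accepted_value for x in values]


def deflection_measures(values):
    """Error as a property of the whole set; no accepted value needed."""
    mean = statistics.fmean(values)
    return {
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(x - mean) for x in values) / len(values),
        "standard_deviation": statistics.stdev(values),  # uses (n - 1)
    }


if __name__ == "__main__":
    values = [4.70, 4.74, 4.78, 4.74]  # hypothetical repeated readings
    print(absolute_errors(values, 4.76))
    print(deflection_measures(values))
```

Neither function has any place for the items a student would list as mistakes, which is one way of seeing how far both readings are from the personal context.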
Different Contexts

So far we have distinguished three different contexts: a personal context expressed by students, and an empirical context and a statistical context, both found in textbooks.

In a personal context, errors are interpreted by students as mistakes. Errors are connected with the actions of the students themselves.

In an empirical context the empirical distribution of measurement values plays a major role. The distribution can be represented graphically as a histogram or as a frequency polygon. The mean value can be considered as the average of n values ($x_m = \sum x_i / n$). Error can be seen qualitatively as deflection and can be expressed quantitatively as range (the difference between the highest and the lowest value) or as mean deviation ($\sum |x_i - x_m| / n$). These quantities are not related to a statistical model.

In a statistical context a model derived from statistical theory is used. One of the basic principles of this statistical theory is the so-called true value. Ultimately, the true value is considered as the population mean of the limiting distribution. In the case of a Gaussian distribution the population mean is estimated by the arithmetic mean. The quadratic standard deviation $s^2$ is the best estimate for the population variance $\sigma^2$ of the Gauss function. This population variance is considered as a measure for the measurement error. The occurrence of the term (n - 1) in the denominator of the expression for the standard deviation can be motivated exclusively from a statistical model. In the following scheme the different contexts are summarized.

1. Personal context
   Error as mistake
2. Empirical context
   Empirical distribution of measurement values
   Mean value as average ($x_m = \sum x_i / n$)
   Error as deflection, as range or as mean deviation ($\sum |x_i - x_m| / n$)
3. Statistical context
   Statistical model (Gauss function) for the distribution of values
   Mean value as estimate of population mean ($\mu$)
   Error as estimate of variance ($\sigma^2$) given by the sample variance ($s^2 = \sum (x_i - x_m)^2 / (n - 1)$)

Some statistical concepts bridge two different contexts. Such a concept is normal distribution. A normal distribution (in an empirical context) corresponds to a certain extent to a Gauss function (in a statistical context). The degree of correspondence can be investigated with a chi-squared test.

A Plan for a New Course

The results of the analyses above and of the educational research activities can be used to develop an outline for a new course, in which we want students to use error analysis in the design, performance, and evaluation of measurements. This means that we expect students to learn how to make adequate use of statistical concepts. Therefore, we choose to develop these concepts in the laboratory.

We first want students to give up the personal nature of their concept of error. We want them to consider differences in measurement results as a property inherent to measurements, so that different measurement values (from the same or from different observers) can be accumulated in a frequency histogram. Next, a mean value and some deflection measures can be introduced. This means that, starting from the personal context of the students, we first want to develop an empirical context.

Subsequently, we want students to proceed from an empirical to a statistical context. The limiting distribution is introduced as a model with which several statistical procedures can be performed. The choice of the arithmetic mean for the best value and the quadratic standard deviation for the population variance can be motivated from this model. In a subsequent stage attention can be drawn toward statistical tests, linear regression, the propagation of errors, etc. During the development of a statistical context the fact that the limiting distribution is a model should be emphasized. This may help students to see when and why this model is used. They will understand that there are several models (e.g., the Poisson distribution), each of them to be used in a different situation.

In the plan given above a premature introduction of error as a scientific concept is avoided. After discussing the students' personal error the concept of random error can be introduced in connection with the introduction of deflection measures. After a statistical model is introduced, the concept of absolute error is needed. Systematic error can be introduced at the point where tests are discussed in which results from different populations are being compared. Error is interpreted here as an effect. In actual measurements sources of error can be indicated as probable causes of this effect.²

The design of this course has important consequences for the relation between lectures and practical instruction. In our opinion, abandonment of the students' personal context will be achieved only in a situation in which students have the opportunity to compare their own measurement results with results from other students. Students' own experiences in measurements and subsequent discussions about the results are essential in achieving an empirical context. Therefore, laboratory work will be the only way to reach this goal. However, in our experience, the choice of appropriate experiments is difficult. We believe lectures and textbooks still can play a role in the development of statistical concepts later in the course, although it is not clear in what way.

Such a path, along which scientific concepts are to be developed, is named an educational structure, after Ten Voorde (17). The educational structure given here is essentially different from structures we found in textbooks. One difference is the starting point. We depart from the personal context of students. This means that our departure point is a set of empirical values, not a theoretical model as is common in textbooks. Another difference is the position of the various concepts of error in our structure.

At this moment we have no practical experience yet with the realization of this educational structure. Some starts have been made, but further research is necessary in order to develop an adequate teaching strategy and proper experiments and teaching material. We see the development of such educational structures as a major task for researchers in chemical education. Through the work in our department, educational structures differing from the present structures in chemistry textbooks are becoming available for the concepts of chemical substance, chemical reaction, and element (18-22), of chemical equilibrium, and of chemical synthesis.

² Note that in scientific language the word error is used both for the effect (as in "systematic error" or "determinate error") and for the possible sources of the effect (as in "calibration error" or "reading error").
Acknowledgment

The authors wish to thank Wobbe de Vos for his critical remarks and his efforts in the translation of this text.
Literature Cited

1. Goedhart, M. J. M e m u m in measuring. An educational study into teaching and learning measuring in a first-year university course in practical chemistry; PhD thesis, Utrecht, 1990 (in Dutch, with a summary in English).
2. Mulder, T.; Verdonk, A. H. J. Chem. Educ. 1984, 61, 451.
3. Young, J. A. J. Chem. Educ. 1968, 45, 796.
4. Smith, R. B. J. Chem. Educ. 1969, 46, 273.
5. Venkatachelam, C.; Rudolph, R. W. J. Chem. Educ. 1974, 51, 479.
6. Chisholm, M. G. J. Chem. Educ. 1975, 52, 739.
7. Wade, L. G. J. Chem. Educ. 1979, 56, 825.
8. MacDuffie, D. E. Educ. in Chem. 1973, 10, 87.
9. Fife, W. K. J. Chem. Educ. 1968, 45, 416.
10. O.&