Practical Experiments in Statistics
Craig A. Stone and Lorin D. Mumaw
San Jose State University, San Jose, CA 95192-0101
Gaining practical knowledge of statistics is important for undergraduates in the physical sciences. Degree programs in chemistry, physics, biology, and math generally require students to take a course in statistics. Other fields, such as psychology and business, also rely on a knowledge of statistics. Learning the concepts of statistics is essential if students are to understand the quality and especially the limitations of their data. Without this understanding it may be difficult to compare two different observations whose values suggest different conclusions. Statistics can help in designing those experiments by more clearly defining a property or leading to a more firmly established conclusion. Although students may be exposed to a thorough theoretical treatment of statistics, they often miss the benefit of reducing this theory to practice. Laboratory experiments are time-consuming, so the size of data sets is limited. Instruments that have a high throughput are expensive. Few are available in a classroom setting, and they are probably unavailable for classes in math, psychology, and business. It is thus difficult to generate the large data sets needed to study statistical concepts. The experiments described here can be applied to any field that requires a knowledge of statistics. They are easy to carry out, and they use inexpensive instrumentation. Sealed sources of radioactive nuclides are used to generate the data. Nuclear decay, a microscopic property, produces a natural statistical fluctuation on which the experiments are based. The sources are small, often with an intensity on the order of the ²⁴¹Am sources found in home smoke detectors. Thus, no special licenses or handling procedures are needed for either the sources or the instruments. Radiation-detection instruments are available through several manufacturers who market their equipment to high school and university science programs.
An introduction to these concepts can be found in numerous books on statistics. The following are suggested references for various fields: general statistics (1, 2); general sciences, mathematics, and engineering (3, 4); nuclear science applications (5); biology (6, 7); psychology (8, 9); and business (10, 11). The primary goal is for students to become familiar with probability distributions. Experiments are designed to acquire a data set large enough to generate a series of frequency distributions. From them, students learn how the size of a data set (i.e., the number of measurements) affects
Journal of Chemical Education
Figure 1. Gaussian distributions with mean values of 25, 50, and 100. In most experiments in nuclear science, the width of the curve is defined as the square root of the mean value.
Figure 2. Time distribution for a typical data set. The data set used to construct this figure contained 500 measurements.
the quality of derived factors. Values for the 1σ standard deviation and for 2σ are extracted from the frequency distribution curve, showing the amount of valid information that lies outside these boundaries. Advanced classes extend these ideas by comparing the performance of two instruments. Students must characterize the stability of the instruments and qualitatively compare both the time and frequency distributions. In a second part of the experiment they extract the instrumental standard deviation from the observed total standard deviation. A final set of experiments explores data sampling and inhomogeneity.

Counting Statistics in Radioactive Decay

The binomial distribution is the underlying probability distribution that describes statistics for nuclear decay. It is valid when the probability of an event is constant. The expression, applied to radiation detection, is given by
P(n) = [N! / (n!(N − n)!)] p^n (1 − p)^(N − n)

where n is the number of decays observed during a time interval Δt; P(n) is the probability of observing n decays; N is the population of radioactive nuclides; and p is the probability that a nucleus will decay in Δt. Applying this distribution to normal counting conditions is difficult because large factorials are involved. Two properties of radioactive sources simplify the probability distribution. The sources contain a large population of nuclides, usually over 10¹⁵, and their half-life is very long compared to the time over which the experiment is carried out. The probability that a particular nucleus decays in any time interval Δt is thus very small, and the binomial expression reduces to a Poisson distribution; at the mean values used here it is well-approximated by a Gaussian distribution whose standard deviation is the square root of the mean (Fig. 1). Figure 3 shows a series of frequency distributions. These distributions were generated using the same data that produced the time distribution in Figure 2. The 50-point distribution (part a of Figure 3, generated using the first 50 points of the 500-point data set) is starting to take the form of a Gaussian distribution but has a large amount of scatter. A 500-point distribution (part c) is well-defined with several points in the wings of the Gaussian curve. Such a large data set, though, is tedious to construct. Sets with 250 points have worked well under typical classroom situations. Distributions shown in Figure 3 were constructed by combining data in bins of five counts. So, for the 500-point distribution (Fig. 3c), the first vertical bar on the left represents the number of measurements with values between 536 and 540 decays. An appropriate bin size must be chosen to show sufficient detail. A bin size that is too small will make it difficult to understand the distribution, and one that is too large filters out the detail. Figure 4 shows the effect that the bin size has on the frequency-distribution curve.
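The binning procedure described above can be sketched in a few lines of Python. The data here are a simulated stand-in for detector counts (a Gaussian approximation to Poisson statistics, with an assumed mean of 560 and a fixed random seed), not the article's measurements:

```python
import random
from collections import Counter

random.seed(1)

# Simulated stand-in for a counting experiment: 500 measurements
# from a Poisson-like process. The standard library has no Poisson
# sampler; at this mean value a normal approximation with
# sigma = sqrt(mean) is adequate.
MEAN = 560.0
data = [round(random.gauss(MEAN, MEAN ** 0.5)) for _ in range(500)]

def frequency_distribution(values, bin_size=5):
    """Group measurements into bins of `bin_size` counts."""
    bins = Counter((v // bin_size) * bin_size for v in values)
    return dict(sorted(bins.items()))

# Compare a 50-point and a 500-point frequency distribution.
small = frequency_distribution(data[:50])
full = frequency_distribution(data)
for lo, n in full.items():
    print(f"{lo}-{lo + 4}: {'*' * n}")
```

Printing each bin as a row of asterisks gives a crude sideways histogram; the 50-point version shows the large scatter the text describes, while the 500-point version fills in the Gaussian wings.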
Figure 5. How the mean varies with the number of points.
Optimum Number for Calculating the Mean

Students are often required to make replicate measurements on a system and, from this information, to calculate a mean value and its standard deviation. Figure 5 shows how the mean depends on the size of the data set for one experiment. The horizontal axis represents the number of measurements used to calculate the mean. Each value was calculated by summing from the first to the nth measurement. A log scale was used for the horizontal axis to emphasize the variation with small sample sizes. This figure shows the mean of three measurements to be high by 1.2% of the value found with the entire set. By 250 measurements the mean settles down to within 0.2% of its final value. Figure 6 shows how the standard deviation of the mean varies with the size of the data set. The optimum number of measurements for calculating the mean depends on the scatter of the data. If a Gaussian or related behavior is assumed to exist in the system, then a small number of measurements may be used. Noise or other nonstandard behavior increases scatter and can skew the data. The mean will be more unstable, requiring a larger number of measurements. In general the performance of each system must be determined. The contribution of the instrument to the scatter will be studied in a later section.
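The running-mean calculation behind Figure 5 is simple to reproduce. This sketch uses simulated counts (assumed mean 560, fixed seed) in place of the authors' detector data, so the printed percentages are illustrative only:

```python
import random

random.seed(2)
MEAN = 560.0
data = [random.gauss(MEAN, MEAN ** 0.5) for _ in range(500)]

# Mean of the first n measurements, for n = 1 .. 500,
# accumulated in a single pass.
running_means = []
total = 0.0
for n, x in enumerate(data, start=1):
    total += x
    running_means.append(total / n)

final = running_means[-1]
for n in (3, 10, 50, 250, 500):
    m = running_means[n - 1]
    print(f"n={n:3d}  mean={m:7.2f}  vs final: {100 * (m - final) / final:+.2f}%")
```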
Figure 6. How the standard deviation varies with the size of the data set.
Extracting the Observed Standard Deviation
It is straightforward to determine ranges for various multiples of σ once a large data set has been assembled. The purpose of the more tedious calculation described here is to emphasize how many measurements lie beyond these ranges. Doing this, students will gain a better appreciation of this figure of merit. Spreadsheet computer programs are simple methods of determining the standard deviation. A sorting function is used to arrange the data in increasing value. Students then determine the range that encompasses 68.3% of the data (1σ), centered about the mean. The 2σ boundary is at 95.5%, and the 3σ boundary is at 99.7%. For data sets of 250 measurements, they simply select 85 points on either side of the measurement that is closest to the mean value. The rest of the data set (31.7%) lies beyond the ±1σ boundary. Statistical functions in computer programs can calculate accurate values for the standard deviation, but it is important to emphasize the amount of valid data that lies beyond this boundary. A second method is to generate a time-distribution plot (i.e., the value of a measurement versus time) and take the number of the measurement as time. Horizontal lines are drawn at the mean value and at the mean value ± 1σ. Figure 2 shows a time distribution with lines drawn at these values.

Figure 7. Time distributions for two instruments.
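The spreadsheet procedure, sort the data and pick off the central 68.3%, translates directly into code. This is a sketch on simulated counts (assumed mean 560, fixed seed), 250 points as recommended above, with 85 points taken on either side of the value closest to the mean:

```python
import random

random.seed(3)
MEAN = 560.0
data = [random.gauss(MEAN, MEAN ** 0.5) for _ in range(250)]

mean = sum(data) / len(data)
ordered = sorted(data)  # the spreadsheet "sort" step

# Index of the measurement closest to the mean, then 85 points
# on either side: 171 of 250 points, i.e. about 68.3% of the data.
center = min(range(len(ordered)), key=lambda i: abs(ordered[i] - mean))
lo = ordered[max(center - 85, 0)]
hi = ordered[min(center + 85, len(ordered) - 1)]
sigma_est = (hi - lo) / 2  # half of the central range approximates 1 sigma

outside = sum(1 for v in data if v < lo or v > hi)
print(f"1-sigma range about [{lo:.1f}, {hi:.1f}], sigma about {sigma_est:.1f}")
print(f"{outside} of {len(data)} points ({100 * outside / len(data):.1f}%) lie outside")
```

The point of the printout is the article's emphasis: roughly a third of perfectly valid measurements fall outside the ±1σ boundary.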
An important application of the frequency distribution is the determination of the performance of an instrument. Each instrument will contribute to the total width of the frequency distribution, broadening and possibly skewing the data. This experiment shows how it is possible to choose the optimum instrument for a measurement, and it shows that the observed uncertainty has several components. A radiation-detection measurement is useful here because the instrumental uncertainty can be calculated from the observed total uncertainty and the contribution due to the natural scatter of the data. The experiment can be carried out in two ways. Students can choose to construct two data sets with different instruments. If laboratory time is limited, groups work as a team, sharing data sets and independently analyzing the data. A second method is to work with one instrument. After assembling a counting system, students collect the first data set, change one experimental parameter, and then collect the second data set. Some parameters that can be changed: Students can switch to a different power supply or amplifier. They can also change the high voltage, the preamplifier capacitance, or the amplifier gain. Data sets are constructed for each set of conditions.

Figure 8. Frequency distribution for two instruments. Part a is the frequency distribution for the instrument whose time distribution is shown in part a of Figure 7. Likewise, part b of this figure is the frequency distribution for the instrument whose time distribution is shown in part b of Figure 7.

Volume 72 Number 6 June 1995

Comparing the Stability
Time distributions are used to compare the stability in the two data sets. These are generated by plotting measurements with their value on the vertical axis and time on the horizontal axis. The number of the measurement (e.g., 1, 2, 3, ...) is taken as time or Δt. A constant should be added to each measurement in one data set to vertically offset that distribution from the other. Figure 7 shows an example of such a graph. It is possible to measure the stability of an instrument by looking qualitatively at the time distributions. The graphs should look random. A noticeable slope suggests that an experimental parameter is drifting. Linear-regression programs can be used to calculate the slope of the data. Other features to look for include an oscillation that is of a lower frequency than the statistical fluctuation or a region that varies significantly from the mean. These features might suggest that one data set is better than the other or that one instrument performs better than the other. Part a of Figure 7 has a large drift and is obviously the poorer instrument of the two.
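The linear-regression drift check can be sketched as follows. Both data sets here are synthetic: a stable simulated instrument, and the same data with a 10% ramp added to mimic the drifting instrument of Figure 7a; the mean and seed are assumptions:

```python
import random

random.seed(4)
MEAN = 560.0
N = 500
stable = [random.gauss(MEAN, MEAN ** 0.5) for _ in range(N)]
# Add a linear ramp (10% of the mean over the full run) to
# simulate a drifting experimental parameter.
drifting = [x + MEAN * 0.10 * i / N for i, x in enumerate(stable)]

def slope(values):
    """Least-squares slope of values against measurement number."""
    n = len(values)
    xbar = (n - 1) / 2
    ybar = sum(values) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(values))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

print(f"stable instrument slope:   {slope(stable):+.4f} counts/measurement")
print(f"drifting instrument slope: {slope(drifting):+.4f} counts/measurement")
```

A slope near zero is consistent with purely statistical fluctuation; a slope well outside that scatter flags a drifting parameter.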
Figure 9. Sampling curve for a homogeneous system (part a). Part b shows the time distribution for this system, and part c shows its frequency distribution.
Comparing the Frequency Distributions
Instrumental performance is also determined by comparing the frequency distributions. The distributions should be symmetric and as narrow as possible. Excess noise in one component of the instrument will increase the width of the distribution and can lead to skewing. Without noise the standard deviation is the square root of the mean. If instrumental noise has a Gaussian distribution, then it will combine with the statistical fluctuation by
σ_obs² = σ_decay² + σ_inst²

where σ_obs is the observed standard deviation; σ_decay is the natural statistical fluctuation of the data from nuclear decay; and σ_inst is the instrumental uncertainty. Students should calculate σ_inst to determine if the instrument significantly changes the assumed model (where σ is the square root of the mean). The standard deviation σ_inst is a quantitative figure of merit for comparing the two instruments. Figure 8 shows that no such calculations are necessary to determine which is the optimum instrument.
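Solving the quadrature sum above for the instrumental term takes one line. The observed standard deviation and mean used here are illustrative values, not the article's:

```python
import math

mean = 560.0                   # assumed mean count rate (illustrative)
sigma_obs = 27.5               # assumed observed width of the frequency distribution
sigma_decay = math.sqrt(mean)  # counting statistics alone

# sigma_obs^2 = sigma_decay^2 + sigma_inst^2, solved for sigma_inst
sigma_inst = math.sqrt(sigma_obs ** 2 - sigma_decay ** 2)
print(f"sigma_decay = {sigma_decay:.1f}")
print(f"sigma_inst  = {sigma_inst:.1f}")
```

If σ_obs is not significantly larger than √mean, the subtraction is dominated by noise in the estimate of σ_obs itself and σ_inst should be reported only as an upper bound.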
Sampling and Inhomogeneity

A common measurement problem is sampling some feature of a large system. What sample size provides representative results? In a fairly homogeneous system a measurement with a particular sample size can take on a Gaussian distribution. The degree of scatter decreases as the size of the sample increases, until the sample includes the entire system under study. Nuclear-decay data sets can be used to illustrate this, assuming instrumental noise does not significantly distort the distribution. The system is the set of measurements, and the sample is the individual measurement, each collected with a time Δt. Sample sizes are increased by increasing the measurement time. Assume that the counting times are held to multiples of Δt. Counting for a longer time period is similar to summing every n measurements. Because the variance of a measurement equals its mean, summing the measurements properly propagates the uncertainties: The total uncertainty is still the square root of the sum. The original set of measurements can thus be used to explore variations in sample size.

Sampling Curves

A sampling curve is shown in Figure 9a. In this figure the value for a measurement is plotted versus sample size, which is in units of Δt. The data used to generate this figure are the same as those used to generate Figures 2 and 3. Summed measurements are normalized to 1Δt. The shape of the sampling curve can be understood as a series of frequency distributions viewed from above. A homogeneous system will have a time distribution that is uniform, randomly varying about a mean value. This is shown as part b of Figure 9, along with the resulting frequency distribution (part c).

Figure 10. Sampling curve (part a) for a system whose time distribution has a positive bias. A function was added to the homogeneous distribution giving a rise of 10% over the 500 points. Part b is the time distribution for this system, and part c is the frequency distribution.

Figure 11. Sampling curve for a system with an exponential bias (part a). The homogeneous system was normalized using a function with an exponential form. Data were then normalized so that the sum of all values is equivalent to that for the homogeneous system. Part b is the time distribution for this system, and part c is the frequency distribution.

Inhomogeneous Systems

Inhomogeneity skews the results. Spikes may be apparent in the time distribution, and it may have a slope or other nonstandard behavior. The frequency distribution becomes asymmetric, and a larger number of samples is needed to obtain a representative sampling of the system. Three inhomogeneous systems are shown in Figures 10-12. The assumed time distributions were generated by adding a function to the original data set and normalizing the entire data set so that the sum of all points is equivalent to that from the original data set. In this way, the systems are equivalent in total concentration or response but have different inhomogeneities. Figure 10 has a time distribution with a 10% (positive) slope over the 500 points. The frequency distribution is asymmetric and is distorted on the right side; the sampling curve is much broader at large sample sizes. In Figure 11, a system is shown that could describe an elemental concentration highly dependent on particle size, an exponential function. The frequency distribution does not appear as a Gaussian function and spreads over a wide range of counts. Likewise, the sampling curve is broadly distributed. A system with two components or phases is shown in Figure 12. A Gaussian peak was superimposed on the otherwise random distribution. This peak is evident in the frequency distribution as the region to the right of the primary peak. The sampling curve almost appears to have two components. A low-valued sampling curve, centered near a mean of about 620, is fairly well-defined, and a second weaker sampling curve is suggested near a mean value of about 750.

Figure 12. Sampling curve for a two-component system (part a). A Gaussian distribution, centered at measurement number 200, was superimposed on the homogeneous distribution. The data were normalized so that the sum of all values is equivalent to that from the homogeneous system. Part b is the time distribution for this system, and part c is the frequency distribution.

Conclusion

Several courses at San Jose State University have used these experiments for two years. Most of the experience has been in courses in nuclear science and health physics, courses that traditionally have a strong emphasis in statistics and instrumentation. The Chemistry Department of this university teaches a course in scientific computing. In this course, students use the experiments to learn about data processing and issues of instrument performance. Students in each course carry out these experiments on a variety of the radiation-detection instruments in the Nuclear Science Facility. During the upcoming year the experiments may be extended to courses within the Physics Department and later to other departments around campus.
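The block-summing construction of a sampling curve, sum every n measurements and renormalize to 1Δt, can be sketched as follows. The data are simulated (assumed mean 560, fixed seed) rather than the article's, so only the trend matters: the scatter of the normalized samples shrinks roughly as 1/√n as the sample size grows:

```python
import random

random.seed(6)
MEAN = 560.0
data = [random.gauss(MEAN, MEAN ** 0.5) for _ in range(500)]

def normalized_samples(values, n):
    """Sum non-overlapping blocks of n measurements (a counting
    time of n*dt), then normalize each sum back to 1*dt."""
    sums = [sum(values[i:i + n]) for i in range(0, len(values) - n + 1, n)]
    return [s / n for s in sums]

def spread(values):
    """Sample standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

for n in (1, 5, 25, 100):
    s = normalized_samples(data, n)
    print(f"sample size {n:3d}*dt: {len(s):3d} samples, scatter about {spread(s):5.2f}")
```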
Literature Cited
1. Witte, R. S. Statistics, 4th ed.; Harcourt Brace Jovanovich College: New York, 1993.
2. Baird, D. C. Experimentation, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1983.
3. Larsen, R. J.; Marx, M. L. An Introduction to Mathematical Statistics and Its Applications, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1986.
4. Hogg, R. V.; Ledolter, J. Applied Statistics for Engineers and Physical Scientists, 2nd ed.; Macmillan: New York, 1992.
5. Knoll, G. F. Radiation Detection and Measurement, 2nd ed.; John Wiley and Sons: New York, 1989.
6. Clarke, G. M. Statistics and Experimental Design, 2nd ed.; Edward Arnold: London.
7. …; Chapman and Hall: New York, 1985.
8. Howell, D. C. Fundamental Statistics for the Behavioral Sciences, 2nd ed.; Duxbury: Belmont, CA, 1989.
9. Ferguson, G. A. Statistical Analysis in Psychology and Education, 4th ed.; McGraw-Hill: New York, 1976.
10. Winn, P. R.; Johnson, R. H. Business Statistics; Macmillan: New York, 1978.
11. Mendenhall, W.; Reinmuth, J. E.; Beaver, R.; Duhan, D. Statistics for Management and Economics, 5th ed.; PWS: Boston, 1986.