Statistics by Computer Simulation

A. Lotz
Institut für Physikalische Chemie der Universität München, Munich, Germany

This article describes a three-to-four hour experiment from a practical course in physical chemistry that consists of a computer simulation of throwing dice and the statistics involved. It is designed to improve the students' understanding of the basic aspects of statistical evaluation of experimental data as well as to introduce them to programming techniques.

A computer simulation is undoubtedly more elegant than statistics with "real" events, as proposed previously in this Journal; e.g., cold-draw length ratios of plastic strips (1), dilution of a dye (2), measurement of the length of a pestle (3), and weighing pennies (4). An earlier version of this experiment consisted of throwing real dice. This method was soon abandoned due to the boredom involved in collecting the data.

Topics of the Experiment

Instead of harassing the students with a wealth of statistical theory and tests, it is probably wiser to limit the discussion to the basic statistical tasks such as the calculation and proper interpretation of the mean and standard deviation of a collection of data, linear regression of paired data, and error propagation of calculated results. A survey of students found that the concept of the limiting distribution and its sampling is not understood. Linear regression is used only by those students who have a pocket calculator with hard-wired regression analysis, but they do not understand the process. Special-case approximation formulas for error propagation are applied without regard for their validity.

Error propagation is not included in this computer experiment. Linear regression is addressed by deriving the pertinent formulas in the course manual, and asking the students to write a program that calculates these parameters for a given set of values x_i, y_i.
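A minimal sketch of such a regression program is given here for orientation. It assumes the standard least-squares expressions for slope and intercept, because the course manual's own derivation is not reproduced in this article; the function name `linear_regression` and the sample data are illustrative only.

```python
def linear_regression(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least two points")
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical")
    b = (n * sxy - sx * sy) / denom  # slope
    a = (sy - b * sx) / n            # intercept
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x
a, b = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The sums are accumulated explicitly so that the program mirrors what a pocket calculator with regression keys does internally.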
Because this part of the experiment is straightforward, it is not described in this paper.

The emphasis in the portion of the experiment addressing the mean and standard deviation is on the character of these statistics as estimates for the parameters of the limiting Gaussian distribution. In addition, the Gaussian distribution is shown by the experiment to be a limiting case of the Bernoulli distribution

$$P(n,N,p) = \frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n} \tag{1}$$

for the case N → ∞, Npq ≫ 3 (5). P(n,N,p) is the probability of n events in N independent trials, if the probability of an event in a single trial is p, and the probability of no event in a single trial is q (p + q = 1).

Bernoulli Distribution

Bernoulli's distribution can be applied to the rolling of dice that is simulated in this experiment by drawing random numbers on a computer (the random number generator should be checked for even distribution!). Random numbers in the range 0 to 1/6 (from the total range of 0 to 1) simulate rolling a six. The students are asked to make 10,000 throws of 10 dice. The results of the computer simulation are presented as a histogram. A calculation of the limiting distribution (eq 1) with N = 10, p = 1/6, q = 5/6, and n = 0, 1, ..., 10 is performed and compared to the sample diagram (Fig. 1). The calculation of the limiting distribution can be done easily on a pocket calculator with the recursion formula

$$P(n+1,N,p) = P(n,N,p)\,\frac{N-n}{n+1}\,\frac{p}{q} \tag{2}$$

that can be shown to be true by inserting the expressions for P(n+1,N,p) and P(n,N,p) (eq 1).

Figure 1. Ten dice rolled 10,000 times (full bars), and corresponding Bernoulli distribution (open bars). Abscissa: number of dice with face showing a six; height of bars: relative frequency of the event.

Figure 2. Fifty dice rolled 10,000 times (full bars), and corresponding Gaussian distribution (open bars). Abscissa: number of dice with face showing a six; height of bars: relative frequency of the event.

Journal of Chemical Education

Gaussian Distribution

When the number of dice thrown simultaneously is increased (N = 50), the limiting distribution begins to approximate the Gaussian distribution (Fig. 2). Larger values of N can take an excessively long time. The throwing of
50 dice, repeated 10,000 times, required five minutes on the HP Vectra personal computer. There are algorithms for generating samples from a Gaussian distribution (6), but these are not easily understood by the students. As with the Bernoulli distribution, the students compare the sample and the corresponding Gaussian distribution

$$P = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(n-\mu)^{2}}{2\sigma^{2}}\right] \tag{3}$$

where μ = Np and σ = √(Npq). In addition to the experiment with 10,000 simulated throws, samples of 10, 100, and 1000 throws are generated, and the "experimental" values for the mean (μ) number of dice with faces showing a six and the standard deviation σ are compared with the limiting distribution. The table gives results for this simulation.

Dependence of Mean and Standard Deviation on Sample Size

Number of Simulated     Mean Number of Dice     Standard
Throws of 50 Dice       Showing a Six           Deviation
10                      7.40                    2.91
100                     7.72                    2.47
1,000                   8.36                    2.63
10,000                  8.33                    2.65

Limiting distribution: μ = Np = 8.33, σ = √(Npq) = 2.64.

Final Remarks

The experiment has been implemented for two years and has been positively accepted by the students. Graphics subroutines from the library that the students can append to their programs for a visual presentation of the results add to the attractiveness of the experiment. The gaming aspect
of these simulations renders statistics more appealing and makes the subject more than a textbook chapter that students had viewed as an annoying appendix to the subjects treated in the course.

Literature Cited

1. Spencer, R. D. J. Chem. Educ. 1984, 61, 555-563.
2. Paselk, R. A. J. Chem. Educ. 1985, 62, 536-536.
3. O'Reilly, J. E. J. Chem. Educ. 1986, 63, 894-896.
4. Richardson, T. H. J. Chem. Educ. 1991, 68, 310-311.
5. Margenau, H.; Murphy, G. M. The Mathematics of Physics and Chemistry; Nostrand: Princeton, 1956; p 439.
6. Efstathiou, C. E. J. Chem. Educ. 1992, 69, 733-736.
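The simulation of 10 dice and the pocket-calculator recursion for the Bernoulli distribution can be sketched as follows. This is a hedged illustration, not the program used in the course: the function names, the fixed seed, and the use of Python's `random` module are assumptions. The recursion is P(n+1) = P(n) (N - n)/(n + 1) p/q, started from P(0) = q^N.

```python
import random


def simulate_sixes(n_dice, n_throws, rng):
    """Relative frequency of 0..n_dice sixes in n_throws throws of n_dice dice."""
    counts = [0] * (n_dice + 1)
    for _ in range(n_throws):
        # A uniform random number below 1/6 is counted as a six.
        sixes = sum(1 for _ in range(n_dice) if rng.random() < 1 / 6)
        counts[sixes] += 1
    return [c / n_throws for c in counts]


def bernoulli_distribution(N, p):
    """P(n, N, p) for n = 0..N, built up with the recursion formula."""
    q = 1 - p
    probs = [q ** N]  # P(0, N, p) = q^N
    for n in range(N):
        probs.append(probs[-1] * (N - n) / (n + 1) * p / q)
    return probs


rng = random.Random(1995)  # arbitrary fixed seed, only for reproducibility
sample = simulate_sixes(10, 10_000, rng)
limit = bernoulli_distribution(10, 1 / 6)
for n, (f, P) in enumerate(zip(sample, limit)):
    print(f"{n:2d}  sample {f:.4f}  limiting {P:.4f}")
```

The two printed columns correspond to the full and open bars of Figure 1; the sample column changes with the seed, while the limiting column does not.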
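The comparison with the Gaussian distribution and the dependence of mean and standard deviation on sample size can be illustrated in the same way. The sketch below assumes N = 50 and p = 1/6 as in the article and uses μ = Np and σ = √(Npq); the seed and helper names are hypothetical, and the sample values it prints will differ from those in the table because they depend on the random numbers drawn.

```python
import math
import random
import statistics

N, p = 50, 1 / 6
q = 1 - p
mu = N * p                    # limiting mean
sigma = math.sqrt(N * p * q)  # limiting standard deviation


def gaussian(n):
    """Gaussian density with the limiting mean and standard deviation."""
    return math.exp(-((n - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def throw(rng):
    """Number of sixes in one simulated throw of N dice."""
    return sum(1 for _ in range(N) if rng.random() < p)


rng = random.Random(72)  # arbitrary fixed seed, only for reproducibility
for size in (10, 100, 1000, 10_000):
    sixes = [throw(rng) for _ in range(size)]
    print(f"{size:6d} throws: mean {statistics.mean(sixes):.2f}, "
          f"std dev {statistics.stdev(sixes):.2f}")
print(f"limiting: mean {mu:.2f}, std dev {sigma:.2f}")

# Relative frequency of exactly 8 sixes in the largest sample vs. the Gaussian value
freq_8 = sixes.count(8) / len(sixes)
print(f"n = 8: sample {freq_8:.3f}, Gaussian {gaussian(8):.3f}")
```

Small samples scatter noticeably around the limiting values, and the estimates settle as the number of throws grows, which is the point the table is meant to convey.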
Volume 72 Number 2 February 1995