Spreadsheet Statistics - Journal of Chemical Education (ACS

Spreadsheet Statistics. John M. Simpson. J. Chem. Educ. , 1994, 71 (4), p A88 ... Abstract. Using computer spreadsheets to do statistical analysis on ...
2 downloads 0 Views 2MB Size
the computer Spreadsheet Statistics John M. Simpson Penn State University-Beaver Campus Monaca, PA 15061

Personal computer spreadsheets are useful for teaching many chemical concepts. Representative examples can be found in quantum mechanics (11,acid-base titrations (2), chemical eauilibrium (3).. gas .. laws (4).kinetics (5).auantitative analisis t6,, X-ray diffraction I?), iniitrumental analvsis (8). and madincc (91.These examples takeadvanta~eof the sp*eadsgeet's ;ability to store aiarge amount of aata and to perform calculations rapidly and graph the data or results with a minimum of programming skills. Another capability of spreadsheets that has not been reported as frequently is the ability to do statistical analysis on data. This note will describe a unique application in which freshman chemistry students at the Pennsylvania State University Beaver Campus analyze lab data using the built-in statistical functions of LOTUS 1-2-3 to summarize mean and standard deviation values for a set of data via a statistical quality control type chart. Statistical Quality Control Charts Although statistical quality control (SQC) methods have been adopted in many nonacademic analytical laboratories there appears to be little emphasis upon teaching SQC principles in the undergraduate chemistry curriculum r,--,. im

An important component of this technique is the SQC chart. Although there are several variations upon bow an SQC chart is defined, it frequently consists of a line graph in which the results of some measurement (r value) which is repeated a t regular intervals is plotted versus the time units (x value). Superimposed upon the graph are three parallel lines representing the expected value (usually the mean, m) and the two control limits (m f n * std) where std represents the standard deviation and n has a value ranging from 2 to 4. The SQC chart is a convenient way to determine if the measured values are within the expected random distribution of error (in control) or outside of expected random distribution of error (out of control). Although instructional labs seldom provide situations in which measurements are repeated over an extended period of time we have found that a simple modification to the SQC chart allows us to convey to the students the same basic kormation concerning thireliability ofa data point. The modification is used whenever the entire lab section is performing the same measurement such as standardizing a base. Each individual result (molarity of the base in this case) is collected a t the end of the lab period and the list of M values provided to all the students. They are asked to enter the information into a spreadsheet and to use the built-in statistical functions to determine the mean, standard deviation, and control limits. These values are plotted on the Y axis and a number (individual student number) used to identify each individual in the lab section is plotted on theX axis. The result is shown in the figure and Presented at the American Chemical Society Meeting. Denver, CO, March 1993. A88

Journal of Chemical Education

edited by

RUSSELL H. BATT Kenyon College Gambler,OH 43022

is based upon representative student data. This graph is similar to a contml chart except that thex values represent a count of the students rather than a time value. The figure clearly indicates that individuals 4, 21, and 24 have results that are out of control. The SQC chart is easy to make because the spreadsheet does all of the mundane calculating and plotting (11).Furthermore, the chart is an effectiveindicator to each individual regarding the reliability of their result. In our experience students are much more impressed by the visual impact of a poor result on the chart than they would be from a simple inspection of the numbers. Conclusion It is our feeling that students tend to view statistics as a boring topic and consequently are reluctant to use statistical methods. We suspect that one of the reasons for this avoidance is the amount of time required to cany out a large number of simple calculations and to construct a plot. By relying upon spreadsheets to do the routine work, the focus is shifted to the interpretation of the statistical analysis. The visual impact of the graphical results in the case of the SQC charts on students is impressive. They seem much more inclined to analyze a poor result when they see it on a graph. Fortunately, spreadsheets are commonly available on most campuses and students have little difficulty in learning how to use them. This situation should make it possible for instructors to incorporate statistical analysis into their lab courses without having to purchase additional, more specialized software

Control chart of student molarity data from a representative lab section. Legend 0 Upper Limn Data Points A Lower Limit +Mean of Individual Values

the computer bulletin board Literature Cited 1. 2. 3. 4.

5. 6. 7. 8. 9. 10. 11.

P i o w F J. ChmL Educ. 1991,6s,A28%A2&?.

C m i e , J. 0.;Whiteley, R. V. Jr J. C k m . Ed=. 1991, 68, 921-926 David, C. W J Chem. Edue 1989,66,A237-,4238. Van Houten, J.J . Chem. Educ 1988,6J,A31bA315. Cae, D. A. J. Chem. Educ 1987, M , 496497. Mullin, J. L.; Eiermsn, R. J.J. C h . Educ 1990,67,878-881. Rosenthal J.J C k m . Edur 1991,68,A285-A286. Prais.M.0. J . Chem.Edue. 1992.69.488-493, Sparmw, 0. J Chom. Edue 1985.62.139-140. Laquer, F C.J. Cham Edur 1990,67,900-902. &chi, 0. I.Am. Lob. 1967,19,82-95.

Introducing Spreadsheet Data Analysis In the First-Semester Laboratory Michelle M. Bushey Trinity University 715 Stadium Drive San Antonio, TX 78212 Numerous approaches to the use of spreadsheets in chemistry courses have appeared in this Journal (Id). Some approaches have been geared specifically to the firstyear laboratory (6, 7).Recently, this Journal published an article describing an experiment that involves massing a collection of in order to reproduce bad data for siatistical analysis (8). We use this same experiment in our introductory chemistry lab, which is analytical chemistry, in order to introduce statistical concepts about small and laree data sets as well as to introduce our students to the use of computers in general and spreadsheets specifically Tv~icallv.we have seven sections of 2.532 students enrolfeh in t g s lab class. Most ofthese students are first year students. and for manv thid is their first colleee science laboratory. While m a 4 of these students are "computer literate", many have limited prior experience with computers. This lab is designed to introduce them to the lah, campus computer facilities, and to spreadsheets. As in the original article (81, students are given a set of 10 pennies each. Each penny set contains one "had data point". Nine pennies in each set are minted either prior to or a f k r 1982, and the 10th penny is from the other time period. Thus, the 10th penny will have a mass roughly 0.5 g different from the other nine pennies since the metal composition of pennies was changed in 1982.As in the original experiment, students perform a variety of statistical manipulations and tests on their individual data sets. They are told to perform a Q-test on any questionable data points and to use their own judgment in deciding if a particular data point should he discarded or not. Each student has one penny that fails the Q-test, but many choose to keep the data point based on the idea that after all, a penny is a penny. Finally, the students bring their data set, consisting of mint date and Dennv mass to the lab instructor and student teaching a h s t & (TA) for entry into Cricket Graph (version 1.3.li). The dates are entered into column 1 and the coin masses are entered into column 2. Data are rounded to the nearest 10 mg before entry. The magnitude of the standard deviation confvms that this is appropriate. Once the data have been entered, the frequency of data in column 2 is taken. Column 3 then consists of the masses (categories) and column 4 is the associated frequency values. The column 3 categories are then sorted ekher in ascending or descending order with column 4 as an associated coiumn. column; 1and 2, dates and masses, are then copied into columns 5 and 6, add the masses are sorted in either ascending or descending order with the column 5

-

-

A90

Journal of Chemical Education

dates associated. The entire class data set is entered into a single spreadsheet. Finally, the TA copies the spreadsheet into Excel (version 4.0) and labels the columns: dates, raw data, categories, freq. values, sorted dates, and sorted data. This explicit set of instructions is followed by the TA in order to prepare the large data set for student analysis. For those students unfamiliar with computers, it is important that the data set appear exactly as it is referred to in their handout. Each student then copies the data set onto their own disk for post lab analysis. The students are instructed to bring a new Macintosh formatted disk to lab for this purpose. We find that it is most convenient to prepare the data for the students in Cricket Graph and then to copy the data into Excel. However, any spreadsheet capable of counting frequency and sorting will be adequate for the lab. The students ultimately should receive the data in a form compatible with the spreadsheet most readily available on campus. Most Computer Services usually will he able to translate the file to another spreadsheet or another machine format (IBM compatible in our case) upon request. At this point, students are told that they have a series of statistical tests to perform on the class data and are iustructed that they must use a computer spreadsheet for this purpose. Students are told that they may use any spreadsheet and any type of computer, but instructions are only provided for Excel for the Macintosh as this is the available spreadsheet on the campus and department networks. This is facilitated by an explicit set of instructions for Excel. For the large data set, students are asked to report on the mean, midrange, standard deviation and 95% confidence limit, aided by computer analysis. This is compared to their own individual data set. Students then prepare a histogram of the data set and are led through a series of handout questions designed to lead them to the conclusion that the large data set consists of two populations. Students are then told to re-evaluate the data based on this conclusion and again to compare results with their small, individual data sets. Most students repeat the statistical analysis a t this point, treating the two data sets separately. Finally, printouts of the histograms and data analysis are requested. The following week, the instructor superimposes Gaussian curves over the class data and lectures briefly on "premeasurement assumptions" and the value of small versus large data sets. Explicit instructions for performing the required data manipulations in Excel are provided from the point of turning on the computer to plotting and printing in Excel. These instructions were written for a student with little or no previous computer experience. Some students require some extra help in getting started but all students are able to complete the assignment, although they do not all reach the desired conclusions. We currently are exploring the possibility of simply posting a class data set on the campus computer network. This would eliminate the need for every lab section to re-enter essentially the same data set. Students would then copy the class set onto their own disks directly from the network. Later in the semester students have two other opportunities to utilize computers and spreadsheets. Explicit instructions, which build upon the first computer lab instructions are again given. The first additional experiment involved determining the two end points of an acid-base titration by use of a Gran plot. In this case, students enter their own data and are given an equation and input in-