In the Classroom
Tested Demonstrations

An In-Class Experiment to Illustrate the Importance of Sampling Techniques and Statistical Analysis of Data to Quantitative Analysis Students

submitted by:
JudithAnn R. Hartman, Department of Chemistry, United States Naval Academy, Annapolis, MD 21402-5026; [email protected]

checked by:
Daniel W. Bacon, Department of Chemistry, Kutztown University, Kutztown, PA 19530
Wayne C. Wolsey, Department of Chemistry, Macalester College, St. Paul, MN 55105-1899

Industrial chemists find that the design of a representative and accurate sampling protocol is often the most difficult part of developing an analytical method. For example, it is not difficult to devise an analytical method that will tell a chemist how much lead is in a sample. However, it can be very difficult to design a sampling protocol to decide if a yard has become too contaminated for growing a garden. The importance of sampling protocols has been understood in the academic community for a long time, and experiments have been designed to show the importance of a good sampling protocol (1–5). However, because of the time and money constraints in college analytical chemistry courses, most experiments involve the student analyzing an “unknown” that has been designed to give good results in the lab if the student carefully follows the experimental protocol. If replicate analyses are carried out, chances are there are no more than three, and the results are close enough that the student does not appreciate the need for statistics.

We attempt to give our Quantitative Analysis students an appreciation of the difficulties inherent in sampling solids and the need for the statistical analysis of data by using the following experiment in a lecture to teach the material, and then reinforcing the material by having the students perform an experiment with different sampling techniques in the laboratory. Although lecture is the most time-efficient means of teaching sampling statistics, most students have a difficult time relating the formulas to “real life” without the use of lecture demonstrations. Bauer (6) has published an effective lecture demonstration showing the need for finely ground samples. In this paper, we describe a companion lecture demonstration to show the importance of homogeneous samples and to teach the use of statistics to determine the number of replicate analyses needed to attain a specified confidence limit.

The Experiment

Before class we prepare two solid samples. Each sample contains two kinds of candy. We used Hershey Kisses (wrapped in silver foil) and Hugs (wrapped in striped foil), although two flavors of identical hard candy would work as well. The students are told that the first sample is a core sample. It is a tall plastic cup in which we placed equal amounts of candy in two layers. The second sample, portrayed as a core sample that has been mixed, is equal amounts of the two candies in a brown paper bag. There is enough candy in each sample for each student to take 6 to 8 pieces.

We divide the class into two groups and have them pass the sample through the group with each student taking one piece. Each pass through the group is an “analysis” and the result from each analysis is the percentage of striped candies in the sample. We tell the group with the core sample to take a candy from the top without mixing. This instruction may need to be repeated as the exercise proceeds because the students quickly become uncomfortable with their results and feel that it is unfair for them to have a “bad sample”.

After the first analysis we start a table on the blackboard to record the results. The samples are then passed back through the class and everyone takes another sample. The results from this analysis are recorded in our blackboard table and the process is repeated until the candy is gone or the point has become obvious. Enthusiastic participation is ensured, since the students are allowed to eat the candies after the experiment is over. This process takes less than 10 minutes in a class of 20 students. An example of a data table is shown in Table 1. There were 8 candies per student in this experiment and we stopped after 6 analyses.

Table 1. Data from a Group Sampling Experiment

                      Striped Candies (%)
Analysis Number       Mixed Core Sample   Unmixed Core Sample
1                     60                  0
2                     40                  0
3                     50                  20
4                     60                  40
5                     60                  60
6                     60                  80
Mean                  55                  33
Standard Deviation    8.4                 33

NOTE: Each analysis is based on 10 candies.
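
The summary rows of Table 1 can be checked with a few lines of Python. This is only a minimal sketch: the variable and function names are illustrative, and it assumes the tabulated standard deviation is the sample standard deviation (N − 1 in the denominator), which is what `statistics.stdev` computes.

```python
import statistics

# Percent striped candies per analysis, taken from Table 1.
mixed = [60, 40, 50, 60, 60, 60]
unmixed = [0, 0, 20, 40, 60, 80]


def summarize(values):
    """Return (mean, sample standard deviation) for a list of results."""
    return statistics.mean(values), statistics.stdev(values)


for label, data in (("mixed", mixed), ("unmixed", unmixed)):
    mean, sd = summarize(data)
    print(f"{label}: mean = {mean:.1f}%, s = {sd:.1f}")
```

Rounded to the precision used in Table 1, the two means are close to 55 and 33 and the two standard deviations are close to 8.4 and 33, so the much larger scatter of the unmixed sample is visible directly in the numbers.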
JChemEd.chem.wisc.edu • Vol. 77 No. 8 August 2000 • Journal of Chemical Education
The students eagerly speculated about the correct answer as the experiment progressed. It only took about 3 draws for the students to confidently guess correctly that there were 50% striped candies in the mixed sample. At this point the students with the unmixed core sample were tentatively guessing that there were no striped candies in their sample, but were concerned because they could not see a pedagogical reason for this result. We stopped after 6 draws because the students had assumed at this point that there was 50% striped candy in their sample and could easily imagine what the rest of the data were going to look like. It was very easy to move from this demo to a discussion of what constitutes a good sampling protocol and what constitutes a good sample.

Discussion

This experiment also proves to be a good introduction to a discussion on statistics. After it is pointed out that both samples will give the same answer if enough tests are done, someone will undoubtedly ask “How do you know when you’ve taken enough samples?” At this point the students are primed to see statistics as a useful tool in decision making and not just as some formulas that have to be calculated by rote as a lab requirement.

As we progress through our lectures on statistics, we use the data from this experiment as an example of the different concepts in statistics. For example, comparing the means and standard deviations of the core and mixed samples is a graphic illustration of both precision and accuracy, and gives students a feeling for the relationship between the observed scatter in experimental data and the standard deviation. The application of the Student t-test to these data incorrectly “proves” that the two samples are different and leads to a good discussion on the uses and limitations of statistics.
Finally, the circle can be closed by using confidence limits to calculate how many analyses are needed to ensure that the analyst is 95% confident that the mean for each sample is within 10% of its true value. An example of this type of calculation is summarized in the Appendix. The observation that it takes 12 analyses for the mixed sample and 384 analyses for the unmixed sample to achieve this level of confidence drives home the need for proper sampling protocols.

This lecture experiment can be used either as a relatively quick demonstration to get the students thinking about some of the issues in sampling and statistics, or as a full lecture or lab activity to demonstrate many ideas and calculations in statistics, such as pooling data, comparison of means, confidence limits, and the use of Student’s t-test. It engages the interest of most students in sampling and statistics and is a particularly effective means of teaching active learners the very formalistic subject of statistics.

Literature Cited

1. Herrington, B. L. J. Chem. Educ. 1937, 14, 544.
2. Bishop, J. A. J. Chem. Educ. 1958, 35, 31.
3. Kratochvil, B.; Reid, R. S.; Harris, W. E. J. Chem. Educ. 1980, 57, 518–520.
4. Hern, J. A. J. Chem. Educ. 1988, 65, 1096.
5. Butala, S. J.; Zarrabi, K.; Emerson, D. W. J. Chem. Educ. 1995, 72, 441–444.
6. Bauer, C. F. J. Chem. Educ. 1985, 62, 253.

Appendix: Answering the Question “How Many Samples Do We Need?”

The following equation is used in the standard Student’s t-test to evaluate a collection of data with an unknown standard deviation:
\[ \mu = \bar{x} \pm \frac{t\,s}{\sqrt{N}} \]
where µ = true mean, x̄ = mean of data, t = value from t-test table, s = standard deviation calculated from data, and N = number of data points. If we represent the maximum allowable relative error as R, we’ve defined the following relationship:
\[ R\,\bar{x} = \frac{t\,s}{\sqrt{N}} \]
This equation can be rearranged as follows to solve for the number of replicates that are needed to ensure that the results of a given analysis are known within a specific confidence limit for any chosen relative error.
\[ N = \frac{t^{2} s^{2}}{R^{2}\,\bar{x}^{2}} \]
We can now use this equation to answer the question “How many analyses are needed to ensure that the analyst is 95% confident that the mean for the mixed sample is within 10% of its true value?” For this sample, we know that x̄ = 55 and s = 8.4 (Table 1) and we have decided to solve for the case where R = 0.1 (a relative error of 10%). Unfortunately, we do not know the value of t, since t depends on the number of samples. We can solve this problem by iteration using 1.96 as the initial estimate for t (1.96 is the value for an infinite number of samples at the 0.95 probability level). Substituting these values into the equation gives:

\[ N = \frac{(1.96)^{2}(8.4)^{2}}{(0.1)^{2}(55)^{2}} = 8.96 \approx 9 \text{ samples} \]

According to t tables, t = 2.31 for 9 samples at the 0.95 probability level. We can now solve for N using this value of t.

\[ N = \frac{(2.31)^{2}(8.4)^{2}}{(0.1)^{2}(55)^{2}} = 12.45 \approx 12 \text{ samples} \]

We repeat this procedure with the value of t appropriate for 12 samples (2.26).

\[ N = \frac{(2.26)^{2}(8.4)^{2}}{(0.1)^{2}(55)^{2}} = 11.91 \approx 12 \text{ samples} \]

The calculation has converged, and we now know that an experimental design with 12 replicates should give us a result that is within 10% of the true value at the 0.95 probability level.

The same calculation can be performed on the data from the unmixed sample.

\[ N = \frac{(1.96)^{2}(33)^{2}}{(0.1)^{2}(33)^{2}} = 384 \text{ samples} \]

We can consider the calculation converged at this point, since from the viewpoint of an analytical chemist, 384 replicates is an infinite number.
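
The iteration described above can be sketched in Python as follows. This is an illustrative sketch rather than part of the original demonstration: the function names and the `T_QUOTED` lookup are hypothetical, and the t values are simply the ones quoted in this Appendix, not values drawn from an independent t table. When no t value is available for the current N, the loop stops and reports that N, which mirrors the treatment of the unmixed sample.

```python
def replicates_needed(t, s, mean, rel_err):
    """N = t^2 s^2 / (R^2 xbar^2), the rearranged equation above."""
    return (t ** 2) * (s ** 2) / ((rel_err ** 2) * (mean ** 2))


# t values exactly as quoted in the Appendix, keyed by number of samples.
# (Assumption: a fuller table would be substituted in real use.)
T_QUOTED = {9: 2.31, 12: 2.26}


def iterate_replicates(s, mean, rel_err, t_start=1.96, t_by_n=T_QUOTED):
    """Repeat the N calculation until the rounded N stops changing.

    Returns (final N, list of rounded N values from each pass).
    """
    t = t_start
    history = []
    while True:
        n = round(replicates_needed(t, s, mean, rel_err))
        history.append(n)
        converged = len(history) >= 2 and history[-1] == history[-2]
        if converged or n not in t_by_n:
            return n, history
        t = t_by_n[n]


# Mixed sample: mean 55, s 8.4; unmixed sample: mean 33, s 33 (Table 1).
print(iterate_replicates(8.4, 55, 0.1))
print(iterate_replicates(33, 33, 0.1))
```

With the Table 1 values, the mixed sample passes through 9, 12, and 12 before stopping, and the unmixed sample stops at 384 on the first pass, matching the worked calculation.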