In the Laboratory
A Laboratory Exercise in Statistical Analysis of Data

Mark F. Vitha and Peter W. Carr*
Department of Chemistry, University of Minnesota, Smith and Kolthoff Halls, 207 Pleasant St. S.E., Minneapolis, MN 55455

*Corresponding author.

Journal of Chemical Education • Vol. 74 No. 8 August 1997

We believe it is important that students learn about statistical tests and methods as they relate to hypothesis testing, since hypothesis testing lies at the very heart of scientific investigation. Descriptions of laboratory exercises emphasizing statistical analysis of data have previously appeared in this Journal (1–3). We have developed a laboratory exercise based on facile weighings of vitamin E pills. The exercise introduces students to several common statistical concepts and tests, such as the normal distribution function, the Q-test, the t-test, and the χ² test. We originally designed the exercise for a quantitative analysis laboratory course for nonmajors but have also used it in a similar course for junior-year chemistry majors. Two of the most important ideas we wanted to communicate are the meaning of the standard deviation of the mean in contrast to the population standard deviation, and the least squares principle of establishing a best straight line. The procedure below was designed to fulfill these objectives.

Procedure

Part I. The students first weigh nine pills individually, plus one pill whose weight has been intentionally altered so that it is an outlier relative to the other nine. They perform a Q-test at the 90% confidence level to show that the altered pill is an outlier. Each pair of students then exchanges the weights of its nine regular pills with another pair and performs a t-test to compare the means of the two sets.

Part II. The students next weigh ten collections of ten pills and calculate the mean weight per pill for each collection.

Part III. The students weigh one hundred pills at one time and find the average weight per pill. They are also asked to predict how the standard deviation would compare with those in Parts I and II had the average of ten samples of 100 pills been measured.

Part IV. Finally, the students weigh one, two, three, and four pills consecutively, plot the total weight versus the number of pills, and use linear regression to find the average weight per pill (the slope of the line) and its standard deviation.

The students are also asked to compare qualitatively the mean, median, and standard deviations of the data in each part of the exercise and to note the effect of averaging averages on the standard deviation of the mean.

Figure 1. Histograms of weights from (a) individual pill weighings, (b) groups of 10 pills, (c) groups of 100 pills with outliers, (d) groups of 100 pills with outliers corrected, and (e) slope analyses.

Figure 2. Distribution of individual pill weights used in the χ² test (unshaded bars) superimposed over the normal distribution function (shaded region) based on the average and standard deviation from 1994.

Results and Discussion
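Before turning to the class data, the Part I outlier test and the Part IV regression described in the Procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' worksheet: the pill weights are hypothetical, the function names `q_test` and `fit_line` are invented here, and the critical Q values are taken from the standard Dean and Dixon table rather than from the article.

```python
import math

# Critical Q values at the 90% confidence level for n = 3..10.
# Assumption: standard Dean and Dixon table; the article does not list them.
Q_CRIT_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560,
             7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412}


def q_test(weights):
    """Return (suspect value, Q, is_outlier) for the most isolated end point."""
    s = sorted(weights)
    spread = s[-1] - s[0]
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = s[1] - s[0]
    gap_high = s[-1] - s[-2]
    # Test whichever end point lies farther from its nearest neighbor.
    if gap_high >= gap_low:
        suspect, gap = s[-1], gap_high
    else:
        suspect, gap = s[0], gap_low
    q = gap / spread
    return suspect, q, q > Q_CRIT_90[len(s)]


def fit_line(x, y):
    """Least-squares line y = intercept + slope*x; returns slope, intercept, SD of slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual scatter about the line, with n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sd_y = math.sqrt(ss_res / (n - 2))
    sd_slope = sd_y / math.sqrt(sxx)
    return slope, intercept, sd_slope


# Part I: nine ordinary pills plus one altered pill (hypothetical weights, g).
pills = [0.50, 0.51, 0.52, 0.50, 0.51, 0.49, 0.50, 0.51, 0.52, 0.70]
suspect, q, is_outlier = q_test(pills)
print(f"suspect = {suspect} g, Q = {q:.3f}, outlier at 90%: {is_outlier}")

# Part IV: total weight after adding 1, 2, 3, and 4 pills (hypothetical, g).
n_pills = [1, 2, 3, 4]
total_weight = [0.50, 1.01, 1.50, 2.01]
slope, intercept, sd_slope = fit_line(n_pills, total_weight)
print(f"average weight per pill = {slope:.4f} g, SD of slope = {sd_slope:.4f} g")
```

The slope is reported as the average weight per pill because each added pill raises the total weight by one pill's weight; its standard deviation comes from the scatter of the four points about the fitted line.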
Analysis of Class Data

We performed statistical analyses on the pooled class data from each part of the exercise so as to introduce additional statistical concepts. Some individual data points have been excluded, either because they failed the Q-test or because we strongly suspected some type of systematic error in certain data sets. The results of our data analysis are discussed below.

1994 Class Data

As expected, the t-tests comparing the different methods of finding the average weight per pill allowed us to accept the hypothesis that there are no significant differences in the means produced by the various methods at the 99.5% confidence level.

From an educational standpoint, the histograms obtained from the pooled class data (Fig. 1) are perhaps more important than the conclusions drawn from the t-tests. Simple visual comparisons readily reveal the decrease in the standard deviation that results from taking averages of averages. Thus, students are shown graphically what they were asked to discover numerically regarding the standard deviations from the various parts of the exercise. This provides the opportunity to introduce to the class the mathematical relationship between the standard deviation of the population (that is, of individual measurements), SDind, and the standard deviation of a mean, SDm: for a mean of n independent measurements, SDm = SDind/√n.

The relationship between SDind and SDm has important implications for data analysis. First, it forms the basis for the increase in instrumental signal-to-noise ratios as a function of the number of analyses averaged. This concept has become particularly important because Fourier transform and diode array instruments allow many analyses to be made in short periods of time. A second important implication of the relationship between SDind and SDm relates to the confidence interval of a measurement. Through this exercise and subsequent discussions, students are shown that by taking numerous replicate measurements or measuring large samples at one time, confidence intervals can be narrowed, allowing smaller and smaller statistical differences between data sets to be detected. This has important ramifications for methods development and testing.

Another very important educational lesson illustrated by the histograms in Figure 1 is the value of plotting the data. Specifically, in histogram c we notice two outlying data points. We hypothesized that the students had added a second weighing boat (weight ~1.5 g) during the exercise, since one weighing boat does not easily hold one hundred pills. If the students did not account for the added weight before dividing the digital readout by one hundred to find the average weight per pill, their average would be approximately 0.015 g larger than the true average. Subsequent discussions with the students revealed that this was the case. Thus, 0.015 g was subtracted from the two outlying data points, and the result is shown as histogram d. The two data points no longer appear as outliers and the distribution appears more Gaussian. We emphasized to the class that it was only after plotting the data that we found the rather obvious deviations and that, by thinking about the exercise and exactly what had been done, we were able to arrive at an explanation for the deviation.

One final statistical test introduced in 1994 was the χ² test (4). Figure 2 shows the distribution upon which the test was based, along with the expected normal distribution curve based on the mean and standard deviation of the individual pill weighings. Based on the χ² test we rejected the hypothesis of a normal distribution at the 90% confidence level for the data set. A nonnormal distribution could result from several factors, such as the same pills being included in many students' data or a nonnormal distribution inherent in the production of the pills.

Table 1. Comparison of Means from 1994 and 1995

1994 vs. 1995 Data   Calculated        Degrees of    Tabulated         Accept or Reject
                     t Statistic^a     Freedom^b     t Statistic^c     Null Hypothesis
Single pills         5.654             580.0         2.576             Reject
10 pills             12.39             614.3         2.576             Reject
100 pills            7.949             54.13         2.670             Reject
Slopes               0.793             64.88         2.660             Accept

^a $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}}$ (ref 4).
^b $df = \dfrac{\left(\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}\right)^2}{\dfrac{\left(SD_1^2/n_1\right)^2}{n_1} + \dfrac{\left(SD_2^2/n_2\right)^2}{n_2}}$ (ref 4).
^c 99.5% confidence level.
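The t statistic and degrees of freedom defined in footnotes a and b of Table 1 can be sketched as follows. This is a hedged illustration only: the summary statistics are hypothetical (the article does not report the class means, standard deviations, or sample sizes), the function name `unequal_variance_t` is invented here, and the degrees-of-freedom expression follows the footnote as it reads in this copy, with n1 and n2 in the denominator terms. Many texts use n1 − 1 and n2 − 1 there instead (the Welch–Satterthwaite form), which gives a slightly smaller value.

```python
import math


def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for two means whose variances are not assumed equal."""
    v1 = sd1 ** 2 / n1  # squared standard deviation of mean 1
    v2 = sd2 ** 2 / n2  # squared standard deviation of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Degrees of freedom as written in footnote b of Table 1
    # (assumption: n, not n - 1, divides each term in the denominator).
    df = (v1 + v2) ** 2 / (v1 ** 2 / n1 + v2 ** 2 / n2)
    return t, df


# Hypothetical summary statistics for one method in two years (grams).
t, df = unequal_variance_t(mean1=0.505, sd1=0.02, n1=300,
                           mean2=0.500, sd2=0.02, n2=300)

# Tabulated value used in Table 1 for large df at the 99.5% confidence level.
TABULATED_T = 2.576
decision = "Reject" if abs(t) > TABULATED_T else "Accept"
print(f"t = {t:.3f}, df = {df:.1f}, null hypothesis: {decision}")
```

With equal standard deviations and equal sample sizes, the footnote expression reduces to df = n1 + n2, which is a quick sanity check on any implementation.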
1995 Class Data Compared to 1994 Class Data

Similar analyses were performed on the data from 1995, with similar results. We wish to highlight here the comparisons of the means from 1994 and 1995 shown in Table 1. The t-tests show that the null hypothesis can be rejected at the 99.5% confidence level for three of the four methods. Thus, the pills gained a statistically significant amount of weight over the course of one year, presumably due to adsorption of moisture from the air. We note that the results from the slope analysis method indicate that the pills have not increased in weight. We currently have no explanation for the discrepancy between methods.

Before concluding, we wish to acknowledge that the statistical tests employed above assume a normal distribution and that we have shown, by use of the χ² test, that our distributions are not normal at the 90% confidence level. We believe, however, that the lessons learned about graphing data, using and interpreting statistical tests, and finding explanations for initially surprising results are important and should not be sacrificed in the face of a somewhat questionable assumption. In fact, the results of the χ² test can also be used as another educational vehicle.

Conclusions

The availability of top-loading digital balances has made possible a simple laboratory exercise that rapidly generates “real” data that can be explored in detail to illustrate statistical methods. Additionally, by pooling the class data, we are able to further illustrate the importance of statistical tests, the value of graphing the data, and the interplay between pure data analysis and data interpretation when unexpected results are discovered.
Literature Cited

1. Spencer, R. D. J. Chem. Educ. 1984, 61, 555–563.
2. Salzsieder, J. C. J. Chem. Educ. 1995, 72, 623.
3. Carter, D. W. J. Chem. Educ. 1985, 62, 497–498.
4. Dixon, W. J.; Massey, F. J., Jr. Introduction to Statistical Analysis, 3rd ed.; McGraw-Hill: New York, 1969.
Acknowledgments

The development of this exercise was supported by a fellowship from the University of Minnesota Chemistry Department. We thank Megan Mahoney for her assistance in editing and strengthening this manuscript.