A Statistics Curriculum for the Undergraduate Chemistry Major

Article pubs.acs.org/jchemeduc

Nicholas E. Schlotter*
Department of Chemistry, Hamline University, St. Paul, Minnesota 55104, United States

ABSTRACT: Our ability to statistically analyze data has grown significantly with the maturing of computer hardware and software. However, the evolution of our statistics capabilities has taken place without a corresponding evolution in the curriculum for the undergraduate chemistry major. Most faculty understand the need for a statistical educational component, but there is little consensus as to the exact nature of what is to be taught and who should teach it. Because of the large number of courses required for the undergraduate chemistry major, it seems unlikely that requiring a course on statistics will be practical at most institutions. Additionally, it is unlikely that the typical high school education will address the needed statistics or the software training to prepare students for the chemistry courses. Therefore, the chemistry faculty must teach the statistics needed by the majors. The faculty needs to focus on statistics useful to the chemist, and this is distinctly different from what is often encountered in biology, medicine, psychology, and business. A starting point is suggested for a discussion on a statistics curriculum that addresses the needs of the chemistry majors.

KEYWORDS: First-Year Undergraduate, Second-Year Undergraduate, Upper-Division Undergraduate, Curriculum, Interdisciplinary/Multidisciplinary, Computer-Based Learning, Mathematics/Symbolic Mathematics, Statistical Mechanics

The chemistry community needs to have a discussion about statistics and to decide what statistics should be taught to the undergraduate chemistry major. A detailed statistics curriculum needs to be embedded in chemistry courses; however, this cannot be completely determined by the individual instructor. That is too general and too broad to be useful. All of the stakeholders should be involved in the discussion: undergraduate programs, graduate programs, businesses, national science organizations, professional programs, and government laboratories. Here, I describe a version of the statistics curriculum Hamline has been working on for the undergraduate chemistry major, as a starting point for the larger discussion. There is no claim that this curriculum is the ultimate, an exemplar, or a model that all should follow.

We are in a transition period similar to the one that occurred with the introduction of the calculator and must provide students with a solid understanding of statistics before they are exposed to computer calculations of statistical quantities. If the statistics issue is not addressed, a “standard deviation” and a “best-fit line” will be nothing more to the students than software buttons to push. For example, many incoming students have little more understanding of logarithms and exponentials than where to find the “log” and “e^x” buttons on their calculators. To avoid this result with statistics, a specific curriculum is needed. The following questions must be addressed: Where are we now in terms of required statistics for undergraduate chemistry majors? What are the important statistical topics that students should know when they graduate?

Most undergraduate chemistry programs have a required mathematics component. This requirement is usually met by taking the calculus course series through the multivariable course level. For the chemistry major program at Hamline University, this can be as many as three semesters of mathematics classes. To require a course in probability and statistics reaches the breaking point for required courses in an already heavily course-intensive major. So Hamline’s chemistry problem, similar to that of many other chemistry departments, is when, how, and what mathematics are taught for data analysis? This has not been discussed in much detail in the literature beyond indicating that more emphasis should be placed on data analysis and statistics (1). The American Chemical Society’s (ACS) current “Science Education Policies for Sustainable Reform” (2) have not addressed mathematical content in the undergraduate curriculum. The American Chemical Society’s “ACS Guidelines and Evaluation Procedures for Bachelor’s Degree Programs” (3), section 5.7, which describes cognate courses, has this to say on the mathematics recommended:

“Certified graduates must complete course work equivalent to two semesters of calculus and two semesters of physics with laboratory. The Committee strongly recommends a calculus-based physics curriculum and study of multivariable calculus, linear algebra, and differential equations.”

Probability and statistics have clearly been omitted from the ACS certified chemistry degree as a requirement. Therefore, it is left to the chemistry faculty to cover probability and statistics as part of the course work associated with the chemistry major. But what material should be covered? One partial answer is

© 2012 American Chemical Society and Division of Chemical Education, Inc.

Published: December 13, 2012

dx.doi.org/10.1021/ed300334e | J. Chem. Educ. 2013, 90, 51−55

Journal of Chemical Education


Table 1. Statistics in the Undergraduate Curriculum

Column headings: Conceptual Principles (Statistics Focus) (a); Math Courses (b): Secondary School, College (c); Chemistry Courses (d): General, Advanced

Conceptual principles:
Elementary statistics. Uncertainty in numbers representing experimental data, average, standard deviation.
Representation of information. Digital-to-analog and analog-to-digital conversions. Consequences of using different number bases or fractional expressions. Binary, octal, hexadecimal. Enhancement in signal/noise ratio from multiple scanning in which the signal increases linearly with the number of scans, whereas the noise, which is statistical, increases by the square root of the number of scans.
Statistics. Probability, combinatorics, distributions, uncertainty, confidence intervals, propagation of error.
Curve fitting. Least-squares methods, regression, using different weights for data, deconvolution (separating out the contributions of several curves of assumed functionality from their overlap in a complicated curve).

x 

 x

x 

 x

 

x x

 

x x

a The conceptual principles were extracted from Craig (4).
b Material learned in secondary school, which chemists reinforce.
c Material chemists expect mathematicians to develop.
d Material in mathematics for which chemists have responsibility.

found in the report by Craig (4), from which Table 1 is extracted. The four areas listed in Table 1 are from a total of 19 areas reported by Craig (4), although none of the 19 areas were listed as high priority. Notice that a significant portion of the material in Table 1 is to be introduced in secondary school (primarily high school) or in a college math course. The most recent National Research Council (NRC) publication on K−12 science and engineering curricula does not give any specifics to aid in course development beyond: “By grade 12, students should be able to: analyze data systematically...Use...statistics...and statistical techniques...” and “...understanding the mathematics of probability and of statistically derived inferences is an important part of understanding that science.” (5) In Table 1, only very basic statistics are to be introduced by chemists in what the ACS would label introductory and foundation course work, with the elaboration of the material to occur in the in-depth course work. I would assert that the portion that is to be covered by a mathematics course is not happening, as “It is exceptional for chemistry students to take a course in statistics in a mathematics department.” (4) Even when students take a statistics course in the mathematics department, it does not address many of the needs of the chemistry majors. In addition, when the chemistry students take statistics courses in other nonmathematics departments, such as from a business or psychology program, they get a different course aimed at the statistics of large discrete data sets, which does not address the needs of the undergraduate chemistry major very well either. Further, there is mixed instruction in secondary schools, and chemistry instructors cannot depend on students having a solid statistics course from secondary school. Consequently, chemistry instructors must assume that the students have not seen, and will not see, any probability and statistics beyond what is taught in the chemistry courses.
There have been many internal discussions in the chemistry department and with faculty in mathematics as to what is appropriate and reasonable to teach in the chemistry courses. An effort has been made to find a set of probability and statistics topics that addresses the needs of the chemistry majors and allows them to deal with small data sets in a rational manner. Although we are only in the initial stages of evolving the curriculum in statistics for chemistry majors, the current curriculum reflects what is taught in a variety of required chemistry courses. Such a standard would also provide a unified approach to teaching and would provide guidance to faculty members who teach the courses that use statistics, which is even more important when an adjunct instructor teaches a course. The choice of focusing on how to test data relative to a model reflects the most usual circumstance that chemistry majors will encounter and strongly influences the topics chosen (noting that curve fitting is equivalent to correlation). If chemists were seeing large data sets and needed to search for unknown correlations between multiple variables, the needed statistics would be somewhat different. However, there is generally a model that gives a particular relationship between the variables of interest, and then curve fitting indicates how well the data correspond to the model (a test of the quality of our correlation). Additionally, in chemistry, applications are emphasized and the more general background theory of statistics is neglected.

Feedback from industry shows an increasing use of statistics and the need for more training in statistics for bachelor-level chemists. Starting with an NSF GOALI grant in the late 1990s, changes were incorporated in our curriculum to cover more statistics, primarily changes in the general chemistry program. More recently, some industrial colleagues have been requesting further expansion of our statistics training. A statistics and probability course was suggested to cover statistical process control (SPC), control charting, experimental design, and design of experiments (DOE) theories. DOE is of particular interest as it allows the determination of the minimum amount of data that needs to be collected to be statistically significant, which has a direct bearing on cost. This course would probably have to be created in the chemistry department, as it would be different from courses taught outside this department (although institutions with engineering schools could possibly find something along this line). This would be a significant change from the current curriculum, and the faculty need to ask if this is a path that should be followed.
This article aims to start a dialogue to define the statistics curriculum that all undergraduate chemistry majors should be taught. The following program makes no claim to be the final answer to these discussions, but it seems to be a workable set of statistics topics for the undergraduate chemistry major. In general, it is a fairly typical statistics coverage for the undergraduate chemistry major, but what does the chemical community think? In addition to the topics discussed, tutorials and software modules that we have used are included in the Supporting Information.



A STATISTICS CURRICULUM FOR UNDERGRADUATE CHEMISTRY MAJORS

The program is divided into two levels: (i) the introduction of statistical concepts to first-term students who need to


Table 2. Summary of Statistics Topics for the Undergraduate Chemistry Program

Topic: Distributions (Gaussian or random, Student t distribution, and Poisson)
Introductory Courses: Basic structure, graphical representations. Expect the students to understand how to find the Student t value from a table, or using Microsoft (MS) Excel or an equivalent program. Know the difference between two-tail and one-tail problems.
Advanced Courses: Add the mathematical definitions of the distributions. Include Boltzmann and Maxwell−Boltzmann distributions.

Topic: Mean
Introductory Courses: Expect students to be able to calculate the quantity and to understand the difference between the true mean and the sample mean.
Advanced Courses: Add the calculation of moments of the distributions.

Topic: Standard Deviation (and Variance)
Introductory Courses: Expect students to be able to calculate the quantity and to understand the difference between the true SD and the sample SD.
Advanced Courses: Add the calculation of moments of the distributions.

Topic: Confidence interval
Introductory Courses: Expect students to be able to calculate the CI for the mean using MS Excel.
Advanced Courses: (no entry)

Topic: Error propagation
Introductory Courses: Know that all the measurements to determine a quantity impact the total error; identify the most significant error. Possibly use relative errors (percentage errors) to get some total error.
Advanced Courses: Be able to apply the full mathematical form for the error propagation to a variety of quantities. Implement in Excel or Mathematica.

Topic: Best-fit line
Introductory Courses: Be able to obtain the best-fit line from Excel.
Advanced Courses: In addition to obtaining the best-fit line, learn its mathematical derivation. Passing knowledge of how one might derive the nonlinear best fit.

Topic: t test
Introductory Courses: Understand that one is testing whether two means are statistically the same and be able to use Excel to run the test.
Advanced Courses: (no entry)

Topic: Error in slope of best-fit line (8)
Introductory Courses: (no entry)
Advanced Courses: Calculation introduced in Advanced lab. Implement in Excel or Mathematica.

Topic: Error in intercept of best-fit line (9)
Introductory Courses: (no entry)
Advanced Courses: Calculation introduced in Advanced lab. Implement in Excel or Mathematica.

Topic: Error in predicting a dependent y value for a given x value based on a data set (9)
Introductory Courses: (no entry)
Advanced Courses: Calculation introduced in Advanced lab. Implement in Excel or Mathematica.

understand the concept of error in experimental data, but do not have the mathematics background to deal with the full range of statistical mathematics and (ii) the introduction of calculus-based material and more computationally demanding material the undergraduate chemistry majors should know by the time they graduate.

In the first-year chemistry labs, the following concepts are introduced: random distribution, mean, standard deviation, confidence interval, use of the best-fit line, basic error propagation, Student t distribution, and t test. At this level, the introduction is largely descriptive and the calculations the students need to do are based on Microsoft (MS) Excel functions. The students are supplied with detailed methods in their lab manual on the application of these functions. The MS Excel add-in “Analysis ToolPak” is introduced, but students are only given detailed directions on its use for a few specific cases. A statistical test is defined and its use is described (6).

In analytical and physical chemistry courses, this foundation is expanded to develop the full calculus-based error propagation method; the statistical determination of the error for the best-fit line, slope, or y intercept for an x,y data set; the statistical prediction of a dependent value (y for a given x) from an experimental data set; and derivation of the best-fit line with extensions to nonlinear fits. The concept of distributions is discussed, and Boltzmann and Maxwell−Boltzmann distributions are discussed in physical chemistry along with the moments of the distributions to obtain averages and variances. Probability is also discussed in quantum mechanics, with electron orbital distributions and expectation values, and is an important aspect of a chemistry student’s training (7).

At both levels, it is important for the students to see the structure of what they are calculating and to perform several calculations without using computer statistics packages or instructor-provided templates. One approach is to introduce Microsoft Excel with only basic functions such as SUM, SQRT, and so forth, followed by using the Analysis ToolPak for the “descriptive statistics” and “t test” functions, and then moving to MS Excel functions such as AVERAGE, STDEV, VARP, STEYX, and LINEST. This also lets the student check whether the results from a package or statistical function match (they do not always, as there are some errors in MS Excel).

The statistics topics that are summarized in Table 2 are areas our department has identified as basic to the chemistry curriculum. Others in the chemistry community may feel differently and would have chosen a different set of topics. The topics were selected for the chemistry curriculum based on the observation that very few data sets in chemistry are large, with most collected data having 10−100 points, and the data are usually compared to a theoretical model. The statistics of looking for unknown correlations between variables, which are commonly used for large data sets, have been omitted in the current program, because this is not usually the situation in chemistry. Additionally, to make the statistics education useful, the students are trained in the computer applications that make it practical to do such statistical analysis.
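The idea of building a statistic from basic operations first and then checking it against a packaged function can be illustrated outside of Excel as well. The following is only a minimal sketch in Python, not the department's materials: the function name `basic_stats` and the penny masses are hypothetical, and the standard-library `statistics` module stands in for the spreadsheet functions (its sample standard deviation and population variance play the roles that STDEV and VARP play in Excel).

```python
import math
import statistics


def basic_stats(values):
    """Mean, sample SD, and population variance from sums and a square root only."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the sample mean.
    sum_sq_dev = sum((v - mean) ** 2 for v in values)
    sample_var = sum_sq_dev / (n - 1)  # divisor n - 1: sample variance
    pop_var = sum_sq_dev / n           # divisor n: population variance
    return {"mean": mean, "sample_sd": math.sqrt(sample_var), "pop_var": pop_var}


# Hypothetical penny masses in grams (illustrative values, not data from the article).
masses = [2.51, 2.49, 2.53, 2.50, 2.47]

by_hand = basic_stats(masses)
library = {
    "mean": statistics.mean(masses),
    "sample_sd": statistics.stdev(masses),
    "pop_var": statistics.pvariance(masses),
}

# Compare each hand-built quantity with the packaged function.
for name, value in by_hand.items():
    agrees = math.isclose(value, library[name], rel_tol=1e-9)
    print(f"{name}: by hand {value:.6f}, library {library[name]:.6f}, agree: {agrees}")
```

Writing out the divisor explicitly (n − 1 versus n) is the point of the exercise: it makes visible the sample versus population distinction that a single function call hides.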

Computer Software Skills

Computers are probably the single biggest reason that statistics needs to be embedded into the chemistry curriculum. Statistical analysis is no longer a major chore, and best-fit lines are no longer done by “eyeballing” the center of mass of the data on a plot and drawing a line by hand. Students need to be introduced to spreadsheets and mathematical programs. MS Excel and Mathematica are used. Surprisingly, although students are fairly competent with MS Word, they do not tend to be skilled users of Excel coming from high school. Workshops were provided in the general chemistry course on using Excel to do calculations and plot results. Mathematica is certainly a more powerful application, but most of the chemistry majors have not mastered the steep learning curve to use it and there is uncertainty about what access the students will have to Mathematica in their future. Unless students go to graduate school, there is not much reason to suppose they will purchase Mathematica under the current price structure, but one can be reasonably sure that they will have access to MS Office. Because most of the statistics discussed in the chemistry curriculum can be done with Excel, it is reasonable to place the focus primarily on Excel when training the students (8). In the physical chemistry course, the math department runs a short Mathematica training course and a set of tutorials is also used. Even so, Mathematica gets used minimally by the students, only when they cannot solve a problem using Excel. This is mainly an issue in upper-level courses, such as the physical chemistry course, where numerical solutions to integrals and differential equations are required.

Probability and Normal, Student t, and Poisson Distributions

An introduction is provided to the basic concepts of probability such as permutations and combinations. This is followed with probability distributions (discrete and continuous) that lead to the calculation of statistical quantities. These concepts are useful in data analysis, quantum mechanics, statistical thermodynamics, the kinetic theory of gases, and radioactive decay processes (9).

Averages, Variance, and Standard Deviation

The average, variance, and standard deviation are defined for discrete variables at the general chemistry level, and these definitions are applied to data sets in the laboratory using MS Excel (applied to data sets such as the colors and masses of M&Ms, the masses of pennies, etc.). In analytical chemistry, this basic set of statistical concepts is reinforced and elaborated. Physical chemistry typically includes the calculation of these quantities from the probability distribution (10).

Confidence Intervals

Confidence intervals are fundamental to error analysis in science and are used in error propagation, estimating the error in the intercept and slope of a best-fit line, and estimating the error in predicting a value based on a calibration curve (9, 11). Depending on the question asked about error, the analysis may involve choosing between the one- and two-tail forms of the Student t distribution (95% of the distribution is less than x versus 95% of the distribution is between two extremes) and evaluating t values. Although the concept is introduced in general chemistry, it is not used significantly until analytical chemistry and the advanced laboratories, where all results include error analysis.

Testing a Hypothesis

Statistics is often applied by hypothesizing a condition and seeing if a calculation returns a value consistent with the hypothesis (6). For example, one might ask, “Are two means equivalent?” The hypothesis that two means are equal can be tested by doing a t test calculation. This serves as an introduction to the use of statistics to test a hypothesis. This topic is not developed further in the chemistry curriculum and would take significant time and effort to do well; hence, it is best left to an actual math course. This area may be developed, on a case-by-case basis, with the students when they do their research projects, either in the summer or during their senior research project.

X−Y Data Sets

The best-fit line is introduced and how it is determined is discussed. Once the best-fit line is established, the questions the student should be able to answer include “How do you determine a predicted y value (at a given x value) and its associated error, the error in the slope of the best-fit line, and the error in the y intercept of the best-fit line?” Although only a linear relationship is discussed, it should be clear how one might apply this to fits for nonlinear relationships. The best-fit line (also called the trend line or regression line) is introduced in general chemistry without discussion of how it is obtained, but with applications using Excel such as a Beer−Lambert law calibration plot to determine an unknown concentration. Further discussions occur in the advanced laboratories, in which methods are developed to calculate the error in the prediction of a value from a calibration curve, the error in a slope, and the error in an intercept based on the best-fit line (9).

Error Propagation Methods

Starting with general chemistry students, three ideas are examined in turn: first, using relative error to predict the error in a derived quantity, such as density; then, general error propagation that can be applied to any derived quantity where one knows the relative errors, either as experimental values or from confidence intervals; and finally, using data set statistics to determine the overall error of a quantity that can be associated with a slope or intercept in a plot (mass = density × volume) (7). The concept is introduced in general chemistry, but its application is not required in the lab reports. In analytical and physical chemistry, error propagation analysis is extended to the general form that students can apply to any derived quantity. This is a difficult concept for many of the first-year students, and it often takes the chemistry majors in the upper-level courses some time to develop proficiency at using error propagation.

DISCUSSION

Much material has been left out that could be taught as probability and statistics to undergraduate chemistry majors. For example, curve fitting is not explored beyond the linear best-fit line in the proposed curriculum. Should it be given more weight? Spectroscopists, for example, are fond of such applications, but I suspect they are not used on a frequent basis, generally occurring in a few students’ research topics, where they can be taught individually as needed. Correlation is another area that probably can be omitted from the undergraduate chemist’s education, as the same information is essentially obtained from a best-fit line, and for small data sets, it is pretty obvious if there is a correlation between two variables.

Another concern is that the use of the more sophisticated statistical tests requires a much greater knowledge of the mathematics of distributions and the applicability of a given test to a problem. It is easy to apply a test from Excel to a data set, but is the result sensible? For example, consider using the t test to determine if two means are statistically the same or different. MS Excel has three versions of the t test in the Data Analysis ToolPak: paired two-sample for means, two-sample assuming equal variances, and two-sample assuming unequal variances. Depending on the student data and how well they understand the statistical structure of the data, it is easy to get errors from the choice of t test that they select (also, a simple typo made in entering the data can change equal variances to unequal variances, leading to problems). At this point, the students can get confused because they do not understand the limits of the statistical calculations. The trade-off for taking time from chemistry topics in the curriculum is that the students are able to do more sophisticated data analysis in their summer research, advanced laboratories, and fourth-year projects.


REFERENCES

(1) Bressoud, D. M. What’s Been Happening to Undergraduate Mathematics. J. Chem. Educ. 2001, 78, 578−581.
(2) American Chemical Society, Science Education Policies for Sustainable Reform; http://portal.acs.org/portal/PublicWebSite/about/governance/committees/education/CTP_004476. No author(s) or publication date available; however, it seems to be current to 2011 (accessed Nov 2012).
(3) American Chemical Society, Committee on Professional Training. Undergraduate Professional Education in Chemistry, ACS Guidelines and Evaluation Procedures for Bachelor’s Degree Programs; American Chemical Society: Washington, DC, 2008 (available through the ACS, education page, pdf).
(4) Craig, N. C. Chemistry Report: MAA-CUPM Curriculum Foundations Workshop in Biology and Chemistry. J. Chem. Educ. 2001, 78, 582−586.
(5) National Research Council of the National Academies, Committee on a Conceptual Framework for New K-12 Science Education Standards, Board on Science Education, Division of Behavioral and Social Sciences and Education. A Framework for K-12 Science Education [electronic resource]: Practices, Crosscutting Concepts, and Core Ideas; National Academies Press: Washington, DC, 2011 (accessed Nov 2012).
(6) Phillips, J. L., Jr. How To Think about Statistics, revised ed.; W. H. Freeman and Co.: New York, NY, 1992.
(7) Garland, C. W.; Nibler, J. W.; Shoemaker, D. P. Experiments in Physical Chemistry, 8th ed.; McGraw-Hill Higher Education: Boston, MA, 2009.
(8) Billo, E. J. Excel for Chemists: A Comprehensive Guide, 3rd ed.; John Wiley and Sons, Inc.: Hoboken, NJ, 2011.
(9) Bowker, A. H.; Lieberman, G. J. Engineering Statistics, 2nd ed.; Prentice-Hall, Inc.: Englewood Cliffs, NJ, 1972.
(10) McQuarrie, D. A.; Simon, J. D. Physical Chemistry: A Molecular Approach; University Science Books: Sausalito, CA, 1997.
(11) Kreyszig, E. Advanced Engineering Mathematics, 8th ed.; John Wiley and Sons: New York, NY, 1999.



CONCLUSIONS

The ability to use statistics in the undergraduate curriculum has been greatly expanded by the availability of computers and software that have eliminated the tediousness of doing such an analysis. However, this has put a heavier load on understanding statistical concepts in order to use them appropriately than was required in the past. This evolution in statistics capability has taken place without a corresponding evolution in the curriculum for the undergraduate chemistry major. As chemists, we need to address this curriculum issue. Requiring an additional course in statistics is difficult in most programs given the course requirements for most undergraduate chemistry degrees. This means that the chemistry faculty will need to cover the statistics education needs of the undergraduate chemistry major program, and a way must be found to cover them as efficiently as possible. A well thought-out curriculum would be a positive step forward. A draft version of such a curriculum is outlined to describe a minimal statistics program that delivers the basic statistics to an undergraduate chemistry major who will work in industry or continue to a graduate program. The biggest challenge in preparing such a curriculum is keeping it to a minimum, as there are many interesting and useful topics in statistics.



INVITATION

I invite any comments or thoughts that the reader would like to share on a statistics curriculum. I would like to explore a way to share such thoughts; perhaps a discussion on the ACS LinkedIn group would be a possibility.



ASSOCIATED CONTENT

Supporting Information

The “Primer” document is a student tutorial that covers all the statistics the chemistry majors learn by the time they graduate; the MS Excel document provides templates for the analysis of data sets that are used for upper-level courses. This material is available via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

I would like to thank the Hamline University Chemistry department for its support for this work.
