Scientific Data Analysis Toolkit: A Versatile Add-in to Microsoft Excel

May 21, 2018 - Scientific Data Analysis Toolkit (SDAT) is a rigorous, versatile, and ... and biological sciences, from first-year courses, to the grad...
1 downloads 0 Views 2MB Size
Technology Report pubs.acs.org/jchemeduc

Cite This: J. Chem. Educ. XXXX, XXX, XXX−XXX

Scientific Data Analysis Toolkit: A Versatile Add-in to Microsoft Excel for Windows Arthur M. Halpern,*,† Stephen L. Frye, and Charles J. Marzzacco‡ †

Department of Chemistry and Physics, Indiana State University, Terre Haute, Indiana 47809, United States Chemistry Department, Florida Institute of Technology, Melbourne, Florida 32901, United States



S Supporting Information *

ABSTRACT: Scientific Data Analysis Toolkit (SDAT) is a rigorous, versatile, and user-friendly data analysis add-in application for Microsoft Excel for Windows (PC). SDAT uses the familiar Excel environment to carry out most of the analytical tasks used in data analysis. It has been designed for student use in manipulating and analyzing data encountered in the physical and biological sciences, from first-year courses, to the graduate level, and in research. A particularly useful feature of SDAT is its ability to perform rigorous regression analysis using both standard and user-defined model functions. This tool, which can accommodate up to seven fitting parameters, provides the standard deviation of regression of the fit, the standard uncertainties of the fitting parameters, and the covariance matrix. It also supports weighted regression analysis. SDAT includes the following functionalities: Descriptive Statistics, Integrate, Differentiate, Smooth, Spline, Plot, and Regression Analysis. The use of the Regression Analysis tool is illustrated with several examples including the unweighted and weighted analysis of the nonlinear and linearized forms an exponential decay and an example of the use of the covariance term in a propagation of errors calculation. Other example topics are chemical and enzyme kinetics, vapor pressure, and quantum chemistry computational results. SDAT is provided as associated content. KEYWORDS: Second-Year Undergraduate, Upper-Division Undergraduate, Graduate Education/Research, Analytical Chemistry, Chemical Engineering, Laboratory Instruction, Physical Chemistry, Computer-Based Learning



INTRODUCTION Standards and expectations of the rigor and sophistication of data analysis have increased dramatically as a result of the increasing availability and power of computing resources, from centralized mainframe resources, to dedicated laboratory computers, to hand-held and desktop or laptop computers, and to smart phones. Before the advent of these computing resources, when slide rules and mechanical calculators were the means available to most students and researchers, it was necessary to transform data into a linearized form to obtain the slope and intercept values for further analysis. Moreover, it was often considered acceptable practice to “eye ball” a straight line through hand-plotted data points for analysis according to a linear model. Uncertainties in these values were sometimes determined by manipulating these lines through a range of “fits” that was tacitly judged to reflect a reasonable deviation in the fit.1 Fitting data by hand to nonlinear models was much more challenging to most students and practitioners. As access to computing resources increased, least-squares analysis of data became more widely used and was soon the normal expectation. Many articles published in this Journal describe the application of the least-squares method of data analysis and the importance of the use of such rigorous approaches.2−4 For example, in a seminal publication, de Levie presented the case for using weighted least-squares methods and emphasized the ways and reasons for doing so with data rendered into a linear form via the appropriate transformation of a nonlinear model.5 © XXXX American Chemical Society and Division of Chemical Education, Inc.

Many commercial applications are available to enable students not only to write specialized code for instrument control and data acquisition, but also to perform sophisticated and complex mathematical modeling and to assist them in performing robust multivariate data analysis including the determination of optimized regression parameters, their standard uncertainties of regression, and the interpretation of the covariance matrix of the regression fit. However, many of these applications are costly, have a steep learning curve, and may be beyond the reach of many students. While some institutions may be able to dedicate resources for the purchase of multiple-use or site licenses, others may not be able to do so.



BACKGROUND To provide students with an easy-to-use and versatile set of tools for data analysis and presentation that also included the opportunity to perform regression analysis, an Add-In to Microsoft Excel, Scientific Data Analysis Toolkit (SDAT), was developed for the PC.6 An install file is available in the Supporting Information. The use of Excel in first-year chemistry courses has been described7 and has also been shown to be a powerful resource for more sophisticated methods of data analysis.8 Because Excel is now commonly available in most instructional settings, there is little or no financial barrier to using SDAT. In addition, Excel Received: February 7, 2018 Revised: May 1, 2018

A

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Technology Report

kinetics data,11,12 for example, [X](t), where [X] is the concentration of a reactant or product. For example, if the rate law can be expressed as d[X]/dt = υX k[X]n, where υX is the stoichiometric coefficient, they can find the order of the reaction, n, directly from an analysis of the differential rate law without having to assume or test various forms of the integrated rate law. In particular, kinetic data for reactions of the type aA + bB → Product(s), where initial A and B concentrations are neither stoichiometrically linked or have sufficiently disparate values that justify the assumption of pseudo-order behavior, for example, [A]0 ≫ [B]0, can be analyzed more simply if the rate law can be expressed as d[X]/dt = υXk[A]n[B]m. Here, k, n, and m can be

is often available on students’ PCs, making SDAT readily accessible to them. Because some beginning and many upper-level college chemistry students may already be familiar with the Excel environment, they will likely require little instruction or training to begin using SDAT immediately. Some precollege students will, as well, have the background and skills needed to learn and use this intuitive, user-friendly learning resource. SDAT has been used in the one- and two-semester physical chemistry courses at Indiana State University for about 20 years. It has enabled them to achieve a level of rigor and completeness in the analysis and presentation of the results they obtained in laboratory experiments. It has also enabled them to solve numerical problems encountered in the lecture courses with a satisfying degree of sophistication, confidence, and satisfaction. A previous version of SDAT, which appeared in 1997,9 is now associated with a physical chemistry laboratory textbook.10 SDAT has been successfully used with Excel versions 2003, 2010, 2013, and 2016.



DESCRIPTION SDAT contains the nine functionalities listed on the menu when the Add-In is opened: Descriptive Provides statistical properties of columns of Statistics data, such as the mean, standard deviation, extreme values, sum, range, and the number of entries. Generate Creates a numerical array or transforms existing data according to a user-defined function. Decimate Reduces the number of cells in a column by removing regularly spaced entries; every nth cell of a specified column is deleted. Integrate Evaluates the integral of a user-defined function between specified limits or a column of numerical values using the trapezoidal method. Differentiate Produces the analytical derivative of columns of x, y values; if only y values are given, unit spacing in the implied x values is assumed. Smooth Reduces the noise by applying a low-pass filter to an equally spaced data array. Spline Expands or contracts a set of data into an equally spaced array using a cubic interpolation. Plot Creates a scatter plot of a selected set of x, y pairs. Regression Fits a standard or user-defined function Analysis containing up to seven parameters to a data set. Provides the standard deviation of regression of the fit, the standard deviations of the optimized parameters, coefficient of determination (r2), and the covariance matrix. Produces graphs of the data points with the fitted function as well as the residuals. Supports weighted regression analysis with user-provided standard deviations of the y values.



EXAMPLES OF USE The nine data analysis tools are described in detail, and examples of their use are contained in Word and Excel documents available as Supporting Information. In this report, a few SDAT features will be highlighted. Among the uses of the Differentiate tool is the opportunity for students to use the differential rate law to analyze reaction

Figure 1. First three Regression Analysis panels displayed in performing a regression analysis using the User Defined function option. (a) Start window, (b) Data Input window, (c) Model Selection window (the linear model is the default option). B

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Technology Report

Figure 3. Regression Results. (a) Ready for Analysis window and (b) Regression Results window, which contains a summary of the optimized fitting parameters and their standard deviations.

found directly from such an analysis. An example is provided in the SDAT User’s Guide. The Spline tool uses a cubic interpolation to transform a set of x, y data that may not be equally spaced in x into an array of uniform x-spacing, which is necessary for use with the Differentiate utility or for other purposes. Alternatively, the Spline tool can be used to systematically increase, or decrease, the number of x, y pairs. A particularly useful component of SDAT is the Regression Analysis tool. The experience of one of the authors (A.M.H.) indicates that students readily learn how to analyze their data with this resource, which permits them to report their results at a level of rigor that gives them both confidence and satisfaction in this important task. The user-defined function option gives them the opportunity to test a variety of models with which to analyze data. The relative effectiveness of these approaches can be compared by examining the fitting statistics provided in the Model sheet and by examining the model-data Fit and Residuals charts. For example, students can determine when it is not justified to use a polynomial of higher order(s) or fitting functions containing additional parameters to fit their data. The SDAT User’s Guide and Examples documents provided in the Supporting Information contain 27 illustrations of the

Figure 2. (a) User Function screen accommodates the number of parameters, the input of the function, and the initial guess values of the parameters. (b) Display shows that the user-defined function has been successfully chosen and highlights Model in the flow diagram. Panel c allows the user to check a box for a weighted regression and to enter the location of standard deviations in the x values. The Weights box is now highlighted in green. C

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Technology Report

Figure 4. (a) Fit and (b) Residuals charts produced by SDAT after the completion of the weighted regression analysis of the exponential decay data. The y-value error bars are automatically produced in the weighted analysis. The charts have been reformatted for clarity.

The first three screens, displayed in Figure 1, show the Start of the analysis and a brief introduction to the tool, the Data Input panel into which the user enters the rows of the columns containing the x and y data, and the Model Selection window that allows one to choose one of five typical fitting models as well as a User Defined model. Note the colorful flow diagram on the left-hand side of the panels that identifies the sequence of steps used throughout in the regression analysis. A green square indicates the currently active step. After the User Defined option is selected, the User Function input screen is displayed and the information describing the chosen model is entered: the number of fitting parameters (SDAT accommodates up to seven parameters), the user-defined function, in this case, the user would enter A*exp(−k*x) (x always denoted the independent variable), the parameter symbols

SDAT tools including 15 examples of using the Regression Analysis. One example (Regression III in the User’s Guide) demonstrates the use of weighted regression in analyzing the exponential decay of a luminescent sample for which the uncertainties of the intensity data (photon counting) are the square root of the number of counts. The user provides the uncertainties in the y values, uy, and SDAT converts these into the weighting factors, (1/uy). To illustrate the visual user interfaces encountered in the Regression Analysis tool, the stepwise screen shots displayed in carrying out a weighted regression analysis are shown in Figures 1−5. Another example (Regression VIII) illustrates the use of correlated covariance in a propagation of errors calculation. The examples provided in the User’s Guide deal with chemical kinetics, thermodynamics, electrochemistry, and quantum chemistry applications. D

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Technology Report

(as entered in the function box), and their initial guess values. After continuing, the user is shown the Model Selection window, which confirms the choice of the function chosen. The next screen gives the user the option to perform a weighted regression analysis. If the weighted regression box is checked, the user is prompted to enter the location of the uy values to be used. These three screens are displayed in Figure 2. Next the user sees the Ready for Analysis panel, which explains what is produced after the regression calculation is launched. At this point, the Finish box, the last step in the flow diagram, is lit. After SDAT successfully completes the regression calculations, the Regression Results window appears. It contains a summary of the results, including the model function used, the parameters used, and their optimized values, along with their standard deviations. The standard deviation of the regression fit is also indicated. Also, a Note appears at the top of the panel that provides the user with a caveat about the fit results being possibly characterized by a local minimum. These two screens are shown in Figure 3. At the conclusion of the regression analysis, SDAT produces two graphs: a Fit chart, which displays the data points, along with the regression fit to the data based on the model chosen, and a Residuals chart, which displays the y-residuals. The charts produced are generically formatted but can be readily customized for inclusion in a report or manuscript. The fit chart also contains a text box that indicates the fitting function and the values of the optimized parameters. In the case of a weighted regression analysis, the y-value error bars are displayed. The charts obtained for the exponential decay analysis example are shown in Figure 4. SDAT also provides a Model worksheet that contains a complete summary of the calculation including the model function used, statistical indicators of the quality of regression fit, the initial and optimized values of the parameters, their standard deviations, and the covariance matrix of the parameters. Also included is a list of the x and y data values used in the calculation, the calculated y values, and their deviations, yi − ycalc. The off-diagonal elements of the covariance matrix allow the user to determine the pairwise interdependence of the parameters used in the fitting function with the particular data set analyzed. The Model worksheet obtained in the exponential decay analysis described above is shown in Figure 5.

Figure 5. Model worksheet produced by SDAT. It has been formatted for presentation.

be using the same software package, this resource can promote active, student-centered learning through facilitated communication among students and between students and instructors. Although many commercial stand-alone products are more comprehensive and sophisticated than SDAT, they can be expensive, and some involve a steep learning curve for use. Their sophistication makes them more complicated to use and thus are often a hindrance, especially for students and nonexperts.



ASSOCIATED CONTENT

* Supporting Information S

The Supporting Information is available on the ACS Publications website at DOI: 10.1021/acs.jchemed.8b00084. User’s Guide: descriptions of SDAT tools, gives numerical examples of their use (PDF, DOCX) Data described in examples of SDAT User’s Guide (XLSX) Application for installing SDAT on PCs (ZIP)





CONCLUSION SDAT is a sophisticated, versatile, user-friendly data analysis package that is installed on a PC as an Add-In to Microsoft Excel, a platform that is widely available and familiar to students. Its nine tools contain most of the applications needed for science and technology students as well as for researchers. It is easy to install, and an instructor can encourage students to install it on their laptops or PCs. It is a resource that will have value to undergraduate science and engineering students, seasoned researchers, and everyone in between. It has been successfully used in classroom and laboratory instruction in physical chemistry. Chemical engineering students and researchers will also benefit from its use. Since SDAT runs within Excel, data entry as well as data analysis and presentation can be accomplished on a single platform obviating the need to switch between different applications. Students will be able to install SDAT on their PCs for access on demand, and instructors can assign homework problems and lab report assignments that require modeling using regression and other types of analysis. Since all members of the class will

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Arthur M. Halpern: 0000-0002-2211-2826 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors acknowledge M. Huber, who coauthored the Regression Analysis code, and B. R. Ramachandran for helpful discussions.



REFERENCES

(1) Shoemaker, D. P.; Nibler, J. W.; Garland, C. W. Experiments in Physical Chemistry, 5th ed.; Mc-Graw-Hill: New York, 1989; p 76. (2) Zielinski, T. J.; Allendoerfer, R. D. Least Squares Fitting of Nonlinear Data in the Undergraduate Laboratory. J. Chem. Educ. 1997, 74 (8), 1001−1007. E

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Technology Report

(3) Young, S. H.; Wierzbicki, A. Linear Least-Squares Regression: LinearLeastSquares.mcd, linear_x.prn, linear_y.prn, exercise22x.prn, and exercise22y.prn. J. Chem. Educ. 2000, 77 (5), 669. (4) Cooper, P. A Simple and Convenient Method of Multiple Linear Regression To Calculate Iodine Molecular Constants. J. Chem. Educ. 2010, 87 (7), 687−690. (5) De Levie, R. When, Why, and How to Use Weighted Least Squares. J. Chem. Educ. 1986, 63 (1), 10−15. (6) SDAT code was written by Halpern, A. M.; Frye, S. L.; Huber, M. (7) Rubin, S. J.; Abrams, B. Teaching Fundamental Skills in Microsoft Excel to First-Year Students in Quantitative Analysis. J. Chem. Educ. 2015, 92 (11), 1840−1845. (8) de Levie, R. Advanced Excel for Scientific Data Analysis, 3rd ed.; Atlantic Academic LLC: Brunswick, ME, 2012. (9) Halpern, A. M. Experimental Physical Chemistry: A Laboratory Textbook, 2nd ed.; Prentice Hall: Upper Saddle River, N.J., 1997. (10) Halpern, A. M.; McBane, G. C. Experimental Physical Chemistry: A Laboratory Textbook, 3rd ed.; W. H. Freeman and Company: New York, 2006. (11) Ramachandran, B. R.; Halpern, A. M. Chemical Kinetics in Real Time: Using the Differential Rate Law and Discovering the Reaction Orders. J. Chem. Educ. 1996, 73 (7), 686−689. (12) Halpern, A. M.; McBane, G. C. Experimental Physical Chemistry: A Laboratory Textbook, 3rd ed.; W. H. Freeman and Company: New York, 2006; pp 20−1−20−14.

F

DOI: 10.1021/acs.jchemed.8b00084 J. Chem. Educ. XXXX, XXX, XXX−XXX