Spreadsheet Programs as Tools for Analytical Chemistry M. L. Salit The Perkin-Elmer Corporation Norwalk, Conn. 06859
As analytical chemists have gathered more data for every analysis, the demands for data reduction and reporting have increased to burdensome levels. In many circumstances, commercially available spreadsheet software can provide solutions that alleviate some of the burden. Spreadsheet software offers an alternative to conventional, linear programming in a language such as BASIC. This alternative can allow solutions to be developed and used with greater ease. Persons unfamiliar with conventional programming can often develop their own solutions using spreadsheet software, without outside help. Modern spreadsheet software contains powerful graphics, data formatting, and data manipulation features that are not easily accessed within most programming environments. The ability to present carefully formatted tabular and graphical representations of data should be considered when comparing the spreadsheet and conventional programming approaches to solving problems. Formatting output can be the most difficult part of conventional programming, and much of this is done automatically by spreadsheet programs. What is a spreadsheet program? A spreadsheet program is a rectangular array of cells; each cell is a discrete region that may contain data, alphanumeric labels, "macro" commands, or formulas. Formulas are built from mathematical operators (+, - , *, /), constants (e.g., 12.34), references to other cells in the array, and built-in functions. Built-in functions include those for financial, scientific, and statistical calculations that allow complex manipulations to be performed with a single formula. It is useful to define spreadsheet software with a metaphor (i.e., using a 0003-2700/88/0360-731 A/$01.50/0 © 1988 American Chemical Society
familiar situation to describe an unfamiliar one). The "spreadsheet metaphor" is based on data tables constructed on sheets of grid paper. This metaphor was first widely embraced for business calculations in the form of
A/C
community does. Spreadsheets have become recognized as powerful tools for data handling, and their application to report generation and routine data analysis for analytical results has become popular. In addition to their
INTERFACE
an "electronic accounting book." The power of being able to make a single change in an input value and having an entire worksheet recalculated in seconds was intoxicating when compared with the hours it would take to perform such "what if . . . " analyses manually. The spreadsheet put the capability for data analysis into the hands of the person responsible for the decision making. This efficiency was perhaps the most significant factor in gaining the business community's wide acceptance of spreadsheet programs and personal computing. Spreadsheets and analytical chemistry Analytical chemists need effective data analysis tools as much as the business
strength for calculation and presentation, the capability of spreadsheet programs to perform logical or database operations on the data is critical to the generation of effective reports. Database operations simplify the generation of reports through the extraction, analysis, and sorting of the data. For example, all data for a single element in a multielement data set can easily be extracted into a separate table by using these database functions. This capability to "predigest" information allows the analyst to communicate results more effectively. Spreadsheet use in analytical chemistry was originally inspired by singleuse, unique applications in research environments. These were generally investigative applications whereby the
ANALYTICAL CHEMISTRY, VOL. 60, NO. 11, JUNE 1, 1988 · 731 A
creation of a dedicated program to han dle the data would have required a much greater effort than the simple building of tables of input data and re sults in a spreadsheet. The spreadsheet approach was more efficient in terms of the cost of development, and the flexi bility of the approach allowed modifi cation of the calculations in an interac tive environment. This flexibility proved to be valuable in research appli cations; the edit/compile/debug re quirements of a conventional program matic approach impede the data modi fication process. The more avenues of data manipulation that are explored in a research problem, the better the un derstanding of that problem. Any ap proach that encourages the exploration of different data-handling techniques is valuable in a research application. These unique research applications still exist, and spreadsheets are being used profitably in them. However, the newer areas of application in routine reporting of results are responsible for the broad acceptance of spreadsheets in the analytical community. Spreadsheets and the analytical process
Figure 1 presents the five steps of the analytical process: sampling, sample preparation, measurement, data reduc tion, and data reporting. The sampling and sample prepara tion phases generate information that will later be collated with the analytical results to provide a complete report. The sampling-phase information in cludes sample identification, origin, and time of sampling. The sample preparation phase produces informa tion such as sample weights or dilution ratios that will be used for the calcula tion of final results.
Sampling
Sample preparation
Measurement
Data reduction
Data reporting
Figure 1. The analytical process.
The measurement phase of analysis is the least exploited spreadsheet ap plication to date. Several recently in troduced commercial products (e.g., Lotus Measure, Palantir Windows InTalk) allow a spreadsheet to interact with an analytical instrument in real time; some of these products support both the transmission of commands to the instrument and the acquisition of data from the instrument. Such appli cations may be as simple as recording a titration curve from an autotitrator or they may be as complex as synthesizing and transmitting control commands to an instrument in addition to providing real-time acquisition of data. Several products developed specifi cally for laboratory applications are available, including Asyst, Labtech Notebook, and DADiSP. These pack ages generally support real-time com munications with instruments and of fer some scientific functions not built in to the general-purpose spreadsheets. Some "integrated" software packages, such as Lotus Symphony, Microsoft Works, and Innovative Software's Smart System, offer a communications module that will allow easy movement of data acquired in real time to the spreadsheet workspace. Using these hybrid products can eliminate data for matting and data transfer steps from the spreadsheet solution. The data reduction phase of analysis is the most natural area of spreadsheet application. Data reduction is calcula tion intensive, and data rearrangement and graphical presentation are also im portant. These are areas in which spreadsheet software excels, and solu tions are easily created. The data-reporting phase can be simplified and enhanced with spread sheet software. Modern programs offer many data formatting options, includ ing both tabular and graphical options. The ease of creating graphics within the spreadsheet environment gives the analyst a wider range of reporting ca pabilities. The proper presentation of an analytical report is critical to the complete understanding of the results. Spreadsheet models
Spreadsheet programs alone offer the user a "clean sheet of paper" environ ment in which to work. The flexibility to do many things is there, but no func tionality exists. A spreadsheet pro gram, like a blank sheet of paper, offers little intrinsic value. As the writing on the sheet of paper defines its function ality, spreadsheet functionality is de fined by a model. This model is the set of rules by which operations take place in the spreadsheet workspace. Different models cause different ac tions to occur in the workspace. Models are composed of the formulas, refer ences, data, and macro commands en
732 A · ANALYTICAL CHEMISTRY, VOL. 60, NO. 11, JUNE 1, 1988
tered in the workspace. Most spread sheet models usually include format in structions for output and for user interaction, such as data entry. The rules that make up a model are generally simple formulas within a fa miliar, easy-to-understand algebraic syntax. This allows users who are inex perienced with computer programming to develop models without learning a complex set of commands and struc tures. The results of a model are dis played immediately, allowing the user to develop the model interactively and to make changes as the results are ex amined. Interactive model development en courages the nonprogrammer to evalu ate approaches that might not other wise be examined. A better under standing of results comes with careful examination and presentation of the data analysis. Using spreadsheet soft ware, there is no need to formally speci fy a program for development by a soft ware engineer or to wait for that pro gram to be developed. Ease of model development is critical to the utility of spreadsheet software. Models are generally built from a pro totype set of rules and formats. Once the prototype is developed, its applica tion to the data set is made by copying it through the spreadsheet. Spread sheet programs are carefully developed to make this copying and editing easy for the user. A simple example can be used to il lustrate the creation of a model using relative and absolute references. This example will calculate the molecular weight of a gas using the ideal gas law. Figure 2 shows the spreadsheet mod el. The experimental data set contains measured pressures and temperatures for a given mass of an ideal gas at con stant volume. The spreadsheet in Fig ure 2 uses references in the form of RxCy, where χ is the horizontal row number and y is the vertical column number. The molecular weight of the gas is calculated at each data point, and an average of the results is calculated as well. A spreadsheet is constructed with a header section, the measured pres sures in the first column of the data table, and the temperatures in the sec ond column. Cells in the header region contain the mass of the gas, the volume of the container, and the gas constant. A prototype formula was construct ed in row 13, column 3. This formula multiplies the two constants (mass of gas and gas constant), referring to them as absolute references. The use of R7C2 in the formula refers to the con tents of the cell in the seventh row of column 2, the mass of the gas sample. Similarly, R9C2 refers to the gas con stant. The product of the mass and the (continued on p. 735Λ)
manageable subsections. This structured approach is especially important for the development of reliable models and for the development of models by persons inexperienced in computer programming.
(a) 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
I
2
3
Calculation Worksheet For Ideal Gas Molecular Weight PV » nKT;
PV = mRT/HW;
MW = mRT/PV
Constants : mass of gas m = volume of gas V = gas constant R =
Macro commands
0.8195 9 500.0 cm"-3 82.06 cm A 3 atm/mol Κ
Measured Quantities Pressure (atm) Temp (K) 1.003 298.27 1.048 311.74 1.085 322.51 1.123 334.68 1.152 342.95 Average MWi
Calculated MW (g/mol) 40.02 40.01 39.98 40.08 40.04 40.02
(b) ι 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
3
2
Calculation Worksheet For Ideal Gas Molecular Weight PV = nRT; PV «• mRT/MW; MW = mRT/PV Constants: mass of gas m = volume of gas V = gas constant R = Measured Pressure (atm) 1.0025 1.043 1.085 1.123 1. 152
0.8195 500 82.06
g cm" 3 en* 3 atm/raol Κ
Quantities Temp (K) 298.27 311.74 322.51 334.68 342.95 Average MW:
Calculated MW iq/aol) =R7C2*R9C2 *RC[-1]/(RC[-2]*RBC2) =R7C2*R9C2 *RC{-1]/(RC[-2"*R8C2) =R7C2*R9C2*RC[-1]/(RC[-2 ;*RSC2) =R7C2*R9C2*RC[-1]/(RC[-2]*RSC2) =R7C2*R9C2*RC[-1]/£RC[-2]*R8C2) =AVERAGE(R[-5]C:Rf-l]C)
(c) Β 1 2 3 4 5 6 7 8 9 10
PV = nRT;
PV = mRT/MW;
Constants: mass of gas ra = volume of gas V = gas constant R =
11 Measured 12 13 14 15 16 17 18
Ç_
Calculation Worksheet For Ideal Gas Molecular Weight
Pressure (atm) 1.0025 1.048 1.085 1.123 1.152
M» = B R T / P
0.8195 500 82.06