edited by JOHN W. MOORE
Statistical Analysis of Alternative Models

Estel D. Sprague
University of Cincinnati, Cincinnati, OH 45221

C. E. Larrabee, Jr.
Clermont College, University of Cincinnati, Batavia, OH 45103

The National Research Council (1) has recently drawn attention to the need for education in good data practices for scientists and engineers. Panel members recommended, in part, the development of homework problems and laboratory exercises emphasizing experimental design and the effects of uncertainties on inferences drawn from experimental data. The general methods discussed in this paper could be employed in any of a variety of teaching situations to partially address these needs (see Discussion for suggestions and description of current usage by the authors).

From a statistician's point of view (2), scientific experiments are designed to investigate the relationships among various experimental variables (observables) and empirical or theoretical parameters. The first stage of any investigation involves determining which variables are important. The second stage involves determining empirically the relationships between variables in controlled experiments. The final stage involves the development of a mechanistic model that gives the correct functional form of the relationship among the important variables, based on a physical understanding of the system(s) being studied.

The purpose of this paper is to describe a minimal set of statistical techniques that the authors have found to be useful for mechanistic modeling in classical physical chemistry. The behavior of gases at moderate pressures was chosen here as an arbitrary example for purposes of illustration. The methods themselves, however, are of general applicability. It is merely required that one have experimental data, with estimated uncertainties, together with one or more models providing possible mathematical descriptions of the experimental behavior.
Such situations are common throughout chemistry in general, and physical chemistry in particular. In the present example, three equations of state (ideal gas and two forms of van der Waals' equation) have been evaluated on the basis of their ability to describe gas compressibility data (carbon dioxide at 303 K). Parameter estimation, model testing, and model discrimination methods are demonstrated that lead to answers for the questions:

What are the best values for any undetermined parameters of a model?
Are any of the models tentatively acceptable?
Is one tentative model significantly better than the others?

Experimental Methods

The gas-behavior experiment employed in this exercise has been in use for many years (3) and has been modified at various times (4-6). Both the apparatus and the experimental procedure used here are essentially the same as described by Dannhauser (5). In particular, nitrogen was used as the calibrating gas for the apparatus.
Journal of Chemical Education
The central feature of the experiment is the expansion cycle, in which a sample of gas is allowed to expand from one cylinder into a smaller, evacuated cylinder, followed by renewed evacuation of the smaller cylinder. In each cycle the molar volume of the remaining gas increases by a factor determined by the relative volumes of the cylinders. The experimental observables are the pressure, which is assumed to be obtained without systematic error directly from the gauge reading (certified accurate to 0.25% by the manufacturer), and the number of expansion cycles carried out, the temperature being held constant at a measured value. Typical results obtained at 30.0 ± 0.1 °C for nitrogen and carbon dioxide are presented in Table 1. The raw pressure data were measured in pounds per square inch (psi), since this is the unit used in the construction of the gauge. Corresponding values are given in megapascals (MPa), the appropriate SI unit. The pressure data are assumed to have a constant random uncertainty, independent of the pressure value, equal to a standard deviation of 0.5 psi, which translates to a standard deviation estimate of 0.003 MPa. The validity of this assumption will be verified below.

Table 1. Gas Compressibility Data at 30.0 °C

Number of                  Nitrogen              Carbon Dioxide
Expansion Cycles(a)    P, psi    P, MPa      P, psi    P, MPa
        0               989      6.819        868      5.965
        1               795      5.481        770      5.309
        2               638      4.399        670      4.619
        3               513      3.537        573      3.951
        4               412      2.841        484      3.337
        5               331      2.282        404      2.785
        6               265      1.827        335      2.310
        7               212      1.462        275      1.896
        8               171      1.179        225      1.551
        9               137      0.945        184      1.269
       10               110      0.758        149      1.027
       11                89      0.614        120      0.827
       12                71      0.490         98      0.676
       13                57      0.393         79      0.545
       14                46      0.317         64      0.441
       15                 -        -           51      0.352
       16                 -        -           41      0.283

(a) See text for definition of expansion cycle.

Equations of State

An equation of state for a pure gas is generally expressed as a mathematical relationship among the pressure, temperature, and molar volume of the gas. Many different equations of state have been used to describe gas behavior. The ideal-gas and van der Waals equations of state, eqs 1 and 2, respectively, were chosen here for purposes of illustration.

$$PV = RT \tag{1}$$

$$\left(P + \frac{a}{V^2}\right)(V - b) = RT \tag{2}$$

Two forms of the van der Waals equation were used for the carbon dioxide data, differing only in the treatment of the parameters a and b. In the first, a and b were taken to be constants calculated from the independently determined critical parameters for carbon dioxide (7). In the second, a and b were taken to be undetermined parameters. Optimal values for this data set were found in the least-squares fitting procedures described below.

Statistical Models

A statistical model is an expression that connects the variables in the theoretical expression with both the experimental observables and the experimental errors. In order to arrive at such expressions beginning with eqs 1 and 2, the apparatus constant¹, K, and the initial molar volume of the gas, $V_0$, before any expansion cycles, were introduced. After i cycles the pressure is $P_i$, and the molar volume is $V_i = V_0 K^i$. Since there is virtually no uncertainty in the variable i, the number of expansion cycles, it is appropriate to assume that the random experimental errors are confined to the measured pressures. It is therefore convenient to define the residuals for a particular statistical model as the differences between the observed pressures and the pressures calculated for that model. Using these definitions, the ideal-gas and van der Waals statistical models are eqs 3 and 4, respectively.

$$\mathrm{residual}_i = P_i - \frac{RT}{V_0 K^i} \tag{3}$$

$$\mathrm{residual}_i = P_i - \frac{RT}{V_0 K^i - b} + \frac{a}{(V_0 K^i)^2} \tag{4}$$

The residuals contain contributions from both the random experimental errors and any inadequacies of the models being examined. A possible third contribution is systematic experimental error. The statistical treatment presented here is valid only if this has been eliminated.

Parameter Estimation

Least-squares adjustment of undetermined parameters is based on the minimization of the sum of the squares of the (weighted) residuals, subject to the condition that each residual vanishes for perfect agreement between the theoretical model and the experimental data. In the authors' experience, the most generally useful approach to this problem is that described in detail by Wentworth in this Journal (8, 9). It is frequently the case that the functions defining the residuals are nonlinear in the parameters (such as eqs 3 and 4). In this event, a first-order Taylor-series approximation is used to define the normal equations of the least-squares problem. Solution of the normal equations gives the adjustments required in the parameters; for nonlinear problems, an iterative solution is required. This approach has worked very well for a variety of problems, ranging from ligand-protein binding to the analysis of rotational fine structure in vibrational spectra.

There are, however, limitations. For problems that are ill conditioned because of being extremely nonlinear or because nearly ambiguous combinations of parameters exist, the roundoff errors in the computational implementation can lead to failure to converge on any solution, or worse, can lead to convergence to a nonsensical solution. Fortunately, the experience of the person working with the problem is generally a reliable guide in detecting nonsense.

Balanced against these limitations is the flexibility of the method. Virtually any function may be used, linear or nonlinear. There is no limitation to the number of experimental variables, or even to closed form expressions for the definition of the residuals. Furthermore, pretransformation of the experimental data is not required. Variable transformation and the associated propagation of error are integral parts of the procedure. One needs only the raw data and their uncertainties. The distinction between independent and dependent variables disappears in this method and the uncertainties for all variables are taken into account automatically.

In application, the method requires initial estimates of any undetermined parameters. For the calibration fit of the ideal-gas equation to the nitrogen expansion data, these are K and $V_0$. The nominal dimensions of the gas cylinders in the apparatus provided an estimate for K of 1.25. The initial molar volume was estimated from $V_0 = V_{16}/K^{16}$, where the value of K was taken to be 1.25, and the molar volume after the final expansion cycle, $V_{16}$, was estimated from the observed pressure, $P_{16}$, using the ideal-gas equation. For each of the fits to the carbon dioxide data, K was held fixed at the value found in the nitrogen calibration fit. The initial estimate of $V_0$ was taken to be $V_{14}/K^{14}$, with $V_{14}$ again estimated by means of the ideal-gas equation. For the fit to the van der Waals equation with a and b held fixed, values of 0.366 Pa m⁶ mol⁻² and 4.28 × 10⁻⁵ m³ mol⁻¹, respectively, were used (7). For the model in which a and b are considered to be undetermined parameters, the above values were used as the initial estimates. The parameter values obtained from the least-squares calculations are presented in Table 2.
¹ K is the factor by which the molar volume increases in a single expansion cycle. It equals $(V_a + V_b)/V_a$, where $V_a$ and $V_b$ are the volumes of the cylinders in the apparatus; see ref 5.
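The nitrogen calibration fit described under Parameter Estimation can be sketched in a few lines. The code below is only an illustration under stated assumptions: it uses a plain Gauss-Newton iteration with all error confined to the pressures, not Wentworth's more general procedure; the values of R and the psi-to-pascal factor, and all function and variable names, are choices made here rather than taken from the paper. The initial estimates follow the recipe in the text (K = 1.25, and the initial molar volume obtained from the last observed pressure through the ideal-gas equation).

```python
import math

R = 8.314462               # gas constant, J mol^-1 K^-1 (assumed value)
T = 303.15                 # 30.0 degrees C in kelvin
PSI_TO_PA = 6894.757       # pascals per psi
SIGMA_P = 0.5 * PSI_TO_PA  # assumed pressure standard deviation (0.5 psi), in Pa

# Nitrogen gauge readings from Table 1 (psi), expansion cycles 0 through 14.
P_PSI = [989, 795, 638, 513, 412, 331, 265, 212, 171, 137, 110, 89, 71, 57, 46]
P_OBS = [p * PSI_TO_PA for p in P_PSI]


def ideal_gas_pressure(v0, k, i):
    """Ideal-gas pressure after i expansion cycles, using V_i = V0 * K**i."""
    return R * T / (v0 * k ** i)


def normal_equations(p_obs, v0, k):
    """Accumulate the 2x2 normal-equation terms and the sum of squared residuals."""
    s11 = s12 = s22 = g1 = g2 = ssr = 0.0
    for i, p in enumerate(p_obs):
        f = ideal_gas_pressure(v0, k, i)
        r = p - f            # residual as defined in eq 3
        d1 = -f / v0         # derivative of the model pressure with respect to V0
        d2 = -i * f / k      # derivative of the model pressure with respect to K
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
        g1 += d1 * r
        g2 += d2 * r
        ssr += r * r
    return s11, s12, s22, g1, g2, ssr


def fit_ideal_gas(p_obs, v0, k, max_iter=50, tol=1e-10):
    """Iterate Gauss-Newton adjustments of V0 and K until the steps are negligible."""
    for _ in range(max_iter):
        s11, s12, s22, g1, g2, _ssr = normal_equations(p_obs, v0, k)
        det = s11 * s22 - s12 * s12
        dv0 = (g1 * s22 - g2 * s12) / det   # solve the 2x2 system by Cramer's rule
        dk = (g2 * s11 - g1 * s12) / det
        v0 += dv0
        k += dk
        if abs(dv0) < tol * v0 and abs(dk) < tol * k:
            break
    # Re-evaluate at the final parameters for the uncertainty estimates.
    s11, s12, s22, _g1, _g2, ssr = normal_equations(p_obs, v0, k)
    det = s11 * s22 - s12 * s12
    sd_v0 = SIGMA_P * math.sqrt(s22 / det)  # diagonal of the inverse normal matrix
    sd_k = SIGMA_P * math.sqrt(s11 / det)
    return v0, k, sd_v0, sd_k, ssr


# Initial estimates: K from the nominal cylinder dimensions, V0 from the
# last observed pressure via the ideal-gas equation.
K_INIT = 1.25
N_LAST = len(P_OBS) - 1
V0_INIT = (R * T / P_OBS[N_LAST]) / K_INIT ** N_LAST

V0_FIT, K_FIT, SD_V0, SD_K, SSR = fit_ideal_gas(P_OBS, V0_INIT, K_INIT)
DF = len(P_OBS) - 2               # data points minus adjusted parameters
S_RESID = math.sqrt(SSR / DF)     # residual standard deviation, Pa

print(f"V0 = {V0_FIT:.4e} +/- {SD_V0:.1e} m^3/mol")
print(f"K  = {K_FIT:.5f} +/- {SD_K:.5f}")
print(f"residual standard deviation = {S_RESID / 1e6:.4f} MPa, DF = {DF}")
```

The fitted values and their standard deviations can be compared with the nitrogen entries in Table 2, and the residual standard deviation with the assumed pressure uncertainty of 0.5 psi. Small differences are to be expected, since the constants and the numerical procedure used here are not necessarily those of the authors.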
Table 2. Least-Squares Results

Model                              Parameters(a) ($V_0$ in m³ mol⁻¹)     Sum of squared residuals, MPa²    DF

Nitrogen Calibration Fit
  Ideal Gas                        $V_0$ = (3.6951 ± 0.0014) × 10⁻⁴      1.1 × 10⁻⁴                        13
                                   K = 1.24526 ± 0.00017

Carbon Dioxide Fits
  Ideal Gas
  van der Waals (a and b fixed)
  van der Waals (adjusted)         a = (0.424 ± 0.004) Pa m⁶ mol⁻²
                                   b = (5.42 ± 0.10) × 10⁻⁵ m³ mol⁻¹

(a) All uncertainties listed are standard deviation estimates. Rounding of results was done using the "3/30 rule", as described by Shoemaker et al. (10).
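As a cross-check on the fixed van der Waals constants quoted in the text (0.366 Pa m⁶ mol⁻² and 4.28 × 10⁻⁵ m³ mol⁻¹), the standard relations between a, b and the critical temperature and pressure can be evaluated directly. This is a sketch under assumptions: the critical constants below are typical handbook values for carbon dioxide, not values taken from ref 7, and it is assumed here that the critical temperature and pressure (rather than the critical volume) were the inputs.

```python
R = 8.314462   # gas constant, J mol^-1 K^-1 (assumed value)
TC = 304.2     # critical temperature of CO2, K (assumed handbook value)
PC = 7.38e6    # critical pressure of CO2, Pa (assumed handbook value)


def van_der_waals_constants(tc, pc):
    """Return (a, b) from the critical temperature and pressure.

    a = 27 R^2 Tc^2 / (64 Pc)   in Pa m^6 mol^-2
    b = R Tc / (8 Pc)           in m^3 mol^-1
    """
    a = 27.0 * R ** 2 * tc ** 2 / (64.0 * pc)
    b = R * tc / (8.0 * pc)
    return a, b


A_FIXED, B_FIXED = van_der_waals_constants(TC, PC)
print(f"a = {A_FIXED:.3f} Pa m^6 mol^-2")
print(f"b = {B_FIXED:.2e} m^3 mol^-1")
```

Values obtained this way can be set beside the adjusted a and b in Table 2 to see how far the least-squares optimum for this data set lies from the constants derived from the critical point.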
Volume 65 Number 3 March 1988
Figure 1. Ideal-gas model. The squares are experimental data for carbon dioxide. [Plot titled IDEAL GAS; horizontal axis: number of expansion cycles, 0 to 16.]

Figure 2. Van der Waals model with fixed a and b. The squares are experimental data for carbon dioxide. [Plot titled VAN DER WAALS, fixed a and b; horizontal axis: number of expansion cycles, 0 to 16.]

Figure 3. Van der Waals model with adjusted a and b. The squares are experimental data for carbon dioxide. [Plot titled VAN DER WAALS, adjusted a and b; horizontal axis: number of expansion cycles, 0 to 16.]

Figure 4. Ideal-gas model. The definition of the residuals is given by eq 3. [Residual plot titled IDEAL GAS; horizontal axis: number of expansion cycles, 0 to 16.]

Figure 5. Van der Waals model with fixed a and b. The definition of the residuals is given by eq 4. [Residual plot titled VAN DER WAALS, fixed a and b; horizontal axis: number of expansion cycles, 0 to 16.]
Figure 6. Van der Waals model with adjusted a and b. The definition of the residuals is given by eq 4. [Residual plot titled VAN DER WAALS, adjusted a and b; horizontal axis: number of expansion cycles, 0 to 16.]

Model Evaluation

The quality of fit of a particular model to the experimental data can be evaluated in a number of ways. The first might be simply examining "by eye" the extent of agreement between calculated and experimental results. In order to permit this, the parameters from the carbon dioxide fits in Table 2 were substituted into eqs 3 and 4 to allow calculation of the pressure versus the number of expansion cycles according to each model. The results are plotted in Figures 1-
Mechanistic modeling procedures could be incorporated into the physical chemistry curriculum in a number of ways, being introduced in either a lecture or a laboratory course, as desired. For instance, in situations where it is either undesirable or impossible to carry out the actual experiment described here, the raw data provided in Table 1 could be used directly, and the analysis could be readily extended to additional alternative equations of state. Given the equations of state, the student would be expected to define the statistical models, determine any parameters in the models, make a tentative evaluation of the models on the basis of the runs test, and compare all tentatively acceptable models by means of the relative F test. On the other hand, if equipment and time were available, a more extensive implementation of this exercise would make the student responsible for all steps from performance of the experiment through statistical analysis and lab report writing.

The present authors have used both of these approaches successfully in physical chemistry courses over the past three years. Our students are given an intensive introduction to the use of microcomputers in the chemistry curriculum in a required course in the first quarter of the junior year. Data and error analysis form a significant part of this course, together with extensive use of spreadsheets and word processing. The methods described in this paper are covered in detail in that course. Programming itself is not taught, and the students have varied widely in their programming ability and experience. Even complete beginners, however, have been able to progress successfully from supplied examples to the appropriate specification of desired models in the program.

The methods described here have provided a useful tool for the laboratory courses that follow in the junior and senior years and have been applied in the analysis of data from various other experiments. These include rotational fine structure in vibrational spectra, sound velocity in gases, Boyle's law behavior of air at low pressure (1-1.5 atm), critical micelle concentrations of surfactants from conductance or osmotic coefficient measurements, etc. Numerous additional applications are readily visualized in the typical curriculum.

Literature Cited

1. Improving the Treatment of Scientific and Engineering Data Through Education; National Academy Press: Washington, DC; pp 1-6.
2. Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters; Wiley: New York, 1978; Chapter 16.
3. Burnett, E. S. J. Appl. Mech. 1936, 58, A136.
4. Baron, J. D.; Watson, G. M. J. Chem. Educ. 1954, 31, 74.
... McGraw-Hill: New York, 198R; p 636.
13. Draper, N. R.; Smith, H. Applied Regression Analysis; Wiley: New York, 1966; p 95.
14. Sprague, E. D.; Larrabee, C. E., Jr.; Halsall, H. B. Anal. Biochem. 1980, 101, 175.
15. Ref 13, p 306.
16. Atkins, P. W. Physical Chemistry; Freeman: San Francisco, 1978; p 78.