Data Analysis in the Computer Age A Monte Carlo Computer Simulation of Error Sensitivity in the Determination of Formation Constants John E. Douglas Eastern Washington University, Cheney, WA 99004
Chemistry students commonly find statistical methods of data analysis a frustrating and difficult topic. They struggle with complicated formulas, which often remain mysterious black boxes even when presented with rigorous derivations. Propagation of Errors Error Sensitivity
One of the most common statistical procedures in chemistry, and a standard topic in the physical chemistry lahoratory (11,is the propagation of errors. Here we restate the problem in terms of error sensitivity. In other words, how sensitive is the error in the final result to errors in the input parameters? This information is important not only because it leads to a n assessment of uncertainty in the result, but also because it identifies the experimental parameters that have the a e a t e s t imoact on the result-showina where the greajkst effort siould be placed in refining the-experiment. Also. when there are alternate wavs of ~erforminr! the experident and treating the data, i"t is important To know which method is the least sensitive to error and under what conditions. In the standard treatment of propagation of errors it is necessarv to differentiate the functional relations hi^ between the independent and dependenr variahles. In many cases thir is difficult or imooss~bleto do.. es~eciullvif there are several steps and relationships involved in the analysis. There is a computer program available from the Quantum Chemistry Program Exchange (2)that includes propagation of errors, but it, too, is just another black box for the student. Also, it fails in multistep calculations. Consequently, it is not surprising that the assessment of error sensitivity is not taught, or used, as often as it might be. A
The New Computer-Age Approach to Statistics
Thecomputer uge has spawned a new approach to statistics in which the behavior of data sets is observed empirically by use of the computer rather than being predicted by mathematical formulas. This approach complements and is eauivalent to traditional statistical methods. It has the advantage of allowing the student to explore, manipulate, and describe data sets intuitlvelv. and to draw valid statistical inferences without the usuyl concerns for mathematical tractability. Two recent papers describe this new approach and its impact in the classroom. Efron and Tibshirani (3) review several such methods. An example is Efron's "Bootstrap" method (4) in which a small data sample is copied many times to generate a very large sample. Statistics for the original sample are generated from randomly selected points taken from the newly created large sample. Peterson (5)presents some examples of curricular reform in teaching statistics and their salutary effects.
Monte Carlo Determination of Error Sensitivity An Example from Chemistiy
This paper describes how error sensitivity can be evaluated readily by a Monte Carlo computer simulation. The approach is described, followed by an example of its application to the determination of formation constants of 1:l donor-acceptor complexes. The method is quite straightforward. A set of synthetic input data is constructed. The members all have the same mean, but are scattered about this mean with a specified standard deviation corresponding to that expected in the experimental data. Such a random Gaussian set can be produced by the following relationship (6).
where GRND is a random number from a set having Gaussian distribution about the mean DMEAN with a standard deviation DSTD; and GSUM is the sum of NUM uniformly distributed random numbers in the interval [0,11.Avalue of 12 for NUM has been recommended, but a value of 4 gives satisfactory results and greatly reduces computation time. Aresult set is then calculated fmm the input parameter set using the appropriate functional relationships. The accuracy of the method is demonstrated by comparing the means in the result set with the values of these variables that were used to generate the data. The error sensitivity is given by comparing the standard deviations of the result-set variables with those of the inpubset variables. In this discussion it has been assumed that the student is familiar with the concept of standard deviation. If this concept must be presented, then it, too, can be demonstrated by a computer simulation. It is easy to design an exercise in which the student obtains a set of Gaussian random numbers using eq 1and then observes their distribution, relating this to the mean and standard deviation. Applying Monte Carlo Error Sensitivity The Formation Constant and Heat of Formation of Donor-Acceptor Complexes
The Monte Carlo approach to error sensitivity is illustrated here by its application to finding the formation constant and heat of formation of 1:l donor-acceptor complexes. These formation constants have long been determined by the Benesi-Hildebrand equation (7)shown below.
where [A] is the total acceptor concentration; C g is the concentration of free donor; A is the absorbance of the solution; K is the formation constant; and e is the molar absorption coefficient. Volume 69 Number 11 November 1992
885
In the most common application of this method it is assumed that the concentration of free donor is large compared to that of the acceptor. Thus, the concentration of free donor after complexation can be taken to be the same as the initial donor concentration, that is, Ce is replaced by [Bl. Then a plot of
yields e from its intercept, and K from its slope. Scott (8)proposed a rearranged form of eq 2.
In this form, CdAl --;i- vs. CB
.
is olotted. with e obtained from the slooe.. and K from the iniercept.' Christian (9)points out that the restriction [Bl>> [A1 is not necessary because an initial, approximate K can be used to calculate the concentration of free donor CBfrom the known initial concentration [Bl. Several passes through a weighted linear leasesquares program, using either the Benesi-Hildebrand or Scott equations, should lead to converging values of K in well-behaved situations. Commonly, the heat of formation AH is found from a set of K's measured over a range of temperatures, followed by application of the van't Hoff equation with a plot of
Additional ComputationalApproaches There are other computational approaches for determinine equilibrium constants and heats of formation of 1:1 do~or~acceptor complexes. Several workers have proposed nonlinear least-squares methods for finding K, followed by t h e traditionaltemoerature deoendence to find & (10-13).The nonlinear leasesquares methods are claimed to he of greater generality than the simpler linear least squares. Joens and associates (14)have followed a different aDproach in which AH is f o k d from the slope, plotting log (A) vs. 1/T a t different solution concentrations. An approximate Kcan also be calculated. Joens method is said k have the advantaec of beine useful in the limit of low concentration where shution bchavior approaches the ideal. Thus, using these several approaches, the experimental quantities [A], [Bl,A, and T are used to determine K, e, and AH. A knowledge of the sensitivity of K, e, and AH with respect to errors in the four experimental parameters answers several questions. What is the uncertainty in the final results? What are the uptlmal rxpenmmtnl conditions that minmw.s the error smaitivity'' Ifthnsr are not expenmentally attainnhlc, how scrious are the errors under other conditions? Is one method more reliable than another? Are there particular regimes in which one or another methad is superior?
A standard propagation-of-errorstreatment is not practical here due to the complexity of the computational procedure. The best that has been done is to establish concentration regimes that produce the most reliable results. 'The routines are written in TRUE BASIC, which is available in versions for most computer systems. The author will be pleased to
886
Journal of Chemical Education
This has been done by Deranleau (151,Person (161,and Carta, Crisponi, and Nurchi (17). Implementation of the Monte Carlo Method The application of the Monte Carlo method to the determination of formation constants of 1:l donor-acceptor complexes has been implemented in a set of computer routines as described below.' There are three main modules. ABSCA1.C. the first module, can calculate the absorbance of a solution given specified experimental values of concentrations and temperat u r e along with an assumed formation constant, absorption coefficient, and heat of formation. However, the module calculates not one, but a set of absorhances wrresponding to those obtained from a set of repetitive experiments that have Gaussian scatter in the input data. This is done using eq 1. The second module calculates a result set ofK and e values from the input set of absorhances and concentrations a t a oarticular temperature. The mean and standard deviation of this result set give the accuracy and error sensitivity of the method, respectively, The following versions of this module are used, corresponding to the method being examined. KWLSQ-RH, an iterated weightcd least-squarer method based on the Bencsi-Hildebrwd equation (7.9, KWISQSCO'TT, a n iterated weighted linear lcast-squares method based on the Scott equation (8) KEPS11,based on the nonlinear least-squares method of Nwchi and Crisponi (13) KJOENS, based on Joens method (14) The third module, CALCH, computes a set of values for the heat of formation obtained from the equilibrium constants calculated over a ranee of temperatures in the second module. The mean andstandard-deviation of this set yield the accuracy and error sensitivity associated with determining AH. These modules can be run separately, with their output stored in a data file, for use by a following module. This is useful in the developmental stage and to examine the progress of the compkation. ~ o a e v e rit, was expeditious to combine all the modules into a single program, SUPEKK. which acceoted the set of inout ourameters and produced h e mean v&es of K, e, and k,:long with their standard deviations for all of the methods being studied. Results A few of the results obtained are presented here to illustrate the usefulness of the analysis. In all instances reported here, 44 scattered data sets were used in each determination. Formal statistical theory suggests that the uncertainty in the computed standard deviations decreases with the square root of the number of data sets. Thus, the choice of 44 data sets is a compromise between using enough data sets to get reasonably precise results and the wmputer time required. It was found that the more complicated nonlinear leastsquares method was no better than the simple linear leastsauares aoomach. Under *ideal" conditions both of these iethods geld essentially the same results. Under less desirable conditions, the nonlinear method sometimes fails to converge. The Joens method proved to be much less accurate exceot under soecial conditions as described below. Table 1shows resuits obtained under conditions considered ideal a c w r d i n ~to the aforementioned concentration criteria. The WLSQ and KEPSll methods accurately reproduce the initially assumed values of K and AH. A stanbard deviation of 1%in the scatter of the initial concentrations results in a standard deviation of 3 to 5% in the computed K and about 2%in AH.The scatter correspond-
Table 1. Error Sensitivity under "Ideal" Conditions for the WLSQ-BH and KEPSll Methodsa Experimental Conditions: K = 1,AH= 10 kJImol, [A]= 0.001. [B]ranges from 0.25 M to 2.25 M. Scattered Variable and its SD
SD in Computed Quantity (percent of mean)b
K
Experimental Conditions: K = 1.OfAH= -10.0 kJImol, IAl = 0.001 M, 101 ranges from 0.02 M to 0.09 M Method
Scattered Variable and its SD
SD in Computed Quantity (per cent of mean)b
AH (kJlmol)
[A]
0.00001 M (1%)
3.2
1.7
[B]
0.0125 M (1% of mean)
5.0
1.8
Temperature
0.1 'C
0.1
0.7
Absorbance
0.001
0.1
3.7
62
4.1
All of the above
Table 2. Error Sensitivity under "Nonideal" Conditionsafor the RWLSQ-BH, KEPS11, and Joens Methodsa
KWLSCBH~
'
WLSGBH refers lo the weighted least-squaresmethod usingthe BenesC Hildebrand equation (7); KEPS11 refers to the nonlinear least-squares method of Nurchi and Crisponi (11). Identical results were obtained from bath methods. Under these conditions with zero experimental scatter introduced, the WLSO;BH and KEPSll methods reomducedthe assumed values of Kand AH Joens method the error in reproducing With an error of less than 0.1%. ~ i t h ' t h e Kwas 71%. while that in reproducing Hwas 19%.
ing to normal uncertainty i n temperature and absorbance measurements have a nezlifible effect on K, but a lamer impact on AH, with absoria&e being the largest contributor to this error. The Joens method has a n inherent error that gives inaccurate results for both K and AH even without scatter in the input parameters. Table 2 shows results under conditions t h a t are not ideal, but perhaps necessary due to other experimental considerations. In this example the weighted linear leastsquares method gives larger errors than those found under "ideal" conditions, a s expected. But the magnitude of uncertainty is clearly identified, and the results may still be useful. The KEPSll method fails to converge, and is not usable. The h e n s method under these conditions is shown to be preferable to WLSQ i n some respects. Although the Joens method does not accurately return the assumed value of& it proves to be far less sensitive to errors in solution concentrations than the linear least-squares method. Thus, it is possible that the Joens method may actually be preferable under some circumstances. The experimenter can establish the preferable method under the particular wnditions by using error-sensitivity analyses. Note that the Joens method is clearly superior in its determination of
AH. Summary As a n illustration of a new generation of computer-based statistics, a Monte Carlo approach is described for performing a n analysis of error sensitivity in experimental methods. This approach may be more meaningful to students than traditional statistical methods. Also, it can be applied when traditional techniques become impracticably complicated. An example is given of the assessment of error sensitivity i n the determination of formation constants and heats
KEPSII
[A] 0.00001 M (1% of mean)
28.4
36.9
[B] 0.0055 M (1% of mean)
27.0
29.2
Temperature
0.1 'C
0.1
0.6
Absorbance
0.001
4.0
26.4
Many data sets did not converge; Method not usable
~ o e n s ~ [A]
0.00001 M (1%)
[B] 0.0055 M (1%of mean)
< 0.1
< 0.1
1.1
0.7
Temperature
0.1 'C
< 0.1
0.6
Absorbance
0.001
35.1
2.7
'under these k d i l i o k with zero exoerimental Scatter introduced. the WLSGBH melnad reprod~ceothe assdmw v a ~ e sd Kano I H w in an enor otlesslwn0 ID. W tntne Joew metnootnesrror n reprwmng Kwas 14 go0. wn e tnat n reprod,c(ng \H was 0 3".
of formation of 1:l donor-acceptor complexes. Application of these procedures not only yields the uncertainty in the results, but also identifies the most critical experimental parameters, optimum experimental conditions.8nd a comparison of the rehability of alternate experimental methods.
Acknowledgment This work was supported in part by a summer research grant from Eastern Washington University. Literature Cited 1. Shoemaker, D.P.: Garland, C.W.;Steinfdd,J. I.Erponm"tsolPh~ilandC~miat~, 4th ed.; McCrccw-Hill, 1981. 2. Ne1san.R. D.; El1enbergu.M. RNEPROP: SubrnutimsforNumrricaIPmpngofiii of Umrtainties, No.231: QCPE:Bloommgtan, IN 3. Ehn,B.: Tibahirani,R.Seiencp 1981,253,390. 4. is1 Efcon, B.; Tibshirsni, R. Stat& Sei. 1986,1.54, ibl Diaeonia, P.:Efmn, B. Sd Am. 1985,248,116( M a y ) . 5. Peterson,I.Sclonce News 1981.40, 56. 6. Miller, A R. BASIC Pmgmms for Sciantiats and E@nwrs; S y k Berkeley, CA, ,W,.".,C A " " . ,
7. Benesi. H.A : Hildebrand.J. A.Am Chem.Soc 1849.71.2703 . . 8. Scott, k.LR&. Thzu chim 1957.75.2703. 9. Christian,S. D.:Lane,E.H.;Garland.F.JPhye Chem.1974.78.557 10. Gmndes, J.; Christian, S. D. J Am. Chem Soe.1868.90.2239. 11. Lang. J.R.:Drago,R.S.J C h .Educ. 1982.50, 1037. 12. Rosseinslry, D.R.:Kellaui, H. J. C h m .Soe.(A) 1969,1207. 13. Nurchi,V.; C r i s p i , G. J. Chsm Educ. 1988,66,54. 14. Marales. R.: Diaz. C. C.: Joens. J. A. J. Phve Chem l9BB 92.4742.
Volume 69 Number 11 November 1992
887