Applications of the Gaussian Error Function in Analytical Chemistry

Using QuattroPro to examine the properties and fundamental statistical parameters of Gaussian error curves. Keywords (Audience): Second-Year Undergra...
computer series, 163

edited by JAMES P. BIRK
Arizona State University
Tempe, AZ 85281

Henry Freiser
University of Arizona
Tucson, AZ 85721

There is growing awareness of the pedagogic value of using computer spreadsheets even in beginning science courses because they can significantly alleviate math anxiety by simplifying the arithmetic and visualizing problems. This enhances student understanding of the underlying scientific concepts of the "global problem surface". An important example is the use of the spreadsheet to describe a Gaussian curve. It makes this essential function readily accessible even to beginning students (1). Properties of the Gaussian error curve can be examined and fundamental statistical parameters understood with the application of spreadsheet operations. In this paper, QuattroPro (version 4.0) has been used. Although other spreadsheets are in common use, QPro certainly ranks high in graphics capability as well as other features.

The centerpiece of statistical treatment of data is the Gaussian error function, which gives mathematical expression to the behavior of random errors. Errors are just as often positive as negative. Smaller errors are more probable than larger ones.

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^{2}/2} \qquad (1)$$

where μ represents the mean value of x, and σ is the standard deviation for an infinite number of samples. In this formulation of the Gaussian function, z is a relative deviation,

$$z = \frac{x - \mu}{\sigma}$$

expressed in units of σ, thus permitting the use of a single graph to describe a "generic" Gaussian curve.

A spreadsheet representing the Gaussian curve is constructed quite simply. (Spreadsheet commands cited are for QPro.) In column A, labeled z, fill block A2 to A1003 with values from −5 to +5 in steps of 0.01 (/EF). In column B, labeled y, enter the value of y using the formula in eq 1 and a value of σ = 1. Complete the rest of cells B2 to B1003 using block copy (/EC). Generate an XY graph using the content of cells B2 to B1003 as Series 1, and A2 to A1003 as the X series.
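The two-column layout just described can be mimicked outside the spreadsheet. The following is a minimal Python sketch, not the QPro worksheet itself: it assumes the standard Gaussian form with σ = 1, and the names `gaussian_y`, `z_values`, and `y_values` are illustrative only. A grid from −5 to +5 in steps of 0.01 holds 1001 points.

```python
import math

def gaussian_y(z, sigma=1.0):
    """Height of the Gaussian error curve at relative deviation z."""
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# "Column A": z from -5 to +5 in steps of 0.01.
# An integer index plus rounding avoids accumulated floating-point drift.
z_values = [round(-5.0 + 0.01 * i, 2) for i in range(1001)]

# "Column B": y for each z, with sigma = 1.
y_values = [gaussian_y(z) for z in z_values]

# Locate the maximum of the bell-shaped curve.
peak_index = max(range(len(y_values)), key=y_values.__getitem__)
print(z_values[peak_index], round(y_values[peak_index], 4))
```

Plotting `y_values` against `z_values` with any plotting tool should give the same bell shape as the XY graph.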

This results in the familiar bell-shaped curve (Fig. 1).

Confidence Intervals and Confidence Limits

Computing the Areas

Using the Gaussian curve, the probability of obtaining a result within a certain interval around the mean can be related to the area under the curve within that interval.
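The link between area and probability can be checked numerically. The sketch below is a Python analogue, not the spreadsheet: it sums rectangles of width 0.01 under the σ = 1 curve, taking each height at the left edge of its rectangle. The function names are illustrative assumptions.

```python
import math

def gaussian_y(z):
    """Gaussian error curve for sigma = 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def area_between(lo, hi, step=0.01):
    """Sum rectangles of width `step`, heights taken at the left edges."""
    n = round((hi - lo) / step)  # number of rectangles in the interval
    return step * sum(gaussian_y(lo + i * step) for i in range(n))

# Area under the whole tabulated curve, -5 sigma to +5 sigma.
total_area = area_between(-5.0, 5.0)

# Percent area within +/-1, +/-2, and +/-3 sigma of the mean.
percent_within = {k: 100.0 * area_between(-k, k) for k in (1, 2, 3)}

print(f"total area = {total_area:.4f}")
for k, pct in percent_within.items():
    print(f"within +/-{k} sigma: {pct:.1f}%")
```

The total should come out very close to 1, and the three percentages close to 68%, 95%, and 99.7%.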

Figure 1. Gaussian error curve.

First obtain the area under the entire curve from −5σ to +5σ. In the spreadsheet, we have divided the entire curve into 1000 rectangles, each having a width of 0.01 and a height of y, given by the equation for the Gaussian curve and listed in column B. Because the width of each rectangle is constant (0.01), the area for each is 0.01 × y. The total area under the curve is 0.01 × Σy, which is simply obtained using the function +0.01*@SUM(B2..B1003) and locating the answer in any convenient, unassigned cell (e.g., F2). This value is near 1, as required due to the nature of the Gaussian curve. To find the areas of the curve between ±1σ, ±2σ, and ±3σ, simply locate the appropriate intervals in column A. Then find the areas, multiplying the values @SUM(B401..B602), @SUM(B301..B702), and @SUM(B201..B802) by 0.01 using cells F3, F4, and F5. These values, when multiplied by 100, represent the percent area under the error curve within the designated limits. They are exactly equivalent to the percent probability of finding a single additional value between ±1σ, ±2σ, and ±3σ (about 68, 96, and 99.7%).

Use of the Curves

The intervals for values included around the mean (±1σ, ±2σ, etc.) are called confidence intervals, and the percent probability of finding values within them is called a confidence level. A useful graphical display of the relationship between these two quantities may be seen in Figure 2. For example, consider a very large data set (about 30 samples is a reasonable approximation of infinity) describing the copper content of an alloy as 41.26 ± 0.12 (std. dev.) % Cu. Any future samples of the same alloy will yield results that fall between 41.02 and 41.50% Cu (confidence interval of ±2σ) with a 96% probability (confidence level). An important corollary is that there is only a small chance (100 − 96 = 4%) that a value outside these limits is of indeterminate (random) origin, so such a value may be discarded as invalid even if the cause is unknown. Alternatively, such values can indicate that the samples were different from the rest with a 96% probability. These concepts are used universally for the analysis of experimental data.

Figure 2. Confidence levels at various confidence limits. (Plot title: Percent Probability vs. Multiples of Sigma.)

Volume 71 Number 7 July 1994 549

Simulation of a Chromatogram with a Spreadsheet

Block names used in eq 2

Symbol   Meaning                      Symbol    Meaning
N        No. of theoretical plates    D-A (b)   distribution ratio (a)
L        Column length                MOL-A     μmol of A
A        Column cross section         TR-A      retention time for A
u        Flow rate                    R-v       = Vs/Vm

(a) The distribution ratio D is the ratio of concentrations of a component in the stationary and mobile phases. Multiplied by Rv, the phase volume ratio, it becomes K, the capacity factor.
(b) Additional symbols representing components B-F were also provided block names, a total of 30, including those tabulated above.

Constructing a simulated chromatogram with the spreadsheet requires only a relatively simple and straightforward modification of eq 1. In this application of the Gaussian curve, the situation is only slightly more complicated because the width of the curve, as measured by σ, will increase with the retention time tR of the component on a column of given efficiency (i.e., a given number N of theoretical plates). This is covered by defining σ as tR/N^(1/2). In this application, we recognize that the function y is expressed as C, a concentration expressed as an amount in micromoles per minute, and μ changes to tR, x to t, σ to tR/N^(1/2), etc., to produce eq 2.
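Carrying these substitutions through the Gaussian form, and assuming that the peak for a component A is scaled by its amount n_A (an assumption made here so that the area under each peak equals the amount), gives

$$C_A(t) = \frac{n_A\sqrt{N}}{t_{R,A}\sqrt{2\pi}}\,\exp\!\left[-\frac{N\,(t - t_{R,A})^{2}}{2\,t_{R,A}^{2}}\right]$$

This is a reconstruction from the stated substitutions and may differ in notation from the spreadsheet formula. The Python sketch below evaluates it on a 0.1-min time grid and sums the peaks of six components. The retention times are hypothetical values chosen for illustration, and the function names are not taken from the spreadsheet.

```python
import math

def peak(t, t_r, n_plates, amount=1.0):
    """Gaussian elution profile with sigma = t_r / sqrt(N)."""
    sigma = t_r / math.sqrt(n_plates)
    z = (t - t_r) / sigma
    return amount * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def envelope(times, retention_times, n_plates):
    """Sum of the individual peaks at each time (the observed chromatogram)."""
    return [sum(peak(t, t_r, n_plates) for t_r in retention_times)
            for t in times]

# Time axis: 2.0 to 15.0 min in 0.1-min steps.
times = [round(2.0 + 0.1 * i, 1) for i in range(131)]

# Hypothetical retention times (min) for six components, unit amounts.
retention_times = [3.0, 4.0, 4.5, 5.5, 7.5, 9.0]

low_efficiency = envelope(times, retention_times, 500)
high_efficiency = envelope(times, retention_times, 1500)

# Narrower peaks on the more efficient column are also taller.
print(round(max(low_efficiency), 2), round(max(high_efficiency), 2))
```

Because σ is proportional to tR, later peaks are broader at a fixed N, and raising N sharpens every peak by the same factor.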

Here $DA is the distribution ratio of A, and $D$Rv is the capacity factor. Equation 2 is used as the basis for the spreadsheet development. The $ in front of various symbols in eq 2 indicates that these are block names. Absolute cell addresses can be used in spreadsheet formulas, so a single spreadsheet can serve for a large number of problems. However, named blocks are better for this kind of problem because it is easier to remember the symbols than the absolute addresses when a large number of adjustable parameters is used. For example, if the cell that contains the number of theoretical plates N is named N, then $N can be entered in the formula. This is much easier to remember than, for example, the address $B$2. It is also far less subject to mistake. Other block names used in eq 2 are listed in the table of block names.

Then set aside at least 60 cells to accommodate the block names and (prefaced by [']) their meanings. A convenient location would be the first 7 or 8 rows in columns A to G. Now, starting in column A (e.g., cell A10), use /EF (block fill) to enter a series of times in 0.1-min intervals from about 2 to 15. Then enter formulas for C_A, C_B, etc., in successive columns using the formula in eq 2. A seventh column can be written for the sum of all the C values at each time, giving the observed chromatogram. As seen in the figures, curves for each of the six components are shown, as well as the envelope of all six present. Mixtures with more than six components can also be solved this way.

In Figure 3, six components are separated on a 500-plate column. For the six separate components shown, the width increases with increasing tR, as occurs in actual chromatograms. The envelope, as the overall chromatogram, shows only two broad and overlapping peaks. Figure 4 illustrates how the appearance of the chromatogram varies with column efficiency; in a column with N = 1500, the chromatogram shows evidence for five of the six components.

Figure 3. Simulated 500-plate chromatogram for a six-component mixture. (Plot title: Simulated Chromatogram, N = 500; D(A) to D(F): 0.25, 0.75, 1.0, 1.5, 2.5, 3.2. Horizontal axis: time, min.)

550 Journal of Chemical Education

Use of the Gaussian Equation in Simulation of Spectra

The Gaussian equation can be used as an idealized shape of many spectral absorption bands and can be useful in deconvoluting overlapping bands. As seen in the chromatographic simulation, it is simple to vary the position of individual bands. We must change only the tR to λmax and the X coordinate from t to λ.

Figure 6. Spectra of acid and base forms of thymol blue. (Plot title: Absorption Spectra of Thymol Blue. Horizontal axis: wavelength, nm.)

In this application, the width can be made uniform for each of the spectral bands or varied if desired. As an illustration, we started with the spectrum of thymol blue, a colorimetric acid-base indicator. It is used in a solution at pH 6.11 (3 units below its pKa, representing the 99.9% basic form). After entering this data in a QPro spreadsheet, we tried to match it with a series of four Gaussian curves whose position (λmax), height (absorbance A), and σ (width) were adjustable. After about 15-20 min of adjusting these parameters by visual observation of the curve representing the sum of individual Gaussian curves, Figure 5 was obtained. Although there is no question that commercial curve-fitting programs could do a much better job, there is distinct pedagogic value in the directness and simplicity of this type of exercise.
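The manual matching exercise can be sketched in Python as follows. Everything numerical here is hypothetical: the band parameters are not the thymol blue values, and `observed` is a stand-in generated from made-up bands rather than measured data. A sum of squared residuals is added as one possible numeric companion to judging the match by eye.

```python
import math

def band(wavelength, center, height, sigma):
    """One Gaussian band: absorbance `height` at `center` (nm), width sigma (nm)."""
    return height * math.exp(-0.5 * ((wavelength - center) / sigma) ** 2)

def simulated_spectrum(wavelengths, bands):
    """Sum of the individual Gaussian bands at each wavelength."""
    return [sum(band(w, c, h, s) for (c, h, s) in bands) for w in wavelengths]

def sum_squared_residuals(observed, simulated):
    """Simple measure of how well the simulation matches the data."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))

wavelengths = list(range(250, 651, 5))

# Stand-in "observed" spectrum built from four hypothetical bands
# given as (center nm, height, sigma nm).
observed = simulated_spectrum(
    wavelengths,
    [(280.0, 0.20, 15.0), (330.0, 0.10, 20.0), (400.0, 0.08, 25.0), (595.0, 0.25, 30.0)],
)

# A first guess, then one manual adjustment of the fourth band.
first_guess = [(280.0, 0.20, 15.0), (330.0, 0.10, 20.0), (400.0, 0.08, 25.0), (580.0, 0.20, 30.0)]
adjusted = first_guess[:3] + [(592.0, 0.24, 30.0)]

misfit_first = sum_squared_residuals(observed, simulated_spectrum(wavelengths, first_guess))
misfit_adjusted = sum_squared_residuals(observed, simulated_spectrum(wavelengths, adjusted))
print(misfit_first > misfit_adjusted)
```

Each adjustment of position, height, or width changes the summed curve, and the residual gives a quick check on whether the change helped.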

Multicomponent Spectrophotometric Analysis

Naturally, the reliability of all determinations (single-component and particularly multicomponent) would significantly improve if measurements on the sample were carried out using a large number of different wavelengths. Each such measurement results in an additional independent equation of the variables. The array of these equations can be treated by the spreadsheet regression function. The improvement will be seen in the smaller standard deviations in the values determined. Naturally, standard spectra representing known concentrations of each of the components of the mixture must be obtained before the determinations. Figure 6 presents a special example of binary-mixture analysis as the spectrophotometric determination of equilibrium constants, possible when the conjugate variables each have characteristically different spectra. By this technique, it is possible to determine the pKa spectrophotometrically with far greater reliability than that obtained using most potentiometric methods; it also provides a means of calibrating glass electrodes accurately (2).
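As an illustration of treating many wavelengths at once, the sketch below solves a two-component mixture by ordinary least squares through the normal equations, which stands in for the spreadsheet regression function. The standard spectra are hypothetical Gaussian bands at unit concentration, the mixture is synthetic and noise-free, and all names are illustrative. It does not compute the standard deviations that a regression routine would report.

```python
import math

def band(wavelength, center, height, sigma):
    """Gaussian absorption band used to build hypothetical standard spectra."""
    return height * math.exp(-0.5 * ((wavelength - center) / sigma) ** 2)

def solve_two_component(std_a, std_b, mixture):
    """Least-squares (c_a, c_b) for mixture = c_a*std_a + c_b*std_b."""
    s_aa = sum(a * a for a in std_a)
    s_ab = sum(a * b for a, b in zip(std_a, std_b))
    s_bb = sum(b * b for b in std_b)
    r_a = sum(a * m for a, m in zip(std_a, mixture))
    r_b = sum(b * m for b, m in zip(std_b, mixture))
    det = s_aa * s_bb - s_ab * s_ab  # determinant of the 2x2 normal matrix
    return (r_a * s_bb - r_b * s_ab) / det, (r_b * s_aa - r_a * s_ab) / det

# One equation per wavelength: 31 wavelengths for 2 unknowns.
wavelengths = range(400, 701, 10)

# Hypothetical standard spectra at unit concentration.
std_a = [band(w, 450.0, 1.0, 40.0) for w in wavelengths]
std_b = [band(w, 600.0, 1.5, 35.0) for w in wavelengths]

# Synthetic mixture with known relative concentrations 0.30 and 0.70.
mixture = [0.30 * a + 0.70 * b for a, b in zip(std_a, std_b)]

c_a, c_b = solve_two_component(std_a, std_b, mixture)
print(round(c_a, 3), round(c_b, 3))
```

With real data each absorbance carries noise, and using more wavelengths than unknowns is what lets the fit average that noise down.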

Another, simpler, useful application is the demonstration of the use of the spreadsheet in solving multicomponent spectrophotometric analysis when experimental spectra are not available. Course instructors will find it convenient in designing homework or test problems. As many as six or seven components can be determined.

Figure 4. Effect of column efficiency on the appearance of a chromatogram: N = 1500. (Horizontal axis: time, min.)

Literature Cited

1. Freiser, H. Concepts and Calculations in Analytical Chemistry; CRC Press: Boca Raton, FL, 1992.
2. Yamazaki, H.; Sperline, R. P.; Freiser, H. "Spectrophotometric Determination of pH and Its Application to Determination of Thermodynamic Equilibrium Constants"; Analytical Chemistry 1992, 64, 2720-2725.

Figure 5. Simulation of thymol blue base form with Gaussian model. (Horizontal axis: wavelength, nm.)

Spreadsheet Tools For Solving One-Equation Chemical Equilibrium Problems

Bhairav D. Joshi
State University College
Geneseo, NY 14454-1494

A central question in dealing with chemical equilibrium problems is this:

Given the equilibrium constant for a reaction as written and the initial concentrations or partial pressures of all the species involved, what will be the concentrations or partial pressures of various species once the reaction reaches its equilibrium position?

The equilibrium concentrations of various species are governed by the extent of the reaction.¹ In a recent series

¹The extent of a reaction measures its degree of progress starting from the initial amounts of materials in the reaction mixture. It is measured in moles (or other related units) and is defined as the common factor by which the amounts of various species change during a reaction. See also footnote 2.