The Log-Normal Distribution Function - ACS Publications

Donald B. Siano
Iowa State University
Ames, Iowa

The Gaussian (or normal) probability distribution function is generally introduced to the undergraduate in several different contexts and receives much attention as a useful, simple mathematical tool. It has been used to explain or describe such diverse phenomena as the distribution of errors in a physical measurement, the diffusion of particles (1), the ground state of the simple harmonic oscillator in quantum mechanics, the line shapes of esr and nmr spectra (2), and the meanderings of a drunk about a lightpost. Poincaré summed up the situation as long ago as 1892 in his often quoted (3) remark: "Everybody firmly believes in it because mathematicians imagine it is a fact of nature, and observers that it is a theory of mathematicians."

Nevertheless, experimental data rarely conform to the normal distribution function for the following simple reason. From the symmetry of the Gaussian, we know that if an observable quantity takes on a value greater than twice the mean, then it must be able to take on negative values also. But many experimental variables are always positive (e.g., volume, mass, concentration, kinetic energy, absolute temperature, and the entropy change of an isolated system). In observations of variables of this sort, the Gaussian distribution function is often inappropriately applied; the experimental distribution function is usually positively skewed. Very frequently the observations follow the log-normal distribution function (4, 5). It is simply related to the Gaussian, which can be written as

$$f(z)\,dz = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz \qquad (1)$$

through the transformation of eqn. (2), and thus has the form of eqn. (3), where $g(z)\,dz$ is the probability of observing a value of the random variable $z$ between $z$ and $z + dz$. The log-normal written in this form has a total area of unity. For a non-normalized distribution function the coefficient in braces before the exponential term is changed to $[y_0 b/(z - a)]\exp(-c^2/2)$, where $y_0$ is the maximum of $g(z)$, and the geometrical meaning of the parameters $a$, $b$, and $c$ can be obtained from Figure 1, which shows a typical log-normal curve.

Figure 1. A positively skewed distribution function showing the definitions of the parameters: $z_1$ and $z_2$ are the values of $z$ where $y = y_0/2$.

Supported by a grant from the U.S. Public Health Service (AM-01549).

A half-width of the curve can be defined by

$$H \equiv z_2 - z_1 \qquad (4)$$

and a convenient measure for the skewness, $\rho$, by eqn. (5), where $z_0$ is the value of $z$ at $y_0$, and $z_1$ and $z_2$ are the values of $z$ at $y_0/2$. By requiring eqn. (3) to pass through the points $(y_0/2, z_1)$, $(y_0, z_0)$, and $(y_0/2, z_2)$, the parameters $a$, $b$, and $c$ can be related to $z_0$, $H$, and $\rho$ by the expressions of eqns. (6), (7), and (8).
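As a numerical check on how $a$, $b$, and $c$ relate to $z_0$, $H$, and $\rho$, here is a minimal sketch under stated assumptions; it is not the author's own expressions. It assumes that eqn. (3) is the usual three-parameter log-normal, $g(z) = \{1/[c\,(z-a)\sqrt{2\pi}]\}\exp\{-[\ln((z-a)/b)]^2/2c^2\}$, which is the form consistent with the non-normalized coefficient $[y_0 b/(z-a)]\exp(-c^2/2)$ quoted above, and that the skewness is $\rho = (z_2 - z_0)/(z_0 - z_1)$. Under those assumptions the half-height conditions give $c = \ln\rho/\sqrt{2\ln 2}$, $a = z_0 - H\rho/(\rho^2-1)$, and $b = [H\rho/(\rho^2-1)]\exp(c^2)$. The function names and the example band are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def abc_from_shape(z0, H, rho):
    """Assumed relations (z0, H, rho) -> (a, b, c); written for rho > 1 (positive skew)."""
    c = np.log(rho) / np.sqrt(2.0 * np.log(2.0))
    m = H * rho / (rho**2 - 1.0)  # distance from the lower limit a to the peak z0
    return z0 - m, m * np.exp(c**2), c

def lognormal_curve(z, y0, a, b, c):
    """Non-normalized log-normal with maximum y0; taken as zero for z <= a."""
    z = np.asarray(z, dtype=float)
    y = np.zeros_like(z)
    ok = z > a
    u = np.log((z[ok] - a) / b)
    y[ok] = y0 * b / (z[ok] - a) * np.exp(-c**2 / 2.0) * np.exp(-u**2 / (2.0 * c**2))
    return y

# Hypothetical band: height 1.0 at z0 = 10, half-width 4, skewness 1.5.
y0, z0, H, rho = 1.0, 10.0, 4.0, 1.5
a, b, c = abc_from_shape(z0, H, rho)

# Evaluate on a fine grid and read the shape parameters back off the curve.
z = np.linspace(a + 1e-6, z0 + 10 * H, 400001)
y = lognormal_curve(z, y0, a, b, c)

z_peak = z[np.argmax(y)]
above_half = z[y >= y0 / 2.0]
z1, z2 = above_half[0], above_half[-1]

print("peak position:", z_peak)
print("half-width:   ", z2 - z1)
print("skewness:     ", (z2 - z_peak) / (z_peak - z1))
```

If the assumed relations are right, the three printed values should reproduce the inputs $z_0$, $H$, and $\rho$ to within the grid spacing. Under the same assumptions, eliminating $a$, $b$, and $c$ gives the curve directly as $y = y_0 \exp\{-(\ln 2/\ln^2\rho)\,[\ln((z - z_0)(\rho^2 - 1)/(H\rho) + 1)]^2\}$ for $z > a$.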

Substituting eqns. (6), (7), and (8) into eqn. (3) gives a convenient form for the log-normal in terms of the readily obtainable empirical parameters $y_0$, $z_0$, $H$, and $\rho$. Through the successive use of L'Hospital's rule it is easy to verify that as $\rho$ approaches 1.0, the log-normal distribution asymptotically approaches the Gaussian, i.e., the Gaussian is a special case of the log-normal (6).

Volume 49, Number 11, November 1972 / 755

The log-normal distribution has some advantages over the other possible functions which have been used to describe positively skewed distributions (7). Its intimate connection with the Gaussian through the transformation of eqn. (2) allows many of its properties to be easily derived. The transformation also enables one to quickly carry out numerical calculations, and the very extensive tables of the Gaussian can be utilized.

The properties of the log-normal have been well studied and its utility in many fields has been proven. It has been used to describe the frequency distribution of such diverse quantities as colloidal particle sizes (8), the shape of the ultraviolet spectra of complex molecules (9), the response of animals to drugs (10), the size of foreheads of crabs (11), the number of words in a sentence of G. B. Shaw (12), and the size of particles from a crumbling cookie (13). More recently the utility of the log-normal has been neatly demonstrated (14) in the controversy over the ascorbic acid requirements of man and the prevention of the common cold, as summarized by L. Pauling (15).

Perhaps the most fascinating feature of the log-normal is that it is possible to think up diverse mechanisms by which it can arise in physical situations (13). Three of the more common derivations follow.

1) It is well known that, with certain mild restrictions, the sum (or arithmetic mean) of many independent random variables yields a quantity that approaches the normal distribution. The transformation described by eqn. (2) quickly shows that the product (or geometric mean) of many independent random variables yields a quantity that approaches the log-normal distribution.
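The statement in 1) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the uniform factors on (0.5, 1.5), the number of factors, the sample size, and the function names are illustrative assumptions, not taken from the article. It multiplies many independent positive random factors and compares the sample skewness of the product with that of its logarithm: if the product is close to log-normal, its logarithm should be close to Gaussian and therefore nearly symmetric.

```python
import numpy as np

def sample_skewness(values):
    """Third standardized moment of a 1-D sample."""
    centered = values - values.mean()
    return float((centered**3).mean() / values.std()**3)

def simulate_products(n_factors=50, n_samples=20000, seed=0):
    """Each sample is the product of n_factors independent uniform(0.5, 1.5) factors."""
    rng = np.random.default_rng(seed)
    factors = rng.uniform(0.5, 1.5, size=(n_samples, n_factors))
    return factors.prod(axis=1)

products = simulate_products()
skew_raw = sample_skewness(products)          # skewness of the product itself
skew_log = sample_skewness(np.log(products))  # skewness of its logarithm

print(f"skewness of product:      {skew_raw:.2f}")
print(f"skewness of log(product): {skew_log:.2f}")
```

With these settings the product itself should come out strongly positively skewed, while its logarithm should be close to symmetric. Changing the factor distribution changes how quickly that happens.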
This is true irrespective of the distribution of the individual random variables, but the sum (product) approaches the Gaussian (log-normal) faster, the closer the distribution of the individual random variable is to being Gaussian (log-normal). This is a very simplified statement of the central limit theorem of statistics.

2) A conceptually different mechanism by which the log-normal can arise is known as the "law of proportionate effect." We visualize a stepwise process in which a random variable takes the value $x_j$ on the $j$th step. On the $j$th step, the change in $x$ is a random proportion of the value obtained in the preceding step, symbolically

$$x_j - x_{j-1} = k_j x_{j-1} \qquad (9)$$

Thus, after $n$ steps, we have

$$x_n = x_0 \prod_{j=1}^{n} (1 + k_j)$$

where $x_0$ is the initial value, and $k_j$ is a constant of proportionality. Taking the log of both sides, expanding it, and assuming that the change at each step is small yields

$$\ln\frac{x_n}{x_0} = \sum_{j=1}^{n} \ln(1 + k_j) \approx \sum_{j=1}^{n} k_j$$

The central limit theorem as stated above thus requires $x_n/x_0$ to be log-normally distributed.

756 / Journal of Chemical Education

Figure 2. The resolved spectrum of 5-deoxypyridoxal showing the comparison of the experimental points (x-line) with the sum of the log-normals (straight line) and the individual log-normals. The difference of the experimental and the fitted sum of the log-normals, expressed as a percent of the first band maximum molar absorptivity, is also shown.

3) Another way of looking at the preceding example is to write eqn. (9) in differential form as

$$\frac{dx}{dt} = kx$$

where $k$ is subject to fluctuations in time and follows an arbitrary distribution function. This is just the equation of a first-order process that is so ubiquitous in chemistry and other fields.

There are other mechanisms which generate the log-normal and still others that generate distributions that are close approximations of the log-normal (16). For this reason, due caution should be used in deducing a physical mechanism to explain a set of observations that are log-normal or nearly so. While four independent parameters, such as position, maximum, half-width, and skewness, are the minimum necessary to describe a distribution with varying degrees of skewness, it is perhaps well to recall Bertrand's famous quotation: "Give me three constants and I will draw an elephant; give me four and I will make him wave his trunk." Curve fitting is no substitute for deriving a good model of the physical system. A satisfactory theory of the underlying mechanism should not only yield good fits, but must also be able to predict some relation between the parameters (e.g., the dependence of $H$ upon $z_0$) or the dependence of at least one of the parameters upon some external, independent variable (such as temperature) without introducing still more parameters. At the very least, it must give physically reasonable order-of-magnitude estimates of the parameters used in the curve fitting.

Even if no reasonable mechanism can be discovered, the log-normal can be useful in obtaining estimates of areas and higher moments and can permit the relatively convenient and accurate resolution of overlapping bands in spectroscopy (9) and chromatography. An example of the log-normal applied to ultraviolet spectroscopy is shown in Figure 2. One log-normal is fitted to each peak by a method of least squares, and the area under each peak is easily obtained from the parameters. In spectroscopy the area is proportional to the oscillator strength, which can be obtained from theoretical calculations. In chromatography it is proportional to the amount of substance that has passed through the column.

In summary, it is clear that the log-normal is a useful function that could be profitably introduced to the undergraduate, and probably used more frequently in chemical research, because of its relatively simple analytical form, its relation to the Gaussian, and especially the diverse physical mechanisms and models that can generate it.

Literature Cited

(1) Anderson, L. B., and Reilley, C. N., J. Chem. Educ., 44, 9 (1967).
(2) Petrakis, L., J. Chem. Educ., 44, 432 (1967).
(3) Poincaré, H., quoted from J. W. Mellor, "Higher Mathematics for Students of Chemistry and Physics," Longmans, Green and Co., 1922.
(4) Aitchison, J., and Brown, J. A. C., "The Log Normal Distribution with Special Reference to Its Uses in Economics," Cambridge University Press, Cambridge, 1957.
(5) Gaddum, J. H., Nature, 156, 463 (1945).
(6) Yuan, P., Ann. Math. Stat., 4, 30 (1933).
(7) Elderton, W. P., "Frequency Curves and Correlation," Charles and Edward Layton, London, 1927.
(8) Herdan, G., "Small Particle Statistics," Elsevier, Amsterdam, 1953, p. 113.

(9) Siano, D. B., and Metzler, D. E., J. Chem. Phys., 51, 1856 (1969). A Fortran program that resolves spectra in digital form into its components is available on request.
(10) Finney, D. J., "Probit Analysis" (2nd ed.), Cambridge University Press, Cambridge, 1952.
(11) Kapteyn, J. C., "Skew Frequency Curves in Biology and Statistics," Noordhoff, Groningen, 1903.
(12) Williams, C. B., Biometrika, 31, 356 (1940).
(13) Koch, A. L., J. Theor. Biol., 12, 276 (1966).
(14) Spitznagel, E. L., Jr., BioScience, 21, 981 (1971).
(15) Pauling, L., "Vitamin C and the Common Cold," W. H. Freeman and Company, San Francisco, California, 1970.
(16) Koch, A. L., J. Theor. Biol., 23, 251 (1968).
