Research: Science and Education

Advanced Chemistry Classroom and Laboratory
edited by Joseph J. BelBruno, Dartmouth College, Hanover, NH 03755
The Role of Fisher Information Theory in the Development of Fundamental Laws in Physical Chemistry

J. M. Honig, Department of Chemistry, Purdue University, West Lafayette, IN 47907;
[email protected] Many fundamental equations in chemical physics may be derived from a single basic principle that rests on the achieving an extremum in the Fisher information measure, the latter developed by R. A. Fisher (1) in 1922. The unifying principle that involves rendering the Fisher information measure an extremum is reviewed. It is shown that with this principle, in conjunction with appropriate constraints, a large number of fundamental equations can be derived from a common source in a unified manner. A much more rigorous, informative, and extensive presentation may be found in a book by B. Roy Frieden (2); see also the articles cited in (3). Readers are strongly urged to peruse these citations for a proper insight into this remarkable area. The resulting economy of thought pertaining to fundamental principles deserves to be much more widely appreciated. Basics An observer records the measurement of a physical variable, such as position, indicated by y, which quantity in general differs from the correct value θ of the actual position. Let xn be the deviation of the nth measurement from the correct value, so that
yn R xn
e2 z
Rˆ y R
2
(2)
where angular brackets denote expectation values. We shall see shortly that e 2 cannot be reduced indefinitely; there will always remain an irreducible difference between θˆ(y) and θ. The rather austere mathematics that follows invites an illustration by means of an example. Consider a box containing one He atom, surrounded by a tangle of heating wires that generates a very uneven temperature profile within the box. It is desired to determine the average position θ of the atom inside the enclosure. This is a well-defined, though unknown, quantity that can be altered by changing the position of the wiring. It is in this sense that one can carry out a partial differentiation with respect to θ in eq 4. To determine θ an experimentalist takes (perhaps not literally) N successive snapshots of the He atom to calculate the
116
e
Rˆ y R
Rˆ y R
dy p R, y
0 (3)
e
Equation 3 does not contradict eq 2 since deviations of θˆ(y) from θ may be negative as well as positive. Here and below the integrals are taken to be with fixed limits, usually between ‒∞ and ∞. Next, differentiate with respect to θ; for fixed integrals one may execute this step under the integral sign: v vR
(1)
records the measurement in terms of its deviation from the correct value. The observer can then fashion an optimum value θˆ based on the estimates of θ obtained in N repeated measurements. For example, the sample mean θˆ(y) = N–1∑Nn=1 yn, with y = { yn }, usually provides a better approximation to θ than does a single measurement. The observer seeks the best estimate available in this process. Under the assumption that all determinations are unbiased this is accomplished by minimizing the mean square error given by
average value, θˆ(y) = N–1∑Nn=1 yn, over all measurements of the positions yn in the various snapshots. The notation θˆ(y), y = { yn } is meant to emphasize that the average value depends on the particular sequence of the N measurements adopted in any one run: different repetitions of snapshot runs, particularly with different N, will give rise to different θˆ(y) values. Let p represent the probability that a value y is encountered for a particle in an ensemble for which the mean value of the observations is θ. If the measurements are unbiased we write for the expectation value of the deviation from the correct value:
e
dy p R, y Rˆ y R
e e
dy e
vp R, y vR
Rˆ y R
e
dy p R, y
(4)
e
0 The last integral converges to unity because the probabilities are normalized. Also, we set vp R, y vR
z
v ln p R, y vR
p R, y
(5)
Rˆ y R 1
(6)
hence, eq 4 may be rewritten as e
dy e
v ln p R, y vR
p R, y
Square the above equation: e
Mz
dy e
v ln p R, y vR
p R, y
t (7)
2
p R, y
Rˆ y R
1
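The content of eqs 1 and 2 can be made concrete with a small simulation. The sketch below uses illustrative Gaussian measurement noise; the values of θ, σ, and the run counts are arbitrary choices, not taken from the article. It shows the mean square error of the sample-mean estimator shrinking as 1/N:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.7    # "true" value theta (arbitrary units)
sigma = 0.5    # spread of the measurement noise x_n (arbitrary)

for N in (10, 100, 1000):
    # 20000 independent runs, each averaging N measurements y_n = theta + x_n (eq 1)
    y = theta + sigma * rng.standard_normal((20000, N))
    theta_hat = y.mean(axis=1)                  # sample-mean estimator
    e2 = np.mean((theta_hat - theta) ** 2)      # mean square error, eq 2
    print(N, e2, sigma**2 / N)                  # e2 tracks sigma^2 / N
```

For unbiased Gaussian noise the sample mean attains e² = σ²/N, which is precisely the irreducible lower bound derived below (eq 13).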
Journal of Chemical Education • Vol. 86 No. 1 January 2009 • www.JCE.DivCHED.org • © Division of Chemical Education
Next, invoke the Schwarz inequality: for two complex functions f(x) and g(x) one finds that

    |∫_a^b f*(x) g(x) dx|² ≤ [∫_a^b f*(x) f(x) dx] [∫_a^b g*(x) g(x) dx]    (8)

When f and g are real functions, eq 8 as applied to eq 7 (on setting f = √p [∂ ln p/∂θ] and g = √p [θ̂(y) − θ]) yields

    λ = 1 ≤ [∫ dy p(θ, y) (∂ ln p(θ, y)/∂θ)²] [∫ dy p(θ, y) (θ̂(y) − θ)²]    (9)
In most cases of interest the probabilities obey the "shift relation"

    p(θ, y) = p(y − θ) ≡ p(x)    (10)

Then the first integral in λ, eq 9, is defined to be the Fisher information measure, I:

    I ≡ ∫ dx [dp(x)/dx]² [1/p(x)]    (11)

The second integral is defined to be the mean squared error, e², for measurements on an ensemble of particles,

    e² ≡ ∫ dx p(x) [θ̂(y) − θ]²    (12)

which agrees with the definition, eq 2. As an aside, if one applies the above to the position x of a particle, then e_x² determines, in repeated measurements, the square of the fluctuations from its mean position; hence e_x relates to the uncertainty in our knowledge of its exact position. Continuing with the main argument, on applying eqs 11 and 12 to eq 9 one obtains the so-called Cramér–Rao inequality:

    1 = λ ≤ e² I   or   e² ≥ 1/I    (13)

Obviously, for real x, I is a positive number.¹ Thus, the above equation sets a nontrivial lower bound on the mean square error, or uncertainty, in any measurement. This has obvious repercussions in formulating the Heisenberg inequality in quantum mechanics, discussed below, but also refers to inevitable errors in the physical measurement of any observable quantity. When the equality holds one faces an irreducible minimum in the measurement errors. Two limiting cases are worth noting: (i) The distribution p is "flat": the measurements are more or less uniformly distributed over the entire interval −∞ < x < ∞. Then dp/dx ≈ 0, I → 0, and e² is essentially unbounded. A broad and smooth distribution implies considerable randomness and a large mean square error. (ii) The distribution p(x) peaks sharply about x₀.
Now (dp/dx)² is large near x₀ but close to zero elsewhere. I grows larger and e² smaller the more peaked the distribution becomes, corresponding to a smaller degree of randomness. In view of the above, I is in fact a measure of the disorder of a system. This leads to the association of I with the negentropy defined as Η = −∫ dx p(x) ln p(x) (Η here is not Latin "aitch" but the capital Greek eta); the relationship can be properly established through discussions beyond the present purview. Since I relates to negentropy it follows that dI/dt ≤ 0; that is, in repeated measurements of any process under consideration I diminishes and tends toward a minimum, at which δI = 0. The special circumstances that characterize the processes occurring in a particular system are introduced via constraints, such as requiring all distribution functions to be normalized, requiring the energy of an isolated system to be fixed, or, more generally, introducing constraints of the type discussed in ref 2. It follows further that the mean square error grows as the Fisher information diminishes with increasing time; that is, the random processes that are inevitable in any physical measurement become increasingly more severe. However, when the constraints are applied, as shown below, p(x) approaches a constant lower limit (that is, a stationary state) in which I assumes its minimum value. This is the essence of the extreme Fisher information (EFI) principle, which is at the heart of the derivations described below. We now consider several specific cases of interest. Most of these involve levels of prior knowledge for which EFI becomes the minimum Fisher information (MFI) principle.

The Schrödinger Equation

Consider again the problem of locating a particle anywhere within a one-dimensional region. In the absence of any constraints the particle could be anywhere with equal probability. As mentioned above, this translates into the requirement that the Fisher information should be at an extremum.
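Before imposing the potential constraint, the Cramér–Rao relation (eq 13) and the two limiting cases above can be illustrated numerically for Gaussian distributions of several widths s. This is only a sketch in arbitrary units; the grid and the widths are illustrative choices, not taken from the article:

```python
import numpy as np

x = np.linspace(-60.0, 60.0, 600001)
dx = x[1] - x[0]

for s in (0.5, 2.0, 8.0):      # sharply peaked -> broad
    # Gaussian p(x) of width s; for it, dp/dx = -(x / s**2) * p exactly,
    # so the eq 11 integrand (dp/dx)^2 / p equals (x / s**2)**2 * p
    p = np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    I = ((x / s**2) ** 2 * p).sum() * dx    # Fisher information, eq 11
    e2 = (x**2 * p).sum() * dx              # mean square error, eq 12 (theta = 0)
    print(s, I, e2, e2 * I)                 # I = 1/s^2, e2 = s^2, product = 1
```

A sharply peaked p(x) gives large I and small e²; a broad one gives the reverse; their product stays pinned at the Cramér–Rao bound of eq 13, since the Gaussian saturates the inequality.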
However, since the particle is subjected to an external, specified potential V(x), the probability of locating the particle will be influenced by it. Relative to the unknown total energy E, the mean kinetic energy of the particle, ⟨K⟩, which is positive, is subject to the constraint

    ⟨K⟩ = ∫ dx p(x) [E − V(x)] ≥ 0    (14)

This constraint is handled in a manner analogous to the method of Lagrangian multipliers: instead of minimizing I (eq 11) one minimizes the augmented function

    ∫ dx [dp(x)/dx]² [1/p(x)] + λ₀ ∫ dx [E − V(x)] p(x) = minimum    (15)

where λ₀ is an undetermined multiplier. It is convenient at this stage to introduce a probability amplitude q(x), which is defined by

    p(x) = q²(x)    (16)
whereby eq 15 is rewritten as (λ = λ₀/4)

    L ≡ ∫ dx [dq(x)/dx]² + λ ∫ dx [E − V(x)] q²(x) = minimum    (17)

As is well established (see the online material for details), one handles the extremum problem by solving the (nontemporal) Euler–Lagrange equation

    (d/dx)(∂L/∂q′) − ∂L/∂q = 0,   with integrand L = q′² + λ q² [E − V(x)],   q′ ≡ dq/dx    (18)

At this stage it is vital, for purposes of proper treatment, to regard the symbols q and q′ as unrelated quantities. Straightforward execution yields the solution

    d²q/dx² − λ [E − V(x)] q = 0    (19)

This is seen to be a disguised form of the Schrödinger equation; in the usual notation q is replaced by ψ, and λ by −[ħ²/(2m)]⁻¹. Aside from the obvious tour de force, one should note how the second derivative has entered eqs 11 and 17 through use of the Fisher information function I. The presence of this derivative in the conventional presentation of Schrödinger's equation has always been considered somewhat of a puzzle; one generally simply posits that the classical momentum should be replaced by the operator −iħ d/dx. Here the second derivative occurs in a natural setting without requiring ad hoc arguments. In a sense, the Schrödinger equation is simply a consequence of minimizing the Fisher information subject to the constraint eq 14.
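As a numerical check of eq 19, one can diagonalize a finite-difference version of the equivalent eigenvalue problem −(ħ²/2m) q″ + V q = E q. The sketch below takes ħ = m = 1 and an illustrative harmonic potential V(x) = x²/2 (neither choice is from the article); the lowest eigenvalue should come out close to the known ground-state energy 1/2:

```python
import numpy as np

# Grid and harmonic potential (hbar = m = 1, V = x^2/2; illustrative choices)
n, L = 1500, 10.0
x = np.linspace(-L, L, n)
dx = x[1] - x[0]
V = 0.5 * x**2

# Tridiagonal Hamiltonian: -(1/2) d^2/dx^2 via the second-difference stencil
H = (np.diag(1.0 / dx**2 + V)
     + np.diag(-0.5 / dx**2 * np.ones(n - 1), 1)
     + np.diag(-0.5 / dx**2 * np.ones(n - 1), -1))

E, q = np.linalg.eigh(H)
print(E[0])    # ground-state energy, close to 0.5
```

The corresponding lowest eigenvector is a discrete Gaussian, consistent with p = q² being the distribution that minimizes the constrained Fisher information.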
The Heisenberg Inequality

We now render more explicit the relation between the Cramér–Rao and Heisenberg inequalities. Since the latter relates uncertainties in position x and momentum μ (we reserve the symbol p for probabilities), we introduce the probability amplitude φ(μ) for momentum. This may be split into real and imaginary contributions, φ(μ) = φ₁(μ) + i φ₂(μ), in momentum space. Each of these can be related to the real and imaginary parts of the amplitude function in real space introduced earlier, namely ψ(x) ≡ q(x) = q₁(x) + i q₂(x). Their interrelation is established via a slightly modified Fourier transform,² as specified by

    φ_j(μ) = (2πħ)^{−1/2} ∫ dx q_j(x) exp(iμx/ħ),   j = 1, 2    (20)

The probability density in μ space is written out in terms of the φ_j(μ) as

    p(μ) = |φ(μ)|² = |φ₁(μ)|² + |φ₂(μ)|² + C,   C ≡ i [φ₁*(μ) φ₂(μ) − φ₁(μ) φ₂*(μ)]    (21)

We next specify C in the form of an integral involving x₁ and x₂: use eq 20 and set q₁ = q₁(x₁) and q₂ = q₂(x₂). It is straightforward to show that C assumes the form

    C(μ) = (1/πħ) ∫ dx₁ ∫ dx₂ q₁(x₁) q₂(x₂) sin[μ(x₁ − x₂)/ħ]    (22)

rendering C an odd function of μ. The expectation value of the momentum squared is then found by introducing eq 21:

    ⟨μ²⟩ = ∫ dμ μ² p(μ) = ∫ dμ μ² [|φ₁(μ)|² + |φ₂(μ)|²]    (23)

the odd function C having dropped out in the integration process. On the other hand, the Fisher channel capacity obeys the relation

    I = 4 ∫ dx |q′(x)|² = 4 ∫ dx {[q₁′(x)]² + [q₂′(x)]²}    (24)

Next, construct the ordinary Fourier transform Q_j(y) of q_j:

    Q_j(y) = (2π)^{−1/2} ∫ dx q_j(x) exp(iyx)    (25)

Relate this quantity to eq 24 and introduce the Parseval theorem as applied to the indicated derivatives in the manner described in the online material. This leads to

    I = 4 ∫ dy y² [|Q₁(y)|² + |Q₂(y)|²]    (26)

Comparison of eq 25 with eq 20 shows that, on setting y = μ/ħ (y is thus seen to be the wave number), we obtain the relation

    Q_j(y) = √ħ φ_j(μ = ħy) ≡ √ħ φ_j(ħy)    (27)

When introduced in eq 26 one finds that

    I = (4/ħ²) ∫ dμ μ² [|φ₁(μ)|² + |φ₂(μ)|²] = (4/ħ²) ∫ dμ μ² p(μ) ≡ (4/ħ²) e_μ²    (28)

where p(μ) is the probability of encountering the momentum μ. By the argument set forth below eq 12, the quantity e_μ specifies the uncertainty in the momentum of the particle. In light of eqs 28 and 13 we then find that

    1 ≤ e_x² I = 4 e_x² e_μ² / ħ²    (29)

or

    e_x e_μ ≥ ħ/2    (30)
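Equations 24, 28, and 30 can be checked numerically for a real Gaussian amplitude q(x), for which the uncertainty product should land exactly on the bound. This is a sketch with ħ = 1; the width σ = 1.3 is an arbitrary illustrative value:

```python
import numpy as np

sigma = 1.3
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]

# Normalized real Gaussian amplitude q(x) and its analytic derivative q'(x)
q = np.exp(-x**2 / (2 * sigma**2)) / (np.pi * sigma**2) ** 0.25
dq = -(x / sigma**2) * q

ex2 = (x**2 * q**2).sum() * dx     # e_x^2, mean square position spread
I = 4 * (dq**2).sum() * dx         # Fisher channel capacity, eq 24
emu2 = I / 4                       # e_mu^2 from eq 28 with hbar = 1
print(np.sqrt(ex2 * emu2))         # uncertainty product, equals 1/2 here
```

The Gaussian is the minimum-uncertainty amplitude, so the product e_x e_μ sits exactly at ħ/2 rather than above it.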
Equation 30 is one version of the famous Heisenberg uncertainty principle.

The Maxwell–Boltzmann Distribution Law

We go back to eq 14 and now apply the ordinary equipartition law to the expectation value of the kinetic energy of particles at temperature T moving in one dimension (here x denotes the velocity of a particle):

    ⟨K⟩ = ∫ dx (m x²/2) p(x) = kT/2    (31)

This represents one constraint relevant to our discussion. A second is the normalization of the probabilities for encountering gas molecules in the domain −∞ < x < ∞, as specified by

    ∫ dx p(x) = 1    (32)
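Both constraints can be previewed numerically: the distribution derived below as eq 39 satisfies eq 31 and eq 32 simultaneously. The sketch uses m = 2.0 and kT = 0.7 in arbitrary units; these are illustrative values only:

```python
import numpy as np

m, kT = 2.0, 0.7                        # illustrative values, arbitrary units
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]

# One-dimensional Maxwell-Boltzmann form derived below (eq 39)
p = np.sqrt(m / (2 * np.pi * kT)) * np.exp(-m * x**2 / (2 * kT))

norm = p.sum() * dx                     # eq 32: normalization, should be 1
kin = (0.5 * m * x**2 * p).sum() * dx   # eq 31: <K>, should be kT/2
print(norm, kin, kT / 2)
```

Any other choice of the constants a and b² in eq 38 would violate one of the two constraints, which is how the derivation below pins them down.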
On applying the method of undetermined Lagrangian multipliers λ and ν we seek a minimum of a function that also involves the Fisher information:

    L = ∫ dx [p′(x)]²/p(x) + λ [∫ dx (m x²/2) p(x) − kT/2] + ν [∫ dx p(x) − 1] = minimum    (33)

The corresponding Euler–Lagrange functional is given by

    L = p′²/p + λ (m x²/2) p + ν p    (34)

On inserting this relation into the Euler–Lagrange equation 18 we obtain

    2 (d/dx)(p′/p) + (p′/p)² − λ m x²/2 − ν = 0    (35)

For convenience we define an auxiliary function

    h(x) ≡ p′(x)/p(x)    (36)

that converts eq 35 to the Riccati equation

    2 h′(x) + h²(x) − λ m x²/2 − ν = 0    (37)

As may be verified by direct substitution, one solution of this equation is given by h(x) = Bx, where B is a constant (the substitution fixes B² = λm/2 and ν = 2B). For present purposes it suffices to replace h(x) in eq 36 by Bx and to solve the resulting differential equation for p(x), to obtain

    p(x) = a exp(−b² x²)    (38)

For convenience, the constant in the exponent has been taken as −b². One may determine the constants a and b² by substituting eq 38 into the constraint relations, eqs 32 and 31; the corresponding definite integrals over exp(−b²x²) and x² exp(−b²x²) (with limits −∞ to ∞) converge to (√π)/b and to (√π)/(2b³), respectively. One thus obtains

    p(x) = [m/(2π kT)]^{1/2} exp(−m x²/2kT)    (39)

which is the Maxwell–Boltzmann distribution law for one dimension.

Retrospect

It is remarkable that the diverse laws cited above may all be linked to one overarching principle: rendering extremal an appropriately constrained Fisher information measure. The above procedures may be carried further (2) to generate the basic diffusion equation, the Klein–Gordon³ and Dirac equations, the Maxwell equations, the Einstein field equations, optical detection limits, various theorems in statistical mechanics, and various fundamental relationships in electrical engineering, communication theory, and economics. Readers are invited to pursue these topics on their own. The economies of thought that are implicit in basing all the above theoretical constructs on one fundamental approach deserve to be more widely appreciated.

Acknowledgment
The author gratefully acknowledges the careful reading, criticisms, and perceptive commentary on the initial draft of the typescript by B. Roy Frieden.

Notes

1. There are important cases in which x is a complex quantity, as when it is used to represent time coordinates in quantum theory.

2. As is well established, and implicit in eq 20, the Fourier transform in effect allows us to switch variables from the domain of real space (namely x, which gets "integrated out" in the integration) to the domain of momentum (or reciprocal) space μ that is of further interest. Equation 20 contains the factor 1/ħ in the exponent and a compensating term in the prefactor; these are absent from the conventional definition of a Fourier transform, as in eq 25.

3. A derivation of the Klein–Gordon equation based on the Fisher principle, and extensions thereof, are available on request from the author.
Literature Cited

1. Fisher, R. A. Phil. Trans. R. Soc. A 1922, 222, 309–368.
2. Frieden, B. R. Science from Fisher Information; Cambridge University Press: Cambridge, 2004.
3. Frieden, B. R. Am. J. Phys. 1989, 57, 1004–1008. Frieden, B. R. Phys. Rev. A 1990, 41, 4265–4276. Frieden, B. R. Phys. Lett. A 1992, 169, 123–130.
Supporting JCE Online Material

http://www.jce.divched.org/Journal/Issues/2009/Jan/abs116.html

Abstract and keywords
Full text (PDF)
Supplement: Derivation of the Euler–Lagrange equation and Parseval's equality