Anal. Chem. 1900. 62. 2142-2144
2142
17501 1300
0
A
Mean Error
Pi
1
550
an intercept of -0.495, and an 1.2 of 1.00 results; i.e. the errors produced by eq 2 (e) are
l17.5
E
13.0
650
850
750
950
PMT Voltage Figure 1. Means and errors between means and estimates computed from eq 2 plotted as a function of applied PMT voltage.
were compared to the standard mean and variance computations of the same data. RESULTS AND DISCUSSION The data are summarized in Figure 1 where the ten means computed for each data set are plotted as a function of applied PMT voltage. There is an exponential increase in intensity with applied voltage as expected. The variation of the data also increases exponentiallybecause of the domination of shot noise. In all cases the mean and MLE computed iteratively from eq 4 were equivalent within the round-off error of the computation, i.e. absolute errors were less than Also plotted in Figure 1 are the differences between the results of eq 2 and the mean. Computed in this order positive errors result which increase in magnitude as the mean increases. The variability in these errors also increases as well because of the shot noise domination in the signal. When these errors are plotted against cy, a line with a slope of 0.497,
Sir: The first part of this reply to Ron Williams’ comment (I) summarizes and explains in more detail the counting statistics that were used to derive the equations at issue (2). A review of this material is necessary in order to point out some of the fundamental implications of the equations derived in ref 2. In the second part, an explanation of the error in ref 1will be used to illustrate a problem in interpretation of analog data with formulas derived from counting statistics. In essence, the maximum likelihood estimator equation derived in ref 2 was incorrectly applied by not accounting for the conversion between photon counts and photomultiplier current. It is well-known that the outcomes of coin flipping, multiple step solvent extraction, discrete material sampling, and photon or ion counting are all described by the binomial distribution. The binomial distribution is characterized by the probability density function (PDF) (3) P n ( k ) = @)@(I-
(1)
PJk) is a normalized PDF giving the probability that an event or particular outcome will be observed k times in n independent trials and p is the probability that the event will occur. Both k and n are integers since there cannot be fractional event counts. Although they are in fact unitless 0003-2700/90/0362-2142$02.50/0
= 0.497a - 0.495
(11)
This is in very good agreement with the difference between eq 4 and 2 if the first terms under the radicals are ignored. Since a varies from around 1to over 30 in these data, absolute errors of up to 15 were observed. This corresponds to a relative error of ca. 1%. With eq 11 the differences between the Poisson and Gaussian calculations presented in Table I of ref 1 can be explained by assuming slight differences in the variances of the synthetic data sets on the order of 0.1-1 % which propagate to produce the same size variations in the computed differences. Apparently the variances are consistently larger than the means since eq 2 consistently produces results which are less than the Poisson estimators. The use of the mean is superior to eq 2 because it is more robust, more accurate, and simpler. Equations 4 and 5 are as accurate as the standard formulas but are inferior because they must be solved iteratively. LITERATURE CITED (1) Bialkowskl, S. E. Anal. Chem. 1969, 61, 2479. (2) photomoniplier Handbook (PMT-62);RCA SolM State Division: Lancaster PA, 1980; Appendix G. (3) Wonnacott, R. J.; Wonnacott, T. H. Introducfov Statistks, 4th ed.; Wiley & Sons: New Ywk, 1985: Chapter 18.
Ron Williams Hunter Laboratory Department of Chemistry Clemson University Clemson, South Carolina 29634
RECEIVED for review December 27,1989. Accepted May 21, 1990.
parameters, they have implicit count units and are restricted to positive values. The parameter p is continuous,falls within the finite range, 0 5 p 5 1,and is also a unitless parameter. It is convenient to characterize this or any PDF by a central or most probable value and a distribution width or variance. Since the binomial PDF is discrete, the most probable outcome is that integer, kmp,for which the probability is highest. One way to find k,, is to plot the probability of obtaining a given outcome, k , versus k , and then select that which has the highest probability. An easier way is to use the expectation of k . The expectation, ( k ) , is given by
The summation in ref 2 results in ( k ) = np. The expectation is not the most probable value because ( k ) is a real number. The most probable value is that integer that is closest to ( k ) . The width of a PDF is most often characterized by the variance. By definition, the variance is the difference between the expectation of the square and the squared expectation. For the binomial, u2 = ( k 2 )- ( k ) 2 ,where ( k 2 )is the expectation of the square. The expectation for the binomial is ( k 2 ) = np(1- p ) + ( n ~ and ) ~ thus , the variance of this distribution is u2 = np(1 - p ) . The distribution variance is a continuous 0 1990 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 62, NO. 19, OCTOBER 1, 1990
parameter and ita square root is related to the standard deviation of the normal distribution which most resembles the binomial. In fact, the binomial PDF may be approximated by the normal distribution in the limit of np(1- p) >> 1. This limit is reached when the number of trials, n, is large and is often referred to as the law of large numbers. Using a Taylor series expansion of In { P , ( k ) )about the expectation, k = n p , it is easily shown that the binomial PDF is approximated by P(kln,p) = [27rnp(l - p)l-’l2 e x p W - n ~ ) ~ / [ 2 n p -( lp)lJ (3) where P(kln,p) is the probability of obtaining an outcome of k events in n trials. In this limit, measurement k can be thought of as continuous. Equation 3 is perhaps more facile to interpretation based on our knowledge of the normal distribution. The most probable value is just k,, = np and the variance is u2 = np(1- p ) . The normal distribution is the asymptotic limit of the binomial for large numbers of independent measurements, n. When the binomial probability parameter is also small, (1 - p) = 1, and eq 3 is approximated by
P(kln,p) = (27rnp)-’l2exp[-(k - n ~ ) ~ / 2 n p ] (4)
. Since n and p are fixed, eq 4 may be written P(kla) = ( 2 ~ a ) - ’ exp[-(k /~ -~)~/2a]
(5) where p(kla) is the PDF for a continuous measurement of k counts from a distribution parameterizedby a. The restriction of having an integer number of measurement parameters is now lifted since a is a real number. Reference 1 states that a Gaussian model is “formulated so that the signal variance is equal to its most probable value ... by substituting the mean for the variance.” This is a misleading statement. Equation 5 results directly from the binomial distribution under the conditions given above. In fact, eq 4 (or 5) is equivalent to the binomial in the high count limit (3). Equation 5 serves as the basis for the maximum likelihood estimation (MLE) equations derived in ref 2 and is the source of contention in ref 1. The MLE of the a parameter is based on the data obtained from repeated measurements. With each replicate measurement, an independent photon count, ki,is obtained. Note that in this reply, k is used in place of y in refs 1 and 2 as the measurement variable to stress the point that this is an integer value with implicit units of counts. The probability of obtaining a particular set of measurements from a parent distribution characterized by a single a parameter is Pbt = nP(kjla) (6) I
where each of the N replicate measurements has a probability of P(k,la). Although the PDF describes the outcomes of discrete events, it is a function of the continuous parameter, a. And thus differentiation of the total probability in eq 6 with respect to a is valid. The MLE of the a parameter is obtained by setting the derivative of eq 6 with respect to a equal to zero and solving for the parameter. Reference 2 gives the solution resulting from these steps N
ami= (74
+ Ckj2/N)1/2 J=1
- 72
(7)
where amiis the maximum likelihood estimate of the a parameter characterizingthe parent population. As pointed out in ref 2, this result is surprising in that it is not in agreement with the simple mean obtained by performing the same steps on the total probability obtained with the Poisson PDF N ami=
Ekj/N
j=l
(8)
2143
The interesting thing here is that both normal and Poisson PDF are obtained as limiting cases of the binomial PDF. Moreover, the binomial, normal, and Poisson PDF all have the same mean and variance in the small p limit. Reference 2 showed that both MLE estimates had the same relative errors based on synthetic data. Ron Williams extended eqs 5 and 7 by including an adjustable constant of proportionality, a,to “account for other noise sources in the measurement.” Justification for including this proportional factor based on the measurement physics was not given. As shown above, a is only founded in the small p approximation used to obtain eq 4 from eq 3. But as used in ref 1, this adjustable variance proportionality factor constitutes an additional parameter and so maximum likelihood estimates must be obtained by joint maximization of both a d and CY. Since a is allowed to vary, aad is equivalent to u2 of the normal PDF. Including the independent a parameter yields results equivalentto that for normally distributed errors, i.e., those described by independent mean, a d = C y / N , and variance, u; = ( C y 2 ) / N - ( C Y / N ) ~This . is in fact the same conclusion arrived at by Williams. But it would have been easier to find the mean and variance from the normal equations and then to calculate a = u2/amb In essence, the adjustable parameter “washes out” any of the inferences that resulted from using the normal PDF equivalent to the binomial PDF. The significant outcome of Williams’ proof is that a statistical equivalence between the MLE parameters obtained with the Poisson and normal distributions derived from the binomial is established. Equations 6-10 in ref 1 show that the equation a t issue, eq 7 above, is in fact a valid estimate of the a parameter for a unit proportionality parameter. By use of eq 7 with an independent estimate of the variance (either C(y - U ) ~ / N or, C(y - ( Y ) ) ~ /as N in ref 1, it may be shown that eq 7 reduces to eq 8. These equations are not equal but rather both are estimators of the same parameter. Indeed, ref 2 points out that eqs 7 and 8 are both valid and, presumably, independent estimators of the probability parameter for counting distributions derived from the binomial PDF. So why then does Ron Williams proceed to attempt to show that while the simple mean equation is correct, eq 7 is not? More importantly, given that both eqs 7 and 8 are correct, why does Ron Williams obtain results that show that eq 7 is not correct? The experiments reported on in ref 1 were performed at constant light intensity but with different photomultiplier bias voltages. Bias potential does not change the quantum efficiency of the photocathode and thus the number of photon counts per unit time should not have changed. Thus there is no reason to suspect that eqs 7 , 8 , or even 1 given above should not correctly describe the results obtained in ref 1. However, Williams measured current, I, with a sampling oscilloscope. In this case eq 7 is inappropriate since it applies to experiments in which detection events are counted. To scale the measured currents into counts requires an additional parameter, P. The p parameter has units of A-’ and is the inverse current supplied by a single photon detection event. Used in eq l, this scale parameter yields
Pn(P0 =
(;T)P”(~
- P)”-”
(9)
where PI has been substituted for k . This scaling parameter is important. Clearly, a factor of 2 in P can change the probabilities quite substantially. Using the same limiting conditions as those used to obtain eq 6
2144
ANALYTICAL CHEMISTRY, VOL. 62, NO. 19, OCTOBER 1, 1990
Since the current is being estimated N
Iml=
aml/P = [(2P)-2
+ j=l E I j 2 / N p - (2P1-1
(12)
where I d is the MLE of the current. Including /3 in the MLE expression for current is in keeping with the derivations based on the binomial PDF. This result is not equivalent to that of eq 4 in ref 1. I t is equivalent if we let a = 1 / P on the right-hand side and aml= Im on the l left. The expression is simpler than the simultaneous solution of eqs 4 and 5 of ref 1 in that I m l can be determined directly (with knowledge of 6). Lastly, eq 12 was derived without including an arbitrary variance proportionality factor. Since the experiments reported on in ref 1 used constant light intensity and variable photomultiplier tube bias potential, P would change since the gain is a function of bias potential. It is not an adjustable variance parameter. I t is a scale factor for the measurement. The equations given above only describe statistical fluctuations that apply to photon counting experiments. Williams’s experiments fall into the category of analog processed photomultiplier signals ( 4 5 ) . Many sources of variance in addition to that due to the counting statistics have been described. First, the current pulse magnitude is not the same for all photon detection events. This results in what is called photomultiplier multiplication noise. Multiplication noise results in a variance of the pulse heights a t the anode and is a function of the bias potential. Second, there is noise due to the dark current. The dark current noise also scales with the gain. And third, there is Johnson noise in the analog electronics. Ingle and Crouch (5) give the necessary details to write the noise variance. Assuming that there is no flicker noise, the total variance is
+ meI,(l + a ) / +~ meId(l + a ) / ~(13)
uI2 = aa2
where 02 is the amplifier noise variance, m is the photoelectron gain factor, e is the electron charge, I , is the current due photon detection events, a is the multiplication noise factor, T is the measurement time, and I d is the dark current. At any set photomultiplier tube bias voltage, the average (expectation) current is I = (I, I d ) and C T =~ CT? + nK(I, I d ) , where K is the constant factor e(1 a)/,. Use of these in the normal PDF results in
+
+
+
P(III,a12) = (27r~&’/~exp[-(I = I)2/2a12] In the shot noise limit,:a
(14)
= 0, and
P(I(1)= ( 2 ~ r n K I ) - ’ exp[-(I /~ - I)2/2mKI]
(15)
The MLE current obtained from the total probability is N
Iml = [ ( n ~ K / 2+ ) ~EIj2/IvJ1/2 - mK/2
(16)
J=l
LITERATURE CITED
In this case, mK is exactly equal to a used by Williams. Notice that the gain, m, is a function of the photomultiplier bias potential. In fact m = l/p. Thus N
Iml =
[(K/2/3)*
+ CIj2/N]1’2- K/2@
the other hand Ingle and Crouch refined the concept and use the term “quantum noise” to indicate that the noise is due to the discrete nature of photon or particle detection ( 5 ) . Equation 16 is important in that it shows that the form of the parent population parameter MLE of ref 2 is more general than that derived based only on photon or particle counts. In fact, it applies to any case where the variance is proportional to the average current, i.e., both quantum noise and shot noise. Equation 12 and those of ref 2 apply strictly to quantum noise limited detection, e.g., photon counting, and are not appropriate in cases where average currents are measured or where other noise sources may be present. Equation 16 shows that shot noise limited data can be treated with the same formula if the proportionality constant, K, is known. The question might arise as to how K/2P is determined. A separate experiment must be designed to determine this term. Since p, and therefore K/2p, is a function of the photomultiplier bias potential, a reasonable experiment would be to measure the anode current as a function of bias potential at constant illumination. Both signal and dark anode currents are proportional to the gain. The gain is proportional to the bias potential raised to a power equal to the number of dynode stages (4). Both K/2P and Id could then be obtained by using shot noise limited regression techniques (7). The K parameter might also be obtained independently by determining the pulse height distribution of the particular photomultiplier (4, 5). After examining these more detailed equations, one can rationalize why Williams found that the MLE based on eq 7 was not as accurate as the simple mean. Williams’ experiments were run in analog mode and so eq 12, 17, or some similar equation should have been used. Indeed, Ron Williams used an equation of the same general form as eqs 12 and 17 and concluded that the results were more accurate than those obtained by using eq 7. This should be no surprise. Equation 7 above is strictly valid only for count data or data that is scaled into counts as in eqs 11 or 12. It was fortunate that the equations used in ref 1 were of the correct form. Using an arbitrary proportionality factor to account for “other noise sources”without knowledge of what the noise sources are or how they influence the measurement may lead to correct estimates as well as erroneous ones. Moreover, Williams’ equations are only correct with a rather loose interpretation of what the parameter a is. It is almost guaranteed that the application of an equation derived for counts to obtain estimates of current or potential data will result in erroneous results. Reference 1indicates that ad was being estimated. But it is apparent by examining the similarities between eq 11 above and eq 6 in ref 1 that what was really being estimated was the maximum likelihood current.
(17)
j=1
is the more correct MLE result for treating shot noise limited data obtained in analog mode. Again, although K/2P in eq 17 bears some resemblance to Williams’s a , they are not equivalent. This can be seen by rescaling Id with the factor 0.Williams measures “y” and from this estimates a. But the “y” that Williams measures is current, not counts. There is a subtle distinction between shot and quantum noises. In most texts, shot noise is defined as that current variance that is proportional to the average current (6). On
(1) Williams, R. Anal. Chem., preceding paper in this issue. (2) Bilkowski, S. E. Anal. Chem. 1989. 67, 2479-2483. (3) Papoulls, A. Probability, Random Variables, and Stochestic Processes, 2nd ed.: McGraw-Hill: New York, 1984; Chapter 3. (4) Engstrom, R. W. Photomukiplier Handbook; RCA Databook PMT-62; RCA Corporation, Lancaster, PA, 1980. (5) Ingle, J. D.;Crouch, S. R. Spectrochemical Analysis; PrenticeHail: Engiewood Cliffs, NJ, 1988; Chapter 5. (6) Skoog. D. A. Principles of Instrumental Analysis, 3rd ed.; Saunders College Publishing: New York, 1985: Chapter 2. (7) Bialkowski, S. E. Anal. Chem. 1989, 61, 2483-2489.
Stephen E.Bialkowski Department of Chemistry and Biochemistry Utah State University Logan, Utah 84322-0300 RECEIVED for review April 3, 1990. Accepted May 18, 1990.