Bias and precision in the estimation of exponential decay parameters

data sets. Working with sets containing as few as 20 counts distributed exponentially in a finite number of “bins”, we encountered indications of...
1 downloads 0 Views 740KB Size
Anal. Chem. 1993, 65, 1240-1246

1240

Bias and Precision in the Estimation of Exponential Decay Parameters from Sparse Data Joel Tellinghuisen' Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235

Charles W. Wilkerson, Jr. Chemical and Laser Sciences Division, Center for Human Genome Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545

The statistical properties of maximum likelihood (ML) and least-squares (LS) estimators for the exponential decay parameters I'(rate) and 7 (lifetime) are examined for finite observation time ( T ) and a small number of counts ( N ) . The ML estimators and their variances are assessed via numerical calculations employing the exact Npoint distributions for exponentially distributed random data. Bias is a common feature of all these estimators under these conditions. In addition, the ML lifetime estimator exhibits the interesting property of divergence in its variance, which means that I' is the statistically preferred parameter for characterizingexponentialdecay for finite observation time. The bias in I' is less than N/(N - 1) for all Tand thus can be easily corrected for. The effect,s of various ways of binning the data areexamined for both the ML and LS approaches. INTRODUCTION First-order processes are extremely important in kinetics. Not surprisingly, the problem of extracting statistical estimates of exponential decay rates and lifetimes has received considerable attention.I-l* Much of this effort has been directed toward greater efficiency and reliability in processing the abundant data that are often available from photon counting experiments. In the present work we examine problems that occur at the other extreme-the sparse data limit. This regime is important in the determination of fluorescent lifetimes of single molecule^.^^ The present study began with some desultory Monte Carlo calculations designed to examine the reliability of least~~~

~~

~

(1) Bartlett, M. S. Phrlos. Mag. 1953, 44, 249. (2) Annis, M.; Cheston, W.; Primakoff, H. Reu. Mod. Phys. 1953, 25, 818. (3) Hall, P.; Selinger, B. J. Phys. Chem. 1981, 85, 2941. (4) Isenberg, I.; Small, E. W. J. Chem. Phys. 1982, 77, 2799.

(5) Demas, J. N. Excited State Lifetime Measurements; Academic Press: New York, 1983. (6) Good, H. P.; Kallir, A. J.; Wild, U. P. J.Phys. Chem. 1984,88,5435. (7) O'Connor, D. V.; Phillips, D. Time-Correlated Single Photon Counting; Academic Press: New York, 1984. (8) Zuker, M.; Szabo, A. G.; Bramall, L.; Krajcarski, D. T.; Selinger, B. Rev. Sci. Instrum. 1985, 56, 14. (9)Ballew, R. M.; Demas, J. N. Anal. Chem. 1989, 61, 30. (10) Lyons, L.; Clifford, P. Nucl. Instrum. Methods Phys. Res. A 1989, 274, 557. (11) Lyons, L.; Saxon, D. H. Rep. Prog. Phys. 1989, 52, 1015. (12) Bajzer, Z.; Sharp, J. C.; Sedarous, S. S.; Prendergast, F. G. Eur. Biophys. J . 1990, 18, 101. (13) Wilkerson, C. W., Jr.; Goodwin, P. M.; Ambrose, W. P.; Martin,

J. C.; Keller, R. A. Detection and Lifetime Measurement of Single Molecules in Flowing Sample Streams by Laser-Induced Fluorescence. Appl. Phys. Lett., in press.

0003-2700/93/0365-1240$04.00/0

squares estimates of lifetimes from typical photon counting .data sets. Working with sets containing as few as 20 counts distributed exponentially in a finite number of "bins", we encountered indications of significant bias, the magnitude of which depended strongly on the observation time range. Since least squares is not necessarily unbiased in such applicat i o n ~ ,we ~~ also ~ 'examined ~ maximum likelihood estimators (MLEs). A detailed study of the latter has verified the occurrence of bias in the ML estimators of both the lifetime T and its reciprocal r,when the time range of the observations is finite. More surprising, it has shown that under such conditions the variance i n the M L E of T formally diverges. This result seems not to be widely known; since all lifetime observations are perforce limited to a finite time range, the implication is that r is the preferred parameter for characterizing exponential decay processes. To see how the divergence in the variance of the estimator of T arises, consider the following simple situation: Data are collected in two bins of equal width spanning total time T. In the usual situation the first (early time) bin will contain more counts. However, there is a finite probability (and increasingly so for decreasing total counts) that the second bin may contain more counts than the first. This event corresponds to a negative decay rate, or exponential growth. Of course, if both bins contain equal counts, the deduced net decay rate is zero. Obviously the corresponding lifetime in this case is infinite. The same behavior occurs when the binned data are replaced by an exact average 5 of the distributed times. Because of the manner in which the estimated r passes linearly through zero at 5 = TI2 (see below), the estimator of T diverges to + m as t approaches T / 2 from below and to --OD as 5 approaches T / 2 from above. For a finite number of points in the data set, the probability distribution must sample this pole in T, rendering all even moments of T , and most importantly the variance, similarly divergent. While this result would rarely be evident in work with large data sets, it is now recognized as the source of the erratic behavior in the lifetime variance from our initial Monte Carlo calculations with sparse data. In the present work, we first review the derivation of the maximum likelihood estimators for T and r. We evaluate the statistical properties of these estimators numerically, using the exact probability distributions for N random points distributed exponentially on the finite time interval T. We then examine the effects of binning the data in various ways. Of particular interest in this regard is the simplest binning scheme-two bins of equal width-which has been shown previously to yield estimated standard deviations which (14) Parratt, L. G. Probability and Experimental Errors in Science; Wiley: New York, 1961. (15) Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: New York, 1969.

0 1993 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 65, NO. 9, MAY 1, 1993

-

exceed the'infinite bins" results by less than 30%? Finally, we describe Monte Carlo results for least-squares fits involving several different binning algorithms. To keep the various contingenciesclear, all of our calculations involve backgroundfree data. However, our least-squares algorithm should be directly applicable to data with background, which is a main reason for presenting it here.

4 ,

1241

,

10

MAXIMUM LIKELIHOOD ESTIMATORS Derivation. The MLEs for exponential decay have been treated numerous times previously,1-3JOJ1 so we give just a brief summary of their derivation here. Consider times distributed randomly on the interval 0 I t IT with a probability distributionP(t ) that is proportional to Then if the interval is divided into k time bins, the probability of a count in the ith bin (1 5 i 5 k) is

PI =

0

0 5

1

1 5

2

-

t

fo,

Figure 1. scale to right, taken as the root of eq 5, as a functipn of t for 0 It IT = 2 (arbitrary time units and their reciprocal for Also shown is the distribution function PN0 for = 1 and the same time range, obtained for various numbers of points N.

r).

which is identical in form to eq 5. Furthermore,

dt/STe-'.' 0 dt

where to = 0. If the bins are of equal width w ,then w = T/k, and p , = e-'I'"e'W

0

- l ) / ( l -e-'')

Let N counts be distributed over the k time bins such that the ith bin contains n, counts; thus N = Cni. The likelihood function is k

from which

(2)

P

(3)

We maximize L by maximizing its logarithm. Setting a(ln L)Iar = 0, we obtain (4)

where ti = iw. The root of eq 4 is the MLE f b for binned data. Similarly the MLE ;b is the reciprocal of f b . The latter result is also obtained directly by defining r as the adjustable parameter and maximizing L with respect to T . This procedure yields an equation identical to eq 4 but with 1/r substituted for r t h r o ~ g h o u t . ~ If we hold kw = T while narrowing the bin width w, then as w 0 , k and eq 4 becomes

- --

(5)

The result of this limiting procedure is the creation of N occupied time bins each containing a single count, with all the other bins empty. The time "label" ti for the ith bin now becomes the exact time associated with the corresponding count. Thus the sum over bins collapses to a sum over the N exact t , values (which we have relabeled with 'y to emphasize the change); and the average time in eq 5 is recognized as the usual, naive definition of the average of N quantities. The root of eq 5 is the MLE f , and the MLE ; is again the reciprocal of f . 1 ~ 2 It is of interest to note that the theoretical average value of t for the range 0 It 5 T is

(8)

withg E e-rT. Equations 6-8 are useful in consistency checks on the Monte Carlo calculations. For example, it is a fundamental statistical result that the variance in a mean of N quantities is the parent distribution variance divided by N. Thus an average of N random numbers exponentially distributed on 0 5 t I T should fall within U , / ( N ) ~of/ ~t t h about two-thirds of the time. 'Exact" Time Results. We want to examine the bias and precision of the estimators f and ;for 'exact" times t, and their counterparts for binned data. To that end we first examine the nature of the roots I' from eq 5. Possible values o f t span the full range of allowed t values for any number ?f sampled data points. As noted previously t = TI2 yields r = 0. By implicitly differentiating eq 5, one can show that as 2 approaches Tl2,J? is linear with slope

-

-

f diverges as l/i as t 0 and as (t - T)-1 for Z T. Correspondingly ;diverges a t t = T/2, as noted earlier, and goes to zero a t t = 0 and t = T. The behavior o f f is illustrated in Figure 1for T = 2. For general values oft (or & for binned data),eq 5 (or eq 4 for binned data) must be solved numerically for f and .; However,this can be done reliably and efficiently using even simple root finders such as Newton's method. The statistical properties of the estimators f and ;can be examined using Monte Carlo methods to simulate real data. Thus, for example, a set of N random, exponentially distributed times on 0 It I T yields a single value of 2 (or, with binning, 6,). Each such average time then leads to a single numerical estimate of f and its reciprocal, r. From the statistics of a large number of equivalent sets of N points, we can assess the bias and precision in these estimators. While this Monte Carlo approach is straightforward, there is a better, exact method of obtaining the same informationprovided one knows the parent distribution from which the Monte Carlo sampling is done. For example, if the continuous probability distribution for some variable x is P ( x ) ,then the expectation or average value for any function f ( r )can be obtained very simply as ( f ( x ) ) = J : : f ( x ) P ( x )dx, where x1 and x 2 delimit the region over which P ( x ) is defined and

1242

ANALYTICAL CHEMISTRY, VOL. 65, NO. 9, MAY 1, 1993

normalized. We can assess the bias q d precision in f and ;in preciselyJhis w y , evaluating ( f ), ( T ) , and their variances (e.g., up2 = (rz)- (r)z)by numerical quadrature of integrals of form

(f") = JOT[f(t)]"qy(t) dt

I

1

N = 2

(10)

Here t plays the role of the variable x and P N ( ~represents ) the exact probability distribution for on 0 It 5 T. For a random variable uniformly distributed on 0 It IT, the distribution function o f t is proportional to16J7 gN(U)

= UN-' - (;)(u . l p l + (;)(u . 2)N-1....

+

valid for r .1 < u < r, r = 1, 2, ..., N, with u = Nt/T. This is an N-segment function, with derivative continuous to order N - 2 at the segment boundaries. [For example gdu) is piecewise linear, with a discontinuous first derivative at u = 1or t = T/2.] gN(u)goes as uN-lfrom u = 0 and as (N- u)~-' -for decreasing u from u = N. For the case of present interest, t represents the average of N random numbers distributed exponentially (Le., as e-rt) on 0 It IT, for which it can be shown that

p N ( t )a e-rNtg N [ U ( t ) 1 (12) This PN is illustrated for r = 1, T = 2, and several choices of N in Figure 1. Note that with increasing N, P N ( ~ ) progressively narrows and assumes the characteristic bell shape of a Gaussian distribution-a nice numerical illustration of the all-important central limit theorem of statistics. As T a,P N ( ~ takes ) the simpler form, p N ( t )a e-rNi-Nt - 1 (13)

-

Using this distribution function, one can readily verify the textbook results,14( 7 )= T and 4 7 = N-l/*. The first of these shows that in this limit T is an unbiased estimator. (It also is equivalent to noting that as T a,the second term on the right-hand side of eq 5 vanishes, giving t = l / r = 7 . ) On the other hand, r is seen to be a biased estimator for infinite T, with ( f ) = I"/(N- 1). Further, fi is a less efficient estimator than ;in this limit, having up/(f) = (N - 2)-1/*. For finite T we have evaluated these expectation values by numerical quadrature using the trapezoidal rule. In these integrations the root of eq 5 is obtained for each t using Newton's method. For I? the computations converge quickly as the integration step size is decreased, so that for most purposes a few hundred grid points syffice. It is interesting to note that for N = 1,all moments (P)diverge. However, the -ZN-l behavior of PN(?)near t = 0 can cancel the t-mpole in f" at t = 0 (and similarly at 5 = T),rende$ng convergent all moments ( f m having ) m f N - 1. Thus (I-) is defined for N 2 2, and the variance in r is defined for N L 3. Results of these calculations for several choices of N are shown as a function of the reduced time, T I'T in Figure 2. Similar procedures were used to estimate ( i )except , that the region near = T/2 was omitted as a defined gap. Convergent results were obtained by extrapolating to zero gap width. Of course this procedure cannot work for the even moments of 7 , which clearly diverge due to the pole a t 2 = T/2. Results for ( 7 ) are illustrated in Figure 3. The behavior of ( f ) in Figure 2 manifests the anticipated bias at large 7and shows a somewhat reduced bias at smaller

-

(16)Cramer, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, 1951. (17)Springer, M.D. The Algebra of Random Variables; Wiley: New York, 1979. The expression given on p 88 contains some errors; it should read as eq 11.

i

10 ........... ......................................

... ( - 1 ) ' y l)(u .(r .mN-l (11)

-1

LO/ 2o 0

0

4

12

16

7

Flgwa 2. Relative decay rate as a function of the reduced time T = r T , as obtained from eq 10 (rn = 1) by numerical quadrature for various N.

i7e;-A ....... ........ --.... -;/--. ...........

..........

.*.*..,.

1.0

;

..

I



-

,

! .

.

.--

.

(.

I

-

7

-

0.6

-

0.2 I

I

I

I

Flgura 3. Relative lifetime as a function of the reducedtime, illustrated for several values of N.

7.For N > 10, the variation with 7 is weak enough that simple correction by the infinite time bias factor ( N - 1)/N would appear to suffice in most applications. In contrast the low-'T behavior of (;) in Figure 3 is something of a surprise, especially the approach to zero at 7 = 0. Of course this region is not generally of practical interest, but it is interesting to note that substantial and strongly varying bias is observed in the potentially more important region 15 T 5 2, which happens to have been a region probed in our initial Monte Carlo calculations. The region of strong bias at low T is pushed to progressively smaller 7 with increasing number of total counts, so for data sets containing many more than 20 counts, this bias is not likely to be a problem. The surprising behavior of (;) as 7 0 as well as that of ( f ) can be accounted for formally as follows. With increasing N the distribution PN(T)becomes increasingly Gaussian in form, withvariance P/(l!W,andas T+O,itscentroidmoves ) approximately to i = T/2. More specifically, P N ( ~becomes

-

P N ( t )a exp(-r"'t) exp[-(t - T/2)2/(2u:)] (14) Using this form for P N ( ~and ) representing f by the linear form suggested by eq 9, and its reciprocal for ,; we find that the expectation value for the latter goes roughly as rNTz/12 near T = 0. This agrees with the numerical calculations within

ANALYTICAL CHEMISTRY, VOL. 85, NO. 9, MAY 1, 1093 1.4 1.3

1249

1.6

I

N-5 1.2

1.2

op(rel)