Anal. Chem. 1993, 65, 1277-1280


Effect of Background on the Least-Squares Estimation of Exponential Decay Parameters

Joel Tellinghuisen
Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235

The effect of background on the statistical precision of exponential decay rates is examined using Monte Carlo calculations and direct evaluation of the least-squares error matrix. For typical observation time ranges, the presence of an unknown background reduces the precision in the decay rate by an order of magnitude. However, the uniform background is practically the worst case, and if the background distribution is "shaped" and if this shape can be ascertained independently of the decay data, the loss in precision can be greatly reduced. For the uniform background, the strong correlation between the decay parameter and the background parameter can be broken by operating at less than 100% duty cycle, such that the background is recorded independently of the signal some of the time. This approach can yield precision approaching that for the case of an a priori known background.

INTRODUCTION

Optical fluorescence has long been recognized as a very sensitive technique for monitoring atoms and molecules in low concentrations.1 With the advent of the laser it became at least conceptually possible to excite and detect fluorescence from individual molecules. That quest was fulfilled 15 years ago for gas-phase atoms2,3 (and about the same time for detection by resonance photoionization4). However, a number of technical difficulties postponed the demonstration of single-molecule detection in condensed phase for about a decade.5-9 And only within the last year has it been possible to record temporal data at the single-molecule level in solution, permitting the estimation of decay lifetimes for individual fluorescing dye molecules.10

Such sensitive detection methods hold great promise in a variety of analytical techniques, such as fluoroimmunoassay, capillary zone electrophoresis, flow cytometry, DNA fingerprinting, and DNA sequencing.11 For the last of these in particular, it is of interest to ascertain the best scheme for discriminating reliably among the several fluorescent fragments expected in the analyte stream. To that end, Tellinghuisen and Wilkerson12 examined the statistical properties of exponential decay for the kind of sparse data sets expected for the detection of single molecules. Among other results, that work revealed that (1) significant bias can occur in the maximum-likelihood (ML) and least-squares (LS) estimators for both the exponential decay lifetime τ and its reciprocal Γ, (2) the variance for τ is formally divergent for finite observation time, and therefore (3) Γ is the statistically preferred parameter for data sets containing a small number of counts.

The work in ref 12 considered only background-free exponentially distributed data, which could be characterized in terms of a single unknown parameter. In the present work I examine the effect of background on the precision for determining the exponential decay parameter. When the magnitude of the background is unknown, this situation requires at least two adjustable parameters. While it is possible to obtain these parameters directly by the maximum-likelihood method that was emphasized in ref 12, it has proven to be computationally more practicable to use the method of least squares. This means that the results are strictly valid only if the minimal number of counts per time bin is sufficient to ensure that the Poisson weighting becomes essentially Gaussian; then least squares becomes effectively a maximum-likelihood method, hence statistically equivalent to the direct ML approach.13,14 Since this requires ~50 counts per bin, the present findings cannot be rigorously valid in the very small count regime emphasized in the earlier work.

(1) Mitchell, A. C. G.; Zemansky, M. W. Resonance Radiation and Excited Atoms; Cambridge University Press: Cambridge, U.K., 1971.
(2) Greenless, G. W.; Clark, D. W.; Kaufman, S. L.; Lewis, D. A.; Tonn, J. F.; Broadhurst, J. H. Opt. Commun. 1977, 23, 236.
(3) Pan, C. L.; Prodan, J. V.; Fairbank, W. M., Jr.; She, C. Y. Opt. Lett. 1980, 5, 459.
(4) Hurst, G. S.; Nayfeh, M. H.; Young, J. P. Appl. Phys. Lett. 1977, 30, 229.
(5) Lange, R.; Grill, W.; Martienssen, W. Europhys. Lett. 1988, 6, 499.
(6) Moerner, W. E.; Kador, L. Phys. Rev. Lett. 1989, 62, 2535.
(7) Shera, E. B.; Seitzinger, N. K.; Davis, L. M.; Keller, R. A.; Soper, S. A. Chem. Phys. Lett. 1990, 174, 553.
(8) Soper, S. A.; Shera, E. B.; Martin, J. C.; Jett, J. H.; Hahn, J. H.; Nutter, H. L.; Keller, R. A. Anal. Chem. 1991, 63, 432.
(9) Ambrose, W. P.; Moerner, W. E. Nature 1991, 349, 225.
(10) Wilkerson, C. W., Jr.; Goodwin, P. M.; Ambrose, W. P.; Martin, J. C.; Keller, R. A. Detection and Lifetime Measurement of Single Molecules in Flowing Sample Streams by Laser-Induced Fluorescence. Appl. Phys. Lett., in press.
Nonetheless, the following main results should hold at least qualitatively for small N as well as for large: (1) For a fixed number of signal counts, an unknown uniform background reduces the precision with which Γ can be determined by a factor of ~2-20, depending on background level and observation range. (2) The uniform background is about the most pathological case; structured backgrounds (with shape known) introduce regions which are relatively free of background counts and thus improve the precision in Γ. (3) As an offshoot of (2), for typical observation time ranges and background levels, Γ can generally be determined more precisely in a 50% duty cycle mode with separate recording of background alone than in a 100% duty cycle mode.

LEAST-SQUARES RELATIONS

The problem of fitting data to an exponential plus a background is fundamentally a nonlinear one, and I treat it

(11) Davis, L. M.; Fairfield, F. R.; Harger, C. A.; Jett, J. H.; Hahn, J. H.; Keller, R. A.; Krakowski, L. A.; Marrone, B. L.; Martin, J. C.; Nutter, H. L.; Ratliff, R. L.; Shera, E. B.; Simpson, D. J.; Soper, S. A. Genet. Anal. 1991, 8, 1.
(12) Tellinghuisen, J.; Wilkerson, C. W., Jr. Anal. Chem., companion paper in this issue.
(13) Hall, P.; Selinger, B. J. Phys. Chem. 1981, 85, 2941.
(14) Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: New York, 1969.

© 1993 American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 65, NO. 9, MAY 1, 1993

with standard methods.15-17 In all nonlinear LS methods one starts with initial guesses for the adjustable parameters and then revises these in some systematic approach designed to minimize χ². In the Monte Carlo calculations reported here, I have used only the parabolic expansion approach,14 also known as the inverse-Hessian method.17 In fact, since the goal is the assessment of the relative precision in the estimates of Γ, one needs only the LS error matrix, as is shown below. The reduced χ² can be represented in terms of the observed (n_i) and calculated (c_i) counts in the ith bin as

χ²_ν = [1/(k - p)] Σ_i w_i (n_i - c_i)²   (1)

where the sum runs over the k bins and χ²_ν is minimized with respect to p adjustable parameters. The calculated bin count is a function of these parameters and of the independent variable(s) x_i, c_i = F(β; x_i), where the parameters are now represented as the p elements of the vector β. With the assumption of Poisson statistics, the variance in the bin count n_i is equal to n_i, giving w_i = 1/n_i. (In ref 12 the weights were assessed on the calculated curve, i.e., w_i = 1/c_i, to handle the problem of empty bins for sparse data sets. In that case the weights become a part of the adjustment process, requiring equations differing from those given below. However, the two approaches become equivalent in the large-count limit, which is here assumed in order to justify use of eq 1.) Near the minimum in χ², the fit function F can be approximated as a linear function of the adjustable parameters and can be iteratively corrected,

β_1 = β_0 - Δβ   (2)

with Δβ obtained as the solution of the equations

X^T W X Δβ = X^T W δ   (3)

Here δ is a column vector containing the k residuals (c_i - n_i), W is a k × k diagonal matrix, with W_ii = w_i, and X is k × p, with elements X_ij = ∂F(x_i)/∂β_j. These partial derivatives are evaluated at x_i using the current values β_0 of the parameters. When convergence is achieved, the variances in the parameters are proportional to the respective diagonal elements of the matrix A⁻¹ = (X^T W X)⁻¹, which may be obtained in the course of solving eqs 3. Frequently the weights are known in only a relative sense, in which case the quantity we have called χ²_ν serves as a scale factor in the variance-covariance matrix, V = χ²_ν A⁻¹. The corresponding standard errors σ_i = (V_ii)^(1/2) are called external estimates.15 When the weights are known reliably on theoretical grounds, as here, the internal estimates σ_i = ((A⁻¹)_ii)^(1/2) are appropriate. (This is equivalent to noting that the expectation value of χ²_ν is unity.)

To express the fit function, I consider the predicted bin count c_i as a sum of signal and background contributions. The signal counts are distributed on the interval 0 ≤ t ≤ T as P(t) ∝ e^(-Γt). Then if the interval is divided into k time bins, the probability of a signal count in the ith bin (1 ≤ i ≤ k) is

P_i = ∫[t_(i-1), t_i] e^(-Γt) dt / ∫[0, T] e^(-Γt) dt   (4)

where t_0 = 0. The bins are taken to be of equal width w = T/k, whence

P_i = e^(-iΓw)(e^(Γw) - 1)/(1 - e^(-ΓT))   (5)
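The iteration of eqs 2 and 3 can be sketched as a short numerical routine. The following is an illustration in Python/NumPy; the function names are mine, not part of the original analysis:

```python
import numpy as np

def gauss_newton(F, jac, beta0, n, w, tol=1e-8, max_iter=50):
    """Weighted Gauss-Newton (inverse-Hessian) iteration, eqs 2 and 3.

    F(beta)   -> vector of calculated bin counts c_i
    jac(beta) -> k x p matrix X, with X_ij = dF(x_i)/dbeta_j
    n, w      -> observed counts and weights (w_i = 1/n_i for Poisson data)
    Returns the converged beta and the error matrix A^-1 = (X^T W X)^-1.
    """
    beta = np.asarray(beta0, dtype=float)
    W = np.diag(w)
    for _ in range(max_iter):
        c = F(beta)
        X = jac(beta)
        A = X.T @ W @ X
        delta = c - n                     # residual vector, delta_i = c_i - n_i
        dbeta = np.linalg.solve(A, X.T @ W @ delta)
        beta = beta - dbeta               # eq 2
        if np.all(np.abs(dbeta) < tol):   # converged
            break
    return beta, np.linalg.inv(A)
```

For exact (noiseless) data the routine recovers the generating parameters in a few iterations; the diagonal elements of A⁻¹ then supply the internal variance estimates discussed above.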

Background counts are assumed to be distributed uniformly

(15) Deming, W. E. Statistical Adjustment of Data; Dover: New York, 1964.
(16) Shoemaker, D. P.; Garland, C. W.; Nibler, J. W. Experiments in Physical Chemistry, 5th ed.; McGraw-Hill: New York, 1989.
(17) Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. Numerical Recipes; Cambridge University Press: Cambridge, U.K., 1986.

over the possibly more limited time range T_B, so that the probability of a background count falling within a particular bin is w/T_B within the region where the background is defined to occur and zero otherwise. Thus, if the number of signal counts is N_S and the number of background counts is N_B, the calculated signal is

c_i = N_S P_i   (6)

for background-free time regions, and

c_i = N_S P_i + N_B w/T_B   (7)

where the background occurs. Representing the total counts as N = N_S + N_B, one can rewrite eqs 6 and 7 in terms of the fraction f_B of counts due to the background, e.g.,

c_i = N(1 - f_B)P_i + N f_B w/T_B   (8)

All three of the parameters in eq 8 (N, f_B, and Γ) may be treated as adjustable. However, since the total counts will always be directly available in any experiment, it is appropriate to treat N as known, leaving only two adjustable parameters. (Calculations show that the three-parameter and two-parameter approaches yield identical results for f_B and Γ for a given data set, but slightly differing χ²_ν values, hence differing external error estimates. This is because the fitted estimates of N in the three-parameter fits are always systematically slightly lower than the true N, as has been shown by Bevington.14) I also consider cases where the background parameter is known a priori.

One can examine the dependence of the relative standard error on N, f_B, T, and T_B in straightforward fashion through Monte Carlo calculations, as was done in ref 12. Although I have done some such calculations, most of the results presented below were obtained by simply evaluating the A⁻¹ matrix under "theoretical" conditions, i.e., using specified values of the parameters where needed to assess the partial derivatives in the X matrix and taking the weights as 1/c_i. This approach, which I will call the direct method, must agree with the Monte Carlo results in the large-N limit and represents a considerable computational savings over the procedure of averaging over multiple Monte Carlo data sets. (Since individual Monte Carlo calculations settle on diverse, statistically distributed values for the parameters, even the internal error estimates fluctuate about these theoretical values.) It is worth noting that the dependence on N need not be investigated, since for fixed values of the other parameters all standard errors reflect the expected 1/N^(1/2) dependence.
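Eqs 5 and 8 translate directly into code. The following Python/NumPy sketch (function names illustrative) computes the bin probabilities and the expected counts; it assumes for simplicity that T_B is a whole multiple of the bin width w:

```python
import numpy as np

def bin_probs(gamma, T, k):
    # eq 5: P_i = e^(-i*Gamma*w) (e^(Gamma*w) - 1) / (1 - e^(-Gamma*T)), w = T/k
    w = T / k
    i = np.arange(1, k + 1)
    return np.exp(-i * gamma * w) * (np.exp(gamma * w) - 1.0) / (1.0 - np.exp(-gamma * T))

def model_counts(N, fB, gamma, T, TB, k):
    # eq 8, generalized so the N*fB background counts are spread uniformly
    # over only those bins lying within t <= T_B (zero background elsewhere)
    w = T / k
    t_upper = np.arange(1, k + 1) * w          # upper edge of each bin
    bg = np.where(t_upper <= TB + 1e-12, N * fB * w / TB, 0.0)
    return N * (1.0 - fB) * bin_probs(gamma, T, k) + bg
```

As a check, the P_i sum to unity and the modeled counts sum to N, whether or not the background is confined to a sub-range.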

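The direct method described above can also be illustrated numerically: evaluate the X matrix at the true parameter values, take w_i = 1/c_i, and read the variance in Γ from (X^T W X)⁻¹. A Python/NumPy sketch follows; note the assumptions: the total count N (rather than N_S) is held fixed, T_B = T, the bin number k = 100 is an arbitrary choice, and the derivatives are taken numerically, so the output is qualitative rather than a reproduction of the paper's figures:

```python
import numpy as np

def rel_error_gamma(N, fB, gamma, tau, k=100):
    """Direct-method relative standard error of Gamma, parameters (Gamma, f_B)."""
    T = tau / gamma                # observation window, held fixed
    w = T / k
    i = np.arange(1, k + 1)

    def counts(g, f):
        # eq 8 with T_B = T: c_i = N(1 - f) P_i + N f / k
        P = np.exp(-i * g * w) * (np.exp(g * w) - 1.0) / (1.0 - np.exp(-g * T))
        return N * (1.0 - f) * P + N * f / k

    c0 = counts(gamma, fB)
    # numerical partial derivatives -> X (k x 2), columns for Gamma and f_B
    dg, df = 1e-6 * gamma, 1e-7
    X = np.column_stack([
        (counts(gamma + dg, fB) - counts(gamma - dg, fB)) / (2 * dg),
        (counts(gamma, fB + df) - counts(gamma, fB - df)) / (2 * df),
    ])
    A = X.T @ (X / c0[:, None])    # X^T W X with W = diag(1/c_i)
    return np.sqrt(np.linalg.inv(A)[0, 0]) / gamma
```

The routine reproduces the two behaviors used repeatedly below: the error grows with the background fraction, and for fixed f_B and τ it scales exactly as 1/N^(1/2).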
RESULTS AND DISCUSSION

Figure 1 illustrates the dependence of the relative error in Γ on background for a fixed number of signal counts and several reduced observation time ranges, τ = ΓT. Similar information is presented differently in Figure 2. In both cases it is clear that an increasing background reduces the precision with which Γ can be estimated, as might be expected. If the background is known a priori, we obtain the results shown in Figure 3. (For N_B ≠ 0 there are slight differences in the results obtained for such a "frozen" background, depending on how the background is represented in the fit function. The results in Figure 3 are the most optimistic, obtained by fixing f_B in eq 8.) Comparison of Figures 1 and 3 shows the significant loss of precision occasioned by the need to simultaneously determine a decay rate and a uniform background. Not surprisingly, the precision loss is greatest for the shortest observation time range. It is worth noting that at f_B = 0, the "known" background case of Figure 3 becomes equivalent to the background-free case treated in refs 12 and 13; even at the longest observation time range shown (τ = 4) the

corresponding error remains ~20% greater than the rule-of-thumb 1/N^(1/2), which of course holds only in the limit τ → ∞.

Figure 1. Relative standard error in Γ (%) as a function of total background counts for 1000 signal counts. Both Γ and f_B are treated as unknown, hence adjustable. Results are shown for both Monte Carlo calculations (points) and the direct calculation (curves), for several reduced observation time ranges τ. (Note scale change for τ = 1.) The uniform background extends over the entire observation range, so that T_B = T. Each Monte Carlo point represents an average of 200-300 data sets.

Figure 2. Signal counts needed to achieve 10% relative standard error in Γ, as a function of background counts for various reduced time ranges τ. Results are from the direct method. Other information as in Figure 1.

Figure 3. Relative error in Γ for N_S = 1000 as a function of total background, with the latter assumed to be known, hence frozen in the fit.

Figure 4. Dependence of the relative standard error in Γ on the extent of the background region, where the uniform background is confined to the time range t ≤ T_B (see inset). Results are illustrated for N_S = N_B = 1000, τ = 2, and T_B in the same reduced time units as τ. Points represent Monte Carlo results (100-200 data sets); curve represents direct results.

The results in Figures 1-3 all pertain to the case where the uniform background extends over the full observation range, or T_B = T. Figure 4 shows what happens when the background is confined to a limited portion of the observation region. As the background range is progressively compressed (T_B → 0), the results converge on those for τ = 2 and N_B = 0 in Figure 3. (These results also agree with those computed from eq 11 in ref 13, which dealt with only the background-free case.) At


Figure 5. Relative error in Γ for a "spike" of width 0.2, as a function of the position of its leading edge (dashed curve), and for a uniform background confined to the region t_1 ≤ t ≤ T, with t_1 = T - T_B (see inset). Other quantities as in Figure 4.

the other extreme, the results converge on those for τ = 2 and N_B = 1000 in Figure 1. Remarkably, the error in Γ is significantly below that for T_B = T for almost the entire range of T_B, suggesting that even a small background-free region is sufficient to break the strong correlation between Γ and f_B in the fits.

Figure 5 illustrates two variations on the theme in Figure 4, in which background is confined to a movable "spike" of width Δt = 0.2 or is truncated at the small-t end of the observation range instead of the large-t end. The presence of a narrow background spike reduces the precision only nominally from the background-free results, with only weak dependence on where the spike is located, provided, of course, that its location is known and that only its magnitude must be extracted from the fit. The other results show the expected limiting behavior at t_1 = 0 and T, but display a surprising increase in the error in Γ when only a small region at small t is free of background.

Background contributions seldom occur in the manner assumed for the purpose of the calculations behind the results in Figures 4 and 5. On the other hand, these results suggest a variant which is often experimentally feasible, namely, the possibility of recording data and blanks in alternating fashion, such that signal plus background alternates with background alone. It is of interest to know whether one is better off recording signal plus background at all times (100% duty cycle) or intentionally discarding the signal some of the time in order to record the background alone (
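The duty-cycle comparison can be sketched with the same direct method. The following illustrative model is my own construction, not the paper's exact computation: it holds the total acquisition time fixed, assumes a time-independent background rate, and splits the counts between a signal-plus-background channel and a background-only channel according to the duty cycle d (so at d = 0.5 only half the signal counts are collected, but the background is also measured independently):

```python
import numpy as np

def rel_err_duty(duty, N, fB, gamma, tau, k=100):
    """Direct-method sigma_Gamma/Gamma at duty cycle `duty` (0 < duty <= 1).

    N: total counts that a 100% duty cycle run would collect, with background
    fraction fB; a fraction `duty` of the time records signal+background over
    [0, T], the remainder records background alone in a parallel set of bins.
    """
    T = tau / gamma
    w = T / k
    i = np.arange(1, k + 1)

    def counts(g, f):
        P = np.exp(-i * g * w) * (np.exp(g * w) - 1.0) / (1.0 - np.exp(-g * T))
        sig = duty * N * ((1.0 - f) * P + f / k)     # signal+background channel
        bg = np.full(k, (1.0 - duty) * N * f / k)    # background-only channel
        return np.concatenate([sig, bg])

    c0 = counts(gamma, fB)
    keep = c0 > 0                  # at duty = 1 the background channel is empty
    dg, df = 1e-6 * gamma, 1e-7
    X = np.column_stack([
        (counts(gamma + dg, fB) - counts(gamma - dg, fB)) / (2 * dg),
        (counts(gamma, fB + df) - counts(gamma, fB - df)) / (2 * df),
    ])
    A = X[keep].T @ (X[keep] / c0[keep][:, None])
    return np.sqrt(np.linalg.inv(A)[0, 0]) / gamma
```

Under these assumptions, for a typical case such as τ = 2 and f_B = 0.5, the 50% duty cycle run determines Γ more precisely than the 100% duty cycle run despite collecting only half the signal counts, in line with main result (3) above.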