Anal. Chem. 2003, 75, 4182-4187

Accurate Analysis of Fluorescence Decays from Single Molecules in Photon Counting Experiments

David A. Turton, Gavin D. Reid, and Godfrey S. Beddard*

Department of Chemistry, University of Leeds, Leeds LS2 9JT, U.K.

In time-resolved, single-photon counting experiments, the standard method of nonlinear least-squares curve fitting incorrectly estimates the fluorescence lifetimes. Even for single-exponential data, errors may be up to ±15% and, for more complex fits, may be even higher, although the fitted line appears to describe the data. The origin of this error is not a result of the Poisson distribution, as is often stated, but is entirely due to the weighting of the fit. An alternative weighting method involving a minor change in the fitting method eliminates this problem, enabling accurate fitting even in difficult cases, including the small data sets observed in single-molecule experiments, and with a precision similar to that of maximum likelihood methods.

Time-correlated single-photon counting, SPC, has been a well-established technique for measuring decay times of excited molecules for more than 30 years.1-3 More recently, its use has been extended to lifetime imaging (for a recent review, see Elson et al.4) and into single-molecule detection,5-8 which is gaining importance in analytical chemistry. The analysis of SPC decay data has also been the subject of much discussion, and various methods have been used: weighted least squares9 (LS), maximum likelihood9,10 (ML), and entropy methods;11 each method may be combined with global analysis.12 It is commonly stated that for counting data, the LS method is not valid and cannot be used to analyze data with a small number of counts because of the Poissonian rather than Gaussian distribution of the data.3,8

* Author for correspondence. E-mail: [email protected].
(1) Birks, J. Photophysics of Aromatic Molecules; Wiley-Interscience: New York, 1970.
(2) O'Connor, D.; Phillips, D. Time-Correlated Single Photon Counting; Academic Press: London, 1984.
(3) Lakowicz, J. R. Principles of Fluorescence Spectroscopy, 2nd ed.; Kluwer Academic/Plenum: Norwell, MA, 1999.
(4) Elson, D.; Webb, S.; Siegel, J.; Suhling, K.; Davis, D.; Lever, J.; Phillips, D.; Wallace, A.; French, P. Opt. Photonics News 2002, 13, 26-32.
(5) Enderlein, J.; Kollner, M. Bioimaging 1998, 6, 3-13.
(6) Ambrose, W. P.; Goodwin, P. M.; Jett, J. H.; Van Orden, A.; Werner, J. H.; Keller, R. A. Chem. Rev. 1999, 99, 2929-2956.
(7) Kuhnemuth, R.; Seidel, C. A. M. Single Mol. 2001, 2, 251-254.
(8) Maus, M.; Cotlet, M.; Hofkens, J.; Gensch, T.; De Schryver, F.; Schaffer, J.; Seidel, C. Anal. Chem. 2001, 73, 2078-2086.
(9) Bevington, P.; Robinson, D. K. Data Reduction and Error Analysis for the Physical Sciences, 2nd ed.; McGraw-Hill: New York, 1992.
(10) Hauschild, T.; Jentschel, M. Nucl. Instrum. Methods Phys. Res., Sect. A 2001, 457, 384-401.
(11) Livesey, A.; Brochon, J. Biophys. J. 1987, 52, 693-706.
(12) Beechem, J.; Gratton, E.; Ameloot, M.; Knutson, J.; Brand, L. In Topics in Fluorescence Spectroscopy; Lakowicz, J. R., Ed.; Plenum: New York, 1992.


Recently, it was shown8 that for single-molecule data, for which only hundreds to a few thousand counts are collected in total, the ML method is far superior, even though LS is itself an ML estimator.9 Because the fluorescence lifetime can be the only way to distinguish individual molecules whose emission spectra overlap, accurate determination of emission lifetimes under single-molecule conditions is vital. Even in SPC experiments on bulk samples, it is recommended that restrictions be placed on the analysis: for example, truncating the decay to channels containing, say, 20 or more counts; rebinning the data to increase the count in each channel; or comparing fitted decays after repeated calculations, each progressing a little further down the decay profile. None of these approaches is satisfactory, and each leads to inaccurate parameters. Despite all this, LS remains the most popular method of curve fitting.

In this paper, we show that the problems encountered in LS fitting are not due to the Poisson distribution per se but are entirely due to the weighting used. A simple modification of the LS fitting method removes these problems without the need to rewrite computer codes and allows LS methods to give more precise and unbiased estimates of fluorescence decay parameters. We also discuss why the standard "goodness of fit" test is unreliable.

THEORY

When analyzing emission decay data obtained from bulk solutions by SPC, the emission intensity can vary over 4 orders of magnitude. Conversely, in single-molecule experiments, there are perhaps only a few hundred counts in total, with zero counts in many channels. In both cases, weighting the data is a vital part of analyzing the decay. In fitting a model to data by the LS method, the minimum in the χ² statistic is sought; minimizing χ² is the same as maximizing the likelihood that the estimated data describe the experimental data. The χ² is defined as9

χ² = ∑i (ci - mi)² wi      (1)

where c is the set of experimental data, m is the set of estimated data, and w is the set of weights, each distributed over i data points. The ideal weighting is wi = 1/σi², where σi is the standard deviation of ci; for Poisson-distributed data, this is equal to the square root of the expectation value. An approximation9,13 to σ² is the data itself, which gives the Neyman χ²:

χN² = ∑i (ci - mi)²/ci      (2)

This is the standard definition of χ², but it is undefined if ci = 0, and in this case, one of two variations is used: either zero-valued data are set equal to unity10 (eq 3)

χN² = ∑i (ci - mi)²/max(ci, 1)      (3)

or a constant is added to every point (eq 4):

χN1² = ∑i (ci - mi)²/(ci + 1)      (4)
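For concreteness, the two usable Neyman variants translate directly into code. The following is a minimal sketch assuming NumPy arrays of counts c and model values m; it illustrates eqs 3 and 4 and is not the authors' implementation.

```python
import numpy as np

def chi2_neyman(c, m):
    # eq 3: weight by the data, with zero-count channels treated as one count
    return np.sum((c - m) ** 2 / np.maximum(c, 1))

def chi2_neyman1(c, m):
    # eq 4: a constant of one added to every channel's weight
    return np.sum((c - m) ** 2 / (c + 1.0))
```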

An alternative approximation for σ² is the estimated data, m, producing Pearson's χ²10,13

χP² = ∑i (ci - mi)²/mi      (5)

which has rarely been used. Each of these definitions of χ² will bias the fit. For the Neyman versions, the natural distribution of counts in the data produces values that are either greater or smaller than the true value. Weighting by the data increases the contribution to χ² of all data values smaller than the true ones and diminishes the contribution of all data greater, thereby biasing the fitted curve toward lower values. Conversely, the Pearson estimator biases the fit toward higher values, because higher values of mi reduce χ². In both cases, LS fitting finds a minimum in χ², but because of the systematic bias, this minimum does not correspond to the true parameters. These sources of bias are present for all distributions of the noise. Therefore, inaccuracy will be introduced whenever Pearson or Neyman weighting is used, even when the noise in the data is Gaussian-distributed, as can easily be confirmed by modeling.

We now show that by performing a preliminary fit to the data and then using the resultant fitted curve as the weighting for a final fit, the sources of bias are removed, and the estimated parameters, for example, amplitudes and lifetimes, are obtained with good accuracy and precision. The criterion function of the method can be defined as

χF² = ∑i (ci - mi)²/fi      (6)

where fi is the fitted curve returned from the preliminary fit. Only in the more difficult of the two-exponential fitting cases described below was a second iteration found to be necessary, that is, three calls to the LS routine. We find that the weighting used in the initial fitting has only a slight effect upon the quality of the second fitting, but to be cautious, we have used a constant weighting wi = 1 so as to eliminate any possible bias. For simplicity, we used the preliminary fitted parameters as initial guesses for the next fit. We shall call this procedure the fitted weighting, or FW, method.
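As a concrete illustration, the FW procedure for a single-exponential model can be sketched as follows. This assumes NumPy/SciPy; the function names are ours, and the original work used Levenberg-Marquardt routines in IGOR, IDL, and Fortran rather than this code.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, A, tau):
    # single-exponential decay: eq 10 with j = 1
    return A * np.exp(-t / tau)

def fit_fw(t, counts, p0):
    # preliminary least-squares fit with constant weighting w_i = 1
    p_prelim, _ = curve_fit(model, t, counts, p0=p0)
    # final fit weighted by the preliminary fitted curve f_i (eq 6);
    # curve_fit accepts standard deviations, so sigma = sqrt(f) gives w_i = 1/f_i
    f = model(t, *p_prelim)
    p_final, cov = curve_fit(model, t, counts, p0=p_prelim,
                             sigma=np.sqrt(f), absolute_sigma=True)
    return p_final, cov
```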

We also compared our results with the ML method.9,14 In this method, the likelihood L is given by the combined probability L = ∏i P(ci, mi), where P(ci, mi) is the chance of measuring ci counts when mi counts are estimated in channel i. The logarithm of the estimator L is the most convenient form to use, and maximizing ln(L) is the same as minimizing -2 ln(L). For Poisson-distributed data, where P(c, m) = m^c e^-m/c!, the function to minimize is

-2 ln(L) = 2 ∑i [mi - ci ln(mi) + ln(ci!)]
         = 2 ∑i [mi - ci ln(mi)] + const      (7)
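A matching sketch of the ML fit minimizes eq 7 directly, using the Nelder-Mead simplex as described under Modeling below; again, the single-exponential model and the names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg2_log_likelihood(params, t, counts):
    A, tau = params
    m = np.maximum(A * np.exp(-t / tau), 1e-12)  # guard against log(0)
    # eq 7 with the parameter-independent ln(c_i!) term dropped
    return 2.0 * np.sum(m - counts * np.log(m))

def fit_ml(t, counts, p0):
    result = minimize(neg2_log_likelihood, p0, args=(t, counts),
                      method="Nelder-Mead")
    return result.x
```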

In attempting to extract the estimated parameters, a goodness-of-fit test must be performed and must give an unbiased estimate of the parameters. The likelihood ratio test λ*, first introduced by Neyman and Pearson,13 can be used for this purpose. A maximum-likelihood χ² is defined as

χmle² = -2 ln(λ*) = 2 ln[L(c, M)/L(c, m)]      (8)

where L(c, m) gives the likelihood that c counts are observed if the model m were true, and L(c, M) the likelihood that c counts occurred, maximized without any restriction on the model, and therefore parameter-independent. For the Poisson distribution, it can be shown that the maximum-likelihood χmle² is

χmle² = 2 ∑i (mi - ci) - 2 ∑i,ci≠0 ci ln(mi/ci)      (9)

MODELING

Because the issues of concern here are entirely with the distribution of the data and its weighting, they can be argued without recourse to experimental data, because we know that the experiment produces Poisson-distributed data. Simulation has the advantages of eliminating experimental artifacts, of repeatability, and of absolute measurement of accuracy and precision. Convolution with the instrumental response can be included if required without changing any of our conclusions. Exponential decays were used to describe the emission and were defined as

mi = ∑j Aj exp(-ti/τj)      (10)

where j = 1 or 2 and Aj and τj are the parameters to be estimated. The decays were simulated by repeatedly sampling from a uniform distribution until the required total number of counts was reached, distributed according to the decay lifetimes τ. For example, a single-exponential decay with a total of N photons is modeled by filling each element of an N-element vector with the random variable -γ ln(R), which has an exponential distribution with expectation γ, where γ = T/τ is the unitless lifetime and R is a uniform random number in the range 0-1. A histogram of the vector is formed with the number of channels required to simulate an experiment.

(13) Neyman, J.; Pearson, E. Biometrika 1928, 20A, 263.
(14) Bialkowski, S. E. Anal. Chem. 1989, 61, 2479-2483.

Figure 1. (A) Percentage deviation in measured lifetime, (τcalc - τtrue)/τtrue, vs the total number of counts in a single-exponential decay with 512 data channels and (B) the σ error in lifetimes. See text for details.
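The simulation procedure described above can be reconstructed in a few lines; this sketch (assuming NumPy) discards the small number of photons arriving after the experimental window, a simplification of the histogramming step.

```python
import numpy as np

rng = np.random.default_rng()

def simulate_decay(n_photons, tau, t_max, n_channels):
    # -tau * ln(R) is exponentially distributed with expectation tau
    # when R is uniform in (0, 1]
    r = 1.0 - rng.uniform(size=n_photons)
    arrival = -tau * np.log(r)
    counts, edges = np.histogram(arrival, bins=n_channels, range=(0.0, t_max))
    return counts, edges

# e.g. a 2000-count decay with lifetime 100 distributed over 512 channels
counts, edges = simulate_decay(2000, 100.0, 500.0, 512)
```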

Levenberg-Marquardt algorithms,9 used either as built-in procedures in IGOR (Wavemetrics Inc., Lake Oswego, Oregon) or as described in Numerical Recipes15 and programmed in IDL (Research Systems, Boulder, Colorado) or Fortran 77, gave similar results. The maximum likelihood equations were solved using the Nelder-Mead downhill simplex method.15 The simulation and fitting procedures were repeated at least 2 × 10⁴ times for each parameter set, and the distribution of estimated parameters was analyzed by forming histograms, which were then fitted by the Gaussian function exp(-(∆τ/σ)²/2) to obtain ∆τ and σ.

We first compare the results of fitting single-exponential decays of 100 to 10⁶ total counts distributed over 512 channels for both the Neyman and fitted weighting (FW) least-squares variants and for maximum likelihood. Figure 1 shows the percentage accuracy in the lifetime, 100∆τ/τ, where τ = 100, and the width, σ. Related plots showing the same features are obtained with 256, 1000, or 4000 channels of data. The Neyman methods produce systematic errors throughout that are generally greater than the magnitude of the width, σ, which means that the correct lifetime will almost never be obtained. The FW method gives results similar to ML estimation, with accuracy better than 1% down to below 100 total counts. The fit is still surprisingly good below 100 counts, well within the width σ, but here the different implementations of the Levenberg-Marquardt algorithm gave slightly different results, so they have not been included. It is stressed that for each data set and method used, the fitting procedure converged properly.

Occasionally, decay data include some background counts from noise in the detector and electronics. Because small numbers of background counts are common in photon counting experiments, we have considered decays with 100 and 2000 total counts in the presence of Poisson-distributed noise with a mean count of 1. Furthermore, we can fit data with the background either fixed at 1 or as an additional parameter allowed to vary. When there are 100 total counts, the average decay lifetime using the FW method is not significantly different from results obtained from zero-background data (Figure 1). The ML method produces errors of ~2% with a fixed background and 5% with a variable background, slightly greater than obtained with background-free data. For the larger number of counts (2000 total), the FW method still estimates accurate lifetimes, but the Neyman methods continue to behave very poorly.

The residuals plot is a very informative measure of goodness of fit.2,3 The standard method is to plot the normalized residuals and look for nonlinearity, either by eye or using correlation techniques. However, to judge the fit in the tail of the decay, the residuals are usually normalized by the same weight as was used to weight the fit. But when the data are used to weight both, as in Neyman's χ², the error introduced in the fit is canceled by the same error in calculating the residuals. The inherent bias in the χ² statistic distorts the fitting procedure and produces incorrect results, but this will not be immediately obvious from inspecting the residuals for systematic deviations or from calculating runs tests, heteroscedasticity, autocorrelations, the Durbin-Watson parameter,16 power spectra, etc.

The bias in fitting the decay profile may be understood with a simpler calculation: estimating the mean of the Poisson distribution for different given mean values. At each time point in the decay, the true mean should be obtained by fitting to the model function no matter how many counts are present, which means that any fitting procedure should return an unbiased approximation to the true mean for any number of counts. The result of estimating the mean with the different χ² statistics is shown in Figure 2, where the percentage error is plotted vs the true mean of the Poisson distribution. Only when the FW method is used is a good approximation to the true mean returned for all values.

(15) Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. Numerical Recipes; Cambridge University Press: Cambridge, 1986.

(16) Hines, W.; Montgomery, D. Probability and Statistics in Engineering and Management Science, 3rd ed.; Wiley: New York, 1990.


Figure 2. Percentage error in the calculated mean, 100(calculated - true)/true, vs the true mean for a LS calculation using different χ² statistics. N is Neyman's χ² (eq 3), N1 is eq 4, and F is the fitted weighting (FW) method based upon eq 6. The line for the unweighted fit is indistinguishable from the FW line on this scale.

TWO-EXPONENTIAL DECAYS

The extra number of parameters in multiexponential decays makes an extensive comparison between the various weighting methods unrealistic. We therefore concentrate on two illustrative cases to show that the fitted weighting method also gives superior results to either of the Neyman weightings. Figure 3A shows a contour plot of the error for decays with τ1 = 50 and τ2 = 100 (a typical decay is inset in Figure 3). Each component in the decay has an equal number of counts, so that the preexponential parameters are A1 = T/(2τ1) and A2 = T/(2τ2) for T = 70 000 total counts; notice that there are only 1000 counts in the peak. This is an easy case for fitting because the decay lifetimes are well separated and visibly produce a double-exponential decay. The errors in the preexponential terms are also shown in Figure 3B. Only plots of the FW method and of Neyman's eq 3 are shown; similarly poor fitting is obtained with Neyman's alternative weighting (eq 4). The apparent lack of intensity in the contour plots for Neyman's weighting compared to the other contours is due to the wide scattering of the values obtained.

Finally, we tested the difficult fits discussed initially by Grinvald and Steinberg17 (see also Lakowicz3); the results are shown in Figure 4. This sum of two exponentially decaying terms has lifetimes in the ratio 5:7. On visual inspection, it appears to be a single-exponential decay, because the number of counts deviates by only ~0.1% from exponentiality after the counts have fallen to 1% of their initial value. Fitting routines find it difficult to extract the two decay times accurately, because the preexponential components compensate for changes in the decay times. In our example, τ1 = 71 and τ2 = 100, the two components have equal numbers of counts, and there are 200 000 counts in total. The Neyman estimator completely fails to fit this decay, but the FW method is a great improvement and improves further if the total number of counts is increased, provided that the measurements extend to such a time that more than two decades of intensity are present in the decay. For two-exponential decays, the accuracy of FW fitting is not affected by a background count, although the precision worsens. This is due to the increased number of parameters if the background is allowed to be free, but it is also due to the reduced dynamic range when the background count becomes greater than that of the signal.

ESTIMATING "GOODNESS OF FIT"

After estimating parameters by fitting decay curves, it is important to ask which statistic should be used to estimate the goodness of fit of the data. Because the mean number of counts in each channel varies, this statistic should be independent of the mean. When a statistic is χ²-distributed, the expected value 〈χ²〉 should equal the number of degrees of freedom, 〈χ²〉 = v, for all mean counts. The expectation value for a given number of counts, s, is given by 〈χ²〉s = ∑i=0→∞ (µ^i e^-µ/i!) φi,s, where φi,s is defined by the argument of the summation in eqs 2-5; for example, in Pearson's χ², φi,s = (i - s)²/s, and similarly for the other χ². Figure 5 shows 〈χ²〉s vs s for the Pearson, Neyman, and maximum likelihood functions for one value, µ = 5; when s = µ, the normal χ² value is obtained. The analytic solutions for 〈χ²〉 are given in the Appendix.
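These expectation values are straightforward to evaluate numerically. A sketch (assuming NumPy/SciPy; the truncation at i_max stands in for the infinite sum):

```python
import numpy as np
from scipy.stats import poisson

def expected_chi2(mu, s, phi, i_max=200):
    # <chi^2>_s = sum_i P(i; mu) * phi(i, s), truncated at i_max
    i = np.arange(i_max)
    return np.sum(poisson.pmf(i, mu) * phi(i, s))

pearson = lambda i, s: (i - s) ** 2 / s
neyman = lambda i, s: (i - s) ** 2 / np.maximum(i, 1)  # eq 3 summand

# with the correct model, s = mu, the Pearson form returns 1 for any mu
print(expected_chi2(5.0, 5.0, pearson))  # ~1.0
print(expected_chi2(5.0, 5.0, neyman))   # deviates from 1
```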

Figure 3. Analysis of 20 000 decays with lifetimes τ1 = 50 and τ2 = 100 and preexponential constants A1 = 700 and A2 = 350 counts (eq 10). Each decay contained 70 000 counts. (A) Contour plot of percent error in τ1 vs percent error in τ2 for fitting by the FW method, contours (F), and by Neyman's method, contours (N). The prefitting contours (F) peak at zero error within the ±1% resolution of the contour plot. The dotted contour shows the lowest value, and the contours are drawn at the same constant interval and cover the same range. The inset shows a typical decay with its fitted line. (B) The corresponding percent errors in the preexponential parameters.


Figure 4. Fits obtained for the difficult case of lifetimes in the ratio 5:7 and equal numbers of counts in each component of the decay. (A) Lifetime errors for the FW method, contours (F), and for the Neyman method (eq 3), contours (N). The dotted line shows the lowest-valued contour, and contours are plotted at the same interval. (B) The errors in the preexponential parameters A1 and A2, also plotted as contours.

Figure 5. Calculated average, 〈χ²〉, for the different χ² functions, for a mean of 5.

If the model function used to describe the data is correct, then the number of counts, s, in any channel should equal the mean of the Poisson distribution, that is, s = µ, and in Figure 6, 〈χ²〉 is plotted against the true mean counts, µ, for v degrees of freedom. The Pearson χ² has the constant value 〈χ²〉/v = 1 for all µ. However, as has been pointed out before,10 using χmle² from the ML method as an estimate of the goodness of fit surprisingly leads to incorrect results, because the χmle² statistic is not χ²-distributed: its 〈χ²〉/v is neither unity nor even constant with the mean number of counts. Neither are the Neyman statistics χ²-distributed. Figure 6 shows that any estimation of the goodness of fit based on any χ² other than the Pearson statistic would lead to incorrect conclusions about consistency, because the wrong limiting value is obtained. This means, for example, that the parameters obtained by ML estimation may not be the best that could ideally be obtained.

Figure 6. 〈χ²〉 vs true mean counts. Only the Pearson estimator χP² is χ²-distributed; the other estimators show significant deviations.

The goodness of fit can be estimated by calculating the cumulative probability distribution Q(χ²min, v) of the χ² distribution for the number of degrees of freedom present. The value of Q gives the chance that, by random events, the χ² for the true model will take on a value less than or equal to a specified value, such as χ²min. If Q > 0.1, the model can be accepted; if Q < 0.001, it is likely to be wrong. The error in the parameters can be estimated from the parameter space occupied when χ² increases by 1 from its minimum value.9 This approach would be correct if the true model were being fitted, but this is not necessarily the case; frequently, the purpose of fitting is to determine what the true model is. In the Pearson case, when s = µ, we obtain 〈χ²〉 = 1 both at this value and at µ + 1 (Figure 5); therefore, an experimental 〈χ²〉 = 1 does not necessarily indicate that the true model is being used. Furthermore, because the minimum of χ² occurs not at µ but at √(µ² + µ), poor parameter estimation can lead to 〈χ²〉 < 1. The same principles hold for the Neyman tests, but the errors will be greater. For the MLE χ², the minimum is always at the mean, but the size of the χ² cannot be relied upon to give an estimate of goodness of fit other than to say that the smallest value is best.

(17) Grinvald, A.; Steinberg, I. Z. Anal. Biochem. 1974, 59, 583-598.

These considerations suggest that any 〈χ²〉 goodness-of-fit test is unreliable unless calibrated by simulating the data.

CONCLUSIONS

We have shown that the errors encountered in least-squares fitting of single- and double-exponential fluorescence decay data are entirely due to the weighting method and can be removed by a simple modification of the fitting routine. These results contradict the widespread assumption that least-squares fitting of Poisson-distributed data is invalid, particularly so for a small number of counts; indeed, similar inaccuracies appear when Neyman's weighting is applied to data containing Gaussian-distributed noise. We find that practically all decay data can be accurately fitted by the least-squares method with a precision limited only by the total number of counts and the dynamic range. Goodness-of-fit tests are important in determining whether the correct model has been fitted. We also show that χ² tests are inherently unreliable and that, if the residuals are examined, they must not be weighted by the data.

ACKNOWLEDGMENT

We acknowledge the support of The Royal Society and EPSRC. D.A.T. acknowledges the award of a Messel Scholarship from the Society of Chemical Industry. We thank David Salt (University of Portsmouth) for helpful discussions. G.D.R. is a Royal Society University Research Fellow.

APPENDIX

The expectation value is 〈χ²〉µ = ∑i=0→∞ (µ^i e^-µ/i!) φi,µ, where φ is the argument in the summation of χ² for a mean of µ counts in any data channel. The averages are

〈χP²〉 = 1

〈χN1²〉 = (1 + 1/µ)(e^µ - 1 - µ) e^-µ

and

〈χN²〉 = e^-µ µ² [1 - γ - ln(µ) - Ei(1, -µ) - πi] + µ[2e^-µ - 1]

where Ei(n, x) = ∫1→∞ e^-xt t^-n dt is the exponential integral and γ is Euler's constant. No analytical solution was found for the average χ² of the maximum likelihood estimator, and this was calculated numerically. The Pearson χ² has its minimum value at s = √(µ² + µ), with the value 2√(µ/(µ+1)) [µ + 1 - √(µ² + µ)]. Neyman's χ² (eq 4) has a minimum at s = µ/(1 - e^-µ) - 1, with the value 1 - µ/(e^µ - 1).
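The quoted minima are easy to check numerically; a brief, illustrative sketch for the Pearson case (assuming NumPy/SciPy):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def expected_pearson(s, mu, i_max=200):
    # <chi_P^2> as a function of the fitted value s, for true mean mu
    i = np.arange(i_max)
    return np.sum(poisson.pmf(i, mu) * (i - s) ** 2 / s)

mu = 5.0
res = minimize_scalar(expected_pearson, bounds=(0.1, 20.0), args=(mu,),
                      method="bounded")
print(res.x, np.sqrt(mu**2 + mu))  # numerical vs analytic minimum position
print(res.fun,                     # numerical vs analytic minimum value
      2 * np.sqrt(mu / (mu + 1)) * (mu + 1 - np.sqrt(mu**2 + mu)))
```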

+ 1 - xµ2+µ]. Neyman’s χ2, eq 4, has a minimum at s ) (µ/(1 - e-µ)) - 1 with the value 1 - (µ/(eµ - 1)). Received for review March 31, 2003. Accepted May 16, 2003. AC034325K
