Article pubs.acs.org/ac

Bayesian Approach to the Analysis of Fluorescence Correlation Spectroscopy Data I: Theory

Jun He, Syuan-Ming Guo, and Mark Bathe*

Laboratory for Computational Biology and Biophysics, Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States

Supporting Information

ABSTRACT: Fluorescence correlation spectroscopy (FCS) is a powerful tool to infer the physical process of macromolecules including local concentration, binding, and transport from fluorescence intensity measurements. Interpretation of FCS data relies critically on objective multiple hypothesis testing of competing models for complex physical processes that are typically unknown a priori. Here, we propose an objective Bayesian inference procedure for testing multiple competing models to describe FCS data based on temporal autocorrelation functions. We illustrate its performance on simulated temporal autocorrelation functions for which the physical process, noise, and sampling properties can be controlled completely. The procedure enables the systematic and objective evaluation of an arbitrary number of competing, non-nested physical models for FCS data, appropriately penalizing model complexity according to the Principle of Parsimony to prefer simpler models as the signal-to-noise ratio decreases. In addition to eliminating overfitting of FCS data, the procedure dictates when the interpretation of model parameters is not justified by the signal-to-noise ratio of the underlying sampled data. The proposed approach is completely general in its applicability to transport, binding, or other physical processes, as well as spatially resolved FCS from image correlation spectroscopy, providing an important theoretical foundation for the automated application of FCS to the analysis of biological and other complex samples.

Received: December 22, 2011 Accepted: March 14, 2012 Published: March 14, 2012

dx.doi.org/10.1021/ac2034369 | Anal. Chem. 2012, 84, 3871−3879

© 2012 American Chemical Society

Fluctuations in fluorescence intensity provide quantitative insight into a range of molecular processes including diffusion, active transport, and binding kinetics. Classical fluorescence correlation spectroscopy (FCS) measures fluctuations in fluorescence intensity in a small detection volume created by a pinhole or two-photon excitation to infer molecular properties from governing continuum reaction−diffusion−convection equations. Whereas traditional FCS is performed at a single point with a photomultiplier tube (PMT) or avalanche photodiode (APD),1−5 the relatively recent advent of laser-scanning microscopy and electron multiplying charge-coupled device (EM-CCD) cameras has enabled spatially resolved FCS to be performed in living cells and tissues using confocal microscopy.6−10 These data now provide a rich source of spatiotemporal information upon which to base biophysical models of biomolecular dynamics.

As with all measurement techniques, the use of an underlying mathematical model that is assumed to govern the physical process under study is essential to the interpretation of FCS data. Temporal autocorrelations in fluctuations in fluorescence intensity are conventionally fit by a closed-form analytical solution pertaining to the assumed model, from which corresponding molecular properties are inferred. Although the choice of physical process may be unambiguous for certain cases such as free molecular diffusion or convective transport in homogeneous solutions, the correct choice of model becomes considerably more ambiguous in the application of FCS to complex living systems11,12 or analytical samples of unknown properties.13,14 Indeed, such processes may be governed by a combination of physical processes that is unknown a priori, and inference of molecular properties such as diffusion coefficients, rates of transport, or association−dissociation kinetics from a single assumed model may lead to erroneous results.15 The problem of complexity is made worse by the fact that fluorescence fluctuation measurements obtained from living systems often suffer from low signal-to-noise ratios, photobleaching,16 and limited temporal sampling rate and total measurement time,17 as well as biological heterogeneity and nonstationarity of the process under study.18,19 For these reasons, an objective and unbiased approach to model evaluation in the analysis of FCS data is of interest in order to enable the automated, unsupervised application of the procedure to a broad array of biological applications.

Bayesian inference provides one such framework. Originally developed and applied extensively in the social sciences, Bayesian inference provides an objective and general framework for multiple model selection (or hypothesis testing) and parameter inference in a single step without any special requirements on models such as nesting or linearity, bypassing

the need for complex pairwise statistical tests. Although Bayesian inference is not commonly applied in the biophysical sciences, it has made a significant impact on the analysis and modeling of gene expression as well as genetics data that suffer from similar issues of noise, heterogeneity, and model uncertainty.20−23 Bayesian inference has additionally been applied to single-particle diffusion analysis24 and more recently to the analysis of single-molecule fluorescence resonance energy transfer (FRET) data.25

Traditional analysis of FCS data typically involves the use of maximum likelihood estimation (MLE) to fit one or more models to measured temporal autocorrelation functions (TACF), and reduced χ-squared values to select the best fitting model.26,27 Although useful for obtaining parameter estimates and uncertainties for a given model at the point estimate provided by least-squares fitting, MLE tends to favor complex models that overfit measured data because the reduced χ-squared approach does not penalize model complexity adequately.28,29 Moreover, MLE does not allow for the direct ranking of multiple competing models by their relative probabilities, supporting only pairwise comparisons of nested models, where the simpler model provides the null hypothesis and a standard likelihood ratio test or the F-test is performed.30 These deficiencies render conventional MLE suboptimal for use as a systematic framework for hypothesis testing in the analysis of FCS data. At the level of parameter estimation, Bayesian inference is similar to MLE when uniform priors are chosen for the model parameters. At the level of model selection, however, Bayesian inference assigns explicit model probabilities that are proportional to the data marginal likelihoods by considering the full range of parameter values and their posterior probability distributions rather than only using pointwise ML estimates, thereby penalizing model complexity appropriately and preventing overfitting of experimental data.28,29

In the present work we develop an empirical Bayesian inference procedure for the problem of model selection and parameter estimation of molecular transport properties from FCS data. The proposed procedure is applied here to simulated, theoretical TACFs in order to evaluate rigorously its theoretical properties in a controlled setting in which the sampling and noise properties of the TACF, as well as the underlying physical model, can be controlled independently and precisely. The procedure is applied to intensity fluctuations measured from actual FCS data in accompanying work.31 This first step in the establishment and evaluation of the proposed procedure for analyzing FCS data is in our view essential in order to test its performance rigorously in the absence of confounding effects of uncontrolled noise in the TACF, sample heterogeneity, photobleaching, and nonstationarity of the underlying physical process. Although the approach is applied here to temporal correlations of transport processes, it is equally applicable to time series fluctuation data reporting on physical processes including, for example, chemical reactions, as well as more generally to biophysical processes including single-molecule transport dynamics interrogated using single-particle tracking and spatial−temporal image correlation spectroscopy.7,32

THEORY

Fluorescence Correlation Spectroscopy. Classical FCS measures time-dependent fluctuations of fluorescence intensity collected from a femtoliter illumination volume detected using a PMT or APD.1,2,5,33−35 Fluctuations in fluorescence intensity in steady-state transport processes are due to fluorescent particles entering and leaving the detection volume as a result of distinct physical processes such as free diffusion, anomalous diffusion, and convection due to active, motor-driven transport or fluid flow. Each process results in a distinct functional form of the TACF of the fluorescence intensity:4,5,34

$$ G(\tau) = \frac{\langle \delta F(t)\,\delta F(t+\tau)\rangle}{\langle F(t)\rangle^{2}} \tag{1} $$

where $F(t)$ is the integrated fluorescence intensity in the detection volume at time $t$, $\tau$ denotes lag time, brackets denote the ensemble average, which by ergodicity is also assumed to equal the time average, and $\delta F(t) = F(t) - \langle F(t)\rangle$ is the fluctuation in intensity about its mean. For the case of two-dimensional free diffusion of $N_D$ noninteracting species (“ND model”), the TACF adopts the simple, closed-form solution:4,5,33,36

$$ G_D(\tau) = \sum_{i=1}^{N_D} \alpha_i \left(1 + \frac{\tau}{\tau_{D_i}}\right)^{-1} \tag{2} $$

where $\alpha_i = B_i^2\langle N_i\rangle/(\sum_i B_i\langle N_i\rangle)^2$ is the relative amplitude of species $i$, $\langle N_i\rangle$ is the average number of particles of species $i$ in the focal volume, and $B_i$ is the brightness of species $i$. $\tau_{D_i} = w_0^2/4D_i$ is the diffusive time scale of species $i$, where $w_0$ is the detection width and $D_i$ is the diffusion coefficient of species $i$. Similarly, for the case of two-dimensional diffusion−convection (“DV model”) of a single species, the closed-form TACF is34

$$ G_{DV}(\tau) = \frac{1}{\langle N\rangle}\left(1 + \frac{\tau}{\tau_D}\right)^{-1}\exp\left[-\left(\frac{\tau}{\tau_V}\right)^{2}\left(1 + \frac{\tau}{\tau_D}\right)^{-1}\right] \tag{3} $$

where $\tau_V = w_0/v$ is the convective time scale and $v$ is the convective speed. Eq 3 simplifies to $G_V(\tau) = \langle N\rangle^{-1}\exp[-(\tau/\tau_V)^2]$ for the case of pure convection (“V model”). The nondimensional Peclet number, which is given by $Pe \equiv w_0 v/D$, characterizes the relative contributions of convective versus diffusive transport on the length scale of the detection volume.37 The preceding two-dimensional results are easily generalized to 3D, as well as to other physical processes.4,5

Classical Model Selection. In classical data regression, one attempts to fit a model $\mathbf{f}(\mathbf{x}, \boldsymbol{\beta}) = [f(x_1, \boldsymbol{\beta}), f(x_2, \boldsymbol{\beta}), \ldots, f(x_n, \boldsymbol{\beta})]$ to a data vector with $n$ data points $\mathbf{y} = \{y_1, y_2, \ldots, y_n\}$, where $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$ and $\boldsymbol{\beta} = \{\beta_1, \beta_2, \ldots, \beta_p\}$ denote values of the $n$ sampled points and the $p$ model parameters, respectively. For FCS data, $\mathbf{y} = \{G(\tau_1), G(\tau_2), \ldots, G(\tau_n)\}$ and $\mathbf{x} = \{\tau_1, \tau_2, \ldots, \tau_n\}$. Each data point is assumed to have an error, $\varepsilon_i$, associated with it so that the data are related to the model by $y_i = f(x_i, \boldsymbol{\beta}) + \varepsilon_i$ for $i = 1, 2, \ldots, n$, where errors are assumed to be independent and normally distributed around zero.38 The probability distribution of each data point with standard deviation $\sigma_i$ is then $p(y_i|\boldsymbol{\beta}) = \frac{1}{\sigma_i(2\pi)^{1/2}}\exp\left\{-\frac{[y_i - f(x_i, \boldsymbol{\beta})]^2}{2\sigma_i^2}\right\}$, from which the total probability (or “likelihood”) of the full set of data is constructed.
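The closed-form models of eqs 2 and 3, which play the role of f(x, β) in the regression problem, can be sketched in Python with NumPy as follows. This is only an illustrative sketch and not the implementation used in this work: the function names are hypothetical, and the expression Pe = 4τD/τV is not stated in the text but follows from combining Pe ≡ w0v/D with τD = w0²/4D and τV = w0/v.

```python
import numpy as np

def g_diffusion(tau, alphas, tau_Ds):
    """Eq 2: sum over species of alpha_i / (1 + tau / tau_Di)."""
    tau = np.asarray(tau, dtype=float)[..., None]      # broadcast over species
    alphas = np.asarray(alphas, dtype=float)
    tau_Ds = np.asarray(tau_Ds, dtype=float)
    return np.sum(alphas / (1.0 + tau / tau_Ds), axis=-1)

def g_diffusion_convection(tau, n_mean, tau_D, tau_V):
    """Eq 3: single-species two-dimensional diffusion-convection TACF."""
    tau = np.asarray(tau, dtype=float)
    diff = 1.0 / (1.0 + tau / tau_D)                   # diffusive factor
    return diff * np.exp(-(tau / tau_V) ** 2 * diff) / n_mean

def g_convection(tau, n_mean, tau_V):
    """Pure convection limit of eq 3 (V model)."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-(tau / tau_V) ** 2) / n_mean

def peclet(tau_D, tau_V):
    """Pe = w0*v/D rewritten with tau_D = w0^2/(4D) and tau_V = w0/v."""
    return 4.0 * tau_D / tau_V

# Lag times on a logarithmic grid (illustrative values only)
tau = np.logspace(-5, 0, 50)
two_component = g_diffusion(tau, alphas=[0.5, 0.5], tau_Ds=[1e-3, 5e-3])
```

Passing `tau_D=np.inf` to `g_diffusion_convection` recovers the pure convection form, and `tau_V=np.inf` recovers one-component diffusion with amplitude 1/⟨N⟩, which gives a quick consistency check on eq 3.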

For a data set that is assumed to consist of $n$ independent individual measurements, this likelihood is

$$ P(\mathbf{y}|\boldsymbol{\beta}) = \prod_{i=1}^{n} p(y_i|\boldsymbol{\beta}) = \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\exp\left\{-\sum_{i=1}^{n}\frac{[y_i - f(x_i, \boldsymbol{\beta})]^2}{2\sigma_i^2}\right\} \tag{4} $$

Given the fact that each value in the TACF is the average of a large number of samples, the assumption of normally distributed noise in the TACF is justified by the central limit theorem. If the noise is instead correlated, as is often the case with FCS data,39 the general multivariate Gaussian function for the probability of the data is used:28,38

$$ P(\mathbf{y}|\boldsymbol{\beta}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\mathbf{C})}}\exp\left\{-\frac{1}{2}[\mathbf{y} - \mathbf{f}(\mathbf{x}, \boldsymbol{\beta})]^{T}\mathbf{C}^{-1}[\mathbf{y} - \mathbf{f}(\mathbf{x}, \boldsymbol{\beta})]\right\} \tag{5} $$

where $\mathbf{C}$ is the covariance matrix of the error terms. When the errors are independent, all off-diagonal terms in $\mathbf{C}$ become zero and this expression reduces to eq 4. For simplicity we assume independent errors in the following discussion and therefore employ eq 4 for the probability of the data. We demonstrate in the Supporting Information (S1) and an accompanying work that is focused on application of the present Bayesian framework to actual FCS data that correlated errors are incorporated in a relatively straightforward manner through calculation of the covariance matrix either directly from intensity traces or multiple independent TACF measurements.31

Conventional data regression employs MLE to calculate a point estimate of the model parameters via maximization of the foregoing probability:

$$ \hat{\boldsymbol{\beta}}_{\mathrm{MLE}} = \arg\max_{\boldsymbol{\beta}}[P(\mathbf{y}|\boldsymbol{\beta})] \tag{6} $$

where $\hat{\boldsymbol{\beta}}_{\mathrm{MLE}}$ denotes the ML estimate of the model parameters given the observed data $\mathbf{y}$. Maximization of the probability in eq 4 is equivalent to minimization of the sum of the squared errors weighted by their variances, $\chi^2(\mathbf{y}, \boldsymbol{\beta}) = \sum_{i=1}^{n}[y_i - f(x_i, \boldsymbol{\beta})]^2/\sigma_i^2$, as conventionally performed when calculating the goodness-of-fit measure $\chi^2$.

Bayesian Model Selection. The principal conceptual difference between conventional and Bayesian hypothesis testing is the explicit treatment of models and their parameters as random variables by the latter approach. Rather than seeking solely point estimates of model parameters, in Bayesian inference we seek model probabilities $P(M_k|\mathbf{y})$ given the observed data, where Bayes’ theorem40 relates these probabilities to the marginal likelihood of the data $P(\mathbf{y}|M_k)$:

$$ P(M_k|\mathbf{y}) = \frac{P(\mathbf{y}|M_k)P(M_k)}{P(\mathbf{y})} \propto P(\mathbf{y}|M_k) \tag{7} $$

where $P(M_k)$ is the prior probability of model $k$, which in the absence of any information may be taken to be equal for all $k$, and $P(\mathbf{y}) = \sum_k P(\mathbf{y}|M_k)P(M_k)$ is the marginal likelihood of the data considering all models, which amounts to a normalization factor that is constant across the models considered. Because model probabilities are subject to the normalization constraint $\sum_k P(M_k|\mathbf{y}) = 1$, calculation of $P(M_k|\mathbf{y})$ only requires knowledge of $P(\mathbf{y}|M_k)$, which is obtained from the integral $P(\mathbf{y}|M_k) = \int P(\mathbf{y}|\boldsymbol{\beta}, M_k)P(\boldsymbol{\beta}|M_k)\,d\boldsymbol{\beta}$. Here, $P(\mathbf{y}|\boldsymbol{\beta}, M_k)$ is the likelihood (probability) of the data given a specific set of parameter values in eq 4 and $P(\boldsymbol{\beta}|M_k)$ is the prior probability distribution of the parameters, which may also be taken to be uniform in the absence of information suggesting otherwise. Calculation of $P(\mathbf{y}|M_k)$ may be performed exactly using numerical integration or analytically using the asymptotic Laplace approximation.41 In the Laplace approximation it is assumed that the integrand $P(\mathbf{y}|\boldsymbol{\beta}, M_k)P(\boldsymbol{\beta}|M_k)$ can be approximated by a multivariate Gaussian distribution centered at the Bayesian point estimate, $\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}$, where $\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}} = \arg\max_{\boldsymbol{\beta}}[P(\mathbf{y}|\boldsymbol{\beta}, M_k)P(\boldsymbol{\beta}|M_k)]$.28,29 This allows for the analytical evaluation of the requisite marginal likelihood:

$$ P(\mathbf{y}|M_k) = (2\pi)^{p/2}\,|\boldsymbol{\Sigma}_{\mathrm{Bayes}}|^{1/2}\,P(\mathbf{y}|\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}, M_k)\,P(\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}|M_k) \tag{8} $$

from which model probabilities may be computed directly, where $\boldsymbol{\Sigma}_{\mathrm{Bayes}}$ is the covariance matrix that is evaluated asymptotically at $\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}$, given by $\boldsymbol{\Sigma}_{\mathrm{Bayes}} \approx \mathbf{H}_{\mathrm{Bayes}}^{-1}$, and $\mathbf{H}_{\mathrm{Bayes}} = -\nabla\nabla\log[P(\mathbf{y}|\boldsymbol{\beta})P(\boldsymbol{\beta})]_{\boldsymbol{\beta}=\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}}$ is the Hessian of the posterior (Supporting Information, S2). When a uniform prior is chosen, $\hat{\boldsymbol{\beta}}_{\mathrm{Bayes}}$ and $\boldsymbol{\Sigma}_{\mathrm{Bayes}}$ are equivalent to $\hat{\boldsymbol{\beta}}_{\mathrm{MLE}}$ and $\boldsymbol{\Sigma}_{\mathrm{MLE}}$, respectively. The Laplace approximation is employed in the present work, and direct numerical integration using a Monte Carlo approach is explored in the Supporting Information (S3).

Once model probabilities have been computed in the preceding manner, model selection then necessarily proceeds in a subjective manner by choosing a threshold for the evidence supporting or refuting a given model (Supporting Information, S4).41 Importantly, conditioning models directly on the observed data results in explicit model probabilities. This is in stark contrast to standard hypothesis testing, which conditions on the null hypothesis. This facilitates the nonpairwise comparison of non-nested models, unlike classical hypothesis testing.42 In addition, Ockham’s razor or the Principle of Parsimony is naturally incorporated into the model evaluation process by the penalization of unnecessarily complex (or overparameterized) models through marginalization of the parameters in the calculation of model probabilities.28,29 Our use of standard nonlinear regression in the application of the Bayesian inference procedure to FCS data offers the benefit that it is efficient computationally and may be implemented in a straightforward manner, as shown below. Once model probabilities have been computed, parameter estimation may proceed either by using the most likely model when one such model is dominant or by accounting for the finite probabilities of each model in computing parameter estimates and uncertainties.43

Selection of Priors. At the outset of a new investigation without prior information regarding the biophysical process of interest, a natural choice of priors is noninformative, uniform distributions that assume ignorance about distinct models and their parameter values other than an estimate of their approximate range. Although we adopt noninformative priors in the present work for all models and parameters, future studies may address the incorporation of nonuniform priors, as would be present, for example, in follow-up investigations of biophysical or biological phenomena. In either case, the posterior distribution converges to the same asymptotic value.
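The chain from likelihood (eqs 4 and 5) to Laplace-approximated evidence (eq 8) to model probabilities (eq 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the prior is assumed to be a uniform box so that P(β̂|Mk) is the reciprocal of the product of the prior ranges, and the best-fit log-likelihood and parameter covariance Σ are assumed to come from whatever nonlinear regression routine performed the fit. All quantities are kept in log form to avoid numerical underflow.

```python
import numpy as np

def gaussian_log_likelihood(y, f, cov):
    """log P(y|beta): cov is 1-D (variances, eq 4) or 2-D (full C, eq 5)."""
    r = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = r.size
    if cov.ndim == 1:                       # independent errors, eq 4
        return -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(cov))
                       + np.sum(r ** 2 / cov))
    chol = np.linalg.cholesky(cov)          # C = L L^T
    z = np.linalg.solve(chol, r)            # z.z = r^T C^-1 r
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2 * np.pi) + log_det + z @ z)

def log_evidence_laplace(log_like_max, param_cov, prior_widths):
    """Eq 8 in log form, assuming a uniform prior over a box of given widths."""
    param_cov = np.atleast_2d(np.asarray(param_cov, dtype=float))
    p = param_cov.shape[0]
    sign, log_det = np.linalg.slogdet(param_cov)
    if sign <= 0:
        raise ValueError("parameter covariance must be positive definite")
    log_prior = -np.sum(np.log(np.asarray(prior_widths, dtype=float)))
    return 0.5 * p * np.log(2 * np.pi) + 0.5 * log_det + log_like_max + log_prior

def model_probabilities(log_evidences, log_model_priors=None):
    """Eq 7: normalize evidences (times model priors) over the model set."""
    w = np.asarray(log_evidences, dtype=float)
    if log_model_priors is not None:
        w = w + np.asarray(log_model_priors, dtype=float)
    w = np.exp(w - np.max(w))               # subtract max for stability
    return w / np.sum(w)

# Two hypothetical models with the same best-fit likelihood; the second has
# one extra parameter whose posterior width is small relative to its prior.
log_l = -12.0
ev_simple = log_evidence_laplace(log_l, [[0.01]], [10.0])
ev_complex = log_evidence_laplace(log_l, np.diag([0.01, 0.01]), [10.0, 10.0])
probs = model_probabilities([ev_simple, ev_complex])
```

In this toy comparison the extra parameter does not improve the fit, so the simpler model receives a probability of about 0.98. The penalty arises from the ratio of posterior width to prior range in eq 8, which is the parsimony effect described in the text.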

A poor choice of prior will, however, reduce the absolute rate of convergence. Clearly, proper (normalized) priors must be used in all cases when integrating posterior distributions for model comparison.28,44 The use of alternative priors such as normal or Jeffreys priors29 in computing model probabilities is explored in the Supporting Information (S5 and S6). Jeffreys priors represent equal probability per decade (scale invariance) for broad prior ranges that represent great uncertainty in a given parameter.

RESULTS AND DISCUSSION

Effects of Noise and Underlying Physical Process on Model Probabilities. In order to illustrate key properties of the proposed Bayesian approach, we begin by applying the procedure to simulated TACFs that are generated by adding Koppel noise to the analytical TACF for two-component pure diffusion (Supporting Information, S7 and S8). This simulates experimental TACFs that result from multitau hardware correlators.45−47 For a fixed ratio of diffusion coefficients (D2/D1 = 5) and equal α, the two-component model is preferred over the one-component model for low noise levels as expected (σ0/G0 < 18%) (Figure 1B). As the relative noise level increases beyond this level, however, the one-component model probability increases monotonically, and eventually dominates near σ0/G0 = 32%, indicating preference for this simpler model (P > 95%). Thus, the Bayesian approach naturally prefers the simpler model when the signal-to-noise ratio in the TACF is low, with this crossover depending on the number of data points in the sample: a larger number of data points allows for proper model discrimination despite higher noise (Supporting Information, S9). In other words, the Bayesian approach prefers the more complex model only when the quality and amount of data are sufficient to justify the added complexity. Importantly, the probability of the three-component model remains near zero for all levels of noise, demonstrating that the procedure also does not overfit the TACF data.

Parameter estimates from the one- and two-component models show that the diffusion coefficients obtained from the two-component model begin to diverge when the probability of this model decreases to zero (Figure 1C), whereas the inferred amplitudes or mean particle numbers are more robust to the noise (Figure 1D). Thus, model probability may be used as a measure of reliability of underlying parameter estimates obtained for a given model.

Figure 1. Bayesian analysis of two-component FCS data. (A) Individual TACF curve, G(j)(τ), and the mean of eight individual TACFs, G̅ (τ), simulated using the two-component diffusion model with D2/D1 = 5, α2/α1 = 1, and 144 data points on a quasi-logarithmic scale. G̅ (τ) has a theoretical noise level of σ0/G0 = 17%, where σ0/G0 = σKoppel(τ1)/∑iαi. Weighted least-squares fits of the one- and two-component diffusion models (ND = 1 and ND = 2, respectively) to G̅ (τ) and corresponding normalized residuals are shown. Error bars represent the estimated noise (standard errors) in G̅ (τ) calculated from 256 individual TACF curves. (B) Inferred model probabilities computed from fitting G̅ (τ) as a function of increasing noise level. Parameters are the same as in panel A, with eight individual TACF curves used to compute each G̅ (τ), and 32 G̅ (τ) total per noise level. (C) Diffusion coefficients obtained from the one- and two-component model fits in panel B. Black lines denote prescribed (“true”) values of D1 and D2 for the two-component model. (D) α obtained from the one- and two-component model fits in panel B. The black line represents the prescribed (“true”) value of α1 and α2 for the two-component model, which are equal. Medians and upper and lower quartiles are shown in panels B−D for the results of fitting 32 mean curves (Supporting Information, S10). This analysis procedure is performed for all subsequent reported figures unless otherwise noted.

For a fixed level of noise and α2/α1 = 1, the two-component model is preferred to the one-component model when the difference between D1 and D2 is large because the two species may then be distinguished in the TACF (Figure 2A). In contrast, in the intermediate regime when D2/D1 ≈ 1, Bayes prefers the simpler one-component model because the two components cannot be distinguished, and the range of this regime narrows with decreasing noise (Figure 2B). Thus, the Bayesian model selection procedure again naturally penalizes overly complex models when the resolution in the data does not allow for the underlying physical process to be distinguished. Note that the two components are better resolved

(i.e., D2/D1 of the crossover point is closer to 1) when D2/D1 < 1 because D2 is the varying parameter and the characteristic decay in the FCS curve from a slow component (low D2) is generally less obscured by noise than the decay from a fast component (high D2) because of the decay in Koppel noise with increasing τ. Similarly, for a fixed level of noise and ratio of D2/D1, the two-component model is preferred when the amplitudes of the two components are similar (α2/α1 ∼ 1), whereas the one-component model is preferred when α2/α1 is considerably different from 1 because the contribution from one of the species to the TACF is too small to be detected (Figure 2, parts C and D). Importantly, the regime where the two-component model is preferred becomes narrower when the ratio of D2/D1 approaches 1 because in this limit the species have identical diffusion times and cannot be distinguished.

Figure 2. Bayesian analysis of simulated two-component FCS data with varying physical processes. (A) Inferred model probabilities as a function of D2/D1 with α1 = α2 = 0.5 and σ0/G0 = 7% added noise. D2/D1 is varied by changing D2 while holding D1 fixed at 10 μm2/s. (B) Same analysis as panel A for σ0/G0 = 26% added noise. (C) Inferred model probabilities as a function of α2/α1 for D2/D1 = 5 and a fixed level of noise σ0/G0 = 17%. α2/α1 is varied while keeping α1 + α2 = 1. (D) Same analysis as panel C for D2/D1 = 20. In all analyses, the inferred probability of the three-component model is always zero and is not shown.

Similar model selection properties to those above are observed when fitting TACFs with uniform noise (Supporting Information, S11), which is typical of image correlation spectroscopy (ICS), in which the sampling time is fixed over τi.6−10 Correlated noise can be handled by using the more general noise model eq 5 instead of eq 4 to calculate model probabilities. Ignoring noise correlations leads to overfitting by overly complex models when correlations in the noise are strong, whereas overfitting is not observed when the covariance matrix of the noise is incorporated into fitting (Supporting Information, S1). This important observation is addressed in detail in our accompanying work that applies the present framework to TACF curves computed from underlying fluorescence intensity traces.31

Incorporation of Additional Physical Processes. In general applications of FCS to unknown physical transport processes such as membrane receptor dynamics in adherent cells, it is not known a priori whether the true physical process is governed by pure normal diffusion or whether additional or alternative modes of transport are present, such as directed transport (convection) driven by active processes such as molecular motors. To illustrate the extension of the Bayesian approach to additional physical processes, we include models of convection and mixed diffusion−convection in the fitting process (Figure 3A). As above, Bayes correctly identifies the two-component diffusion model as the true physical process at low noise. However, as the relative noise level increases, the probability of the one-component diffusion model again increases to nearly 1 before the probability of the pure convection model increases to compete with the one-component diffusion model at high noise (Figure 3A). In this high-noise regime, Bayes is unable to distinguish between the two simplest models and does not prefer one over the other because they have equal complexity (i.e., three fitting parameters). The range of the prior affects the model probabilities slightly, but it does not alter the relative ranking of the models (Supporting Information, S12). Again, Bayes does not overfit the TACF, as illustrated by the fact that the probabilities of models of greater complexity remain near zero for all levels of noise (Figure 3A). The foregoing well-established feature of the Bayesian inference approach that allows for the simultaneous evaluation of non-nested competing hypotheses is an important distinction from conventional frequentist hypothesis tests.

Switching the “true” underlying physical process (i.e., the model used to generate the TACF curve) from two-component diffusion to one-component diffusion−convection results in a similar pattern of model selection as a function of noise. When diffusion is comparable with flow (Figure 3B), Bayes correctly distinguishes the diffusion−convection model at low noise, until first the simpler one-component pure diffusion model

competes with the true model at intermediate levels of noise, and then the one-component pure diffusion and pure convection models compete at high noise because of their identical model complexity. The same trend is observed when convection dominates: the pure convection model is preferred at intermediate noise levels until one-component pure diffusion becomes indistinguishable from pure convection at high noise.

Figure 3. (A) Inferred model probabilities for simulated two-component diffusion (D2/D1 = 5) as a function of added noise. Fit models include one-, two-, and three-component diffusion, pure convection (“V”), diffusion−convection (“ND = 1 + V”), and two-component diffusion−convection (“ND = 2 + V”). (B) Inferred model probabilities for a simulated diffusion−convection transport process (Pe = 1) fit with the subset of models in panel A. (C) Same as panel B with Pe = 10.

Exploring a broad range of Peclet numbers results in similar Bayesian model selection behavior as that observed in the preceding pure two-component diffusion model analyses (Figure 2, parts A and B). Namely, for the well-distinguished physical regimes of Pe ≪ 1 and Pe ≫ 1 Bayes clearly prefers the simpler pure one-component diffusion and convection, respectively, whereas in the intermediate regime Pe ∼ 1 Bayes selects the more complex, true physical process of diffusion−convection, where the width of this intermediate regime broadens with decreasing noise level (Figure 4). As before, overfitting is naturally avoided, as demonstrated by the negligible probability of the two-component model.

Figure 4. Inferred model probabilities as a function of Peclet number for the simulated diffusion−convection model with (A) σ0/G0 = 42% and (B) σ0/G0 = 6% added noise.

Incomplete Model Set. Finally, it is of interest to test the performance of the Bayesian model selection approach when the “true” physical process is not among the possible set of models considered in the data fitting and model inference process. Of course, this is the most common scenario in reality because all models are wrong, being mere approximations to reality.48 However, as pointed out by Posada and Buckley,30 despite the fact that all models are wrong, some models are useful, and the aim of Bayesian inference is precisely to determine which of a set of competing models is best justified according to the Principle of Parsimony to describe the observed data given the relative model complexities. To investigate this scenario, the two-component diffusion model is again used to simulate the TACF, but this model is excluded from the fitting process (Figure 5). For low noise Bayes prefers the more complex, three-component model to the simpler, one-component model, whereas for high noise model preferences are reversed (Figure 5A), and this reversal occurs at increasingly higher noise levels as the ratio of D2/D1 is increased (Figure 5B). Interestingly, this crossover occurs at a lower noise level than in Figure 3A, presumably because the overfitting (three-component) model is penalized more than the “true” model due to its relatively higher model complexity.

Comparison with the Maximum Entropy Method, Bayesian Information Criterion, and Classical Statistics. The maximum entropy method (MEM) is an alternative procedure that has been proposed for model selection in the analysis of FCS data.49−51 In this approach, the true physical process is assumed to consist of independent diffusing species with an unknown distribution of relative concentrations. Associating the distribution of diffusing species with a probability distribution allows for the definition of an entropy that can be maximized in the TACF fitting process, consistent with maximum entropy data fitting procedures that generally seek the maximum entropy distribution that is consistent with a set of observed data.28,52
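To make the notion of an entropy of a species distribution concrete, the sketch below computes one for a set of amplitudes {αi}. The text does not specify the entropy functional, so the Shannon form S = −Σ p_i ln p_i with p_i = α_i/Σα_i is an assumption made here purely for illustration, and the function name is hypothetical.

```python
import numpy as np

def species_entropy(alphas):
    """Shannon entropy of amplitudes normalized to a probability distribution.

    Assumed form: S = -sum_i p_i ln p_i with p_i = alpha_i / sum(alpha).
    """
    a = np.asarray(alphas, dtype=float)
    if np.any(a < 0) or a.sum() <= 0:
        raise ValueError("amplitudes must be non-negative with a positive sum")
    p = a / a.sum()
    p = p[p > 0]                    # 0 * ln(0) is taken as 0
    return float(-np.sum(p * np.log(p)))

# A broad distribution over four diffusion times has higher entropy than
# a distribution concentrated on two of them.
broad = species_entropy([0.25, 0.25, 0.25, 0.25])
narrow = species_entropy([0.5, 0.5, 0.0, 0.0])
```

Under this assumed definition, a fit that maximizes entropy subject to agreement with the data is pulled toward broader species distributions, which is the sense in which such a procedure favors more components.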


Figure 6. Maximum entropy analysis. (A) Distributions of species obtained by fitting TACF curves from the two-component diffusion model (τD1 = 0.01 s, τD2 = 0.001 s, α2/α1 = 1.5) simulated with three levels of added noise. (B) Distributions of species obtained by fitting TACF curves from the diffusion−convection model (τD = 0.01 s) simulated at three Peclet numbers.

Figure 5. (A) Inferred model probabilities as a function of added noise for simulated two-component diffusion (D2/D1 = 5) when the two-component diffusion model is excluded from the fitting process. Fit models include one- and three-component diffusion, pure convection, and diffusion−convection. (B) Same as in panel A with D2/D1 = 20.

Returning to the simulated two-component diffusion model, we apply the maximum entropy fitting procedure to infer the distribution of species {αi} for three levels of noise (Figure 6A). In contrast with the Bayesian approach, which prefers simpler models at increasing levels of noise, the maximum entropy approach prefers more complex models (broader species distributions) with increasing noise (or with fewer data points), because maximum entropy approaches do not explicitly model noise in the data fitting process.52−54 Application of the maximum entropy procedure to simulated diffusion−convection TACF curves at varying Peclet number results in a multi-component distribution that becomes narrower and shifts to slower diffusion time scales with increasing Peclet number (Figure 6B). Thus, although the maximum entropy procedure is potentially useful for analysis of multi-component systems of purely diffusing species, it is clearly limited in its ability to infer alternative physical processes, as pointed out previously by Petrov and Schwille.5 Bayesian inference is also rooted in the principle of maximum entropy but behaves distinctly from the foregoing maximum entropy approach because noise associated with the data-generation process is incorporated explicitly into the model inference process28 (Supporting Information, S13).

As an alternative to the Laplace approximation, model probabilities may be approximated using the simpler Bayesian information criterion (BIC), which does not require specifying the parameter prior distribution, provided that the sample size n is large. The BIC is a special case of the Laplace approximation that assumes the unit information prior, which penalizes complex models less than the currently implemented uniform prior on the parameters. When the sample size n is large, the contribution of the prior to the model probabilities may be small and can therefore be ignored, so that model probabilities may be approximated using the BIC. The performance of the BIC for FCS data illustrates that model probabilities calculated using the BIC and the Laplace approximation are similar when the sample size is large (Supporting Information, S14). However, the BIC fails to penalize overly complex models sufficiently when the sample size is small, in which case the BIC approximation is no longer valid. In addition to adequate penalization of model complexity, including the contribution of the prior in the model inference process enables the specification of the allowable parameter range, which cannot be achieved with the BIC. Further, the additional computational cost of employing the Laplace approximation instead of the BIC is negligible, so this asymptotically exact approximation is preferred in the application of Bayesian inference to FCS data.28,29

As an alternative to Bayesian inference, classical frequentist approaches such as the F-test or the likelihood ratio test may be employed for hypothesis testing. However, these tests have several clear limitations that motivate the use of the Bayesian approach: (1) they may only be applied to nested models, whereas FCS models are often not nested, (2) model probabilities cannot be directly computed for each model, and (3) pairwise comparisons of multiple competing models are computationally complex and may result in different conclusions depending on the significance level and comparison order selected.30 In contrast, Bayesian inference conveniently deals with complex model spaces (non-nested, multiple models) and produces model rankings or relative strengths while appropriately modeling noise in the data without complex pairwise comparisons or statistical tests, rendering it convenient for application as a standardized hypothesis testing procedure in the systematic and objective evaluation of FCS data.
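The difference between the Laplace approximation with a uniform parameter prior and the BIC can be sketched on a toy problem. The example below is not the FCS implementation of this work: it assumes a linear model with independent Gaussian noise of known standard deviation, a hypothetical uniform prior of width 100 on every parameter, and equal prior probabilities for the models. All function names are illustrative.

```python
import numpy as np

def max_log_likelihood(y, X, sigma):
    """Gaussian log-likelihood at the least-squares estimate (known noise sigma)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ theta) ** 2)
    return -0.5 * y.size * np.log(2 * np.pi * sigma**2) - rss / (2 * sigma**2)

def log_evidence_laplace(y, X, sigma, prior_width):
    """Laplace approximation with a uniform prior of width prior_width per parameter."""
    k = X.shape[1]
    cov = sigma**2 * np.linalg.inv(X.T @ X)   # parameter covariance at the optimum
    _, logdet = np.linalg.slogdet(cov)
    occam = 0.5 * k * np.log(2 * np.pi) + 0.5 * logdet - k * np.log(prior_width)
    return max_log_likelihood(y, X, sigma) + occam

def log_evidence_bic(y, X, sigma):
    """BIC-style approximation: max log-likelihood minus (k/2) log n."""
    n, k = X.shape
    return max_log_likelihood(y, X, sigma) - 0.5 * k * np.log(n)

def model_probabilities(log_evidences):
    """Normalize log-evidences to probabilities, assuming equal prior model odds."""
    z = np.asarray(log_evidences, dtype=float)
    w = np.exp(z - z.max())                   # subtract the maximum for stability
    return w / w.sum()

rng = np.random.default_rng(0)
n, sigma = 20, 1.0                            # small sample, hypothetical noise level
x = np.linspace(0.0, 1.0, n)
y = 2.0 + rng.normal(0.0, sigma, n)           # data generated by the simpler model

X_const = np.ones((n, 1))                     # one-parameter model: constant
X_line = np.column_stack([np.ones(n), x])     # two-parameter model: constant + slope
models = (X_const, X_line)

p_laplace = model_probabilities(
    [log_evidence_laplace(y, X, sigma, prior_width=100.0) for X in models])
p_bic = model_probabilities([log_evidence_bic(y, X, sigma) for X in models])
print("Laplace:", p_laplace)
print("BIC:    ", p_bic)
```

Both approximations share the same maximum likelihood term and differ only in the complexity penalty. With the wide uniform prior assumed here, the Laplace penalty on the extra slope parameter exceeds the BIC penalty, so the simpler model receives a higher probability under the Laplace approximation than under the BIC.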


(4) Thompson, N. L. In Topics in Fluorescence Spectroscopy; Lakowicz, J. R., Ed.; Plenum Press: New York, 1991; Vol. 1, pp 337−378.
(5) Petrov, E. P.; Schwille, P. In Springer Series in Fluorescence; Springer-Verlag: Berlin, Germany, 2008; Vol. 6.
(6) Kolin, D. L.; Costantino, S.; Wiseman, P. W. Biophys. J. 2006, 90, 628.
(7) Kolin, D. L.; Wiseman, P. W. Cell Biochem. Biophys. 2007, 49, 141.
(8) Burkhardt, M.; Schwille, P. Opt. Express 2006, 14, 5013.
(9) Sisan, D. R.; Arevalo, R.; Graves, C.; McAllister, R.; Urbach, J. S. Biophys. J. 2006, 91, 4241.
(10) Kannan, B.; Hong, Y.; Thankiah, S.; Liu, P.; Maruyama, I.; Wohland, T. Biophys. J. 2007, 325A.
(11) Wachsmuth, M.; Waldeck, W.; Langowski, J. J. Mol. Biol. 2000, 298, 677−689.
(12) Schwille, P. Cell Biochem. Biophys. 2001, 34, 383−408.
(13) Culbertson, M. J.; Williams, J. T. B.; Cheng, W. W. L.; Stults, D. A.; Wiebracht, E. R.; Kasianowicz, J. J.; Burden, D. L. Anal. Chem. 2007, 79, 4031−4039.
(14) Sanguigno, L.; De Santo, I.; Causa, F.; Netti, P. A. Anal. Chem. 2011, 83, 8101−8107.
(15) Kim, S. A.; Heinze, K. G.; Schwille, P. Nat. Methods 2007, 4, 963−973.
(16) Delon, A.; Usson, Y.; Derouard, J.; Biben, T.; Souchier, C. Biophys. J. 2006, 90, 2548−2562.
(17) Tcherniak, A.; Reznik, C.; Link, S.; Landes, C. F. Anal. Chem. 2009, 81, 746−754.
(18) Milon, S.; Hovius, R.; Vogel, H.; Wohland, T. Chem. Phys. 2003, 288, 171−186.
(19) Hac, A. E.; Seeger, H. M.; Fidorra, M.; Heimburg, T. Biophys. J. 2005, 88, 317−333.
(20) Friedman, N.; Linial, M.; Nachman, I.; Pe’er, D. J. Comput. Biol. 2000, 7, 601.
(21) Friedman, N.; Cai, L.; Xie, X. S. Phys. Rev. Lett. 2006, 97, 168302.
(22) Sachs, K.; Itani, S.; Carlisle, J.; Nolan, G. P.; Pe’er, D.; Lauffenburger, D. A. J. Comput. Biol. 2009, 16, 201.
(23) Jaqaman, K.; Danuser, G. Nat. Rev. Mol. Cell Biol. 2006, 7, 813−819.
(24) McHale, K.; Berglund, A. J.; Mabuchi, H. Biophys. J. 2004, 86, 3409−3422.
(25) Bronson, J. E.; Fei, J.; Hofman, J. M.; Gonzalez, R. L. Jr.; Wiggins, C. H. Biophys. J. 2009, 97, 3196−3205.
(26) Meacci, G.; Ries, J.; Fischer-Friedrich, E.; Kahya, N.; Schwille, P.; Kruse, K. Phys. Biol. 2006, 3, 255−263.
(27) Meseth, U.; Wohland, T.; Rigler, R.; Vogel, H. Biophys. J. 1999, 76, 1619−1631.
(28) Sivia, D. S.; Skilling, J. Data Analysis: A Bayesian Tutorial, 2nd ed.; Oxford University Press: Oxford, U.K., 2006.
(29) Gregory, P. C. Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support; Cambridge University Press: New York, 2005.
(30) Posada, D.; Buckley, T. R. Syst. Biol. 2004, 53, 793.
(31) Guo, S. M.; Monnier, N.; He, J.; Guangyu, S.; Wohland, T.; Bathe, M. Anal. Chem. 2012, DOI: 10.1021/ac2034375.
(32) Petersen, N. O.; Hoddelius, P. L.; Wiseman, P. W.; Seger, O.; Magnusson, K. E. Biophys. J. 1993, 65, 1135.
(33) Elson, E. L.; Magde, D. Biopolymers 1974, 13, 1−27.
(34) Magde, D.; Webb, W. W.; Elson, E. L. Biopolymers 1978, 17, 361−376.
(35) Qian, H.; Elson, E. L. Appl. Opt. 1991, 30, 1185.
(36) Lakowicz, J. R. In Principles of Fluorescence Spectroscopy; Springer: New York, 2006.
(37) Truskey, G. A.; Yuan, F.; Katz, D. F. Transport Phenomena in Biological Systems; Pearson Education: Upper Saddle River, NJ, 2009.
(38) Seber, G. A. F.; Wild, C. J. Nonlinear Regression; Wiley: New York, 1989.
(39) Koppel, D. E. Phys. Rev. A 1974, 10, 1938−1945.

CONCLUSIONS

FCS is a powerful technique to infer transport and binding processes of proteins and other macromolecules in a spatially as well as temporally resolved manner in complex chemical systems including living cells and tissues.4,5 Application of FCS involves two distinct stages of data analysis, namely, model inference and parameter estimation. Model inference involves testing multiple competing models that are hypothesized to govern the physical process under investigation, such as one-component diffusion, multi-component diffusion, convection (directed transport), and mixed diffusion−convection, among others. Bayesian inference provides a convenient framework for this process, conditioning models directly on the data rather than on a null hypothesis, thereby computing model probabilities explicitly based on their (favorable) ability to describe the data given their (unfavorable) complexity, satisfying the Principle of Parsimony. Subjectivity necessarily still enters the model selection process once model probabilities have been computed, because a probability threshold must be selected before a “correct” model may be chosen. Nevertheless, the Bayesian approach formalizes the hypothesis testing process by using measured data to update one’s knowledge of a system’s behavior, computing posterior model (and parameter) probabilities explicitly without resorting to complex statistical tests. Importantly, the proposed Bayesian approach also provides a screening test for the downstream interpretation of model parameters, because low model probabilities imply poorly described data with associated parameter values that may either be erroneous or have high associated uncertainty. Application of the presently proposed approach to TACFs generated from fluctuation intensity traces that include effects of photophysics and light collection, as well as correlated and inhomogeneous noise, is the subject of an accompanying work.31



ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Phone: 617-324-5685. Fax: 617-324-7554. E-mail: mark.bathe@mit.edu.

Notes

A patent has been filed on behalf of the Massachusetts Institute of Technology by Ditthavong Mori & Steiner, P.C. listing Mark Bathe, Syuan-Ming Guo, Nilah Monnier, and Jun He as coinventors of this Bayesian approach to FCS data analysis.



ACKNOWLEDGMENTS

Funding from MIT faculty start-up funds and the Samuel A. Goldblith Career Development Professorship awarded to M.B. is gratefully acknowledged.



REFERENCES

(1) Magde, D.; Elson, E. L.; Webb, W. W. Biopolymers 1974, 13, 29−61.
(2) Icenogle, R. D.; Elson, E. L. Biopolymers 1983, 22, 1919−1948.
(3) Berland, K. M.; So, P. T. C.; Gratton, E. Biophys. J. 1995, 68, 694−701.


(40) Bayes, T.; Price, R. Philos. Trans. R. Soc. London 1763, 53, 370−418.
(41) Kass, R. E.; Raftery, A. E. J. Am. Stat. Assoc. 1995, 90, 773−795.
(42) Bevington, P. R.; Robinson, D. K. Data Reduction and Error Analysis; McGraw-Hill: New York, 2003.
(43) Raftery, A. E. Sociological Methodology 1995, 25, 111−163.
(44) Carlin, B. P., Louis, T. A., Eds. Bayesian Methods for Data Analysis, 3rd ed.; CRC Press: New York, 2008.
(45) Koppel, D. E.; Axelrod, D.; Schlessinger, J.; Elson, E. L.; Webb, W. W. Biophys. J. 1976, 16, 1315.
(46) Wohland, T.; Rigler, R.; Vogel, H. Biophys. J. 2001, 80, 2987−2999.
(47) Saffarian, S.; Elson, E. L. Biophys. J. 2003, 84, 2030−2042.
(48) Box, G. E. P.; Tiao, G. C. Bayesian Inference in Statistical Analysis; Wiley-Interscience: New York, 1992.
(49) Bryan, R. K. Eur. Biophys. J. 1990, 18, 165−174.
(50) Sengupta, P.; Garai, K.; Balaji, J.; Periasamy, N.; Maiti, S. Biophys. J. 2003, 84, 1977.
(51) Langowski, J.; Bryan, R. Macromolecules 1991, 24, 6346−6348.
(52) Jaynes, E. T. Proc. IEEE 1982, 70, 939−952.
(53) Jaynes, E. T. Phys. Rev. 1957, 106, 620−630.
(54) Jaynes, E. T. Phys. Rev. 1957, 108, 171−190.
