Nonparametric Estimates of Drift and Diffusion ... - ACS Publications

Oct 13, 2014 - In many cases, one can still use FP algebra with censored and truncated data even if, at every time t, there are regions along the reac...
1 downloads 0 Views 435KB Size
Article pubs.acs.org/JPCB

Nonparametric Estimates of Drift and Diffusion Profiles via Fokker− Planck Algebra Steven P. Lund,*,† Joseph B. Hubbard,*,‡ and Michael Halter*,‡ †

Statistical Engineering Division and ‡Biosystems and Biomaterials Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8980, Gaithersburg, Maryland 20899, United States S Supporting Information *

ABSTRACT: Diffusion processes superimposed upon deterministic motion play a key role in understanding and controlling the transport of matter, energy, momentum, and even information in physics, chemistry, material science, biology, and communications technology. Given functions defining these random and deterministic components, the Fokker−Planck (FP) equation is often used to model these diffusive systems. Many methods exist for estimating the drift and diffusion profiles from one or more identifiable diffusive trajectories; however, when many identical entities diffuse simultaneously, it may not be possible to identify individual trajectories. Here we present a method capable of simultaneously providing nonparametric estimates for both drift and diffusion profiles from evolving density profiles, requiring only the validity of Langevin/FP dynamics. This algebraic FP manipulation provides a flexible and robust framework for estimating stationary drift and diffusion coefficient profiles, is not based on fluctuation theory or solved diffusion equations, and may facilitate predictions for many experimental systems. We illustrate this approach on experimental data obtained from a model lipid bilayer system exhibiting free diffusion and electric field induced drift. The wide range over which this approach provides accurate estimates for drift and diffusion profiles is demonstrated through simulation.



INTRODUCTION From diffusion in the presence of hydrodynamic flow1 to protein folding2,3 and gene expression,4 many physical and biological processes can be described by random fluctuations superimposed upon a deterministic drift. Descriptions of diffusion range from abstract mathematical formulations5 to predicting the behavior of neutrons in nuclear reactors to understanding the spread of infectious diseases throughout a population. Given the generality and importance of this process, enormous resources have been committed to measurement, modeling, and theoretical formulations. At present, there are three commonly used method classes for extracting diffusion coefficient profiles from experimental data. The most direct approaches occur when individual diffusive trajectories or paths are observed (e.g., upper panel of Figure 1). Data of this type allows one to extract the displacements or transitions of the individual diffusing entities through time. These displacements can be used to estimate empirically or through modeling w(y,Δt|x), the probability of an entity occupying state y at time difference Δt after occupying state x.6−10 Given transition probabilities, drift and diffusion profiles are available as A(x) = limΔt↓0(1/Δt) ∫ ((y − x)w(y, Δt| x)dy) and D(x) = limΔt↓0(1/Δt) ∫ [(y − x − A(x)]2 w(y, Δt| x)dy, respectively.11,12 Isolated diffusion trajectories are unavailable in many systems (e.g., fluorescence recovery after photobleaching (FRAP), concentration profiling in multicomponent systems, and sedimentation velocity concentration profiling). When many This article not subject to U.S. Copyright. Published 2014 by the American Chemical Society

similar or identical entities diffuse simultaneously and their paths cannot be resolved from one another, specific individual trajectories cannot be traced (e.g., lower panel of Figure 1). This type of data easily facilitates estimating unconditional or marginal distributions (or density profiles), f(x, t), which describe the probability that any entity occupies state x at observation time t. However, this data is seemingly opaque with respect to transition probabilities, w(y,Δt|x), which are related to the marginal density profiles by f(y,t + Δt) = ∫ ∞ −∞w(y,Δt|x) f(x, t)dx. In theses cases, one often seeks a solution of a diffusion/drift equation. In particular, one assumes parametric forms for the drift and diffusion profiles (e.g., spatially uniform external field and constant diffusion coefficient), and solves the appropriate equations or simulates the diffusion process to estimate the parameters of the drift and diffusion profiles. If the assumed forms of the drift or diffusion profiles are incorrect, the resulting profile estimates are likely to describe the system behavior inaccurately. Alternatively, one could utilize fluctuation theory13 (e.g., dynamic structure factors14,15) to extract a generalized diffusion coefficient. When using fluctuation theory, it is commonly assumed that no external forces act on the system, so that drift effects may be safely neglected. If external forces are present, then it is assumed that their mathematical form can be specified Received: August 20, 2014 Revised: October 6, 2014 Published: October 13, 2014 12743

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

theoretical approach results in estimates comparable to results obtained using a solved diffusion/drift equation while requiring fewer model assumptions. The wide range over which this approach provides accurate estimates for drift and diffusion profiles is demonstrated through simulation. An uncertainty analysis for the estimated profiles, based on bootstrapping and Bayesian model averaging, is also presented.



METHODS Fokker−Planck Algebra in One Dimension. The onedimensional Fokker−Planck (FP) equation can be written as ∂f (x , t ) ∂ ∂2 = − [A(x , t )f (x , t )] + 2 [D(x , t )f (x , t )] ∂t ∂x ∂x (1)

Most commonly, eq 1 is used to predict the evolution of a distribution under a priori specified drift and diffusion profiles, A(x, t) and D(x, t), respectively. Boundary and initial conditions for f(x, t) are also specified to ensure solutions to eq 1 are unique. One then computes f(x, t) at arbitrary times as the solution to the corresponding partial differential equation. Here we present methods for solving the FP equation in the opposite direction under the assumption that the drift and diffusion profiles are time invariant. Note that eq 1 is a linear equation in partial derivatives of the probability density function, f(x, t). Suppose that one experimentally observes an evolving population at multiple times and that the observed distributions are realizations of a single solution, not necessarily unique, to the FP equation for some time-invariant, unknown drift and diffusion profiles, A(x) and D(x), respectively. Specifying additional boundary or initial conditions on f(x, t) is not required because the distributions are observed, not solved for, and they need not be the only existing solution to the FP equation for the fixed but unknown A(x) and D(x). We now derive a unique solution for the drift and diffusion profiles, given the observed distributions at two unique times. Consider eq 1 for a pair of time points, ti ≠ tj, such that A(x, ti) = A(x, tj) ≡ A(x) and D(x, ti) = D(x, tj) ≡ D(x). For notational simplicity, we frequently omit the x dependence and write A, D, f(t) and F(t) for A(x), D(x), f(x, t), and F(x, t), respectively. In this case, eq 1 can be written as

Figure 1. Illustration of Brownian trajectories in a parabolic potential landscape with A(x) = −x and D(x) = 1. The upper panel shows a single stochastic path, whereas the lower panel shows 70 interlacing trajectories from a stationary pair of drift and diffusion profiles along with six snapshots of the evolving population density.

a priori, as with solving diffusion/drift equations. There remains a need for a nonparametric method that can simultaneously determine both drift and diffusion profiles by observing population distributions or density profiles sparsely in time. The one-dimensional FP equation12 describes the temporal evolution of a probability density function f(x, t) given input parameters consisting of a diffusion coefficient profile, D(x, t), along with a drift profile, A(x, t). A key property of the FP equation is that it is a linear partial differential equation in population densities, f(x, t). Given a set of density profiles, {f(x, t)}t∈τ, from a discrete set of time points, τ, we show how timeinvariant “coefficients” D(x) and A(x) can be extracted via linear algebra. It should be noted that given estimates of D(x) and A(x) conditional or transition probability densities can be obtained via simulation or by solving the FP equation. There is large interest in systems for which diffusion may be coordinatedependent.13,16−21 In one dimension, we illustrate how our proposed method can be used to test if diffusion (or drift) varies along the reaction coordinate. That is, we provide a statistical test to rigorously examine whether the coordinatespecific estimates of diffusion (or drift) are consistent with a single scalar or, alternatively, require a varying profile description. This methodology facilitates the analysis of systems experiencing nonconstant diffusion or an unknown external field when individual diffusive trajectories are difficult or impossible to distinguish from a broader trajectory collection. The biological sciences, in particular, provide experimental examples of such systems, including fluorescence recovery after photobleaching (FRAP), membrane diffusion or the evaluation of fluctuations and the shape of the epigenetic landscape of gene expression.22 We illustrate our suggested approach on experimental data obtained from a model lipid bilayer system exhibiting free diffusion and electric field induced drift. Our

∂f (t ) ∂ ∂2 = − [Af (t )] + 2 [Df (t )] ∂t ∂x ∂x

(2)

for t = ti, tj. Let F(t) denote a cumulative distribution function, which reports the proportion of the population with values less than x at time t. When integration occurs over a finite range with a lower bound L such that the probability flux (i.e., time rate of change for the cumulative distribution function) at L at time t is negligible

∫L

x′

∂f (x , t ) ∂ dx = ∂t ∂t

∫L

x′

f (x , t ) d x ≈

∂F(x′, t ) ∂t

provided f(t) is differentiable. The constraint that the probability flux be negligible at L neither implies nor precludes the presence of a reflecting or absorbing boundary, as this condition is satisfied by any distribution with negligible density outside a finite domain. In experiments where every observation falls within the linear dynamic range of an instrument, one can satisfy this condition simply by setting L below the minimum observed value. For some theoretical applications integration may occur over an infinite range, in 12744

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

D̂ (x). We use Bayesian model averaging computed over a sequence of polynomials with degrees from 0 to a number between 10 and Q. We compute the Bayesian information criterion (BIC) of each fitted polynomial to assign probabilities to each polynomial.24 The final estimate D̂ (x) is given as a weighted average of the individual polynomials. Local drift estimates are then computed as  xq =  *xq + D̂ ′(xq), where D̂ ′(x) = ((dD̂ (x))/dx). Uncertainty in D̂ ′(x) can be a predominant source of uncertainty in  xq, and uncertainty in model choice can be a predominant source of uncertainty in D̂ ′(x). We use the probabilities assigned to polynomials of varying degree to account for model choice uncertainty when estimating the uncertainty of  xq. The local drift estimates are then smoothed and interpolated (in the same manner as were the local diffusion estimates) to obtain a final drift profile estimate  (x). When modeling either drift or diffusion, the adequacy of any given polynomial (i.e., the consistency between the resulting smoothed point estimates and the raw point estimates) can be assessed via Hotelling’s T2. For example, the p-value for the hypothesis test that D(x) is a constant (i.e., zero degree polynomial) can be obtained by comparing a multivariate t test statistic to an appropriate F-distribution.

which case ∂f(t)/∂t must exist and be bounded (i.e., f(t) must be Lipschitz). After integrating with respect to x, we may write eq 2 as ∂F(t ) ∂ = −Af (t ) + [Df (t )] ∂t ∂x

(3)

All coefficients of integration can be taken to be zero because the integrands all contain a probability density function. Solving eq 3 for A at t = ti provides A=

∂ lnf (ti) dD 1 ∂F(t ) +D − dx f (ti) ∂t ∂x

t = ti

(4)

Substituting eq 4 into eq 3 for t = tj and solving for D yields f (ti) D=

∂F(t ) ∂t t = t j

f (ti)

∂f (t j) ∂x

− f (t j )

∂F(t ) ∂t t = t i

− f (t j )

∂f (ti) ∂x

(5)

Using this method to estimate a drift and diffusion profile over a given interval (l, u) only requires accurate values of F(x, t) for x within (l, u) at a minimum of two time points. Additional details about f(x, t) for x outside of (l, u) do not affect estimates within (l, u). Estimates of A and D can be extracted from each pairing of observed distributions, and as the number of profiles increases, the results can be aggregated to form increasingly reliable estimates. We demonstrate the effectiveness and flexibility of this approach by analyzing a combination of experimental and simulated data in the Results section. Computational Implementation. We provide an overview of our implementation methods in this section. For complete details, please see the Supporting Information (SI). Data and code to reproduce the results reported in this paper in R23 are available at https://github.com/SteveLund/FP-Algebra. While eqs 4 and 5 are direct consequences of eq 2, in practice we must estimate the density profiles from experimental data. The uncertainties and biases in these estimated distributions break the strict equality, and resulting estimates of D and A may be volatile at values of x for which any of f(t), ((∂f(t))/∂x), or ((∂F(t))/∂t) are very small, at either ti or tj, or for which (f(ti)((∂f(tj))/∂x)) ≈ (f(tj)((∂f(ti))/∂x)). When the population distribution evolves in time, the volatility in the estimates of D and A for a particular x may not be the same for all time pairs (ti, tj). Thus, it is beneficial to obtain estimates D(x) and A(x) from each possible pair of times and then to combine the results into a comprehensive final answer. We now outline our proposed method for estimating D and A from experimental data. As shown in eq 4, solving for A(x) requires knowledge of D′(x) = ((dD(x))/dx), which requires estimating D for a continuum of x. To simplify computation, we define A*(x) = A(x) − D′(x) for which estimation does not require simultaneous consideration of a continuum of x. That is, from each time pair, we estimate A*(x) and D(x) for each x in a given set, χ = {xq}qQ= 1, without knowledge of A*(x) or D(x) for any other values of x. We then combine the estimates from each time pair to obtain a single estimate of A*(xq) and D(xq), respectively. After completing this process for each xq ∈ χ, we then smooth and interpolate across the set of local diffusion estimates, {D̂ xq}qQ= 1, to obtain a final diffusion profile estimate,



RESULTS FP Algebra Applied to Membrane Diffusion. We evaluate the approach described above on experimental data obtained from a supported lipid bilayer system.25 This provided a well-defined model system because lipids within the supported bilayer exhibit free, two-dimensional diffusion within the membrane26 and fluorescence microscopy can be used to acquire high quality measurements of the diffusion process. The model bilayer was fabricated by transferring Langmuir monolayers onto glass coverslips via Langmuir−Blodgett/ Langmuir−Shaefer deposition.27 To enable the analysis of diffusion and drift, negatively charged, fluorescently labeled lipids were doped (∼1% mol/mol) into the bilayer so that the location of these molecules could be followed by time lapse, epi-fluorescence microscopy. The supported lipid bilayer was also micropatterned with a grid of squares to create barriers to lipid motion (see upper panel of Figure 2). Initially, the supported lipid bilayer is uniformly fluorescent (see left side of upper and lower panels of Figure 2). Upon the application of an electric field, the charged, fluorescent lipids drift toward the anode (see lower panel of Figure 2). Depleted zones and enriched zones of the charged, fluorescently labeled lipids were created by the electric field induced drift. Pixel intensities, a surrogate for the local concentration of fluorescently labeled lipids, were analyzed using the proposed methodology to extract drift and diffusion profiles. This image sequence was originally collected with the intention of analyzing the stationary state via a solved diffusion equation,25 which allows for direct comparison between conventional and FP algebraic methods. In the context of the lipid images, let x⃗ = [x y]T describe a pixel coordinate, where x and y denote the distance from the left and bottom barriers, respectively. For both the conventional and FP algebraic methods, the Dxx component of the 2 × 2 diffusion tensor is assumed to be constant across y, so that Dxx(x,y) = Dxx(x). The x-directional drift is assumed to be caused solely by the applied electric field and constant across y (i.e., Ax(x, y) = Ax(x)). Both methods assume the lipids exhibit 12745

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

growth beyond 125 μm is believed to be an artifact of fluorophore quenching.28 Where the traditional approach estimates a fixed ratio of A/ D, FP algebra can produce local estimates of drift and diffusion. To improve the signal-to-noise ratio, the three regions (each 80 pixels by 69 pixels) are considered as a single (240 pixel by 69 pixel) image for each of the first five observation times, as seen in the lower panel of Figure 2. The stationary state images observed at 58 min were not included in this analysis as the exposure time for this image differed from those at earlier times. The top panel in Figure 4 displays the marginal intensity

Figure 2. Epi-fluorescence microscopy images of micropatterned, supported lipid bilayer. (Upper) Raw images at 0 and 23 min. Numbers (1, 2, 3) identify regions used for analysis. (Lower) Enlarged time lapse images of three micropatterned regions from supported lipid bilayer.

FP dynamics and that no lipid escapes its respective region. As described in the Discussion section, these assumptions allow one to extract Dxx(x) and Ax(x) by analyzing the column sums of the intensity matrices. In the following description, the x component subscripts are omitted (i.e., Dxx(x) is denoted as D(x), and Ax(x) is denoted as A(x)). The conventional method28 for estimating diffusion for this system requires the additional assumption that both drift and diffusion are constant within each region, so that D(x) = D and A(x) = A. Under these assumptions, the stationary state solution to the FP equation takes the form f(x) = becx, where b accounts for the arbitrary scaling of intensity units and c provides an estimate of A/D. When fitting this function to the profile of average intensity within each column (i.e., marginal profile), an additive intercept allowing for background intensity is also included. Figure 3 displays the steady state marginal

Figure 4. Lipid image intensity profiles along with corresponding drift and diffusion (log scale) profile estimates. Vertical lines provide estimated pointwise uncertainties of ±2 standard deviations. For drift and diffusion, the time pair used for each estimate is indicated by the colors of the central estimate and uncertainty bounds.

profiles as a function of distance from the left barrier for each time (following background subtraction described in the Supporting Information). Local estimates of drift and diffusion and corresponding uncertainties are shown in Figure 4. Additional details for FP analysis of the lipids data are provided in the Supporting Information. As discussed above, separate estimates of drift and diffusion are obtained from each pair of observed distributions. For a given x, some time pairs will produce better estimates of A(x) and D(x) than others. When f(x, t), ((∂f(x,t))/∂x) or ((∂F(x,t))/∂t) are nearly zero for either distribution, the resulting estimates of A(x) and D(x) are sensitive to noise and small biases. That is, small, flat, or temporally unchanging portions of a distribution can produce highly volatile estimates. For example, the initial intensity distribution is flat at all locations, so estimates formed with this distribution are unstable. Similarly, the distributions observed at 23 and 28.25 min are flat and small until roughly 60 μm and 90 μm, respectively. Applying eq 5 in such regions can produce unstable and even negative estimates of D, which are excluded from this plot. Additionally, the quenching effect seen in Figure 3 appears to be significant at intensities above roughly 65% of the maximum intensity. For this reason, we exclude estimates formed from estimated densities above the orange horizontal threshold shown in the top panel of Figure 4. By visual inspection, the results from FP algebra are generally consistent with a constant drift around 0.09 μm/s and a constant diffusion around 1 μm2/s, which is comparable to the estimate of 1.3 ± 0.2 μm2/s reported by Halter et al. for a similar system. The FP

Figure 3. Lipids stationary state intensity profiles (58 min) and corresponding drift to diffusion ratios estimated via conventional method.

profiles from three different micropatterned regions (58 min after electric field exposure). For each region, a series of threeparameter exponentials (f(x) = a + becx) were fit initially using pixels within (0, 110) μm of the left barrier. The domain was extended one pixel at a time to (0, 125) μm, producing a sequence of eight three-parameter exponential fits for each region. These fits and the range of the exponential coefficients are shown in Figure 3. These results imply the ratio of A/D lies between (0.08, 0.11) μm−1. The fall off from exponential 12746

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

Figure 5. Observed population distributions (4 out of 8) and estimated landscape, drift, and diffusion profiles corresponding to simulation of 100 000 random walkers. Simulations reflected in this figure use constant diffusion coefficient profiles. Pointwise 95% confidence intervals are provided as thin gray lines for all but the first row. The green smoothed estimates of potential, drift, and diffusion profiles are Bayesian model averages computed over a sequence of polynomials. p-Values resulting from our suggested test for constant diffusion are reported in top left corner of each diffusion plot. (p-Values less than 0.05 indicate that data provide strong evidence that the true diffusion profile is not constant.)

Figure 6. See caption from Figure 5. Simulations reflected in this figure use nonconstant diffusion coefficient profiles.

algebra results are also in agreement with the A/D ratios reported above from the traditional analysis. We note that an experiment designed for FP algebra would ideally focus more heavily on the evolution, rather than stationary state, of lipid intensities by more frequently collecting images at early times. Drift and Diffusion Profile Estimates from Simulations. Our suggested drift and diffusion estimators are direct consequences of the FP equation, independent of the complexity of the true (time-independent) drift and diffusion profiles, provided the densities remain differentiable. To illustrate this flexibility, we simulate the temporal evolution of populations of 100 000 random walkers for six different drift profiles. Figures 5 and 6 display results for simulations with constant diffusion and nonconstant diffusion coefficient

profiles, respectively. In both cases, the provided plots display the observed distributions at several times along with true and estimated drift and diffusion profiles. For visual interpretability, we also include the potential landscape, which is given by the negative integral of the drift (i.e., U(x) = −∫ A(x) dx). In each of the 12 simulation scenarios presented in Figures 5 and 6, our methodology for assessing constant diffusion correctly distinguishes between the true diffusion profile being constant or nonconstant. That is, our suggested test for constant diffusion produced p-values greater than 0.05 for each of the six scenarios presented in Figure 5 with constant true diffusion profiles and produced p-values less than 0.05 for each of the six scenarios presented in Figure 6 with nonconstant true diffusion 12747

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

profiles. Simulation and analysis protocols are described in the Supporting Information.

estimates can be obtained from the portions of intensity profiles that are above the background intensity and below the quenching threshold. Time-Dependent Profiles and Boltzmann’s H-Function. Theoretically, FP algebra could be used to estimate timedependent drift and diffusion profiles by imposing constraints along x rather than t to the forms of A(x, t) and D(x, t). Such a scenario would still require observing the population at multiple times to facilitate the estimate of ((dF(t))/dt). This could be useful for systems with a known physical symmetry. Alternatively, Hubbard et al.33 presented a method to estimate a time-dependent diffusion coefficient from population distributions using relative entropy, also called Boltzmann’s H-function



DISCUSSION From Multi- to Unidimensional. The proposed methodology is most stable in one dimension. Some multidimensional systems can be effectively analyzed as a (set of) onedimensional system(s). If dimension i satisfies Ai(x⃗, t) ≈ A*i (xi, t) and Dii(x)⃗ ≈ Dii(xi⃗ ), then the evolution of dimension i does not depend upon the other dimensions. In this case, dimension i may be accurately treated as a one-dimensional FP process and analyzed through its marginal distribution, f xi(x) = ((dFxi(x))/dx), where Fxi(x) = Pr (xi < x) is the cumulative distribution function of xi. For example, the lipid transport data was collected as two-dimensional images, and we extracted onedimensional drift and diffusion profiles (as functions of column index/horizontal position) by analyzing the marginal (x-axis) intensity profiles. In doing so, we implicitly assumed that the horizontal component of lipid drift and diffusion was constant across distance from the bottom boundary (y). Alternatively, in the event that a single variable evolves much more slowly that the remaining ones, the one-dimensional L/ FP dynamics can be extracted via the method of “adiabatic elimination of fast variables.” Examples of adiabatic elimination include statistical properties of laser light12 and descriptions of electromagnetic29 or hydrodynamic1 fluctuations. Generally speaking, it may be possible to identify a 1-d (onedimensional) order parameter that captures the essential physical features of the full multidimensional problem.30−32 For instance in the protein folding kinetics, one can simply count the number of binary amino acid contacts that correspond to the correctly folded 3-d structure and then define the order parameter as the fraction of these “native contacts.” Insofar as this fraction evolves continuously in time with a short-term memory of previous contacts, a 1-d L/FP projection provides a concise and predictive model for the folding of many real proteins.2 Normalization Invariance. The FP equation remains valid for f *(x, t) = kf(x, t) for any real k. Thus, using FP algebra does not require the probability density function f(x, t) to be normalized (i.e., ∫ f (x , t ) dx = 1), provided the (unknown)  normalization constant k is fixed across both x and t. In many cases, one can still use FP algebra with censored and truncated data even if, at every time t, there are regions along the reaction coordinate where f(x, t) is unknown (Supporting Information). The experimental significance of this result is that the described methods often can be applied when parts of the estimated distributions are subject to common instrumental artifacts such as finite dynamic range or saturation/nonlinear response. The fluorescence microscopy data of the supported lipid bilayer illustrates these points. Due to self-quenching of the fluorescent dye molecules when the local concentration is high, we do not know the exact normalization factor (i.e., total intensity of lipid fluorophores). However, we assume that relationship between pixel intensity and lipid concentration is time invariant, so for low intensities the normalization factor is fixed. Relatedly, outside a fixed intensity range, pixel intensity is not proportional to lipid concentration and the marginal intensity profile is not representative of the distribution of lipids. That is, there are regions of x for which f(x, t) is positive but cannot be accurately estimated. As the lipids results show, accurate drift and diffusion

H(ti , t j) =

⎡ f (x , t ) ⎤ i ⎥ f (x , ti)ln⎢ dx −∞ ⎢⎣ f (x , t j) ⎥⎦





(6)

Several restrictions limit the utility of their estimator. If the relaxing distribution is not observed to converge to a near stationary state, their estimator cannot be implemented. Additionally, if the observed population narrows or shifts in time or contains any regions of x in which f(x,ti) cannot be well-estimated, H cannot be computed. For systems where H can be computed, this function may provide guidance to three aspects of FP algebra. First, suppose that tI is the last time point at which data were collected from the evolving population. Under L/FP dynamics, the second law of thermodynamics requires that H(ti, tI) decreases in i for i = 1,..., I − 1. Significant deviations from a monotonic decrease in H(ti, tI) provide strong evidence against the assumption of L/ FP dynamics, upon which the FP algebraic methods described in this paper are based. Second, when choosing from a set of lower (e.g., one) dimensional marginal projections of the full joint probability, one might identify the (single) optimal coordinate as that for which the projected H function, Hproj, most closely matches the upper bound, H, computed from the full joint probability. Third, the evolution of a distribution between time pairs with small H values contains little information regarding A and D. For systems with a stationary state, f(x,∞), H(t,∞) can be thought of as the information in the population remaining at time t about D and A. As described in the following section, this consideration can help determine when to observe a relaxing distribution. When To Observe a Relaxing Population. Using the proposed method, drift and diffusion profiles can be extracted from probability density functions estimated from a single pair of time points. However, if data are collected at only two distinct times, the quality of those estimates will be severely limited by the estimation of ((∂F(x, t))/∂t), the local slope of an arbitrary curve, using only two points. Obtaining density profiles at many times is recommended as doing so improves estimates of ((∂F(x, t))/∂t) and increases the number of time pairs available to estimate D and A. We also recommend observing the population most frequently at early times. For a fixed landscape and diffusion, a perturbed system tends to relax most rapidly at early times. Also, frequent early observations help to accurately fit a nonparametric curve near the boundaries (i.e., where it is hardest to estimate ((∂F(x,t))/∂t)). Plotting the relative entropy, or another measure of (pseudo)distance, between the estimated density from each observation time and that from the final observation time from a preliminary experiment can help assess when to observe a system. For a 12748

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749

The Journal of Physical Chemistry B

Article

fixed number of observation times, it is beneficial to vary the time intervals between observations such that the ensuing relative entropies are evenly spread. When no previous data is available to estimate the relative entropies, we recommend sampling from a relaxing population in a manner such that the sequence of times separating observations increases exponentially (e.g., ti+1 − ti = 2(ti − ti−1)), on the basis of the exponential decay of relative entropy for normal distributions in a parabolic potential.12 Choosing evenly spaced observation times can adversely affect both bias and uncertainty in drift and diffusion estimates. We expound upon and illustrate these points in the Supporting Information.



(13) Berezhkovskii, A.; Szabo, A. Time Scale Separation Leads to Position-dependent Diffusion Along a Slow Coordinate. J. Chem. Phys. 2011, 135, 074108. (14) McQuarrie, D. A. Statistical Mechanics; Harper & Row: New York, 1976. (15) Zwanzig, R. Time Correlations and Transport Coefficients is Statistical Mechanics. Annu. Rev. Phys. Chem. 1965, 16, 67−102. (16) Straub, J. E.; Borkovec, M.; Berne, B. J. Calculation of Dynamic Friction on the Intramolecular Degrees of Freedom. J. Phys. Chem. 1987, 91, 4995−4998. (17) Straub, J. E.; Borkovec, M.; Berne, B. J. Molecular Dynamics Study of an Isomerizing Diatomic in a Lennard-Jones Fluid. J. Chem. Phys. 1988, 89, 4833−4847. (18) Straub, J. E.; Berne, B. J.; Roux, B. Spatial Dependence of Timedependent Friction for Pair Diffusion in a Simple Fluid. J. Chem. Phys. 1990, 93, 6804−6812. (19) Zwanzig, R. Diffusion Past an Entropy Barrier. J. Phys. Chem. 1992, 96, 3926−3930. (20) Best, B.; Hummer, G. Coordinate-dependent Diffusion in Protein Folding. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 1088. (21) Hinczewski, M.; von Hansen, Y.; Dzubiella, J.; Netz, R. R. How the Diffusivity Profile Reduces the Arbitrariness of Protein Folding Free Energies. J. Chem. Phys. 2010, 132, 245103. (22) Sisan, D. R.; Halter, M.; Hubbard, J. B.; Plant, A. L. Predicting Rates of Cell State Change Caused by Stochastic Fluctuations Using a Data-Driven Landscape Model. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 19262−7. (23) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012; ISBN 3-900051-07-0. (24) Raftery, A. Bayesian Model Selection in Social Research. Sociol. Methodol. 1995, 25, 111−163. (25) Halter, M.; Nogata, Y.; Dannenberger, O.; Sasaki, T.; Vogel, V. Engineered Lipids that Cross-Link the Inner and Outer Leaflets of Lipid Bilayers. Langmuir 2004, 42, 2416−2423. (26) Saffman, P. G.; Delbrück, M. Brownian Motion in Biological Membranes. Proc. Natl. Acad. Sci. U.S.A. 1975, 72, 3111−3. (27) Ulman, A. Characterization of Organic Thin Films; MomentumPress: New York, 2010. (28) Groves, J. T.; Boxer, S. G. Electric Field-induced Concentration Gradients in Planar Supported Bilayers. Biophys. J. 1995, 69, 1972−5. (29) Landau, L. D.; Lifshitz, E. M. Electrodynamics of Continuous Media, 1st ed.; Addison-Wesley: Reading, MA, 1960. (30) Berezhkovskii, A.; Szabo, A. One-Dimensional Reaction Coordinates for Diffusive Activated Rate Processes in Many Dimensions. J. Chem. Phys. 2005, 122, 014503. (31) Krivov, S. V.; Karplus, M. One-Dimensional Free-Energy Profiles of Complex Systems: Progress Variables That Preserve the Barriers. J. Phys. Chem. B 2006, 110, 12689−98. (32) Krivov, S. V.; Karplus, M. Diffusive Reaction Dynamics on Invariant Free Energy Profiles. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 13841−6. (33) Hubbard, J.; Lund, S.; Halter, M. Boltzmann’s H-Function and Diffusion Processes. J. Phys. Chem. B 2013, 117, 12836−12842.

ASSOCIATED CONTENT

S Supporting Information *

Details regarding implementation of the proposed method, the data processing applied to the lipids data set, and the simulations. Further discussions regarding the analysis of censored and truncated data, when to observe relaxing distributions, and Fokker−Planck algebra in multiple dimensions. This material is available free of charge via the Internet at http://pubs.acs.org/.



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. *E-mail: [email protected]. *E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors thank David Ross, Andrew Ruhkin, and Elizabeth Strychalski from NIST for their useful feedback regarding the methodology and manuscript.



REFERENCES

(1) Landau, L. D.; Lifshitz, E. M. Fluid Mechanics, 1st ed.; AddisonWesley: Reading, MA, 1959. (2) Onuchic, J. N.; Luthey-Schulten, Z.; Wolynes, P. G. Theory of Protein Folding: the Energy Landscape Perspective. Annu. Rev. Phys. Chem. 1997, 48, 545−600. (3) Muñoz, V.; Eaton, W. A. A Simple Model for Calculating the Kinetics of Protein Folding from Three-Dimensional Structuress. Proc. Natl. Acad. Sci. U.S.A. 1999, 98, 11311−6. (4) Sasai, M.; Wolynes, P. G. Stochastic Gene Expression as a Manybody Problem. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 2374−2379. (5) Feller, W. An Introduction to Probability Theory and its Applications; John Wiley and Sons: New York, 1971; Vols. 1 and 2. (6) Gradis̆ek, J.; Siegert, S.; Friedrich, R.; Grabec, I. Analysis of Time Series from Stochastic Processes. Phys. Rev. E 2000, 62, 3146−3155. (7) Siegert, S.; Friedricha, R.; Peinke, J. Analysis of Data Sets of Stochastic Systems. Phys. Lett. A 1998, 243, 275−280. (8) Fuchs, C. Inference for Diffusion Processes: With Applications in Life Sciences; Springer: New York, 2013; Vol. 1. (9) Poulsen, R. Approximate Maximum Likelihood Estimation of Discretely Observed Diffusion Processes. Ph.D. Thesis, University of Aarhus, 1999. (10) Qian, H.; Sheetz, M. P.; Elson, E. L. Single Particle Tracking. Analysis of Diffusion and Fow in Two-dimensional Systems. Biophys. J. 1991, 60, 910−921. (11) Van Kampen, N. G. Stochastic Processes in Physics and Chemistry; North-Holland: Amsterdam, 1981. (12) Risken, H. The Fokker-Planck Equation, 2nd ed.; SpringerVerlag: New York, 1989. 12749

dx.doi.org/10.1021/jp5084357 | J. Phys. Chem. B 2014, 118, 12743−12749