High-Resolution Nanoparticle Sizing with Maximum A Posteriori


Article

High Resolution Nanoparticle Sizing with Maximum A Posteriori Nanoparticle Tracking Analysis Kevin S. Silmore, Xun Gong, Michael S. Strano, and James W. Swan ACS Nano, Just Accepted Manuscript • DOI: 10.1021/acsnano.8b07215 • Publication Date (Web): 11 Mar 2019 Downloaded from http://pubs.acs.org on March 12, 2019




High Resolution Nanoparticle Sizing with Maximum A Posteriori Nanoparticle Tracking Analysis Kevin S. Silmore, Xun Gong, Michael S. Strano, and James W. Swan∗ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA E-mail: [email protected]

Abstract

The rapid and efficient characterization of polydisperse nanoparticle dispersions remains a challenge within nanotechnology and biopharmaceuticals. Current methods for particle sizing, such as dynamic light scattering, analytical ultracentrifugation, and field-flow fractionation, can suffer from a combination of statistical biases, difficult sample preparation, insufficient sampling, and ill-posed data analysis. As an alternative, we introduce a Bayesian method that we call Maximum A posteriori Nanoparticle Tracking Analysis (MApNTA) for estimating the size distributions of nanoparticle samples from high-throughput single particle tracking experiments. We derive unbiased statistical models for two observable quantities in a typical nanoparticle trajectory (the mean square displacement and the trajectory length) as a function of the particle size and calculate size distributions using maximum a posteriori (MAP) estimation with cross validation to mildly regularize solutions. We show that this approach infers nanoparticle size distributions with high resolution by performing extensive Brownian dynamics simulations and experiments with mono- and polydisperse solutions of gold nanoparticles as well as single-walled carbon nanotubes. We further demonstrate particular utility for characterizing minority components and impurity populations and highlight this ability with the identification of an impurity in a commercially produced gold nanoparticle sample. Modern algorithms such as MApNTA should find widespread use in the routine characterization of complex nanoparticle dispersions, allowing for significant advances in nanoparticle synthesis, separation, and functionalization.

Keywords single particle tracking, Bayesian inference, particle sizing, cross validation, Brownian dynamics, polydispersity

Mean particle size and the distribution of particle sizes are basic yet critical properties of colloidal dispersions that are used in nanotechnology. The size distribution, in particular, is important for assessing the quality of the dispersion or the success of associated synthesis methods that target specific sizes and shapes. Additionally, many technological applications and industrial processes involving colloidal particles are highly dependent on particle size, such as nanoparticle-based drug delivery, 1 the kinetics of pharmaceutical uptake, 2,3 and catalytic activity. 4 While many methods exist to create inorganic particles and quantum dots with near monodisperse distributions, 5 the majority of nanoparticle systems are produced as polydisperse mixtures. Examples include graphene, graphene oxide, 6 clays, 7 and fullerenes. 8 Colloidal instability and aggregation generate undesired broad distributions even for initially monodisperse samples. Many widely used methodologies fail to adequately capture the size distribution in its entirety. Examples include dynamic light scattering and direct imaging techniques such as electron microscopy. On the other hand, techniques such as analytical ultracentrifugation and field-flow fractionation are too cumbersome for routine analysis. There remains a pressing need to rapidly and efficiently characterize the particle size distribution of complex, polydisperse nanoparticle dispersions.

While many techniques that rely on light scattering or direct observation are capable of accurately determining the sizes of particles in a monodisperse sample, they may fail to yield the correct size distribution of a polydisperse sample or even the correct shape of a wide unimodal distribution. This failure is not simply due to inadequate instrumental precision; rather, as will be discussed, it is a direct mathematical consequence of the fact that these techniques rely on ill-posed model formulations that are highly sensitive to experimental noise, are not necessarily guaranteed to produce unique solutions for a given set of data, and are consequently prone to generating spurious results without aggressive regularization.

The most common techniques for sizing colloidal particles involve light scattering, the application of external fields, or microscopy. Of these techniques, light scattering is often used in research due to its relative simplicity and the existence of many commercially available instruments. Light scattering experiments can be categorized as either static or dynamic, where intensity data is collected as scattering angle is varied or as time passes, respectively.
Of the two, dynamic light scattering (DLS) is more often applied to size colloidal dispersions, whereas static light scattering is more commonly applied to the analysis of macromolecules. Laser diffraction and UV-Vis spectroscopy are additional techniques for sizing particles, but because they are largely limited to particle sizes greater than several hundred nanometers 9 and metallic particles with strong surface plasmon resonance, respectively, we will not address them here. In a DLS experiment, the scattering vector, $q$, is fixed and the autocorrelation of the intensity of scattered light, $g^{(1)}(q, t)$, is calculated. 10,11 It can be shown that for a monodisperse solution of spherical particles,

$$ g^{(1)}(q, t) = \exp(-\Gamma t) = \exp\left(-D q^2 t\right), \tag{1} $$

where $D = k_B T/(6\pi\eta R)$ by the classic Stokes-Einstein relation, $R$ is the radius of the particle, and $\eta$ is the viscosity of the surrounding solution. It is further assumed that solutions are dilute enough such that hydrodynamic interactions are negligible and do not affect the intensity autocorrelation. For a polydisperse solution, each particle size in the sample contributes to the observed autocorrelation function of the intensity such that

$$ g^{(1)}(q, t) = \int_0^\infty p(R) \exp\left(-D(R) q^2 t\right) \, dR, \tag{2} $$

where $p(R)$ is the size distribution, or probability density function (PDF), for hydrodynamic radius and $D(R)$ reflects the size dependence of the diffusion coefficient. Again, this equation is valid for dilute solutions and small enough particles such that Rayleigh scattering is dominant. Calculation of the size distribution $p(R)$ involves inverting equation 2 given observed autocorrelation data. Equation 2 is a modified Laplace transform of the size distribution, but even more generally, equation 2 represents a Fredholm integral equation of the first kind. These integral equations are usually written in the general form

$$ g(t) = \int K(t, x) f(x) \, dx, \tag{3} $$

where $K$ is known as the kernel. For example, in the case of DLS, the kernel is $\exp(-D(R) q^2 t)$. It is well known that for smooth kernels, Fredholm integral equations are fundamentally ill-posed, meaning that solutions are unstable, are not necessarily unique, and can depend discontinuously on the observed data. 12 That is, for DLS, any noise in the autocorrelation function can directly confound calculation of the underlying size distribution and lead to spurious results. This ill-posedness is exhibited by other methods that rely on analysis of an autocorrelation function, such as X-ray scattering 13 and fluorescence correlation spectroscopy. 14 Techniques such as the method of cumulants 15 and CONTIN 10 attempt to address this issue by only focusing on moments of the distribution and by employing regularization, respectively, but information is inevitably lost in the process. Ultimately, though, the underlying difficulties associated with analyzing DLS data of polydisperse solutions (not to mention intensity biases and signal dominance by larger particles) necessitate cautious interpretation of results.

Elution or sedimentation-based methods such as field-flow fractionation (FFF), 9,16,17 gel electrophoresis for charged particles, 18 and analytical ultracentrifugation (AUC), 19,20 unlike light scattering methods, depend on the difference in hydrodynamic mobility among particles of different sizes subjected to cross flow or external fields. Both techniques benefit from the versatility of being able to characterize a wide range of possible particle sizes and have been applied to analyze various types of nanoparticles 21–23 and proteins. In AUC experiments, a sample is placed in a specially designed, optically transparent centrifugal cell. The sample is then spun at rapid speeds and absorption profiles (proportional to concentration via the Beer-Lambert law) are measured along the radial direction of the cell at many points in time (usually over a period of hours in the case of nanoparticles). The observed concentration profiles over time represent contributions from each particle size in the sample, so the size distribution must be calculated from the following equation

$$ a(r, t) = \int_0^\infty p(R) \, L(s(R), D(R), r, t) \, dR, \tag{4} $$

where $a(r, t)$ is the observed absorption profile over the length of the cell and time, and $L(s(R), D(R), r, t)$ is a (numerical) solution to the Lamm equation. The Lamm equation is an advection-diffusion equation that depends on the diffusivity of the particles as well as their sedimentation coefficients, $s$. The sedimentation coefficient, in turn, is given by $s = m(1 - \bar{\nu}\rho_w)/(6\pi\eta R)$, where $m$ is the mass of the particle, $\bar{\nu}$ is its partial specific volume relative to the solvent (usually water), and $\rho_w$ is the density of the solvent. 24

While FFF and AUC can both be used to estimate the sizes of monodisperse samples of proteins and other nanoparticles, certain biases and limitations inherent to the techniques can hinder the accurate determination of a complete size distribution. In FFF, particle-membrane interactions can influence the resulting size distribution, 16 and resolution is inherently limited by the stochastic nature of particle diffusion. This mechanism for resolution loss also applies to gel electrophoresis and similar methods. Like the governing equation of DLS, equation 4 for AUC analysis also represents a Fredholm integral equation of the first kind. The kernel is now the solution to the Lamm equation for a specific size, which is smooth for most practical scenarios and guarantees that equation 4 is also ill-posed. Regularization methods, such as maximum entropy regularization, must be employed alongside fast numerical Lamm equation solvers (e.g., in Sedfit 19 ) to obtain approximate, smooth solutions for $p(R)$. Furthermore, numerical Lamm equation solutions depend strongly on values of the sedimentation coefficient, $s$, and consequently require accurate knowledge of the particles’ specific volume, which is often not known in advance. Additionally, with both methods, it is often unclear how surface charges 25 as well as hydrodynamic effects 26 due to particle-particle and particle-wall interactions influence the resulting size distributions, as these complex interactions are neglected in typical analyses. In fact, for neutrally charged, nonspherical particles, hydrodynamic effects can manifest at volume fractions below ∼7%. 27 Strictly speaking, AUC experiments can be run at different concentrations and the sedimentation coefficient of a particle in the absence of confounding hydrodynamic
effects can be estimated by extrapolating to the zero concentration limit. 20 This procedure, though, is quite laborious and is not well defined for a polydisperse sample.

Direct observation of particles via microscopy or single particle tracking represents a final commonly employed category of particle size characterization. In order to determine the size distribution or degree of polydispersity of synthesized nanoparticles, researchers often deposit the particles on a substrate and conduct scanning electron, transmission electron, or atomic force microscopy. From micrographs, one can then measure individual particle sizes by hand or via image processing algorithms to create a histogram of sizes. 28 Such methods are especially common for particles, such as metallic nanoparticles and carbon nanotubes, that do not change structure upon deposition. That being said, microscopy of deposited samples may suffer from adsorption biases and inherently does not allow for the study of aggregation states in solution, which is important for many applications.

In this work, we address the lack of available techniques to reliably determine particle size distributions of arbitrarily polydisperse samples by developing a statistically robust, Bayesian framework that can efficiently and accurately infer size distributions from single particle tracking data. The resulting distributions are in general much more accurate than classical analysis based solely on mean square displacement. The method is named Maximum A posteriori Nanoparticle Tracking Analysis (abbreviated MApNTA and pronounced “manta”). Additionally, unlike other maximum likelihood algorithms, MApNTA employs cross validation, a common technique in applied statistics and machine learning, to ensure that distributions are not overfitted to the data. We first derive the theory underlying MApNTA, the benefits of using cross validation, and a quantitative metric for estimating confidence in resulting size distributions. Next, we validate MApNTA on many different Brownian dynamics simulations and compare MApNTA’s accuracy to a recent, state-of-the-art algorithm for analyzing size distributions developed by Walker. 29 Convergence of MApNTA as a function of the number of trajectories collected is also discussed. Finally, we conduct single particle tracking experiments with gold nanoparticles and carbon nanotubes to demonstrate its efficacy.
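The ill-posedness of the DLS inversion (equation 2) discussed in this introduction can be made concrete with a small numerical sketch. The Python snippet below evaluates a discrete form of equation 2 for two size distributions, a monodisperse sample and an equal-weight bimodal mixture, and reports the largest difference between the two autocorrelation curves. The temperature, viscosity, laser wavelength, refractive index, scattering angle, and radii are illustrative assumptions rather than values from this work, and all function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def stokes_einstein(radius_m, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """D = k_B T / (6 pi eta R); T and eta are assumed values for water."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_m)

def scattering_vector(wavelength_m=633e-9, refractive_index=1.33, angle_deg=90.0):
    """Magnitude of q for an assumed wavelength, solvent index, and detection angle."""
    half_angle = math.radians(angle_deg) / 2.0
    return 4.0 * math.pi * refractive_index / wavelength_m * math.sin(half_angle)

def autocorrelation(t, radii_m, weights, q):
    """Discrete form of equation 2: sum_k p_k exp(-D(R_k) q^2 t)."""
    return sum(w * math.exp(-stokes_einstein(r) * q * q * t)
               for r, w in zip(radii_m, weights))

q = scattering_vector()
gamma_0 = stokes_einstein(100e-9) * q * q   # decay rate of 100 nm particles
times = [10.0 * k / (1000 * gamma_0) for k in range(1001)]  # 0 to 10 / gamma_0

# Two different (assumed) size distributions with the same mean decay rate.
mono = [autocorrelation(t, [100e-9], [1.0], q) for t in times]
bimodal = [autocorrelation(t, [90e-9, 112.5e-9], [0.5, 0.5], q) for t in times]

max_diff = max(abs(a - b) for a, b in zip(mono, bimodal))
print(f"decay rate of 100 nm particles: {gamma_0:.0f} 1/s")
print(f"largest difference between the two curves: {max_diff:.4f}")
```

Both curves start at 1, and under these assumptions they differ by less than one percent of that amplitude everywhere, so a modest amount of measurement noise would make the two distributions hard to tell apart from the autocorrelation data alone.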

Theory

Single particle tracking (SPT) is capable of addressing these issues and is commonly used in analytical and biological settings 30,31 in order to calculate the diffusivity of single particles or an ensemble of particles in a fluid environment. Conventional SPT instrument setups consisting of a video camera and an optical microscope are limited to observation of particles that are larger than the diffraction limit (∼1 µm). However, the commercial development of SPT instruments has enabled researchers to directly observe the diffusive motion of nanoparticles via the use of forward scattered laser light with dark-field imaging. 32,33 In a typical particle tracking experiment, videos are collected of diffusing particles. Particle trajectories are then extracted during post-processing using a blob tracking algorithm such as that of Crocker and Grier, 34 which is often implemented alongside commercial instrumentation or as part of freely available software packages like Trackpy. 35 It is during this step that biases such as drift flows and non-uniform scattering may be incorporated into the data if they are present. For each particle trajectory of $d$ steps (i.e., observed in $d + 1$ consecutive video frames), an empirical mean square displacement (MSD),

$$ \bar{\rho} = \frac{1}{d} \sum_{i=1}^{d} (\Delta x_i)^2, \tag{5} $$

can be used to estimate $D$ via Einstein’s 36 classic fluctuation-dissipation formula

$$ \langle \rho \rangle = 4 D \Delta t. \tag{6} $$

$\Delta t$ is the duration of time between frames (the inverse of the frame rate), $\Delta x_i$ is the imaged displacement of the particle between frames $i - 1$ and $i$, and $\langle \cdot \rangle$ is the long-time mean. Here, it is assumed that the imaged motion is in two dimensions (leading to the value of 4 for the coefficient) and the solution is sufficiently dilute such that hydrodynamic interactions can be neglected. Since Brownian motion is independent in each dimension, it suffices to consider only a particle’s projected 2D motion. In the standard approach, a histogram of diffusivities from each trajectory can then be compiled and readily converted into a proper size distribution given the temperature and viscosity of the solvent. Because diffusional motion is inherently stochastic, the empirical mean $\bar{\rho}$ is only equal to the true mean $\langle \rho \rangle$ in the long-time limit as $d \to \infty$. Much work has thus been dedicated over the past two decades to measuring trajectories accurately 37–40 and deriving better statistical estimators for individual tracks in various environments. 41–44 An unappreciated detail of this analysis is that for polydisperse samples, the histogram resulting from binning estimated diffusivities from individual trajectories only converges to the true $p(R)$ as both the number of tracks and the length of tracks approach infinity. That is, even with an infinite number of observed trajectories of finite length, the resulting size distribution will not, in general, converge to the true distribution.

Figure 1: Simulation results demonstrating that the mean square displacements of finite particle trajectories are fundamentally unable to reconstruct the original size distribution. (A) The original monodisperse size distribution in the form of a Dirac delta function corresponding to particles of 100 nm radius exhibits (B) distributions $p(\bar{\rho})$ with decreasing variance in $\bar{\rho}$ as the observation duration $d\Delta t$ is increased from 1 to 100$\Delta t$ as shown ($\Delta t$ = 1/25 s). (C) The resulting estimated size distributions $p(R)$ are necessarily erroneous for finite trajectories, highlighting the need for a different approach.
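The smearing described here and illustrated in Figure 1 can be reproduced qualitatively with a short Monte Carlo sketch. The Python code below simulates two-dimensional Brownian tracks of a monodisperse 100 nm sample, applies equations 5 and 6 to each track, and converts the result to a radius with the Stokes-Einstein relation. It is a simplified illustration, not the simulation used in this work: static tracking error and motion blur are ignored, the temperature and viscosity are assumed values for water, and the function names, track count, and seed are hypothetical.

```python
import math
import random
import statistics

K_B = 1.380649e-23  # Boltzmann constant, J/K
TEMP_K = 298.15     # assumed temperature, K
ETA = 8.9e-4        # assumed viscosity of water, Pa s
DT = 1.0 / 25.0     # time between frames, s (value quoted in the Figure 1 caption)

def diffusivity(radius_m):
    """Stokes-Einstein diffusion coefficient."""
    return K_B * TEMP_K / (6.0 * math.pi * ETA * radius_m)

def empirical_msd(radius_m, d, rng):
    """Equation 5 for one simulated 2D Brownian track of d steps."""
    sigma = math.sqrt(2.0 * diffusivity(radius_m) * DT)  # per-axis step std. dev.
    squared_steps = (rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                     for _ in range(d))
    return sum(squared_steps) / d

def radius_from_msd(msd):
    """Equation 6 solved for D, then Stokes-Einstein solved for R."""
    d_est = msd / (4.0 * DT)
    return K_B * TEMP_K / (6.0 * math.pi * ETA * d_est)

def estimated_radii(radius_m, d, n_tracks, seed=0):
    rng = random.Random(seed)
    return [radius_from_msd(empirical_msd(radius_m, d, rng)) for _ in range(n_tracks)]

def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

R_TRUE = 100e-9  # true radius of every simulated particle, m
spread = {}
for d in (1, 10, 100):
    radii = estimated_radii(R_TRUE, d, n_tracks=2000)
    spread[d] = interquartile_range(radii)
    print(f"d = {d:3d}: median R = {statistics.median(radii) * 1e9:6.1f} nm, "
          f"interquartile range = {spread[d] * 1e9:6.1f} nm")
```

Every simulated particle has the same radius, yet the histogram of per-track estimates has a finite width that narrows only as the track length d grows. Collecting more tracks of the same length would sharpen the estimate of that width, not reduce it.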

In this sense, then, calculation of $p(R)$ is not ergodic. The right panel of Figure 1 demonstrates this concept with a perfectly monodisperse solution and illustrates the characteristic ‘smearing’ of estimated size distributions given an infinite number of trajectories of different, finite lengths. Of course, while one may be able to observe a substantial number of tracks, observing tracks of infinite length is not feasible in practice due to the tendency of particles to diffuse out of the camera’s focal volume. We find that relatively little has been contributed regarding the development of better statistical methods for the analysis of complete size distributions for polydisperse samples, where they are greatly needed given finite amounts of experimental data. Previous work in this area has largely focused on single particles in specific environments 45–47 or inferring particle size distributions that are known to obey a specific, unimodal functional form, such as the maximum likelihood method of Kato and coworkers. 48 Naiim et al. 49 developed a successful Bayesian inference method to calculate a discrete size distribution for particles with similar scattering intensities using data collected from multi-angle DLS experiments. Additionally, Walker 29,33,50 developed an iterative maximum likelihood algorithm (or EM-type algorithm) for arbitrary size distributions referred
to as FTLA for “finite track length adjustment.” However, FTLA only makes use of displacement data, and while convergence of the algorithm is guaranteed (only for sufficiently large data sets), the proposed stopping criterion for the iterative algorithm has a large effect on the resulting distribution, does not have a clear statistical basis, and is prone to overfitting or oversmoothing of size distributions, as discussed below. A general statistical treatment of single particle tracking data for estimation of size distributions is lacking.

In this work, we chose to focus on analysis of SPT sizing methods because of the wealth of data collected during an experiment and the fact that, unlike other sizing methods, SPT experiments are efficiently run and can be continuously operated. Much of the data in a typical SPT experiment is unused by existing algorithms even though it lends itself to thorough statistical analysis.

In a single particle tracking (SPT) experiment and after applying a given blob tracking algorithm, one is left with a set of trajectories, or tracks, of particles’ positions in a series of frames. More formally, one has a set $\{\{x_i^{(m)}\}_{i=0 \ldots d^{(m)}}\}_{m=1 \ldots M}$, where $M$ is the number of total trajectories observed, $x_i^{(m)}$ is the position of particle $m$ in video frame $i$, and $d^{(m)}$ is the total number of steps for which particle $m$ was observed. (When discussing the behavior of a generic particle, the superscript notation will often be omitted.) From these trajectories, displacements can be calculated such that $\Delta x_i = x_i - x_{i-1}$ represents the change in position from video frame $i - 1$ to video frame $i$. Certain fixed quantities of the experimental conditions (specifically, temperature $T$, solvent viscosity $\eta$, the dimensions of the focal volume $(L_x, L_y, L_z)$, the exposure time of the camera $t_E$, the duration between video frames $\Delta t$, and the static tracking error) are assumed to be known. If not known, they can be calculated with the use of a standard sample and maximum likelihood estimation with the appropriate distribution. 37

As mentioned above, size distributions for a sample of particles can be created by estimating the diffusion coefficients from MSDs. If a spherical particle is observed for $d$ steps over frames of duration $\Delta t$ (i.e., seen in $d + 1$ frames), then the estimated diffusion coefficient for its trajectory is

$$ \tilde{D} = \frac{\sum_{i=1}^{d} (\Delta x_i)^2}{4 d \Delta t}. \tag{7} $$

A histogram of $\tilde{D}$ for each trajectory can then easily be converted to a histogram for hydrodynamic radius, but doing so neglects the stochastic nature of the collected data. In the most general sense, given data collected from SPT experiments, the fundamental equation that governs the data collected is the following:

$$ p(\mathrm{data}) = \int_0^\infty p(\mathrm{data} \mid R) \, p(R) \, dR, \tag{8} $$

where $p(\mathrm{data})$ is the joint probability distribution of the data collected, and $p(\mathrm{data} \mid R)$ is the conditional distribution given the hydrodynamic radius of a particle. In this study, we accounted for several observable quantities for each particle track in an SPT experiment: (1) the MSD, $\bar{\rho}$, (2) the number of steps over which the particle is observed, $d$, and (3) the location of the particle in the frame in which it is first seen, $x_0$. Thus, $p(\mathrm{data}) = p(\bar{\rho}, d, x_0)$, and the conditional distribution with respect to radius is

$$ p(\bar{\rho}, d, x_0 \mid R) = p(\bar{\rho} \mid d, R) \, p(d \mid x_0, R) \, p(x_0 \mid R). \tag{9} $$

The conditional independence of $\bar{\rho}$, $d$, and $x_0$ fortunately allows for the development of individual statistical models for each conditional distribution, which are detailed in the following sections and in the Supporting Information. Throughout, it will be assumed that only the particles’ positions in 2D space are known from videos of trajectories ($x_0 \in \mathbb{R}^2$) even though particles are really diffusing in 3D space.
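As a concrete instance of the first factor in equation 9, consider an idealized case in which the displacements are independent two-dimensional Gaussian steps and no exposure-time or static-error corrections are applied. Under those assumptions, which are a simplification of the full model given in the Supporting Information, each squared step is exponentially distributed with mean $4D\Delta t$, so $\bar{\rho}$ follows a gamma distribution with shape $d$ and mean $4D\Delta t$. The Python sketch below evaluates that density and checks its normalization and moments numerically; the physical constants and function names are assumptions chosen for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def diffusivity(radius_m, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein diffusion coefficient (assumed T and eta for water)."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_m)

def msd_pdf(rho_bar, d, radius_m, dt=1.0 / 25.0):
    """Idealized p(rho_bar | d, R): gamma density with shape d and scale 4 D dt / d."""
    scale = 4.0 * diffusivity(radius_m) * dt / d
    log_p = ((d - 1) * math.log(rho_bar) - rho_bar / scale
             - math.lgamma(d) - d * math.log(scale))
    return math.exp(log_p)

def moments(d, radius_m, dt=1.0 / 25.0, n=20000):
    """Midpoint-rule estimates of the normalization, mean, and standard deviation."""
    mean_true = 4.0 * diffusivity(radius_m) * dt
    h = 30.0 * mean_true / n            # integrate from 0 to 30 times the mean
    norm = mean = second = 0.0
    for k in range(n):
        rho = (k + 0.5) * h
        p = msd_pdf(rho, d, radius_m, dt) * h
        norm += p
        mean += rho * p
        second += rho * rho * p
    return norm, mean, math.sqrt(second - mean * mean)

RADIUS = 100e-9
results = {}
for d in (1, 10, 100):
    results[d] = moments(d, RADIUS)
    norm, mean, std = results[d]
    print(f"d = {d:3d}: integral = {norm:.4f}, mean = {mean * 1e12:.4f} um^2, "
          f"relative width = {std / mean:.4f}")
```

In this idealized model the relative width of the distribution is $1/\sqrt{d}$, which is one way to see why short tracks carry little information about size individually.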

Mean Square Displacement

It is assumed that the solution of particles is sufficiently dilute such that hydrodynamic interactions can be neglected, that the surrounding solvent is isotropic in composition and temperature, and that particles are far away from physical boundaries. In a quiescent fluid, these assumptions should generally be valid for suspensions dilute enough to allow for identification and tracking of individual particles. The observed stochastic motion of a particle is then governed by the 2D diffusion equation, and the probability distribution for $\bar{\rho}$ is gamma-distributed. 37,51,52 Parameters including $t_E$, the exposure time of the camera, and the static error due to pixelization also factor into the expression. Example distributions for $p(\bar{\rho} \mid d, R)$ can be seen in the middle panel of Figure 1 for different values of $d$. As $d$ increases, the distribution becomes sharper around the long-time limit of $\langle \rho \rangle$, such that as $d \to \infty$, $p(\bar{\rho} \mid d, R)$ converges in distribution to the delta function $\delta(\bar{\rho} - \langle \rho \rangle)$. Thus, although longer trajectories provide ‘sharper’ information, neglecting to account for the spread at smaller $d$ limits the inference potential of typical MSD-based analysis.

Trajectory Starting Location

Let the dimensions of the focal volume of the microscope being used during an SPT experiment be denoted as $L = (L_x, L_y, L_z)$. We assume that $p(x_0) = 1/(L_x L_y)$. The focal depth, $L_z$, of a typical optical microscope typically cannot extend beyond several micrometers. Compared to the focal depth, the viewing area length scale, $\sqrt{L_x L_y}$, is often significantly larger. This means that most particles that enter the focal volume of the camera enter from the top or bottom so that the starting location for each trajectory can be assumed to be approximately uniformly distributed across the viewing area. Additionally, assuming a uniform distribution for $p(x_0 \mid R)$ is necessary without a complete understanding of the optical setup to avoid imposing an incorrect biased distribution (see Figure 7).

Trajectory Length

$p(d \mid x_0, R)$ represents the probability that a particle of size $R$ and first observed at location $x_0$ in the viewing area takes $d$ steps before vanishing from the field of view. In other words, this is the probability that a particle survives in the viewing area of the microscope for exactly $d$ steps. As such, it is intimately related to the first hitting time, or first passage time, of Brownian motion in a rectangular box. 53,54 Additionally, it is worth noting that because $d$ is an integer, $p(d \mid x_0, R)$ is a probability mass function as opposed to a continuous distribution. The full functional form for this conditional distribution is omitted from the main text for the sake of brevity and can be found in the Supporting Information. A key feature of our approach is the inclusion of the distribution for track length, which contributes to the accuracy of the method by decreasing the conditional variance of predicted size given observed data (see Figure S1). In fact, the inclusion of any piece of information in equation 8 that is not uniformly distributed necessarily improves the predictive power of Bayesian inference.

Regularized MAP Estimation

We now have the functional form for $p(\bar{\rho}, d, x_0 \mid R)$, and we aim to predict $p(R)$ from the governing equation

$$ p(\bar{\rho}, d, x_0) = \int_0^\infty p(\bar{\rho}, d, x_0 \mid R) \, p(R) \, dR. \tag{10} $$

As mentioned in the introduction, this equation represents a Fredholm integral equation of the first kind and is mathematically ill-posed. However, many of the issues inherent to numerical inversion schemes, such as unstable oscillatory behavior in the solution that can lead to spurious peaks, can be circumvented by instead using maximum a posteriori (MAP) estimation. First, $p(R)$ is decomposed into a linear combination of $N$ basis functions as

$$ p(R) = \sum_{n=1}^{N} w_n \chi_n(R). \tag{11} $$

The weights, $w$, determine the heights of the basis functions. Any set of basis functions can in principle be used, but in this work we chose
for all R (w ≥ 0) and must be normalized (1T w = 1). Fmn represents the contribution of trajectory m to basis function n and is equal to

0.3

p(R)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

0.2

Fmn =

0.1 0.0

0

1

2

R

3

4

to use triangular, or hat, functions as illustrated in Figure 2. A linear combination of triangular functions yields a piecewise linear function. The two basis functions at the ends of the domain are ‘half triangles,’ but this is of minor consequence. If the basis functions themselves are properly normalized, then $\sum_{n=1}^{N} w_n$ is constrained to be 1, as p(R) must also be normalized since it is a PDF. The posterior distribution for w given observed data is

$$p(\mathbf{w} \mid \bar{\rho}, d, \mathbf{x}_0) \propto p(\bar{\rho}, d, \mathbf{x}_0 \mid \mathbf{w})\, p(\mathbf{w}) \qquad (12)$$

by Bayes’s theorem, where the PDF for the data is now conditional on w due to the basis function expansion, and p(w) represents a prior. Given M observed trajectories that represent M independently drawn data points from the distribution $p(\bar{\rho}, d, \mathbf{x}_0)$, the MAP estimate for w, denoted $\hat{\mathbf{w}}$, is calculated by maximizing the posterior (or log-posterior):

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \left( \frac{1}{M} \sum_{m=1}^{M} \log \sum_{n=1}^{N} F_{mn} w_n + \lambda g(\mathbf{w}) \right) \quad \text{s.t.} \quad \mathbf{w} \ge 0, \quad \mathbf{1}^T \mathbf{w} = 1. \qquad (13)$$

Figure 2: The size distribution is decomposed into a sum of triangular basis functions (also known as ‘hat’ functions).

Equation 13 is the central governing equation of the MApNTA algorithm. As such, it warrants particular attention. The constraints in equation 13 simply represent the fact that p(R) must be non-negative and normalized. The entries of the matrix F are

$$F_{mn} = \int_0^\infty p(\bar{\rho}^{(m)}, d^{(m)}, \mathbf{x}_0^{(m)} \mid R)\, \chi_n(R)\, dR. \qquad (14)$$

If the basis functions are local, like triangular functions, then the domain of this integral is finite. In MApNTA, the matrix F, whose entries are $F_{mn}$, is numerically calculated via trapezoidal integration. The log-prior, λg(w), serves as a regularization term that prevents overfitting. 55

Regularization, in general, is necessary for accurate inference of size distributions given a finite amount of data. Without regularization, the inferred distribution is prone to overfitting the collected data, exhibiting too many sharp peaks, and not generalizing well to predict different trajectories from the same sample. This is not, however, a symptom of numerical instability or related to the ill-posedness of the original equation. Equation 13 (without regularization) is similar to a linear program in the sense that certain $w_n$ are increased at the expense of others to maximize the objective function. In fact, equation 13 is a linear program when only one trajectory is observed (M = 1). Stated differently, assigning the collected trajectories to a certain few basis functions tends to maximize likelihood more than a smoother distribution does. The inclusion of a regularization term prevents this from happening and can be viewed equivalently as a prior that imposes the fact that real particle size distributions exhibit a certain degree of smoothness.

We have tested and implemented two different types of regularizers, or priors: an entropy regularizer and a prior that penalizes the first derivative of p(R), denoted as ‘D1.’ For the entropy regularizer, $g_{\mathrm{ent}}(\mathbf{w}) = \int_0^\infty p(R') \log p(R')\, dR'$, where $R' = R/\Delta R$ with ΔR as the spacing between the peaks of the basis functions. λ = 0 corresponds to no penalization and results in a p(R) with many sharp peaks. λ → ∞ corresponds to maximization of the entropy such that p(R) approaches a uniform distribution. For the D1 prior, $g_{\mathrm{D1}}(\mathbf{w}) = \int_0^\infty \left( \frac{dp(R')}{dR'} \right)^2 dR'$ penalizes changes in the slope of the size distribution and can be efficiently represented as a quadratic form, $\mathbf{w}^T \mathbf{A} \mathbf{w}$, when using triangular basis functions (see Supporting Information). In this case, λ also controls the extent to which ‘spikes’ in the inferred distribution are suppressed. In most situations, both priors perform similarly.

A major (and fortunate) feature of equation 13 is the fact that it represents a convex optimization problem when g(w) is a convex function. The Supporting Information provides a proof for the convexity of MApNTA. This useful property guarantees that if a solution for $\hat{\mathbf{w}}$ is found, then it is a global optimum. 56 MApNTA uses interior point optimization to find $\hat{\mathbf{w}}$, but other convex optimization algorithms would also suffice.

It should also be noted that if the domain of the basis functions is chosen incorrectly and there is a significant density outside of the domain, then that density will be lumped into the basis functions at the edge of the domain. Such behavior allows the user to automatically identify if the size range being investigated inadequately captures the data. An example of this can be found in the Supporting Information.
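The pieces described above (hat basis functions, the matrix F by trapezoidal integration, and a constrained maximization over the weights) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MApNTA implementation: the likelihood is replaced by an error-free gamma model for the 2D mean square displacement alone (track length and starting location are ignored), the solver is projected gradient ascent with backtracking rather than interior point optimization, every basis function is a full triangle on a hypothetical 10 to 100 nm grid, and the regularizer is a sum of squared differences of neighbouring weights entering with a minus sign, as a discrete stand-in for the D1 prior. All function names and the constant 214.6 (water at 293 K with an assumed viscosity of 1.0 mPa s) are introduced here for illustration.

```python
import math
import random

STOKES_EINSTEIN = 214.6  # k_B*T/(6*pi*eta) in nm*um^2/s (assumed: water, 293 K, 1.0 mPa s)
DT = 1.0 / 25.0          # frame duration in seconds

def msd_likelihood(rho, d, radius_nm):
    """Assumed stand-in for p(rho | R): Gamma(shape=d, scale=4*D*dt/d) in 2D."""
    scale = 4.0 * (STOKES_EINSTEIN / radius_nm) * DT / d
    log_pdf = (d - 1) * math.log(rho) - rho / scale - math.lgamma(d) - d * math.log(scale)
    return math.exp(log_pdf)

def hat(r, center, dr):
    """Unit-area triangular basis function with peak at `center` and half-width `dr`."""
    return max(0.0, 1.0 - abs(r - center) / dr) / dr

def build_F(data, centers, dr, n_quad=20):
    """F[m][n]: trapezoidal integral of likelihood_m(R) * hat_n(R) over the support of hat_n."""
    h = 2.0 * dr / n_quad
    F = []
    for rho, d in data:
        row = []
        for c in centers:
            vals = [msd_likelihood(rho, d, c - dr + i * h) * hat(c - dr + i * h, c, dr)
                    for i in range(n_quad + 1)]
            row.append(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
        F.append(row)
    return F

def objective(w, F, lam):
    """Mean log-likelihood minus lam times the roughness of the weights (assumed sign convention)."""
    total = 0.0
    for row in F:
        s = sum(f * wn for f, wn in zip(row, w))
        if s <= 0.0:
            return -math.inf
        total += math.log(s)
    rough = sum((w[n + 1] - w[n]) ** 2 for n in range(len(w) - 1))
    return total / len(F) - lam * rough

def gradient(w, F, lam):
    g = [0.0] * len(w)
    for row in F:
        s = sum(f * wn for f, wn in zip(row, w))
        for n, f in enumerate(row):
            g[n] += f / (s * len(F))
    for n in range(len(w) - 1):
        diff = 2.0 * lam * (w[n + 1] - w[n])
        g[n] += diff
        g[n + 1] -= diff
    return g

def project_simplex(v):
    """Euclidean projection onto {w >= 0, sum(w) = 1} using the sort-based algorithm."""
    cumulative, tau = 0.0, 0.0
    for i, ui in enumerate(sorted(v, reverse=True), start=1):
        cumulative += ui
        if ui - (cumulative - 1.0) / i > 0.0:
            tau = (cumulative - 1.0) / i
    return [max(x - tau, 0.0) for x in v]

def map_weights(F, lam, n_iter=100):
    """Projected gradient ascent; a step is accepted only if the objective increases."""
    w = [1.0 / len(F[0])] * len(F[0])
    best = objective(w, F, lam)
    for _ in range(n_iter):
        g = gradient(w, F, lam)
        step = 1.0
        while step > 1e-12:
            trial = project_simplex([wn + step * gn for wn, gn in zip(w, g)])
            value = objective(trial, F, lam)
            if value > best:
                w, best = trial, value
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
    return w

# Synthetic monodisperse sample (R = 30 nm, 20-step tracks), purely for demonstration.
random.seed(0)
centers = [10.0 + 5.0 * n for n in range(19)]
d = 20
true_scale = 4.0 * (STOKES_EINSTEIN / 30.0) * DT / d
data = [(random.gammavariate(d, true_scale), d) for _ in range(100)]
F = build_F(data, centers, dr=5.0)
w_rough = map_weights(F, lam=0.0)
w_smooth = map_weights(F, lam=10.0)
```

Comparing `w_rough` and `w_smooth` gives a feel for how the regularization strength trades peak sharpness against smoothness; a production implementation would use a proper convex solver and the full likelihood.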

Hyperparameter Selection with Cross Validation

It is now necessary to select the ‘best’ hyperparameter λ without knowledge of the true size distribution. To do so, MApNTA uses cross validation, a robust statistical technique often used to prevent overfitting of data. 55,57 The data set is partitioned into training and test sets, and λ is selected as the value that best predicts the data in the test sets when optimized on the training sets. In particular, MApNTA uses Monte Carlo cross validation (MCCV), 57,58 also known as repeated learning-testing, in which the M trajectories are partitioned randomly into training and test sets over several iterations, or ‘folds.’ That is, a fraction, α, of the M trajectories is chosen randomly and placed in a training set at each iteration, and the other (1 − α)M trajectories are considered the test set. A grid search over different values of λ is conducted, and the optimization problem in equation 13 must be solved for each value of λ and for each MCCV fold (i.e., (# λ’s) × (# MCCV folds) times).

Figure 3: An example score function illustrating the locations of $\lambda_{\min}$ and $\lambda_{\mathrm{1SE}}$. The dotted lines represent one standard error of the score function among all of the MCCV iterations.

Next, a score function, $\mathrm{Score}(\lambda) \equiv K^{-1} \sum_k \mathrm{Score}_k(\lambda)$, is created as the average log-likelihood of the model evaluated over the test sets in each of the K MCCV folds. Ideally, a ‘best’ λ can be chosen by identifying the one that minimizes the score. However, appealing to parsimony in light of finite data, limited computational power, and experimental errors, we choose λ according to the one standard error rule, 59 where $\lambda_{\mathrm{1SE}}$ is the largest λ that is within one standard error of the score function at $\lambda_{\min}$. In other words,

$$\lambda_{\mathrm{1SE}} \equiv \max\left\{ \lambda \;\middle|\; \mathrm{Score}(\lambda) \le \mathrm{Score}(\lambda_{\min}) + \frac{1}{K}\sqrt{\sum_k \left( \mathrm{Score}_k(\lambda_{\min}) - \mathrm{Score}(\lambda_{\min}) \right)^2} \right\}.$$

An example score function and an illustration of this procedure can be seen in Figure 3. Once $\lambda_{\mathrm{1SE}}$ is determined, the final result for $\hat{\mathbf{w}}$ and p(R) is calculated as the solution to the optimization problem in equation 13 using all of the M trajectories and $\lambda = \lambda_{\mathrm{1SE}}$.

Throughout this work, we have found that α = 0.5 and a range of λ equal to $[10^{-3}, 10^{5}]/N$ for both the D1 and entropy priors generally work well. N, again, is the number of basis functions employed. The resulting size distributions are not particularly sensitive to the value of α chosen (see also Zhang 57 ), and this range of λ encompasses a local minimum in the score for most data sets. Compared to simple K-fold cross validation, Monte Carlo cross validation better reduces bias due to the random partitioning of the data set. 57 Even so, the one standard error rule is employed to address overfitting by compensating for the sensitivity of $\lambda_{\min}$ with respect to the particular choices of MCCV folds. This sensitivity is directly related to the typical flatness of the score function for small values of λ. In other words, the potential for overfitting is still present without an exhaustive cross validation procedure over all possible partitions of the data, which is a computationally infeasible task.

In the limit of an infinite number of folds, $\lambda_{\mathrm{1SE}}$ approaches $\lambda_{\min}$ since the standard error of the score approaches 0. As applied to MApNTA, this means that the resolution of the resulting distribution can be increased or decreased at will based on available computational resources. That is, for a small number of folds, the solution is conservatively regularized, as the standard error of the score function should be relatively large. As the number of folds is increased, $\lambda_{\mathrm{1SE}}$ decreases as it approaches $\lambda_{\min}$ and the solution becomes less regularized. This is accomplished at the expense of time spent running the algorithm.
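The selection procedure above can be sketched as follows. The sketch assumes that each fold score is a lower-is-better quantity (for example, a negative mean test log-likelihood) and that the standard error is the root of the summed squared deviations divided by K, which is the reading consistent with the standard error vanishing as the number of folds grows. The function names and the example scores are hypothetical.

```python
import math
import random

def mccv_splits(n_traj, alpha, n_folds, seed=0):
    """Yield (train, test) index lists; a random fraction alpha goes to the training set."""
    rng = random.Random(seed)
    n_train = int(round(alpha * n_traj))
    for _ in range(n_folds):
        idx = list(range(n_traj))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]

def one_standard_error_rule(lambdas, fold_scores):
    """fold_scores[k][j] is the score of fold k at lambdas[j] (lower is better).

    Returns (lambda_min, lambda_1se)."""
    K = len(fold_scores)
    mean = [sum(fold[j] for fold in fold_scores) / K for j in range(len(lambdas))]
    j_min = min(range(len(lambdas)), key=lambda j: mean[j])
    # Standard error of the mean score at lambda_min.
    se = math.sqrt(sum((fold[j_min] - mean[j_min]) ** 2 for fold in fold_scores)) / K
    eligible = [lam for lam, m in zip(lambdas, mean) if m <= mean[j_min] + se]
    return lambdas[j_min], max(eligible)

# Hypothetical scores for four folds on a four-point lambda grid.
lambdas = [1e-3, 1e-1, 1e1, 1e3]
fold_scores = [
    [4.33, 4.31, 4.32, 4.40],
    [4.35, 4.33, 4.32, 4.42],
    [4.31, 4.29, 4.30, 4.38],
    [4.33, 4.31, 4.32, 4.40],
]
lambda_min, lambda_1se = one_standard_error_rule(lambdas, fold_scores)
```

In a full run, each entry of `fold_scores` would come from solving equation 13 on a training split produced by `mccv_splits` and scoring the result on the matching test split.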

Error and Confidence

One important aspect of MApNTA is its ability to provide information on the error associated with resulting size distributions. As MApNTA is a Bayesian method, one has access to the posterior distribution of w and can sample from it to get a sense of how probable (or not) other solutions are for the MAP estimate, $\hat{\mathbf{w}}$, given the observed data. It is quite common in statistics to generate so-called credible intervals from the posterior to demonstrate that a certain percentage of the posterior by volume is contained within a certain area of the parameter space. However, for an application such as MApNTA, a credible ‘interval’ in such a high-dimensional space (i.e., N dimensions, one for each of the elements of w) is quite hard to interpret and does not shed much light on the potential fluctuations of p(R) with respect to the data. Instead, we choose to report the N × N covariance matrix for w. Such a matrix clearly illustrates exactly how the different elements of w vary around the mean of the posterior. Roughly speaking, the higher the variance of a certain $w_n$, the less we can be confident in its value. Furthermore, the structure of the covariance matrix, as discussed more below with reference to Brownian dynamics simulation results, can often be used to support the chosen $\lambda_{\mathrm{1SE}}$. To calculate the covariance matrix, we developed an efficient constrained Monte Carlo method to sample from the posterior, which is described in the Supporting Information.

Results and Discussion

Brownian Simulations

To validate MApNTA, Brownian dynamics simulations of (non-interacting) spherical particles were conducted where the radius of each particle was drawn from a particular, known distribution. Particle locations were initialized at a uniformly random position in a box of dimensions $(L_x, L_y, L_z) = (118, 88, 5.5)$ µm. These dimensions represent experimentally realistic values. The trajectory length of each particle was then calculated by detecting when each particle escaped the box as an integer multiple of frames of duration 1/25 s. 2D mean square displacements (i.e., as would be calculated from observing the trajectories through a microscope objective) were then drawn for each particle trajectory from the appropriate gamma distribution accounting for static and dynamic errors, as described above. Thus, single particle tracking experiments were conducted in silico, and the mean square displacements, the initial locations of particle trajectories, and the track lengths were appropriately calculated.

Figure 4 shows an ensemble of 1000 particles with normally distributed sizes (mean 50 nm) together with MApNTA results under D1 regularization for two fixed values of λ, $\lambda_1 = 10^{-5}$ and $\lambda_2 = 10^{1}$. It is clear that $\lambda_1$ illustrates overfitting, whereas $\lambda_2$ results in a smooth curve that better approximates the real distribution.

Figure 4: (A) The known size distribution of 1000 simulated Brownian particles (top) and the results of MApNTA under D1 regularization for two different values of λ to demonstrate overfitting. (B) The covariance matrix of the weights about the mode of the posterior for $\lambda_1$, associated with the overfitted curve. (C) The covariance matrix for $\lambda_2$.

There is also a marked difference in the covariance matrices in panels B and C of Figure 4 that result from MApNTA’s MCMC method. For the overfitted $\lambda_1$, the variance is sharply peaked around the weights that correspond to the spikes in the distribution, whereas for $\lambda_2$, the covariance matrix is much smaller in magnitude and varies more smoothly over the weights that correspond to nonzero density in the size distribution. The difference between the two covariance matrices is deeply tied to the fundamental bias-variance trade-off between the two models. Covariance matrices that exhibit sparse correlations among distant basis functions and look qualitatively similar to that of panel B in Figure 4 often indicate a strong degree of overfitting. Conversely, size distributions that are over-generalized and possess little predictive ability often exhibit covariance matrices that, while small in magnitude, show correlations among weights that extend beyond the region of nonzero density. With MApNTA, the choice of $\lambda_{\mathrm{1SE}}$ via the one standard error rule prevents the inferred distribution from overfitting the data. To corroborate this lack of overfitting, we recommend that covariance matrices be reported alongside MApNTA results as a quantitative measure of fluctuations in the weights given observed single particle tracking data.
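The in silico tracking experiment described in this section can be sketched roughly as below. The sketch makes several simplifying assumptions that are not taken from the text: escape from the box is checked only at frame boundaries, static and dynamic errors are ignored (so the mean square displacement over d steps is drawn from a plain gamma distribution with shape d), the radius distribution has a hypothetical standard deviation of 5 nm, and the Stokes-Einstein prefactor assumes water at 293 K with a viscosity of 1.0 mPa s. Function names are illustrative.

```python
import math
import random

STOKES_EINSTEIN = 214.6   # k_B*T/(6*pi*eta) in nm*um^2/s (assumed: water, 293 K, 1.0 mPa s)
BOX = (118.0, 88.0, 5.5)  # box dimensions (Lx, Ly, Lz) in micrometres
DT = 1.0 / 25.0           # frame duration in seconds

def simulate_particle(radius_nm, rng, max_frames=100000):
    """Return (msd, d, (x0, y0)) for one particle, or None if it leaves within one frame."""
    diffusivity = STOKES_EINSTEIN / radius_nm        # um^2/s
    sigma = math.sqrt(2.0 * diffusivity * DT)        # per-axis step standard deviation
    pos = [rng.uniform(0.0, side) for side in BOX]
    start = (pos[0], pos[1])
    d = 0                                            # number of completed steps inside the box
    while d < max_frames:
        pos = [x + rng.gauss(0.0, sigma) for x in pos]
        if any(x < 0.0 or x > side for x, side in zip(pos, BOX)):
            break
        d += 1
    if d == 0:
        return None
    # Mean of d independent 2D squared displacements: Gamma(shape=d, scale=4*D*dt/d).
    msd = rng.gammavariate(d, 4.0 * diffusivity * DT / d)
    return msd, d, start

def simulate_ensemble(n_particles, mean_nm=50.0, sd_nm=5.0, seed=1):
    """Draw radii from a normal distribution and collect n_particles usable tracks."""
    rng = random.Random(seed)
    tracks = []
    while len(tracks) < n_particles:
        radius = rng.gauss(mean_nm, sd_nm)
        if radius <= 0.0:
            continue
        track = simulate_particle(radius, rng)
        if track is not None:
            tracks.append(track)
    return tracks

tracks = simulate_ensemble(200)
```

Each tuple in `tracks` holds the three observables used for inference (mean square displacement, track length, and 2D starting location); a faithful reproduction would add the static and dynamic error corrections to the gamma model.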

Figure 5 demonstrates the results of MApNTA compared against the results of the FTLA algorithm for different imposed size distributions in Brownian dynamics simulations. That is, each column of Figure 5 represents a different example distribution that can be associated with a commonly encountered experimental system: (A) a monodisperse solution, (B) a lognormal distribution, (C) a monodisperse solution with a small impurity, (D) an aggregated sample, and (E) a bidisperse sample with well-separated peaks. For each distribution, approximately 10,000 particles were simulated. The frame rate and exposure time were set to experimentally realistic values of 1/25 s and 10 ms, respectively. Static error was set to 0 as justified with experimental analysis in the next section. Additionally, N = 100 basis functions were used, spanning a range of 1 to 100 nm, and 200 MCCV iterations were conducted. The first three rows of Figure 5 present the PDFs and CDFs of the real distributions as well as the distributions produced by the FTLA algorithm and MApNTA under both D1 and entropy regularization. As one can see in the last two rows of Figure 5, the choice of $\lambda_{\mathrm{1SE}}$ for each sample results in covariance matrices that are relatively ‘smooth’ and exhibit values of the same order of magnitude across the support of their respective PDFs. This indicates a lack of overfitting and an associated degree of confidence in the inferred distributions.

Figure 5: Columns (A) through (E) represent different example populations of 10,000 Brownian particles, including a unimodal distribution in (A), a lognormal distribution in (B), and multimodal distributions in (C)-(E). Brownian particle motion was simulated in a finite box to represent the focal volume of a typical microscope along with relevant experimental errors (see text). The top three rows represent the PDFs and CDFs for the known histograms of particle sizes (in black) as well as results using FTLA 29 and MApNTA with D1 and entropy regularization. The fourth row illustrates quantile-quantile (Q-Q) plots of the model CDFs versus the real CDFs (shown by the dotted black lines). The last two rows are the MApNTA covariance matrices for D1 and entropy regularization.

From the results in Figure 5, it is clear that the FTLA algorithm generally leads to PDFs that are too broad to describe relevant features in the underlying distributions. MApNTA, on the other hand, is able to correctly infer the dominant peaks and their widths. The models’ success in reproducing the real distribution can be compared using quantile-quantile (Q-Q) plots, which illustrate differences between CDFs based on the location of their quantiles. Due to smaller deviations from the real distribution in the Q-Q plots, it is evident that MApNTA is able to infer the underlying particle distributions more accurately in all cases for both the D1 and entropy priors. Although both MApNTA priors lead to CDFs that match the real distributions well, differences between them do exist. The D1 prior may be better at capturing local peaks whereas the entropy prior is slightly smoother across the entire domain of the distribution. Furthermore, it is worth noting that MApNTA, in addition to inferring peaks in the PDF, is also able to infer the relative proportions of particle populations. Specifically, the inflection points in the CDFs of samples C through E match the true subpopulation ratios of 1:4 in C, 5:3:2 in D, and 1:1 in E within 5%. This strongly indicates MApNTA’s ability to detect not only the presence of impurities, aggregates, etc. but also their fractions.

In order to determine how the amount of data collected and computational effort employed affects MApNTA results, Brownian dynamics simulations were conducted with a varying number of trajectories M or a varying number of MCCV folds K (holding K = 50 and $M = 10^4$ constant, respectively). The underlying particle size distribution was that of the bidisperse sample D in Figure 5. Ten individually simulated SPT experiments with the same parameters as those in Figure 5 were conducted for each M or K, and the results are presented in Figure 6.
The Kolmogorov-Smirnov (KS) statistic measures the difference between two CDFs. Namely, compared to the real, known CDF of the particles imposed in the simulation, the KS statistic for a given distribution is equal to $\max_R |\mathrm{CDF}(R) - \mathrm{CDF}_{\mathrm{real}}(R)|$. Based on the KS statistic, it is clear that MApNTA’s ability to accurately infer the underlying distribution increases with the number of particle trajectories observed. This is not surprising. The KS statistic starts to plateau for both regularizers around 50,000 particle trajectories, a number that is easily collected in real experiments. With 10,000 observed particle trajectories, the KS statistic is largely insensitive to the number of MCCV iterations performed. However, in practice, the more MCCV iterations performed, the more resolved the distribution becomes, as illustrated in panels C and D, where an increasing number of folds leads to a better distinction between the two peaks of the underlying distribution. It also appears that for a given number of folds, the entropy regularizer is slightly more resolved than the D1 regularizer, as supported by the KS statistic and the series of PDFs in panel C. The results in Figure 6 are distribution-dependent but nonetheless give some sense of the number of particle trajectories and computational power required to infer a particle size distribution to a given degree of accuracy.
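The KS statistic defined above can be computed exactly when the reference is an empirical CDF built from the known radii and the model CDF is continuous: the largest gap must occur at one of the sample points, on either side of the step. The sketch below assumes the model CDF is available as a callable; `ks_statistic` and `uniform_cdf` are illustrative names, and the four radii are made-up values.

```python
def ks_statistic(model_cdf, true_radii):
    """max over R of |model CDF - empirical CDF|, checked on both sides of each step."""
    ordered = sorted(true_radii)
    n = len(ordered)
    worst = 0.0
    for i, r in enumerate(ordered):
        model = model_cdf(r)
        # The empirical CDF jumps from i/n to (i + 1)/n at r.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

def uniform_cdf(r, lo=0.0, hi=40.0):
    """Toy model CDF: uniform size distribution between lo and hi (nm)."""
    return min(1.0, max(0.0, (r - lo) / (hi - lo)))

ks = ks_statistic(uniform_cdf, [10.0, 20.0, 30.0, 40.0])
```

For a piecewise linear p(R) built from hat functions, the model CDF would be the running integral of the weighted basis functions.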

Figure 6: The Kolmogorov-Smirnov (KS) statistic comparing the CDFs produced by MApNTA to the real, empirical CDF for a simulated bidisperse sample of particles. In (A), the number of MCCV folds was held constant at 50 and the number of observed particle trajectories M was varied, whereas in (B), M was held constant at $10^4$ and the number of MCCV folds was varied. Shaded error bars represent one standard deviation over 10 independent simulated SPT experiments. (C) and (D) illustrate the increasing resolution of each regularizer as the number of folds (i.e., computational power) is increased.

Experimental Results

As mentioned in the Methods section, a few parameters of the experimental setup must be quantified using known standards. Trajectories of monodisperse gold nanoparticle and polystyrene bead dispersions with known sizes were collected, and maximum likelihood estimation with the relevant statistical distributions was used to calculate focal depth and static error. For the instrument used in this study, focal depth was found to be 5.5 µm, and static error, x, was found to be negligible. Specifically, $x/[4D(\Delta t - t_E/3)] < 0.01$ over a wide range of nanoparticle sizes. As such, x was set to 0 for MApNTA analysis with negligible consequence.

Figure 7 displays the experimentally collected distributions for track length and initial trajectory starting location within the video frame for gold nanoparticles of 30 nm in radius. For small values of d, $p(d \mid R) \sim d^{-3/2}$, which is exactly the scaling that is predicted by the full solution for $p(d \mid R, \mathbf{x}_0)$ in the limit as $\mathbf{x}_0$ approaches the boundary of the focal volume. Furthermore, this scaling is predicted by the Lévy distribution (the solution to the 1D first hitting time problem with a single wall) and has been reported elsewhere. 38 Panel B of Figure 7 shows a histogram of the initial trajectory starting locations, the complexity of which is likely caused by the optics of the instrumental setup and necessitates the use of a uniform distribution for $p(\mathbf{x}_0 \mid R)$ without significantly detailed knowledge of the setup.

Figure 7: Data collected from 288,434 trajectories of gold nanoparticles with a radius of 30 nm. (A) Distribution of track lengths. For small d the distribution scales like $d^{-3/2}$, which agrees with theoretical predictions. (B) Distribution of trajectory starting locations.

Figure 8 shows the results of MApNTA compared with those of FTLA as implemented in a commercial instrument for solutions of gold nanoparticles and their mixtures. Samples A through C correspond to monodisperse gold nanoparticles of radius 20, 25, and 30 nm. The two methods largely agree, with slight differences in the widths of the distributions. Column D, though, presents a supposedly monodisperse solution of 50 nm gold nanoparticles discovered to be off-specification. Both algorithms detect the dominant peak around 41 nm, but MApNTA also detects the presence of an impurity subpopulation that FTLA misses. The presence of this small subpopulation was confirmed with AFM data of the deposited sample (see Supporting Information). Columns E and F present the results of mixtures of 20 and 30 nm particles (i.e., A:C in a 1:1 ratio) and 30 and ‘50’ nm particles (i.e., C:D in a 1:1 ratio), respectively. The PDFs of MApNTA under D1 and entropy regularization match the simple linear combinations of the individual D1 PDFs in columns A through D relatively well. More specifically, unlike FTLA, which predicts distributions that are too smooth and error bars that are evidently too small, MApNTA is able to resolve the peaks of the constituent particle sizes. Indeed, the small impurity in the sample from column D is also seen in the mixture in column F (leading to three resolved peaks).
It is worth noting as well that despite the presence of peaks in the D1- and entropy-regularized solutions, all of the values in the covariance matrices are of a relatively small magnitude, indicating a lack of overfitting and a high degree of confidence in MApNTA’s inference. Additional experiments with trimodal distributions of gold nanoparticles can be found in the Supporting Information.

Finally, as another example of its versatility, MApNTA was used to characterize the size distribution of an aqueous solution of SDS-wrapped single-walled carbon nanotubes (SWCNTs). It is fairly well known in the nanoparticle synthesis field that size characterization of carbon nanotubes (and other shape anisotropic particles, for that matter) is generally a challenging task that either requires deposition onto a substrate 62 or other laborious methods discussed in the introduction. Streit et al. 61 used MSD analysis of diffusing carbon nanotubes to create size distributions, but due to the use of fluorescence instead of darkfield laser scattering, all metallic nanotubes in the sample were inherently neglected. Figure 9 presents the results of MApNTA under D1 and entropy regularization for the prepared SWCNT dispersion. Panel A features the hydrodynamic radius distribution. Again, FTLA appears to miss particles of small hydrodynamic radii. Assuming a hydrodynamic diameter of 5 nm, 61 Batchelor’s higher order solution for the drag of slender rods 60 can be used to convert the hydrodynamic radius distribution into a length distribution (panel C) that exhibits the lognormal shape characteristic of dispersed carbon nanotubes. It is fair to assume that dispersed SWCNTs are largely straight and rigid since they are shorter than the typical SWCNT persistence length, which is of the order of 10 µm. 63 This calculated length distribution is consistent with values obtained from atomic force microscopy images of deposited SWCNTs (see Supporting Information). The differences between the results of FTLA and MApNTA become even more apparent when employing the distributions in calculations. For example, certain applications may rely on the number density of nanotubes in solution.
Given the concentration of nanotubes in solution by mass, the number density calculated using the MApNTA length distributions in panel C of Figure 9 is 45% greater than the number density calculated with FTLA. Clearly, MApNTA’s identification of particles having small hydrodynamic radii is crucial for the accurate calculation of properties derived from the particle size distribution.

The need for robust particle size analysis is greater than ever before, and these are but a few examples that could potentially benefit from the use of high resolution sizing methods such as MApNTA. Although the use of MApNTA for on-line quality control in manufacturing was not discussed in this manuscript, it is certainly possible. Compared to other methods, MApNTA trades computational time for accuracy. However, the majority of the computational time is spent performing cross validation, and for a given material and experimental setup, λ need only be found once. Subsequent applications of MApNTA with a fixed value of λ are extremely efficient and can easily be employed for on-line monitoring of colloidal samples.

Figure 8: Size distributions, CDFs, and MApNTA covariance matrices for experimental data with monodisperse gold nanoparticles of radius (A) 20 nm, (B) 25 nm, (C) 30 nm, and (D) 50 nm. Column (E) is a mixture of the 20 and 30 nm particles in equal proportions. Column (F) is a mixture of the 30 and 50 nm samples in equal proportions. Results from the FTLA algorithm and histograms from TEM micrographs are compared with MApNTA. The dotted-line distributions in columns (E) and (F) correspond to a linear combination of the D1 solutions from the first four columns in the estimated ratios.

Figure 9: The (A) size distribution for hydrodynamic radius, (B) cumulative distribution function (CDF), (C) length distribution, and (D) MApNTA covariance matrices for the D1 and entropy regularizers for a sample of carbon nanotubes dispersed in aqueous SDS. (A) and (B) show the results of MApNTA compared with the results of FTLA. The length distribution in (C) was calculated using Batchelor’s higher-order slender body theory 60 with an assumed diameter of 5 nm. 61
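As a rough illustration of how a length distribution feeds into the number density comparison quoted for the nanotube sample, the relation below assumes (this is not stated in the text) that every nanotube has the same mass per unit length, so that the mean mass per tube is proportional to the mean length. The symbols c (mass concentration), n (number density), μ (mass per unit length), and ⟨L⟩ (mean length) are introduced here only for this sketch.

```latex
n = \frac{c}{\mu \langle L \rangle},
\qquad
\frac{n_{\mathrm{MApNTA}}}{n_{\mathrm{FTLA}}}
  = \frac{\langle L \rangle_{\mathrm{FTLA}}}{\langle L \rangle_{\mathrm{MApNTA}}}
```

Under that assumption, a number density that is 45% greater corresponds to a mean length of about 1/1.45 ≈ 0.69 times the FTLA value, which is the direction expected if one method assigns more weight to particles with small hydrodynamic radii.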

Conclusions

In this work, we have developed a Bayesian method called Maximum A Posteriori Nanoparticle Tracking Analysis (MApNTA) for determining the size distributions of colloidal dispersions of nanoparticles using single particle tracking data. By employing the appropriate statistical model for each observable in a typical single particle tracking experiment, MApNTA is able to generate size distributions that are more accurate than methods that use mean square displacement data alone. The approach also overcomes the problems of overfitting and spurious peak generation that can occur in existing methods. Within MApNTA, cross validation is used as a statistically robust technique to prevent overfitting of data, and the posterior distribution is sampled to provide a quantitative measure of how resulting size distributions fluctuate with respect to the given data. Brownian simulations have been conducted to validate the accuracy of MApNTA, and size distributions for gold nanoparticles and individually dispersed carbon nanotubes have been successfully measured. MApNTA is a general-purpose tool that can be used to analyze single particle tracking data and efficiently infer size distributions of monodisperse and polydisperse samples alike. Furthermore, MApNTA represents a completely general framework, and future work could involve the incorporation of other experimental observables (e.g., intensity, fluorescence) provided that the correct statistical models associated with them can be developed. Given MApNTA’s ability to detect not only the presence of different populations of particles but also their relative proportions, we believe MApNTA can be used as a quantitative tool for the characterization of a wide variety of colloidal dispersions with varying purity or aggregation states that would otherwise be undetectable with standard techniques. As such, MApNTA should be useful for routine characterization of complex nanoparticle dispersions. A software implementation of MApNTA in MATLAB is included with this article.

Page 18 of 23

were treated with oxygen plasma (Glow Research AutoGlow instrument) at a power of 300 W for 2 minutes. Wafers were then exposed to vapors of (3-aminopropyl)triethoxysilane (APTES) at room temperature and pressure in a sealed petri dish under nitrogen for 3 hours. After this silanization procedure, wafers were rinsed with deionized water, and dried with a stream of nitrogen gas. The presence of silane groups on the surface was confirmed with atomic force microscopy (Asylum Research). Gold nanoparticle solutions were placed on top of the wafers for a period of 30 minutes, after which they were washed off with deionized water. AFM was then used to image the substrates, and particle heights were extracted numerically by identifying local maxima (see Supporting Information). To obtain AFM images of SWCNTs, a 1 µL droplet of SDS-SWCNT dispersion was dropcasted onto a silicon wafer and allowed to dry. Following Li and Zhou, 65 the wafer was soaked in glacial acetic acid for more than 24 hours. The wafer was then rinsed with deionized water, soaked in deionized water for another hour, and subsequently dried with a stream of nitrogen before AFM imaging. Lengths of more than 150 individual carbon nanotubes were then measured manually using the Gwyddion software package in order to form a histogram.

Experimental Methods Materials All chemicals, unless otherwise noted, were obtained from Sigma-Aldrich and used as received. Gold nanoparticles dispersed in 2 mM aqueous sodium citrate solution with diameters of 20, 30, 40, 50, 60, and 100 nm were obtained from nanoComposix, and histograms of particle sizes from TEM micrographs were provided. Gold nanoparticle solutions were diluted to a concentration of approximately 5 × 108 particles/mL in 2 mM aqueous sodium citrate solution prepared with deionized water (Millipore Milli-Q system, 18.2 MΩ cm). HiPco single-walled carbon nanotubes (SWCNTs) were purchased from NanoIntegris (batch HR29-039). The SWCNT powder was first sonicated in deionized water and extracted with hexanes to remove impurities. After evaporating off the hexanes on a hot plate, purified SWCNT powder was added to a 2% (70 mM) sodium dodecyl sulfate solution at an initial concentration of 1 mg/mL. The solution was bath sonicated for 15 minutes (Branson 2510 sonicator) and then tip sonicated at an amplitude of 40% for 2 hours (Qsonica Q125). In order to remove residual catalyst and any SWCNT aggregates, the solution was ultracentrifuged (Beckman Coulter Optima L-100 XP) at 33,000 rpm (∼ 1.8×105 g) for 8 hours, after which the top 80% of the supernatant containing individually dispersed SWCNTs was kept. The UV-vis spectrum (Shimadzu UV-3101PC) for the sample with chirality-indexed peaks is shown in the Supporting Information. The concentration of the solution was estimated to be 70 mg/L based on the absorbance at 632 nm. 64 The solution was diluted by a ratio of 1:106 before being analyzed for single particle tracking. Silicon wafers (p-type with h100i orientation) were purchased from UniversityWafer. Wafers

Single particle tracking

Malvern’s NanoSight LM10 instrument was used to track the trajectories of nanoparticles in solution. Specifically, particles were tracked using forward scattered 405 nm laser light, a 10× objective lens, and a CMOS camera. Temperature was held constant by the apparatus at 293 K for all experiments. Exposure time and frame rate were recorded by the instrument. A plastic syringe attached to a syringe pump was used to flow diluted nanoparticle solutions through the imaging chamber between video collections so that no particles remained in the chamber between consecutive videos. Typically, 30 videos of 30 seconds each were collected for each sample. The NanoSight NTA 3.2 software was used to calculate the position of each visible particle across the frames of the videos.
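To show how per-frame positions of this kind relate to a particle size, the following Python sketch applies the textbook mean squared displacement estimate of the diffusivity followed by the Stokes-Einstein relation. It is not the MApNTA estimator and it ignores localization error and motion blur. The function name naive_size_from_track, the 30 frames per second rate, the viscosity of water at 293 K (about 1.002 mPa·s), and the simulated trajectory are assumptions for illustration.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K


def naive_size_from_track(xy, dt, temperature=293.0, viscosity=1.002e-3):
    """Naive diffusivity (m^2/s) and diameter (m) from a 2D track.

    xy is an (N, 2) array of positions in meters and dt is the frame
    interval in seconds. Uses D = <dr^2> / (4 dt) for 2D Brownian motion
    and d = k_B T / (3 pi eta D) (Stokes-Einstein, spherical particle).
    """
    xy = np.asarray(xy, dtype=float)
    steps = np.diff(xy, axis=0)                 # frame-to-frame displacements
    msd = np.mean(np.sum(steps ** 2, axis=1))   # mean squared displacement
    diffusivity = msd / (4.0 * dt)
    diameter = K_B * temperature / (3.0 * np.pi * viscosity * diffusivity)
    return diffusivity, diameter


# Simulated trajectory of a 50 nm sphere (assumed values, not measured data).
# The track is far longer than a particle would normally stay in view, so
# that the naive estimate is tight enough to check.
rng = np.random.default_rng(0)
dt = 1.0 / 30.0
d_true = 50e-9
D_true = K_B * 293.0 / (3.0 * np.pi * 1.002e-3 * d_true)
steps = rng.normal(scale=np.sqrt(2.0 * D_true * dt), size=(20000, 2))
track = np.cumsum(steps, axis=0)

D_est, d_est = naive_size_from_track(track, dt)
print(f"D = {D_est:.3e} m^2/s, d = {d_est * 1e9:.1f} nm")
```

For short trajectories this naive estimate scatters widely around the true size.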

Acknowledgement

K. Silmore was supported by the Department of Energy Computational Science Graduate Fellowship program under grant DE-FG02-97ER25308. M. Strano acknowledges support from the Disruptive and Sustainable Technology for Agricultural Precision (DiSTAP) IRG of the Singapore MIT Alliance for Science and Technology (SMART) program funded by the Singapore National Research Foundation. The authors would like to thank P. Doyle and N. Fakhri for helpful discussions.

Supporting Information Available

Details of the method, proof of convexity for MApNTA optimization, AFM data for gold nanoparticles, AFM data for carbon nanotubes, additional experiments with gold nanoparticles, UV-vis spectrum for carbon nanotube solution, and a MATLAB implementation of MApNTA.

References

1. Gaumet, M.; Vargas, A.; Gurny, R.; Delie, F. Nanoparticles for Drug Delivery: The Need for Precision in Reporting Particle Size Parameters. Eur. J. Pharm. Biopharm. 2008, 69, 1–9.

2. Desai, M. P.; Labhasetwar, V.; Amidon, G. L.; Levy, R. J. Gastrointestinal Uptake of Biodegradable Microparticles: Effect of Particle Size. Pharm. Res. 1996, 13, 1838–1845.

3. Champion, J. A.; Walker, A.; Mitragotri, S. Role of Particle Size in Phagocytosis of Polymeric Microspheres. Pharm. Res. 2008, 25, 1815–1821.

4. Narayanan, R.; El-Sayed, M. A. Shape-Dependent Catalytic Activity of Platinum Nanoparticles in Colloidal Solution. Nano Lett. 2004, 4, 1343–1348.

5. Grzelczak, M.; Pérez-Juste, J.; Mulvaney, P.; Liz-Marzán, L. M. Shape Control in Gold Nanoparticle Synthesis. Chem. Soc. Rev. 2008, 37, 1783–1791.

6. Zhang, L.; Liang, J.; Huang, Y.; Ma, Y.; Wang, Y.; Chen, Y. Size-Controlled Synthesis of Graphene Oxide Sheets on a Large Scale Using Chemical Exfoliation. Carbon 2009, 47, 3365–3368.

7. Utpalendu, K.; Manika, P. Specific Surface Area and Pore-Size Distribution in Clays and Shales. Geophys. Prospect. 2013, 61, 341–362.

8. Brant, J.; Lecoanet, H.; Wiesner, M. R. Aggregation and Deposition Characteristics of Fullerene Nanoparticles in Aqueous Systems. J. Nanopart. Res. 2005, 7, 545–553.

9. Merkus, H. G. Particle Size Measurements: Fundamentals, Practice, Quality; Particle technology series v. 17; Springer: New York, 2009.

10. Scotti, A.; Liu, W.; Hyatt, J. S.; Herman, E. S.; Choi, H. S.; Kim, J. W.; Lyon, L. A.; Gasser, U.; Fernandez-Nieves, A. The CONTIN Algorithm and Its Application to Determine the Size Distribution of Microgel Suspensions. J. Chem. Phys. 2015, 142, 234905.

11. Chu, B.; Liu, T. Characterization of Nanoparticles by Scattering Techniques. J. Nanopart. Res. 2000, 2, 29–41.

12. Groetsch, C. W. The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind; Research notes in mathematics 105; Pitman Advanced Pub. Program: Boston, 1984.

13. Rieker, T.; Hanprasopwattana, A.; Datye, A.; Hubbard, P. Particle Size Distribution Inferred from Small-Angle X-Ray Scattering and Transmission Electron Microscopy. Langmuir 1999, 15, 638–641.


14. Liedl, T.; Keller, S.; Simmel, F. C.; Rädler, J. O.; Parak, W. J. Fluorescent Nanocrystals as Colloidal Probes in Complex Fluids Measured by Fluorescence Correlation Spectroscopy. Small 2005, 1, 997–1003.

15. Frisken, B. J. Revisiting the Method of Cumulants for the Analysis of Dynamic Light-Scattering Data. Appl. Opt. 2001, 40, 4087–4091.

16. Baalousha, M.; Stolpe, B.; Lead, J. R. Flow Field-Flow Fractionation for the Analysis and Characterization of Natural Colloids and Manufactured Nanoparticles in Environmental Systems: A Critical Review. J. Chromatogr. A 2011, 1218, 4078–4103.

17. Giddings, J. Field-Flow Fractionation: Analysis of Macromolecular, Colloidal, and Particulate Materials. Science 1993, 260, 1456–1465.

18. Zhu, X.; Mason, T. G. Nanoparticle Size Distributions Measured by Optical Adaptive-Deconvolution Passivated-Gel Electrophoresis. J. Colloid Interface Sci. 2014, 435, 67–74.

19. Schuck, P.; Perugini, M. A.; Gonzales, N. R.; Howlett, G. J.; Schubert, D. Size-Distribution Analysis of Proteins by Analytical Ultracentrifugation: Strategies and Application to Model Systems. Biophys. J. 2002, 82, 1096–1111.

20. Laue, T. M.; Stafford, W. F. Modern Applications of Analytical Ultracentrifugation. Annu. Rev. Biophys. Biomol. Struct. 1999, 28, 75–100.

21. Nair, N.; Kim, W.-J.; Braatz, R. D.; Strano, M. S. Dynamics of Surfactant-Suspended Single-Walled Carbon Nanotubes in a Centrifugal Field. Langmuir 2008, 24, 1790–1795.

22. Silvera Batista, C. A.; Zheng, M.; Khripin, C. Y.; Tu, X.; Fagan, J. A. Rod Hydrodynamics and Length Distributions of Single-Wall Carbon Nanotubes Using Analytical Ultracentrifugation. Langmuir 2014, 30, 4895–4904.

23. Kim, H.; Carney, R. P.; Reguera, J.; Ong, Q. K.; Liu, X.; Stellacci, F. Synthesis and Characterization of Janus Gold Nanoparticles. Adv. Mater. 2012, 24, 3857–3863.

24. Ralston, G. Introduction to Analytical Ultracentrifugation; Beckman Coulter, 1993.

25. Dieckmann, Y.; Cölfen, H.; Hofmann, H.; Petri-Fink, A. Particle Size Distribution Measurements of Manganese-Doped ZnS Nanoparticles. Anal. Chem. 2009, 81, 3889–3895.

26. Batchelor, G. K. Sedimentation in a Dilute Dispersion of Spheres. J. Fluid Mech. 1972, 52, 245–268.

27. Harding, S. E.; Berth, G.; Hartmann, J.; Jumel, K.; Cölfen, H.; Christensen, B. E. Physicochemical Studies on Xylinan (Acetan). III. Hydrodynamic Characterization by Analytical Ultracentrifugation and Dynamic Light Scattering. Biopolymers 1998, 39, 729–736.

28. Martin, J. E.; Wilcoxon, J. P.; Odinek, J.; Provencio, P. Control of the Interparticle Spacing in Gold Nanoparticle Superlattices. J. Phys. Chem. B 2000, 104, 9475–9486.

29. Walker, J. G. Improved Nano-Particle Tracking Analysis. Meas. Sci. Technol. 2012, 23, 065605.

30. Persson, F.; Barkefors, I.; Elf, J. Single Molecule Methods with Applications in Living Cells. Curr. Opin. Biotechnol. 2013, 24, 737–744.

31. Chenouard, N.; Smal, I.; de Chaumont, F.; Maška, M.; Sbalzarini, I. F.; Gong, Y.; Cardinale, J.; Carthel, C.; Coraluppi, S.; Winter, M.; Cohen, A. R.; Godinez, W. J.; Rohr, K.; Kalaidzidis, Y.; Liang, L.; Duncan, J.; Shen, H.; Xu, Y.; Magnusson, K. E. G.; Jaldén, J. et al. Objective Comparison of Particle Tracking Methods. Nat. Methods 2014, 11, 281.

32. Filipe, V.; Hawe, A.; Jiskoot, W. Critical Evaluation of Nanoparticle Tracking Analysis (NTA) by NanoSight for the Measurement of Nanoparticles and Protein Aggregates. Pharm. Res. 2010, 27, 796–810.

33. Kestens, V.; Bozatzidis, V.; De Temmerman, P.-J.; Ramaye, Y.; Roebben, G. Validation of a Particle Tracking Analysis Method for the Size Determination of Nano- and Microparticles. J. Nanopart. Res. 2017, 19.

34. Crocker, J. C.; Grier, D. G. Methods of Digital Video Microscopy for Colloidal Studies. J. Colloid Interface Sci. 1996, 179, 298–310.

35. Allan, D.; Caswell, T.; Keim, N.; van der Wel, C. Trackpy v0.3.2. 2016; http://github.com/soft-matter/trackpy.

36. Einstein, A. Über Die von Der Molekularkinetischen Theorie Der Wärme Geforderte Bewegung von in Ruhenden Flüssigkeiten Suspendierten Teilchen. Ann. Phys. 1905, 322, 549–560.

37. Savin, T.; Doyle, P. S. Static and Dynamic Errors in Particle Tracking Microrheology. Biophys. J. 2005, 88, 623–638.

38. Savin, T.; Spicer, P. T.; Doyle, P. S. A Rational Approach to Noise Discrimination in Video Microscopy Particle Tracking. Appl. Phys. Lett. 2008, 93, 024102.

39. Vestergaard, C. L. Optimizing Experimental Parameters for Tracking of Diffusing Particles. Phys. Rev. E 2016, 94, 022401.

40. Hartman, J.; Kirby, B. Decorrelation Correction for Nanoparticle Tracking Analysis of Dilute Polydisperse Suspensions in Bulk Flow. Phys. Rev. E 2017, 95, 033305.

41. Michalet, X.; Berglund, A. J. Optimal Diffusion Coefficient Estimation in Single-Particle Tracking. Phys. Rev. E 2012, 85, 061916.

42. Vestergaard, C. L.; Blainey, P. C.; Flyvbjerg, H. Optimal Estimation of Diffusion Coefficients from Single-Particle Trajectories. Phys. Rev. E 2014, 89, 022726.

43. Boyer, D.; Dean, D. S.; Mejía-Monasterio, C.; Oshanin, G. Optimal Estimates of the Diffusion Coefficient of a Single Brownian Trajectory. Phys. Rev. E 2012, 85, 031136.

44. Türkcan, S.; Alexandrou, A.; Masson, J.-B. A Bayesian Inference Scheme to Extract Diffusivity and Potential Fields from Confined Single-Molecule Trajectories. Biophys. J. 2012, 102, 2288–2298.

45. Dimiduk, T. G.; Manoharan, V. N. Bayesian Approach to Analyzing Holograms of Colloidal Particles. Opt. Express 2016, 24, 24045–24060.

46. Monnier, N.; Guo, S.-M.; Mori, M.; He, J.; Lénárt, P.; Bathe, M. Bayesian Approach to MSD-Based Analysis of Particle Motion in Live Cells. Biophys. J. 2012, 103, 616–626.

47. Yoon, J. W.; Bruckbauer, A.; Fitzgerald, W. J.; Klenerman, D. Bayesian Inference for Improved Single Molecule Fluorescence Tracking. Biophys. J. 2008, 94, 4932–4947.

48. Matsuura, Y.; Ouchi, N.; Nakamura, A.; Kato, H. Determination of an Accurate Size Distribution of Nanoparticles Using Particle Tracking Analysis Corrected for the Adverse Effect of Random Brownian Motion. Phys. Chem. Chem. Phys. 2018, 20, 17839–17846.

49. Naiim, M.; Boualem, A.; Ferre, C.; Jabloun, M.; Jalocha, A.; Ravier, P. Multiangle Dynamic Light Scattering for the Improvement of Multimodal Particle Size Distribution Measurements. Soft Matter 2014, 11, 28–32.

50. Wagner, T.; Lipinski, H.-G.; Wiemann, M. Dark Field Nanoparticle Tracking Analysis for Size Characterization of Plasmonic and Non-Plasmonic Particles. J. Nanopart. Res. 2014, 16, 2419.

51. Michalet, X. Mean Square Displacement Analysis of Single-Particle Trajectories with Localization Error: Brownian Motion in an Isotropic Medium. Phys. Rev. E 2010, 82, 041914.

52. Qian, H.; Sheetz, M. P.; Elson, E. L. Single Particle Tracking. Analysis of Diffusion and Flow in Two-Dimensional Systems. Biophys. J. 1991, 60, 910–921.

53. Schrödinger, E. Zur Theorie Der Fall- Und Steigversuche an Teilchen Mit Brownscher Bewegung. Phys. Z. 1915, 16, 289.

54. Tweedie, M. C. K. Inverse Statistical Variates. Nature 1945, 155, 453.

55. Gelman, A.; Carlin, J. B.; Stern, H. S.; Dunson, D. B.; Vehtari, A.; Rubin, D. B. Bayesian Data Analysis, 3rd ed.; Chapman & Hall/CRC texts in statistical science; CRC Press: Boca Raton, 2014.

56. Boyd, S. P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK; New York, 2004.

57. Zhang, P. Model Selection Via Multifold Cross Validation. Ann. Stat. 1993, 21, 299–313.

58. Xu, Q.-S.; Liang, Y.-Z.; Du, Y.-P. Monte Carlo Cross-Validation for Selecting a Model and Estimating the Prediction Error in Multivariate Calibration. J. Chemom. 2004, 18, 112–120.

59. Hastie, T.; Tibshirani, R.; Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer series in statistics; Springer: New York, NY, 2009.

60. Batchelor, G. K. Slender-Body Theory for Particles of Arbitrary Cross-Section in Stokes Flow. J. Fluid Mech. 1970, 44, 419–440.

61. Streit, J. K.; Bachilo, S. M.; Naumov, A. V.; Khripin, C.; Zheng, M.; Weisman, R. B. Measuring Single-Walled Carbon Nanotube Length Distributions from Diffusional Trajectories. ACS Nano 2012, 6, 8424–8431.

62. Wang, S.; Liang, Z.; Wang, B.; Zhang, C. Statistical Characterization of Single-Wall Carbon Nanotube Length Distribution. Nanotechnology 2006, 17, 634.

63. Fakhri, N.; Tsyboulski, D. A.; Cognet, L.; Weisman, R. B.; Pasquali, M. Diameter-Dependent Bending Dynamics of Single-Walled Carbon Nanotubes in Liquids. PNAS 2009, 106, 14219–14223.

64. Bisker, G.; Dong, J.; Park, H. D.; Iverson, N. M.; Ahn, J.; Nelson, J. T.; Landry, M. P.; Kruss, S.; Strano, M. S. Protein-Targeted Corona Phase Molecular Recognition. Nat. Commun. 2016, 7, 10241.

65. Li, H.; Zhou, L. Visualizing Helical Wrapping of Semiconducting Single-Walled Carbon Nanotubes by Surfactants and Their Impacts on Electronic Properties. ChemistrySelect 2016, 1, 3569–3572.

