Environ. Sci. Technol. 2010, 44 (5), 1705–1712

Quantification of Analytical Recovery in Particle and Microorganism Enumeration Methods

PHILIP J. SCHMIDT,† MONICA B. EMELKO,*,† AND PARK M. REILLY‡

Department of Civil and Environmental Engineering and Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1

Received July 24, 2009. Revised manuscript received December 21, 2009. Accepted January 6, 2010.

Enumeration-based methods that are often used to quantify microorganisms and microscopic discrete particles in aqueous systems may include losses during sample processing or errors in counting. Analytical recovery (the capacity of the analyst to successfully count each microorganism or particle of interest in a sample using a specific enumeration method) is frequently assessed by enumerating samples that are seeded with known quantities of the microorganisms or particles. Probabilistic models were developed to account for the impacts of seeding and analytical error on recovery data, and probability intervals, obtained by Monte Carlo simulation, were used to evaluate recovery experiment design (i.e., seeding method, number of seeded particles, and number of samples). The method of moments, maximum likelihood estimation, and credible intervals were used to statistically analyze recovery experiment results. Low or uncertain numbers of seeded particles were found to result in variability in recovery data that was not due to analytical recovery, and should be avoided if possible. This additional variability was found to reduce the reproducibility of experimental results and necessitated the use of statistical analysis techniques, such as maximum likelihood estimation using probabilistic models that account for the impacts of sampling and analytical error in recovery data.

* Corresponding author phone: (519) 888-4567 x. 32208; fax: (519) 888-4349; e-mail: [email protected].
† Civil & Environmental Engineering. ‡ Chemical Engineering.
10.1021/es902237f. © 2010 American Chemical Society. Published on Web 01/28/2010.

Introduction

Many of the methods that are routinely used to quantify microorganisms and microscopic discrete particles in aqueous systems are based on enumeration. The concentration of a specific type of particle or microorganism is often estimated by counting the number present in a sample of specified volume. Enumeration methods are commonly used in the quantification of pathogenic microorganisms and associated indicators in drinking water sources and distribution systems (1, 2), in performance demonstrations for treatment and disinfection technologies (3-6), in the evaluation of microbial fate and transport in saturated porous media (7), in health-related assays (8), in testing food and food-processing equipment (9), and in testing for potential biological weapons (10). Enumeration methods, however, are inherently variable because of random errors during sample collection, sample processing, and counting (11-13). Random losses during sample processing are often unavoidable, particularly in methods that depend on concentration or purification steps (e.g., refs 8 and 14) to isolate particles of interest from large sample volumes or background debris. Counting errors include random undercounting or overcounting of enumerable particles and identification errors, which can collectively cause repeated counts of a processed sample to vary.

Losses and counting errors are described by analytical recovery: the capacity of the analyst to successfully enumerate each microorganism or particle of interest in a sample using a specific enumeration method. Consideration of analytical recovery is essential when analyzing enumeration data because losses or counting errors can result in biased concentration estimates or false-negative samples (i.e., poor sensitivity). Furthermore, inconsistent analytical recovery contributes to the variability in enumeration data, which reduces the precision of concentration estimates and makes it more difficult to show a statistically significant difference in concentration (relative to a target concentration, regulation, or another water source). For these reasons, efforts to develop enumeration methods with better (e.g., higher and less variable) analytical recovery continue, and detailed reporting and correct interpretation of data obtained using imperfect enumeration methods are essential (15). Consequently, the analytical recovery of enumeration methods needs to be quantified and properly incorporated into statistical analysis of enumeration data. The analytical recovery of a particular enumeration method, type of particle (e.g., microorganism or uniform, discrete particle), type of sample (e.g., set of water quality attributes), and laboratory is measured by enumerating samples into which known quantities of the particles have been seeded.
The fraction of the seeded particles that is observed in each sample is an estimate of analytical recovery. Analysis of seeded samples is important when an enumeration method has relatively low or highly variable analytical recovery, especially if the method is used to obtain data that have important regulatory or health-protection implications. For example, Method 1623 (for enumeration of Cryptosporidium oocysts and Giardia cysts) requires an initial demonstration of laboratory capability and ongoing performance evaluation that includes matrix spikes and additional recovery tests (14). The initial precision and recovery test consists of four reagent water samples seeded with oocysts and cysts (100-500 of each), and the mean and relative standard deviation of the recovery estimates must meet performance criteria (e.g., mean recovery between 24 and 100% and relative standard deviation up to 55% for Cryptosporidium oocysts). Method 1623 also implements a formal performance-based measurement system for validating modifications in methodology at individual or multiple laboratories (14, 16). This paper focuses on experiments that are used to quantify the mean and standard deviation of analytical recovery when the sample composition and processing methodology are sufficiently controlled to regard the recovery data as replicates. In addition to laboratory validation requirements (e.g., ref 14), these controlled recovery experiments are commonly used across a wide range of application areas to compare different enumeration methodologies or to evaluate how variation in sample attributes affects method performance (e.g., refs 8, 17-21). Furthermore, recovery experiments that are representative, in both sample attributes and methodology, of independently enumerated environmental or laboratory samples can be used to estimate the
analytical recovery of those samples (11). In these recovery experiments, however, the quantity of seeded particles and the seeding method (i.e., precisely known numbers of particles or an aliquot from a well-quantified stock suspension) will affect the variability of replicate recovery data. This may affect the reproducibility of experimental results (e.g., estimated mean and standard deviation of analytical recovery) or influence method validation and comparison conclusions. In contrast, some other research has focused on incorporating nonreplicate recovery data into the statistical analysis of environmental enumeration data. Crainiceanu, et al. (22) used nonreplicate recovery data (samples with diverse characteristics were collected from many sources and analyzed by different laboratories) to analyze nationwide enumeration data associated with the U.S. Information Collection Rule. Petterson, et al. (23) added prestained oocysts to monitoring samples from a single water supply (i.e., similarly processed samples with potentially variable water quality characteristics) to yield sample-specific recovery estimates in their analysis of the temporal variability of indigenous oocyst concentration. The more rigorous statistical approaches presented in those applications are appropriate for evaluating analytical recovery in environmental enumeration data when recovery may vary substantially due to differences in sample or methodological characteristics, but are unnecessarily complex for the analysis of replicate recovery data. 
Here, we (1) describe the impacts of random errors in sample seeding, processing, and enumeration on recovery data using simple probabilistic models, (2) present Monte Carlo simulation of recovery data as a simple approach to quantitatively evaluate the effects of the seeding method, the number of seeded particles, and the number of samples on the sample mean and sample standard deviation of replicate recovery estimates, and (3) show that statistical analysis of recovery data using techniques that do not adequately account for the impacts of seeding and analytical error can yield biased estimates of the variability of analytical recovery. Three simple probabilistic models are developed, and these are implemented using Monte Carlo simulations to evaluate and compare recovery experiment designs. These analyses show that more variable (i.e., less reproducible) estimates of the mean and standard deviation of analytical recovery are obtained when low or uncertain numbers of seeded particles are used and that the sample standard deviation of recovery estimates (i.e., the fractions of seeded particles that are observed) is frequently a biased estimate of the variability in analytical recovery. The analyses also show that increasing the number of seeded particles is more beneficial when using enumeration methods with lower variability and that reducing the uncertainty in the number of seeded particles is more beneficial when the mean recovery is higher. Statistical analyses that are based on the recovery estimates (e.g., confidence intervals, hypothesis tests, parameter estimation) can be biased if these estimates vary more than analytical recovery itself. Therefore, maximum likelihood estimation using the presented probabilistic models is recommended to estimate parameters for the distribution of analytical recovery. The resulting likelihood functions can also be incorporated into Bayesian analyses.

Methods

Probabilistic Modeling of Random Errors in the Enumeration of Seeded Samples. The analytical recovery of a method that is used to enumerate a particular type of discrete particle or microorganism is evaluated by enumerating samples that are seeded with known quantities of the particles. This process, however, is susceptible to random errors (e.g., seeding and analytical error) that cause the fraction of seeded
particles that are observed to be an imperfect measurement of analytical recovery. Furthermore, recovery may vary randomly from sample to sample (nonconstant analytical recovery). These random errors are described herein using distributions that are incorporated into three alternative probabilistic models: the beta-binomial, beta-Poisson, and negative binomial models. Seeding error arises when the number of seeded particles in a sample is not precisely known. The beta-binomial model assumes that there is no seeding error because the number of seeded particles is precisely known. Relatively precise numbers of seeded particles may be achieved by preparing sorted aliquots with flow cytometry (e.g., ref 24). In contrast, the other models assume that each sample is seeded with a specific volume of stock suspension for which the concentration is precisely known (i.e., the expected concentration of each seeded sample is known, but not the exact number of particles). If the particles are distributed randomly throughout a well-mixed stock suspension (i.e., not clumped), and the aliquots of stock suspension that are used to seed the samples are independent (i.e., the stock concentration remains constant and precisely known), then the number of particles seeded into each sample varies according to a Poisson distribution with mean equal to the product of stock concentration and volume of stock added. Clumping is an important phenomenon in sample collection or seeding and in describing the environmental occurrence of microorganisms or particles, but is irrelevant in recovery studies if samples are well homogenized prior to processing or if clumping does not impact recovery. Analytical error describes the difference between the quantities of observed and seeded particles in a sample because of imperfect analytical recovery. 
If analytical recovery is less than 100% and each seeded particle in the sample is assumed to have an equal probability (equal to analytical recovery) of yielding an observation, then the observation of seeded particles is a Bernoulli process and the number of particles observed in the sample is binomially distributed (11). This assumption is made in both the beta-binomial and beta-Poisson models and has been widely used in other models that address analytical recovery (11-13, 15, 23, 25, 26). If analytical recovery can exceed 100% (e.g., due to counting errors), then it can not be regarded as a probability and the observation of seeded particles is not a Bernoulli process. In this case, recovery can be regarded as the rate of observations per particle present in the sample. In the negative binomial model, it is assumed that the concentration of observable seeded particles in each sample is the product of the actual concentration of seeded particles in the sample and analytical recovery. Therefore, the seeding and analytical error in that model can be jointly modeled by a Poisson distribution with mean equal to the product of stock concentration, volume of stock added, and analytical recovery. No model is presented herein to describe analytical error when recovery exceeds 100% and the number of seeded particles is precisely known. Nonconstant analytical recovery is used herein to describe the sample-to-sample variability in recovery under controlled experimental conditions. It is assumed that if several replicate seeded samples are enumerated using identical methodology, then recovery can vary randomly (within a quantifiable distribution) because of uncontrollable random differences in the samples or methodology. Both the beta-binomial and beta-Poisson models assume that this distribution is a beta distribution. 
The beta distribution has often been used to model variability in analytical recovery (11-13, 22, 23, 26, 27) because it is confined to the interval [0,1], is practically unimodal (i.e., polymodal beta distributions are not of practical relevance), and is the conjugate of the binomial distribution (which aids mathematical tractability of the models). The negative binomial model uses a gamma

distribution to describe nonconstant analytical recovery because it assumes that recovery can exceed 100% (13).

Each of the probabilistic models presented herein describes the variability in the number of seeded particles that are observed (x) due to random errors when sample attributes and methodology are controlled. The beta-binomial model assumes that the number of seeded particles (n) is precisely known, that the analytical recovery (p) varies randomly according to a beta distribution with shape parameters (a,b), and that analytical error is binomially distributed. The fraction of seeded particles that are observed (x/n) is an unbiased estimate of each sample’s analytical recovery. The beta-Poisson model is equivalent to the beta-binomial model, except that it assumes that the number of seeded particles is Poisson-distributed with mean λ (the product of stock concentration and volume of stock added to the sample). The model contains Poisson, binomial, and beta distributions (11, 13), but can be simplified by consolidating the binomial and Poisson distributions into a single Poisson distribution with mean λp (25). Analytical recovery cannot exceed 100% in the beta-binomial or beta-Poisson models, although the recovery estimate (x/λ) can exceed 100% in the beta-Poisson model due to seeding error. Like the beta-Poisson model, the negative binomial model assumes that seeding and analytical error are collectively modeled by a Poisson distribution with mean λp, but nonconstant analytical recovery is modeled by a gamma distribution (which enables recovery to exceed 100%) with shape and scale parameters (α,β). The Poisson and gamma distributions can be consolidated into a single negative binomial distribution (13, 28, 29). The joint distributions for each of the three models are presented in eqs 1-3.

Beta-binomial:

$$f(x, p) = \left[ \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x} \right] \left[ \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, p^{a - 1} (1 - p)^{b - 1} \right] \tag{1}$$

Beta-Poisson:

$$f(x, p) = \left[ \frac{e^{-\lambda p} (\lambda p)^{x}}{x!} \right] \left[ \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, p^{a - 1} (1 - p)^{b - 1} \right] \tag{2}$$

Negative binomial:

$$f(x, p) = \left[ \frac{e^{-\lambda p} (\lambda p)^{x}}{x!} \right] \left[ \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, p^{\alpha - 1} e^{-p/\beta} \right] \tag{3}$$
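As a short worked consequence of eqs 1 and 2, the law of total variance shows why recovery estimates vary more than analytical recovery itself. This is a sketch rather than part of the original derivation; μ and σ denote the mean and standard deviation of the beta distribution for p.

```latex
% Beta-binomial (known n): condition on p, then average over the beta distribution.
\begin{align*}
E\!\left[\tfrac{x}{n}\right] &= E[p] = \mu, \\
\operatorname{Var}\!\left[\tfrac{x}{n}\right]
  &= \operatorname{Var}(p) + \frac{E[p(1-p)]}{n}
   = \sigma^{2} + \frac{\mu(1-\mu) - \sigma^{2}}{n}.
\end{align*}

% Beta-Poisson (Poisson seeding with mean \lambda): x | p is Poisson with mean \lambda p.
\begin{align*}
E\!\left[\tfrac{x}{\lambda}\right] &= \mu, \\
\operatorname{Var}\!\left[\tfrac{x}{\lambda}\right]
  &= \operatorname{Var}(p) + \frac{E[\lambda p]}{\lambda^{2}}
   = \sigma^{2} + \frac{\mu}{\lambda}.
\end{align*}
```

For example, with (µ,σ) = (0.7518, 0.0221) and n = λ = 50, these expressions give a standard deviation of the recovery estimates of approximately 0.065 with known seeding and approximately 0.125 with Poisson seeding, both well above σ = 0.0221. The excess terms shrink as n or λ increases.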

Monte Carlo Simulation of Recovery Experiments. Recovery experiments are often conducted to quantify the mean and standard deviation of analytical recovery. As described above, however, the data are affected by sources of random error other than just variability in recovery. The seeding method (e.g., known number or concentration of seeded particles), the quantity of seeded particles, and the number of samples are important experimental design factors that affect the accuracy and precision of estimates of the mean and standard deviation of analytical recovery. Probability intervals for the sample mean and sample standard deviation of the recovery estimates (x/n or x/λ) are presented herein as an approach to quantitatively evaluate experimental design. A 95% probability interval for a statistic represents the range of values within which it would be expected to fall 95% of the time if the whole experiment were repeated many times. Experimental designs with relatively wide probability intervals indicate relatively low reproducibility (e.g., the measured mean recovery may randomly be quite different if the whole experiment were repeated). The probability intervals for each particular recovery experiment design presented herein were calculated using Monte Carlo simulation and either the beta-binomial or beta-Poisson model. The selected analyses that are illustrated in this paper used parameters (a,b) = (287.08, 94.76), which correspond to (µ,σ) = (0.7518, 0.0221), and are based on the pooled recovery data presented in Supporting Information Table S1. A range of other parameters for the distribution of analytical recovery was also used to explore the results when the mean and variance of the recovery distribution were changed. Recovery data (i.e., number of observed particles in each sample) were simulated 10,000 times for each considered recovery experiment design (i.e., number of samples, and quantity of seeded particles per sample). These data were simulated using eq 1 or eq 2 and pseudorandom number generation algorithms for the respective distributions. For example, the recovery constant p was generated using a beta distribution and then substituted into either a binomial or Poisson distribution (along with the quantity of seeded particles) to simulate the number of particles that were observed. The sample mean and sample standard deviation of these recovery estimates were calculated for each of the simulated experiments, and the 2.5th and 97.5th percentiles of each statistic were used to estimate the associated 95% probability intervals. Probability intervals were calculated for many different quantities of seeded particles (between 10 and 1000) and many numbers of samples (between 3 and 20).

Statistical Analysis of Replicate Recovery Data. Conventional statistical analysis of replicate recovery data often assumes that the recovery estimates (x/n or x/λ) are direct measurements of analytical recovery. Analyses based on these recovery estimates (e.g., reporting the sample mean and sample standard deviation, calculating confidence intervals or testing hypotheses, or fitting distributions directly to the recovery estimates) can be biased if the recovery estimates vary more than analytical recovery itself because of the impacts of seeding and analytical error.
Several approaches to fit beta or gamma distributions to recovery data are evaluated herein: the method of moments and maximum likelihood estimation based on the recovery estimates, and maximum likelihood estimation using the presented probabilistic models to account for seeding and analytical error. In the method of moments, the sample mean and sample standard deviation of the recovery estimates ($\bar{x}_{\hat{p}}$ and $s_{\hat{p}}$, where $\hat{p} = x/n$ or $\hat{p} = x/\lambda$) were equated to the mean and standard deviation of the beta or gamma distribution. The resulting estimators for the beta (11) and gamma distributions are presented in eqs 4 and 5, respectively.

$$\hat{a} = \bar{x}_{\hat{p}} \left( \frac{\bar{x}_{\hat{p}} (1 - \bar{x}_{\hat{p}})}{s_{\hat{p}}^{2}} - 1 \right), \qquad \hat{b} = \frac{(1 - \bar{x}_{\hat{p}})\, \hat{a}}{\bar{x}_{\hat{p}}} \tag{4}$$

$$\hat{\alpha} = \frac{\bar{x}_{\hat{p}}^{2}}{s_{\hat{p}}^{2}}, \qquad \hat{\beta} = \frac{s_{\hat{p}}^{2}}{\bar{x}_{\hat{p}}} \tag{5}$$
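The estimators in eqs 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example counts are hypothetical, and the sample variance uses the usual n − 1 denominator.

```python
from statistics import mean, variance


def beta_moments(p_hat):
    """Method-of-moments estimates (a, b) for a beta distribution (eq 4)."""
    m = mean(p_hat)
    s2 = variance(p_hat)  # sample variance, n - 1 denominator
    if not 0.0 < m < 1.0 or not 0.0 < s2 < m * (1.0 - m):
        raise ValueError("moments are not compatible with a beta distribution")
    a = m * (m * (1.0 - m) / s2 - 1.0)
    b = (1.0 - m) * a / m
    return a, b


def gamma_moments(p_hat):
    """Method-of-moments estimates (alpha, beta) for a gamma distribution (eq 5)."""
    m = mean(p_hat)
    s2 = variance(p_hat)
    if m <= 0.0 or s2 <= 0.0:
        raise ValueError("moments are not compatible with a gamma distribution")
    return m * m / s2, s2 / m


# Hypothetical recovery experiment: five samples, 100 seeded particles each.
counts = [72, 78, 75, 70, 80]
estimates = [x / 100 for x in counts]
print(beta_moments(estimates))
print(gamma_moments(estimates))
```

Because these estimators use only the recovery estimates, they attribute all of the observed scatter to variability in analytical recovery, which is the source of the bias discussed in the text.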

Maximum likelihood estimation yields distribution parameters for which the probability of observing the set of experimental data (conditional upon these parameters) is maximized. The likelihood function, assuming that the data are independent, is the product of the conditional probability density of each of the data written as a function of the unknown parameters. The likelihood functions for the beta and gamma distributions fitted directly to the recovery estimates $\{\hat{p}_i \mid i = 1, 2, \ldots, r\}$ are presented in eqs 6 and 7, respectively. The maximum likelihood estimates for the feasible region (a > 0,b > 0) or (α > 0,β > 0) were determined numerically because the derivatives can not be solved explicitly.

$$L(a, b) = \left( \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \right)^{r} \left( \prod_{i=1}^{r} \hat{p}_i \right)^{a - 1} \left( \prod_{i=1}^{r} (1 - \hat{p}_i) \right)^{b - 1} \tag{6}$$

$$L(\alpha, \beta) = \left( \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)} \right)^{r} \left( \prod_{i=1}^{r} \hat{p}_i \right)^{\alpha - 1} e^{-\sum_{i=1}^{r} \hat{p}_i / \beta} \tag{7}$$
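For numerical maximization, eqs 6 and 7 are usually evaluated on the log scale. The sketch below shows one way to do that with the Python standard library; the function names are hypothetical, and the input checks reflect a property of the equations themselves, namely that eq 6 is undefined when any recovery estimate is 0 or is 1 or greater (which can happen with x/λ under Poisson seeding).

```python
import math


def beta_loglik(a, b, p_hat):
    """Logarithm of eq 6: beta likelihood of the recovery estimates."""
    if any(not 0.0 < p < 1.0 for p in p_hat):
        raise ValueError("eq 6 needs every recovery estimate strictly in (0, 1)")
    r = len(p_hat)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (r * log_norm
            + (a - 1.0) * sum(math.log(p) for p in p_hat)
            + (b - 1.0) * sum(math.log1p(-p) for p in p_hat))


def gamma_loglik(alpha, beta, p_hat):
    """Logarithm of eq 7: gamma likelihood of the recovery estimates."""
    if any(p <= 0.0 for p in p_hat):
        raise ValueError("eq 7 needs every recovery estimate to be positive")
    r = len(p_hat)
    return (-r * (alpha * math.log(beta) + math.lgamma(alpha))
            + (alpha - 1.0) * sum(math.log(p) for p in p_hat)
            - sum(p_hat) / beta)
```

Either function can be passed to a general-purpose optimizer or evaluated over a grid of parameter values to locate the maximum.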

The likelihood functions for the beta-binomial (26), beta-Poisson, and negative binomial models discussed herein are shown in eqs 8-10 (see derivations in the Supporting Information). These functions are fitted to the observed numbers of seeded particles rather than to the recovery estimates, and will account for the impacts of seeding and analytical error.

Beta-binomial:

$$L(a, b) \propto \left( \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \right)^{r} \prod_{i=1}^{r} \frac{\Gamma(x_i + a)\, \Gamma(n_i - x_i + b)}{\Gamma(n_i + a + b)} \tag{8}$$

Beta-Poisson:

$$L(a, b) \propto \left( \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \right)^{r} \prod_{i=1}^{r} \int_{0}^{1} e^{-\lambda_i p}\, p^{x_i + a - 1} (1 - p)^{b - 1}\, dp \tag{9}$$

Negative binomial:

$$L(\alpha, \beta) \propto \frac{1}{(\Gamma(\alpha))^{r}} \prod_{i=1}^{r} \frac{\Gamma(x_i + \alpha)\, \beta^{x_i}}{(\lambda_i \beta + 1)^{x_i + \alpha}} \tag{10}$$
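Eqs 8 and 10 have closed forms and can be evaluated directly on the log scale, as in the following sketch (function names are hypothetical; eq 9 is omitted because its integral would need numerical quadrature). The results are log-likelihoods up to an additive constant that does not depend on the parameters.

```python
import math


def beta_binomial_loglik(a, b, x, n):
    """Logarithm of eq 8 for observed counts x and known seed counts n."""
    r = len(x)
    total = r * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    for xi, ni in zip(x, n):
        total += (math.lgamma(xi + a) + math.lgamma(ni - xi + b)
                  - math.lgamma(ni + a + b))
    return total


def negative_binomial_loglik(alpha, beta, x, lam):
    """Logarithm of eq 10 for observed counts x and expected seed counts lam."""
    r = len(x)
    total = -r * math.lgamma(alpha)
    for xi, li in zip(x, lam):
        total += (math.lgamma(xi + alpha) + xi * math.log(beta)
                  - (xi + alpha) * math.log(li * beta + 1.0))
    return total
```

A quick sanity check on eq 8: with a = b = 1 (uniform recovery), multiplying the likelihood of a single count by the omitted binomial coefficient gives 1/(n + 1) for every x, as expected for a uniform mixture of binomials.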

Regardless of the method used to estimate the recovery distribution parameters, the estimates are uncertain (e.g., other feasible parameters will simply have a lower likelihood than the maximum likelihood estimates). One approach to quantify uncertainty in the parameter estimates is to apply Bayes’ theorem to the appropriate likelihood function (23). If a uniform, improper prior is used, then the posterior distribution (e.g., for a,b) is proportional to the likelihood function. Credible intervals based upon the beta-binomial model were approximated numerically by evaluating the associated likelihood function across an evenly spaced grid outside which the likelihood was very low. The calculated likelihood values for all the points in the grid were summed, and each value was divided by the sum. These normalized likelihood values were then sorted, and the α% credible interval was approximated as a contour that contained the highest normalized likelihood values with a sum of at least α%.
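The grid procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name is hypothetical, the caller supplies log-likelihood values already evaluated on an evenly spaced grid, and the choice of grid bounds (which matters under an improper prior) is left to the analyst.

```python
import math


def credible_region(log_like, level=0.95):
    """Return the grid points in the highest-likelihood credible region.

    log_like maps each grid point, e.g. an (a, b) tuple, to its
    log-likelihood. With a uniform prior, the normalized likelihood
    values approximate posterior probabilities of the grid cells.
    """
    peak = max(log_like.values())
    # Subtract the peak before exponentiating to avoid underflow.
    weights = {pt: math.exp(ll - peak) for pt, ll in log_like.items()}
    total = sum(weights.values())
    region, cumulative = set(), 0.0
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for pt, w in ranked:
        region.add(pt)
        cumulative += w / total
        if cumulative >= level:
            break
    return region


# Toy example with three grid points whose normalized weights are 0.6, 0.3, 0.1.
toy = {0: math.log(0.6), 1: math.log(0.3), 2: math.log(0.1)}
print(credible_region(toy, level=0.85))
```

In practice, `log_like` would hold eq 8 evaluated at every (a,b) pair on the grid, and the boundary of the returned set approximates the credible contour.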

Results and Discussion

Probabilistic Model Assumptions and Alternatives. Three probabilistic models are presented herein to account for the impacts of seeding and analytical error on recovery data and to describe variability in analytical recovery. These assume that analytical recovery varies randomly from sample to sample according to either a beta or gamma distribution, even under controlled experimental conditions. They also assume that the enumeration method (as implemented by the analyst) is in statistical control, that analytical recovery is independent of the quantity of seeded particles, that counting errors are random and can be implicitly included in analytical recovery, and that the entire seeded sample is enumerated. “An experimental procedure is said to be in a state of statistical control when the observations to which it gives rise, under what are assumed to be ‘essentially the same conditions’, fluctuate in a random manner and are free from trends and non-random shifts in magnitude” (30). Any probabilistic model must assume statistical control because nonrandom errors (e.g., in seeding, processing, or enumerating samples, or in controlling experimental factors that affect analytical recovery) can not be modeled. The models presented herein assume that the parameters of the distribution for nonconstant analytical recovery are independent
of the seed dose. If seed dose affects analytical recovery, then environmentally relevant quantities of particles should be used in recovery experiments and the results are only applicable for samples with similar numbers of particles. Counting errors include undercounting (failure to count observable particles), overcounting (double-counting observable particles), and false-negative and false-positive identification errors (13). If counting errors are small, random variations among repeated counts of a single prepared sample, and not substantial nonrandom mistakes, then they can be regarded as a component of analytical recovery. False-positive observations should generally be regarded as nonrandom errors because they may not be proportional to the number of particles present. Enumeration of only a portion of a sample (e.g., subsampling between sample-processing steps) is common in some methods. Partial sample analysis has been included elsewhere in the statistical analysis of enumeration and recovery data (15, 22, 25), and can be incorporated into the probabilistic models presented herein by adding a parameter to the analytical error distributions (see Supporting Information). Many other statistical models have addressed analytical recovery (11-13, 15, 22, 23, 25-27), but very few of these clearly addressed random errors in recovery data or rigorous estimation of recovery parameters (22, 23, 26), and none addressed design of recovery experiments. Crainiceanu, et al. (22) modeled nonreplicate recovery data using linear regression and covariate data as well as a probabilistic model for seeding and analytical error. Petterson, et al. (23) considered sample-specific recovery estimates (by concurrently counting uniquely labeled seeded oocysts and indigenous oocysts) in environmental monitoring data. In the absence of these paired recovery estimates, Petterson, et al.
(23) used a beta-binomial model to describe variability in nonreplicate recovery data, and used a Bayesian approach to evaluate uncertainty in recovery distribution parameters. These alternative models are more clearly contrasted with the models presented herein in the Supporting Information. Further modification of the presented models, or use of existing alternatives, may be required for some types of recovery data; especially nonreplicate recovery data or experiments to evaluate impacts of potential covariates upon recovery. Design of Experiments to Quantify Analytical Recovery. The seeding method (known number or concentration of seeded particles) and quantity of seeded particles affect the variability of recovery data, and thereby impact the reproducibility of recovery experiment results. Probability intervals for the sample mean and sample standard deviation of recovery estimates (x/n or x/λ), obtained through Monte Carlo simulation, are proposed as an approach to quantitatively evaluate experimental designs. This approach can easily be implemented as an experimental design analysis tool by using beta or gamma distribution parameters that are estimated from preliminary results or similar methods. Herein, a wide range of hypothetical parameters was considered to more generally investigate the effects of the number of seeded particles, seeding method, and number of seeded samples upon recovery experiment results. Figure 1 shows 95% probability intervals for the sample mean and sample standard deviation of the recovery estimates when the beta-binomial model is used with parameters (a,b) ) (287.08, 94.76), known numbers of seeded particles ranging from 10 to 1000, and numbers of samples ranging from 3 to 20. The figure shows that larger numbers of seeded particles yielded narrower probability intervals. 
This is because the impact of analytical error on recovery estimates is reduced as samples are seeded with more particles (i.e., the fraction of seeded particles that are observed becomes a more accurate estimate of analytical recovery).
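The simulation procedure described in the Methods can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the samplers rely only on the standard library (a Bernoulli sum for the binomial and Knuth's multiplication method for the Poisson), and with known seeding the number of seeded particles must be an integer.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method.

    The mean is split into chunks of at most 500 so that exp(-chunk)
    does not underflow; a sum of independent Poisson draws is Poisson.
    """
    total = 0
    remaining = float(lam)
    while remaining > 0.0:
        chunk = min(remaining, 500.0)
        limit = math.exp(-chunk)
        product = rng.random()
        while product > limit:
            total += 1
            product *= rng.random()
        remaining -= chunk
    return total


def probability_intervals(a, b, seeded, n_samples, known_seed=True,
                          n_sim=10000, rng=None):
    """Approximate 95% probability intervals for the sample mean and sample
    standard deviation of recovery estimates for one experimental design."""
    rng = rng or random.Random()
    means, sds = [], []
    for _ in range(n_sim):
        estimates = []
        for _ in range(n_samples):
            p = rng.betavariate(a, b)                    # nonconstant recovery
            n = seeded if known_seed else poisson_draw(rng, seeded)
            x = sum(rng.random() < p for _ in range(n))  # binomial analytical error
            estimates.append(x / seeded)                 # x/n or x/lambda
        means.append(statistics.fmean(estimates))
        sds.append(statistics.stdev(estimates))

    def interval(values):
        cuts = statistics.quantiles(values, n=40)        # cut points every 2.5%
        return cuts[0], cuts[-1]                         # 2.5th and 97.5th percentiles

    return {"mean": interval(means), "sd": interval(sds)}


if __name__ == "__main__":
    design = dict(a=287.08, b=94.76, seeded=50, n_samples=5, n_sim=2000)
    print(probability_intervals(known_seed=True, **design))
    print(probability_intervals(known_seed=False, **design))
```

Running the two designs side by side reproduces the qualitative comparison discussed in this section: for the same expected number of seeded particles, Poisson seeding yields wider intervals than known seeding.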

FIGURE 1. Variability in the sample mean and sample standard deviation of recovery estimates with the beta-binomial (known-spike) recovery model, (a,b) = (287.08, 94.76): (a) variability in the sample mean, (b) variability in the sample standard deviation with 10-50 seeded particles, (c) variability in the sample standard deviation with 100-1000 seeded particles.

The nearly overlapping probability intervals for 500 and 1000 seeded particles in Figure 1a and c, however, show that increasing the number of seeded particles becomes inconsequential when the number of seeded particles is already high. Figure 1b and c show that the probability intervals for sample standard deviation of the recovery estimates do not include the true value of standard deviation of analytical recovery (from the beta distribution) unless the number of seeded particles is more than 100. Not surprisingly, the probability intervals all narrow as more samples are processed (e.g., the standard error of the sample mean is inversely proportional to the square root of the number of data). Relative to an experiment with only three replicate samples,

the width of the probability intervals can be decreased by more than 20% with five samples and nearly 35% with seven samples.

The effect of uncertainty in the number of seeded particles was investigated by comparing the Figure 1 analyses to the equivalent experimental designs when the beta-Poisson model is used. Figure 2 shows probability intervals based on the beta-binomial and beta-Poisson models when the (expected) number of seeded particles is 50. The figure shows that the width of the probability intervals increases when the number of particles is not precisely known. Furthermore, Figure 2b shows that using a Poisson seeding process instead of known numbers of seeded particles further increases the sample standard deviation of the recovery estimates. In this scenario, using fewer than 1000 seeded particles per sample would cause the 95% probability interval for standard deviation of recovery estimates to not contain the true standard deviation of recovery (see Supporting Information Figure S1).

FIGURE 2. Comparison of variability in the sample mean and sample standard deviation of recovery estimates with beta-binomial and beta-Poisson recovery models, (a,b) = (287.08, 94.76): (a) variability in the sample mean, (b) variability in the sample standard deviation.

Analyses with different beta parameters have also been completed (see Supporting Information), with known and Poisson seeding, and yielded results similar to those shown in Figures 1 and 2. In general, the variability in the sample mean and sample standard deviation of the recovery estimates can be reduced by using larger numbers of seeded particles (if analytical recovery is independent of seed dose), by using precisely known numbers of seeded particles, or by increasing the number of samples. Due to the impacts of seeding and analytical error, the sample standard deviation of the recovery estimates was generally observed to exceed the true standard deviation of analytical recovery. In each scenario considered, there was a quantity of seeded particles (known or Poisson-distributed) beyond which the benefits of further increases were negligible. These transitions occurred at lower numbers of seeded particles for methods with more variable recovery. The observed difference in probability intervals between precisely known and Poisson-distributed numbers of seeded particles was greater for methods with higher analytical recovery.
The presented Monte Carlo simulation technique can be a valuable tool in recovery experiment design because it enables case-specific evaluation of (1) the benefits of using larger or precisely known numbers of seeded particles or more recovery samples, and (2) the extent to which the sample standard deviation of the recovery estimates may be a biased estimate of the true variability in analytical recovery.

Quantification of Variability in Analytical Recovery. A reliable estimate of mean analytical recovery is essential because it is needed to calculate unbiased estimates of particle concentration (25). The variability of analytical recovery, even among replicate samples, is also very useful because it affects uncertainty in concentration estimates and detection of significant differences in recovery between different experimental conditions (e.g., sample characteristics, methodology). The standard deviation of the recovery estimates, however, has been shown to typically be a biased estimate of variability in analytical recovery because of the effects of seeding and analytical error. Consequently, several approaches to estimate parameters for the beta or gamma distributions that describe variability in analytical recovery were compared. These included the method of moments and maximum likelihood estimation based solely on the recovery estimates (x/n or x/λ), and maximum likelihood estimation using the probabilistic models presented herein. Table 1 shows the parameter estimates and associated mean and standard deviation of analytical recovery, based on the example data in Supporting Information Table S1, for each considered model and parameter estimation method. These comparisons are for illustrative purposes only; the choice of parameter estimation technique should depend on assumptions about the errors in the recovery data (e.g., known or Poisson-distributed seeding, beta- or gamma-distributed recovery).
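For the method of moments, the distribution parameters follow directly from the sample mean and standard deviation of the recovery estimates. The sketch below uses hypothetical function names, and it assumes that the gamma distribution is parameterized by a shape and a scale (so that the mean is their product), which is consistent with the values reported in Table 1. The inputs are the method-of-moments mean and standard deviation from Table 1.

```python
def beta_from_moments(mean, sd):
    """Beta parameters (a, b) whose mean and standard deviation match the inputs."""
    common = mean * (1.0 - mean) / sd**2 - 1.0   # equals a + b
    if common <= 0:
        raise ValueError("standard deviation too large for a beta distribution")
    return mean * common, (1.0 - mean) * common

def gamma_from_moments(mean, sd):
    """Gamma (shape, scale) whose mean and standard deviation match the inputs."""
    return (mean / sd) ** 2, sd**2 / mean

# Mean and standard deviation of the recovery estimates as listed in Table 1.
a, b = beta_from_moments(0.751827, 0.043535)
shape, scale = gamma_from_moments(0.751827, 0.043535)
print(f"beta:  a = {a:.2f}, b = {b:.2f}")
print(f"gamma: shape = {shape:.2f}, scale = {scale:.6f}")
```

Because these estimates use only the mean and spread of the recovery estimates, they attribute all observed variability to analytical recovery itself.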

TABLE 1. Summary of Parameter Estimates Using Various Models and Estimation Techniques

| parameter estimation technique | distribution parameters | mean recovery | std. dev. of recovery |
|---|---|---|---|
| method of moments-beta | (73.26, 24.18)a | 0.751827 | 0.043535 |
| maximum likelihood-“beta-only” | (85.62, 28.27)a | 0.751778 | 0.040302 |
| maximum likelihood-beta-binomial | (104.53, 34.51)a | 0.751795 | 0.036503 |
| maximum likelihood-beta-Poisson | (287.08, 94.76)a | 0.751840 | 0.022076 |
| method of moments-gamma | (298.24, 0.002521)b | 0.751827 | 0.043535 |
| maximum likelihood-“gamma-only” | (328.93, 0.002286)b | 0.751827 | 0.041454 |
| maximum likelihood-negative binomial | (1236.16, 0.000608)b | 0.751827 | 0.021384 |

a Beta distribution parameters (a, b). b Gamma distribution parameters (α, β).

FIGURE 3. Credible intervals for analysis of Table S1 data with the beta-binomial model.

Each of the parameter estimation methods listed in Table 1 yields similar values for mean recovery, but there are considerable differences in the standard deviations. The method of moments and “beta-only” or “gamma-only” maximum likelihood approaches yield the highest values of standard deviation because they assume that all of the variability between recovery estimates is due to variability in recovery. The beta-binomial maximum likelihood approach yields a lower value of standard deviation because some of the variability in the recovery estimates is attributed to analytical error. The beta-Poisson and negative binomial maximum likelihood approaches yield the lowest standard deviations because they account for the effects of seeding and analytical error in the recovery estimates. These differences may be quite small, however, if the impacts of seeding and analytical error on recovery experiment results are small.

Maximum likelihood approaches based on the presented probabilistic models are recommended to estimate nonconstant analytical recovery distribution parameters because they better represent the additional sources of variability in the data than the other approaches. Another benefit of maximum likelihood estimation based on the probabilistic models is that it can be used when samples have different numbers of seeded particles. In contrast, the method of moments and “beta-only” or “gamma-only” maximum likelihood estimation give each recovery estimate equal weight, even though samples with larger numbers of seeded particles yield more precise recovery estimates. Numerical optimization of each likelihood function presented herein is sufficiently simple that it can be done in a spreadsheet. Beta-Poisson maximum likelihood is the most complicated because evaluation of the likelihood function requires numerical integration.
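To illustrate maximum likelihood estimation with the beta-binomial model when samples have different numbers of seeded particles, the sketch below evaluates the standard beta-binomial log-likelihood and maximizes it with a simple compass search. The data, the function names, the starting point, and the search routine are all hypothetical choices for this example; they are not the Table S1 data or the authors' spreadsheet implementation.

```python
import math
import statistics

def beta_binomial_loglik(a, b, counts, seeded):
    """Log-likelihood of beta parameters (a, b) for counts x_i out of n_i seeded particles."""
    total = 0.0
    for x, n in zip(counts, seeded):
        log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        log_beta_num = math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
        log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        total += log_choose + log_beta_num - log_beta_den
    return total

def fit_beta_binomial(counts, seeded, tol=1e-6, max_passes=10000):
    """Maximize the log-likelihood over u = logit(mean) and v = log(a + b)."""
    def objective(u, v):
        mean, size = 1.0 / (1.0 + math.exp(-u)), math.exp(v)
        return beta_binomial_loglik(mean * size, (1.0 - mean) * size, counts, seeded)

    pooled = sum(counts) / sum(seeded)
    u, v = math.log(pooled / (1.0 - pooled)), math.log(50.0)  # assumed starting point
    best, step = objective(u, v), 0.5
    for _ in range(max_passes):
        if step <= tol:
            break
        moved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = objective(u + du, v + dv)
            if trial > best:
                u, v, best, moved = u + du, v + dv, trial, True
        if not moved:
            step /= 2.0                          # refine the search once no move helps
    mean, size = 1.0 / (1.0 + math.exp(-u)), math.exp(v)
    return mean * size, (1.0 - mean) * size, best

# Hypothetical recovery data: nine samples with unequal numbers of seeded particles.
seeded = [100, 100, 100, 200, 200, 200, 500, 500, 500]
counts = [70, 81, 76, 139, 158, 151, 352, 395, 371]

a_hat, b_hat, loglik = fit_beta_binomial(counts, seeded)
size = a_hat + b_hat
fit_mean = a_hat / size
fit_sd = math.sqrt(a_hat * b_hat / (size**2 * (size + 1.0)))
raw_sd = statistics.stdev([x / n for x, n in zip(counts, seeded)])
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, mean = {fit_mean:.4f}, sd = {fit_sd:.4f}")
print(f"sample sd of raw recovery estimates = {raw_sd:.4f}")
```

For these made-up data the fitted standard deviation is smaller than the sample standard deviation of the raw recovery estimates, because part of the spread is attributed to binomial counting error. If the data showed no spread beyond what binomial error explains, the search would drift toward very large a + b, so the iteration cap matters in that case.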

There is uncertainty in point estimates of recovery distribution parameters, even if statistically rigorous techniques are used. Credible intervals representing the uncertainty in beta distribution parameter estimates (based on the beta-binomial model, a uniform prior, and pooled Supporting Information Table S1 data) are presented in Figure 3. In this figure, the slope of the long axis of the credible regions relates to the mean of the beta distribution (i.e., mean analytical recovery is a/(a + b)) and proximity to the origin relates to variance (i.e., low values of a and b correspond to highly variable recovery). Parameter estimates from beta-binomial and “beta-only” maximum likelihood and from the method of moments are shown to illustrate where the parameters fall in the credible regions. The outer limits of the 90% credible region correspond to mean recovery between 70.4 and 79.3% and standard deviation of analytical recovery between 1.66 and 8.67%. This indicates that there is considerable uncertainty in the parameter estimates, even with such large numbers of seeded particles and nine recovery samples. Therefore, further statistical analysis of recovery data, perhaps using Bayes’ theorem and the likelihood functions presented herein, may be worthwhile in applications where analytical recovery and the uncertainty therein need to be rigorously quantified (e.g., risk analysis, testing hypotheses).
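A credible region of this kind can be approximated on a grid. The sketch below is an assumption-laden illustration, not the procedure used for Figure 3: the data are hypothetical, the uniform prior is truncated to an arbitrary rectangle (a up to 1200, b up to 400), and the region is taken as the set of highest-posterior grid cells holding 90% of the probability. Because the likelihood does not vanish as a and b grow large, the result depends on the chosen upper bounds.

```python
import math

def beta_binomial_loglik(a, b, counts, seeded):
    """Beta-binomial log-likelihood for counts x_i out of n_i seeded particles."""
    total = 0.0
    for x, n in zip(counts, seeded):
        total += (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                  + math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
                  - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return total

def credible_region(counts, seeded, a_max, b_max, n_grid=80, level=0.90):
    """Highest-posterior grid cells covering `level` under a uniform prior on the grid."""
    a_vals = [a_max * (i + 1) / n_grid for i in range(n_grid)]
    b_vals = [b_max * (j + 1) / n_grid for j in range(n_grid)]
    cells = [(beta_binomial_loglik(a, b, counts, seeded), a, b)
             for a in a_vals for b in b_vals]
    peak = max(ll for ll, _, _ in cells)
    weights = [(math.exp(ll - peak), a, b) for ll, a, b in cells]  # unnormalized posterior
    total = sum(w for w, _, _ in weights)
    weights.sort(reverse=True)                   # most probable cells first
    region, mass = [], 0.0
    for w, a, b in weights:
        region.append((a, b))
        mass += w / total
        if mass >= level:
            break
    return region, mass

# Hypothetical recovery data (not the Table S1 data).
seeded = [100, 100, 100, 200, 200, 200, 500, 500, 500]
counts = [70, 81, 76, 139, 158, 151, 352, 395, 371]

region, mass = credible_region(counts, seeded, a_max=1200.0, b_max=400.0)
means = [a / (a + b) for a, b in region]
sds = [math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0))) for a, b in region]
print(f"posterior mass in region: {mass:.3f}")
print(f"mean recovery range: {min(means):.3f} to {max(means):.3f}")
print(f"std. dev. range:     {min(sds):.4f} to {max(sds):.4f}")
```

The ranges of mean and standard deviation over the region play the same role as the outer limits quoted in the text for the 90% credible region.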

Acknowledgments

We thank Dr. Mary Thompson of the Department of Statistics and Actuarial Science at the University of Waterloo for her assistance and the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Water Network for financial support. The comments of three anonymous reviewers contributed to the significant improvement of this manuscript.

Supporting Information Available

Example recovery data, derivations of likelihood functions, modeling and discussion of partial sample analysis, discussion of other probabilistic recovery models, figures with Poisson seeding, and Monte Carlo analyses using different recovery parameters. This information is available free of charge via the Internet at http://pubs.acs.org/.

Literature Cited

(1) APHA, AWWA, and WEF. Standard Methods for the Examination of Water and Wastewater, 19th ed.; APHA: Washington, DC, 1995.
(2) U.S. EPA. National Primary Drinking Water Regulations: Long Term 2 Enhanced Surface Water Treatment Rule; Final Rule. Fed. Regist. 2006, 71, 653–786.
(3) Teunis, P. F. M.; Rutjes, S. A.; Westrell, T.; de Roda Husman, A. M. Characterization of drinking water treatment for virus risk assessment. Water Res. 2009, 43, 395–404.
(4) Assavasilavasukul, P.; Lau, B. L. T.; Harrington, G. W.; Hoffman, R. M.; Borchardt, M. A. Effect of pathogen concentrations on removal of Cryptosporidium and Giardia by conventional drinking water treatment. Water Res. 2008, 42, 2678–2690.
(5) Emelko, M. B.; Huck, P. M. Microspheres as surrogates for Cryptosporidium filtration. J. Am. Water Works Assoc. 2004, 96, 94–105.
(6) Korich, D. G.; Mead, J. R.; Madore, M. S.; Sinclair, N. A.; Sterling, C. R. Effects of ozone, chlorine dioxide, chlorine, and monochloramine on Cryptosporidium parvum oocyst viability. Appl. Environ. Microbiol. 1990, 56, 1423–1428.
(7) Wang, M.; Ford, R. M.; Harvey, R. W. Coupled effect of chemotaxis and growth on microbial distributions in organic-amended aquifer sediments: Observations from laboratory and field studies. Environ. Sci. Technol. 2008, 42, 3556–3562.
(8) Teixeira, C. F.; Neuhauss, E.; Ben, R.; Romanzini, J.; Graeff-Teixeira, C. Detection of Schistosoma mansoni eggs in feces through their interaction with paramagnetic beads in a magnetic field. PLoS Negl. Trop. Dis. 2007, 1, e73.
(9) APHA. Compendium of Methods for the Microbiological Examination of Foods, 4th ed.; APHA: Washington, DC, 2001.
(10) Edmonds, J. M.; Collett, P. J.; Valdes, E. R.; Skowronski, E. W.; Pellar, G. J.; Emanuel, P. A. Surface sampling of spores in dry-deposition aerosols. Appl. Environ. Microbiol. 2009, 75, 39–44.
(11) Nahrstedt, A.; Gimbel, R. A statistical method for determining the reliability of the analytical results in the detection of Cryptosporidium and Giardia in water. J. Water Supply Res. Technol. AQUA 1996, 45, 101–111.
(12) Emelko, M. B.; Schmidt, P. J.; Roberson, J. A. Quantification of uncertainty in microbial data - reporting and regulatory implications. J. Am. Water Works Assoc. 2008, 100, 94–104.
(13) Emelko, M. B.; Schmidt, P. J.; Reilly, P. M. Particle and microorganism enumeration data: Enabling quantitative rigor and judicious interpretation. Environ. Sci. Technol. 2010, 44, doi 10.1021/es9002382a.
(14) USEPA. Method 1623: Cryptosporidium and Giardia in Water by Filtration/IMS/FA; EPA 815-R-05-002; U.S. Environmental Protection Agency, Office of Water: Washington, DC, 2005.


(15) Young, P. L.; Komisar, S. J. The variability introduced by partial sample analysis to numbers of Cryptosporidium oocysts and Giardia cysts reported under the information collection rule. Water Res. 1999, 33, 2660–2668.
(16) Clancy, J. L.; Connell, K.; McCuin, R. M. Implementing PBMS improvements to USEPA’s Cryptosporidium and Giardia methods. J. Am. Water Works Assoc. 2003, 95, 80–93.
(17) Vesey, G.; Slade, J. S.; Byrne, M.; Shepherd, K.; Fricker, C. R. A new method for the concentration of Cryptosporidium oocysts from water. J. Appl. Bacteriol. 1993, 75, 82–86.
(18) DiGiorgio, C. L.; Gonzalez, D. A.; Huitt, C. C. Cryptosporidium and Giardia recoveries in natural waters by using Environmental Protection Agency Method 1623. Appl. Environ. Microbiol. 2002, 68, 5952–5955.
(19) McCuin, R. M.; Clancy, J. L. Modifications to United States Environmental Protection Agency Methods 1622 and 1623 for detection of Cryptosporidium oocysts and Giardia cysts in water. Appl. Environ. Microbiol. 2003, 69, 267–274.
(20) Massanet-Nicolau, J. New method using sedimentation and immunomagnetic separation for isolation and enumeration of Cryptosporidium parvum oocysts and Giardia lamblia cysts. Appl. Environ. Microbiol. 2003, 69, 6758–6761.
(21) Cartier, C.; Barbeau, B.; Besner, M. C.; Payment, P.; Prévost, M. Optimization of the detection of the spores of aerobic spore-forming bacteria (ASFB) in environmental conditions. J. Water Supply Res. Technol. AQUA 2007, 56, 191–202.
(22) Crainiceanu, C. M.; Stedinger, J. R.; Ruppert, D.; Behr, C. T. Modeling the U.S. national distribution of waterborne pathogen concentrations with application to Cryptosporidium parvum. Water Resour. Res. 2003, 39, 1235–1249.
(23) Petterson, S. R.; Signor, R. S.; Ashbolt, N. J. Incorporating method recovery uncertainties in stochastic estimates of raw water protozoan concentrations for QMRA. J. Water Health 2007, 5, 51–65.
(24) Reynolds, D. T.; Slade, R. B.; Sykes, N. J.; Jonas, A.; Fricker, C. R. Detection of Cryptosporidium oocysts in water: techniques for generating precise recovery data. J. Appl. Microbiol. 1999, 87, 804–813.
(25) Parkhurst, D. F.; Stern, D. A. Determining average concentrations of Cryptosporidium and other pathogens in water. Environ. Sci. Technol. 1998, 32, 3424–3429.
(26) Teunis, P. F. M.; Evers, E. G.; Slob, W. Analysis of variable fractions resulting from microbial counts. Quant. Microbiol. 1999, 1, 63–88.
(27) USEPA. Occurrence and Exposure Assessment for the Final Long Term 2 Enhanced Surface Water Treatment Rule; EPA 815-R-06-002; U.S. Environmental Protection Agency, Office of Water: Washington, DC, 2005.
(28) Fisher, R. A. The negative binomial distribution. Ann. Eugenic. 1941, 11, 182–187.
(29) Margolin, B. H.; Kaplan, N.; Zeiger, E. Statistical analysis of the Ames Salmonella/microsome test. Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 3779–3783.
(30) Eisenhart, C.; Wilson, P. W. Statistical methods and control in bacteriology. Bacteriol. Rev. 1943, 7, 57–137.

ES902237F