Subscriber access provided by UNIV OF NEBRASKA - LINCOLN
Article
Statistical models for the analysis and design of digital PCR experiments Robert M Dorazio, and Margaret E. Hunter Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b02429 • Publication Date (Web): 05 Oct 2015 Downloaded from http://pubs.acs.org on October 9, 2015
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Statistical models for the analysis and design of digital PCR experiments Robert M. Dorazio
∗
and Margaret E. Hunter
Southeast Ecological Science Center, U.S. Geological Survey, 7920 NW 71st Street, Gainesville, Florida 32653, United States
E-mail:
[email protected] Fax: +1 352-378-4956
Abstract Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantication of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an oset that depends on the dPCR partition volume. These models are both versatile and easy to t using conventional statistical software. Covariates can be used to specify dierent sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the eects of these covariates. For purposes of illustation, we analyzed dPCR data from dierent kinds of experiments, including serial-dilution, evaluation of copy number variation, and quantication of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in
1
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 26
estimates of nucleic acid concentration or to detect dierences in concentration among treatments with prescribed levels of statistical power.
Introduction Detection of nucleic acid molecules for quantication and rare sequence detection is a fundamental tool in medical, agricultural and biological research elds.
Applications include
estimation of copy number variation, gene expression, next-generation sequencing library quantication, and rare variants.
These applications typically require assays that exploit
polymerase chain reaction (PCR) biochemistry, wherein nucleic acids are amplied to provide accurate detection and quantication. Two widely used methods for estimating nucleic acid concentrations are quantitative PCR (qPCR) and digital PCR (dPCR)
1,2
.
Digital PCR is relatively new and provides an absolute quantication of nucleic acid concentration without requiring an external calibration based on samples of known concentration (i.e., standard curve calibration).
In dPCR a sample containing nucleic acid molecules is
physically partitioned into a large number of reaction chambers or droplets. Each of these partitions is quite small (typically nanoliters to picoliters in volume), and the total number of partitions per sample can range from hundreds to millions depending on implementation and manufacturer
35
. Importantly, by distributing the sample among many partitions, some
partitions receive no target molecules, while others receive one to several molecules. Amplication of target molecules occurs separately and independently in each partition and is visualized using a uorescent reporter dye. Quantication of nucleic acid concentration is based on the assumption that target molecules are distributed uniformly in the sample and are thus allocated randomly among the partitions. The number of molecules per partition is assumed to have a Poisson distribution. Assays based on dPCR oer several advantages over those based on qPCR. First, dPCR provides an absolute quantication owing to the limiting-dilution of nucleic acid molecules
2
ACS Paragon Plus Environment
Page 3 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
and to the direct assessment of the presence and absence of target sequences.
This ap-
proach eliminates sources of uncertainty in qPCR assays, including those associated with the construction and application of an external standard curve based on samples of known concentration.
Digital PCR is more repeatable and more reproducible than qPCR both
among and within individual laboratories
68
. Only the number of partitions and the reac-
tion volume are needed to provide absolute quantication
9,10
. In contrast, qPCR requires
the analyst to select a quantication cycle or benchmark for constructing a standard curve, and dierent benchmarks can produce dierent results
1113
. Some benchmarks, such as the
widely used cycle threshold, are prone to an analyst's subjectivity, whereas others depend on and may vary with the instrument, the uorescent reporter dye, and the eciency of the assay
8,12
. Digital PCR is also less susceptible than qPCR to inhibitors due to the partitioning
of the sample
8,14
.
Estimates of nucleic acid concentration based on dPCR are more precise than those based on qPCR, even at very low copy numbers
10,1518
. The increased precision of dPCR is
especially relevant when copy number variation is used as a clinical diagnostic measure based on extracellular samples of genomic DNA
5,19
.
In the detection of rare-sequence variants,
dPCR has produced a signicant increase in precision and a more than 20-fold increase in accuracy compared to qPCR assays
20
. For example, dPCR has been used to detect variant
3
frequencies as low as 1 in 100,000 and a 1% dierence in chromosome copy number . While dPCR oers many advantages over other PCR-based assays, the statistical analysis and design of experiments using dPCR have received only limited attention. Dube et al.
21
proposed a statistical approach to estimate copy number variation from dPCR data that requires three steps: rst, the fraction of partitions that contain DNA is computed for both target and reference molecules to provide estimates of their probabilities of occurrence among the partitions; next, the copy numbers of target and reference molecules are estimated as a nonlinear function of each estimated occurrence probability; last, the ratio of these two estimated copy numbers is computed to provide an estimate of copy number variation. This
3
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 4 of 26
approach is rather indirect because the number (or concentration) of DNA molecules is not a formal parameter of the statistical model. Instead, the copy numbers of target and reference molecules and the ratio of these copy numbers are treated as functions of parameters of a simple binomial model of proportions. Therefore, the construction of condence intervals for these functions relies on the asymptotic normality of the sampling distribution of a binomial success parameter and on approximations used to estimate the uncertainty of the estimated copy number ratio
5,21,22
.
This approach to the analysis of dPCR data may give the impression that transformation of the observed proportion of partitions containing DNA into copy number (or concentration)
8
corresponds to nothing more than a Poisson correction of the data . However, this view is misguided because it regards the occurrence of DNA among partitions as the target of inference. Futhermore, it fosters unprincipled applications of top down statistical methods, wherein the transformed proportions are treated as normally distributed data and analyzed using linear models, such as analysis of variance or regression
5,23
.
To address these problems, we propose a class of statistical models for the analysis and design of a broad range of studies using dPCR. Concentrations of nucleic acids are formal parameters of these models, and we describe methods for directly estimating these parameters from dPCR data and for constructing asymptotically valid condence intervals. The class of statistical models that we describe can be used to analyze dPCR data obtained from multiple samples containing dierent concentrations of nucleic acids. Concentrations may dier among samples as a result of experimental design as in serial dilutions
23
or they can
dier because of natural sources of variation as in surveys of environmental DNA copy number variation
26,27
. Our approach is similar to that of Kreutz et al.
28
14,24,25
or
, who proposed
a model for the analysis of data collected in multi-volume dPCR assays. However, the class of models that we propose is more general because covariates are used to specify dierent sources of variation in nucleic acid concentration and parameters are used to quantify the eects of these covariates. In fact, this class of models is mathematically equivalent to a specic type
4
ACS Paragon Plus Environment
Page 5 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
of generalized linear model
29
. We also show how this class of models can be used to select
sample sizes needed to achieve desired levels of precision in estimates of concentration or to detect dierences in concentration with prescribed levels of statistical power. To illustrate this approach with real data, we analyzed dPCR counts observed in several dierent kinds of experiments, including serial dilution of a solution of known DNA concentration (from a synthetic gene), evaluation of copy number variation in the myelocytomatosis oncogene (MYC), and quantication of gene expression (SDPR) in leukocytes.
Statistical models Observations from one sample Suppose a sample
S of DNA is analyzed using a dPCR assay.
In this assay the DNA molecules
are assumed to be uniformly distributed throughout the sample. This assumption implies that the location
s of each DNA molecule in the sample is random and can be modeled using
a Poisson point process of constant, rst-order intensity
λ 30,31 .
In the context of dPCR,
is the limiting, expected concentration (number of molecules per unit volume) at location That is, for a small region
ds
of volume
λ=
for Poisson point process
N 32 .
v(ds)
centered at
λ s.
s,
lim E[N (ds)]/v(ds)
v(ds)→0
The total number
is a Poisson random variable with expected value
N (S)
λ v(S),
of DNA molecules in the sample where
v(S)
is the sample volume.
Furthermore, if the sample is partitioned into a set of disjoint (physically separated) subsamples, say
S1 ∪ · · · ∪ SK = S ,
the number of molecules in each subsample also has a Poisson
distribution more specically, subsample
Sk 30,31 .
N (Sk ) ∼ Poisson(λv(Sk )),
where
v(Sk )
is the volume of
In addition, the number of molecules in each subsample is independent
of the number in any other subsample.
5
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 26
The Poisson point process described above provides the theoretical basis for modeling data observed in dPCR experiments. Consider a data set in which each of
n
replicates of
a DNA sample is analyzed using a dPCR assay; thus, each replicate is physically divided into a large number of partitions (e.g., nanouidic reaction chambers or droplets), each of identical volume (i
= 1, . . . , n).
molecules.
v.
Let
mi
denote the number of partitions associated with the
ith replicate
Some partitions may contain no DNA, while others may contain one or several
However, because DNA molecules are assumed to be uniformly distributed in
the sample, we can assume that the number of molecules in a partition has a Poisson( λv ) distribution. That is, the expected number of DNA molecules in a partition depends on the concentration Let
Nij
λ
of the sample and on the volume
v
of the partition.
denote a Poisson random variable for the number of molecules present in the
partition of replicate
i.
The value of
Nij
j th
is not directly observable. Instead, once the PCRs
are completed, only presence or absence of molecules is observed in each partition; therefore, the quantity actually observed is a quantized version of the original number of molecules,
Zij = I(Nij > 0),
whose value indicates the presence ( Zij
j th
amplied DNA in the
partition of replicate
i.
= 1)
or absence (Zij
= 0)
of
The probabilities of these alternative
outcomes follow directly from the assumed distribution of
Nij ,
that is,
Pr(Zij = 1) = Pr(Nij > 0) = 1 − exp(−λv) Pr(Zij = 0) = Pr(Nij = 0) = exp(−λv)
In other words, we assume that
Zij
has a
Bernoulli(1−exp(−λv)) distribution.
Furthermore,
because the PCR within each partition is independent of the PCRs in other partitions, the number of partitions in the ith replicate that contain nucleic acid molecules, which we denote by the random variable
Yi =
Pmi
j=1
Zij ,
has a
Binomial(mi , 1 − exp(−λv)) distribution.
statistical model provides the basis for estimating the expected concentration acid in the sample or the expected number of copies
λv
in each partition.
6
ACS Paragon Plus Environment
λ
This
of nucleic
Page 7 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Suppose the values each of
n
(yi , mi )
i = 1, . . . , n
for
are observed by applying the dPCR assay to
replicates of a sample. These observations are made independently, so the joint
probability of the observed data is
n Y mi Pr(y1 , . . . , yn |λ) = [1 − exp(−λv)]yi [exp(−λv)]mi −yi y i i=1 Taking logarithms of both sides of this equation and omitting the combinatorial terms (which
λ)
do not involve
yields the following log-likelihood function for
`(λ) =
n X
λ:
yi log(1 − exp(−λv)) − (mi − yi )λv
i=1
= y log(1 − exp(−λv)) − (m − y )λv
where
y =
Pn
i=1
acid and where
yi
m =
observed among the
is the total number of partitions observed to contain targeted nucleic
Pn
i=1
n
mi
is the total number of partitions (with or without nucleic acid)
replicates. The value of
mi
may vary with
i
but it does not depend on the presence of nucleic acid; therefore,
(i.e., among replicates),
mi
can be treated as a
known constant. The maximum-likelihood estimator of maximizes
`(λ).
For this model
ˆ λ
λ,
which we denote by
ˆ, λ
is the value of
λ
that
can be expressed in closed form as follows:
ˆ = −(1/v) log(1 − y /m ) λ
This estimator implies that the dPCR assay will be informative of
λ
if the total number of
partitions (m ) exceeds the number that contain the target nucleic acid ( y ). If necessary, it is standard practice to dilute the sample to ensure that the proportion of partitions that contain nucleic acid (that is,
y /m )
lies between zero and one.
A consistent estimator of the uncertainty in
ˆ λ
may be derived by inverting the observed
7
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 26
information as follows:
ˆ [I(λ)]
−1
−1 d2 ` y = − 2 |λ=λˆ = 2 dλ v m (m − y )
100(1 − α)%
One way of constructing a
λ
condence interval for
is based on the asymp-
totic normality of maximum likelihood estimators that is, asymptotically
Normal(0, [I(λ)]−1 ),
where
2
d ` I(λ) = −E[ dλ 2]
denotes the expected (Fisher) information.
practice, condence intervals are computed using the observed information mate of
I(λ) because the true value of λ is unknown.
interval for
λ
ˆ − λ) ∼ (λ
For example, a
ˆ ± zα/2 [I(λ)] ˆ −1/2 , λ
may be computed as follows:
quantile of a standard normal distribution.
ˆ I(λ)
In
as an esti-
100(1 − α)% condence
where
zα/2
is the
If a maximum likelihood estimate
ˆ λ
(α/2)is close
to zero, this approach can produce condence intervals that include negative values of
λ.
To eliminate this possibility, an asymptotically equivalent interval may be constructed by inverting a likelihood-ratio test for the null hypothesis, pothesis, the
H1 : λ 6= λ0 .
In this test
H0
is accepted if
H0 : λ = λ0 ,
versus alternative hy-
ˆ ≤ χ2 , −2[`(λ0 ) − `(λ)] 1−α
(1−α)-quantile of a chi-squared distribution with one degree of freedom.
region of this test corresponds to a
100(1 − α)%
this interval are the strictly positive values of
λ0
condence interval for
λ.
where
χ21−α
is
The acceptance
The endpoints of
that satisfy the following equation:
ˆ + χ2 /2 = 0 `(λ0 ) − `(λ) 1−α
Because of this equation's nonlinearity, a numerical root-nding algorithm must be used to compute the condence limits, but the calculation is otherwise trivial. A second advantage of this procedure is that its condence limits are invariant to reparameterization of
λ; therefore,
the same limits will be obtained if the model is tted using any one-to-one, dierentiable transformation of
λ,
such as
The observed sums
n
y
θ = log(λ).
and
m
are obviously sucient for estimating the value of
replicates of a single sample; therefore, the information about
8
ACS Paragon Plus Environment
λ
contained in
n
λ
from
replicates
Page 9 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
of size
m ¯
(the sample mean) is mathematically equivalent to the information contained in
a single replicate of size
m
since
m = nm ¯.
The suciency of the statistic
(y , m )
for the
analysis of a single sample provides the theoretical justication for pooling observed counts among replicates, as opposed to using an arithmetic average of replicate-specic estimates of
λ to estimate concentration.
This latter approach, though used in many studies, should only
be considered if DNA concentrations are expected or can be shown to dier among replicates (e.g., due to pipetting errors
33
). In Observations from multiple samples we will describe a
deviance statistic that can be used to detect signicant sources variation among replicates of the same sample.
Calculating number of replicates for one sample As described earlier, the inverse of the expected information asymptotically valid expression for
ˆ . Var(λ)
2
d ` I(λ) = −E[ dλ 2]
yields an
For the model of dPCR data described in the
previous section, it is easily shown that the expected information is
I(λ) =
therefore, asymptotically
v 2 m exp(−λv) ; 1 − exp(−λv)
ˆ = [1 − exp(−λv)]/[v 2 m exp(−λv)]. Var(λ)
For the purpose of study design, assume that the number of partitions per replicate is constant (say,
m)
so that
function of sample size
n
m = nm.
The relative error,
ˆ , Var(λ)/λ
can be expressed as a
as follows:
ˆ Var(λ) 1 − exp(−λv) = λ nmv 2 λ exp(−λv)
Solving this equation for
n
given a xed value of
n=
1 λvp
2
ˆ p = SE(λ)/λ =
1 − exp(−λv) m exp(−λv)
9
ACS Paragon Plus Environment
q ˆ Var(λ)/λ
yields
(1)
Analytical Chemistry
The right hand side of eq 1 provides an asymptotically valid expression for the number of replicates
n
needed to achieve a prescribed level of error
decreasing function of
λ
p.
For a xed value of
p, n
is a
(Figure 1); therefore, as might be expected, many replicates may
be needed to produce reliable estimates of
λ
at low concentrations of DNA.
0.10 0.15 0.20
350
300
Number of replicates
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 26
250
200
150
100
50
0.1
0.2
0.3
0.4
0.5
Concentration (molecules / uL)
Figure 1: Number of replicates needed to estimate nucleic acid concentration with relative errors of 10% (solid line), 15% (dashed line), or 20% (dotted line).
Observations from multiple samples In this section we extend the model described earlier to include observations from multiple samples containing dierent concentrations of a nucleic acid (DNA or RNA). Commonly encountered examples are observations from serial dilutions of an original sample and observations from dierent genes assayed for the purpose of estimating copy number variation or gene expression. We will illustrate these examples momentarily. For now, suppose as before that dPCR assay to each of
n
(yi , mi )
for
i = 1, . . . , n
are observed by applying the
replicates. We assume that the nucleic acid concentration of the
ith replicate can be formulated as a log-linear function of an observed vector of regressors xi 10
ACS Paragon Plus Environment
Page 11 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
and their eects
β
log(λi ) = x0i β ,
as follows:
where the prime symbol denotes the transpose
of a matrix or vector. Dierent codications of types of experiments or surveys.
xi
can be used to model data from dierent
For example, in experiments designed to estimate copy
number variation the value assigned to
xi
ith
determines whether the
replicate corresponds
to a reference or a target locus.
Binomial(mi , 1 − exp(−λi v))
As shown in the previous section, we can assume that the distribution provides the basis for estimating the concentration fore, the log-likelihood function for the parameter
`(β) =
n X
β
λi
of the ith replicate; there-
is
yi log(1 − exp(−λi v)) − (mi − yi )λi v
(2)
i=1
where
λi = exp(x0i β).
the value of
β
The maximum-likelihood estimator of
that maximizes
`(β).
Generally, the value of
form except for relatively simple models (such as
ˆ β
and of the observed information matrix
ˆ β
β,
which we denote by
ˆ, β
is
cannot be expressed in closed
log(λi ) = β , ∀i).
In most cases the value of
ˆ must be computed using numerical methods I(β)
of optimization (see Appendix S1); however, these calculations are relatively straightforward with the aid of modern computers. In fact, the model that we have described is mathematically equivalent to a generalized linear model log-log link function and an oset equal to
29
of binomial responses with a complementary
log(v).
To see this, let
η(p) = log(− log(1 − p))
denote the complementary log-log link function for a binomial parameter the binomial parameter for the for
p
ith observation is 1 − exp(−λi v).
η(1 − exp(−λi v)) = log[− log(exp(−λi v))]
= = x0i β + log(v)
11
ACS Paragon Plus Environment
In our model
Subtituting this expression
yields
= log(λi v)
p.
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Thus,
x0i β
model.
is the linear predictor and
log(v)
Page 12 of 26
is the additive oset of the generalized linear
The implication of this result is that any statistical software program capable of
tting generalized linear models can be used to t the model that we have described. We illustrate these calculations in the Supporting Information using the statistical software program Inferences for
β
34
glm
function of the R
.
are based on the asymptotic multivariate normality of the maximum-
likelihood estimator that is,
ˆ − β) ∼ Normal(0, [I(β)]−1 ), where I(β) is estimated using (β
the observed information matrix the deviance statistic,
ˆ . I(β)
The adequacy of a model can be assessed using
ˆ − `s ), D = −2(`(β)
which measures the dierence between the log-
likelihood of the tted model and log-likelihood of a saturated model. The value of be calculated by substituting
λi = −(1/v) log(1 − yi /mi )
The asymptotic distribution of
χ2ν ),
where
q
D
is chi-squared with
β.
can
into the right hand side of eq 2.
ν = n−q
is the number of estimated parameters in
`s
degrees of freedom (that is,
Therefore, a
p-value
for a model's
Pr(χ2ν > D).
goodness-of-t may be obtained by computing
The model that we have described is quite general and can be used to compute inferences for many scientically relevant functions of
β.
The following examples provide illustrations
of this approach.
Example 1: modeling serial dilutions Strictly speaking, serial dilutions of an original sample of unknown concentration or of a standard solution containing a known concentration are not necessary in dPCR. However, these dilutions can be used to verify that estimated concentrations conform to those expected given a prescribed set of dilutions
23,28
.
As an example, suppose a series of three
d-fold
dilutions of an original sample of un-
known concentration are used to create four samples for a dPCR analysis. concentrations of these samples in order of increasing dilution are and
(1/d)3 .
Let
λ0
The relative
(1/d)0 , (1/d)1 , (1/d)2 ,
denote the unknown concentration of the original (undiluted) sample.
12
ACS Paragon Plus Environment
Page 13 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Given these denitions, the unknown concentration of the where
rj = (1/d)j (j = 0, 1, 2, 3).
to construct a regressor in our model because
x = (1, log(rj ))0
λj
This expression for
sample equals
indicates that
log(rj )
log(λj ) = log(λ0 ) + log(rj ).
is used as a regressor, then the vector
ters: one is the parameter of interest ( β0
j th
= log(λ0 ));
β = (β0 , β1 )0
λj = λ0 rj ,
may be used
Specically, if
contains two parame-
the other is the eect of each dilution
on sample concentration ( β1 ). In principle, the value of
β1
should equal one; however, we
can estimate this parameter to verify the accuracy of the dilutions and the performance of the dPCR assay at dierent concentrations. Once
β
has been estimated, the concentration of each sample (diluted or undiluted) also
can be estimated. Let
x
denote the regressor associated with a particular sample. An esti-
mate of the logarithm of this sample's concentration is valid estimate of its variance is
ˆ = x0 Vˆ x, d Var(log( λ))
ˆ = x0 β ˆ, log(λ) where
and an asymptotically
ˆ −1 Vˆ = [I(β)]
is the inverse of
the observed information matrix. These results imply that the concentration of the sample (on the arithmetic scale) may be estimated using condence interval for is the
(α/2)-quantile
λ may be
ˆ = exp(x0 β) ˆ λ
computed as follows:
and that a
100(1 − α)%
ˆ ± zα/2 (x0 Vˆ x)1/2 ], exp[x0 β
where
zα/2
of a standard normal distribution.
Note that while this condence
ˆ, λ
it is restricted to strictly positive
interval may be asymmetrical relative to the value of concentrations.
Example 2: estimating copy number variation Copy number variations (CNVs) stem from changes in genomic DNA (i.e., deletions, duplications, or structural rearrangements) that produce an abnormal number of copies of a particular sequence (usually, exceeding 500 bp
21
). Therefore, CNVs are evident in sequences
that deviate from the standard denition of two copies of a locus per diploid genome. To quantify the CNV of an individual, the number of copies of a target locus that may have undergone abnormal changes in DNA is compared with the number of copies of a normal, reference locus. The ratio of DNA concentrations between target and reference loci therefore
13
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 26
reects the amount of copy number variation per haploid genome of the target locus. For simplicity, this ratio is often called the copy number ratio . Suppose an individual's DNA is collected for the purpose of estimating concentrations of target and reference genes. Assume also that these two genes are amplied in separate PCRs using separate primers and probes so that independence can be assumed among all replicates. To specify the values of regressors for this experiment, assign the target gene and assign
x = (1, 0)0
assignments, the parameters
β1
and
x = (0, 1)0
to each replicate of the reference gene. Given these
β0
correspond to the log-scaled concentrations of DNA
for target and reference genes, respectively, and their copy number ratio where
c = (−1, 1)0 .
to each replicate of
R
equals
exp(c0 β),
To see this, note that
R = exp(c0 β) = exp(−β0 + β1 ) = λ1 /λ0
Once
β
and, a
has been estimated, the copy number ratio may be estimated using
100(1 − α)%
zα/2 (c0 Vˆ c)1/2 ],
where
condence interval for
ˆ −1 Vˆ = [I(β)]
R
ˆ, ˆ = exp(c0 β) R
may be computed as follows:
ˆ ± exp[c0 β
is the inverse of the observed information matrix. No-
tice that this computational approach is identical to that used in Example 1: modeling serial dilutions.
compute
The only dierence is the linear combination of the parameters of
β
used to
R.
Example 3: planning experiments to detect copy number variation Suppose an individual's copy number variation is expected to lie within some plausible range of values, and we wish to plan an experiment to estimate the actual copy number ratio. As in the previous example, let
R = λ1 /λ0
denote this copy number ratio.
In planning the
experiment, we may want to determine the probability of detecting a particular value of
14
ACS Paragon Plus Environment
R
Page 15 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
given a xed number of replicates per gene; alternatively, we may want to determine the
R.
number of replicates per gene needed to detect a particular value of
Both objectives
R
require us to calculate the statistical power of detecting a particular value of prescribed level of statistical signicance
α.
To specify the design problem more concretely, note that null hypothesis,
H1 : λ0 6= λ1 .
H0 : λ0 = λ1 ,
and that
R 6= 1
R = 1
H1
corresponds to the
corresponds to an alternative hypothesis,
Other alternative hypotheses are possible (such as
power calculations that we develop for
given a
R>1
or
R < 1),
but the
can be used in those cases too (see Appendix S2).
Therefore, for now we consider the original hypotheses, which can be stated in terms of our model's parameters as follows: statistic
Λ
H0
for comparing
H0 : β = β0 ,
and
H1
and
H1 : β = (β0 , β1 )0 .
The likelihood-ratio
equals
Λ = −2(`(βˆ0 ) − `(βˆ0 , βˆ1 ))
If
H0
is true, then asymptotically
one degree of freedom. If
H1
Λ has a central, chi-squared distribution parameterized by
is true, then asymptotically
Λ
has a non-central, chi-squared
distribution parameterized by one degree of freedom and noncentrality the magnitudes of of concluding
H1
β0
and
β1
given that
(and therefore
H1
λ0
and
λ1 ).
calculate the power of detecting a particular copy number ratio
replicates of each gene and on the magnitudes of
λ0
φ,
and
which depends on
Statistical power is the probability
is true; therefore, we must evaluate
This calculation requires the noncentrality parameter
φ,
Pr(Λ > χ21−α |H1 true) to
R = λ1 /λ0
when
H1
is true.
which depends on the number of
λ1
(or
R)
(see Appendix S2).
In Figure 2 we illustrate how power calculations can be used to solve two kinds of design problems. First, we illustrate how the statistical power varies over a range of copy number ratios given a xed number of replicates (5 or 10) per gene. assumed
λ0 = 1.5
partition, and
molecules/µL,
α = 0.05.
m =
For these calculations, we
15000 partitions per replicate,
v = 0.91
Note that the power of detecting a copy number ratio
15
ACS Paragon Plus Environment
R
nL per
increases
Analytical Chemistry
a 1.0
Power
0.8 0.6 0.4 0.2 0.0
No. replicates
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 26
0.6
0.8
1.0
1.2
1.4
0.6
0.8
1.0
1.2
1.4
b c
200 150 100 50 0
Copy number ratio
Figure 2: Design of copy number variation experiments. a. Statistical power of detecting copy number ratio given 5 (dashed line) or 10 (solid line) replicates per gene. b. Number of replicates needed to detect the copy number ratio given xed statistical power of 0.80 (dashed line) or 0.95 (solid line).
with
|R − 1|
and with the number of replicates (Figure 2a). We also calculated the number
of replicates per gene needed to achieve a xed statistical power (0.80 or 0.95) as a function of copy number ratio. Note that the number of required replicates declines with
|R − 1|
and
increases with power (Figure 2b).
Analysis of dPCR Data In this section we illustrate how the class of statistical models described earlier may be used to analyze dPCR counts observed in dierent kinds of experiments. The data and statistical software
34
used in these analyses are provided in the Supporting Information.
16
ACS Paragon Plus Environment
Page 17 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Serial dilution of a synthetic gene We prepared serial dilutions of a synthetic gene as described by Hunter et al.
35
.
Briey,
we used a Burmese python ND4 mitochondrial assay to analyse a 133 bp vector-replicated synthetic gene (Integrated DNA technologies, IDT, Coralville, Iowa). The plasmid containing the gene was linearized using the EcoRV-HF restriction enzyme (New England BioLabs, Ipswich, MA) and quantied on a Qubit 2.0 uorometer (Invitrogen, Life Technologies, CA, USA). Using standard procedures, the digital droplet QX200 (Bio-Rad, CA, USA) was used to determine numbers of droplets with or without DNA for each of six fourfold serial dilutions (10 replicates per dilution). We used the known concentrations of these dilutions (1.54, 6.14, 24.57, 98.29, 393.15, and 1572.59 copies/ µL) to construct a regressor for each replicate. Similar to the approach described in Example 1: modeling serial dilutions , let the
j th
sample, and let
x = (1, log(cj ))0
includes an intercept parameter
β0
cj
denote the known concentration of
be a regressor. Given these choices, the vector
and a slope parameter
β1 .
β
If the serial dilutions were
prepared correctly and if all of the DNA in each sample was available to be amplied, then in principle the intercept and slope parameters should equal zero and one, respectively. Fitting the model to the data observed in the experiment yielded estimates that were close to these
ˆ0 values (β
= 0.046 (SE = 0.016) and βˆ1 = 0.961 (SE = 0.002)).
In Figure 3 the regression line
based on these estimates is plotted with replicate-specic estimates of DNA concentration.
Copy number variation of MYC gene A mutated version of the regulator gene MYC is found in many cancers; therefore, copy number variation (CNV) in MYC can be used as a clinical diagnostic for the detection of these cancers.
As an example, the copy number ratio between MYC and a reference
(housekeeping) locus on Chr4 can be used to quantify levels of CNV in MYC. We used duplicate dPCR assays for each of seven patients (data retrieved from QuantaSoft, BioRad, CA,USA) to estimate the copy number ratio between MYC and Chr4 for each patient
17
ACS Paragon Plus Environment
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Estimated concentration (copies / uL)
Analytical Chemistry
Page 18 of 26
1000 500
100 50
10 5
1 1
5
10
50
100
500
1000
Standard concentration (copies / uL)
Figure 3: Comparison of estimated and known concentrations of standards prepared by serial dilution of a synthetic gene. Line indicates the model-based prediction of concentration. Axes are logarithmically scaled.
using the model described in Example 2: estimating copy number variation . Copy number variation in MYC was evident in all but one patient (patient CN2) (Table 1).
Although
one patient had a lower than normal copy number ratio (patient CN1), the copy number ratios of most patients were signicantly higher than one, suggesting that these patients had duplicated MYC genes. Table 1: Estimated copy number ratio
ˆ R
between MYC and Chr4 genes for seven patients.
Columns labeled 2.5% and 97.5% contain limits of 95% condence intervals for labeled GOF contains
p-values
for assessing model goodness-of-t.
Patient ID
ˆ R
2.5%
97.5%
GOF
CN1
0.477
0.458
0.497
0.924
CN2
1.001
0.971
1.032
0.390
CN3
1.493
1.452
1.535
0.186
CN5
2.513
2.445
2.583
0.868
CN6
3.082
2.981
3.187
0.168
CN14
6.907
6.695
7.125
0.275
CN16
8.158
7.944
8.379
0.320
18
ACS Paragon Plus Environment
R.
Column
Page 19 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Expression of SDPR gene Several experiments were conducted to determine the eects of cell isolation methods on gene expression in leukocytes
36
. We analyzed results from one of these experiments wherein the
mRNA concentrations of the target gene SDPR and reference gene RPL27 were estimated for each of three methods of cell isolation: negative selection, positive selection, and uorescenceactivated cell sorting (FACS). In this experiment dPCR was used to measure concentrations of mRNA extracted from the monocytes of ve healthy donors. To analyze the dPCR data from this experiment, we used regressors to indicate the gene (SPDR vs. RPL27) and cell isolation method of each replicate. Specically, we let the gene (u
= 1 for SDPR; u = 0 for RPL27), v
for positive selection;
w = 0
otherwise).
v=0
otherwise), and
w
u
specify
specify the positive selection method ( v specify the FACS method ( w
Given these denitions, we used the regressor
=1
=1
for FACS;
x = (1, u, v, w, uv, uw)0
to t a model whose parameters allowed the expression of SDPR to be estimated for each cell isolation method.
For example, expression of SDPR is dened by the ratio between
mRNA concentrations of SDPR and RPL27. Given our choice of regressor
x,
the logarithm
of this ratio equals a linear combination of the model's parameters. Specically, and
β1 + β5
β1 , β1 + β4 ,
denote the logarithm of SDPR expression for cell isolation methods of negative
selection, positive selection, and FACS, respectively. In general, once the model's parameters have been estimated, gene expression
R
can be estimated using
ˆ, ˆ = exp(c0 β) R
where
c
is
the vector of integers needed to form a specic linear combination of model parameters. A
ˆ ±zα/2 (c0 Vˆ c)1/2 ], where 100(1−α)% condence interval for R can be computed using exp[c0 β ˆ −1 Vˆ = [I(β)]
is the inverse of the observed information matrix.
This experiment included replicate observations from each of ve individuals (i.e., repeated measures); therefore, we extended the model to account for dierences in concentration among individuals that were not associated with covariate eects, and we estimated the magnitude of these additional components of variance (see Appendix S3). The results of our analysis indicated that SPDR expression levels diered signicantly among the three cell iso-
19
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
ˆ R
Table 2: Estimated gene expression (ratio
Page 20 of 26
between mRNA concentrations of SDPR and
RPL27) for dierent methods of cell isolation.
Columns labeled 2.5% and 97.5% contain
limits of 95% condence intervals for gene expression. Method of
ˆ R
2.5%
97.5%
Negative selection
63.08
34.91
114.00
Positive selection
1.01
0.56
1.83
FACS
0.29
0.16
0.54
cell isolation
lation methods (Table 2). The dierences in expression that we estimated were qualitatively consistent with those reported previously
36
; however, we also detected substantial varia-
tion in mRNA concentrations among individuals that was not associated with cell isolation methods (σ ˆRPL27
= 0.27, σ ˆSPDR = 0.63).
These individual-level dierences in mRNA con-
centration were at least partially responsible for the relatively low precision of our estimates of SPDR expression.
Discussion In this article we have described and illustrated a class of statistical models for the analysis and design of a broad range of studies based on dPCR. Our approach is more direct and versatile than previously proposed methods of analysis
5,21
and has led to an improved un-
derstanding of those methods. For example, we have provided a theoretical justication for the pooling of observed counts among replicates of a sample; thus, the statistical methods proposed by Dube et al.
21
and Whale et al.
5
will yield asymptotically valid condence in-
tervals for copy number ratios provided that these methods are applied to pooled counts of reference and target molecules. Estimating copy numbers for individual replicates (that is, using unpooled data) and then averaging replicate-specic copy number ratios, while common practice, does not provide a valid estimate or condence interval for the true ratio of copy numbers. However, this practice may be needed if DNA concentrations are expected or can be shown to dier among replicates of the same sample (say, due to pipetting errors
20
ACS Paragon Plus Environment
33
).
Page 21 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
The class of statistical models that we have described honors the mean-variance relationship of the binomial counts observed in dPCR experiments. This aspect of the model is important because it allows the precision of an estimate of nucleic acid concentration to be computed correctly so that condence intervals for making inferences about concentration or copy number are valid. In contrast, many published analyses of dPCR data ignore the binomial nature of the observations. In these analyses the fraction of partitions containing the target molecule is transformed into copy number (or concentration), and this transformed quantity is analyzed as if it was normally distributed (say, using analysis of variance or regression).
We believe this approach should be abandoned in favor of the more-principled
methods of analysis that we have described. Replication is vital in experiments using dPCR because it provides evidence of repeatability. Replication also provides a means to assess whether a set of dPCR data conforms to model assumptions based on the Poisson distribution of nucleic acid molecules. We showed that the deviance statistic, which is based on the likelihood function of our model, may be used in formal tests of goodness-of-t. To our knowledge, assessments of model adequacy, though obviously important, have not been completed in previous analyses of dPCR data. We conclude by noting that our approach to the analysis of dPCR data is easy to implement. By providing R code
34
for analyzing all of the data sets described in this paper,
we have contributed to the growing collection of R software used in the analysis of dPCR data
37
.
Our approach is both versatile and easy to implement, and we anticipate its use
in the analysis of data from many kinds of experiments that require quantication of nucleic acids. Analysis of gene-expression experiments is a particularly important application wherein we extended the generalized linear model to account for individual-level dierences in expression not associated with eects of experimental treatments.
21
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 22 of 26
Acknowledgement We thank John Buttereld and Gaia Meigs-Friend (USGS) for assistance with laboratory analysis. We thank Nadejda Beliakova-Bethell for providing gene expression data from an experiment designed to estimate eects of dierent cell isolation methods. The comments of an anonymous reviewer improved an earlier draft of our paper. Any use of trade, product, or rm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Supporting Information Available Appendices, data les, and R code
34
described in the text.
This material is available free of charge via the Internet at
http://pubs.acs.org/ .
References (1) Heid, C. A.; Stevens, J.; Livak, K. J.; Williams, P. M. Genome Research
1996,
6,
986994.
(2) Vogelstein, B.; Kinzler, K. W. Proceedings of the National Academy of Sciences, USA
1999, 96, 92369241. (3) Heyries, K. A.; Tropini, C.; VanInsberghe, M.; Doolin, C.; Petriv, O. I.; Singhal, A.; Leung, K.; Hughesman, C. B.; Hansen, C. L. Nature Methods
(4) Hindson, B. J. et al. Analytical Chemistry
2011, 8, 649651.
2011, 83, 86048610.
(5) Whale, A. S.; Huggett, J. F.; Cowen, S.; Speirs, V.; Shaw, J.; Ellison, S.; Foy, C. A.; Scott, D. J. Nucleic Acids Research
2012, 40, e82.
(6) Hayden, R. T. et al. Journal of Clinical Microbiology
2008, 46, 157163.
22
ACS Paragon Plus Environment
Page 23 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
(7) Hayden, R. T.; Gu, Z.; Ingersoll, J.; Abdul-Ali, D.; Shi, L.; Caliendo, A. M. Journal of Clinical Microbiology
2013, 51, 540546.
(8) Huggett, J. F.; Foy, C. A.; Benes, V.; Emslie, K.; Garson, J. A.; Haynes, R.; Hellemans, J.; Kubista, M.; Mueller, R. D.; Nolan, T.; Pfa, M. W.; Shipley, G. L.; Vandesompele, J.; Wittwer, C. T.; Bustin, S. A. Clinical Chemistry
2013, 59, 892902.
(9) Bhat, S.; Herrmann, J.; Armishaw, P.; Corbisier, P.; Emslie, K. R. Analytical and Bioanalytical Chemistry
2009, 394, 457467.
(10) Sanders, R.; Huggett, J. F.; Bushell, C. A.; Cowen, S.; Scott, D. J.; Foy, C. A. Analytical Chemistry
2011, 83, 64746484.
(11) Ruijter, J. M.; Pfa, M. W.; Zhao, S.; Spiess, A. N.; Boggy, G.; Blom, J.; Rutledge, R. G.; Sisti, D.; Lievens, A.; De Preter, K.; Derveaux, S.; Hellemans, J.; Vandesompele, J. Methods
2013, 59, 3246.
(12) Tellinghuisen, J.; Speiss, A.-N. Analytical Biochemistry
2014, 449, 7682.
(13) Tellinghuisen, J.; Speiss, A.-N. Analytical Biochemistry
2014, 464, 94102.
(14) Doi, H.; Takahara, T.; Minamoto, T.; Matsuhashi, S.; Uchii, K.; Yamanaka, H. Environmental Science and Technology
2015, 49, 56015608.
(15) Lun, F. M. F.; Chiu, R. W. K.; Allen C., K. C.; Leung, T. L.; Lau, T. K.; Lo, Y. M. D. Clinical Chemistry
2008, 54, 16641672.
(16) Tadmor, A. D.; Ottesen, E. A.; Leadbetter, J. R.; Phillips, R. Science
2011, 333, 5862.
(17) White 3rd, R. A.; Quake, S. R.; Curr, K. Journal of Virological Methods 4550.
23
ACS Paragon Plus Environment
2012,
179,
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 24 of 26
(18) Devonshire, A. S.; Honeyborne, I.; Gutteridge, A.; Whale, A. S.; Nixon, G.; Wilson, P.; Jones, G.; McHugh, T. D.; Foy, C. A.; Huggett, J. F. Analytical Chemistry
2015,
87,
37063713.
(19) Pretto, D.; Maar, D.; Yrigollen, C. M.; Regan, J.; Tassone, F. Clinical Chemistry
2015,
61, 182190.
(20) Strain, M. C.; Lada, S. M.; Luong, T.; Rought, S. E.; Gianella, S.; Terry, V. H.; Spina, C. A.; Woelk, C. H.; Richman, D. D. PLoS One
(21) Dube, S.; Qin, J.; Ramakrishnan, R. PLoS One
2013, 8, e55943.
2008, 8, e2876.
(22) Whale, A. S.; ; Cowen, S.; Foy, C. A.; Huggett, J. F. PLoS One
2013, 8, e58177.
(23) Pinheiro, L. B.; Coleman, V. A.; Hindson, C. M.; Herrmann, J.; Hindson, B. J.; Bhat, S.; Emslie, K. R. Analytical Chemistry
2012, 84, 10031011.
(24) Nathan, L. M.; Simmons, M.; Wegleitner, B. J.; Jerde, C. L.; Mahon, A. R. Environmental Science and Technology
2014, 48, 1280012806.
(25) Doi, H.; Uchii, K.; Takahara, T.; Matsuhashi, S.; Yamanaka, H.; Minamoto, T. PLoS One
2015, 10, e0122763.
(26) Iafrate, A. J.; Feuk, L.; Rivera, M. N.; Listewnik, M. L.; Donahoe, P. K.; Qi, Y.; Scherer, S. W.; Lee, C. Nature Genetics
(27) Sebat, J. et al. Science
2004, 36, 949951.
2004, 305, 525528.
(28) Kreutz, J. E.; Munson, T.; Huynh, T.; Shen, F.; Du, W.; Ismagilov, R. F. Analytical Chemistry
2011, 83, 81588168.
(29) McCullagh, P.; Nelder, J. A. Generalized linear models, second edition ; Chapman and Hall: London, 1989.
24
ACS Paragon Plus Environment
Page 25 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
(30) Møller, J.; Waagepetersen, R. P. Statistical inference and simulation for spatial point processes ; Chapman & Hall: Boca Raton, 2004.
(31) Illian, J.; Penttinen, A.; Stoyan, H.; Stoyan, D. Statistical analysis and modelling of spatial point patterns ; John Wiley & Sons: West Sussex, England, 2008.
(32) Cressie, N.; Wikle, C. K. Statistics for spatio-temporal data ; John Wiley & Sons: Hoboken, New Jersey, 2011.
(33) Jacobs, B. K. M.; Goetghebeur, E.; Clement, L. BMC Bioinformatics
2014, 15, 283.
(34) R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2015.
(35) Hunter, M. E.; Oyler-McCance, S. J.; Dorazio, R. M.; Fike, J. A.; Smith, B. J.; Hunter, C. T.; Reed, R. N.; Hart, K. M. PLoS One
2015, 10, e0121655.
(36) Beliakova-Bethell, N.; Massanella, M.; White, C.; Lada, S. M.; Du, P.; Vaida, F.; Blanco, J.; Spina, C. A.; Woelk, C. H. Cytometry Part A
2014, 85A, 94104.
(37) Rödiger, S.; Burdukiewicz, M.; Blagodatskikh, K.; Jahn, M.; Schierack, P. The R Journal
2015, 7, 127150.
25
ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Graphical TOC Entry PCR
26
ACS Paragon Plus Environment
Page 26 of 26