(dPCR) Experiments - ACS Publications - American Chemical Society

Oct 5, 2015 - observed vector of regressors xi and their effects β as follows: β λ = ′x ..... (20) Strain, M. C.; Lada, S. M.; Luong, T.; Rought,...
0 downloads 0 Views 453KB Size
Subscriber access provided by UNIV OF NEBRASKA - LINCOLN

Article

Statistical models for the analysis and design of digital PCR experiments Robert M Dorazio, and Margaret E. Hunter Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b02429 • Publication Date (Web): 05 Oct 2015 Downloaded from http://pubs.acs.org on October 9, 2015

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Statistical models for the analysis and design of digital PCR experiments Robert M. Dorazio



and Margaret E. Hunter

Southeast Ecological Science Center, U.S. Geological Survey, 7920 NW 71st Street, Gainesville, Florida 32653, United States

E-mail: [email protected] Fax: +1 352-378-4956

Abstract Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantication of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an oset that depends on the dPCR partition volume. These models are both versatile and easy to t using conventional statistical software. Covariates can be used to specify dierent sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the eects of these covariates. For purposes of illustation, we analyzed dPCR data from dierent kinds of experiments, including serial-dilution, evaluation of copy number variation, and quantication of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in

1

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 26

estimates of nucleic acid concentration or to detect dierences in concentration among treatments with prescribed levels of statistical power.

Introduction Detection of nucleic acid molecules for quantication and rare sequence detection is a fundamental tool in medical, agricultural and biological research elds.

Applications include

estimation of copy number variation, gene expression, next-generation sequencing library quantication, and rare variants.

These applications typically require assays that exploit

polymerase chain reaction (PCR) biochemistry, wherein nucleic acids are amplied to provide accurate detection and quantication. Two widely used methods for estimating nucleic acid concentrations are quantitative PCR (qPCR) and digital PCR (dPCR)

1,2

.

Digital PCR is relatively new and provides an absolute quantication of nucleic acid concentration without requiring an external calibration based on samples of known concentration (i.e., standard curve calibration).

In dPCR a sample containing nucleic acid molecules is

physically partitioned into a large number of reaction chambers or droplets. Each of these partitions is quite small (typically nanoliters to picoliters in volume), and the total number of partitions per sample can range from hundreds to millions depending on implementation and manufacturer

35

. Importantly, by distributing the sample among many partitions, some

partitions receive no target molecules, while others receive one to several molecules. Amplication of target molecules occurs separately and independently in each partition and is visualized using a uorescent reporter dye. Quantication of nucleic acid concentration is based on the assumption that target molecules are distributed uniformly in the sample and are thus allocated randomly among the partitions. The number of molecules per partition is assumed to have a Poisson distribution. Assays based on dPCR oer several advantages over those based on qPCR. First, dPCR provides an absolute quantication owing to the limiting-dilution of nucleic acid molecules

2

ACS Paragon Plus Environment

Page 3 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

and to the direct assessment of the presence and absence of target sequences.

This ap-

proach eliminates sources of uncertainty in qPCR assays, including those associated with the construction and application of an external standard curve based on samples of known concentration.

Digital PCR is more repeatable and more reproducible than qPCR both

among and within individual laboratories

68

. Only the number of partitions and the reac-

tion volume are needed to provide absolute quantication

9,10

. In contrast, qPCR requires

the analyst to select a quantication cycle or benchmark for constructing a standard curve, and dierent benchmarks can produce dierent results

1113

. Some benchmarks, such as the

widely used cycle threshold, are prone to an analyst's subjectivity, whereas others depend on and may vary with the instrument, the uorescent reporter dye, and the eciency of the assay

8,12

. Digital PCR is also less susceptible than qPCR to inhibitors due to the partitioning

of the sample

8,14

.

Estimates of nucleic acid concentration based on dPCR are more precise than those based on qPCR, even at very low copy numbers

10,1518

. The increased precision of dPCR is

especially relevant when copy number variation is used as a clinical diagnostic measure based on extracellular samples of genomic DNA

5,19

.

In the detection of rare-sequence variants,

dPCR has produced a signicant increase in precision and a more than 20-fold increase in accuracy compared to qPCR assays

20

. For example, dPCR has been used to detect variant

3

frequencies as low as 1 in 100,000 and a 1% dierence in chromosome copy number . While dPCR oers many advantages over other PCR-based assays, the statistical analysis and design of experiments using dPCR have received only limited attention. Dube et al.

21

proposed a statistical approach to estimate copy number variation from dPCR data that requires three steps: rst, the fraction of partitions that contain DNA is computed for both target and reference molecules to provide estimates of their probabilities of occurrence among the partitions; next, the copy numbers of target and reference molecules are estimated as a nonlinear function of each estimated occurrence probability; last, the ratio of these two estimated copy numbers is computed to provide an estimate of copy number variation. This

3

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 26

approach is rather indirect because the number (or concentration) of DNA molecules is not a formal parameter of the statistical model. Instead, the copy numbers of target and reference molecules  and the ratio of these copy numbers  are treated as functions of parameters of a simple binomial model of proportions. Therefore, the construction of condence intervals for these functions relies on the asymptotic normality of the sampling distribution of a binomial success parameter and on approximations used to estimate the uncertainty of the estimated copy number ratio

5,21,22

.

This approach to the analysis of dPCR data may give the impression that transformation of the observed proportion of partitions containing DNA into copy number (or concentration)

8

corresponds to nothing more than a Poisson correction of the data . However, this view is misguided because it regards the occurrence of DNA among partitions as the target of inference. Futhermore, it fosters unprincipled applications of top down statistical methods, wherein the transformed proportions are treated as normally distributed data and analyzed using linear models, such as analysis of variance or regression

5,23

.

To address these problems, we propose a class of statistical models for the analysis and design of a broad range of studies using dPCR. Concentrations of nucleic acids are formal parameters of these models, and we describe methods for directly estimating these parameters from dPCR data and for constructing asymptotically valid condence intervals. The class of statistical models that we describe can be used to analyze dPCR data obtained from multiple samples containing dierent concentrations of nucleic acids. Concentrations may dier among samples as a result of experimental design  as in serial dilutions

23

 or they can

dier because of natural sources of variation  as in surveys of environmental DNA copy number variation

26,27

. Our approach is similar to that of Kreutz et al.

28

14,24,25

or

, who proposed

a model for the analysis of data collected in multi-volume dPCR assays. However, the class of models that we propose is more general because covariates are used to specify dierent sources of variation in nucleic acid concentration and parameters are used to quantify the eects of these covariates. In fact, this class of models is mathematically equivalent to a specic type

4

ACS Paragon Plus Environment

Page 5 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

of generalized linear model

29

. We also show how this class of models can be used to select

sample sizes needed to achieve desired levels of precision in estimates of concentration or to detect dierences in concentration with prescribed levels of statistical power. To illustrate this approach with real data, we analyzed dPCR counts observed in several dierent kinds of experiments, including serial dilution of a solution of known DNA concentration (from a synthetic gene), evaluation of copy number variation in the myelocytomatosis oncogene (MYC), and quantication of gene expression (SDPR) in leukocytes.

Statistical models Observations from one sample Suppose a sample

S of DNA is analyzed using a dPCR assay.

In this assay the DNA molecules

are assumed to be uniformly distributed throughout the sample. This assumption implies that the location

s of each DNA molecule in the sample is random and can be modeled using

a Poisson point process of constant, rst-order intensity

λ 30,31 .

In the context of dPCR,

is the limiting, expected concentration (number of molecules per unit volume) at location That is, for a small region

ds

of volume

λ=

for Poisson point process

N 32 .

v(ds)

centered at

λ s.

s,

lim E[N (ds)]/v(ds)

v(ds)→0

The total number

is a Poisson random variable with expected value

N (S)

λ v(S),

of DNA molecules in the sample where

v(S)

is the sample volume.

Furthermore, if the sample is partitioned into a set of disjoint (physically separated) subsamples, say

S1 ∪ · · · ∪ SK = S ,

the number of molecules in each subsample also has a Poisson

distribution  more specically, subsample

Sk 30,31 .

N (Sk ) ∼ Poisson(λv(Sk )),

where

v(Sk )

is the volume of

In addition, the number of molecules in each subsample is independent

of the number in any other subsample.

5

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 26

The Poisson point process described above provides the theoretical basis for modeling data observed in dPCR experiments. Consider a data set in which each of

n

replicates of

a DNA sample is analyzed using a dPCR assay; thus, each replicate is physically divided into a large number of partitions (e.g., nanouidic reaction chambers or droplets), each of identical volume (i

= 1, . . . , n).

molecules.

v.

Let

mi

denote the number of partitions associated with the

ith replicate

Some partitions may contain no DNA, while others may contain one or several

However, because DNA molecules are assumed to be uniformly distributed in

the sample, we can assume that the number of molecules in a partition has a Poisson( λv ) distribution. That is, the expected number of DNA molecules in a partition depends on the concentration Let

Nij

λ

of the sample and on the volume

v

of the partition.

denote a Poisson random variable for the number of molecules present in the

partition of replicate

i.

The value of

Nij

j th

is not directly observable. Instead, once the PCRs

are completed, only presence or absence of molecules is observed in each partition; therefore, the quantity actually observed is a quantized version of the original number of molecules,

Zij = I(Nij > 0),

whose value indicates the presence ( Zij

j th

amplied DNA in the

partition of replicate

i.

= 1)

or absence (Zij

= 0)

of

The probabilities of these alternative

outcomes follow directly from the assumed distribution of

Nij ,

that is,

Pr(Zij = 1) = Pr(Nij > 0) = 1 − exp(−λv) Pr(Zij = 0) = Pr(Nij = 0) = exp(−λv)

In other words, we assume that

Zij

has a

Bernoulli(1−exp(−λv)) distribution.

Furthermore,

because the PCR within each partition is independent of the PCRs in other partitions, the number of partitions in the ith replicate that contain nucleic acid molecules, which we denote by the random variable

Yi =

Pmi

j=1

Zij ,

has a

Binomial(mi , 1 − exp(−λv)) distribution.

statistical model provides the basis for estimating the expected concentration acid in the sample or the expected number of copies

λv

in each partition.

6

ACS Paragon Plus Environment

λ

This

of nucleic

Page 7 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Suppose the values each of

n

(yi , mi )

i = 1, . . . , n

for

are observed by applying the dPCR assay to

replicates of a sample. These observations are made independently, so the joint

probability of the observed data is

 n  Y mi Pr(y1 , . . . , yn |λ) = [1 − exp(−λv)]yi [exp(−λv)]mi −yi y i i=1 Taking logarithms of both sides of this equation and omitting the combinatorial terms (which

λ)

do not involve

yields the following log-likelihood function for

`(λ) =

n X

λ:

yi log(1 − exp(−λv)) − (mi − yi )λv

i=1

= y log(1 − exp(−λv)) − (m − y )λv

where

y =

Pn

i=1

acid and where

yi

m =

observed among the

is the total number of partitions observed to contain targeted nucleic

Pn

i=1

n

mi

is the total number of partitions (with or without nucleic acid)

replicates. The value of

mi

may vary with

i

but it does not depend on the presence of nucleic acid; therefore,

(i.e., among replicates),

mi

can be treated as a

known constant. The maximum-likelihood estimator of maximizes

`(λ).

For this model

ˆ λ

λ,

which we denote by

ˆ, λ

is the value of

λ

that

can be expressed in closed form as follows:

ˆ = −(1/v) log(1 − y /m ) λ

This estimator implies that the dPCR assay will be informative of

λ

if the total number of

partitions (m ) exceeds the number that contain the target nucleic acid ( y ). If necessary, it is standard practice to dilute the sample to ensure that the proportion of partitions that contain nucleic acid (that is,

y /m )

lies between zero and one.

A consistent estimator of the uncertainty in

ˆ λ

may be derived by inverting the observed

7

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 26

information as follows:

ˆ [I(λ)]

−1

 −1 d2 ` y = − 2 |λ=λˆ = 2 dλ v m (m − y )

100(1 − α)%

One way of constructing a

λ

condence interval for

is based on the asymp-

totic normality of maximum likelihood estimators  that is, asymptotically

Normal(0, [I(λ)]−1 ),

where

2

d ` I(λ) = −E[ dλ 2]

denotes the expected (Fisher) information.

practice, condence intervals are computed using the observed information mate of

I(λ) because the true value of λ is unknown.

interval for

λ

ˆ − λ) ∼ (λ

For example, a

ˆ ± zα/2 [I(λ)] ˆ −1/2 , λ

may be computed as follows:

quantile of a standard normal distribution.

ˆ I(λ)

In

as an esti-

100(1 − α)% condence

where

zα/2

is the

If a maximum likelihood estimate

ˆ λ

(α/2)is close

to zero, this approach can produce condence intervals that include negative values of

λ.

To eliminate this possibility, an asymptotically equivalent interval may be constructed by inverting a likelihood-ratio test for the null hypothesis, pothesis, the

H1 : λ 6= λ0 .

In this test

H0

is accepted if

H0 : λ = λ0 ,

versus alternative hy-

ˆ ≤ χ2 , −2[`(λ0 ) − `(λ)] 1−α

(1−α)-quantile of a chi-squared distribution with one degree of freedom.

region of this test corresponds to a

100(1 − α)%

this interval are the strictly positive values of

λ0

condence interval for

λ.

where

χ21−α

is

The acceptance

The endpoints of

that satisfy the following equation:

ˆ + χ2 /2 = 0 `(λ0 ) − `(λ) 1−α

Because of this equation's nonlinearity, a numerical root-nding algorithm must be used to compute the condence limits, but the calculation is otherwise trivial. A second advantage of this procedure is that its condence limits are invariant to reparameterization of

λ; therefore,

the same limits will be obtained if the model is tted using any one-to-one, dierentiable transformation of

λ,

such as

The observed sums

n

y

θ = log(λ).

and

m

are obviously sucient for estimating the value of

replicates of a single sample; therefore, the information about

8

ACS Paragon Plus Environment

λ

contained in

n

λ

from

replicates

Page 9 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

of size

m ¯

(the sample mean) is mathematically equivalent to the information contained in

a single replicate of size

m

since

m = nm ¯.

The suciency of the statistic

(y , m )

for the

analysis of a single sample provides the theoretical justication for pooling observed counts among replicates, as opposed to using an arithmetic average of replicate-specic estimates of

λ to estimate concentration.

This latter approach, though used in many studies, should only

be considered if DNA concentrations are expected or can be shown to dier among replicates (e.g., due to pipetting errors

33

). In Observations from multiple samples we will describe a

deviance statistic that can be used to detect signicant sources variation among replicates of the same sample.

Calculating number of replicates for one sample As described earlier, the inverse of the expected information asymptotically valid expression for

ˆ . Var(λ)

2

d ` I(λ) = −E[ dλ 2]

yields an

For the model of dPCR data described in the

previous section, it is easily shown that the expected information is

I(λ) =

therefore, asymptotically

v 2 m exp(−λv) ; 1 − exp(−λv)

ˆ = [1 − exp(−λv)]/[v 2 m exp(−λv)]. Var(λ)

For the purpose of study design, assume that the number of partitions per replicate is constant (say,

m)

so that

function of sample size

n

m = nm.

The relative error,

ˆ , Var(λ)/λ

can be expressed as a

as follows:

ˆ Var(λ) 1 − exp(−λv) = λ nmv 2 λ exp(−λv)

Solving this equation for

n

given a xed value of

 n=

1 λvp

2

ˆ p = SE(λ)/λ =

1 − exp(−λv) m exp(−λv)

9

ACS Paragon Plus Environment

q ˆ Var(λ)/λ

yields

(1)

Analytical Chemistry

The right hand side of eq 1 provides an asymptotically valid expression for the number of replicates

n

needed to achieve a prescribed level of error

decreasing function of

λ

p.

For a xed value of

p, n

is a

(Figure 1); therefore, as might be expected, many replicates may

be needed to produce reliable estimates of

λ

at low concentrations of DNA.

0.10 0.15 0.20

350

300

Number of replicates

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 26

250

200

150

100

50

0.1

0.2

0.3

0.4

0.5

Concentration (molecules / uL)

Figure 1: Number of replicates needed to estimate nucleic acid concentration with relative errors of 10% (solid line), 15% (dashed line), or 20% (dotted line).

Observations from multiple samples In this section we extend the model described earlier to include observations from multiple samples containing dierent concentrations of a nucleic acid (DNA or RNA). Commonly encountered examples are observations from serial dilutions of an original sample and observations from dierent genes assayed for the purpose of estimating copy number variation or gene expression. We will illustrate these examples momentarily. For now, suppose as before that dPCR assay to each of

n

(yi , mi )

for

i = 1, . . . , n

are observed by applying the

replicates. We assume that the nucleic acid concentration of the

ith replicate can be formulated as a log-linear function of an observed vector of regressors xi 10

ACS Paragon Plus Environment

Page 11 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

and their eects

β

log(λi ) = x0i β ,

as follows:

where the prime symbol denotes the transpose

of a matrix or vector. Dierent codications of types of experiments or surveys.

xi

can be used to model data from dierent

For example, in experiments designed to estimate copy

number variation the value assigned to

xi

ith

determines whether the

replicate corresponds

to a reference or a target locus.

Binomial(mi , 1 − exp(−λi v))

As shown in the previous section, we can assume that the distribution provides the basis for estimating the concentration fore, the log-likelihood function for the parameter

`(β) =

n X

β

λi

of the ith replicate; there-

is

yi log(1 − exp(−λi v)) − (mi − yi )λi v

(2)

i=1

where

λi = exp(x0i β).

the value of

β

The maximum-likelihood estimator of

that maximizes

`(β).

Generally, the value of

form except for relatively simple models (such as

ˆ β

and of the observed information matrix

ˆ β

β,

which we denote by

ˆ, β

is

cannot be expressed in closed

log(λi ) = β , ∀i).

In most cases the value of

ˆ must be computed using numerical methods I(β)

of optimization (see Appendix S1); however, these calculations are relatively straightforward with the aid of modern computers. In fact, the model that we have described is mathematically equivalent to a generalized linear model log-log link function and an oset equal to

29

of binomial responses with a complementary

log(v).

To see this, let

η(p) = log(− log(1 − p))

denote the complementary log-log link function for a binomial parameter the binomial parameter for the for

p

ith observation is 1 − exp(−λi v).

η(1 − exp(−λi v)) = log[− log(exp(−λi v))]

= = x0i β + log(v)

11

ACS Paragon Plus Environment

In our model

Subtituting this expression

yields

= log(λi v)

p.

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Thus,

x0i β

model.

is the linear predictor and

log(v)

Page 12 of 26

is the additive oset of the generalized linear

The implication of this result is that any statistical software program capable of

tting generalized linear models can be used to t the model that we have described. We illustrate these calculations in the Supporting Information using the statistical software program Inferences for

β

34

glm

function of the R

.

are based on the asymptotic multivariate normality of the maximum-

likelihood estimator  that is,

ˆ − β) ∼ Normal(0, [I(β)]−1 ), where I(β) is estimated using (β

the observed information matrix the deviance statistic,

ˆ . I(β)

The adequacy of a model can be assessed using

ˆ − `s ), D = −2(`(β)

which measures the dierence between the log-

likelihood of the tted model and log-likelihood of a saturated model. The value of be calculated by substituting

λi = −(1/v) log(1 − yi /mi )

The asymptotic distribution of

χ2ν ),

where

q

D

is chi-squared with

β.

can

into the right hand side of eq 2.

ν = n−q

is the number of estimated parameters in

`s

degrees of freedom (that is,

Therefore, a

p-value

for a model's

Pr(χ2ν > D).

goodness-of-t may be obtained by computing

The model that we have described is quite general and can be used to compute inferences for many scientically relevant functions of

β.

The following examples provide illustrations

of this approach.

Example 1: modeling serial dilutions Strictly speaking, serial dilutions of an original sample of unknown concentration or of a standard solution containing a known concentration are not necessary in dPCR. However, these dilutions can be used to verify that estimated concentrations conform to those expected given a prescribed set of dilutions

23,28

.

As an example, suppose a series of three

d-fold

dilutions of an original sample of un-

known concentration are used to create four samples for a dPCR analysis. concentrations of these samples in order of increasing dilution are and

(1/d)3 .

Let

λ0

The relative

(1/d)0 , (1/d)1 , (1/d)2 ,

denote the unknown concentration of the original (undiluted) sample.

12

ACS Paragon Plus Environment

Page 13 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Given these denitions, the unknown concentration of the where

rj = (1/d)j (j = 0, 1, 2, 3).

to construct a regressor in our model because

x = (1, log(rj ))0

λj

This expression for

sample equals

indicates that

log(rj )

log(λj ) = log(λ0 ) + log(rj ).

is used as a regressor, then the vector

ters: one is the parameter of interest ( β0

j th

= log(λ0 ));

β = (β0 , β1 )0

λj = λ0 rj ,

may be used

Specically, if

contains two parame-

the other is the eect of each dilution

on sample concentration ( β1 ). In principle, the value of

β1

should equal one; however, we

can estimate this parameter to verify the accuracy of the dilutions and the performance of the dPCR assay at dierent concentrations. Once

β

has been estimated, the concentration of each sample (diluted or undiluted) also

can be estimated. Let

x

denote the regressor associated with a particular sample. An esti-

mate of the logarithm of this sample's concentration is valid estimate of its variance is

ˆ = x0 Vˆ x, d Var(log( λ))

ˆ = x0 β ˆ, log(λ) where

and an asymptotically

ˆ −1 Vˆ = [I(β)]

is the inverse of

the observed information matrix. These results imply that the concentration of the sample (on the arithmetic scale) may be estimated using condence interval for is the

(α/2)-quantile

λ may be

ˆ = exp(x0 β) ˆ λ

computed as follows:

and that a

100(1 − α)%

ˆ ± zα/2 (x0 Vˆ x)1/2 ], exp[x0 β

where

zα/2

of a standard normal distribution.

Note that while this condence

ˆ, λ

it is restricted to strictly positive

interval may be asymmetrical relative to the value of concentrations.

Example 2: estimating copy number variation Copy number variations (CNVs) stem from changes in genomic DNA (i.e., deletions, duplications, or structural rearrangements) that produce an abnormal number of copies of a particular sequence (usually, exceeding 500 bp

21

). Therefore, CNVs are evident in sequences

that deviate from the standard denition of two copies of a locus per diploid genome. To quantify the CNV of an individual, the number of copies of a target locus that may have undergone abnormal changes in DNA is compared with the number of copies of a normal, reference locus. The ratio of DNA concentrations between target and reference loci therefore

13

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 26

reects the amount of copy number variation per haploid genome of the target locus. For simplicity, this ratio is often called the copy number ratio . Suppose an individual's DNA is collected for the purpose of estimating concentrations of target and reference genes. Assume also that these two genes are amplied in separate PCRs using separate primers and probes so that independence can be assumed among all replicates. To specify the values of regressors for this experiment, assign the target gene and assign

x = (1, 0)0

assignments, the parameters

β1

and

x = (0, 1)0

to each replicate of the reference gene. Given these

β0

correspond to the log-scaled concentrations of DNA

for target and reference genes, respectively, and their copy number ratio where

c = (−1, 1)0 .

to each replicate of

R

equals

exp(c0 β),

To see this, note that

R = exp(c0 β) = exp(−β0 + β1 ) = λ1 /λ0

Once

β

and, a

has been estimated, the copy number ratio may be estimated using

100(1 − α)%

zα/2 (c0 Vˆ c)1/2 ],

where

condence interval for

ˆ −1 Vˆ = [I(β)]

R

ˆ, ˆ = exp(c0 β) R

may be computed as follows:

ˆ ± exp[c0 β

is the inverse of the observed information matrix. No-

tice that this computational approach is identical to that used in Example 1: modeling serial dilutions.

compute

The only dierence is the linear combination of the parameters of

β

used to

R.

Example 3: planning experiments to detect copy number variation Suppose an individual's copy number variation is expected to lie within some plausible range of values, and we wish to plan an experiment to estimate the actual copy number ratio. As in the previous example, let

R = λ1 /λ0

denote this copy number ratio.

In planning the

experiment, we may want to determine the probability of detecting a particular value of

14

ACS Paragon Plus Environment

R

Page 15 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

given a xed number of replicates per gene; alternatively, we may want to determine the

R.

number of replicates per gene needed to detect a particular value of

Both objectives

R

require us to calculate the statistical power of detecting a particular value of prescribed level of statistical signicance

α.

To specify the design problem more concretely, note that null hypothesis,

H1 : λ0 6= λ1 .

H0 : λ0 = λ1 ,

and that

R 6= 1

R = 1

H1

corresponds to the

corresponds to an alternative hypothesis,

Other alternative hypotheses are possible (such as

power calculations that we develop for

given a

R>1

or

R < 1),

but the

can be used in those cases too (see Appendix S2).

Therefore, for now we consider the original hypotheses, which can be stated in terms of our model's parameters as follows: statistic

Λ

H0

for comparing

H0 : β = β0 ,

and

H1

and

H1 : β = (β0 , β1 )0 .

The likelihood-ratio

equals

Λ = −2(`(βˆ0 ) − `(βˆ0 , βˆ1 ))

If

H0

is true, then asymptotically

one degree of freedom. If

H1

Λ has a central, chi-squared distribution parameterized by

is true, then asymptotically

Λ

has a non-central, chi-squared

distribution parameterized by one degree of freedom and noncentrality the magnitudes of of concluding

H1

β0

and

β1

given that

(and therefore

H1

λ0

and

λ1 ).

calculate the power of detecting a particular copy number ratio

replicates of each gene and on the magnitudes of

λ0

φ,

and

which depends on

Statistical power is the probability

is true; therefore, we must evaluate

This calculation requires the noncentrality parameter

φ,

Pr(Λ > χ21−α |H1 true) to

R = λ1 /λ0

when

H1

is true.

which depends on the number of

λ1

(or

R)

(see Appendix S2).

In Figure 2 we illustrate how power calculations can be used to solve two kinds of design problems. First, we illustrate how the statistical power varies over a range of copy number ratios given a xed number of replicates (5 or 10) per gene. assumed

λ0 = 1.5

partition, and

molecules/µL,

α = 0.05.

m =

For these calculations, we

15000 partitions per replicate,

v = 0.91

Note that the power of detecting a copy number ratio

15

ACS Paragon Plus Environment

R

nL per

increases

Analytical Chemistry

a 1.0

Power

0.8 0.6 0.4 0.2 0.0

No. replicates

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 26

0.6

0.8

1.0

1.2

1.4

0.6

0.8

1.0

1.2

1.4

b c

200 150 100 50 0

Copy number ratio

Figure 2: Design of copy number variation experiments. a. Statistical power of detecting copy number ratio given 5 (dashed line) or 10 (solid line) replicates per gene. b. Number of replicates needed to detect the copy number ratio given xed statistical power of 0.80 (dashed line) or 0.95 (solid line).

with

|R − 1|

and with the number of replicates (Figure 2a). We also calculated the number

of replicates per gene needed to achieve a xed statistical power (0.80 or 0.95) as a function of copy number ratio. Note that the number of required replicates declines with

|R − 1|

and

increases with power (Figure 2b).

Analysis of dPCR Data In this section we illustrate how the class of statistical models described earlier may be used to analyze dPCR counts observed in dierent kinds of experiments. The data and statistical software

34

used in these analyses are provided in the Supporting Information.

16

ACS Paragon Plus Environment

Page 17 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Serial dilution of a synthetic gene We prepared serial dilutions of a synthetic gene as described by Hunter et al.

35

.

Briey,

we used a Burmese python ND4 mitochondrial assay to analyse a 133 bp vector-replicated synthetic gene (Integrated DNA technologies, IDT, Coralville, Iowa). The plasmid containing the gene was linearized using the EcoRV-HF restriction enzyme (New England BioLabs, Ipswich, MA) and quantied on a Qubit 2.0 uorometer (Invitrogen, Life Technologies, CA, USA). Using standard procedures, the digital droplet QX200 (Bio-Rad, CA, USA) was used to determine numbers of droplets with or without DNA for each of six fourfold serial dilutions (10 replicates per dilution). We used the known concentrations of these dilutions (1.54, 6.14, 24.57, 98.29, 393.15, and 1572.59 copies/ µL) to construct a regressor for each replicate. Similar to the approach described in Example 1: modeling serial dilutions , let the

j th

sample, and let

x = (1, log(cj ))0

includes an intercept parameter

β0

cj

denote the known concentration of

be a regressor. Given these choices, the vector

and a slope parameter

β1 .

β

If the serial dilutions were

prepared correctly and if all of the DNA in each sample was available to be amplied, then in principle the intercept and slope parameters should equal zero and one, respectively. Fitting the model to the data observed in the experiment yielded estimates that were close to these

ˆ0 values (β

= 0.046 (SE = 0.016) and βˆ1 = 0.961 (SE = 0.002)).

In Figure 3 the regression line

based on these estimates is plotted with replicate-specic estimates of DNA concentration.

Copy number variation of MYC gene A mutated version of the regulator gene MYC is found in many cancers; therefore, copy number variation (CNV) in MYC can be used as a clinical diagnostic for the detection of these cancers.

As an example, the copy number ratio between MYC and a reference

(housekeeping) locus on Chr4 can be used to quantify levels of CNV in MYC. We used duplicate dPCR assays for each of seven patients (data retrieved from QuantaSoft, BioRad, CA,USA) to estimate the copy number ratio between MYC and Chr4 for each patient

17

ACS Paragon Plus Environment

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Estimated concentration (copies / uL)

Analytical Chemistry

Page 18 of 26

1000 500

100 50

10 5

1 1

5

10

50

100

500

1000

Standard concentration (copies / uL)

Figure 3: Comparison of estimated and known concentrations of standards prepared by serial dilution of a synthetic gene. Line indicates the model-based prediction of concentration. Axes are logarithmically scaled.

using the model described in Example 2: estimating copy number variation . Copy number variation in MYC was evident in all but one patient (patient CN2) (Table 1).

Although

one patient had a lower than normal copy number ratio (patient CN1), the copy number ratios of most patients were signicantly higher than one, suggesting that these patients had duplicated MYC genes. Table 1: Estimated copy number ratio

ˆ R

between MYC and Chr4 genes for seven patients.

Columns labeled 2.5% and 97.5% contain limits of 95% condence intervals for labeled GOF contains

p-values

for assessing model goodness-of-t.

Patient ID

ˆ R

2.5%

97.5%

GOF

CN1

0.477

0.458

0.497

0.924

CN2

1.001

0.971

1.032

0.390

CN3

1.493

1.452

1.535

0.186

CN5

2.513

2.445

2.583

0.868

CN6

3.082

2.981

3.187

0.168

CN14

6.907

6.695

7.125

0.275

CN16

8.158

7.944

8.379

0.320

18

ACS Paragon Plus Environment

R.

Column

Page 19 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Expression of SDPR gene Several experiments were conducted to determine the eects of cell isolation methods on gene expression in leukocytes

36

. We analyzed results from one of these experiments wherein the

mRNA concentrations of the target gene SDPR and reference gene RPL27 were estimated for each of three methods of cell isolation: negative selection, positive selection, and uorescenceactivated cell sorting (FACS). In this experiment dPCR was used to measure concentrations of mRNA extracted from the monocytes of ve healthy donors. To analyze the dPCR data from this experiment, we used regressors to indicate the gene (SPDR vs. RPL27) and cell isolation method of each replicate. Specically, we let the gene (u

= 1 for SDPR; u = 0 for RPL27), v

for positive selection;

w = 0

otherwise).

v=0

otherwise), and

w

u

specify

specify the positive selection method ( v specify the FACS method ( w

Given these denitions, we used the regressor

=1

=1

for FACS;

x = (1, u, v, w, uv, uw)0

to t a model whose parameters allowed the expression of SDPR to be estimated for each cell isolation method.

For example, expression of SDPR is dened by the ratio between

mRNA concentrations of SDPR and RPL27. Given our choice of regressor

x,

the logarithm

of this ratio equals a linear combination of the model's parameters. Specically, and

β1 + β5

β1 , β1 + β4 ,

denote the logarithm of SDPR expression for cell isolation methods of negative

selection, positive selection, and FACS, respectively. In general, once the model's parameters have been estimated, gene expression

R

can be estimated using

ˆ, ˆ = exp(c0 β) R

where

c

is

the vector of integers needed to form a specic linear combination of model parameters. A

ˆ ±zα/2 (c0 Vˆ c)1/2 ], where 100(1−α)% condence interval for R can be computed using exp[c0 β ˆ −1 Vˆ = [I(β)]

is the inverse of the observed information matrix.

This experiment included replicate observations from each of ve individuals (i.e., repeated measures); therefore, we extended the model to account for dierences in concentration among individuals that were not associated with covariate eects, and we estimated the magnitude of these additional components of variance (see Appendix S3). The results of our analysis indicated that SPDR expression levels diered signicantly among the three cell iso-

19

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ˆ R

Table 2: Estimated gene expression (ratio

Page 20 of 26

between mRNA concentrations of SDPR and

RPL27) for dierent methods of cell isolation.

Columns labeled 2.5% and 97.5% contain

limits of 95% condence intervals for gene expression. Method of

ˆ R

2.5%

97.5%

Negative selection

63.08

34.91

114.00

Positive selection

1.01

0.56

1.83

FACS

0.29

0.16

0.54

cell isolation

lation methods (Table 2). The dierences in expression that we estimated were qualitatively consistent with those reported previously

36

; however, we also detected substantial varia-

tion in mRNA concentrations among individuals that was not associated with cell isolation methods (σ ˆRPL27

= 0.27, σ ˆSPDR = 0.63).

These individual-level dierences in mRNA con-

centration were at least partially responsible for the relatively low precision of our estimates of SPDR expression.

Discussion In this article we have described and illustrated a class of statistical models for the analysis and design of a broad range of studies based on dPCR. Our approach is more direct and versatile than previously proposed methods of analysis

5,21

and has led to an improved un-

derstanding of those methods. For example, we have provided a theoretical justication for the pooling of observed counts among replicates of a sample; thus, the statistical methods proposed by Dube et al.

21

and Whale et al.

5

will yield asymptotically valid condence in-

tervals for copy number ratios provided that these methods are applied to pooled counts of reference and target molecules. Estimating copy numbers for individual replicates (that is, using unpooled data) and then averaging replicate-specic copy number ratios, while common practice, does not provide a valid estimate or condence interval for the true ratio of copy numbers. However, this practice may be needed if DNA concentrations are expected or can be shown to dier among replicates of the same sample (say, due to pipetting errors

20

ACS Paragon Plus Environment

33

).

Page 21 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

The class of statistical models that we have described honors the mean-variance relationship of the binomial counts observed in dPCR experiments. This aspect of the model is important because it allows the precision of an estimate of nucleic acid concentration to be computed correctly so that condence intervals for making inferences about concentration or copy number are valid. In contrast, many published analyses of dPCR data ignore the binomial nature of the observations. In these analyses the fraction of partitions containing the target molecule is transformed into copy number (or concentration), and this transformed quantity is analyzed as if it was normally distributed (say, using analysis of variance or regression).

We believe this approach should be abandoned in favor of the more-principled

methods of analysis that we have described. Replication is vital in experiments using dPCR because it provides evidence of repeatability. Replication also provides a means to assess whether a set of dPCR data conforms to model assumptions based on the Poisson distribution of nucleic acid molecules. We showed that the deviance statistic, which is based on the likelihood function of our model, may be used in formal tests of goodness-of-t. To our knowledge, assessments of model adequacy, though obviously important, have not been completed in previous analyses of dPCR data. We conclude by noting that our approach to the analysis of dPCR data is easy to implement. By providing R code

34

for analyzing all of the data sets described in this paper,

we have contributed to the growing collection of R software used in the analysis of dPCR data

37

.

Our approach is both versatile and easy to implement, and we anticipate its use

in the analysis of data from many kinds of experiments that require quantication of nucleic acids. Analysis of gene-expression experiments is a particularly important application wherein we extended the generalized linear model to account for individual-level dierences in expression not associated with eects of experimental treatments.

21

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 26

Acknowledgement We thank John Buttereld and Gaia Meigs-Friend (USGS) for assistance with laboratory analysis. We thank Nadejda Beliakova-Bethell for providing gene expression data from an experiment designed to estimate eects of dierent cell isolation methods. The comments of an anonymous reviewer improved an earlier draft of our paper. Any use of trade, product, or rm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Supporting Information Available Appendices, data les, and R code

34

described in the text.

This material is available free of charge via the Internet at

http://pubs.acs.org/ .

References (1) Heid, C. A.; Stevens, J.; Livak, K. J.; Williams, P. M. Genome Research

1996,

6,

986994.

(2) Vogelstein, B.; Kinzler, K. W. Proceedings of the National Academy of Sciences, USA

1999, 96, 92369241. (3) Heyries, K. A.; Tropini, C.; VanInsberghe, M.; Doolin, C.; Petriv, O. I.; Singhal, A.; Leung, K.; Hughesman, C. B.; Hansen, C. L. Nature Methods

(4) Hindson, B. J. et al. Analytical Chemistry

2011, 8, 649651.

2011, 83, 86048610.

(5) Whale, A. S.; Huggett, J. F.; Cowen, S.; Speirs, V.; Shaw, J.; Ellison, S.; Foy, C. A.; Scott, D. J. Nucleic Acids Research

2012, 40, e82.

(6) Hayden, R. T. et al. Journal of Clinical Microbiology

2008, 46, 157163.

22

ACS Paragon Plus Environment

Page 23 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

(7) Hayden, R. T.; Gu, Z.; Ingersoll, J.; Abdul-Ali, D.; Shi, L.; Caliendo, A. M. Journal of Clinical Microbiology

2013, 51, 540546.

(8) Huggett, J. F.; Foy, C. A.; Benes, V.; Emslie, K.; Garson, J. A.; Haynes, R.; Hellemans, J.; Kubista, M.; Mueller, R. D.; Nolan, T.; Pfa, M. W.; Shipley, G. L.; Vandesompele, J.; Wittwer, C. T.; Bustin, S. A. Clinical Chemistry

2013, 59, 892902.

(9) Bhat, S.; Herrmann, J.; Armishaw, P.; Corbisier, P.; Emslie, K. R. Analytical and Bioanalytical Chemistry

2009, 394, 457467.

(10) Sanders, R.; Huggett, J. F.; Bushell, C. A.; Cowen, S.; Scott, D. J.; Foy, C. A. Analytical Chemistry

2011, 83, 64746484.

(11) Ruijter, J. M.; Pfa, M. W.; Zhao, S.; Spiess, A. N.; Boggy, G.; Blom, J.; Rutledge, R. G.; Sisti, D.; Lievens, A.; De Preter, K.; Derveaux, S.; Hellemans, J.; Vandesompele, J. Methods

2013, 59, 3246.

(12) Tellinghuisen, J.; Speiss, A.-N. Analytical Biochemistry

2014, 449, 7682.

(13) Tellinghuisen, J.; Speiss, A.-N. Analytical Biochemistry

2014, 464, 94102.

(14) Doi, H.; Takahara, T.; Minamoto, T.; Matsuhashi, S.; Uchii, K.; Yamanaka, H. Environmental Science and Technology

2015, 49, 56015608.

(15) Lun, F. M. F.; Chiu, R. W. K.; Allen C., K. C.; Leung, T. L.; Lau, T. K.; Lo, Y. M. D. Clinical Chemistry

2008, 54, 16641672.

(16) Tadmor, A. D.; Ottesen, E. A.; Leadbetter, J. R.; Phillips, R. Science

2011, 333, 5862.

(17) White 3rd, R. A.; Quake, S. R.; Curr, K. Journal of Virological Methods 4550.

23

ACS Paragon Plus Environment

2012,

179,

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 26

(18) Devonshire, A. S.; Honeyborne, I.; Gutteridge, A.; Whale, A. S.; Nixon, G.; Wilson, P.; Jones, G.; McHugh, T. D.; Foy, C. A.; Huggett, J. F. Analytical Chemistry

2015,

87,

37063713.

(19) Pretto, D.; Maar, D.; Yrigollen, C. M.; Regan, J.; Tassone, F. Clinical Chemistry

2015,

61, 182190.

(20) Strain, M. C.; Lada, S. M.; Luong, T.; Rought, S. E.; Gianella, S.; Terry, V. H.; Spina, C. A.; Woelk, C. H.; Richman, D. D. PLoS One

(21) Dube, S.; Qin, J.; Ramakrishnan, R. PLoS One

2013, 8, e55943.

2008, 8, e2876.

(22) Whale, A. S.; ; Cowen, S.; Foy, C. A.; Huggett, J. F. PLoS One

2013, 8, e58177.

(23) Pinheiro, L. B.; Coleman, V. A.; Hindson, C. M.; Herrmann, J.; Hindson, B. J.; Bhat, S.; Emslie, K. R. Analytical Chemistry

2012, 84, 10031011.

(24) Nathan, L. M.; Simmons, M.; Wegleitner, B. J.; Jerde, C. L.; Mahon, A. R. Environmental Science and Technology

2014, 48, 1280012806.

(25) Doi, H.; Uchii, K.; Takahara, T.; Matsuhashi, S.; Yamanaka, H.; Minamoto, T. PLoS One

2015, 10, e0122763.

(26) Iafrate, A. J.; Feuk, L.; Rivera, M. N.; Listewnik, M. L.; Donahoe, P. K.; Qi, Y.; Scherer, S. W.; Lee, C. Nature Genetics

(27) Sebat, J. et al. Science

2004, 36, 949951.

2004, 305, 525528.

(28) Kreutz, J. E.; Munson, T.; Huynh, T.; Shen, F.; Du, W.; Ismagilov, R. F. Analytical Chemistry

2011, 83, 81588168.

(29) McCullagh, P.; Nelder, J. A. Generalized linear models, second edition ; Chapman and Hall: London, 1989.

24

ACS Paragon Plus Environment

Page 25 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

(30) Møller, J.; Waagepetersen, R. P. Statistical inference and simulation for spatial point processes ; Chapman & Hall: Boca Raton, 2004.

(31) Illian, J.; Penttinen, A.; Stoyan, H.; Stoyan, D. Statistical analysis and modelling of spatial point patterns ; John Wiley & Sons: West Sussex, England, 2008.

(32) Cressie, N.; Wikle, C. K. Statistics for spatio-temporal data ; John Wiley & Sons: Hoboken, New Jersey, 2011.

(33) Jacobs, B. K. M.; Goetghebeur, E.; Clement, L. BMC Bioinformatics

2014, 15, 283.

(34) R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2015.

(35) Hunter, M. E.; Oyler-McCance, S. J.; Dorazio, R. M.; Fike, J. A.; Smith, B. J.; Hunter, C. T.; Reed, R. N.; Hart, K. M. PLoS One

2015, 10, e0121655.

(36) Beliakova-Bethell, N.; Massanella, M.; White, C.; Lada, S. M.; Du, P.; Vaida, F.; Blanco, J.; Spina, C. A.; Woelk, C. H. Cytometry Part A

2014, 85A, 94104.

(37) Rödiger, S.; Burdukiewicz, M.; Blagodatskikh, K.; Jahn, M.; Schierack, P. The R Journal

2015, 7, 127150.

25

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Graphical TOC Entry PCR

26

ACS Paragon Plus Environment

Page 26 of 26