Bias and Imprecision in Analysis of Real-Time Quantitative

Aug 3, 2015 - Bias and Imprecision in Analysis of Real-Time Quantitative Polymerase Chain Reaction Data. Joel Tellinghuisen ,. Department of Chemistry...
1 downloads 13 Views 957KB Size
Page 1 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Bias and Imprecision in the Analysis of Real-Time Quantitative Polymerase Chain Reaction Data

Joel Tellinghuisen,* Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States Andrej-Nikolai Spiess Department of Andrology, University Hospital Hamburg-Eppendorf, 20246 Hamburg, Germany

Corresponding Author Email: [email protected]. Phone: (615) 322-4873. Fax: (615) 343-1234

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

-2ABSTRACT Monte Carlo simulations are used to examine the bias and loss of precision that result from experimental error and analysis procedures in real-time quantitative PCR. In the limit of small copy numbers (N0), Poisson statistics govern the dispersion in estimates of the quantification cycle (Cq) for replicate experiments, permitting the estimation of N0 from the Cq variance, which is inversely proportional to N0. We derive corrections to expressions given previously for this determination. With increasing N0, the Poisson contribution decreases and other effects, like pipette volume uncertainty (typically > 3%) dominate. Cycle-to-cycle variability in the amplification efficiency E produces scale dispersion similar to that for variability in the sensitivity of fluorescence detection. When this E variability is proportional to just the amplification (E1), there is insignificant effect on Cq if scale-independent definitions are used for this marker. Single-reaction analysis methods based on the exponential growth equation are inherently low-biased in E and high-biased in N0, and these biases can amount to factor-of-4 or greater error in N0. For estimating Cq their greatest limitation is use of a constant absolute threshold, making them inefficient for data that exhibit scale variability.

Key Words:

qPCR, data analysis, nonlinear least squares, Monte Carlo, variance analysis,

absolute copy number

ACS Paragon Plus Environment

Page 2 of 24

Page 3 of 24

-3-

The goal of real-time quantitative polymerase chain reaction (qPCR) experiments is the determination of the number N0 (copy number) of molecules of a specific genetic material in a sample. This is accomplished by highly amplifying the target material through a thermal cycling process of denaturation, annealing, and elongation, in which the number of amplicons N ideally doubles in each cycle.1 In conventional qPCR, this amplification is usually monitored by fluorescence1,2 and occurs largely undetected, because in early cycles the target fluorescence is buried in background fluorescence in what is called the baseline region (Figure 1). When the target signal (y) emerges from the background and enters the growth phase, typically after 10-30 cycles, the amplification efficiency E begins to diminish. Within another ~15 cycles the signal saturates, as the system enters the plateau region, where no further amplification occurs. y = (eq 2) Value 1.0e-06

Error 4.7e-08

ymax 5000 E0 1.9000 d 5.e-06 Chisq 6.7181e-09 R 1.00000

2.9 0.0025 1.9 NA NA

Fluorescence signal (arbitrary units)

y0

ymax d FDM SDM

y = (eq 3') Value Error 5000 2.9 5e-06 1.9 34.794 0.0033 32.742 0.0051

2.0

5000 Plateau 4000

1.5 E

3000

Growth 1.0

2000

LRE model

1000

FDM Cy0

Baseline

y = (eq 3") Value Error 0.5 34.794 0.0033 31.678 0.0068

0 20

25

30

35

40

45

Amplification Efficiency

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

0.0

cycle

Figure 1:

Synthetic qPCR amplification profile and corresponding amplification efficiency (scale right), for the logistic model that is used in the present MC simulations. The three regions of interest are labeled. The fit results are obtained fitting cycles 2545 using eq 2 (top left) and two version of eq 3, in which k is replaced by the SDM (top right) and Cy0, using eqs 5 and 6. The values of "Chisq" (2) and R in the first of these are the result of fitting exact data and are omitted from the other two boxes. The parameter SEs ("Error") are for constant data error having y = 4.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 24

-4-

A number of methods for estimating N0 from qPCR data have been proposed in the last 15 years.3-8 These have typically been demonstrated on specific datasets, sometimes with comparisons against other methods. However, there has been virtually no purely statistical testing to address questions of inherent bias and imprecision. Such questions can be addressed through Monte Carlo simulations — the topic of the present contribution. The traditional method for quantification in qPCR has been calibration with reactions having known N0 spanning an appropriate concentration range.3,4 To use such methods, one must establish a benchmark on the qPCR curves to relate to the known N0. This quantification cycle Cq can be defined a number of ways,5,9,10 most simply as the "crossing point" or “threshold cycle” (noninteger, often designated as Ct) where the signal achieves a specified level yq. Under the well-supported assumption that y is proportional to N, N = Nq when y = yq. Then, assuming that up to cycle x = Cq the system follows the exponential growth equation, N = N0 Ex,

(1)

we have Nq = N0 ECq; and the dependence of Cq on log N0 provides the desired calibration relation. If the amplification profiles have the same shape for all N0, other benchmarks, like the first derivative maximum (FDM), can be used as well or even better in calibration,10,11 even though eq 1 no longer holds. This is because with fixed curve shape there is a constant shift Cq between the FDM and any smaller benchmark cycle where eq 1 remains valid. Calibration procedures are needed because E is often 3-10% smaller than its ideal value of 2. For reference, the effect of a 5% diminution of E is a factor of ~4 over 25 cycles. In recent years, much effort has been dedicated to what might be considered the "holy grail" of qPCR: the determination of N0 from data for a single reaction.12-16 These methods typically employ eq 1 (in its y form) and attempt to estimate E and y0 (and hence N0) from the data in the early growth phase. The resulting E estimates are a compromise between imprecision and bias  from restricting the range of analyzed data to just the earliest growth cycles vs. exceeding the range where eq 1 holds. The simplest model that acknowledges the decline of E in

ACS Paragon Plus Environment

Page 5 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-5the growth phase is perhaps the LRE (“linear regression of efficiency”) model.17 Here E is a linear function of y, declining from E0 in the baseline region to 1 in the plateau; specifically, if Ex is taken as yx/yx1, Ex declines linearly with yx. However, the dependence of Ex on cycle number is sigmoidal, since yx  0 (giving Ex  E0) throughout the baseline region (see Figure 1). The signal y is given by17,18 y 0 y max E0x y y 0 E0x  y max  y 0

(2)

where ymax is the plateau amplitude. y0 is several orders of magnitude smaller than ymax; if we neglect it in the last term in the denominator we can re-express the relation as y max y max y  y 1  exp(( x1 / 2  x) / k ) 1  max E0 x y0

(3)

where the second expression on the right-hand side is related to the first through E0 = exp(1/k) and E0x1 / 2 = ymax/y0. In fact, least-squares (LS) fitting to this second version of eq 3 was employed in qPCR analysis before the linear relationship between E and y was recognized.19 The logistic function in eq 3 is sigmoidal and symmetrical about x1/2 (which is also the FDM); and qPCR data often fail to follow this symmetry, as evidenced by their much better fit to modified versions of this and the similar log-logistic relation that contain an additional parameter for asymmetry.20 In this situation, analysis with eq 3 gives results for E0 and y0 that vary with the range of cycles included in the LS fit. Further, even when this range dependence is not significant, analysis by both eqs 3 and 1 assumes that the amplification efficiency is constant in the baseline region — at E for methods based on eq 1, E0 for eq 3. Thus, if E varies in this region, as has been indicated in some studies,10,21 the reliability of the analysis is further compromised. All of these methods can estimate y0 directly from the LS fit of the data, and in doing so they also estimate E, either directly (eqs 1 and 2) or tacitly (2nd expression in eq 3). Most have been implemented as multistep procedures involving separate estimation of the baseline,

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

-6efficiency, and y0.5 There are also "mechanistic" models that estimate y0 without estimating E.6,22 However, these tacitly set E = 2 in the baseline region,23 so they yield lower bounds on N0. These single-reaction methods are sometimes called “calibration-free,” since they involve no standard curve. However, they do require at least one calibration constant, to relate fluorescence intensity to the amount of genetic material (y0 to N0). The truly calibration-free methods rely on a statistical analysis of replicate results for small N0, where the Poisson distribution governs observational frequencies. In the realm of conventional qPCR, one such method is “limiting dilution assay” (LDA),17 in which with sufficient dilution, the amplification profiles separate into curves that can be assigned to N0 = 1, 2, 3, … as well as some showing no amplification, hence belonging to N0 = 0. Digital PCR methods (dPCR) require dilution to an average N0 well below 1, so that most reactions will be nulls and only a few will have N0 > 1.2427

Recently we described a Poisson-based method appropriate for the copy number range ~10-

200 and doable with conventional qPCR instrumentation.8 The analysis used error propagation to treat the anomalous dispersion of Cq seen in replicate experiments for small N0.28-30 A method based on the same principle but fitting Cq distributions in place of variance analysis was also published recently.7 In the present work we use nonlinear LS (NLS) analysis of Monte Carlo (MC) simulated qPCR data to investigate imprecision and bias in qPCR analysis methods. With respect to our Cq variance (Cq2) method,8 we find that (1) the suggested corrections to the expression relating

Cq2 to N0 are warranted, (2) cycle-to-cycle E variability produces scale variability in the amplification profiles but contributes negligibly to Cq2 when scale-independent Cq benchmarks are used, and (3) the contributions to Cq2 from estimation uncertainty, pipette volume uncertainty, and Poisson uncertainty are additive. For the single-reaction methods, we find that (1) eq 1-based methods give low-biased estimates of E and high-biased estimates of y0; and (2) the use of an absolute threshold to define Cq yields significant precision losses with data

ACS Paragon Plus Environment

Page 6 of 24

Page 7 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-7-

exhibiting scale variability. We illustrate these results on several large-scale replicate datasets in the literature. •

MATERIALS AND METHODS Computational Methods. The LS calculations described here were done using methods

like those summarized in our other works on qPCR data analysis8,10,23 — the KaleidaGraph program (KG, Synergy Software) for single data sets,31 and FORTRAN and R codes for processing large replicate data sets.32,33 To generate Gaussian and Poisson random deviates, we employed routines provided by Press, et al.,34 or similar. Most of our MC runs included 104 simulated data sets. Since we have described these methods previously, we will say little about them here, referring the reader to the Supporting Information for more detail. Our new method in ref 8 is based on analysis of the variance of Cq from replicate experiments, with N0 being inversely proportional to the Poisson component of Cq2. There are at least four other contributors to the variance: LS estimation, amplification fluctuations, pipette volume uncertainty, and E variability, both cycle-to-cycle and reaction-to-reaction. Our MC simulation codes permit us to investigate these individually and in any combination, the latter to check variance additivity. We omit consideration of amplification fluctuations, except to note that our earlier treatment tacitly assumed that the fluctuations occurred in the denaturation step. It may be more reasonable to assume that they occur in the annealing step, which effectively doubles the starting N and halves the correction to typically ~3% of the Poisson contribution. Since it has the same dependence on N0, the correction is easily added to the Poisson variance. Estimation uncertainty is present in any LS analysis. The estimated parameter standard errors (SEs) are the square roots of the diagonal elements of the covariance matrix, which is usually obtained in the course of solving the LS equations.31,34,35 There has been concern in the literature over the validity of these SE estimates, because nonlinear estimators are inherently nonGaussian and often don't even have finite variance. However, we have previously shown that such concerns are mostly overblown.32 We confirm here that for typical qPCR data, the key

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 24

-8-

parameter distributions are close to normal (Gaussian) or log-normal (see Supporting Information). Further, for typical qPCR data, this contribution to the total variance is negligible compared with the Poisson contribution, up to N0 > 103. qPCR Models. Our workhorse model for the simulations is the 3-parameter logistic function of eqs 2 and 3 (LRE3), as this is a straightforward way to generate realistic sigmoidal amplification profiles over the a full cycle range while still following the exponential growth form of eq 1 in the baseline region (where Ex  E0). This permits us to produce synthetic data from Ex, which is necessary to investigate cycle-to-cycle variation in E. To study effects of model error we have also used a 4-parameter log-logistic function, LL4(x) = ymax [1 + (g/x)h]p,

(4)

as well as a 4-parameter version of eq 3 (LRE4), obtained by raising the denominator to the power p. In fitting the synthetic data, we include one extra parameter to represent a constant background. The background (baseline) is always present with actual data, and constant is its simplest form. We are interested in the scale-independent definitions of Cq as well as the scale-dependent threshold cycle Ct.5,8 As already noted, x1/2 in eq 3 is the FDM. For that model, the SDM is SDM = FDM  k ln(2+31/2),

(5)

while Cy0 (the x-axis intercept of a line tangent to the curve at the FDM) is Cy0 = FDM  2k ,

(6)

Cr = FDM  k ln(r1  1),

(7)

The relative threshold Cr is

where r is a specified fraction of ymax. (For the absolute threshold Ct, substitute ymax/yq for r1 in eq 7.) Any of these can be obtained directly from the NLS fit by using eqs 5-7 to replace k in eq 3 by a choice among SDM, Cy0, Cr, and Ct (see Figure 1). To investigate the methods that employ eq 1, we use the threshold definition of Cq in the following versions of eq 1:

ACS Paragon Plus Environment

Page 9 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-9y(x) = bas(x) + y0 Ex  bas(x) + yq E(xCq) ,

(8)

in which bas(x) represents the baseline function. This is a constant in most of the MC simulations, but can be more complex in the analysis of real data.8 The two versions of eq 8 are statistically equivalent, with E being the same in both, and the first being used when y0 is of interest, the second when Cq is wanted for the specified signal level yq. Data. Our MC results have led us to further examine the data we analyzed in ref 8. Our primary source was the 944Reps dataset provided in the Supplement to ref 5 (94 replicates at each of 4 dilutions). We considered also datasets from three other sources (downloadable at www.dr-spiess.de): 20 replicates at each of 6 dilutions from Rutledge and Cote (minimum N0 stated to be 417),4 127 from Guescini, et al. (N0 = 31),36 and 185 from Lievens, et al. (N0 = 160).37 •

RESULTS AND DISCUSSION Monte Carlo Model: Estimation Variance. The logistic model used to produce the

amplification profile in Figure 1 is based loosely on the most dilute reactions in the 944Reps data from ref 5, for which we previously found a data error of ~4 (average y2 in Table S-1 of the ref 8 supplement was ~16); this value was used to produce the parameter SEs in Figure 1. The MC simulations confirm the predicted SEs in Figure 1 and show that these Cq indices are normally distributed at least up to data error 25 times larger than that used to produce Figure 1 (see SI). For the estimation of N0 with the LRE model, log y0 is approximately normal up to y = 40. This log-normal property of y0 was noted previously for replicate data.23 For y = 4 (about 0.1% of plateau signal level), the Figure 1 results indicate a precision better than 5% for y0. The error decreases to 3% when the profile is shifted down the cycle axis by 10 units; and it is comparable for 0.1% proportional error in y — a noise level that seems reasonable for data collected with modern instrumentation. This precision is comparable to pipetting volume uncertainty (see below), and it does mean that for data satisfying this model, y0 can be estimated

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 24

-10-

with this precision from a single reaction. The problem is ensuring that this model is correct for a given data set, so bias is a greater concern than precision, as is discussed below. For y = 4, the estimation variance for Cq is less than the Poisson variance up to N0 ≈ 105 (see below), so estimation variance should not be limiting in our variance analysis method of estimating N0, even for signal noise 10 times greater. Poisson Variance in Cq. In ref 8 we used error propagation to predict the variance in Cq resulting from Poisson uncertainty in N0 and obtained

Cq2 = [(lnE)2 N0]1 ,

(9)

where N0 represents the average copy number, as expected from serial dilutions, for example. However, for small N0 the Poisson distribution deviates from the normal distribution, and that, together with the nonlinear mapping between Cq and N0, raised doubts about the reliability of eq 9. Preliminary calculations suggested that it predicts variances ~10% too small in the N0 = 10-30 region. We confirm those indications here. We first assess the variance dependence of Cq on N0 formally, using the definitions

Cq2 = Cq2  Cq2 , and

CqnP(N0) Cq,N0n ,

(10) (11)

where the Poisson distribution is P(m) = m e/m!, with  the mean value (here N0 and m = 0, 1, 2, ... . To map the Poisson distribution of N0 into Cq, we use eq 1 and calculate Cq for Nq = 109. For E = 2, this gives Cq  27 for N0 = 8, dropping to 25 for N0 = 30; however the results are not sensitive to this choice of Nq. The N0 and Cq distributions are illustrated for N0 = 8 in Figure 2. Numerical evaluation of eq 10 for a range of average N0 values shows that eq 9 is valid only in the large-N0 limit. The disparity increases with decreasing N0 in a manner that can be well represented by a quadratic expression in N0, as shown in Figure 3. The same treatment for E = 1.9 gives identical results, so the only dependence of the Cq variance on E is that in eq 9. The correction is also not dependent on the use of eq 1 to relate N0 to Cq, as we confirmed by using the model of Figure 1 to map N0 into Cq, with Cq taken as the FDM.

ACS Paragon Plus Environment

Page 11 of 24

-11-

With experimental data, one estimates Cq for each of the replicate reactions and then computes Cq2 from the sampling statistics. We check this procedure using the MC simulations with the model of Figure 1. The results are included in Figure 3 and agree with the predictions. Here Cq was taken as the FDM; however, nearly identical variances were obtained for the SDM, Cr, and Cy0, as is expected, since the estimation variance is negligible and the curve shape is fixed. As follows from the above, the Poisson-mapped distributions of Cq are not normal, and up to our maximum N0 = 60, the discrete nature of Cq persists, as is evident from attempts to histogram 0.15

P(N0)

0.10

0.050

0.0 0

5

10 N0

15

20

0.15

0.10 P8(Cq)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

0.05

0.00 28

Figure 2:

29

30 Cq

31

32

Poisson probability distributions in N0 (top) and Cq for N0= 8.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 24

-1230 25 20

predicted cor % Monte Carlo

15 10 5 0 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 1/N0

Figure 3:

Excess variance in Cq (%) from Poisson uncertainty in N0. The reference is the prediction from eq 9. Solid points were computed using eqs 10 and 11. Points with error bars (1 ) are results from 104 MC simulations each, taking Cq as the FDM from the LS fit to 21 points. For y = 4 the estimation variance is negligible, and the error bars are derived from the sampling based relative SD of (2/104)1/2 for the variance estimates, from the properties of the 2 distribution.38 The solid curve is the LS fit of the solid points to ax + bx2, with a = 152(4) and b = 461(39).

the MC Cq estimates. Although the Poisson distribution does become normal at large , deviations from normality are still evident at N0= 60. Cycle-to-Cycle Amplification Efficiency Variability. In ref 8 we found that E variability might contribute to Cq2 at large N0, where the Poisson contribution was negligible. We considered E variability (1) reaction-to-reaction and (2) cycle-to-cycle within each reaction and found no clear evidence for the former in the LS analyses. Its effect on the Cq variance is anyway easy to predict, so we do not consider it further here. For cycle-to-cycle E variability, we obtained Cq Ex Cq = E ln E

,

(12)

where we use Ex to distinguish this E variability from the estimation SE for E. To check eq 12 with MC simulations, we first generate Ex from the LRE model (eq 2; x = 1, 2, 3, ...), from which we produce the true curve using

ACS Paragon Plus Environment

Page 13 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-13yx = y0 E1E2 E3 ... Ex,

(13)

We then apply error randomly to E in each cycle to generate the variable-E curves. We can add Poisson variability in y0 and random noise to each yx to examine the effects in combination. However, we must now use weighted NLS to fit the data, because the E variability translates into increased noise in y, predicted most simply from yx = yx1 Ex,

(14)

from which yx = yx1 Ex , if we neglect uncertainty in yx1 (as is justified by the results; see SI). We assume that E noise and y noise are additive in the variances and compute the LS weights as the reciprocals of the total variances. The simplest assumption for E variation is constant Ex. However, it seems likely that the variability would be limited to just the amplification portion of E, or E1, in which case the error might reasonably be taken as proportional to E1. We have examined both of these in the simulations, taking the random variation in E to be normally distributed. For both error models, the main effect of E error is scale variability in the entire amplification profile, as is shown in Figure 4 5000

4000

3000 y 2000 5000

1000 4500 38

0 25

30

35

40

42

40

cycle

ACS Paragon Plus Environment

44

46

45

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 24

-14Figure 4:

Monte Carlo simulated amplification profiles for randomly variable E, with Ex = 0.01  (E1) (solid) and 0.01  (E01) (dashed), with E0 = 1.9. Plateau region is shown amplified in inset.

for E variability 8 times larger than the estimation SE for E0 in Figure 1. Because the true E is de facto constant in the baseline region, the two error models agree there; they disagree most in the plateau region, where there is no channel-to-channel scatter for the proportional error model, since the E error vanishes there for it. In the Cq variances we find negligible increase in the proportional error model for the scale-independent Cqs (FDM, SDM, Cy0, Cr); but we do see increase for these in the constant-error model. Equation 12 was derived for Ct at constant yq, and indeed the variance in the constant-threshold Cq (Ct) does increases in both error models. We have investigated the differences in more detail by running simulations in which the E variability is applied just to cycles 0 to n2, n2 = 5, 10, ... 45. The results of these calculations confirm the predicted Cq dependence in eq 12 and clarify other differences between the two error models (see SI). The important result of these computations is that the use of a scale-independent Cq completely neutralizes effects of cycle-to-cycle E variability when this is proportional to (E1). Exactly the same behavior occurs for scale variability from sensitivity of detection.10 The question then arises, how might one distinguish these two effects? One possibility lies in examining the residuals from the LS fits collectively. Equation 14 predicts that the error in y should peak near the FDM when the error in E is proportional to (E1). For 1% error in (E1) in the model of Figure 1, this effect is only about twice the estimated data error, so it might be hard to detect, especially if the data error is itself proportional to signal, a behavior often observed in qPCR data,10,39 which can result from fluctuations in the laser power in the detection system, for example. We have looked for the signature of proportional E error in the LS fit residuals from the four replicate datasets we analyzed in ref 8. This task is complicated by the presence of systematic error, which we have observed for every dataset we have analyzed and every fit model

ACS Paragon Plus Environment

Page 15 of 24

-15we have tried (see Graphic and SI Figure S-5). However, the scatter of the residuals about their average for a given cycle is a measure of the true dispersion of the data for that cycle. To analyze the residuals for data from multiple dilutions collectively, we must take a common structural feature, like the FDM, as the reference cycle for each profile. With this approach, the 944Reps data do show a peak at the FDM, roughly commensurate with 1% proportional E error (SI, Figure S-6); but the large scatter of the SD estimates in this region softens this conclusion. Of the other analyzed datasets, only that from Guescini, et al.36 shows evidence of this effect (Figure 5), while the residuals for the Lievens, et al. data37 reflect the often observed increase in data noise with signal. 0.20 Guescini, et al. Lievens, et al. ( 0.01) standard deviation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

0.15

0.10

0.05

0.00 -10

Figure 5:

-5

0

5 10 Relative cycle

15

20

Residuals analysis of datasets from refs 36 (Guescini) and 37 (Lievens). The scale factor for the latter compensates for different y-scales in the two datasets. Here the reference for the relative cycle is the location parameter g in eq 4.

Pipette Volume Uncertainty and Variance Additivity. We have found little about pipette volume uncertainty in the research literature, except the suggestion that electronic pipetting is better than manual.40 The most optimistic quantitative figures we have seen are for the Tecan Freedom Evo robotic systems, which state 50 different amplicons (private communication). In his LRE method, n2 is chosen from visual examination of the linearity of the Ex vs. yx plot; the present fit variance method might facilitate automation of this procedure. It is noteworthy that the LRE errors found here would be acceptable in many biological applications, where even a factor of 2 or more error in N0 can be tolerated. In addition to the bias problems, the eq-1-based methods compared in ref 5 incorporate procedures that give needless precision losses. Most estimate the baseline separately from the growth parameters, and some employ an R2 or other test to select "best" results. Those which use a log transformation to fit a straight line neglect the data weighting that should accompany such a transformation.10,32 In estimating Ct, many choose a threshold signal level that is too low.23 Some instruments output baseline-subtracted and perhaps smoothed data, but it is not always clear just what is done and how. These procedural deficiencies are quantified in the SI. However, from a practical standpoint, these precision losses are not a source of great uncertainty in the estimation of Cq. From the comparisons we have made with the large replicate test datasets, the one choice that can make a clear difference in the Cq precision is the use of a scale-independent definition for this marker,8,10 since these (FDM, SDM, Cy0, relative threshold) are all immune to detectionrelated scale effects, and now, from the present work, to the most likely effects from random E variability. The virtues of amplitude normalization have been recognized for some time,39 but this seems not to have been widely adopted; and most of the eq-1-based methods compared in ref 5 use absolute threshold to define Cq.

ACS Paragon Plus Environment

Page 19 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-19-

CONCLUSION Using Monte Carlo simulations, we have examined several sources of random and systematic error in experimental qPCR methods: random signal noise, Poisson statistics, pipette volume uncertainty, and cycle-to-cycle variability in the amplification efficiency. We have derived corrections for the Poisson contribution to the variance and have shown that the main effect of E variability is scale variability of the whole amplification profile. For Ex proportional to (E1), the scale variability is completely analogous to that from detection sensitivity, meaning there is insignificant effect on the Cq variance for the scale-independent definitions of Cq. For typical experimental data, the statistical distributions of E, Cq, and log y0 are approximately normal, with the notable exception of the Poisson-based distribution of Cq for small N0. Still, the variance contributions from these several sources are additive. The growth-equation methods widely used to analyze qPCR data do not yield optimal precision, but their practically significant drawback is their reliance on a constant absolute threshold to define Cq. On the other hand, in estimating E and y0, they are significantly low- and high-biased, respectively. For copy numbers < 102, Poisson statistics dominate the Cq variance, and one can determine the copy number from this variance, since it is inversely proportional to N0. Our present results provide a needed correction to the expression used previously to implement this method for replicate data.8 As N0 increases, the variance contribution from pipette volume uncertainty may dominate. Beyond manufacturers' specifications, there is relatively little known about this error source. Nor have workers paid it much attention in designing their experiments. It seems likely that some combination of improvements in the pipetting devices, use of reference dyes, and avoiding small volumes in experimental protocols might suffice to reduce the volume uncertainty to the 1% regime, permitting reliable estimation of N0 from the Cq variance for copy numbers exceeding 103.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

-20-

Acknowledgment We thank Mikael Kubista for pointing out that amplification fluctuations likely originate in the annealing step of PCR. This work was supported by the German Research Foundation under Grant Sp721/4-1 to ANS. Supporting Information Available This information is available free of charge via the internet at http://pubs.acs.org.

ACS Paragon Plus Environment

Page 20 of 24

Page 21 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-21-

References (1) Higuchi, R.; Fockler, C.; Dollinger, G.; Watson, R. Biotechnology (N Y) 1993, 11, 1026– 1030. (2) Morrison, T.B.; Weiss, J.B.; Wittwer, C.T. Biotechniques 1998, 24, 954-958. (3) Pfaffl, M.W.; Hageleit, M. Biotechnol. Lett. 2001, 23, 275-282. (4) Rutledge, R.G.; Cote, C. Nucleic Acids Res. 2003, 31, e93. (5) Ruijter, J.M.; Pfaffl, M.W.;Zhao, S.; Spiess, A.N.; Boggy, G.; Blom, J.; Rutledge, R.G.; Sisti, D.; Lievens, A.;De Preter, K.; Derveaux, S.; Hellemans, J.; Vandesompele, J. Methods 2013, 59, 32-46. (6) Carr, A.C.; Moore, S.D. PLoS ONE 2012, 7, e37640. (7) Mojtahedi, M.; d'Hérouël, A. F.; Huang, S. Nucleic Acids Res. 2014, 42, e126. (8) Tellinghuisen, J.; Spiess, A.-N. Anal. Chem. 2015, 87, 1889-1895. (9) Cikos, S.; Koppel, J. Anal. Biochem. 2009, 384, 1-10 (10) Tellinghuisen, J.; Spiess, A.-N. Anal. Biochem. 2014, 449, 76-82. (11) Rutledge R.; Stewart, D. BMC Mol. Biol. 2008, 9, 96. (12) Liu, W.; Saint, D.A. Anal. Biochem. 2002, 302, 52-59. (13) Tichopad, A.; Dilger, M.; Schwarz, G.; Pfaffl, M.W. Nucleic Acids Res. 2003, 31, e122. (14) Ramakers, C.; Ruijter, J.M.; Deprez, R.H.; Moorman, A.F. Neurosci. Lett. 2003, 339, 62-66. (15) Swillens, S.; Goffard, J.C.; Marechal, Y.; d'Exaerde, A.D.; El Housni, H. Nucleic Acids Res. 2004, 31, e122. (16) Smith, M.V.; Miller, C.R.; Kohn, M.; Walker, N.J.; Portier, C.J. BMC Bioinformatics 2007, 8, 409. (17) Rutledge, R.G.; Stewart, D. PLoS ONE 2010, 5, e9731. (18) Chervoneva, I.; Li, Y.; Iglewicz, B.; Waldman, S.; Hyslop, T. Statist. Med. 2007, 26, 55965611. (19) Liu, W.; Saint, D.A. Biochem. Biophys. Res. Commun. 2002, 294, 347-353.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

-22-

(20) Spiess, A.N.; Feig, C.; C. Ritz, C. BMC Bioinformatics 2008, 9, 221. (21) Nogva, H.K.; Rudi, K. Biotechniques 2004, 37, 246-253. (22) Boggy, G.J.; Woolf, P.J. PLoS ONE 2010, 5, e12355. (23) Tellinghuisen, J.; Spiess, A.-N. Anal. Biochem. 2014, 464, 94-102. (24) Vogelstein, B.; Kinzler, K.W. Proc. Nat. Acad. Sci. USA 1999, 96, 9236-9241. (25) Kreutz, J. E.; Munson, T.; Huynh, T.; Shen, F.; Du, W.; Ismagilov, R. F. Anal. Chem. 2011, 83, 8158-8168. (26) Hindson, B.J.; Ness, K.D.; Masquelier, D.A.; Belgrader, P.; Heredia, N.J.; Makarewicz, A.J.; Bright, I.J.; Lucero, M.Y.; Hiddessen, A.L.; Legler, T.C. Anal. Chem. 2011, 83, 86048610. (27) Nixon, G.; Garson, J. A.; Grant, P.; Nastouli, E.; Foy, C. A.; Huggett, J. F. Anal. Chem. 2014, 86, 4387-4394. (28) Morrison, T.; Weis, J.J.; Wittwer, C.T. Biotechniques 1998, 24, 954–962. (29) Stenman, J.; Opana, A. Nat. Biotechnol. 2001, 19, 1011-1012. (30) Lievens, A.; Van Aelst, S.; Van den Bulcke, M.; Goetghebeur, E. Plos ONE 2012, 7, e47112. (31) Tellinghuisen, J. J. Chem. Educ. 2000, 77, 1233-1239. (32) Tellinghuisen, J. J. Phys. Chem. A 2000, 104, 2834-2844. (33) Ritz, C.; Spiess, A. N. Bioinformatics 2008, 24, 1549-1551. (34) Press, W.H.; Flannery, B.P.; Teukolsky, S. A.; Vetterling, W. T. Numerical Recipes, Cambridge Univ. Press, Cambridge, U. K. (1986). . (35) Bevington, P.R. Data Reduction and Error Analysis for the Physical Sciences, McGrawHill, New York (1969). (36) Guescini, M.; Sisti, D.; Rocchi, M.B.L.; Stocchi, L.; Stocchi, V. BMC Bioinformatics 2008, 9, 326. (37) Lievens, A.; Van Aelst, S.; Van den Bulcke, M.; Goetghebeur, E. Nucleic Acids Res. 2012, 40, e10.

ACS Paragon Plus Environment

Page 22 of 24

Page 23 of 24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

-23-

(38) Tellinghuisen, J. Analyst 2008, 133, 161-166. (39) Larionov, A.; Krause, A.; Miller, W. BMC Bioinformatics 2005, 6, 62. (40) Curry, J. D.; McHale, C.; Smith, M. T. Mol. Biol. Today 2002, 3, 79-84.

ACS Paragon Plus Environment

Analytical Chemistry

-24-

for TOC only 40 Least Squares Residuals

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

30 20 10 0 -10 -20 -30 -40 -10

-5

0 5 Relative Cycle

10

ACS Paragon Plus Environment

Page 24 of 24