J. Phys. Chem. 1993, 97, 10254-10255

Estimate of the Probability of Diffusional Misordering in High-Speed DNA Sequencing

Lawrence R. Pratt* and Richard A. Keller

Los Alamos National Laboratory, Los Alamos, New Mexico 87545

Received: July 16, 1993

The issue of diffusional misordering of sequentially cleaved pairs of nucleic acid bases in high-speed DNA sequencing is studied by extracting the expected fraction of misordered pairs from solutions of diffusion-flow equations. These results are used to determine cutting rates, flow velocities, and distances to detectors that correspond to target fractions of misordered pairs.

A proposed scheme for high-speed DNA sequencing$^1$ involves rapid, processive cleavage of tagged nucleic acid bases from a DNA chain placed in the center of a capillary through which an aqueous solution flows. The cleaved bases are then detected downstream, one at a time, in the order of their release. The cleavage process is expected to operate at a rate of approximately $10^2$ to $10^3$ s$^{-1}$. A question raised about this approach is whether the diffusional misordering of the released bases will lead to a troublesome error rate for achievable experimental designs. The set of design parameters includes the distance, $d$, between the release point and the detection region and the flow velocity, $v$. The cleavage rate, $\kappa$, can also be adjusted by varying the exonuclease, the exonuclease concentration, the cofactor concentration, and the temperature or by the use of inhibiting agents. The identification of satisfactory ranges for these parameters is important.

It is worthwhile to consider diffusional misordering separately from other complications such as those associated with nonuniformity of the flows near supports and walls, with wall effects more generally, or with manipulation of the DNA. Consequently, this Letter analyzes the simplest possibility: diffusion in a uniform flow with no additional complications. We present calculations of those design parameters $(d,v,\kappa)$ that can lead to a specified misordered fraction.

The macroscopic equation for the diffusion-flow process that brings a base from the release point to a detection region is$^2$

$$\frac{\partial p(z,t)}{\partial t} = D\,\frac{\partial^{2} p(z,t)}{\partial z^{2}} - v\,\frac{\partial p(z,t)}{\partial z} \tag{1}$$

Here we treat the process as operating in one spatial dimension. This neglects $x$-$y$ spatial nonuniformity of flow and other effects of the spatial confinement of the flowing material. $p(z,t)$ is the conditional probability density for finding at position $z$ at time $t$ a base released at $z = 0$ at the initial time $t = 0$. $D$ is the self-diffusion coefficient of a tagged base in the solution. One method of estimating the probability of misordered detection of bases utilizes the distribution of first passage times, $\varphi(t)$, to a detection region downstream. If the delay time between release of the first and the second base of a pair is $\kappa^{-1}$, then the probability, or fraction $f$, of pairs of bases entering the detection region in misordered sequence is

$$f = \int_{\kappa^{-1}}^{\infty} \varphi(t) \left[ \int_{0}^{t-\kappa^{-1}} \varphi(t')\,dt' \right] dt \tag{2}$$

We note that in the actual experiment cuts are not regularly spaced by times $\kappa^{-1}$; instead, there is a statistical distribution of waiting times. The reported $\kappa^{-1}$ is a time constant that parametrizes that distribution, but we will consider later the consequences of allowing the time between cuts to become a random variable. Equation 2 is understood in the following way: If base 1 arrives within time $\kappa^{-1}$, that is, before base 2 is released, there is no possibility of misordering, and we need only consider the events for which base 1 takes longer than $\kappa^{-1}$ to arrive. The probability that base 1 first arrives within $dt$ of time $t$ is $\varphi(t)\,dt$. The probability that base 2 arrived earlier is $\int_{0}^{t-\kappa^{-1}} \varphi(t')\,dt'$. The desired misordering probability is the product of these two factors summed over arrival times for base 1 that are greater than $\kappa^{-1}$. Note that this formula quite sensibly gives 1/2 for the misordering probability in the limit that the release delay vanishes.

* Author to whom correspondence should be addressed.

Abstract published in Advance ACS Abstracts, October 1, 1993. 0022-3654/93/2097-10254$04.00/0

To analyze this misordering probability, we first express eq 1 in conventional dimensionless variables:

$$\frac{\partial p(z,t)}{\partial t} = \frac{1}{\mathrm{Pe}}\,\frac{\partial^{2} p(z,t)}{\partial z^{2}} - \frac{\partial p(z,t)}{\partial z} \tag{3}$$

Distances are expressed relative to the distance from the release point to the detection region; times are in units of the time for flow over that distance, $d/v$; the dimensionless parameter $\mathrm{Pe} = vd/D$, the Péclet number, characterizes the solutions found. We next note that Schrödinger$^3$ has given the required distribution of first passage times:

$$\varphi(t) = \left( \frac{\mathrm{Pe}}{4\pi t^{3}} \right)^{1/2} \exp\!\left[ -\frac{\mathrm{Pe}\,(1-t)^{2}}{4t} \right] \tag{4}$$

These functions used in eq 2 produce the results of Figure 1 for the misordering probability as a function of the dimensionless release delay $\tau = \kappa^{-1}(v/d)$.

Figure 1. Fraction of pairs of bases that enter the detection region misordered as a function of the dimensionless release delay $\tau = \kappa^{-1}(v/d)$; see eq 2.

The results of Figure 1 allow us to calculate the design parameters $(d,v,\kappa)$ that lead to a fraction $f$ of misordering errors. We take as parameters fixed by the physical nature of the molecular systems the release delay in physical units and the self-diffusion coefficient. From the results of Figure 1, we obtain for each value of Pe the dimensionless release delay $\tau(f)$ that leads to a misordering probability $f$. We then revert to the dimensional variables of interest, finding $d = [D\,\mathrm{Pe}/(\tau(f)\kappa)]^{1/2}$ and $v = [\tau(f)\kappa D\,\mathrm{Pe}]^{1/2}$. Adoption of reasonable values for $D$ and $\kappa$ finally leads to the curves $(d,v)$ shown in Figure 2 corresponding to specified misordered fractions. The value adopted throughout for the self-diffusion coefficient was $D = 5 \times 10^{-10}$ m$^2$/s. For the tagged nucleotide involved in the proposal, this value is within the high range of reasonable values of $D$.

Figure 2. Curves $(d,v)$ corresponding to specified misordered fractions, plotted against $d$ (m). The solid lines are for a cutting rate of $\kappa = 10^3$ s$^{-1}$, and the dashed lines are for a cutting rate of $\kappa = 10^2$ s$^{-1}$. Within each of these sets, the upper, middle, and lower lines are for misordered fractions of $f$ = 0.1%, 1%, and 10%, respectively. $D = 5 \times 10^{-10}$ m$^2$/s.

We see from Figure 2 that when the cutting rate is $\kappa = 10^3$ s$^{-1}$ a velocity of about $v$ = 10 mm/s and a distance to the detector of $d$ = 40 µm lead to a misordered fraction of $f$ = 0.1%. Decreasing the distance to the detector from this point or increasing the rate of flow would lower the misordered fraction. At $d$ = 40 µm, decreasing the flow velocity to $v$ = 8 mm/s is expected to increase the misordered fraction to $f$ = 1%; at a flow velocity of $v$ = 10 mm/s, increasing the distance to the detector to $d$ = 100 µm would again increase the misordered fraction to $f$ = 1%. If the cutting rate is decreased to $\kappa = 10^2$ s$^{-1}$, then a misordered fraction of $f$ = 0.1% can be achieved with $d$ = 100 µm and $v$ = 3 mm/s.

We note that the results given conform to the simple relation $v \propto d^{1/3}$ motivated by the idea$^4$ that the conditions of interest are those for which diffusional misordering is slight and, therefore, diffusional motion can be treated as secondary to the flow displacement. The relations given above isolate the role of the cutting rate $\kappa$ from the other parameters. This makes it simple to see how a more detailed treatment of the distribution of release delays will affect the present results.

To this end, we sharpen the notation slightly by considering an ensemble of pairs of bases for which the second base is released at a specific time $\Delta$ after the first. The $(d,v)$ that corresponds to a specific fraction of misordered pairs is given by precisely the relations above: $d = [D\,\mathrm{Pe}\,\Delta/\tau(f)]^{1/2}$ and $v = [\tau(f)D\,\mathrm{Pe}/\Delta]^{1/2}$. Now we consider a distribution of release delays for the second base of the pair. An average value computed using that distribution will be denoted by an overline. The only quantity in the formulas above affected by this averaging is $\Delta$. Thus we have $d = [D\,\mathrm{Pe}/\tau(f)]^{1/2}\,\overline{\Delta^{1/2}}$ and $v = [\tau(f)D\,\mathrm{Pe}]^{1/2}\,\overline{\Delta^{-1/2}}$. This will not change the relation $v \propto d^{1/3}$ but does change the proportionality constant. For example, if an exponential distribution of release delays $\kappa e^{-\kappa\Delta}$ were correct (for which $\overline{\Delta^{1/2}} = \tfrac{1}{2}\sqrt{\pi}\,\kappa^{-1/2}$ and $\overline{\Delta^{-1/2}} = \sqrt{\pi}\,\kappa^{1/2}$), the constant of proportionality, $(v^3/d)^{1/3}$, would be increased by a factor of $(2\pi)^{1/3}$ and the relations would be $d = [\pi D\,\mathrm{Pe}/(4\tau(f)\kappa)]^{1/2}$ and $v = [\pi\tau(f)\kappa D\,\mathrm{Pe}]^{1/2}$. This result might be sensitive to the distribution of release delays near short times. However, if the dispersion of release delays were small, we would consider only the mean $\overline{\Delta}$ and the second moment about the mean, $\overline{[\Delta - \overline{\Delta}]^{2}} = \delta\Delta^{2}$. The proportionality constant would then increase linearly with $\delta\Delta^{2}$ as $(1 + 5(\kappa\,\delta\Delta)^{2}/12)$, where we have made the identification $\overline{\Delta} = \kappa^{-1}$. Evaluated for the exponential distribution, $(\kappa\,\delta\Delta)^{2} = 1$, this latter, more general increase factor is numerically quite similar to $(2\pi)^{1/3}$. Thus, the more general formula will provide a realistic estimate of the effect of variations in release delays even for moderately unfavorable cases.

It is worth emphasizing also that calculations like those presented here can give a quantitative assessment of the redundancy required of sequencing measurements in order to produce reliable sequences based upon the value of $f$. Extension of these methods to treat misordering of triples, etc., is also conceptually straightforward. We reiterate that these calculations have not considered many potential complications including the simultaneous presence of more than one base in the detection region, nonuniformities of the flow due to walls and supports, or wall effects more generally. The effects listed should be quite accessible on the basis of Monte Carlo simulations.

© 1993 American Chemical Society
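The calculation described in this Letter can be sketched numerically: integrate the misordering formula (eq 2) using the first-passage distribution in dimensionless form, invert for the release delay $\tau(f)$, and convert to $(d,v)$. The Python sketch below is illustrative only and is not the authors' code. All function names are hypothetical. It assumes the first-passage law is the inverse Gaussian density with unit mean and variance 2/Pe, uses the standard closed-form cumulative distribution of that law, and adopts Simpson quadrature, bisection, an integration cutoff, and a Michael-Schucany-Haas sampler for a Monte Carlo cross-check. None of these numerical choices comes from the text.

```python
import math
import random


def first_passage_pdf(t, pe):
    """First-passage-time density in units of d/v (assumed inverse Gaussian form)."""
    if t <= 0.0:
        return 0.0
    return math.sqrt(pe / (4.0 * math.pi * t**3)) * math.exp(-pe * (1.0 - t) ** 2 / (4.0 * t))


def _erfcx(x):
    """exp(x^2) * erfc(x) for x >= 0; a large-x series avoids overflow."""
    if x < 25.0:
        return math.exp(x * x) * math.erfc(x)
    inv = 1.0 / (2.0 * x * x)
    return (1.0 - inv + 3.0 * inv**2 - 15.0 * inv**3) / (x * math.sqrt(math.pi))


def first_passage_cdf(t, pe):
    """Integral of the density from 0 to t (standard inverse Gaussian result)."""
    if t <= 0.0:
        return 0.0
    s = math.sqrt(pe / (4.0 * t))
    damp = math.exp(-pe * (1.0 - t) ** 2 / (4.0 * t))
    return 0.5 * math.erfc(s * (1.0 - t)) + 0.5 * damp * _erfcx(s * (1.0 + t))


def _upper_limit(pe):
    """Assumed cutoff beyond which the density is negligible."""
    sigma = math.sqrt(2.0 / pe)
    return 1.0 + 10.0 * sigma + 80.0 * sigma**2


def misordered_fraction(tau, pe, n=4000):
    """Eq 2 by composite Simpson quadrature; tau is the dimensionless release delay."""
    t_hi = _upper_limit(pe)
    if tau >= t_hi:
        return 0.0
    h = (t_hi - tau) / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        t = tau + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * first_passage_pdf(t, pe) * first_passage_cdf(t - tau, pe)
    return total * h / 3.0


def delay_for_fraction(f, pe, iterations=40):
    """Bisection for the tau that gives misordered fraction f (0 < f < 1/2)."""
    lo, hi = 0.0, _upper_limit(pe)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if misordered_fraction(mid, pe) > f:
            lo = mid  # fraction still too large: the delay must be longer
        else:
            hi = mid
    return 0.5 * (lo + hi)


def design_parameters(f, pe, kappa, diffusion):
    """Return (d, v) in SI units from d = [D Pe/(tau kappa)]^(1/2), v = [tau kappa D Pe]^(1/2)."""
    tau = delay_for_fraction(f, pe)
    d = math.sqrt(diffusion * pe / (tau * kappa))
    v = math.sqrt(tau * kappa * diffusion * pe)
    return d, v


def sample_first_passage(pe, rng):
    """Draw one first-passage time (Michael-Schucany-Haas method, mean 1, shape Pe/2)."""
    lam = pe / 2.0
    y = rng.gauss(0.0, 1.0) ** 2
    x = 1.0 + y / (2.0 * lam) - math.sqrt(4.0 * lam * y + y * y) / (2.0 * lam)
    return x if rng.random() <= 1.0 / (1.0 + x) else 1.0 / x


def misordered_fraction_mc(tau, pe, pairs=100_000, seed=0):
    """Monte Carlo estimate: base 1 arrives after base 2, which is released tau later."""
    rng = random.Random(seed)
    hits = sum(
        sample_first_passage(pe, rng) > tau + sample_first_passage(pe, rng)
        for _ in range(pairs)
    )
    return hits / pairs


if __name__ == "__main__":
    d, v = design_parameters(1e-3, 800.0, 1e3, 5e-10)
    print(f"d = {d * 1e6:.1f} um, v = {v * 1e3:.2f} mm/s")
```

As a usage example, `design_parameters(1e-3, 800.0, 1e3, 5e-10)` returns a $(d,v)$ pair in SI units that can be compared with the $\kappa = 10^3$ s$^{-1}$, $f$ = 0.1% curve of Figure 2. By construction the product $vd/D$ reproduces the chosen Pe. The Monte Carlo routine treats only the uniform-flow case considered here; wall or support effects would need a spatial simulation instead.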

Acknowledgment. We thank A. G. Petschek for helpful discussions.

References and Notes

(1) (a) Goodwin, P. M.; Schecker, J. A.; Wilkerson, C. W.; Hammond, M. L.; Ambrose, W. P.; Jett, J. H.; Martin, J. C.; Marrone, B. L.; Keller, R. A.; Haces, A.; Shin, P.-J.; Harding, J. D. Proc. SPIE 1993, 1891, 127. (b) Jett, J. H.; Keller, R. A.; Martin, J. C.; Marrone, B. L.; Moyzis, R. K.; Ratliff, R. L.; Seitzinger, N. K.; Shera, E. B.; Stewart, C. C. J. Biomol. Struct. Dyn. 1989, 7, 301.

(2) See, for example: van Kampen, N. G. Stochastic Processes in Physics and Chemistry; North-Holland: New York, 1981.

(3) Schrödinger, E. Phys. Z. 1915, 16, 289.

(4) Petschek, A. G., private communication.