Technical Notes
Anal. Chem. 1994, 66, 1777-1780

Theory of DNA Sequencing Using Free-Solution Electrophoresis of Protein-DNA Complexes

Pascal Mayer,† Gary W. Slater,‡ and Guy Drouin†

Department of Biology and Department of Physics, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada

Large-scale sequencing projects, like the human genome project, require the development of new DNA separation methods that are more efficient than currently used gel electrophoresis methods. Here, we present the theoretical limits of the free-solution electrophoretic separation of DNA molecules end-labeled with a protein (or another monodisperse chemical) to generate additional friction. We call this new method ELFSE, for end-labeled free-solution electrophoresis. Our results are based on a free-draining coil model and the Einstein relation between friction and diffusion. We show that off-the-shelf streptavidin-DNA free-solution capillary electrophoresis could already outperform current DNA separation technologies. By using small loading widths and large monodisperse labeling proteins or chemicals, ELFSE should allow one to resolve over 2000 bases per sequencing reaction within minutes. Therefore, since a sieving network is not necessary, ELFSE might open the way to automated supersequencing systems and dramatically speed up the human genome project.

Sequencing DNA requires the physical separation of the DNA fragments produced during a sequencing reaction.1 This is usually accomplished using slab or capillary denaturing polyacrylamide gel electrophoresis. However, using sieving networks leads to many technological problems.2,3 Also, resolution is limited to DNA fragments of 500-700 bases in length. Therefore, such methods are inappropriate in the massively parallel shotgun approaches used in large-scale genome sequencing projects,4 and developing new methods of DNA fragment separation is now essential.5 Here, we study the theoretical limits of free-solution electrophoretic separation of DNA molecules that are end-labeled with a protein (or another monodisperse chemical) to generate additional friction. We first present the basic elements of our calculation: a free-draining coil model of DNA6 and the Einstein relation between friction and diffusion.7

† Department of Biology.
‡ Department of Physics.
(1) (a) Maxam, A. M.; Gilbert, W. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 560-4. (b) Sanger, F.; Nicklen, S.; Coulson, A. R. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 5463-7.
(2) (a) Smith, L. M. Nature 1991, 349, 812-3. (b) Swerdlow, H.; Zhang, J. Z.; Chen, D. Y.; Harke, H. R.; Grey, R.; Wu, S.; Dovichi, N. J.; Fuller, C. Anal. Chem. 1991, 63, 2835-41.
(3) Slater, G. W.; Drouin, G. Electrophoresis 1992, 13, 574-82.
(4) Olson, M. V. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 4338-44.
(5) Collins, F.; Galas, D. Science 1993, 262, 43-6.
(6) Hermanns, J. J. J. Polym. Sci. 1955, 18, 529-34.


© 1994 American Chemical Society

We then derive analytically and numerically the separation power of ELFSE and compare it with gel electrophoresis. We conclude by discussing the possible impact of ELFSE on genome sequencing.

PHYSICAL MODEL AND THEORETICAL DEVELOPMENTS

Electrophoretic Mobility. In free solution, a DNA molecule M bases long behaves as a free-draining coil, and its electrophoretic mobility μ(M), i.e., its velocity v(M) divided by the electric field intensity, E, is equal to the ratio of its electric charge to its friction coefficient. Since these two molecular properties scale linearly with M, the mobility is independent of molecular size, and separation is impossible.8 However, labeling the DNA with a molecular species having a different charge/friction ratio can lead to size-dependent mobility. Here, we study the electrophoretic properties of end-label-DNA (EL-DNA) complexes, assuming that we can neglect possible electrostatic and hydrodynamic interactions between the DNA and the friction-generating label, i.e., that the free-draining coil behavior of native DNA is retained for the EL-DNA complex. With this assumption, the free-solution mobility of the EL-DNA complex simply equals its charge-to-friction ratio. In the following, we call α the friction due to the end-label and express it in units of the friction ξ of one base: the total friction of the EL-DNA complex is thus ξ(M + α). We call −β the effective charge carried by the end-label and express it in units of the electric charge ρ carried by one base (the negative sign in front of β arises from the fact that DNA is negatively charged): the total charge of the EL-DNA complex is thus ρ(M − β). The free-solution mobility of the EL-DNA free-draining coil is then given by

$$\mu(M) = \frac{\rho\,(M-\beta)}{\xi\,(M+\alpha)} = \mu_0\,\frac{M-\beta}{M+\alpha} \qquad (1)$$

where μ_0 = ρ/ξ is the free-solution mobility of a normal free-draining DNA molecule. This equation shows that the mobility of an EL-DNA complex is now a function of the size of the DNA fragment when α ≠ −β, i.e., separation is possible
(7) Netter, H. Theoretical Biochemistry; John Wiley & Sons: New York, 1969; p 87.
(8) Olivera, B. M.; Baine, P.; Davidson, N. Biopolymers 1964, 2, 245-57.

Analytical Chemistry, Vol. 66, No. 10, May 15, 1994


with end-labels having a charge-to-friction ratio that is different from the charge-to-friction ratio of DNA (in practice, M > β is also required to ensure that all the molecules migrate in the same direction). In contrast to classical gel electrophoresis, complexes containing long DNA fragments will have higher velocities than complexes containing short DNA fragments.

Diffusion Coefficient. To gain insight into the intrinsic limitations of this method, we first consider only Brownian diffusion as a source of band broadening and, as discussed below, we neglect all the technology-dependent sources (e.g., for capillary technology, band broadening induced by transverse temperature gradients and by the width of the detector's laser beam; see the Discussion section). In free solution, the Einstein relation7 should relate the diffusion constant D(M) to the mobility μ(M) as follows:

$$D(M) = \frac{k_B T\,\mu(M)}{\rho\,(M-\beta)} = \frac{k_B T}{\xi\,(M+\alpha)} \qquad (2)$$

where k_B is the Boltzmann constant and T is the temperature.

Limits of Separation. We assume that consecutive EL-DNA peaks have the same intensity and a Gaussian shape. In this case, the minimum migration distance L_min(M) required to detect the separation between molecules differing in length by a single base is approximately:9

[Equation 3, the expression for L_min(M), is not legible in the source.] (3)

where w_0 is the initial width of the bands, and S is a numerical factor of order unity which depends on the efficiency of the detection method.
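The two ingredients of the model, the charge-to-friction mobility and the Einstein relation, can be sketched numerically as follows. This is only an illustration under the stated free-draining assumptions: the function names are hypothetical, and the default values (μ_0 = 3.8 × 10^-8 m^2 s^-1 V^-1, ρ = 0.5e per base, T = 298 K, β = 0) are the ones quoted with Table 1.

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def mobility(M, alpha, beta=0.0, mu0=3.8e-8):
    """Mobility of an EL-DNA complex: charge rho*(M - beta) over friction xi*(M + alpha).

    Returned in m^2 s^-1 V^-1, written in terms of mu0 = rho / xi.
    """
    return mu0 * (M - beta) / (M + alpha)


def diffusion(M, alpha, mu0=3.8e-8, rho=0.5 * E_CHARGE, T=298.0):
    """Einstein relation: D = k_B * T / (total friction), in m^2 s^-1."""
    xi = rho / mu0  # friction of one base, obtained from mu0 = rho / xi
    return K_B * T / (xi * (M + alpha))


if __name__ == "__main__":
    # alpha = 50 is the representative streptavidin value quoted with Table 1.
    for M in (50, 200, 1000):
        print(M, mobility(M, alpha=50), diffusion(M, alpha=50))
```

With α = 0 the mobility reduces to μ_0 for every M, which is the statement that unlabeled DNA cannot be separated in free solution; with α > 0 and β = 0 the mobility rises with M, so longer fragments migrate faster.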

RESULTS AND DISCUSSION

Limiting Behaviors. Using eqs 1-3, it is easy to verify that L_min(M) increases with molecular size M. Therefore, the number M_max of bases that can be read using a fixed migration distance L is given, in the limit where M ≫ α and w_0 is small, by eq 4, and in a second limit by eq 5. [Equations 4 and 5, and the conditions on the parameter y that define the two limits, are not legible in the source.]

Here, M_max depends on w_0 but not on V. Note that in both cases M_max is found to increase with α and β.
(9) We modified the approach given by Aldroubi, A.; Garner, M. M. Biotechniques 1992, 13, 620-4 to obtain a more precise result for S (not shown). We found that, for peaks of equal intensity, S can be taken equal to one-half of the value given in this reference. The accuracy of our approach is illustrated in Figure 1, where the resolution calculated for 1610 bases indeed corresponds to the simulated signal.
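As an illustration of what a resolution criterion for two equal Gaussian peaks looks like, the sketch below finds the peak separation at which the signal midway between the peaks falls to 90% of the signal at the peak positions, the criterion quoted with Table 1. It is not the authors' calculation of S: the function names are hypothetical, separations are expressed in units of the Gaussian standard deviation, and how that separation maps onto S depends on the width convention used in eq 3.

```python
import math


def valley_to_peak(d, sigma=1.0):
    """Signal midway between two equal Gaussians, relative to the signal at a peak position.

    The peaks are centered at -d/2 and +d/2 and share the standard deviation sigma.
    """
    valley = 2.0 * math.exp(-d**2 / (8.0 * sigma**2))
    peak = 1.0 + math.exp(-d**2 / (2.0 * sigma**2))
    return valley / peak


def separation_for_ratio(target=0.9, lo=2.0, hi=6.0, tol=1e-9):
    """Bisection for the separation (in units of sigma) where valley_to_peak equals target.

    Assumes the ratio is above target at lo and below it at hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if valley_to_peak(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(separation_for_ratio(0.9))
```

Under these assumptions the required separation comes out a little under 2.5 standard deviations; a stricter target (a deeper valley) pushes it higher.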


Table 1. Theoretical Separation Performance of Protein-DNA ELFSE

Column headings, in the order extracted: w_0 (µm); V (kV); α (b); L (cm); f_max (Hz); t_s(50) (min); t_s(M_max) (min); M_max (b)

Values, one extracted column per line (13 table rows; the sixth line holds two columns run together):

50 50 50 50 100 100 250 100 250 100 250 250 500
25 50 100 50 100 100 50 50 100 100 50 100 100
100 100 100 10 100 100 100 10 100 10 10 10 1
30 60 60 30 30 60 60 30 60 60 30 60 100
13 13 7 17 3 7 12 22 6 18 27 23 73
2 1 4 15 7 44 2 22 11 11 44 22 22 44 48 8 4 17 9 3 4 10 8 4 8 5
200 280 390 450 510 550 590 700 850 910 1220 1610 3110

Estimation of the performance of the separation method. The maximum number of readable bases, M_max, the duration of the separation for an EL-DNA complex containing a DNA fragment of 50 bases, t_s(50), or a fragment of M_max bases, t_s(M_max), and the maximum number of EL-DNA bands crossing the detector per second, f_max, as obtained from eq 3, are given for selected values of the initial dispersion of the bands, w_0, the applied voltage, V, the effective friction coefficient of the neutral end-label, α, and the length of the capillary, L, using μ_0 = 3.8 × 10^-8 m^2 s^-1 V^-1 (ref 19), T = 298 K, ρ = 0.5e/base, e = 1.6 × 10^-19 C, β = 0, and S = 6. The charge density is typical of double-stranded DNA in free solution.20 The factor S = 6 is calculated for two adjacent bands separated such that the detected signal between the peaks is at most 90% of the detected signal at the peak positions of the bands. The calculation applies to capillary setups where detection occurs at the end of the capillary. The results are rounded to the nearest multiple of 10 (M_max), to the nearest upper minute (t_s), and to two significant digits (f_max). The value of α for streptavidin, which has already been used as an end-label,13 was estimated indirectly from mobility measurements on streptavidin-labeled DNA in denaturing polyacrylamide gels (Drouin and Mayer, unpublished): depending on how the migration of the complex is modeled, values of 20, 50, and 130 bases can be obtained, from which we choose 50 as a representative value and 100 and 250 as values which could likely be obtained using streptavidin multimers.11 The chosen voltages are available on commercial power supplies. Long capillaries (up to 3 m)16 could be used because they need not be filled with a gel.
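The separation durations t_s(M) discussed with the table can be estimated from the mobility alone. The sketch below assumes a uniform field E = V/L along the capillary and detection at the capillary end, so that t_s = L / (μ(M) E); the function name and the parameter values in the example are illustrative choices within the ranges discussed in the text, and the output is not claimed to reproduce any particular table entry.

```python
def migration_time_min(M, alpha, V, L, beta=0.0, mu0=3.8e-8):
    """Time in minutes for an EL-DNA complex of M bases to travel a capillary of length L.

    V is the applied voltage in volts and L the capillary length in meters.
    Assumes a uniform field E = V / L and detection at the end of the capillary.
    """
    field = V / L                              # electric field, V/m
    mu = mu0 * (M - beta) / (M + alpha)        # mobility, m^2 s^-1 V^-1
    velocity = mu * field                      # migration velocity, m/s
    return L / velocity / 60.0


if __name__ == "__main__":
    # Illustrative conditions: alpha = 50 bases, 50 kV across a 30 cm capillary.
    for M in (50, 450):
        print(M, round(migration_time_min(M, alpha=50, V=50e3, L=0.30), 2))
```

Because the mobility increases with M when β = 0, the longest fragments arrive first and the 50-base fragment sets the duration of a run with end-of-capillary detection. The time also scales as L²/V, so doubling the capillary length at fixed voltage quadruples the run time.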


Numerical Estimates of Separation Limits. The upper part of Table 1 shows off-the-shelf conditions; reasonable technology enhancements (see below) should allow one to reach the values given in the lower part of the table. Even the extreme values obtained for the last conditions are ultimately within the reach of technology: 100-kV power supplies are commercially available, and micrometer loading widths may be obtained using microlithographic devices.10 Clearly, the predicted results, for both speed and resolution, greatly exceed current gel capillary electrophoresis performance. The actual duration t_s of the separation depends on the detection method. For classical capillary systems, it is the time taken by the last relevant EL-DNA complexes to reach the detector, e.g., t_s(M = 50). However, shorter fragments are well-separated long before M_max. With (yet to be developed) systems that can scan the whole length of the capillary (or, for that purpose, of an ultrathin slab electrophoresis system), the actual time would be t_s(M = M_max), which is much shorter. The rate f_max at which the fastest migrating complexes, containing fragments of M_max bases, cross the detection area could impose technological limitations due to the limited speed of the detection method. Here we restricted ourselves to f_max