
Anal. Chem. 1992, 64, 2149-2154


DNA Sequencing Using Capillary Array Electrophoresis

Xiaohua C. Huang, Mark A. Quesada, and Richard A. Mathies*

Department of Chemistry, University of California, Berkeley, California 94720

A DNA sequencing method is presented that utilizes capillary array electrophoresis, two-color fluorescence detection, and a two-dye labeling protocol. Sanger DNA sequencing fragments are separated on an array of capillaries and detected on-column using a two-color, laser-excited, confocal-fluorescence scanner. The four sets of DNA sequencing fragments are separated in a single capillary and then distinguished by using a binary coding scheme where each fragment set is labeled with a characteristic ratio of two dye-labeled primers. Since only two dye-labeled primers are required, it is possible to select dyes that have identical mobility shifts. It is also shown that the ratio of the signal in the two detection channels provides a reliable identification of the sequencing fragment. DNA sequencing results on a 25-capillary array are presented.

INTRODUCTION

The development of a high-speed, high-throughput DNA sequencing method is necessary for achieving the goals of the Human Genome Project.1-4 Automated DNA sequencing is currently performed using either one-color5-7 or four-color8,9 labeling of DNA fragments followed by separation on slab gels and fluorescence detection. Recently, high-field ultrathin slab gel electrophoresis has been introduced as one method for enhancing the rate of DNA sequencing.10 Capillary electrophoresis (CE) also appears to be a promising high-speed DNA sequencing method.11-16 Although CE separations are rapid because of the high electric fields that can be applied, the throughput is about the same as that of conventional slab gels because only one capillary can be run and detected at a time.17

To transcend this limitation, we have recently introduced the new technique of capillary array electrophoresis (CAE).18 It was demonstrated that arrays of capillaries can be used to perform rapid, parallel separations followed by on-column detection using a one-color, confocal-fluorescence scanner.18 Because of capillary-to-capillary variations in the migration velocity, we concluded that DNA sequencing using CAE would probably require multicolor detection of the four sets of sequencing fragments after separation in the same capillary. The confocal-fluorescence gel scanner that we have developed19-21 has also been used to perform multicolor detection of slab gels.21,22 We show here that this multicolor, confocal-fluorescence scanner can be used to detect DNA sequencing fragments separated on capillary arrays. In addition, a binary coding protocol for labeling the DNA fragments is introduced that permits us to sequence DNA using only two fluorescently labeled dye primers and a two-color detection system.

EXPERIMENTAL SECTION

Instrumentation. Figure 1 presents a schematic of the laser-excited, confocal-fluorescence capillary array scanner. Excitation light (488 nm, 1 mW) from an argon ion laser (Spectra-Physics, Model 2020, Mountain View, CA) is reflected by a long-pass dichroic beam splitter (480DM, Omega Optical, Brattleboro, VT), passed through a 32X, N.A. 0.4 microscope objective (LD Plan-Achromat 440850, Carl Zeiss, Germany), and brought to a 10-µm-diameter focus within the 100-µm-i.d. capillaries in the array. The fluorescence is collected by the objective, passed back through the first beam splitter to a second dichroic beam splitter (565LP, Omega Optical, Brattleboro, VT) that separates the red (λ > 565 nm) and green (λ < 565 nm) channels. The beams are then focused on 400-µm-diameter confocal pinholes. The emission is spectrally filtered by a 35-nm band-pass filter (590DF35, Omega Optical, Brattleboro, VT) centered at 590 nm (red channel) and a 10-nm band-pass filter (525DF10, Omega Optical, Brattleboro, VT) centered at 525 nm (green channel) followed by photomultiplier detection. The output is preamplified, filtered, digitized, and then stored in an IBM PS/2 computer. A computer-controlled stage is used to translate the capillary array past the optical system at 20 mm/s. The fluorescence is sampled unidirectionally at 1500 Hz/channel. The image resolution is 13.3 µm/pixel. An image of the migrating bands is built up by accumulating periodic 1.4-s sweeps of the exposed region of the capillaries. Postacquisition image processing was performed on a Mac II using the NIH program Image 1.29, and the image and electropherogram displays were prepared using the commercial programs Canvas and Kaleidagraph.

Preparation of Capillary Arrays. Fused silica capillaries with a 100-µm i.d. and 200-µm o.d. (Polymicro Technologies, Phoenix, AZ) were filled with a non-cross-linked 9% T, 0% C polyacrylamide gel in Tris, boric acid, EDTA buffer (pH 8.3)

* Corresponding author.
(1) Smith, L. M. Genome 1989, 31, 929-937.
(2) Cantor, C. R. Science 1990, 248, 49-51.
(3) Watson, J. D. Science 1990, 248, 44-49.
(4) Hunkapiller, T.; Kaiser, R. J.; Koop, B. F.; Hood, L. Science 1991, 254, 59-67.
(5) Ansorge, W.; Sproat, B.; Stegemann, J.; Schwager, C.; Zenke, M. Nucl. Acids Res. 1987, 15, 4593-4602.
(6) Ansorge, W.; Zimmermann, J.; Schwager, C.; Stegemann, J.; Erfle, H.; Voss, H. Nucl. Acids Res. 1990, 18, 3419-3420.
(7) Tabor, S.; Richardson, C. C. J. Biol. Chem. 1990, 265, 8322-8328.
(8) Smith, L. M.; Sanders, J. Z.; Kaiser, R. J.; Hughes, P.; Dodd, C.; Connell, C. R.; Heiner, C.; Kent, S. B. H.; Hood, L. E. Nature 1986, 321, 674-679.
(9) Prober, J. M.; Trainor, G. L.; Dam, R. J.; Hobbs, F. W.; Robertson, C. W.; Zagursky, R. J.; Cocuzza, A. J.; Jensen, M. A.; Baumeister, K. Science 1987, 238, 336-341.
(10) Kostichka, A. J.; Marchbanks, M. L.; Brumley, R. L.; Drossman, H.; Smith, L. M. Bio/Technology 1992, 10, 78-81.
(11) Drossman, H.; Luckey, J. A.; Kostichka, A. J.; D'Cunha, J.; Smith, L. M. Anal. Chem. 1990, 62, 900-903.
(12) Luckey, J. A.; Drossman, H.; Kostichka, A. J.; Mead, D. A.; D'Cunha, J.; Norris, T. B.; Smith, L. M. Nucl. Acids Res. 1990, 18, 4417-4421.
(13) Swerdlow, H.; Wu, S.; Harke, H.; Dovichi, N. J. J. Chromatogr. 1990, 516, 61-67.
(14) Swerdlow, H.; Gesteland, R. Nucl. Acids Res. 1990, 18, 1415-1419.
(15) Cohen, A. S.; Najarian, D. R.; Karger, B. L. J. Chromatogr. 1990, 516, 49-60.
(16) Swerdlow, H.; Zhang, J. Z.; Chen, D. Y.; Harke, H. R.; Grey, R.; Wu, S.; Dovichi, N. J.; Fuller, C. Anal. Chem. 1991, 63, 2835-2841.
(17) Smith, L. M. Nature 1991, 349, 812-813.
(18) Huang, X. C.; Quesada, M. A.; Mathies, R. A. Anal. Chem. 1992, 64, 967-972.
(19) Glazer, A. N.; Peck, K.; Mathies, R. A. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 3851-3855.
(20) Rye, H. S.; Quesada, M. A.; Peck, K.; Mathies, R. A.; Glazer, A. N. Nucl. Acids Res. 1991, 19, 327-333.
(21) Quesada, M. A.; Rye, H. S.; Gingrich, J. C.; Glazer, A. N.; Mathies, R. A. BioTechniques 1991, 10, 616-625.
(22) Rye, H. S.; Yue, S.; Quesada, M. A.; Haughland, R. P.; Mathies, R. A.; Glazer, A. N. Meth. Enzymol., in press.

0003-2700/92/0364-2149$03.00/0 © 1992 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 64, NO. 18, SEPTEMBER 15, 1992

Figure 1. Schematic of the two-color, confocal-fluorescence capillary array scanner. (Diagram labels: laser input, two dichroic beam splitters, confocal spatial filter, spectral filter.)
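As a quick consistency check on the scanner parameters given under Instrumentation (20 mm/s stage speed, 1500 Hz sampling per channel, 13.3 µm/pixel), the pixel size follows from dividing the stage speed by the sampling rate. The short Python sketch below is illustrative only; the function names are hypothetical and are not part of the original acquisition software.

```python
def pixel_size_um(stage_speed_mm_per_s: float, sample_rate_hz: float) -> float:
    """Distance the stage travels between two consecutive samples, in micrometers."""
    return stage_speed_mm_per_s * 1000.0 / sample_rate_hz


def pixels_across(width_um: float, pixel_um: float) -> float:
    """Number of pixels spanning a feature of the given width."""
    return width_um / pixel_um


# Values quoted in the text: 20 mm/s stage speed, 1500 Hz per channel.
pixel = pixel_size_um(20.0, 1500.0)

# A 100-um-i.d. capillary bore is therefore covered by several pixels.
bore_pixels = pixels_across(100.0, pixel)

print(f"{pixel:.1f} um/pixel, {bore_pixels:.1f} pixels per capillary bore")
```

The computed pixel size agrees with the stated image resolution, and each 100-µm bore is sampled by roughly seven to eight pixels per sweep.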

Table I. Binary Coding of DNA Sequencing Fragments

fragment set    FAM    JOE
A               1      1
G               0      1
T               1      0
C               0      0

with 7 M urea as the denaturant.23,24 The gel-filled capillary array was then assembled in a holder mounted on a computer-controlled translation stage. To achieve uniform detection sensitivity and background, the holder was designed to keep the exposed region of the capillaries precisely in the same plane. Typically, the length from the inlet to the detection window was 24 cm and the applied field was ~225 V/cm. A detailed description of the capillary array fabrication has been reported previously.18 In initial studies, we have been able to reuse capillaries three or more times.

Preparation of DNA Sequencing Sample. Chain-terminated M13mp18 DNA sequencing fragments were produced using a Sequenase 2.0 kit (United States Biochemical Corp., Cleveland, OH). Commercially available FAM- and JOE-tagged primers (400 nM, Applied Biosystems, Foster City, CA) were employed in the primer-template annealing step. Three annealing solutions were prepared: (1) 4 µL of reaction buffer, 13 µL of M13mp18 single-stranded DNA, and 3 µL of FAM; (2) 6 µL of reaction buffer, 20 µL of M13mp18 DNA, 1.5 µL of FAM, and 3 µL of JOE; and (3) 6 µL of reaction buffer, 20 µL of M13mp18 DNA, and 4.5 µL of JOE. The tubes were heated to 65 °C for 3 min and then allowed to cool to room temperature for 30 min. When the temperature of the annealing reaction mixtures had dropped below 30 °C, 2 µL of 0.1 M DTT solution, 4 µL of reaction buffer, and 10 µL of ddT termination mixture were added in tube 1; 3 µL of DTT solution, 6 µL of reaction buffer, and 15 µL of ddA termination mixture were added in tube 2; and 3 µL of DTT, 6 µL of reaction buffer, and 15 µL of ddG termination mixture were added in

(23) Cohen, A. S.; Najarian, D. R.; Paulus, A.; Guttman, A.; Smith, J. A.; Karger, B. L. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 9660-9663.
(24) Heiger, D. N.; Cohen, A. S.; Karger, B. L. J. Chromatogr. 1990, 516, 33-48.

Figure 2. Comparison of the mobility shift of different dye primers on M13mp18 G fragment DNA samples. In A the sample was produced using an equimolar mixture of FAM- and TAMRA-labeled primers. In B the sample was produced using an equimolar mixture of FAM- and JOE-labeled primers. The solid and dotted lines are the fluorescence signals detected in the green and red channels, respectively. The numbers above the peaks indicate the base position. (Horizontal axis: time.)

tube 3. Diluted Sequenase 2.0 (4 µL) was added in tube 1, and 6 µL of diluted Sequenase was added in tubes 2 and 3. The mixtures were incubated at 37 °C for 5 min. Ethanol precipitation was used to terminate the reaction and to desalt the DNA sequencing sample. The samples were then resuspended and pooled in 6 µL of 80% (v/v) formamide. The sample was heated at 90 °C for 3 min to denature the DNA and then placed on ice

Figure 3. DNA sequencing using a 25-capillary array and binary coding to label the DNA fragments. The left-hand panel presents a pseudocolor display of the 525-nm, green channel and the right panel presents the 590-nm, red channel. The elapsed time to obtain this portion of the total image was 30 min. The length from the inlet to the detection zone was 24 cm. The applied voltage over the 40-cm total length of the 100-µm-i.d., 200-µm-o.d. capillaries was 9 kV. (Image labels: capillaries 1-25 in each panel; bands A(23) and T(112) marked.)

until sample injection. Electrokinetic injection was performed at 9 kV for 10 s.
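The run conditions can be cross-checked with simple arithmetic. The sketch below assumes a 40-cm total capillary length, which is the value consistent with the stated 9-kV potential and the ~225 V/cm field, together with the 24-cm inlet-to-detector length given in the text. The helper name is hypothetical.

```python
def field_strength_v_per_cm(voltage_v: float, total_length_cm: float) -> float:
    """Average electric field along a capillary of the given total length."""
    return voltage_v / total_length_cm


# 9 kV applied across an assumed 40-cm total length.
run_field = field_strength_v_per_cm(9000.0, 40.0)

# Potential drop between the inlet and the detection window (24 cm downstream).
drop_to_detector_v = run_field * 24.0

print(f"field = {run_field:.0f} V/cm, drop to detector = {drop_to_detector_v:.0f} V")
```

Because the injection is also performed at 9 kV, the same field applies during the 10-s electrokinetic loading step under this assumption.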

RESULTS AND DISCUSSION

To detect four sets of DNA sequencing fragments using a two-color fluorescence detection system, a new protocol for labeling the sequencing fragments was devised. Previous methods have employed either labeling each set of DNA fragments with a different dye followed by four-color detection or labeling each fragment with the same dye followed by detection based on different band intensities.5-8 An alternative approach using four different dye labels and a two-color detection system has been applied to slab gel sequencing,9 to small (0.5-mm) diameter, low-field tube gel sequencing,25 and subsequently to sequencing by high-field capillary gel electrophoresis.16 Two-dye labeling followed by two-color detection provides a simple and sensitive alternative. In this method, "binary combinations" of two dye-labeled primers, having the same priming sequence but tagged with different fluorescent dyes, are used to encode the four sets of DNA fragments. This is illustrated in Table I where a 1 denotes that that set of DNA fragments is synthesized with the corresponding dye primer and a 0 denotes the absence of the corresponding labeled dye primer. The {1,1} coding indicates that the A fragments are synthesized with a mixture of both dye-labeled primers. The {0,1} coding indicates that the G fragments are synthesized with the JOE-labeled primer, and {1,0} indicates that the T fragments are synthesized with the FAM-labeled primer. Fragments terminating in C are not synthesized, and this is denoted by {0,0}.

There are two requirements for the dyes used in binary coding. First, it is critical that there be no electrophoretic mobility difference between DNA fragments labeled with the different dyes. When there are mobility shifts, correction procedures are required to read the DNA sequence.8 The shift of the mobility of the DNA fragments due to the presence of the dye can be simply detected by capillary gel electrophoresis. Figure 2A presents an electropherogram of M13mp18 G fragments, half of which were labeled with the commercially available dye primer FAM and half labeled with the dye primer TAMRA. Peaks are observed at two different times for each DNA fragment due to the different mobility shift of FAM and TAMRA. The earlier peak, which is solely detected in the red channel, is due to the TAMRA-labeled DNA fragments; the later one, which is mainly detected in the green channel, is due to the FAM-labeled DNA fragments. The observed mobility shift between TAMRA-labeled fragments and FAM-labeled fragments is equivalent to a one-base change in the length of the DNA fragment. A similar shift is observed for fragments from 20 to more than 250 bases in length. This mobility shift is especially problematic when two fragments differ in length by one base (e.g., bases 87 and 88 in Figure 2A). Figure 2B presents an electropherogram of M13mp18 G fragments, half of which were labeled with the dye primer FAM and half labeled with the dye primer JOE. Within the resolution of the separation and under these conditions, we can detect no difference in the mobility of the two different labeled fragments.

(25) Zagursky, R. J.; McCormick, R. M. BioTechniques 1990, 9, 74-79.
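The binary coding of Table I amounts to a four-entry lookup between fragment sets and the {FAM, JOE} primer combination. A minimal Python sketch follows; the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Binary coding of Table I: fragment set -> (FAM, JOE).
BINARY_CODE = {
    "A": (1, 1),  # synthesized with both dye-labeled primers
    "G": (0, 1),  # JOE-labeled primer only
    "T": (1, 0),  # FAM-labeled primer only
    "C": (0, 0),  # not synthesized; appears as a gap
}

# Inverse mapping, used to read a base back from the dyes that are present.
DECODE = {code: base for base, code in BINARY_CODE.items()}


def primers_for(base: str) -> list[str]:
    """Dye primers used when synthesizing the given fragment set."""
    fam, joe = BINARY_CODE[base]
    return [name for name, used in (("FAM", fam), ("JOE", joe)) if used]


def decode(fam_present: bool, joe_present: bool) -> str:
    """Base implied by which dye labels are present in a band."""
    return DECODE[(int(fam_present), int(joe_present))]


print(primers_for("A"), primers_for("C"), decode(True, False))
```

In practice the presence of each dye is not read as a clean yes/no, since FAM is also detected in the red channel; the text addresses this by working with the ratio of the two channel signals.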

Figure 4. Analysis of the DNA sequence from one capillary in a capillary array. The red image and one-dimensional trace represent the signal detected as a function of time from the red, 590-nm channel. The green image and trace represent the corresponding signal from the green, 525-nm channel. G fragments are labeled with JOE {0,1} which emits predominantly in the red channel. T fragments are labeled with FAM {1,0} which emits in the green and red channels at a ratio of ~2:1 for the conditions used here. A fragments are labeled with both JOE and FAM {1,1}. The molar ratio of JOE to FAM was chosen to give a green/red detection ratio of ~0.9:1. Gaps in the sequence indicate the location of C. The intensities of the middle two sets of traces were multiplied by a factor of 2 compared to the top traces; the bottom traces were multiplied by a factor of 4.

A second requirement for two-color binary coding is that the dyes should have readily distinguishable fluorescence emissions. The dye primers FAM and JOE do not fully meet this requirement because with our apparatus FAM is also detected in the red channel. We have found, however, that computation of the ratio of outputs in the two channels can effectively solve this problem. This ratio is completely independent of the amount of DNA in a band, it is insensitive to a variety of instrumental detection sensitivity fluctuations, and when there is cross talk between the two detection channels, the ratio is a constant parameter that can be recognized in the analysis (see below). The preceding analysis indicates why we chose FAM and JOE as the two dye labels for the binary coding scheme.
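Why the ratio is independent of the amount of DNA can be seen from a simple linear signal model. The model is an assumption made here for illustration and is not stated in the text. Let $n$ be the amount of labeled DNA in a band, $k$ a gain factor common to both channels, $x$ the fraction of the fragments carrying FAM (so that $1-x$ carry JOE), and $g_F, r_F, g_J, r_J$ the green- and red-channel responses per unit of each dye (all hypothetical symbols):

```latex
\begin{align}
S_g &= n\,k\,\bigl[x\,g_F + (1-x)\,g_J\bigr], \\
S_r &= n\,k\,\bigl[x\,r_F + (1-x)\,r_J\bigr], \\
R   &= \frac{S_g}{S_r}
     = \frac{x\,g_F + (1-x)\,g_J}{x\,r_F + (1-x)\,r_J}.
\end{align}
```

Under this model both $n$ and $k$ cancel in $R$. For FAM alone ($x = 1$) the ratio reduces to $g_F/r_F$, the constant cross-talk ratio of about 2 reported for the conditions used here; for JOE alone ($x = 0$) with $g_J \approx 0$ it is close to zero; and a mixture of the two primers gives an intermediate value fixed by $x$.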

Figure 3 presents a DNA sequencing run performed using an array of 25 capillaries. Each capillary image is presented twice, once for the red channel and once for the green channel. The horizontal dimension represents the physical image of the 100-µm-i.d. capillaries, and this dimension of the pixels is 13 µm. The vertical dimension is proportional to the elapsed time, and this dimension of the pixels is 1.4 s. The image in Figure 3 presents resolved DNA bands up to ~120 bases beyond the primer, and it was typically possible to sequence up to 300-350 bases per capillary in such an array under our conditions (see below). Figure 4 presents a more detailed analysis of the DNA sequence from one representative capillary in an array. An image of the output from the two detection channels is

Figure 5. Plot of the fluorescence intensity in the green channel divided by that in the red channel for each of the peaks in Figure 4. (●) T fragments labeled solely with FAM; (▲) G fragments labeled solely with JOE; and (○) A fragments labeled with both FAM and JOE. The ratio was calculated based on the peak maxima. (Horizontal axis: base number.)

presented along with one-dimensional traces formed by integrating the signal across the width of the capillary. Single base resolution is achieved throughout the run, and the sequence can be read out to base 316. The bases can be called throughout the run by examining the ratio of the signals. For example, when the band appears only in the red channel it is a G {0,1}, when the red and green intensities are nearly equal it is an A {1,1}, when the green intensity is ~2 times larger than the red it is a T {1,0}, and when a gap appears in the sequence it is a C {0,0}.

Figure 5 presents a plot of the fluorescence intensity in the green channel divided by that in the red channel for each of the peaks in Figure 4. Three distinct distributions of ratios are observed. Peaks with a ratio of 2 are due to T fragments that are just labeled with FAM. A fragments that are labeled with both FAM and JOE give a characteristic ratio of about 0.9. Finally, the G fragments that are labeled with JOE are detected solely in the red channel. Over the entire range displayed, there is essentially no overlap between the distributions, indicating that the individual bases can be accurately called. The ratio of the signal in the green and red channels is a much more reliable parameter than the raw signal strength in any one channel. The fluorescence intensity of the DNA bands can fluctuate due to a variety of factors, especially sequence-dependent termination.26 In the data presented here, the fluorescence intensity for a particular set of fragments was found to vary by as much as a factor of 20 across the run while the ratios only fluctuate by a factor of about 1.7. Thus, the use of the ratio to call the base identity can reduce the uncertainty of the determination. For example, for the 209 directly observed bands (A, G, T) in Figures 4 and 5, there was only one ambiguous call. The utility of two-color ratio detection has also been recognized in previous slab gel studies which employed a four-dye labeling protocol.9

When there is residual secondary structure or anomalous migration of the sequencing fragments, the lack of direct detection of the C or {0,0} coded fragments can cause sequencing errors because the presence of C in the sequence is determined solely by the gaps between the other labeled fragments. For the sequencing run presented in Figure 4, the error rate due to this effect was 5.4% (15 errors/280 bases). There are a number of ways to address this issue. To achieve a low error rate, sequencing is typically performed at least twice, either by repeated sequencing of the same strand or by sequencing the complementary strand. If the same strand is resequenced, then a different coding algorithm can be used.
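The ratio-based base calling described above can be sketched as a simple threshold classifier. The characteristic ratios (about 2 for T, about 0.9 for A, essentially red-only for G) are taken from the text; the two cut points are assumptions introduced here for illustration, and the function name is hypothetical.

```python
import math

# Characteristic green/red ratios reported in the text.
RATIO_T = 2.0  # FAM only
RATIO_A = 0.9  # FAM + JOE

# Assumed cut points (not from the paper).
A_T_CUT = math.sqrt(RATIO_A * RATIO_T)  # geometric midpoint between A and T
G_A_CUT = 0.3                           # G bands are essentially red-only


def call_base(green: float, red: float) -> str:
    """Call A, G, or T from the peak maxima in the green and red channels."""
    if red <= 0:
        raise ValueError("red-channel peak must be positive")
    ratio = green / red
    if ratio < G_A_CUT:
        return "G"
    if ratio < A_T_CUT:
        return "A"
    return "T"


# The call depends only on the ratio, so a 20-fold change in band
# intensity leaves it unchanged.
strong_t = call_base(200.0, 100.0)
weak_t = call_base(10.0, 5.0)

# Error rate quoted for the C gaps: 15 errors in 280 bases.
gap_error_percent = 100.0 * 15 / 280

print(strong_t, weak_t, call_base(9.0, 10.0), call_base(0.5, 10.0))
print(f"{gap_error_percent:.1f}%")
```

C is not called by this function, because C fragments are not synthesized under the coding of Table I and appear only as gaps between the other bands.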

(26) Tabor, S.; Richardson, C. C. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 4076-4080.

Figure 6. The top trace presents the M13mp18 DNA sequence from bases 43 to 61 using the binary coding assignment in Table I. The bottom trace presents a second sequencing run on M13mp18 using a modified binary coding assignment where the C fragments are labeled with FAM {1,0} and the T fragments are not synthesized {0,0}. The solid line is the fluorescence signal detected in the green channel; the dotted line is the signal detected in the red channel. (Horizontal axis: base number, 43 to 61.)

For example, we can change the coding of the C fragments to {1,0} and that of the T fragments to {0,0}. This approach is illustrated in Figure 6. The top traces in Figure 6 show that the number of C fragments between bases 55 and 60 is potentially ambiguous using the original binary coding. By interchanging the coding for the T and the C fragments, it is easily determined that there are four consecutive C fragments between T55 and the next G.

A second approach to solving the spacing problem would be to sequence the complementary strand with the original coding. Using the binary coding assignment in Table I, the presence of C in the original sequence is indicated by the detection of a G fragment with a {0,1} coding in the complementary strand.

Finally, an alternative approach is to label all four sets of fragments with a unique ratio of the dyes JOE and FAM. The data in Figure 5 suggest that the ratios are sufficiently distinctive and separate that the use of four ratios of FAM and JOE will be practical. Experiments in this direction are now in progress.
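The first two remedies can be illustrated on a short stretch in which a T is followed by four C's and a G, as in the case discussed above. The Python sketch below marks the bases that are not directly detected with "-" under each coding; the input stretch, dictionary names, and helper functions are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Original coding of Table I and the modified coding with T and C interchanged.
ORIGINAL_CODE = {"A": (1, 1), "G": (0, 1), "T": (1, 0), "C": (0, 0)}
SWAPPED_CODE = {**ORIGINAL_CODE, "C": (1, 0), "T": (0, 0)}


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def directly_detected(seq: str, code: dict) -> str:
    """Replace bases coded {0,0} (no dye, seen only as gaps) with '-'."""
    return "".join(base if any(code[base]) else "-" for base in seq)


stretch = "GGATCCCCGG"  # illustrative: T, four C's, then G

original_view = directly_detected(stretch, ORIGINAL_CODE)
swapped_view = directly_detected(stretch, SWAPPED_CODE)
complement_view = directly_detected(reverse_complement(stretch), ORIGINAL_CODE)

print(original_view)    # the C run is visible only as a gap
print(swapped_view)     # the C run is detected directly, T becomes the gap
print(complement_view)  # the C run shows up as detected G's on the other strand
```

In either the recoded run or the complementary-strand run, the four C's are counted from directly detected bands rather than inferred from band spacing.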

CONCLUSIONS

Capillary array electrophoresis coupled with two-color, confocal-fluorescence detection and two-dye labeling has the potential to be a useful, high-speed, high-throughput DNA sequencing method. An important and intrinsic advantage of CAE over high-speed DNA sequencing methods based on slab gels10 is the ease with which multiple samples can be electrokinetically loaded (e.g., see Figure 1). However, methods must be developed for easily filling and manipulating arrays of capillaries; this effort may benefit from the use of low-viscosity, linear-polymer separation matrices.27 The instrumental limitations on the overall throughput of our system depend on the total number of capillaries that can be scanned, the scan rate, the scan repetition period, and the capillary outside diameter. If we employ 150-µm-o.d. capillaries, a 3 cm/s scan rate, and a 2-s repetition period, then 200 capillaries can be scanned. The length of the sequencing run and the reliability of the base calls depend on the gel matrix, the sequencing reaction conditions, the dye labeling protocol, and the detection sensitivity.11-13,15,16 Using the conditions reported here, we can sequence ~300 bases per capillary, and sequencing 500 bases or more in a single capillary run has been reported.28 Assuming that the reliable detection of 500 bases per capillary can ultimately be achieved, CAE-based DNA sequencing should produce a raw sequencing rate of 100 000 bases in ~2 h using 200 capillaries. This is close to the rate necessary for the success of the Human Genome Project.4 Recent success in the application of CE to the separation of restriction fragments and PCR-amplified products24,29,30 suggests that capillary array electrophoresis may also be useful in the development of high-speed, high-throughput mapping and diagnostics.

(27) Grossman, P. D.; Soane, D. S. Biopolymers 1991, 31, 1221-1228.
(28) Chen, D. Y.; Swerdlow, H. P.; Harke, H. R.; Zhang, J. Z.; Dovichi, N. J. J. Chromatogr. 1991, 559, 237-246.
(29) Cohen, A. S.; Najarian, D.; Smith, J. A.; Karger, B. L. J. Chromatogr. 1988, 458, 323-333.
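The projected throughput can be reproduced with a few lines of arithmetic. One reading that yields the quoted figure of 200 capillaries is that half of the 2-s repetition period is spent on the sampled forward sweep, with the remainder used for the return, since sampling is unidirectional. That split is an assumption made here, not a statement from the text, and the function names are hypothetical.

```python
def capillaries_per_scan(scan_rate_cm_s: float, sweep_time_s: float,
                         capillary_od_um: float) -> int:
    """Number of close-packed capillaries covered by one forward sweep."""
    swept_width_um = scan_rate_cm_s * sweep_time_s * 1e4  # cm -> um
    return int(round(swept_width_um / capillary_od_um))


def raw_bases_per_run(n_capillaries: int, bases_per_capillary: int) -> int:
    """Raw sequence output of one run of the whole array."""
    return n_capillaries * bases_per_capillary


# 3 cm/s scan rate, 150-um-o.d. capillaries, and an assumed 1-s forward
# sweep out of the 2-s repetition period.
n_caps = capillaries_per_scan(3.0, 1.0, 150.0)

# 500 bases per capillary, about 2 h per run.
total_bases = raw_bases_per_run(n_caps, 500)
bases_per_hour = total_bases / 2.0

print(n_caps, total_bases, bases_per_hour)
```

With the ~300 bases per capillary obtained under the conditions reported here, the same calculation gives 60 000 raw bases per run of a 200-capillary array.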

ACKNOWLEDGMENT

We thank Alexander N. Glazer and Jiun-Wei Chen for valuable discussions and assistance. This research was supported by the Director, Office of Energy Research, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-FG-91ER61125. X.H. was supported by a Human Genome Distinguished Postdoctoral Fellowship sponsored by the U.S. Department of Energy, Office of Health and Environmental Research, and administered by the Oak Ridge Institute for Science and Education.

RECEIVED for review March 4, 1992. Accepted July 7, 1992.

(30) Schwartz, H. E.; Ulfelder, K.; Sunzeri, F. J.; Busch, M. P.; Brownlee, R. G. J. Chromatogr. 1991, 559, 261-283.