Eight Hundred-Base Sequencing in a Microfabricated Electrophoretic Device

Jun 14, 2000 - Lance Koutny, Dieter Schmalzing, Oscar Salas-Solano, Sameh ... Scott Buonocore, Kevin Abbey, Paul McEwan, Paul Matsudaira,† and D...
Anal. Chem. 2000, 72, 3388-3391

Correspondence

Eight Hundred-Base Sequencing in a Microfabricated Electrophoretic Device

Lance Koutny, Dieter Schmalzing, Oscar Salas-Solano, Sameh El-Difrawy, Aram Adourian, Scott Buonocore, Kevin Abbey, Paul McEwan, Paul Matsudaira,† and D. Ehrlich*

Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142, and Department of Biology and Division of Bioengineering and Environmental Health, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

* Corresponding author.
† Massachusetts Institute of Technology.

The human genome will be sequenced using capillary array electrophoresis technology. Although currently achieving only 550-base reads per run, capillary arrays have increased the efficiency and lowered the cost of sequencing by eliminating gel plate preparation, reducing sample volumes, and offering automation and speed. However, much higher throughput and greater cost reductions are needed. The next major advancement in sequencing technology is expected from the development of arrays of microfabricated channels in a plate or “chip” format. For de novo sequencing, the practical utility of the microdevice approach has been limited by device length to a read of 500-600 bases per run. We demonstrate a significant milestone for a microfabricated device by obtaining an average read length of 800 bases in 80 min (98% accuracy) for either M13 standards or DNA sequencing samples from the Whitehead Institute Center for Genomic Research (WICGR) production line. This result is achieved in 40-cm-long channels using a new class of large-scale microfabricated devices. Both microfabrication of extended structures and achievement of long reads are essential steps toward a 384-lane very-large-scale microfluidic (VLSMF) system for ultrahigh-throughput DNA sequencing analysis, currently under construction in our laboratory.

(1) Liu, S.; Shi, Y.; Ja, W. W.; Mathies, R. A. Anal. Chem. 1999, 71, 566-573.
(2) Schmalzing, D.; Tsao, N.; Koutny, L.; Chisholm, D.; Srivastava, A.; Adourian, A.; McEwan, P.; Matsudaira, P.; Ehrlich, D. Genome Res. 1999, 9, 853-858.
(3) Dovichi, N. J. Electrophoresis 1997, 18, 2393-2399.
(4) Schmalzing, D.; Belenky, A.; Novotny, M. A.; Koutny, L.; Salas-Solano, O.; El-Difrawy, S.; Adourian, A.; Matsudaira, P.; Ehrlich, D. Nucleic Acids Res. 2000, 28, e43.

3388 Analytical Chemistry, Vol. 72, No. 14, July 15, 2000

Recently we and others reported reading 500 and 565 bases in 20 and 27 min, respectively, in short electrophoretic microchannels filled with linear polyacrylamide (LPA) as the sieving matrix.1,2 This is far superior to the performance of capillaries for comparable read length.3 While these results are adequate for many sequencing and screening applications,4 whole genome
sequencing requires significantly longer reads to minimize sequence assembly effort.5,6 Finishing costs and logistics are known to be essential factors in genome center productivity. On the basis of the electrophoretic performance of the best sieving materials, we have concluded that longer reads demand longer channels.7 This requirement is explained by our finding that resolution on shorter devices is not significantly improved by ultrashort microinjectors, reduced diffusion times, or fine-tuning of the voltage. We have, therefore, constructed long, straight separation channels in 25 cm × 50 cm glass plates to extend the read length of microfabricated devices, and we have evaluated their performance.

EXPERIMENTAL SECTION

Microfabrication. Devices were built from 0.11-cm-thick × 25-cm-wide × 50-cm-long glass plates (Corning 1737F) using photolithography, chemical wet-etching methods, laser drilling to form access holes, and thermal bonding. The larger format required specialized photoresist spinning techniques, which were performed by Telic Co., Santa Monica, CA. To avoid potential difficulties with contact printing of the large, thin plates, each plate was directly written with a UV laser system (Advance Reproductions, North Andover, MA). Channel etching and cover-plate bonding proceeded as described previously8 with the aid of larger custom-built plate holders and greater attention to cleanliness. There were 32 identical, isolated channels per plate. Total channel length was 45 cm; effective length was 40 cm. The channels were 40 µm deep and 90 µm wide at the top. The cross-injectors were 150 µm long, and the two sidearms forming the loading channels were each 0.5 cm in length. Glass reservoirs (Ace Glass, Vineland, NJ) of 50-µL volume were affixed around the exit holes to hold sample and buffer.

(5) Koonin, S. E. Science (Washington, D.C.) 1998, 279, 36.
(6) Mullikin, J. C.; McMurray, A. A. Science (Washington, D.C.) 1999, 283, 1867-1868.
(7) Schmalzing, D.; Adourian, A.; Koutny, L.; Ziaugra, L.; Matsudaira, P.; Ehrlich, D. Anal. Chem. 1998, 70, 2303-2310.
(8) Koutny, L. B.; Schmalzing, D.; Taylor, T. A.; Fuchs, M. Anal. Chem. 1996, 68, 18-22.

10.1021/ac9913614 CCC: $19.00

© 2000 American Chemical Society Published on Web 06/14/2000

Figure 1. Eight-hundred-base read of a WICGR four-color DNA sequencing sample in a 40-cm-long microchannel. The 4 panels show the software-processed sequencing profiles and base calls at the beginning, the middle, and the end of the run. Conditions: 150 V/cm, 50 °C, and 2% (w/v) LPA in 1X TTE/7 M urea.

Sample Preparation. Samples were prepared at WICGR. The vector was M13mp18 or M13mp18 with approximately 2-kb human DNA inserts from chromosome 17. DYEnamic ET M13(-21) primer chemistry (Amersham Life Science, Inc., Cleveland, OH) was used to prepare the sequencing reaction mixtures. Fifty to 200 ng of template DNA was added to each of the four monomer reactions. The reactions were thermocycled for 20 cycles using standard conditions, pooled, and gel-filtered according to the manufacturer’s protocol (Centrisep, Princeton Separations, Adelphia, NJ). The samples were brought to a final volume of 50 µL with deionized water.

Electrophoresis. The instrumental setup with laser-induced fluorescence detection has been described recently.7 The entire microchannel structure was chemically passivated to prevent
electroosmotic flow and sample adsorption to the walls.9 The LPA sieving matrix was synthesized in-house as a powder by inverse emulsion polymerization.10 The LPA powder was dissolved in 1X TTE (Tris/TAPS/EDTA) buffer containing 7 M urea. The 2% (w/v) LPA sieving solution was replaced between runs from the anodic end of the separation channel. Pre-electrophoresis was carried out for 15 min at 150 V/cm and 50 °C. Ten microliters of sample was pipetted onto the device. The samples were not heat-denatured. For loading, they were electrophoresed for 3 min at 300 V/cm across the loading channel. For injection and separation, the voltages were switched to create a field strength of 150 V/cm in the separation channel. The pullback field strength was 20 V/cm. At the WICGR, samples had been run on an ABI377 using 52-cm plates with 48 cm from well to read, at 2.4 kV (approximately 46 V/cm) for 10 h. The gel was 5% Long Ranger cross-linked acrylamide.

(9) Hjerten, S. J. Chromatogr. 1985, 347, 191-198.
(10) Goetzinger, W.; Kotler, L.; Carrilho, E.; Ruiz-Martinez, M. C.; Salas-Solano, O.; Karger, B. L. Electrophoresis 1998, 19, 242-248.

Data Processing. ABI377 data were processed using the Plan package and base-called using Phred.11,12 Microdevice data were collected at an acquisition rate of 6 Hz using custom software written in HPVEE (Hewlett-Packard, San Jose, CA) and processed using the base caller Trout. Trout is available on the WICGR ftp site (genome.wi.mit.edu) in the directory distribution/software/trout. Documentation is provided with the program.

Resolution Calculation. Resolution was calculated as described recently.2

RESULTS AND DISCUSSION

The microchannel layout consisted of a simple cross formed by a 40-cm-long straight separation channel and two short offset sidearms, each 0.5 mm in length. The intersecting arms defined a 150-µm-long injector channel with an injection volume of approximately 0.5 nL. LPA was used as the separation medium.13

Figure 1 depicts three representative sections of a four-color sequencing run performed in a 40-cm-long microchannel. The DNA sequencing sample was generated from the WICGR production line, and the data were processed by the Trout sequencing software. The primer eluted at 20 min. DNA fragments of 100, 400, 600, and 800 bases passed the detector at approximately 27, 42, 52, and 59 min, respectively. The total run time was 78 min. The total read length for this particular run was 800 bases with an accuracy of 98%. The same 800-base analysis took 10 h on an ABI377.
For comparison, the capillary-based ABI3700 currently requires 2 h to obtain only 550 bases with 98.5% accuracy using POP flowable gel, and the MegaBACE 1000 system requires 3 h for 800 bases at 98.55% accuracy using linear polyacrylamide gel.14

(11) Ewing, B.; Hillier, L. D.; Wendl, M. C.; Green, P. Genome Res. 1998, 8, 175-185.
(12) Ewing, B.; Green, P. Genome Res. 1998, 8, 186-194.
(13) Carrilho, E.; Ruiz-Martinez, M. C.; Berka, J.; Smirnov, I.; Goetzinger, W.; Miller, A. W.; Brady, D.; Karger, B. L. Anal. Chem. 1996, 68, 3305-3313.
(14) Swanson, D. The Scientist 2000, 14 (3), 23-24.
(15) Luckey, J. A.; Norris, T. B.; Smith, L. M. J. Phys. Chem. 1993, 97, 3067-3075.
(16) Best, N.; Arriaga, E.; Chen, D. Y.; Dovichi, N. J. Anal. Chem. 1994, 66, 4063-4067.

Figure 2. Average single-base resolution as a function of fragment size on a 40-cm-long microfabricated channel, calculated from the A traces of eleven four-color sequencing runs of M13.

Device performance was evaluated with nine M13mp18 sequencing samples. Because read length depends on resolution but is also strongly influenced by other factors, including the base-calling software, we plotted average single-base resolution as a function of DNA fragment size (Figure 2). Resolution is a quantitative measure of pure electrophoretic performance,15 and a single-base resolution of 0.5 is commonly set as the threshold.16 Below this value, no accurate sequence information can be obtained without additional support from sequencing software. The figure shows that average resolution stayed above 0.5 between approximately 40 and 800 bases and ranged from 0.7 to 1.1 throughout the entire midrange of fragment sizes (100-700 bases). Beyond 800 bases, the resolution decreased only gradually: it measured approximately 0.4 at 900 bases and 0.3 at 940 bases. In comparison, in 11.5-cm channels,2 only fragments in the range from 35 to 510 bases had a resolution above 0.5, and resolutions better than 0.7 were measured only for bases 50-470. Thus, as predicted by the model, electrophoresis in a long device increases the overall resolution.

(17) Tong, X.; Smith, L. J. DNA sequencing and mapping. 1993, 4, 151-162.

Figure 3. Analysis of total and specific base-calling errors. Eleven four-color sequencing runs of M13mp18 samples were processed and base-called by Trout and compared with the M13mp18 consensus sequence. Many errors occurred in the first 40 bases and after 700 bases. Over calls are bases incorrectly inserted into the sequence; under calls are bases that were not called.

We then analyzed the quality of the M13mp18 data (Figure 3) by plotting the average number of total and specific errors as a function of fragment size after comparison with the M13mp18 consensus sequence. The graph shows that the average read length was 800 bases with an accuracy of 98% (the 2% error line crosses the total error line at 800 bases). On average, there were 10 errors between 0 and 700 bases. Half of these errors were under calls localized within the first 40 bases, a region known from slab gels and capillaries to be highly susceptible to errors due to anomalous migration behavior.17 The region between 300 and 700 bases was error-free. We found, on average, 8 additional errors between 700 and 800 bases, mainly under calls and mismatches. The error rate increased dramatically beyond 800 bases. Comparison of Figures 2 and 3 reveals that the number of base-calling errors increased as the resolution threshold was approached. This finding implies that Trout was not extending the read length beyond that reached by electrophoretic performance alone. Modification of Trout or application of other base-calling software could therefore extend the total read length somewhat, even without any further adjustment of electrophoretic parameters.

Although the increased length of the microchannel notably extended read length, two other factors, the good signal-to-noise
ratio and signal uniformity observed in all the runs, contributed as well. In contrast, signal strength and stability can vary drastically in capillary runs and impair read length and reproducibility. The electrokinetic injection method used in capillaries is rather sensitive to variations in salt and template concentrations,18 which are difficult to control in high-production environments. This problem can be circumvented by cumbersome and costly sample cleanup procedures tailored for capillaries.19,20 Microfabricated devices might be far superior in this respect, since simple sample cleanup appears to be sufficient for robust long-read DNA sequencing on such devices, owing to the unique microfabricated cross-injector.21 More detailed studies of these injection effects and of base-calling optimization will be undertaken when the 384-lane VLSMF system becomes fully functional, allowing enough runs to obtain definitive, quantitative results.

(18) Figeys, D.; Ahmadzadeh, H.; Arriaga, E.; Dovichi, N. J. J. Chromatogr., A 1996, 744, 325-331.
(19) Ruiz-Martinez, M. C.; Salas-Solano, O.; Carrilho, E.; Kotler, L.; Karger, B. L. Anal. Chem. 1998, 70, 1516-1527.
(20) Salas-Solano, O.; Ruiz-Martinez, M. C.; Carrilho, E.; Kotler, L.; Karger, B. L. Anal. Chem. 1998, 70, 1528-1535.
(21) Jacobson, S. C.; Hergenroeder, R.; Koutny, L. B.; Warmack, R. J.; Ramsey, J. M. Anal. Chem. 1994, 66, 1107-1113.
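The injection volume of approximately 0.5 nL quoted in the Results and Discussion can be checked against the channel dimensions given under Microfabrication (40 µm deep, 90 µm wide at the top, 150-µm-long injector). The Python sketch below is a back-of-the-envelope estimate, not the authors' calculation. It assumes an ideal isotropic wet etch (a flat floor whose width is the top width minus twice the depth, flanked by quarter-circle sidewalls whose radius equals the depth) and ignores the extra volume at the two channel intersections. Function names are illustrative.

```python
import math


def isotropic_channel_area_um2(top_width_um: float, depth_um: float) -> float:
    """Cross-sectional area (um^2) of a wet-etched channel.

    Assumes an ideal isotropic etch: a flat floor of width
    (top_width - 2 * depth) plus two quarter-circle sidewalls of radius depth.
    """
    floor_width_um = top_width_um - 2.0 * depth_um
    if floor_width_um < 0:
        raise ValueError("top width must be at least twice the etch depth")
    rectangle = floor_width_um * depth_um
    two_quarter_circles = 2.0 * (math.pi * depth_um ** 2 / 4.0)
    return rectangle + two_quarter_circles


def injector_volume_nl(top_width_um: float, depth_um: float, length_um: float) -> float:
    """Volume (nL) of a straight injector segment; 1 nL = 1e6 um^3."""
    volume_um3 = isotropic_channel_area_um2(top_width_um, depth_um) * length_um
    return volume_um3 / 1e6


if __name__ == "__main__":
    # Dimensions quoted in the Microfabrication paragraph.
    isotropic = injector_volume_nl(top_width_um=90, depth_um=40, length_um=150)
    # Upper bound: treat the cross-section as a full 90 um x 40 um rectangle.
    rectangular = 90 * 40 * 150 / 1e6
    print(f"isotropic-etch estimate: {isotropic:.2f} nL")
    print(f"rectangular upper bound: {rectangular:.2f} nL")
```

With the quoted dimensions the isotropic-etch estimate comes to about 0.44 nL, against 0.54 nL for a fully rectangular cross-section, so the two bracket the stated value of approximately 0.5 nL.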
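The detection times reported for the Figure 1 run can be converted into apparent mobilities, mu = L_eff / (E t), and into the average time separating adjacent bases. The Python sketch below does this arithmetic under stated assumptions: a uniform 150 V/cm field over the 40-cm effective length, time counted from the start of separation, and negligible electroosmotic flow (the walls were passivated). It also expresses the per-base spacing in data points at the 6-Hz acquisition rate given under Data Processing. Constant and function names are illustrative.

```python
# Approximate detection times (min) quoted in the text for the Figure 1 run.
DETECTION_TIME_MIN = {100: 27.0, 400: 42.0, 600: 52.0, 800: 59.0}
EFFECTIVE_LENGTH_CM = 40.0  # injector-to-detector distance
FIELD_V_PER_CM = 150.0      # separation field strength (assumed uniform)
ACQUISITION_HZ = 6.0        # detector sampling rate


def apparent_mobility(t_min: float) -> float:
    """Apparent mobility in cm^2/(V s): mean velocity divided by field strength."""
    velocity_cm_per_s = EFFECTIVE_LENGTH_CM / (t_min * 60.0)
    return velocity_cm_per_s / FIELD_V_PER_CM


def seconds_per_base(size_a: int, size_b: int) -> float:
    """Mean time (s) between adjacent bases for fragments from size_a to size_b."""
    dt_s = (DETECTION_TIME_MIN[size_b] - DETECTION_TIME_MIN[size_a]) * 60.0
    return dt_s / (size_b - size_a)


if __name__ == "__main__":
    sizes = sorted(DETECTION_TIME_MIN)
    for size in sizes:
        mu = apparent_mobility(DETECTION_TIME_MIN[size])
        print(f"{size:>3} bases: mu_app = {mu:.2e} cm^2/(V s)")
    for a, b in zip(sizes, sizes[1:]):
        spacing = seconds_per_base(a, b)
        print(f"{a}-{b} bases: {spacing:.1f} s/base, "
              f"~{spacing * ACQUISITION_HZ:.0f} data points per base")
```

With the quoted times, the apparent mobility falls from about 1.6 × 10⁻⁴ cm²/(V·s) at 100 bases to about 7.5 × 10⁻⁵ cm²/(V·s) at 800 bases, and the spacing drops from about 3 s per base (roughly 18 data points) below 600 bases to about 2.1 s per base (roughly 13 data points) between 600 and 800 bases.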
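Figure 2 and the 0.5 threshold rest on single-base resolution, which this paper computes as described in ref 2. As an illustration only, the Python sketch below assumes the conventional definition for two Gaussian peaks, R = (t2 - t1) / (2 (sigma1 + sigma2)), that is, twice the peak spacing divided by the sum of the 4-sigma baseline widths, normalized by the number of bases separating the peaks. It then estimates, by linear interpolation, the largest fragment size at which resolution is still at or above the threshold, scanning from the long end so that the low resolution of the first few tens of bases does not cut the read short. The function names and all numbers in the usage lines are hypothetical, not data from this study.

```python
from typing import Optional, Sequence


def single_base_resolution(t1: float, sigma1: float, t2: float, sigma2: float,
                           bases_apart: int = 1) -> float:
    """Resolution per base for two Gaussian peaks (times and sigmas in the same units).

    R = (t2 - t1) / (2 * (sigma1 + sigma2)), i.e. 2*dt / (w1 + w2) with
    baseline width w = 4*sigma, divided by the base-count difference.
    """
    return (t2 - t1) / (2.0 * (sigma1 + sigma2)) / bases_apart


def resolution_limit(sizes: Sequence[float], resolutions: Sequence[float],
                     threshold: float = 0.5) -> Optional[float]:
    """Largest fragment size with resolution >= threshold, linearly interpolated.

    Scans from the long end, so a dip below threshold at short sizes is ignored.
    Returns None if no point reaches the threshold.
    """
    if len(sizes) != len(resolutions) or not sizes:
        raise ValueError("need equally long, non-empty sequences")
    if resolutions[-1] >= threshold:
        return float(sizes[-1])
    for i in range(len(sizes) - 2, -1, -1):
        r0, r1 = resolutions[i], resolutions[i + 1]
        if r0 >= threshold:  # r1 < threshold here, so the curve crosses in between
            s0, s1 = sizes[i], sizes[i + 1]
            return s0 + (r0 - threshold) * (s1 - s0) / (r0 - r1)
    return None


if __name__ == "__main__":
    # Hypothetical peaks two bases apart; times and sigmas in seconds.
    print(single_base_resolution(100.0, 0.5, 103.0, 0.5, bases_apart=2))  # 0.75
    # Hypothetical curve: low at the short end, then falling at the long end.
    print(resolution_limit([20, 100, 200, 300], [0.4, 1.0, 0.6, 0.2]))  # roughly 225
```

Scanning from the long end mirrors how the text reads Figure 2: resolution below 0.5 in the first ~40 bases does not define the read limit, whereas the final drop below 0.5 does.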
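The read-length criterion used for Figure 3, where the 2% error line crosses the cumulative total-error curve, can be stated as: the read length is the largest n for which the number of errors among the first n bases is at most 2% of n. The Python sketch below implements that reading of the criterion. It assumes that error positions (1-based, along the reference) have already been obtained by aligning the base calls to the M13mp18 consensus; it uses integer arithmetic to avoid floating-point ties at the boundary. The function name and example data are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable


def read_length_at_accuracy(error_positions: Iterable[int], called_length: int,
                            max_error_percent: int = 2) -> int:
    """Largest prefix length n (1..called_length) whose cumulative error count
    is at most max_error_percent percent of n; 0 if no prefix qualifies.

    error_positions are 1-based positions along the reference at which the
    base call was an over call, an under call, or a mismatch.
    """
    errors = sorted(error_positions)
    best = 0
    for n in range(1, called_length + 1):
        errors_so_far = bisect_right(errors, n)  # errors at positions <= n
        # Integer comparison avoids floating-point ties such as 0.02 * 50.
        if 100 * errors_so_far <= max_error_percent * n:
            best = n
    return best


if __name__ == "__main__":
    # Hypothetical error positions: one early error, then a burst from base 91.
    hypothetical_errors = [10] + list(range(91, 101))
    print(read_length_at_accuracy(hypothetical_errors, called_length=100))  # 90
```

In the example, the error at base 10 temporarily puts the cumulative count above the 2% line, but the line catches up by base 50; the burst starting at base 91 then ends the read at 90 bases. This is why early errors, such as the under calls clustered in the first 40 bases, do not by themselves limit the read length.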

CONCLUSIONS

In summary, we have demonstrated the fabrication of long microchannels in large glass plates and the feasibility of fast and reliable long-read DNA sequencing on such devices. These advances are critical for the construction and operation of VLSMF systems for ultrahigh-throughput DNA sequencing. Moreover, long reads, superior injection schemes, small injection volumes, and simplified sample handling methods should greatly reduce sequencing cost. We believe that this technology, once fully in place, will have a strong influence not only on production sequencing but also on many types of DNA analysis by combining speed, robustness, automation, and simplicity.

ACKNOWLEDGMENT

This work was supported by the National Institutes of Health (NIH) under Grant HG01389 and by AFOSR (F49620-98-10235).

Received for review November 29, 1999. Accepted April 29, 2000. AC9913614
