Human Genome Initiative: Analytical Challenges
"Four steps (electrophoresis, detection, pattern analysis, and transcription to computer) take eighty percent of the labor in sequencing [DNA]," explains Lloyd Smith of the University of Wisconsin at Madison. For researchers those steps are, admits Smith, boring. "This is not brainy work." Not only do these steps need to become less labor intensive, but the associated costs must drop to less than a penny per base to make sequencing the entire human genome an affordable project.

Here is where analytical chemists can have a major impact on the Human Genome Initiative. "The initiative is technology driven," explains Smith, "but biochemists and geneticists are not instrument developers. Analytical chemists have the expertise to do this."

At the ACS fall meeting in Washington, DC, a four-day symposium on the genome project organized by Smith detailed some of the progress and problems of the initiative. In this second FOCUS on that symposium (Part I appeared in the Dec. 15, 1990, issue), the analytical techniques and strategies being developed to sequence DNA are presented.

Geneticists calculate that there are about 3 billion bases in the human genome. Just over 5 million have been translated to date, primarily in identified genes (1). Separating the gene sequences are large stretches of what has been dubbed "nonsense DNA." Some biologists argue that the name is a misnomer and that the purpose of these stretches is yet to be determined. Sequencing of the genome should help to settle that debate.

The importance of accuracy and the inherent variability in chromosomal DNA dictate the need to determine sequences several times. Although the
United States has a leading role in the project, the task will require an international effort. The question of whose genome will be translated is therefore not relevant.

Working in the labs of Leroy Hood at the California Institute of Technology, Smith pioneered one of the first automated approaches to the four labor-intensive steps in sequencing. That work now forms the basis of some of the commercially available automated sequencers (2, 3). Basically, the procedure automates and improves methods first introduced by Sanger and Coulson (4). The DNA segment to be sequenced serves as a template for an enzyme-catalyzed reaction that assembles multiple copies of the target DNA. A modified nucleoside (which is a DNA base coupled to a carbohydrate) is included in the synthesis. Whenever this nucleoside is incorporated into the oligonucleotide sequence it terminates the chain, producing DNA pieces ending with a particular base. In addition, the primer oligonucleotide used as the starting point in the synthesis is uniquely tagged with one of four fluorophores, a different one for each base-specific reaction. The reaction is repeated three more times, using different chain-terminating nucleosides for the other bases.

The DNA fragments are pooled together and run on an electrophoresis slab gel. As the DNA separates and moves toward the end of the gel, a laser scans the gel and excites the fluorophores. The resulting color is "read" by a photomultiplier tube, providing the identity of the end base. Because the electrophoretic separation is based on
size, the order of reading also corresponds to the DNA sequence.

Automating the process allows up to 24 samples to be run simultaneously, generating sequences 250-450 bases long in each lane during a 12-h run. However, says Smith, sequencing efficiency still needs to be improved by at least an order of magnitude before geneticists can tackle wholesale DNA decoding.

Separations with capillary electrophoresis

Smith's group recently took one step toward that goal by replacing slab gel with capillary gel electrophoresis (CGE). Employing the same chemistry to label DNA fragments, they have been running their separations on polyacrylamide gel-filled capillaries at 200 V/cm, resolving >300 bases per run. They report a tenfold improvement in sensitivity, detecting fluorophores at attomole levels. They also report separation rates for experiments run at 400 V/cm that are ~25 times faster than those obtained with conventional slab gel techniques (5). Smith's group is now developing the techniques to run multiple samples simultaneously.

Norman Dovichi's group at the University of Alberta has also turned to CGE to separate DNA fragments. Using a sheath flow detection system and the four-color fluorescent probe technique, they can sequence ~500 bases per run and achieve better resolution than conventional slab gel electrophoresis. Furthermore, by employing an Ar-ion laser to excite the fluorescein-labeled DNA and a microscope objective mounted at right angles to the laser for detection, they can measure subattomole quantities, as few as 600 fluoroprobes. This allows detection of the longer DNA segments present at low concentrations.
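To put the slab-gel figures quoted earlier (24 lanes, 250-450 bases per lane, a 12-h run) next to the 3-billion-base genome, the arithmetic can be sketched as below. This is only a rough illustration, not a calculation from the symposium: it assumes back-to-back runs with no downtime, a single instrument, a single pass with no repeat reads, and a 365-day year. The function names are made up for this sketch.

```python
# Back-of-the-envelope throughput for the automated slab-gel sequencer,
# using only figures quoted in the article. Assumptions (not from the
# article): runs are back to back, one instrument, one pass, 365-day year.

GENOME_BASES = 3_000_000_000   # "about 3 billion bases in the human genome"
LANES = 24                     # samples run simultaneously
READ_LENGTH = (250, 450)       # bases read per lane (low, high)
RUN_HOURS = 12                 # duration of one run


def bases_per_day(read_length: int) -> float:
    """Raw bases read per day by one instrument running continuously."""
    runs_per_day = 24 / RUN_HOURS
    return LANES * read_length * runs_per_day


def years_for_one_pass(read_length: int) -> float:
    """Years for one instrument to read the whole genome exactly once."""
    return GENOME_BASES / bases_per_day(read_length) / 365


if __name__ == "__main__":
    for n in READ_LENGTH:
        print(f"{n} bases/lane: {bases_per_day(n):,.0f} bases/day, "
              f"about {years_for_one_pass(n):,.0f} years per single pass")
```

Even under these generous assumptions a single instrument would need centuries for one pass, which is consistent with Smith's remark that efficiency must improve by at least an order of magnitude.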
ANALYTICAL CHEMISTRY, VOL. 63, NO. 1, JANUARY 1, 1991 • 25 A
Separations have been demonstrated at potentials as high as 500 V/cm, yielding run times faster than those of commercial systems. However, capillary gel stability and thus reproducibility are a problem at potentials greater than 350 V/cm.

Dovichi has recently received funding to expand this system. His group will be developing methods to run up to 50 capillaries in parallel.

Barry Karger, who helped to pioneer capillary electrophoresis (CE), reported that his group at Northeastern University has been studying gel stability and performance. The key is in the preparation of the polyacrylamide gel capillaries. Karger's group recommends using the highest quality materials, making sure that the gel is bubble-free, and loading by ramping the voltage slowly before initiating a run. Another important consideration is the conductivity of the sample solution being injected. Following this procedure, his group is able to run sequencing separations multiple times and reproducibly at 300 V/cm on the capillary gels.

Karger also tackled the question of how far CE can be pushed. His "back of the envelope" calculations indicate that a 2-h sequencing run with peaks emerging every 0.1 min, so that the relative separation or ratio of times between two adjacent peaks (α) is just 1.0008, could require almost 25 million theoretical plates. These numbers are achievable, says Karger, who has measured theoretical plates in excess of 3 × 10⁷ per meter.

Another speaker advocating CE separations, although with a twist, was Thomas Brennan of Genomyx Corporation. Instead of running DNA fragments with fluorophores tied to the end, Brennan has been investigating the use of four stable isotopes of sulfur: ³²S, ³³S, ³⁴S, and ³⁶S. These isotopes are then detected by MS, possibly offering a sequencing rate of 10 kb per day. Like the fluorophores, each sulfur isotope marks a particular DNA base at the end of the fragment.
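Karger's plate-count estimate quoted above can be reproduced, at least approximately, with a textbook resolution relation. The article does not give his actual calculation, so the following is one plausible reconstruction. It assumes the required resolution between adjacent peaks is Rs = 1 and uses the simplified form N = 16·Rs²/(α − 1)², which holds when α is close to 1. The capillary-length line further assumes that plate count scales linearly with length. All function and variable names are illustrative.

```python
# Reconstruction of the plate-count estimate under stated assumptions.
# Assumed (not stated in the article): resolution Rs = 1 between adjacent
# peaks, and N = 16 * Rs**2 / (alpha - 1)**2, valid for alpha close to 1.

RUN_MINUTES = 120.0        # "a 2-h sequencing run"
PEAK_SPACING_MIN = 0.1     # "peaks emerging every 0.1 min"
ALPHA = 1.0008             # ratio of adjacent peak times quoted in the article
PLATES_PER_METER = 3e7     # plate count per meter measured by Karger (lower bound)


def alpha_at_end_of_run(run_minutes: float, spacing: float) -> float:
    """Ratio of migration times for the last two peaks of the run."""
    return run_minutes / (run_minutes - spacing)


def required_plates(alpha: float, resolution: float = 1.0) -> float:
    """Theoretical plates needed to resolve two peaks with the given alpha."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    return 16.0 * resolution**2 / (alpha - 1.0) ** 2


plates_needed = required_plates(ALPHA)
# Assumes plates scale linearly with capillary length (a simplification).
capillary_length_m = plates_needed / PLATES_PER_METER

if __name__ == "__main__":
    print(f"alpha from run timing: {alpha_at_end_of_run(RUN_MINUTES, PEAK_SPACING_MIN):.4f}")
    print(f"plates needed: {plates_needed:,.0f}")
    print(f"capillary length at 3e7 plates/m: {capillary_length_m:.2f} m")
```

Under these assumptions the requirement comes out at roughly 25 million plates, and a capillary of somewhat under a meter would supply them at the plate density Karger reports, which fits his statement that the numbers are achievable.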
Once again Sanger-type chemistry is employed, using α-thiotriphosphate to introduce the particular sulfur isotope. As the pooled DNA fragments separate and exit the capillary, each droplet is swept by a piezoelectric jet across a 1-cm air gap and into a 1000 °C furnace. Everything in the ~50-pL droplet is burned, converting the sulfur tags to SO₂. The isotopically labeled SO₂ is then fed into a mass spectrometer, and the terminal DNA base is determined by the observed mass.

The CE/MS system collects samples at the rate of 10 drops per second. For peaks about 5 s wide, Brennan measures about 3 × 10⁷ molecules per peak. According to Brennan, the four sulfur isotopes are available in >99.7% purity. He calculates that the isotope cost should be around one-tenth of a cent per 100 base pairs.

Mass spectrometry techniques
Brennan was not alone in looking at MS as a means to sequence DNA. Richard Smith's group at Pacific Northwest Laboratories has coupled CE with MS using electrospray ionization. This technique has been successful in determining a mass of about 24 800 for transfer RNA from yeast. Smith says that the uncertainty in the mass is attributable to the difficulty in determining the numbers of Na atoms associated with the RNA. The presence of Na also reduces signal intensity, although his group observes a very high ionization efficiency for large molecules into the million Dalton range. To sequence DNA, Smith suggests that tandem MS techniques might then be employed. His group is currently developing this approach to sequence proteins. He proposes that methods based on ion cyclotron resonance MS using sequential photodissociation steps could be suitable for large DNA segments.

A number of groups have adopted matrix-assisted laser desorption MS, developed by Franz Hillenkamp and Michael Karas at the University of Münster in Germany (6), as a means to determine molecules as heavy as 200 000 Da. Much of the work to date has focused on proteins (7, 8), although the same approaches may be applicable to sequencing DNA oligonucleotides.

For example, Peter Williams and his group at Arizona State University are investigating laser ablation of DNA frozen in aqueous buffer solutions coupled with analysis by time-of-flight (TOF) MS. Williams calls it "steam-powered MS" (9, 10). In this process, a DNA solution is frozen as a thin film on an oxidized Cu substrate cooled by liquid nitrogen. An excimer-pumped dye laser operating in the range of 578-589 nm blasts the sample with >10⁸ W/cm². In a nanosecond the copper temperature zooms to several thousand K and the ice is shock-heated to several hundred K. It explodes into vapor, carrying the sample into the gas phase. The Cu substrate plays an important role, acting as a chromophore to absorb the radiation and as a "shock driver" to rapidly heat the ice film. Ionization is believed to result from multiphoton ionization of solutes in the ice, generating ions such as Na⁺ that attach to DNA molecules. The ionized DNA is then mass-separated by TOF.

According to Williams, the ability to obtain intact molecular ions with little fragmentation indicates that MS could replace electrophoresis in the analysis of Sanger-type sequencing mixtures.

Multiplex sequencing
Another way to accelerate the rate of sequencing is to combine steps into a multiplex approach (11, 12). According to Raymond Gesteland of the University of Utah, established techniques are run in parallel, a process carried out by piggybacking many of the labor-intensive steps.

The multiplex procedure starts early in the sequencing process (Figure 1). A chromosomal fragment is cut into small pieces and inserted into specially modified vectors for cloning. Each vector has two unique identifier sequences next to the ends of the foreign DNA fragments, so that up to 40 vectors yield 40 different DNA inserts marked with 80 identifiers. Eighty, says Gesteland, is the maximum number they can handle per run. The vectors are pooled, amplified, and then sequenced.

Figure 1. A diagram of multiplex DNA sequencing using ³²P hybridization probes.

Sequencing relies on the Sanger enzyme method. Samples are withdrawn from the DNA pool and divided into four tubes. The DNA in each tube is copied from the universal primer site, and each tube contains one of the four chain terminators. Each terminated mixture is loaded onto a lane of a polyacrylamide sequencing gel so that 96 lanes represent 24 sets of samples. The DNA separates by size, and the fragments are transferred to a nylon membrane for analysis.

Oligonucleotide probes are prepared that hybridize uniquely to one of the identifier tags. The probes are labeled with ³²P for detection by autoradiography. Gesteland's group loads the membrane into a drum that rotates with a solution of one of the ³²P probes. After hybridization is complete, a 6-ft-long film that can record up to 20 kb is placed against the membrane. A balloon inside the drum inflates, pressing the film against the membrane. The probe is then washed off, and the procedure is repeated for another identifier tag. Therefore, with 40 vectors, 80 repeats are necessary.

The films are read like the fluoroprobe gels to yield the sequence. Gesteland says that his group can do one probe per day, but a major time-consuming step is reading the film. Efforts to eliminate the radioisotope have been frustrated by the background fluorescence of the nylon membrane. However, Gesteland says that his group has had some success with chemiluminescent detection, which promises improved sensitivity.

The Utah group has also been looking at ways to organize the multiplex procedure by using insert transposons. These are sequences that insert into different locations in a length of DNA, like cutting into a queue. Their strategy is to place the identifiers on the end of the transposon, which would insert into a type of cloning system known as a cosmid. In this way they could double the length of the sequence they read with no increase in labor. To accomplish this they need to map where these transposons insert. Their goal is to sequence one cosmid (which can be up to 45 kb) per week.

Single molecule detection

The most original and probably the most technically challenging strategy for sequencing DNA was that proposed by Richard Keller from Los Alamos National Laboratory. "We," he declares, "are going for rate and fragment length."

The Los Alamos team hopes to sequence 1000 bases per second, tackling DNA pieces several tens of kilobases in length. This approach significantly reduces DNA handling. To do this they are developing a brute-force technique based on laser-induced fluorescence of single molecules carried in a laminar flowing stream of water (see Reference 13). It is an approach best suited for a national laboratory that can assemble a research team of physicists, chemists, biologists, mathematicians, and engineers.

The procedure begins with a fragment of unknown DNA serving as a template to build a copy known as a complementary DNA (cDNA) sequence. Each base used to build the cDNA sequence is labeled with one of the four colored fluorophore tags.

To manipulate a single cDNA strand, a biotin molecule would be attached to one end of the sequence. Biotin binds to the protein avidin which, in turn, is connected to a microsphere. The microsphere allows researchers to handle a single cDNA via micropipette or optical manipulation techniques.

To sequence the DNA, the labeled cDNA is suspended in a flowing stream in the presence of an exonuclease, an enzyme that rapidly snips off one base at a time from the end (the fluorescent tag with its linker arm to the nucleotide
must be designed not to interfere with the cutting). As fast as the enzyme cuts, bases are carried away by the flowing water. The base flows through a focused laser beam that excites the fluorescent tag millions of times, limited only by photobleaching. The maximum number of times a tag can be cycled is determined by the transit time divided by the fluorescent lifetime. The resulting fluorescence color of the individual molecules determines the base.

Detection is hindered by interferences that arise from Raman and Rayleigh scattering from the solution as well as light scatter off cell walls. These interferences are reduced by hydrodynamically focused flows to minimize water volume to ~10⁻¹² L in the observation cell.

Furthermore, to distinguish between background and sample fluorescence, the fluorophores are excited with picosecond pulses and fluorescence detection is delayed until the background subsides. With this detection technique, the Los Alamos researchers have detected individual rhodamine-6G molecules in a digital fashion as they pass through the focused laser beam.

All of these approaches still have a way to go before we will know if they live up to their promises. Fortunately, there is a grace period before large-scale sequencing of the genome begins. For the next 10-15 years chromosomal mapping will hold center stage, providing the landmarks needed for sequencing. Now is the time for analytical chemists to join the genome project, adding their talents to this unique and historic endeavor.

Alan R. Newman

References

(1) Stephens, J. C.; Cavanaugh, M. L.; Gradie, M. I.; Mador, M. L.; Kidd, K. K. Science 1990, 250, 237.
(2) Smith, L. M. Anal. Chem. 1988, 60, 381 A.
(3) Trainor, G. L. Anal. Chem. 1990, 62, 418.
(4) Smith, A.J.H. Methods Enzymol. 1980, 65, 560.
(5) Drossman, H.; Luckey, J. A.; Kostichka, A. J.; D'Cunha, J.; Smith, L. M. Anal. Chem. 1990, 62, 900.
(6) Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299.
(7) Voress, L. Anal. Chem. 1990, 62, 919 A.
(8) Basic, C.; Freeman, J. A.; Yost, R. A. Anal. Chem. 1990, 62, 1113 A.
(9) Nelson, R. W.; Rainbow, M. J.; Lohr, D. E.; Williams, P. Science 1989, 246, 1585.
(10) Nelson, R. W.; Thomas, R. M.; Williams, P. Rapid Commun. Mass Spectrom. 1990, 4, 348.
(11) Church, G. M.; Kieffer-Higgins, S. Science 1988, 240, 185.
(12) Yang, M. M.; Youvan, D. C. Bio/Technology 1989, 7, 576.
(13) Nguyen, D. C.; Keller, R. A.; Jett, J. H.; Martin, J. C. Anal. Chem. 1987, 59, 2158.