Automated Synthesis and Sequence Analysis of Biological Macromolecules

Lloyd M. Smith Department of Chemistry University of Wisconsin—Madison Madison, Wis. 53706

The traditional distinctions between the fields of physics, chemistry, and biology have blurred with time. As the important questions in biological research have become increasingly detailed and molecular in nature, the techniques needed to answer these questions have drawn increasingly on principles and methods usually ascribed to the fields of physics and chemistry. This fusion has resulted in the instruments and chemistries that constitute the technological foundations of modern biology and that are critical components in the new methods responsible for the explosive growth of modern biology during the last decade. Many of these instruments, such as microscopes and spectrophotometers, have existed for decades; however, technological advances such as the use of imaging methods in NMR have greatly expanded their power and versatility. In the past several years, a new generation of instruments, whose everyday use has had revolutionary consequences, has come into existence. Central among these are the instruments concerned with the synthesis and sequence analysis of the two major biopolymers, protein and DNA (1). This article contains descriptions of these instruments, the chemistries on which they are based, and some of their manifold applications.

0003-2700/88/0360-381 A/$01.50/0 © 1988 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 60, NO. 6, MARCH 15, 1988 · 381 A

Protein-peptide synthesis

Protein-peptide synthetic methods are important tools in modern biological research. Small peptide fragments corresponding to a given region of a protein may be chemically synthesized in quantity and used as immunogens for the preparation of antipeptide antibodies. In many cases, these antibodies will react with the same peptide region in the intact protein, and the scientist will thereby have generated with relative ease a powerful and versatile reagent for use in work directed at the isolation, purification, and study of the protein molecule. Peptide hormones may be chemically synthesized for use as pharmaceutical agents or as research tools to study the relationship between structure and function in the intact molecule. Synthetic peptides are also important in vaccine development, where they permit the rapid investigation of the immunogenicity of different antigens and regions of a given protein.

The synthesis of a molecule as complex as a protein by solution organic chemistry is difficult. Proteins are linear polymers composed of 20 different amino acid components that vary greatly in solubility, charge, size, and the presence of reactive substituents requiring protection during synthesis. This heterogeneity in the protein components imparts a similar diversity to the proteins, which complicates attempts to purify the molecules at each step of synthesis. The major breakthrough permitting the chemical synthesis of peptides and small proteins was made by Merrifield in work for which he recently received the Nobel prize (2). The key innovation was the idea of synthesizing the protein molecule attached to a polymeric support. This allows the intermediates in a protein synthesis to be purified easily at each step simply by washing and filtration of the particles.

A block diagram of one major strategy used in solid-phase peptide synthesis is shown in Figure 1 (3). In such procedures an insoluble polymeric support is functionalized by the introduction of a reactive group. The C-terminal amino acid, protected at the N-terminus, is covalently coupled to the support at the carboxyl group. The N-terminal protecting group of this first support-bound amino acid is removed by treatment with trifluoroacetic acid (TFA), and an amide bond is formed with the activated carboxyl of a second N-protected amino acid. After appropriate washing steps, the N-terminal amino-protecting group on the newly added amino acid is again removed with TFA, and the cycle is repeated with subsequent amino acids. Any reactive groups on the amino acid substituents are protected during the synthesis with groups that are not removed by TFA, but only by a later treatment with hydrofluoric acid (HF). Thus the protected polypeptide chain is built up systematically on the polymeric support. When the synthesis is complete, the peptide is cleaved from the support, remaining protecting groups are removed by treatment with HF, and the free peptide is isolated.

The process of solid-phase peptide synthesis, as described above, is essentially a repeated series of treatments of the support particles with a number of different solutions and solvents, interspersed with appropriate washing, filtration, and, in some cases, heating steps. Although the process itself is not difficult, it is tedious, repetitive, and time consuming, and close attention must be paid to chemical procedures, many of which are unfamiliar to most biologists interested in using the technology. It was therefore very important that it be automated so that it would be widely available to the research community.

Several commercial instruments for peptide synthesis are currently available, and in a general sense they are fairly similar.
A peptide synthesis instrument consists of a computer, which controls the instrument and may be programmed with the amino acid sequence to be synthesized; a set of chemicals and solvents, including the 20 or more suitably protected amino acids; a valving system for controlling the delivery of various solutions and solvents; one or more reaction chambers and an interconnecting tubing network; and a physical structure to house these subsystems.

A block diagram of one such instrument is shown in Figure 2 (3). This instrument has three vessels in which the important reaction steps occur. In the activator vessel, the activated form of the protected amino acid to be coupled is prepared in a solution of dichloromethane (DCM). This solution is transferred to the concentrator vessel, where the DCM is evaporated with applied heat and replaced with a smaller volume of dimethylformamide (DMF). This solution is then transferred to the reaction vessel and reacted with the free N-terminus of the support-bound peptide being synthesized. This N-terminus had been automatically deprotected using TFA in a separate series of steps. In each cycle of the synthesis, a small sample of the deprotected resin is automatically removed for subsequent determination of the level of free amino groups present on the support, thus permitting the efficiency of the synthesis to be monitored.

Figure 1. The chemistry of solid-phase peptide synthesis. (Adapted with permission from Biomedical Polymers; Goldberg, E. D.; Nakajima, A., Eds.; Academic: Orlando, 1980.)

Figure 2. Block diagram of an instrument for automated peptide synthesis. The following abbreviations are used in this diagram: TFA (trifluoroacetic acid), DIEA (diisopropylethylamine), DCC (dicyclohexylcarbodiimide), HOBT (hydroxybenzotriazole), BOC-AA (butyloxycarbonylamino acid), and DMF (dimethylformamide). Numbers 1-6 denote valve blocks. (Adapted with permission from Reference 3.)

A key feature of this process, and indeed of any support-based chemical process of this type, is the importance of high yield in each step of the solid-phase synthesis. Table I shows the theoretical yields of product expected for peptides of different lengths with stepwise yields of either 99.8% or 96.0%. To obtain reasonable overall yields, it is critical to have very high efficiency reactions at each step of the synthesis. Using an optimized chemistry with an average yield of 99.4% per step, it was possible to chemically synthesize the hormone interleukin-3, which is 140 amino acids in length (4). This achievement represents the current state of the art in chemical protein synthesis.

Current research on chemical peptide synthesis is focused on the following: further optimizing the chemistry, to permit the synthesis of longer and longer protein molecules in good yield; learning how to fold the resultant proteins into their biologically active conformations (a nontrivial task for the larger molecules); and developing novel amino acid analogues that contain desired functional groups or detectable moieties that may be placed in the protein at any desired site. Site-specific protein modification of this kind is particularly important because it cannot be achieved in any other manner.
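The deprotect, couple, and wash cycle described in this section can be summarized as a loop over the target sequence. The following is a minimal sketch rather than an instrument program: the function name `synthesis_steps` and the step wording are illustrative assumptions, and real protocols include additional neutralization, monitoring, and solvent-exchange steps that are omitted here.

```python
from typing import List


def synthesis_steps(sequence: str) -> List[str]:
    """List simplified solid-phase steps for a peptide written N- to C-terminus.

    Hypothetical helper for illustration only; one-letter residue codes assumed.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one residue")

    residues = list(sequence)
    # The C-terminal residue is attached to the support first.
    steps = [f"couple N-protected {residues[-1]} to the functionalized support"]

    # The chain then grows from the C-terminus toward the N-terminus.
    for residue in reversed(residues[:-1]):
        steps.append("remove N-terminal protecting group with TFA")
        steps.append("wash")
        steps.append(f"couple activated N-protected {residue}")
        steps.append("wash")

    # Final cleavage and side-chain deprotection.
    steps.append("cleave from support and remove remaining protecting groups with HF")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(synthesis_steps("GAV"), start=1):
        print(number, step)
```

For a peptide of n residues this sketch produces n - 1 deprotect/couple cycles, which is why the per-cycle yield dominates the overall yield for long chains.
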

Table I. Calculated overall yields in peptide synthesis

                     Overall yield (%)
No. of residues      96.0% yield per step      99.8% yield per step
 11                  66                        98
 21                  44                        96
 31                  29                        94
 51                  13                        90
100                  1.7                       82
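The entries in Table I follow from compounding the stepwise yield over every coupling. The sketch below assumes that a peptide of n residues requires n - 1 couplings (the first residue is treated as already attached to the support); that assumption is not stated in the text, but it reproduces the tabulated values to within rounding. The function name `overall_yield` is illustrative.

```python
def overall_yield(residues: int, stepwise_yield: float) -> float:
    """Fraction of chains carrying the full sequence.

    Assumes residues - 1 coupling steps, each succeeding with probability
    stepwise_yield (a value between 0 and 1).
    """
    if residues < 1:
        raise ValueError("residues must be at least 1")
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must be between 0 and 1")
    return stepwise_yield ** (residues - 1)


if __name__ == "__main__":
    print("residues  96.0%/step  99.8%/step")
    for n in (11, 21, 31, 51, 100):
        low = 100 * overall_yield(n, 0.960)
        high = 100 * overall_yield(n, 0.998)
        print(f"{n:8d}  {low:10.1f}  {high:10.1f}")
```

Under the same assumption, the 140-residue interleukin-3 synthesis at an average of 99.4% per step would correspond to an overall yield of roughly 43%.
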

DNA synthesis

A fundamental property of nucleic acids, and one at the heart of modern molecular biology, is hybridization, in which a DNA or RNA molecule specifically recognizes and binds to a complementary DNA or RNA sequence. This property derives from the double-stranded nature of DNA, deduced by Watson and Crick in 1953 (5). The specific base pairs formed between the purines and pyrimidines on opposite strands of nucleic acids drive the molecular recognition process (Figure 3). This phenomenon is exploited in DNA sequencing, a technique that permits the precise molecular structure of DNA molecules to be determined; in DNA cleavage and ligation, in which particular enzymes cut DNA molecules into fragments at specific locations and other enzymes allow them to be reattached in new forms; and in cloning, which allows particular genes to be selectively isolated and produced in large quantities for detailed analysis and manipulation. Hence arises the importance of DNA synthetic chemistry. This process allows the chemical synthesis of any desired short stretch of DNA, or oligonucleotide. The ability to obtain at will any specific single-stranded DNA fragment desired, which in turn will specifically recognize only its complementary sequence and not any other, is essential to a wide variety of key molecular biological applications.

Figure 3. Structure of double-stranded DNA showing the specific associations between the nucleoside subunits on opposite strands (base pairing).

It is possible today to routinely synthesize oligonucleotides up to 200 nucleosides in length on commercially available instruments operated by a technician. The cost (from one to several dollars per base for synthesis of one micromole of material) is reasonable, and lengths of ~20-40 bases are suitable for most applications. The instruments are available from several companies and are generally priced in the range of $20,000-$40,000. The most widely used technique involves phosphoramidite chemistry, a process that is described below (6).

As was the case for peptides, the synthesis of a molecule as large and complex as DNA by conventional organic chemical methods is so formidable as to be effectively impossible. In 1957 the synthesis of a DNA fragment only two nucleosides in length was a substantial achievement (7). This situation changed dramatically in the early 1970s with the introduction of solid-phase synthetic methods, similar conceptually to those described above for peptide synthesis.

As in peptide synthesis, the oligonucleotide to be made is built up on the solid support one nucleoside at a time in a repeated series of chemical reactions using protected nucleoside precursors (Figure 4). The first nucleoside is attached to the support with a cleavable linking group and protected at the 5'-hydroxyl with an acid-labile protecting group (see Figure 4 for the numbering system used on the sugar ring of nucleosides). In each cycle of the process, the acid-labile protecting group is removed and the next monomer is activated and reacted with the now free 5'-hydroxyl to form a phosphite triester linkage, which is then oxidized to give the more stable phosphotriester linkage. The support is then washed, and the acid-labile group on the newly added monomer is removed prior to the next addition. Thus any desired oligonucleotide sequence is built up on the support by sequential coupling with the various monomers in the appropriate order. At the conclusion of these reactions, the oligonucleotide is cleaved from the support and the protecting groups on the bases and phosphates are removed to give the desired oligonucleotide product.

The overall yield is the product of the yields at each coupling step; the highly optimized chemistry currently used has typical stepwise yields exceeding 99%, leading to a high recovery of the correct product, even for fairly long oligonucleotide sequences.
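Two ideas from this section lend themselves to a short sketch: the base-pairing rule that underlies hybridization, and the order in which monomers are added during solid-phase synthesis. Because each new monomer reacts with the free 5'-hydroxyl of the support-bound chain, an oligonucleotide written 5' to 3' is assembled starting from its 3' end. The function names below are illustrative assumptions, and only the four DNA bases are handled.

```python
from typing import List, Tuple

# Watson-Crick pairing for DNA: A with T, G with C.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _validated(seq: str) -> str:
    seq = seq.upper()
    if not seq or any(base not in COMPLEMENT for base in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return seq


def reverse_complement(seq: str) -> str:
    """Return the strand (written 5' to 3') that hybridizes to seq."""
    seq = _validated(seq)
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def coupling_order(seq: str) -> Tuple[str, List[str]]:
    """Return (support-bound 3' nucleoside, monomers in order of addition).

    seq is written 5' to 3'; the chain grows toward the 5' end.
    """
    seq = _validated(seq)
    return seq[-1], list(reversed(seq[:-1]))


if __name__ == "__main__":
    target = "GATTACA"
    print("probe for target:", reverse_complement(target))
    first, monomers = coupling_order(target)
    print("on support:", first, "then add:", " ".join(monomers))
```

A probe designed to detect a given target strand would be synthesized as the reverse complement of that target, with the couplings carried out in the order returned by `coupling_order`.
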
When necessary, the product is further purified, either by preparative gel electrophoresis or by high-performance liquid chromatography (HPLC). As in the case of the synthesis of peptides, the automation of DNA synthetic chemistry was essential for its widespread dissemination in the research community. Conceptually, the instruments used in automated DNA synthesis are very similar to those used in peptide synthesis. The major differences in the instruments stem from the different requirements of the chemistries. The DNA chemistry is somewhat simpler to automate than the peptide chemistry, because no solvent exchange or concentration is necessary, and the support used in DNA synthesis (controlled-pore glass) is mechanically more stable than the resin supports used in peptide synthesis.

Figure 4. The chemistry of solid-phase DNA synthesis. (Adapted with permission from Reference 1.)

A block diagram of one of the first prototype automated DNA synthesizers is shown in Figure 5 (8). In this early instrument, the activation of the nucleoside phosphoramidite with tetrazole (see Figure 4) takes place in a prereactor vessel. The activated compound is then delivered to the reactor vessel for coupling to the support-bound oligonucleotide. After the subsequent washing and oxidation steps, the dimethoxytrityl (DMT) group protecting the 5'-hydroxyl is removed with a mild acid and collected in a fraction collector. The DMT cation is highly colored, and therefore the reaction may be followed by spectrophotometric quantitation of the released trityl group. In newer instruments, the prereactor vessel is dispensed with, and activation of the phosphoramidite occurs in the reactor vessel itself, simplifying the instrument geometry.

Figure 5. Block diagram of an instrument for automated DNA synthesis.

Although this technology is effective and is widely used, much room remains for further innovation, primarily in the development of new chemistries. For example, there are many cases in which the ability to chemically synthesize RNA molecules of a defined sequence would be invaluable. To date, RNA synthetic chemistry has lagged far behind DNA chemistry because of the difficulties in protecting the 2'-hydroxyl group of a ribonucleoside without adversely affecting reactivity at the adjacent 3'-hydroxyl and because of the far greater chemical instability of RNA (9). Once these hurdles are overcome, however, the automation of the chemistry, presumably on instruments similar to those used in automated DNA synthesis, will make it widely available to the research community.

Another area of great interest is the devising of chemical procedures for synthesis of uncharged polynucleotide analogues (10). Such molecules are able to pass through cell membranes because of their nonpolar character, and once inside the cell, they are able to bind specifically to a complementary RNA molecule and thereby block synthesis of the corresponding protein. The lack of electrostatic repulsion increases the binding strength of the interaction, potentially allowing relatively small amounts of material to be employed. There are many medical and research applications for such techniques, although to date work done in this area still suffers from several shortcomings.

A third area of interest involves development and application of new procedures for "molecular engineering" of the chemical properties of the synthetic DNA strands. Examples of this include the attachment of fluorescent dyes for use in automated DNA sequencing (see below); DNA cleaving agents for the production of synthetic restriction enzymes, that is, tailor-made molecules that can cut DNA at any desired site determined by the oligonucleotide sequence; intercalating dyes for increasing hybridization strengths of oligonucleotides; crosslinking molecules to covalently join the two strands of DNA in a duplex; and enzymes for nonisotopic detection purposes. In a general sense, custom-designed molecules with specific recognition capabilities as well as desired functional moieties are very exciting and powerful tools. This is an area of great activity, and many other creative and important applications of such techniques are being developed.

The previous sections described the automated chemistries for the syntheses of protein and DNA molecules. The analytical counterparts of these methods are the techniques used to determine DNA and protein structure; for these linear polymers, structure determination is equivalent to determining the linear order, or sequence, of the amino acids (in proteins) or nucleosides (in DNA). The following sections describe the chemistries and instrumentation used in DNA and protein sequence analysis.

Protein sequence analysis

The most sensitive and widely used method of protein sequence analysis is based on the Edman degradation chemistry (Figure 6; Reference 11). In this method, the compound phenyl isothiocyanate (PITC; also called the Edman reagent) is reacted with the free N-terminus of a protein or peptide to give the corresponding thiourea. Subsequent treatment with acid cleaves the modified N-terminal amino acid from the protein. This material is collected and analyzed to identify the amino acid.

Figure 6. The chemistry of protein sequence analysis by the Edman method. PTH represents phenylthiohydantoin.

Although protein sequencing is considerably slower and more difficult than DNA sequencing (described below), it occupies the key niche between classical biochemistry and modern molecular biology. In many cases, protein sequencing is the only route to the cloning of a gene; determination of the protein sequence of even a short region of a protein permits oligonucleotide probes complementary to the corresponding gene to be synthesized and used in the isolation of the gene by recombinant DNA methods. Once the gene has been isolated, the full arsenal of molecular biological techniques may be brought to bear on the problem, and it is possible to quickly determine the complete protein sequence and the structure of the corresponding gene, to analyze its expression in various tissues, and to produce large quantities for study.

Instrumentation plays a key role in protein sequence analysis. There are two major aspects to the instrumentation. First, the degradation chemistry is performed automatically on a dedicated machine (12, 13). This is essential in order to be able to reproducibly and consistently manipulate the small (currently as little as 1 pmol) amounts of material being analyzed. Second, the analysis of the amino acid products is accomplished with HPLC, and the high sensitivity and separation capabilities of modern HPLC are essential to the performance of the method.

Since the development of the Edman chemistry in 1950, the instrumentation used in the sequencing chemistry has evolved considerably (Table II). The amount of protein required for sequencing has decreased from 100 nmol (5 mg of a 50,000-MW protein) to 5 pmol (250 ng), a reduction of more than 4 orders of magnitude.

Table II. Levels of sensitivity of automated protein sequencers

nmol      µg*       Instrument
100       5000      1967 Edman-Begg spinning cup
10        500       1971 Commercial spinning cup
0.1       5         1978 Modified commercial spinning cup
0.02      1         1979 Microsequencing spinning cup
0.005     0.25      1980 Gas-liquid phase sequenator

* Weight of 50,000-dalton protein, assuming 100% sequenceable material.

The most sensitive instruments currently available are variations on the basic design used in the gas-liquid solid-phase protein sequencer (13). In this instrument, the protein to be sequenced is spotted on a quaternary ammonium-derivatized polymeric support (called Polybrene). The protein associates with the support through ionic interactions. This limits the types of solvents that may be used in protein sequencing to those that are fairly nonpolar and will not cause elution of the protein. The gas-phase instruments satisfy this constraint by delivering both the base used in the coupling reaction and the acid used in the cleavage reaction in the gas phase. Thus these reagents can participate in the process effectively, without removing the protein from the support.

A block diagram of the gas-liquid solid-phase protein sequenator is shown in Figure 7 (13). The protein or peptide sample to be sequenced is applied to a Polybrene-impregnated glass fiber disk, which is inserted into the reaction chamber cartridge of the sequenator. The reagents (R1-R4 in Figure 7) and solvents (S1-S4) are delivered to the cartridge under the control of a microprocessor. After the acid cleavage step, in which the N-terminal amino acid derivative is cleaved from the protein, the amino acid is washed into the conversion flask, where it is converted into a more stable derivative in a second acid treatment. This material, in turn, is rinsed into the fraction collector, where it is collected for subsequent HPLC analysis.

Figure 7. Block diagram of an instrument for automated protein sequence analysis. (Adapted with permission from Reference 13.)

These instruments are the current state of the art in protein sequence analysis, and they constitute an extremely powerful and important technology. Nonetheless, there is much room for further improvement of protein sequencing (14), particularly in two areas. First, although the sensitivity of the current approach is extremely high, it is insufficient for many important applications. Many of the most interesting and important proteins are present in biological systems only in trace amounts. These include many of the proteins important in gene regulation, cellular development and differentiation, and the control of intracellular processes such as cell division and growth control. In these cases, the researcher often is limited to identification of the protein by its mobility on a gel and is hampered by a lack of structural information and an inability to obtain sufficient material for further study. Ideally, one would like to have sufficient sensitivity in protein sequencing to allow analysis of any spot that one could detect on a gel. On a two-dimensional gel, spots corresponding to more abundant proteins are present in amounts of about 1 pmol, and many proteins of great interest are present at one-hundredth or one-thousandth of that level. If it were possible to obtain sequence information on these proteins, the impact on biology and medicine would be tremendous, and we would see the establishment of new fields of research.

Three factors limit the sensitivity of protein sequencing to its current level (1-10 pmol). These are the lack of techniques for handling small amounts of protein without losses and chemical modification; the intrinsic detectability of the PTH amino acids, which have relatively low extinction coefficients and are detected by their absorption of light in the UV (a wavelength region prone to background noise); and the "chemical noise," visible as a background of peaks on the HPLC chromatograms on which PTH amino acids are analyzed. One approach to improving protein sequencing analysis involves redesign of the support chemistry, degradation chemistry, detection methods, and instrumentation to increase sensitivity by further orders of magnitude (14).
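The sample amounts quoted in this section, and the two numeric columns of Table II, are related by a simple unit conversion: an amount in nanomoles multiplied by the molecular weight in daltons gives a mass in nanograms. The sketch below checks the 50,000-dalton figures used in the text; the function names are illustrative assumptions.

```python
import math


def mass_in_micrograms(amount_nmol: float, molecular_weight_da: float) -> float:
    """Convert a molar amount to mass.

    nmol x (g/mol) gives nanograms; dividing by 1000 gives micrograms.
    """
    return amount_nmol * molecular_weight_da / 1000.0


def orders_of_magnitude(initial: float, final: float) -> float:
    """Base-10 logarithm of the improvement factor initial / final."""
    return math.log10(initial / final)


if __name__ == "__main__":
    mw = 50_000  # daltons, as assumed in the Table II footnote
    for nmol in (100, 10, 0.1, 0.02, 0.005):
        print(f"{nmol:>7} nmol -> {mass_in_micrograms(nmol, mw):g} micrograms")
    # 100 nmol down to 5 pmol (0.005 nmol)
    print(f"improvement: {orders_of_magnitude(100, 0.005):.1f} orders of magnitude")
```

On the same basis, a 1-pmol gel spot of a 50,000-dalton protein corresponds to 50 ng of material, and one-thousandth of that level to 50 pg.
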


A second major area in protein sequencing that can be improved is the amount of sequence information obtained in a given analysis. The chemistry of protein sequence analysis is not perfect; incomplete reactions at each step of the process result in an increasing accumulation of background noise in the chromatograms as the sequencing run progresses. Typically, for a high-sensitivity analysis, fewer than 10 sequential amino acids are obtained in a given analysis, although in lower sensitivity applications as many as 100 may be obtained. If it were possible to improve the procedure so that a significant portion or even all of a protein sequence could be determined at one time, the procedure would have many important applications. In principle, mass spectrometry is one way in which this could be achieved, and indeed it is the only other practical method for protein sequence analysis (15-17).

DNA sequence analysis

DNA sequence analysis is one of the pivotal methods of modern molecular biology. It plays a central role in virtually every project that involves the cloning, characterization, and manipulation of RNA or DNA. Since the development of rapid sequencing techniques in the early 1970s, more than 14 million bases (the current amount of sequence in GENBANK, the U.S. repository of sequence data) of DNA have been sequenced by manual techniques, and this database is growing at a rate of over 30% per year. This DNA sequencing has all been accomplished with either of two existing techniques: the enzymatic method of sequencing, developed by Sanger and Coulson (18), and the chemical degradation method, developed by Maxam and Gilbert (19). In both methods, sets of radiolabeled DNA fragments with a common origin, but terminating at a particular base or bases, are produced in the sequencing reactions. The two methods differ in the means by which the DNA fragments are produced. The enzymatic method is diagrammed in Figure 8. In this procedure, a cloned copy of the DNA region to be sequenced is used as the template for an enzymatic reaction that copies the DNA sequence into a new DNA strand. In this enzymatic reaction, a modified nucleoside lacking a 3'-hydroxyl group is included in the reaction mixture (see Figure 4 for the numbering system used in nucleosides). Whenever this nucleoside analogue is inserted by the enzyme in place of the normal nucleoside, the synthesis is terminated at that position because of the lack of an available 3'-hydroxyl to prime the subsequent addition (hence this method is also called the chain termination

Figure 8. Schematic diagram of an A reaction in DNA sequencing by the enzymatic method. The asterisk represents a fluorescent dye attached to the primer oligonucleotide for detection in automated sequence analysis.

method of sequence analysis). Four separate reactions are performed, each one using a chain-terminating analogue for one of the four bases: adenine (A), cytosine (C), guanine (G), and thymine (T). This results in four collections of fragments, and each set contains fragments terminating at only one of the four bases (Figure 8). These fragment sets are loaded in adjacent lanes of a high-resolution polyacrylamide slab gel and separated by electrophoresis, and a film image is then obtained by autoradiography (Figure 9). Analysis of the autoradiogram, either manual or with the aid of a computer digitization pad, yields the DNA sequence of the unknown region. In the chemical degradation method, the DNA fragment to be sequenced is radioactively labeled at one end and degraded in base-specific chemical reactions. Typically, four reactions are performed, giving rise to strand cleavage at G, G & A, C & T, and C. In each reaction a particular base or bases are modified chemically; subsequent treatment with piperidine leads to cleavage of the DNA strand at the site of modification. Separation and analysis are performed in the same manner as in enzymatic sequencing. The largest sequencing projects performed to date—the analysis of the Epstein-Barr viral genome (172 kb in Reference 20) and the analysis of the phage lambda genome (48 kb in Reference 21)—have both predominantly employed the enzymatic method, because it is generally more rapid and better suited to high-volume sequencing. Whichever method is used, it is necessary to sequence each region at least twice, preferably on opposite strands, to obtain accurate data. Although these methods are extremely powerful, they suffer from several limitations. The radioisotopes used for detection are hazardous, un-

388 A · ANALYTICAL CHEMISTRY, VOL. 60, NO. 6, MARCH 15, 1988

stable, expensive, and difficult to obtain in many parts of the world. They may be employed only by highly trained personnel, and their disposal is becoming a problem of increasing magnitude. The procedures are repetitive, yet sufficiently technical that they require highly skilled personnel—typically, graduate students and postdoctoral fellows. For a sequencing project of a typical size, say the analysis of a 10,000 base pair DNA fragment, a year or more of dedicated work is often required to establish the DNA sequence. We have only begun to scratch the surface of the tremendous amount of important biological information present in the genomes of the organisms we study in biology. As an example, the human genome alone contains 3 billion nucleosides of DNA—more than 200 times the total amount of published sequence information acquired in the history of science. There is a tremendous need to improve the methods of DNA sequence analysis. Given the huge amount of sequence data needed, it is clearly essential to devise automated, computerized methods of sequencing that, as much as pos-

Figure 9. Blowup of a section of an autoradiogram produced in manual DNA sequence analysis by the enzymatic method. Each band on the film corresponds to a population of DNA molecules of a given length produced in the sequencing reactions. The sequence is determined from the lane in which a band of a given length is observed; the order of the bands corresponds to the DNA sequence.

sible, eliminate the need for highly skilled researchers. During the past four years, we have developed one approach to automated sequence analysis (Figure 10; 22, 23). In this approach, radioisotopic detection is replaced by fluorescence detection. Four different fluorophores are employed, one for each of the four sequencing reactions used in the enzymatic sequencing method. The fluorophores are attached to the DNA fragments by using chemical DNA synthesis to prepare dye-oligonucleotide primers (22). When these dye primers are used in the enzymatic sequencing reactions in place of the normal oligonucleotide primer, the dye is attached to the polynucleotide products of the sequencing reactions (Figure 8). A different dye is used in each of the four reactions A, C, G, and T. After the reactions are complete, the products of each are combined and co-electrophoresed on a single lane of a polyacrylamide gel. A high-sensitivity fluorescence detector positioned near the bottom of the gel detects the fluorescent bands of DNA as they pass by during electrophoresis and determines their color. The DNA sequence is encoded in the temporal order of the bands of different color as they pass through the detector. The data acquired by the detector are stored and analyzed by a microcomputer interfaced to the detector to yield the DNA sequence.

The prototype of this instrument used a single-tube polyacrylamide gel and was adequate to prove the principle of the multiple-fluorophore approach to automated sequence analysis. However, for widespread use it was very important to increase the throughput of the instrument. We collaborated with Applied Biosystems, Inc., to develop a commercial version of this machine (24). The commercial instrument uses a slab gel configuration similar to that used in conventional DNA sequencing. (Since then, another slab gel-based instrument has been developed by workers at Du Pont [25].) The key to the successful use of a slab gel was the design of an optical system that would allow sensitive fluorescence detection over the extended surface of the polyacrylamide gel. This was accomplished by introducing the excitation beam of light into the gel at the Brewster angle to minimize reflected light from the surface, by miniaturizing the optics, and by mounting the entire input and output optics on a mechanically translatable stage that continuously scans the gel along a horizontal line (Figure 11). The specifications of this instrument are as follows: 300 bases per lane, a 12-h run, 16 lanes, 98% accuracy, and a theoretical capacity of 10,000 bases per day. In practice, one sequencer operator has been able to routinely obtain more than 30,000 bases per week with five overnight sequencings (26). These specifications represent a vast improvement over what is possible using manual techniques.

Nonetheless, this technology is still very young, and a great deal of work must be done to develop a practical capability for efficient, large-scale, automated DNA sequence analysis. This work falls into three major categories (for a more complete discussion, see References 27 and 28): the front end, that is, the process of preparing the DNA to be sequenced and performing the sequencing reactions themselves; the separation and analysis, which is the aspect of the DNA sequencing problem addressed by the automated DNA sequencer described above; and the back end, that is, the problem of handling, cross-checking, and accessing the tremendous amounts of data that will be produced by efficient large-scale sequencing operations. Currently there is much international and domestic interest in the "Human Genome Initiative," which is the undertaking of the sequence analysis of the 3 billion nucleotides of DNA in the human genome. This interest, and the increased funding available because of it, is serving as a focus for the development of the technologies for large-scale sequencing.

Figure 10. Block diagram of the strategy used in automated fluorescence-based DNA sequence analysis by the enzymatic method. (a) Schematic diagram of the prototype apparatus. (b) Idealized representation of the data and the manner in which the measured fluorescence corresponds to the DNA sequence.

Figure 11. Block diagram of a commercially available automated DNA sequence analysis instrument.

Conclusions

In this article four complementary technologies have been described. These technologies give the scientist the ability to manipulate and analyze the fundamental constituents of biological systems. Each of these tools is independently very powerful; however, they make the biggest impact when they are used in a synergistic fashion, as illustrated in Figure 12. These methods make it possible to go from the DNA level to the protein level and back in the study of any biological problem of interest. For example, if one has isolated a small amount of an important protein, it is possible to sequence it, to use the sequence information to predict the DNA sequence of the corresponding gene, to synthesize an oligonucleotide of that DNA sequence, and to use this oligonucleotide to detect this gene in a recombinant DNA library and thereby isolate the gene. Conversely, if one has isolated a gene of interest but does not know anything about the protein it encodes, it is possible to sequence the DNA, chemically synthesize peptides corresponding to the protein produced by the gene, use these peptides to prepare monoclonal antibodies against the protein, and use these antibodies to detect the protein in the organism from which it is derived.

These tools give the researcher a tremendous flexibility in investigating problems in biology and medicine. This is reflected in the explosive growth of

References

(1) Hunkapiller, M.; Kent, S.; Caruthers, M.; Dreyer, W.; Firca, J.; Griffin, C.; Horvath, S.; Hunkapiller, T.; Tempst, P.; Hood, L. Nature 1984, 310, 105-11.
(2) Merrifield, R. B. J. Am. Chem. Soc. 1963, 85, 2149-53.
(3) Kent, S.; Clark-Lewis, I. In Synthetic Peptides in Biology and Medicine; Alitalo, K.; Partanen, P.; Vaheri, A., Eds.; Elsevier Science Publishers: New York, 1985; pp. 29-57.
(4) Clark-Lewis, I.; Aebersold, R.; Ziltener, H.; Schrader, J. W.; Hood, L. E.; Kent, S.B.H. Science 1986, 231, 134-39.
(5) Watson, J. D.; Crick, F.H.C. Nature 1953, 171, 737-38.
(6) Atkinson, T.; Smith, M. In Oligonucleotide Synthesis: A Practical Approach; Gait, M. J., Ed.; IRL Press: Oxford, 1984; pp. 35-81.
(7) Khorana, H. G.; Razzell, W. E.; Gilham, P. T.; Tener, G. M.; Pol, E. H. J. Am. Chem. Soc. 1957, 79, 1002-1003.
(8) Horvath, S. J.; Firca, J. R.; Hunkapiller, T.; Hunkapiller, M. W.; Hood, L. In Methods in Enzymology (Recombinant DNA); Wu, R., Ed., in press.
(9) Reese, C. B. Tetrahedron 1978, 34, 3143-79.
(10) Miller, P. S.; Junichi, Y.; Yano, E.; Carroll, C.; Jayaraman, K.; Ts'o, P.O.P. Biochem. J. 1979, 18, 5134-43.
(11) Edman, P. Acta Chem. Scand. 1956, 10, 761-68.
(12) Edman, P.; Begg, G. Eur. J. Biochem. 1967, 2, 80-91.
(13) Hewick, R. M.; Hunkapiller, M. W.; Hood, L. E.; Dreyer, W. J. J. Biol. Chem. 1981, 256, 7990-97.
(14) Kent, S.; Hood, L.; Aebersold, R.; Teplow, D.; Smith, L.; Farnsworth, V.; Cartier, P.; Hines, W.; Hughes, P.; Dodd, C. BioTechniques 1987, 5, 314-21.
(15) Hunt, D. F.; Yates, J. R. III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-37.
(16) Biemann, K. Anal. Chem. 1986, 58, 1288 A-1300 A.
(17) Biemann, K.; Scoble, H. A. Science 1987, 237, 992-98.
(18) Smith, A.J.H. Methods Enzymol. 1980, 65, 560-80.
(19) Maxam, A. M.; Gilbert, W. Methods Enzymol. 1980, 65, 499-560.
(20) Baer, R.; Bankier, A. T.; Biggin, M. D.; Deininger, P. L.; Farrell, P. J.; Gibson, T. J.; Hatfull, G.; Hudson, G. S.; Satchwell, S. C.; Séguin, C.; Tuffnell, P. S.; Barrell, B. G. Nature 1984, 310, 207-11.
(21) Sanger, F.; Coulson, A. R.; Hong, G. F.; Hill, D. F.; Petersen, G. B. J. Mol. Biol. 1982, 162, 729-73.
(22) Smith, L. M.; Fung, S.; Hunkapiller, M.; Hunkapiller, T.; Hood, L. E. Nucleic Acids Res. 1985, 13, 2399-2412.
(23) Smith, L. M.; Sanders, J.; Kaiser, T.; Hughes, P.; Dodd, C.; Hood, L. E. Nature 1986, 321, 674-79.
(24) Connell, C.; Fung, S.; Heiner, C.; Bridgham, J.; Chakerian, V.; Heron, E.; Jones, B.; Menchen, S.; Mordan, W.; Raff, M.; Recknor, M.; Smith, L.; Springer, J.; Woo, S.; Hunkapiller, M. BioTechniques 1987, 5, 342-48.
(25) Prober, J. M.; Trainor, G. L.; Dam, R. J.; Hobbs, F. W.; Robertson, C. W.; Zagursky, R. J.; Cocuzza, A. J.; Jensen, M. A.; Baumeister, K. Science 1987, 238, 336-41.
(26) Gocayne, J.; Robinson, D.; Fitzgerald, M.; Chung, F.-Z.; Kerlavage, A.; Lentes, K.-U.; Lai, J.; Wang, C.-D.; Fraser, C.; Venter, J. Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 8296-8300.
(27) Smith, L.; Hood, L. E. Biotechnology 1987, 5, 933-39.
(28) Hood, L. E.; Hunkapiller, M. W.; Smith, L. M. Genomics, in press.

Lloyd M. Smith is an assistant professor of chemistry at the University of Wisconsin-Madison. He received an A.B. degree in biochemistry from the University of California at Berkeley (1977) and a Ph.D. in biophysics from Stanford University (1981). As a senior research fellow at the California Institute of Technology, he was the primary developer of the automated DNA sequencing technique described briefly in this article. His research interests include automated DNA sequencing, high-sensitivity fluorescence-based protein sequencing, and DNA-based diagnostic methods.

Figure 12. Schematic illustration of the synergism of automated DNA and protein synthesis and sequence analysis.
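The two routes described in the conclusions, from protein sequence to a DNA probe and from DNA sequence to the encoded protein, can be illustrated with a minimal sketch. It assumes the standard genetic code and an in-frame coding sequence, and it ignores practical matters such as probe design rules and codon usage. The function names (`translate`, `codons_for`, `probe_degeneracy`) are hypothetical and chosen for this example only. The degeneracy count shows why predicting a gene sequence from a peptide generally yields a family of candidate oligonucleotides, not a single one.

```python
from itertools import product
from math import prod

# Standard genetic code, written in the conventional T, C, A, G order
# for the first, second, and third codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codons_for(amino_acid: str) -> list:
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def probe_degeneracy(peptide: str) -> int:
    """Count the distinct DNA sequences that could encode the peptide."""
    counts = [len(codons_for(aa)) for aa in peptide.upper()]
    if 0 in counts:
        raise ValueError("peptide contains an unrecognized amino acid code")
    return prod(counts)


if __name__ == "__main__":
    print(translate("ATGTGGTAA"))   # gene -> protein
    print(codons_for("L"))          # the six leucine codons
    print(probe_degeneracy("MWL"))  # protein -> number of candidate probes
```

Peptide stretches rich in methionine and tryptophan, which have a single codon each, give the least degenerate probes under this model.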

the biotechnology industry during the past decade and in the increasingly commonplace production of novel hormones, pharmaceuticals, and genetically engineered organisms.

As powerful as these technologies are, much work must be done in the area where modern biology and the fields of chemistry and instrumentation development overlap. Both for the improvement of the existing chemical procedures and instruments and in the conception and development of radical new approaches to problem solving in biology, a new interdisciplinary approach is becoming increasingly important. Workers in this area must be familiar with fields as disparate as organic chemistry, electronics, computer science, and molecular biology. This merging of fields has already had a tremendous impact on modern science, and much more lies ahead.
