Automated synthesis and sequence analysis of biological

Lloyd M. Smith. Anal. Chem. , 1988, 60 (6), ... Karen C. Waldron , Shaole Wu , Colin W. Earle , Heather R. Harke , Norman J. Dovichi. Electrophoresis ...

Lloyd M. Smith

Department of Chemistry
University of Wisconsin-Madison
Madison, Wis. 53706

ANALYTICAL CHEMISTRY, VOL. 60, NO. 6, MARCH 15, 1988
0003-2700/88/0360-381A/$01.50/0 © 1988 American Chemical Society

The traditional distinctions between the fields of physics, chemistry, and biology have blurred with time. As the important questions in biological research have become increasingly detailed and molecular in nature, the techniques needed to answer them have drawn increasingly on principles and methods usually ascribed to physics and chemistry. This fusion has produced the instruments and chemistries that constitute the technological foundations of modern biology and that are critical components of the new methods responsible for its explosive growth during the last decade. Many of these instruments, such as microscopes and spectrophotometers, have existed for decades; however, technological advances such as the use of imaging methods in NMR have greatly expanded their power and versatility. In the past several years, a new generation of instruments, whose everyday use has had revolutionary consequences, has come into existence. Central among these are the instruments concerned with the synthesis and sequence analysis of the two major biopolymers, protein and DNA (1). This article describes these instruments, the chemistries on which they are based, and some of their manifold applications.

Protein-peptide synthesis

Protein-peptide synthetic methods are important tools in modern biological research. Small peptide fragments corresponding to a given region of a protein may be chemically synthesized in quantity and used as immunogens for the preparation of antipeptide antibodies. In many cases, these antibodies will react with the same peptide region in the intact protein, and the scientist will thereby have generated with relative ease a powerful and versatile reagent for the isolation, purification, and study of the protein molecule. Peptide hormones may be chemically synthesized for use as pharmaceutical agents or as research tools to study the relationship between structure and function in the intact molecule. Synthetic peptides are also important in vaccine development, where they permit the rapid investigation of the immunogenicity of different antigens and regions of a given protein.

The synthesis of a molecule as complex as a protein by solution organic chemistry is difficult. Proteins are linear polymers composed of 20 different amino acid components that vary greatly in solubility, charge, size, and the presence of reactive substituents requiring protection during synthesis. This heterogeneity in the components imparts a similar diversity to the proteins, which complicates attempts to purify the molecules at each step of synthesis. The major breakthrough permitting the chemical synthesis of peptides and small proteins was made by Merrifield in work for which he recently received the Nobel prize (2). The key innovation was the idea of synthesizing the protein molecule attached to a polymeric support. This allows the intermediates in a protein synthesis to be purified easily at each step simply by washing and filtration of the particles.

A block diagram of one major strategy used in solid-phase peptide synthesis is shown in Figure 1 (3). In such procedures an insoluble polymeric support is functionalized by the introduction of a reactive group. The C-terminal amino acid, protected

at the N-terminus, is covalently coupled to the support at the carboxyl group. The N-terminal protecting group of this first support-bound amino acid is removed by treatment with trifluoroacetic acid (TFA), and an amide bond is formed with the activated carboxyl of a second N-protected amino acid. After appropriate washing steps, the N-terminal amino-protecting group on the newly added amino acid is again removed with TFA, and the cycle is repeated with subsequent amino acids. Any reactive groups on the amino acid substituents are protected during the synthesis with groups that are not removed by TFA, but only by a later treatment with hydrofluoric acid (HF). Thus the protected polypeptide chain is built up systematically on the polymeric support. When the synthesis is complete, the peptide is cleaved from the support, the remaining protecting groups are removed by treatment with HF, and the free peptide is isolated.

The process of solid-phase peptide synthesis, as described above, is essentially a repeated series of treatments of the support particles with a number of different solutions and solvents, interspersed with appropriate washing, filtration, and, in some cases, heating steps. Although the process itself is not difficult, it is tedious, repetitive, and time-consuming, and close attention must be paid to chemical procedures, many of which are unfamiliar to most biologists interested in using the technology. It was therefore very important that the process be automated so that it would be widely available to the research community. Several commercial instruments for peptide synthesis are currently available, and in a general sense they are fairly similar.
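The repetitive cycle described above lends itself to a short illustration. The following Python sketch is only a schematic of the order of operations given in the text (couple, deprotect with TFA, wash, repeat, cleave with HF); the function name `spps_steps` and the wording of the step strings are assumptions made for this example and do not correspond to the control software of any instrument.

```python
from typing import List


def spps_steps(sequence: List[str]) -> List[str]:
    """List the steps for building `sequence` (written N-terminus first)
    on a solid support, following the cycle described in the text.

    Hypothetical helper for illustration only.
    """
    # Synthesis proceeds from the C-terminal residue toward the N-terminus.
    residues = list(reversed(sequence))
    steps = [f"couple Boc-{residues[0]} to functionalized support via carboxyl group"]
    for amino_acid in residues[1:]:
        steps += [
            "deprotect N-terminus with TFA",
            "wash",
            f"couple activated Boc-{amino_acid}",
            "wash",
        ]
    steps += [
        "cleave from support and remove remaining protecting groups with HF",
        "isolate free peptide",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(spps_steps(["Gly", "Ala", "Leu"]), start=1):
        print(number, step)
```

Each added residue contributes the same four operations, which is why the chemistry is tedious by hand and well suited to automation.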
Figure 1. The chemistry of solid-phase peptide synthesis. Steps labeled in the scheme: functionalize, deprotect, neutralize, and cleave. (Adapted with permission from Biomedical Polymers; Goldberg, E. P.; Nakajima, A., Eds.; Academic, 1980.)

Figure 2. Block diagram of an instrument for automated peptide synthesis. The following abbreviations are used in this diagram: TFA (trifluoroacetic acid), DIEA (diisopropylethylamine), DCC (dicyclohexylcarbodiimide), HOBT (hydroxybenzotriazole), BOC-AA (butyloxycarbonyl amino acid), and DMF (dimethylformamide). Numbers 1-6 denote valve blocks. (Adapted with permission from Reference 3.)

A peptide synthesis instrument consists of a computer, which controls the instrument and may be programmed with the amino acid sequence to be synthesized; a set of chemicals and solvents, including the 20 or more suitably protected amino acids; a valving system for controlling the delivery of various solutions and solvents; one or more reaction chambers and an interconnecting tubing network; and a physical structure to house these subsystems. A block diagram of one such instrument is shown in Figure 2 (3). This instrument has three vessels in which the important reaction steps occur. In the activator vessel, the activated form of the protected amino acid to be coupled is prepared in a solution of dichloromethane (DCM). This solution is transferred to the concentrator vessel, where the DCM is evaporated with applied heat and replaced with a smaller volume of dimethylformamide (DMF). This solution is then transferred to the

reaction vessel and reacted with the free N-terminus of the support-bound peptide being synthesized. This N-terminus had been automatically deprotected using TFA in a separate series of steps. In each cycle of the synthesis, a small sample of the deprotected resin is automatically removed for subsequent determination of the level of free amino groups present on the support, thus permitting the efficiency of the synthesis to be monitored.

A key feature of this process, and indeed of any support-based chemical process of this type, is the importance of high yield in each step of the solid-phase synthesis. Table I shows the theoretical yields of product expected for peptides of different lengths with stepwise yields of either 99.8% or 96.0%. To obtain reasonable overall yields, it is critical to have very high efficiency reactions at each step of the synthesis. Using an optimized chemistry with an average yield of 99.4% per step, it was possible to chemically synthesize the hormone interleukin-3, which is 140 amino acids in length (4). This achievement represents the current state of the art in chemical protein synthesis.

Current research on chemical peptide synthesis is focused on the following: further optimizing the chemistry, to permit the synthesis of longer and longer protein molecules in good yield; learning how to fold the resultant proteins into their biologically active conformations (a nontrivial task for the larger molecules); and developing novel amino acid analogues that contain desired functional groups or detectable moieties that may be placed in the protein at any desired site. Site-specific protein modification of this kind is particularly important because it is impossible to achieve in any other manner.
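The dependence of overall yield on stepwise yield discussed above is simple compounding, and a few lines of Python make the point concrete. This is a minimal sketch under one stated assumption: a peptide of n residues is taken to require n - 1 coupling steps, since the first residue is already attached to the support. The function name `overall_yield` is introduced here for illustration.

```python
def overall_yield(n_residues: int, stepwise_yield: float) -> float:
    """Theoretical overall yield for a chain of `n_residues`.

    Assumption for this sketch: the first residue is already on the
    support, so a chain of n residues needs n - 1 couplings.
    """
    couplings = n_residues - 1
    return stepwise_yield ** couplings


if __name__ == "__main__":
    for n in (11, 21):
        low = overall_yield(n, 0.960)
        high = overall_yield(n, 0.998)
        print(f"{n:4d} residues: {low:6.1%} at 96.0% per step, {high:6.1%} at 99.8% per step")
    # The 140-residue synthesis at an average of 99.4% per step quoted in the text:
    print(f"140 residues at 99.4% per step: {overall_yield(140, 0.994):.1%}")
```

Under this assumption an 11-residue peptide made at 96.0% per step is recovered in about 66% yield, while the 140-residue synthesis at 99.4% per step still gives a substantial fraction of correct product.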

Table I. Calculated overall yields in peptide synthesis

                     Overall yields (%)
No. of residues      For 96.0% yield per step      For 99.8% yield per step
11                   66                            98
21                   44                            96
31                   29                            94
51                   13                            90
101                  1.7                           82

DNA synthesis

A fundamental property of nucleic acids, and one at the heart of modern molecular biology, is hybridization, in which a DNA or RNA molecule specifically recognizes and binds to a complementary DNA or RNA sequence. This property derives from the double-stranded nature of DNA, deduced by Watson and Crick in 1953 (5). The specific base pairs formed between the purines and pyrimidines on opposite strands of nucleic acids drive the molecular recognition process (Figure 3). This phenomenon is exploited in DNA sequencing, a technique that permits the precise molecular structure of DNA molecules to be determined; in DNA cleavage and ligation, in which particular enzymes cut DNA molecules into fragments at specific locations and other enzymes allow them to be reattached in new forms; and in cloning, which allows particular genes to be selectively

isolated and produced in large quantities for detailed analysis and manipulation. Hence arises the importance of DNA synthetic chemistry. This process allows the chemical synthesis of any desired short stretch of DNA, or oligonucleotide. The ability to obtain at will any specific single-stranded DNA fragment desired, which in turn will specifically recognize only its complementary sequence and not any other, is essential to a wide variety of key molecular biological applications.
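The recognition of a complementary sequence by a synthetic oligonucleotide can be illustrated with a short Python sketch. It assumes the standard Watson-Crick pairs (A with T, G with C) and perfect matching only; real hybridization also tolerates mismatches depending on conditions, which this sketch ignores. The function names are introduced here for illustration.

```python
from typing import List

# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridization_sites(probe: str, target: str) -> List[int]:
    """Start positions (0-based) in `target` where `probe` can pair perfectly.

    The probe binds wherever the target contains the probe's reverse complement.
    """
    site = reverse_complement(probe)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(hybridization_sites("GCAT", "TTATGCTT"))
```

The longer the probe, the less likely its complement is to occur by chance elsewhere in the target, which is why short synthetic oligonucleotides can single out one site in a large DNA molecule.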

Figure 3. Structure of double-stranded DNA showing the specific [...] between the nucleoside subunits on opposite strands (base pairing).

It is possible today to routinely synthesize oligonucleotides up to 200 nucleosides in length on commercially available instruments operated by a technician. The cost (from one to several dollars per base for synthesis of one micromole of material) is reasonable, and lengths of ~20-40 bases are suitable for most applications. The instruments are available from several companies and are generally priced in the range of $20,000-$40,000. The most widely used technique involves phosphoramidite chemistry, a process that

is described below (6).

As was the case for peptides, the synthesis of a molecule as large and complex as DNA by conventional organic chemical methods is so formidable as to be effectively impossible. In 1957 the synthesis of a DNA fragment only two nucleosides in length was a substantial achievement (7). This situation changed dramatically in the early 1970s with the introduction of solid-phase synthetic methods, similar conceptually to those described above for peptide synthesis. As in peptide synthesis, the oligonucleotide to be made is built up on the solid support one nucleoside at a time in a repeated series of chemical reactions using protected nucleoside precursors (Figure 4). The first nucleoside is attached to the support with a cleavable linking group and protected at the 5'-hydroxyl with an acid-labile protecting group (see Figure 4 for the numbering system used on the sugar ring of nucleosides). In each cycle of the process, the acid-labile protecting group is removed and the next monomer is activated and reacted with the now free 5'-hydroxyl to form a phosphite triester linkage, which is then oxidized to the phosphate triester (the precursor of the natural phosphodiester linkage). The support is then washed, and the acid-labile group on the newly added monomer is removed prior to the next addition. Thus any desired oligonucleotide sequence is built up on the support by sequential coupling with the various monomers in the appropriate order. At the conclusion of these reactions, the oligonucleotide is cleaved from the support and the protecting groups on the bases are removed to give the desired oligonucleotide product. The overall yield is the product of the yields at each coupling step; the highly optimized chemistry currently used has typical stepwise yields exceeding 99%, leading to a high recovery of the correct product, even for fairly long oligonucleotide sequences.
When necessary, the product is further purified, either by preparative gel electrophoresis or by high-performance liquid chromatography (HPLC).

As in the case of the synthesis of peptides, the automation of DNA synthetic chemistry was essential for its widespread dissemination in the research community. Conceptually, the instruments used in automated DNA synthesis are very similar to those used in peptide synthesis. The major differences in the instruments stem from the different requirements of the chemistries. The DNA chemistry is somewhat simpler to automate than the peptide chemistry, because no solvent exchange or concentration is necessary, and the support used in DNA synthesis (controlled-pore glass) is mechanically more stable than the resin supports used in peptide synthesis.

Figure 4. The chemistry of solid-phase DNA synthesis. (Adapted with permission from Reference 1.)

Figure 5. Block diagram of an instrument for automated DNA synthesis.

A block diagram of one of the first prototype automated DNA synthesizers is shown in Figure 5 (8). In this early instrument, the activation of the nucleoside phosphoramidite with tetrazole occurs in a prereactor vessel. The activated compound is then delivered to the reactor vessel for coupling to the support-bound oligonucleotide. After the subsequent washing and oxidation steps, the dimethoxytrityl (DMT) group protecting the 5'-hydroxyl is removed with a mild acid and

collected in a fraction collector. The DMT cation is highly colored, and therefore the reaction may be followed by spectrophotometric quantitation of the released trityl group. In newer instruments, the prereactor vessel is dispensed with, and activation of the phosphoramidite occurs in the reactor vessel itself, simplifying the instrument geometry.

Although this technology is effective and widely used, much room remains for further innovation, primarily in the development of new chemistries. For example, there are many cases in which the ability to chemically synthesize RNA molecules of a defined sequence would be invaluable. To date, RNA synthetic chemistry has lagged far behind DNA chemistry because of the difficulties in protecting the 2'-hydroxyl group of a ribonucleoside without adversely affecting reactivity at the adjacent 3'-hydroxyl and because of the far greater chemical instability of RNA (9). Once these hurdles are overcome, however, the automation of the chemistry, presumably on instruments similar to those used in automated DNA synthesis, will make it widely available to the research community.

Another area of great interest is the devising of chemical procedures for synthesis of uncharged polynucleotide analogues (10). Such molecules are able to pass through cell membranes because of their nonpolar character, and once inside the cell, they are able to bind specifically to a complementary RNA molecule and thereby block synthesis of the corresponding protein. The lack of electrostatic repulsion increases the binding strength of the interaction, potentially allowing relatively small amounts of material to be employed. There are many medical and research applications for such techniques, although work done in this area to date still suffers from several shortcomings.

A third area of interest involves development and application of new procedures for "molecular engineering" of the chemical properties of the synthetic DNA strands.
Examples of this include the attachment of fluorescent dyes for use in automated DNA sequencing (see below); DNA cleaving agents for the production of synthetic restriction enzymes, that is, tailor-made molecules that can cut DNA at any desired site determined by the oligonucleotide sequence; intercalating dyes for increasing hybridization strengths of oligonucleotides; cross-linking molecules to covalently join the two strands of DNA in a duplex; and enzymes for nonisotopic detection purposes. In a general sense, custom-designed molecules with specific recognition capabilities as well as desired functional moieties are very exciting and powerful tools. This is an area of great activity, and many other creative and important applications of such techniques are being developed.

The previous sections described the automated chemistries for the syntheses of protein and DNA molecules. The analytical counterparts of these methods are the techniques used to determine DNA and protein structure; for these linear polymers, structure determination is equivalent to determining the linear order, or sequence, of the amino acids (in proteins) or nucleosides (in DNA). The following sections describe the chemistries and instrumentation used in DNA and protein sequence analysis.
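Before turning to sequence analysis, the trityl monitoring mentioned in the DNA synthesis section can be made concrete. The article states only that the colored DMT cation released in each cycle may be quantitated spectrophotometrically; the calculation below is an assumption of this sketch, namely that each reading is proportional to the number of chains that coupled in the preceding cycle, so the ratio of successive readings estimates the stepwise yield. Function names and the sample readings are hypothetical.

```python
from typing import List


def stepwise_yields(absorbances: List[float]) -> List[float]:
    """Estimate the coupling yield of each cycle from successive trityl readings.

    Assumes equal collection volumes and absorbance proportional to the
    amount of DMT cation released (hypothetical model for illustration).
    """
    return [later / earlier for earlier, later in zip(absorbances, absorbances[1:])]


def average_stepwise_yield(absorbances: List[float]) -> float:
    """Geometric-mean yield per cycle between the first and last readings."""
    cycles = len(absorbances) - 1
    return (absorbances[-1] / absorbances[0]) ** (1.0 / cycles)


if __name__ == "__main__":
    readings = [1.000, 0.990, 0.9801]  # made-up absorbance values
    print(stepwise_yields(readings))
    print(average_stepwise_yield(readings))
```

A sudden drop in one ratio would point to a failed coupling at that position, which is the practical value of following the trityl release cycle by cycle.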

Protein sequence analysis

Figure 6. Chemistry of protein sequence analysis. PTH represents phenylthiohydantoin.

The most sensitive and widely used method of protein sequence analysis is

based on the Edman degradation chemistry (Figure 6; Reference 11). In this method, the compound phenyl isothiocyanate (PITC; also called the Edman reagent) is reacted with the free N-terminus of a protein or peptide to give the corresponding thiourea. Subsequent treatment with acid cleaves the modified N-terminal amino acid from the protein. This material is collected and analyzed to identify the amino acid.

Although protein sequencing is considerably slower and more difficult than DNA sequencing (described below), it occupies the key niche between classical biochemistry and modern molecular biology. In many cases, protein sequencing is the only route to the cloning of a gene; determination of the protein sequence of even a short region of a protein permits oligonucleotide probes complementary to the corresponding gene to be synthesized and used in the isolation of the gene by recombinant DNA methods. Once the gene has been isolated, the full arsenal of molecular biological techniques may be brought to bear on the problem, and it is possible to quickly determine the complete protein sequence and the structure of the corresponding gene, to analyze its expression in various tissues, and to produce large quantities for study.

Instrumentation plays a key role in protein sequence analysis. There are two major aspects to the instrumentation. First, the degradation chemistry is performed automatically on a dedicated machine (12, 13). This is essential in order to be able to reproducibly and consistently manipulate the small amounts of material being analyzed (currently as little as 1 pmol). Second, the analysis of the amino acid products is accomplished with HPLC, and the high sensitivity and separation capabilities of modern HPLC are essential to the performance of the method.

Since the development of the Edman chemistry in 1950, the instrumentation used in the sequencing chemistry has evolved considerably (Table II). The amount required for sequencing has decreased from 100 nmol (5 mg of a 50,000-MW protein) to 5 pmol (250 ng), more than 4 orders of magnitude.

Table II.

Year    Instrument                              nmol      µg*
1967    Edman-Begg spinning cup                 100       5000
1971    Commercial spinning cup                 10        500
1978    Modified commercial spinning cup        0.1       5
1979    Microsequencing spinning cup            0.02      1
1980    Gas-liquid phase sequenator             0.005     0.25

*Weight of 50,000-dalton protein, assuming 100% sequenceable material.

The most sensitive instruments currently available are variations on the basic design used in the gas-liquid solid-phase protein sequencer (13). In this instrument, the protein to be sequenced is spotted on a quaternary ammonium-derivatized polymeric support (called Polybrene). The protein associates with the support through ionic interactions. This limits the types of solvents that may be used in protein sequencing to those that are fairly nonpolar and will not cause elution of the protein. This is accomplished in the gas-phase instruments by delivering both the base used in the coupling reaction and the acid used in the cleavage reaction in the gas phase. Thus these reagents can participate in the process effectively, without removing the protein from the support.

Figure 7. Block diagram of an instrument for protein sequence analysis. (Adapted with permission from Reference 13.)

A block diagram of the gas-liquid solid-phase protein sequenator is shown in Figure 7 (13). The protein or peptide sample to be sequenced is applied to a Polybrene-impregnated glass fiber disk, which is inserted into the reaction chamber cartridge of the sequenator. The reagents (R1-R4 in Figure 7) and solvents (S1-S4) are delivered to the cartridge under the control of a microprocessor. After the acid cleavage step, in which the N-terminal amino acid derivative is cleaved from the protein, the amino acid is washed into the conversion flask, where it is converted into a more stable derivative in a second acid treatment. This material, in turn, is rinsed into the fraction collector, where it is collected for subsequent HPLC analysis.

These instruments are the current state of the art in protein sequence analysis, and they constitute an extremely powerful and important technology. Nonetheless, there is much room for further improvement of protein sequencing (14), particularly in two areas. First, although the sensitivity of the current approach is extremely high, it is insufficient for many important applications. Many of the most interesting and important proteins are present in biological systems only in trace amounts. These include many of the proteins important in gene regulation, cellular development and differentiation, and the control of intracellular processes such as cell division and growth control. In these cases, the researcher often is limited to identification of the protein by its mobility on a gel and is hampered by a lack of structural information and an inability to obtain sufficient material for further study. Ideally, one would like to have sufficient sensitivity in protein sequencing to allow analysis of any spot that one could detect on a gel. On a two-dimensional gel, spots corresponding to more abundant proteins are present in amounts of about 1 pmol, and many proteins of great interest are present at one-hundredth or one-thousandth of that level. If it were possible to obtain sequence information on these proteins, the impact on biology and medicine would be tremendous, and we would see the establishment of new fields of research.
There are three basic components to protein sequencing that limit the sensitivity to its current level (1-10 pmol). These are the lack of techniques for handling small amounts of protein without losses and chemical modification; the intrinsic detectability of the PTH amino acids, which have relatively low extinction coefficients and are detected by their absorption of light in the UV (a wavelength region prone to background noise); and the “chemical noise,” visible as a background of peaks on the HPLC chromatograms on which PTH amino acids are analyzed. One approach to improving protein sequencing analysis involves redesign of the support chemistry, degradation chemistry, detection methods, and instrumentation to increase sensitivity by further orders of magnitude (14).
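The sample amounts quoted in this section are given both in moles and in mass, and the conversion between the two is a one-line calculation. The Python sketch below reproduces the figures cited in the text for a 50,000-MW protein; the function name `protein_mass_ng` is introduced here for illustration, and the only assumption is that all of the material is sequenceable.

```python
def protein_mass_ng(amount_pmol: float, mol_weight: float) -> float:
    """Mass in nanograms of `amount_pmol` picomoles of a protein.

    pmol x (g/mol) gives picograms; dividing by 1000 gives nanograms.
    """
    return amount_pmol * mol_weight / 1000.0


if __name__ == "__main__":
    mw = 50_000  # molecular weight used for the examples in the text
    print(protein_mass_ng(100_000, mw) / 1e6, "mg for 100 nmol")
    print(protein_mass_ng(5, mw), "ng for 5 pmol")
    print(protein_mass_ng(1, mw), "ng for a 1-pmol gel spot")
    print(protein_mass_ng(0.001, mw) * 1000, "pg for one-thousandth of that")
```

On this scale a 1-pmol gel spot of such a protein is only about 50 ng, and the trace proteins mentioned in the text fall in the picogram range, which conveys how little material a more sensitive sequencer would have to handle.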


A second major area in protein sequencing that can be improved is the amount of sequence information obtained in a given analysis. The chemistry of protein sequence analysis is not perfect; incomplete reactions at each step of the process result in an increasing accumulation of background noise in the chromatograms as the sequencing run progresses. Typically, for a high-sensitivity analysis, fewer than 10 sequential amino acids are obtained in a given analysis, although in lower sensitivity applications as many as 100 may be obtained. If it were possible to improve the procedure so that a significant portion or even all of a protein sequence could be determined at one time, the procedure would have many important applications. In principle, mass spectrometry is one way in which this could be achieved, and indeed it is the only other practical method for protein sequence analysis (15-17).

DNA sequence analysis

DNA sequence analysis is one of the pivotal methods of modern molecular biology. It plays a central role in virtually every project that involves the cloning, characterization, and manipulation of RNA or DNA. Since the development of rapid sequencing techniques in the early 1970s, more than 14 million bases of DNA (the current amount of sequence in GENBANK, the U.S. repository of sequence data) have been sequenced by manual techniques, and this database is growing at a rate of over 30% per year. This DNA sequencing has all been accomplished with one of two existing techniques: the enzymatic method of sequencing, developed by Sanger and Coulson (18), and the chemical degradation method, developed by Maxam and Gilbert (19). In both methods, sets of radiolabeled DNA fragments with a common origin, but terminating at a particular base or bases, are produced in the sequencing reactions. The two methods differ in the means by which the DNA fragments are produced. The enzymatic method is diagrammed in Figure 8.
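The principle shared by both methods, that fragments with a common origin are sorted by length and identified by their terminating base, can be sketched in a few lines of Python. This is an idealized model and not a description of either chemistry: it assumes one fragment set per base, perfect single-base resolution, and no missing or ambiguous bands. The function names are introduced here for illustration.

```python
from typing import Dict, List


def termination_fragments(strand: str) -> Dict[str, List[int]]:
    """For each base, list the lengths of fragments that end at that base.

    Every fragment starts at the common origin (position 1 of `strand`),
    so a fragment is fully described by its length and its last base.
    """
    sets: Dict[str, List[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(strand.upper(), start=1):
        sets[base].append(length)
    return sets


def read_sequence(sets: Dict[str, List[int]]) -> str:
    """Recover the sequence by ordering all fragments from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in sets.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    fragment_sets = termination_fragments("GATTACA")
    print(fragment_sets)
    print(read_sequence(fragment_sets))
```

Sorting by length is what the gel does physically; reading which set each successive length belongs to yields the sequence.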
Figure 8. Schematic diagram of an A reaction in DNA sequencing by the enzymatic method. The asterisk represents a fluorescent dye attached to the primer oligonucleotide for detection in automated sequence analysis.

In the enzymatic procedure, a cloned copy of the DNA region to be sequenced is used as the template for an enzymatic reaction that copies the DNA sequence into a new DNA strand. A modified nucleoside lacking a 3'-hydroxyl group is included in the reaction mixture (see Figure 4 for the numbering system used in nucleosides). Whenever this nucleoside analogue is inserted by the enzyme in place of the normal nucleoside, the synthesis is terminated at that position because of the lack of an available 3'-hydroxyl to prime the subsequent addition (hence this method is also called the chain termination

method of sequence analysis). Four separate reactions are performed, each one using a chain-terminating analogue for one of the four bases: adenine (A), cytosine (C), guanine (G), and thymine (T). This results in four collections of fragments, and each set contains fragments terminating at only one of the four bases (Figure 8). These fragment sets are loaded in adjacent lanes of a high-resolution polyacrylamide slab gel and separated by electrophoresis, and a film image is then obtained by autoradiography (Figure 9). Analysis of the autoradiogram, either manual or with the aid of a computer digitization pad, yields the DNA sequence of the unknown region.

Figure 9. Blowup of a section of autoradiogram produced in manual DNA sequence analysis by the enzymatic method. Each band on the film corresponds to a population of DNA molecules of a given length produced in the sequencing reactions. The sequence is determined from the lane in which a band of a given length is observed; the order of the bands corresponds to the DNA sequence.

In the chemical degradation method, the DNA fragment to be sequenced is radioactively labeled at one end and degraded in base-specific chemical reactions. Typically, four reactions are performed, giving rise to strand cleavage at G, G & A, C & T, and C. In each reaction a particular base or bases are modified chemically; subsequent treatment with piperidine leads to cleavage of the DNA strand at the site of modification. Separation and analysis are performed in the same manner as in enzymatic sequencing.

The largest sequencing projects performed to date, the analysis of the Epstein-Barr viral genome (172 kb; Reference 20) and the analysis of the phage lambda genome (48 kb; Reference 21), have both predominantly employed the enzymatic method, because it is generally more rapid and better suited to high-volume sequencing. Whichever method is used, it is necessary to sequence each region at least twice, preferably on opposite strands, to obtain accurate data.

Although these methods are extremely powerful, they suffer from several limitations. The radioisotopes used for detection are hazardous, unstable, expensive, and difficult to obtain in many parts of the world. They may be employed only by highly trained personnel, and their disposal is becoming a problem of increasing magnitude. The procedures are repetitive, yet sufficiently technical that they require highly skilled personnel, typically graduate students and postdoctoral fellows. For a sequencing project of a typical size, say the analysis of a 10,000 base pair DNA fragment, a year or more of dedicated work is often required to establish the DNA sequence. We have only begun to scratch the surface of the tremendous amount of important biological information present in the genomes of the organisms we study in biology. As an example, the human genome alone contains 3 billion nucleosides of DNA, more than 200 times the total amount of published sequence information acquired in the history of science.

There is a tremendous need to improve the methods of DNA sequence analysis. Given the huge amount of sequence data needed, it is clearly essential to devise automated, computerized methods of sequencing that, as much as possible, eliminate the need for highly skilled researchers.

During the past four years, we have developed one approach to automated sequence analysis (Figure 10; 22, 23). In this approach, radioisotopic detection is replaced by fluorescence detection. Four different fluorophores are employed, one for each of the four sequencing reactions used in the enzymatic sequencing method. The fluorophores are attached to the DNA fragments by using chemical DNA synthesis to prepare dye-oligonucleotide primers (22). When these dye primers are used in the enzymatic sequencing reactions in place of the normal oligonucleotide primer, the dye is attached to the polynucleotide products of the sequencing reactions (Figure 8). A different dye is used in each of the four reactions A, C, G, and T. After the reactions are complete, the products of each are combined and co-electrophoresed on a single lane of a polyacrylamide gel. A high-sensitivity fluorescence detector positioned near the bottom of the gel detects the fluorescent bands of DNA as they pass by during electrophoresis and determines their color. The DNA sequence is encoded in the temporal order of the bands of different color as they pass through the detector. The data acquired by the detector are stored and analyzed by a microcomputer interfaced to the detector to yield the DNA sequence.

Figure 10. Block diagram of the strategy used in automated fluorescence-based DNA sequence analysis by the enzymatic method. (a) Schematic diagram of the prototype apparatus. (b) Idealized representation of the data and the manner in which the measured fluorescence corresponds to the DNA sequence. Trace labels: A (green), C (yellow), G (orange), T (red).

The prototype of this instrument used a single-tube polyacrylamide gel and was adequate to prove the principle of the multiple fluorophore approach to automated sequence analysis. However, for widespread use it was very important to increase the throughput of the instrument. We collaborated with Applied Biosystems, Inc., to develop a commercial version of this machine (24). The commercial instrument uses a slab gel configuration similar to that used in conventional DNA sequencing. (Since then, another slab gel-based instrument has been developed by workers at Du Pont [25].) The key to the successful use of a slab gel was the design of an optical system that would allow sensitive fluorescence detection over the extended surface of the polyacrylamide gel. This was accomplished by introducing the excitation beam of light into the gel at the Brewster angle to minimize reflected light from the surface, by miniaturizing the optics, and by mounting the entire input and output optics on a mechanically translatable stage that continuously scans the gel along a horizontal line (Figure 11). The specifications of this instrument are as follows: 300 bases per lane, a 12-h run, 16 lanes, 98% accuracy, and a theoretical capacity of 10,000 bases per day. In practice, one sequencer operator has been able to routinely obtain more than 30,000 bases per week with five overnight sequencings (26). These specifications represent a vast im-

A tragments ctrophorese lown the gel

Figure 11. Block diagram of a comr

analysis instrument. ANALYTICAL CHEMISTRY, VOL. 60. NO. 6, MARCH 15, 1988

389A

provement over what is possible using manual techniques. Nonetheless, this technology is still very young, and a great deal of work must be done to develop a practical capability for efficient, large-scale, automated DNA sequence analysis. This work falls into three major categories (for a more complete discussion, see References 27 and 28). These are the front end, that is, the process of preparing the DNA to be sequenced and performing the sequencing reactions themselves; the separation and analysis, which is the aspect of the DNA sequencing problem addressed by the automated DNA sequencer described above; and the back end, that is, the problem of handling, cross-checking, and accessing the tremendous amounts of'data that will be produced by efficient large-scale sequencing operations. Currently there is much international and domestic interest in the "Human Genome Initiative," which is the undertaking of the sequence analysis of the 3 billion nucleosides of DNA encoded in the human genome. This interest, and the increased funding available because of it, is serving as a focus for the development of the technologies for large-scale sequencing.
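As a rough illustration of the four-dye scheme and the instrument specifications described in this section, the following Python sketch calls bases from idealized detector data and works through the throughput arithmetic. It is not the software of the instrument itself. The dye-to-base assignment follows the labels of Figure 10; the function names, the intensity threshold, the toy data, and the assumption of two back-to-back 12-h runs per day are illustrative assumptions only.

```python
# Illustrative sketch only: names, threshold, and toy data are assumptions,
# not the software of the instrument described in the article.

# Dye colour -> base, following the labels of Figure 10.
DYE_TO_BASE = {"green": "A", "yellow": "C", "orange": "G", "red": "T"}


def call_bases(bands, min_intensity=0.5):
    """Call one base per band from idealized four-colour detector data.

    `bands` is a list of dicts, one per band, in the order in which the
    bands pass the detector (shortest fragment first). Each dict maps a
    dye colour to the fluorescence intensity measured for that band.
    The brightest dye decides the base; a band whose brightest dye is
    below `min_intensity` (a hypothetical cutoff) is reported as "N".
    """
    sequence = []
    for band in bands:
        colour = max(band, key=band.get)
        if band[colour] < min_intensity:
            sequence.append("N")
        else:
            sequence.append(DYE_TO_BASE[colour])
    return "".join(sequence)


def bases_per_day(bases_per_lane=300, lanes=16, run_hours=12):
    """Theoretical daily capacity, assuming back-to-back runs all day."""
    runs_per_day = 24 / run_hours
    return bases_per_lane * lanes * runs_per_day


if __name__ == "__main__":
    toy_bands = [
        {"green": 0.9, "yellow": 0.1, "orange": 0.0, "red": 0.1},
        {"green": 0.1, "yellow": 0.8, "orange": 0.1, "red": 0.0},
        {"green": 0.0, "yellow": 0.1, "orange": 0.9, "red": 0.1},
        {"green": 0.1, "yellow": 0.0, "orange": 0.1, "red": 0.7},
        {"green": 0.2, "yellow": 0.1, "orange": 0.1, "red": 0.2},
    ]
    print(call_bases(toy_bands))       # sequence read from the toy data
    capacity = bases_per_day()
    print(capacity)                    # bases per instrument-day
    print(3e9 / capacity / 365)        # instrument-years for 3 billion bases
```

Under these assumptions the quoted specifications give 300 x 16 x 2 = 9,600 bases per day, consistent with the stated theoretical capacity of about 10,000. At that rate a single instrument would need on the order of 850 years to read 3 billion bases once, which suggests why further development of large-scale sequencing technology is needed.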

Conclusion

In this article four complementary technologies have been described. These technologies give the scientist the ability to manipulate and analyze the fundamental constituents of biological systems. Each of these tools is independently very powerful; however, they make the biggest impact when they are used in a synergistic fashion, as illustrated in Figure 12. These methods make it possible to go from the DNA level to the protein level and back in the study of any biological problem of interest. For example, if one has isolated a small amount of an important protein, it is possible to sequence it, to use the sequence information to predict the DNA sequence of the corresponding gene, to synthesize an oligonucleotide of that DNA sequence, and to use this oligonucleotide to detect this gene in a recombinant DNA library and thereby isolate the gene. Conversely, if one has isolated a gene of interest but does not know anything about the protein it encodes, it is possible to sequence the DNA, chemically synthesize peptides corresponding to the protein produced by the gene, use these peptides to prepare monoclonal antibodies against the protein, and use these antibodies to detect the protein in the organism from which it is derived. These tools give the researcher tremendous flexibility in investigating problems in biology and medicine. This is reflected in the explosive growth of the biotechnology industry during the past decade and in the increasingly commonplace production of novel hormones, pharmaceuticals, and genetically engineered organisms.

As powerful as these technologies are, much work must be done in the area where modern biology and the fields of chemistry and instrumentation development overlap. Both for the improvement of the existing chemical procedures and instruments and in the conception and development of radical new approaches to problem solving in biology, a new interdisciplinary approach is becoming increasingly important. Workers in this area must be familiar with fields as disparate as organic chemistry, electronics, computer science, and molecular biology. This merging of fields has already had a tremendous impact on modern science, and much more lies ahead.

Figure 12. Schematic illustration of the synergism of automated DNA and protein synthesis and sequence analysis.

References

(1) Hunkapiller, M.; Kent, S.; Caruthers, M.; Dreyer, W.; Firca, J.; Griffin, C.; Horvath, S.; Hunkapiller, T.; Tempst, P.; Hood, L. Nature 1984, 310, 105-11.
(2) Merrifield, R. B. J. Am. Chem. Soc. 1963, 85, 2149-53.
(3) Kent, S.; Clark-Lewis, I. In Synthetic Peptides in Biology and Medicine; Alitalo, K.; Partanen, P.; Vaheri, A., Eds.; Elsevier Science Publishers: New York, 1985; pp. 29-57.
(4) Clark-Lewis, I.; Aebersold, R.; Ziltener, H.; Schrader, J. W.; Hood, L. E.; Kent, S.B.H. Science 1986, 231, 134-39.
(5) Watson, J. D.; Crick, F.H.C. Nature 1953, 171, 737-38.
(6) Atkinson, T.; Smith, M. In Oligonucleotide Synthesis: A Practical Approach; Gait, M. J., Ed.; IRL Press: Oxford, 1984; pp. 35-81.
(7) Khorana, H. G.; Razzell, W. E.; Gilham, P. T.; Tener, G. M.; Pol, E. H. J. Am. Chem. Soc. 1957, 79, 1002-1003.
(8) Horvath, S. J.; Firca, J. R.; Hunkapiller, T.; Hunkapiller, M. W.; Hood, L. E. In Methods in Enzymology (Recombinant DNA); Wu, R., Ed., in press.
(9) Reese, C. B. Tetrahedron 1978, 34, 3143-79.
(10) Miller, P. S.; Junichi, Y.; Yano, E.; Carroll, C.; Jayarman, K.; Ts'o, P.O.P. Biochem. J. 1979, 18, 5134-43.
(11) Edman, P. Acta Chem. Scand. 1956, 10, 761-68.
(12) Edman, P.; Begg, G. Eur. J. Biochem. 1967, 1, 80-91.


Lloyd M. Smith is an assistant professor of chemistry at the University of Wisconsin-Madison. He received an A.B. degree in biochemistry from the University of California at Berkeley (1977) and a Ph.D. in biophysics from Stanford University (1981). As a senior research fellow at the California Institute of Technology, he was the primary developer of the automated DNA sequencing technique described briefly in this article. His research interests include automated DNA sequencing, high-sensitivity fluorescence-based protein sequencing, and DNA-based diagnostic methods.