Perspectives in biochemistry: Methods for DNA sequencing

Anne T. Wood
University of Puget Sound, Tacoma, WA 98416

In the early 1960's a summary of DNA's role in protein synthesis could be stated as "DNA makes RNA makes protein." A quote from a review article by Abelson (1) illustrates a 1980's version of DNA's structural role in protein synthesis. Abelson writes, "The entire mouse β-hemoglobin gene has now been sequenced. The smaller intervening sequence interrupts codons 30 and 31 and is 116 base pairs in length. The larger intervening sequence interrupts codons 104 and 105 and is 646 base pairs in length. There are 52 base pairs in the 5' untranslated portion of the gene and 110 base pairs in the 3' untranslated region." One major reason that such a detailed description of the mouse β-hemoglobin gene, and of other genes, is now available is that the relevant DNA fragments can be sequenced. Although the nucleotide sequence alone does not give a complete understanding of chromosomal structure or function, it nevertheless provides invaluable information about the chromosome.

Many introductory chemistry texts now include, at the end of the book, a chapter or two entitled "the chemistry of life" or "an introduction to biochemistry." These chapters often spark the imagination of students who have, up to that point, had a hard time seeing a "real world" application of chemistry. Some of these introductory texts include both a description of the major classes of biomolecules and an introduction to metabolism, using the citric acid cycle as an example. Given this increased interest in descriptive biochemical topics, it is predicted that introductory general chemistry texts will soon include a section on methods of DNA sequencing. These methods are conceptually elegant and lend themselves well to classroom discussion. Thus, as students learn about the chemical structure of DNA, they can also learn how DNA sequence data are obtained.
A Protein Parallel

It is interesting that the events leading to the ability to sequence nucleotide polymers have paralleled much of what occurred in protein chemistry in the 1950's. The sequencing of insulin by Sanger's group (2) began in the early 1950's and took more than 5 years. The α chain of insulin was found to have 21 amino acids and the β chain 30. A similar length of time was required by Holley and his team (3) in their effort to sequence the first polynucleotide, a 77-nucleotide polymer of yeast transfer RNA for alanine. The procedures for sequencing proteins changed drastically as chromatography techniques became more sophisticated and as the Moore and Stein amino acid analyzer (4) and the automated Edman sequential degradation technique (5) developed into common tools. What had previously taken years can now be done in a few days or weeks, provided, of course, that one has a pure protein with which to work. Large pure proteins containing 1000 or more amino acids can now be sequenced. The technique depends on the ability to cleave the large protein selectively into smaller, more workable segments. This can be accomplished by using site-specific enzymes, such as trypsin or chymotrypsin, which hydrolyze the peptide bond adjacent to a specific amino acid, or by utilizing chemical reagents such as cyanogen bromide, which are also specific for a given amino acid (methionine, in the case of cyanogen bromide).

This paper is an adaptation of a presentation made as part of the H. S. Chemistry program at the Spring 1983 ACS meeting in Seattle, WA.

Journal of Chemical Education

Unfortunately, analogous fragmenting techniques could not be applied to DNA. There were no known site-specific endonucleases (enzymes analogous to the trypsin and chymotrypsin proteases) that would hydrolyze the phosphodiester bonds within a double-stranded DNA polymer only at specified base sequences. The smallest native nucleotide polymers available for study were the bacteriophage and viral DNA's, and these polymers contain over 5,000 nucleotide base pairs. If one were to consider working with Escherichia coli, a genetically well-understood bacterium, one would have to deal with a chromosome containing an estimated 4 × 10⁶ base pairs, while the human genome contains approximately 3 × 10⁹ nucleotide base pairs.

Nearest-neighbor analysis, developed by A. Kornberg in 1961 (6), was the first attempt to obtain general data about nucleotide sequences in DNA. Exonucleases, enzymes that catalyze the hydrolysis of phosphodiester bonds located at the 3' and/or 5' ends of a DNA strand, were by that time available. Kornberg studied bacterial and bacteriophage DNA molecules that had been radioactively labeled by including an [α-³²P]deoxynucleoside triphosphate (p-p-³²p-deoxynucleoside) in the replicating medium. By using an exonuclease that hydrolyzed single nucleotides from the 3' end of a DNA strand, producing 3' nucleoside monophosphates, he was able to determine how often pairs of nucleotides occurred in a DNA polymer: because the labeled α-phosphate becomes the phosphodiester bridge between the incorporated nucleotide and its 5' neighbor, hydrolysis to 3' monophosphates leaves the ³²P on that neighbor. For the bacteria and bacteriophage he analyzed, the frequencies of the 16 possible nearest-neighbor base combinations were found to be nonrandom, but these data could not be used to obtain the actual nucleotide sequences.
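The logic of nearest-neighbor analysis can be sketched as a small simulation. This is an idealized model, not Kornberg's laboratory protocol: it assumes a single strand of known, hypothetical sequence, perfect label transfer, and frequencies normalized over all dinucleotides in the strand, and the function names are illustrative only. Each of four "experiments" labels one base Y; the ³²P recovered in each 3' monophosphate X counts the dinucleotide XpY. The example at the end shows two different sequences that give identical tables, which is why the method could not yield actual sequences.

```python
from collections import Counter

BASES = "ACGT"


def label_transfer_counts(strand: str, labeled: str) -> Counter:
    """Simulate one experiment with an [alpha-32P]-labeled dNTP (idealized).

    The labeled alpha-phosphate links the incorporated nucleotide to the
    nucleotide on its 5' side. Hydrolysis to 3' monophosphates leaves the
    32P on that 5' neighbor, so the count for each 3'-dNMP X equals the
    number of X-p-Y dinucleotides with Y = labeled.
    """
    counts = Counter()
    # strand is written 5' -> 3', so x is the 5' neighbor of y
    for x, y in zip(strand, strand[1:]):
        if y == labeled:
            counts[x] += 1
    return counts


def nearest_neighbor_table(strand: str) -> dict:
    """Run four experiments (one per labeled base): 16 dinucleotide frequencies."""
    total = len(strand) - 1  # number of phosphodiester bonds in the strand
    table = {}
    for y in BASES:
        counts = label_transfer_counts(strand, y)
        for x in BASES:
            table[x + y] = counts[x] / total  # Counter returns 0 for absent keys
    return table


if __name__ == "__main__":
    # Hypothetical sequences with the same dinucleotide content (AC, CA, AG, GA)
    s1, s2 = "ACAGA", "AGACA"
    print(s1 != s2)  # True: different sequences
    print(nearest_neighbor_table(s1) == nearest_neighbor_table(s2))  # True: same table
```

Because only pair frequencies are recovered, any rearrangement that preserves the set of adjacent pairs is indistinguishable, so the table constrains a sequence without determining it.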
The problem of obtaining defined DNA fragments of manageable size still remained, and sequence analysis of portions of chromosomes appeared impossible. The observation made in 1952 by Luria and Human (7) that "the genotype of the bacterial host in which a phage reproduces affects the phenotype of the new phage" eventually opened the way to preparing defined DNA fragments of manageable size. This 1952 paper described the first observation of a restriction endonuclease at work in vivo. Ten years passed before Arber and his colleagues (8) began looking at host-controlled restriction/modification, and another 10 passed before restriction endonucleases, enzymes analogous to the trypsin and chymotrypsin proteases, were ready to be applied to the DNA sequencing challenge (9, 10).

The term restriction endonuclease comes from descriptions of the bacteria that were later shown to contain these enzymes. These bacteria did not serve as hosts for certain bacteriophages and thus were said to restrict phage infection. Approximately 250 restriction enzymes have been isolated from a variety of bacteria. These restriction enzymes are site-specific with regard to a given nucleotide sequence and are capable of catalyzing double-strand breaks in DNA. The names of these enzymes are derived from the bacterial genus and species and include a Roman numeral indicating order of A
